/******************************************************************************
 *
 * Copyright (C) 1997-2015 by Dimitri van Heesch.
 *
 * Permission to use, copy, modify, and distribute this software and its
 * documentation under the terms of the GNU General Public License is hereby
 * granted. No representations are made about the suitability of this software
 * for any purpose. It is provided "as is" without express or implied warranty.
 * See the GNU General Public License for more details.
 *
 * Documents produced by Doxygen are derivative works derived from the
 * input used in their production; they are not affected by this license.
 *
 */
/*! \page extsearch External Indexing and Searching

[TOC]

\section extsearch_intro Introduction

Since release 1.8.3, doxygen can search through its HTML output using
an external indexing tool and search engine.
This has several advantages:
- For large projects it can perform significantly better than
  doxygen's built-in search engine, as doxygen uses a rather simple indexing
  algorithm.
- It allows combining the search data of multiple projects into one index,
  enabling a global search across multiple doxygen projects.
- It allows adding other data to the search index, i.e. web pages
  not produced by doxygen.
- The search engine needs to run on a web server, but clients can still browse
  the web pages locally.

So that not everyone has to write their own indexer and search
engine, doxygen provides an example tool for each task: `doxyindexer`
for indexing the data and `doxysearch.cgi` for searching through the index.

The data flow is shown in the following diagram:

\image html extsearch_flow.png "External Search Data Flow"
\image latex extsearch_flow.eps "External Search Data Flow" height=10cm

- `doxygen` produces the raw search data.
- `doxyindexer` indexes the data into a search database `doxysearch.db`.
- When a user performs a search from a doxygen generated HTML page,
  the CGI binary `doxysearch.cgi` is invoked.
- The `doxysearch.cgi` tool performs a query on the database and returns
  the results.
- The browser shows the search results.

\section extsearch_config Configuring

The first step is to make the search engine available via a web server.
If you use `doxysearch.cgi` this means making the CGI binary
available from the web server (i.e. being able to run it from a
browser via a URL starting with http:).

How to set up a web server is outside the scope of this document,
but if you have Apache installed, for instance, you could simply copy the
`doxysearch.cgi` file from doxygen's `bin` directory to the `cgi-bin` directory of the
Apache web server. Read the Apache documentation for details.

To test whether `doxysearch.cgi` is accessible, start your web browser,
point it to the URL of the binary, and add `?test` at the end:

    http://yoursite.com/path/to/cgi/doxysearch.cgi?test

You should get the following message:

    Test failed: cannot find search index doxysearch.db

If you use Internet Explorer you may be prompted to download a file,
which will then contain this message.

Since no `doxysearch.db` has been created or installed yet, it is OK for the test to
fail for this reason. How to correct this is discussed in the next section.

Before continuing with the next section, add the above
URL (without the `?test` part) to the `SEARCHENGINE_URL` tag in
doxygen's configuration file:

    SEARCHENGINE_URL = http://yoursite.com/path/to/cgi/doxysearch.cgi

\subsection extsearch_single Single project index

To use the external search option, make sure the following options are enabled
in doxygen's configuration file:

    SEARCHENGINE           = YES
    SERVER_BASED_SEARCH    = YES
    EXTERNAL_SEARCH        = YES

This will make doxygen generate a file called `searchdata.xml` in the output
directory (configured with \ref cfg_output_directory "OUTPUT_DIRECTORY").
You can change the file name (and location) with the
\ref cfg_searchdata_file "SEARCHDATA_FILE" option.

The next step is to put the raw search data into an index for efficient
searching. You can use `doxyindexer` for this. Simply run it from the command
line:

    doxyindexer searchdata.xml

This will create a directory called `doxysearch.db` with some files in it.
By default the directory will be created at the location from which `doxyindexer`
was started, but you can change the directory using the `-o` option.

Copy the `doxysearch.db` directory to the directory where
`doxysearch.cgi` is located and rerun the browser test by pointing
the browser to

    http://yoursite.com/path/to/cgi/doxysearch.cgi?test

You should now get the following message:

    Test successful.

Now you should be able to search for words and symbols from the HTML output.

\subsection extsearch_multi Multi project index

If you have more than one doxygen project and these projects are related,
it may be desirable to allow searching for words in all projects from within
the documentation of any of the projects.

To make this possible, all that is needed is to combine the search data
for all projects into a single index. For example, for two projects A and B for which
`searchdata.xml` is generated in directories `project_A` and `project_B`, run:

    doxyindexer project_A/searchdata.xml project_B/searchdata.xml

and then copy the resulting `doxysearch.db` to the directory where
`doxysearch.cgi` is located.

The `searchdata.xml` file doesn't contain any absolute paths or links,
so how can the search results from multiple projects be linked back to the right documentation set?
This is where the `EXTERNAL_SEARCH_ID` and `EXTRA_SEARCH_MAPPINGS` options come into play.

To be able to identify the different projects, you need to
set a unique ID using \ref cfg_external_search_id "EXTERNAL_SEARCH_ID"
for each project.

To link the search results to the right project, you need to define a
mapping per project using the \ref cfg_extra_search_mappings "EXTRA_SEARCH_MAPPINGS" tag.
With this option you can define the mapping from IDs of other projects to the
(relative) location of the documentation of those projects.

So for projects A and B the relevant part of the configuration file
could look as follows:

    project_A/Doxyfile
    ------------------
    EXTERNAL_SEARCH_ID    = A
    EXTRA_SEARCH_MAPPINGS = B=../../project_B/html

for project A and for project B

    project_B/Doxyfile
    ------------------
    EXTERNAL_SEARCH_ID    = B
    EXTRA_SEARCH_MAPPINGS = A=../../project_A/html

With these settings, projects A and B can share the same search database,
and the search results will link to the right documentation set.

\section extsearch_update Updating the index

When you modify the source code, you should re-run doxygen to bring the
documentation up to date again. When using external searching you also need to update the
search index by re-running `doxyindexer`. You could wrap the calls to `doxygen`
and `doxyindexer` together in a script to make this process easier.
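
One way to wrap the two steps is sketched below in Python. This is only an
illustration, not a tool shipped with doxygen: the function names are made up for
this example, and it assumes that `doxygen` and `doxyindexer` are on the `PATH`
and that `searchdata.xml` is written where the configuration file says it is.

```python
import subprocess


def update_commands(doxyfile, searchdata, index_dir):
    """Return the two commands to run, in order, as argument lists."""
    return [
        # Step 1: regenerate the documentation and the raw search data.
        ["doxygen", str(doxyfile)],
        # Step 2: rebuild the index; -o selects the output directory.
        ["doxyindexer", "-o", str(index_dir), str(searchdata)],
    ]


def update_docs_and_index(doxyfile, searchdata, index_dir, run=subprocess.run):
    """Run doxygen, then doxyindexer, stopping at the first failure."""
    for cmd in update_commands(doxyfile, searchdata, index_dir):
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit status, so a failed doxygen run is never indexed.
        run(cmd, check=True)


# Example call (hypothetical paths, adjust them to your own setup):
# update_docs_and_index("Doxyfile", "searchdata.xml", "/path/to/cgi")
```

Passing the directory of `doxysearch.cgi` as `index_dir` would save the separate
copy step, provided the script has write access to that directory.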

\section extsearch_api Programming interface

Previous sections have assumed you use the tools `doxyindexer`
and `doxysearch.cgi` to do the indexing and searching, but you could also
write your own index and search tools if you like.

For this, three interfaces are important:
- The format of the input for the index tool.
- The format of the input for the search engine.
- The format of the output of the search engine.

The next subsections describe these interfaces in more detail.

\subsection extsearch_api_index Indexer input format

The search data produced by doxygen follows the Solr XML index message format.

The input for the indexer is an XML file, which consists of one `<add>` tag containing
multiple `<doc>` tags, which in turn contain multiple `<field>` tags.

Here is an example of one doc node, which contains the search data and meta data for
one method:

    <add>
      ...
      <doc>
        <field name="type">function</field>
        <field name="name">QXmlReader::setDTDHandler</field>
        <field name="args">(QXmlDTDHandler *handler)=0</field>
        <field name="tag">qtools.tag</field>
        <field name="url">de/df6/class_q_xml_reader.html#a0b24b1fe26a4c32a8032d68ee14d5dba</field>
        <field name="keywords">setDTDHandler QXmlReader::setDTDHandler QXmlReader</field>
        <field name="text">Sets the DTD handler to handler DTDHandler()</field>
      </doc>
      ...
    </add>

Each field has a name. The following field names are supported:
- *type*: the type of the search entry; can be one of: source, function, slot,
          signal, variable, typedef, enum, enumvalue, property, event, related,
          friend, define, file, namespace, group, package, page, dir
- *name*: the name of the search entry; for a method this is the qualified name of the method,
          for a class it is the name of the class, etc.
- *args*: the parameter list (in case of functions or methods)
- *tag*:  the name of the tag file used for this project.
- *url*:  the (relative) URL to the HTML documentation for this entry.
- *keywords*: important words that are representative for the entry. When searching for such a
          keyword, this entry should get a higher rank in the search results.
- *text*: the documentation associated with the item. Note that only words are present, no markup.

@note Due to the potentially large size of the XML file, it is recommended to use a
SAX based parser to process it.
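
As an illustration of the SAX approach, the following Python sketch streams the
`<doc>` entries using the standard `xml.sax` module, so the whole file is never
held in memory. The class and function names are invented for this example; a
real indexer would add each entry to its index inside the callback.

```python
import io
import xml.sax


class SearchDataHandler(xml.sax.ContentHandler):
    """Collects the fields of each <doc> and passes the result to a callback."""

    def __init__(self, on_doc):
        super().__init__()
        self.on_doc = on_doc  # called with one dict per <doc>
        self.doc = None       # fields of the <doc> being read, or None
        self.field = None     # name of the <field> being read, or None
        self.chunks = []      # text pieces of the current field

    def startElement(self, name, attrs):
        if name == "doc":
            self.doc = {}
        elif name == "field" and self.doc is not None:
            self.field = attrs.get("name")
            self.chunks = []

    def characters(self, content):
        # The parser may deliver one field's text in several pieces.
        if self.field is not None:
            self.chunks.append(content)

    def endElement(self, name):
        if name == "field" and self.field is not None:
            self.doc[self.field] = "".join(self.chunks)
            self.field = None
        elif name == "doc" and self.doc is not None:
            self.on_doc(self.doc)
            self.doc = None


def parse_search_data(source, on_doc):
    """Stream-parse search data from a file name or a binary file object."""
    xml.sax.parse(source, SearchDataHandler(on_doc))
```

For example, `parse_search_data("searchdata.xml", print)` would print one
dictionary of field names and values per entry.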

\subsection extsearch_api_search_in Search URL format

When the search engine is invoked from a doxygen generated HTML page, a number of parameters are
passed to it via the query string.

The following fields are passed:
- *q*:  the query text as entered by the user.
- *n*:  the number of search results requested.
- *p*:  the number of the search page for which to return the results, counting from 0. Each page has *n* results.
- *cb*: the name of the callback function, used for JSON with padding, see the next section.

From the complete list of search results, the range `[n*p .. n*(p+1)-1]` should be returned.

Here is an example of what a query looks like:

    http://yoursite.com/path/to/cgi/doxysearch.cgi?q=list&n=20&p=1&cb=dummy

It represents a query for the word 'list' (`q=list`) requesting 20 search results (`n=20`),
starting with result number 20 (`p=1`) and using callback 'dummy' (`cb=dummy`).

@note The values are URL encoded so they
have to be decoded before they can be used.
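
A custom search engine could extract and decode these fields along the following
lines. This is a minimal Python sketch; the function names are made up for this
example, and it assumes that all four fields are present and that *n* and *p*
are valid integers, which a real engine should verify.

```python
from urllib.parse import parse_qs, urlsplit


def parse_search_url(url):
    """Return (q, n, p, cb) from a search URL."""
    # parse_qs performs the URL decoding and maps each name to a list of values.
    params = parse_qs(urlsplit(url).query)
    q = params["q"][0]
    n = int(params["n"][0])
    p = int(params["p"][0])
    cb = params["cb"][0]
    return q, n, p, cb


def result_range(n, p):
    """Return the first and last index (inclusive) of the requested page."""
    return n * p, n * (p + 1) - 1
```

For the example query above, this yields `("list", 20, 1, "dummy")` and the
result range 20 to 39.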

\subsection extsearch_api_search_out Search results format

When the search engine is invoked as shown in the previous subsection, it should reply with
the results. The format of the reply is JSON with padding, which is basically
a JavaScript object wrapped in a function call. The name of the function should be the name of
the callback (as passed with the *cb* field in the query).

With the example query shown in the previous subsection, the main structure of the reply should
look as follows:

    dummy({
      "hits":179,
      "first":20,
      "count":20,
      "page":1,
      "pages":9,
      "query": "list",
      "items":[
      ...
     ]})

The fields have the following meaning:
- *hits*:  the total number of search results (could be more than was requested).
- *first*: the index of the first result returned: \f$\min(n*p,\mbox{\em hits})\f$.
- *count*: the actual number of results returned: \f$\min(n,\mbox{\em hits}-\mbox{\em first})\f$.
- *page*:  the page number of the result: \f$p\f$.
- *pages*: the total number of pages: \f$\lceil\frac{\mbox{\em hits}}{n}\rceil\f$.
- *query*: the query text, as passed with the *q* field.
- *items*: an array containing the search data per result.
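
The formulas for these fields can be checked with a short Python sketch. The
function name is invented for this example, `all_items` stands for the complete,
already ranked result list, and *n* is assumed to be greater than zero. A real
engine should also check that the callback name is a plain identifier before
echoing it back.

```python
import json
import math


def build_reply(cb, query, n, p, all_items):
    """Wrap one page of results in a JSON with padding reply."""
    hits = len(all_items)
    first = min(n * p, hits)       # index of the first result returned
    count = min(n, hits - first)   # may be less than n on the last page
    reply = {
        "hits": hits,
        "first": first,
        "count": count,
        "page": p,
        "pages": math.ceil(hits / n),
        "query": query,
        "items": all_items[first:first + count],
    }
    # The padding is a call to the callback with the JSON object as argument.
    return f"{cb}({json.dumps(reply)})"
```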

Here is an example of what an element of the *items* array should look like:

    {"type": "function",
     "name": "QDir::entryInfoList(const QString &nameFilter, int filterSpec=DefaultFilter, int sortSpec=DefaultSort) const",
     "tag": "qtools.tag",
     "url": "d5/d8d/class_q_dir.html#a9439ea6b331957f38dbad981c4d050ef",
     "fragments":[
       "Returns a list of QFileInfo objects for all files and directories...",
       "... pointer to a QFileInfoList The list is owned by the QDir object...",
       "... to keep the entries of the list after a subsequent call to this..."
     ]
    },

The fields for such an item have the following meaning:
- *type*: the type of the item, as found in the field with name "type" in the raw search data.
- *name*: the name of the item, including the parameter list, as found in the fields with
          name "name" and "args" in the raw search data.
- *tag*:  the name of the tag file, as found in the field with name "tag" in the raw search data.
- *url*:  the (relative) URL to the documentation, as found in the field with name "url"
          in the raw search data.
- *fragments*: an array with 0 or more fragments of text containing words that have been searched for.
          These words should be wrapped in `<span class="hl">` and `</span>` tags to highlight them
          in the output.
*/