/******************************************************************************
 *
 * Copyright (C) 1997-2015 by Dimitri van Heesch.
 *
 * Permission to use, copy, modify, and distribute this software and its
 * documentation under the terms of the GNU General Public License is hereby
 * granted. No representations are made about the suitability of this software
 * for any purpose. It is provided "as is" without express or implied warranty.
 * See the GNU General Public License for more details.
 *
 * Documents produced by Doxygen are derivative works derived from the
 * input used in their production; they are not affected by this license.
 *
 */
/*! \page extsearch External Indexing and Searching

[TOC]

\section extsearch_intro Introduction

As of release 1.8.3, doxygen provides the ability to search through HTML using
an external indexing tool and search engine.
This has several advantages:
- For large projects it can have significant performance advantages over
  doxygen's built-in search engine, as doxygen uses a rather simple indexing
  algorithm.
- It allows combining the search data of multiple projects into one index,
  allowing a global search across multiple doxygen projects.
- It allows adding additional data to the search index, i.e. other web pages
  not produced by doxygen.
- The search engine needs to run on a web server, but clients can still browse
  the web pages locally.

To spare everyone from writing their own indexer and search
engine, doxygen provides an example tool for each action: `doxyindexer`
for indexing the data and `doxysearch.cgi` for searching through the index.

The data flow is shown in the following diagram:

\image html extsearch_flow.png "External Search Data Flow"
\image latex extsearch_flow.eps "External Search Data Flow" height=10cm

- `doxygen` produces the raw search data.
- `doxyindexer` indexes the data into a search database `doxysearch.db`.
- When a user performs a search from a doxygen generated HTML page,
  the CGI binary `doxysearch.cgi` is invoked.
- The `doxysearch.cgi` tool performs a query on the database and returns
  the results.
- The browser shows the search results.

\section extsearch_config Configuring

The first step is to make the search engine available via a web server.
If you use `doxysearch.cgi`, this means making the CGI binary
available from the web server (i.e. being able to run it from a
browser via a URL starting with `http:`).

How to set up a web server is outside the scope of this document,
but if you have Apache installed, for instance, you could simply copy the
`doxysearch.cgi` file from doxygen's `bin` directory to the `cgi-bin` directory
of the Apache web server. Read the Apache documentation for details.

To test whether `doxysearch.cgi` is accessible, start your web browser,
point it to the URL of the binary, and add `?test` at the end:

    http://yoursite.com/path/to/cgi/doxysearch.cgi?test

You should get the following message:

    Test failed: cannot find search index doxysearch.db

If you use Internet Explorer you may be prompted to download a file,
which will then contain this message.

Since no `doxysearch.db` has been created or installed yet, it is fine for the
test to fail for this reason. How to correct this is discussed in the next section.

Before continuing with the next section, add the above
URL (without the `?test` part) to the `SEARCHENGINE_URL` tag in
doxygen's configuration file:

    SEARCHENGINE_URL = http://yoursite.com/path/to/cgi/doxysearch.cgi

\subsection extsearch_single Single project index

To use the external search option, make sure the following options are enabled
in doxygen's configuration file:

    SEARCHENGINE           = YES
    SERVER_BASED_SEARCH    = YES
    EXTERNAL_SEARCH        = YES

This will make doxygen generate a file called `searchdata.xml` in the output
directory (configured with \ref cfg_output_directory "OUTPUT_DIRECTORY").
You can change the file name (and location) with the
\ref cfg_searchdata_file "SEARCHDATA_FILE" option.

The next step is to put the raw search data into an index for efficient
searching. You can use `doxyindexer` for this. Simply run it from the command
line:

    doxyindexer searchdata.xml

This will create a directory called `doxysearch.db` with some files in it.
By default the directory is created at the location from which `doxyindexer`
was started, but you can change the directory using the `-o` option.

Copy the `doxysearch.db` directory to the directory in which
`doxysearch.cgi` is located and rerun the browser test by pointing
the browser to

    http://yoursite.com/path/to/cgi/doxysearch.cgi?test

You should now get the following message:

    Test successful.

Now you should be able to search for words and symbols from the HTML output.

\subsection extsearch_multi Multi project index

If you have more than one doxygen project and these projects are related,
it may be desirable to allow searching for words in all projects from within
the documentation of any of the projects.

To make this possible, all that is needed is to combine the search data
for all projects into a single index. For example, for two projects A and B for which
`searchdata.xml` is generated in directories `project_A` and `project_B`, run:

    doxyindexer project_A/searchdata.xml project_B/searchdata.xml

and then copy the resulting `doxysearch.db` to the directory in which
`doxysearch.cgi` is located.

The `searchdata.xml` file doesn't contain any absolute paths or links,
so how can the search results from multiple projects be linked back to the right documentation set?
This is where the `EXTERNAL_SEARCH_ID` and `EXTRA_SEARCH_MAPPINGS` options come into play.

To be able to identify the different projects, one needs to
set a unique ID using \ref cfg_external_search_id "EXTERNAL_SEARCH_ID"
for each project.

To link the search results to the right project, you need to define a
mapping per project using the \ref cfg_extra_search_mappings "EXTRA_SEARCH_MAPPINGS" tag.
With this option you can define the mapping from IDs of other projects to the
(relative) location of the documentation of those projects.

So for projects A and B the relevant part of the configuration file
could look as follows:

    project_A/Doxyfile
    ------------------
    EXTERNAL_SEARCH_ID    = A
    EXTRA_SEARCH_MAPPINGS = B=../../project_B/html

for project A and for project B

    project_B/Doxyfile
    ------------------
    EXTERNAL_SEARCH_ID    = B
    EXTRA_SEARCH_MAPPINGS = A=../../project_A/html

With these settings, projects A and B can share the same search database,
and the search results will link to the right documentation set.

\section extsearch_update Updating the index

When you modify the source code, you should re-run doxygen to bring the
documentation up to date again. When using external searching you also need to update the
search index by re-running `doxyindexer`. You could wrap the calls to `doxygen`
and `doxyindexer` together in a script to make this process easier.
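
A wrapper of this kind could be sketched as follows. This is only an
illustration: the function names and the paths passed in are hypothetical,
and the sketch assumes that `doxygen` and `doxyindexer` are on the `PATH` and
that the index should be written with the `-o` option directly into the
directory that holds `doxysearch.cgi`.

```python
import subprocess


def build_commands(doxyfile, searchdata, index_dir):
    """Return the two command lines: regenerate the docs, then rebuild the index."""
    return [
        ["doxygen", str(doxyfile)],
        # -o selects the directory in which doxysearch.db is created
        ["doxyindexer", "-o", str(index_dir), str(searchdata)],
    ]


def update_docs(doxyfile, searchdata, index_dir, run=subprocess.run):
    """Run both steps in order; check=True aborts on the first failing step."""
    for cmd in build_commands(doxyfile, searchdata, index_dir):
        run(cmd, check=True)


# Example call (hypothetical paths, adjust to your own layout):
# update_docs("Doxyfile", "searchdata.xml", "/path/to/cgi")
```

The `run` parameter only exists so that the command sequence can be inspected
without actually starting the tools.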

\section extsearch_api Programming interface

Previous sections have assumed you use the tools `doxyindexer`
and `doxysearch.cgi` to do the indexing and searching, but you could also
write your own index and search tools if you like.

For this, three interfaces are important:
- The format of the input for the index tool.
- The format of the input for the search engine.
- The format of the output of the search engine.

The next subsections describe these interfaces in more detail.

\subsection extsearch_api_index Indexer input format

The search data produced by doxygen follows the Solr XML index message format.

The input for the indexer is an XML file, which consists of one `<add>` tag containing
multiple `<doc>` tags, which in turn contain multiple `<field>` tags.

Here is an example of one doc node, which contains the search data and meta data for
one method:

    <add>
      ...
      <doc>
        <field name="type">function</field>
        <field name="name">QXmlReader::setDTDHandler</field>
        <field name="args">(QXmlDTDHandler *handler)=0</field>
        <field name="tag">qtools.tag</field>
        <field name="url">de/df6/class_q_xml_reader.html#a0b24b1fe26a4c32a8032d68ee14d5dba</field>
        <field name="keywords">setDTDHandler QXmlReader::setDTDHandler QXmlReader</field>
        <field name="text">Sets the DTD handler to handler DTDHandler()</field>
      </doc>
      ...
    </add>

Each field has a name. The following field names are supported:
- *type*: the type of the search entry; can be one of: source, function, slot,
          signal, variable, typedef, enum, enumvalue, property, event, related,
          friend, define, file, namespace, group, package, page, dir
- *name*: the name of the search entry; for a method this is the qualified name of the method,
          for a class it is the name of the class, etc.
- *args*: the parameter list (in case of functions or methods)
- *tag*:  the name of the tag file used for this project.
- *url*:  the (relative) URL to the HTML documentation for this entry.
- *keywords*: important words that are representative for the entry. When searching for such
          a keyword, this entry should get a higher rank in the search results.
- *text*: the documentation associated with the item. Note that only words are present, no markup.

@note Due to the potentially large size of the XML file, it is recommended to use a
SAX based parser to process it.
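
As an illustration of the SAX approach, here is a minimal sketch using the
`xml.sax` module from the Python standard library. The class name
`SearchDataHandler`, the `on_doc` callback, and the shortened sample input are
made up for this example. Each `<doc>` is handed to the callback as soon as
its closing tag is seen, so the whole file never has to be held in memory.

```python
import io
import xml.sax


class SearchDataHandler(xml.sax.ContentHandler):
    """Streams <doc> nodes; calls on_doc with a dict of field name -> text."""

    def __init__(self, on_doc):
        super().__init__()
        self._on_doc = on_doc
        self._doc = None      # fields of the <doc> currently being read
        self._field = None    # name of the <field> currently being read
        self._chunks = []     # text pieces of the current field

    def startElement(self, name, attrs):
        if name == "doc":
            self._doc = {}
        elif name == "field" and self._doc is not None:
            self._field = attrs.get("name")
            self._chunks = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect and join later.
        if self._field is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "field" and self._field is not None:
            self._doc[self._field] = "".join(self._chunks)
            self._field = None
        elif name == "doc" and self._doc is not None:
            self._on_doc(self._doc)
            self._doc = None


SAMPLE = b"""<add>
  <doc>
    <field name="type">function</field>
    <field name="name">QXmlReader::setDTDHandler</field>
    <field name="args">(QXmlDTDHandler *handler)=0</field>
  </doc>
</add>"""

docs = []
xml.sax.parse(io.BytesIO(SAMPLE), SearchDataHandler(docs.append))
```

A real indexer would pass a callback that adds the record to its index
instead of `docs.append`, and would pass the path to `searchdata.xml` instead
of the in-memory sample.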

\subsection extsearch_api_search_in Search URL format

When the search engine is invoked from a doxygen generated HTML page, a number of parameters are
passed to it via the query string.

The following fields are passed:
- *q*:  the query text as entered by the user.
- *n*:  the number of search results requested.
- *p*:  the number of the search page for which to return the results. Each page has *n* results.
- *cb*: the name of the callback function, used for JSON with padding, see the next section.

From the complete list of search results, the results `n*p` through `n*(p+1)-1` should be returned.

Here is an example of what a query looks like:

    http://yoursite.com/path/to/cgi/doxysearch.cgi?q=list&n=20&p=1&cb=dummy

It represents a query for the word 'list' (`q=list`) requesting 20 search results (`n=20`),
starting with the result number 20 (`p=1`) and using callback 'dummy' (`cb=dummy`).

@note The values are URL encoded so they
have to be decoded before they can be used.
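
The decoding and the page selection can be sketched with Python's standard
`urllib.parse` module. The function names and the default values used when a
field is missing are assumptions made for this example, not something the
interface prescribes.

```python
from urllib.parse import parse_qs


def parse_search_query(query_string):
    """Extract q, n, p and cb from the raw query string."""
    fields = parse_qs(query_string)   # parse_qs also URL-decodes the values
    return {
        "q": fields.get("q", [""])[0],
        "n": int(fields.get("n", ["20"])[0]),   # default of 20 is an assumption
        "p": int(fields.get("p", ["0"])[0]),    # default of 0 is an assumption
        "cb": fields.get("cb", [""])[0],
    }


def page_slice(results, n, p):
    """Return results n*p through n*(p+1)-1, or fewer on the last page."""
    return results[n * p : n * (p + 1)]


params = parse_search_query("q=list%20view&n=20&p=1&cb=dummy")
page = page_slice(list(range(50)), params["n"], params["p"])
```

With the query string above, `params["q"]` is the decoded text `list view`
and `page` holds the results with indices 20 through 39.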

\subsection extsearch_api_search_out Search results format

When the search engine is invoked as shown in the previous subsection, it should reply with
the results. The format of the reply is JSON with padding, which is basically
a JavaScript object wrapped in a function call. The name of the function should be the name of
the callback (as passed with the *cb* field in the query).

With the example query shown in the previous subsection, the main structure of the reply should
look as follows:

    dummy({
      "hits":179,
      "first":20,
      "count":20,
      "page":1,
      "pages":9,
      "query": "list",
      "items":[
      ...
     ]})

The fields have the following meaning:
- *hits*:  the total number of search results (could be more than was requested).
- *first*: the index of the first result returned: \f$\min(n*p,\mbox{\em hits})\f$.
- *count*: the actual number of results returned: \f$\min(n,\mbox{\em hits}-\mbox{\em first})\f$.
- *page*:  the page number of the result: \f$p\f$.
- *pages*: the total number of pages: \f$\lceil\frac{\mbox{\em hits}}{n}\rceil\f$.
- *items*: an array containing the search data per result.
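
The formulas above can be checked with a short sketch. The function name
`result_summary` is made up for this example, and the code assumes that *n*
is greater than zero.

```python
def result_summary(hits, n, p):
    """Compute the paging fields of the reply from hits, n and p (n > 0)."""
    first = min(n * p, hits)        # index of the first result returned
    count = min(n, hits - first)    # number of results actually returned
    pages = (hits + n - 1) // n     # ceil(hits / n) in integer arithmetic
    return {"hits": hits, "first": first, "count": count,
            "page": p, "pages": pages}


summary = result_summary(179, 20, 1)
```

For the example reply (179 hits, `n=20`, `p=1`) this gives `first` 20,
`count` 20 and `pages` 9. For the last page (`p=8`) it gives `first` 160 and
`count` 19.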

Here is an example of what an element of the *items* array should look like:

    {"type": "function",
     "name": "QDir::entryInfoList(const QString &nameFilter, int filterSpec=DefaultFilter, int sortSpec=DefaultSort) const",
     "tag": "qtools.tag",
     "url": "d5/d8d/class_q_dir.html#a9439ea6b331957f38dbad981c4d050ef",
     "fragments":[
       "Returns a list of QFileInfo objects for all files and directories...",
       "... pointer to a QFileInfoList The list is owned by the QDir object...",
       "... to keep the entries of the list after a subsequent call to this..."
     ]
    },

The fields for such an item have the following meaning:
- *type*: the type of the item, as found in the field with name "type" in the raw search data.
- *name*: the name of the item, including the parameter list, as found in the fields with
          name "name" and "args" in the raw search data.
- *tag*:  the name of the tag file, as found in the field with name "tag" in the raw search data.
- *url*:  the (relative) URL to the documentation, as found in the field with name "url"
          in the raw search data.
- *fragments*: an array with 0 or more fragments of text containing words that have been searched for.
          These words should be wrapped in `<span class="hl">` and `</span>` tags to highlight them
          in the output.
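
Putting the pieces together, a reply could be assembled roughly as shown
below, using Python's standard `json` module. The helper names `make_item`
and `jsonp_reply` are hypothetical, and the sample record is shortened. The
sketch only shows how the raw "name" and "args" fields are joined and how the
JSON is wrapped in the callback. It does not search or highlight anything.

```python
import json


def make_item(doc, fragments):
    """Build one element of the items array from a raw search data record."""
    return {
        "type": doc["type"],
        # the name in the reply includes the parameter list
        "name": doc["name"] + doc.get("args", ""),
        "tag": doc.get("tag", ""),
        "url": doc["url"],
        "fragments": list(fragments),
    }


def jsonp_reply(callback, summary, items):
    """Wrap the JSON payload in a call to the requested callback function."""
    payload = dict(summary, items=items)
    return callback + "(" + json.dumps(payload) + ")"


doc = {"type": "function", "name": "QDir::entryInfoList", "args": "() const",
       "tag": "qtools.tag", "url": "d5/d8d/class_q_dir.html"}
reply = jsonp_reply(
    "dummy",
    {"hits": 1, "first": 0, "count": 1, "page": 0, "pages": 1, "query": "list"},
    [make_item(doc, ["Returns a list of QFileInfo objects..."])],
)
```

The resulting string starts with `dummy(` and ends with `)`, so the browser
can evaluate it as a call to the callback named in the *cb* field.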
*/