/******************************************************************************
 *
 * 
 *
 * Copyright (C) 1997-2015 by Dimitri van Heesch.
 *
 * Permission to use, copy, modify, and distribute this software and its
 * documentation under the terms of the GNU General Public License is hereby 
 * granted. No representations are made about the suitability of this software 
 * for any purpose. It is provided "as is" without express or implied warranty.
 * See the GNU General Public License for more details.
 *
 * Documents produced by Doxygen are derivative works derived from the
 * input used in their production; they are not affected by this license.
 *
 */
/*! \page arch Doxygen's Internals

Doxygen's internals

Note that this section is still under construction!

The following picture shows how source files are processed by doxygen.

\image html archoverview.gif "Data flow overview"
\image latex archoverview.eps "Data flow overview" width=14cm

The following sections explain the steps above in more detail.

Config parser

The configuration file that controls the settings of a project is parsed
and the settings are stored in the singleton class \c Config 
in \c src/config.h. The parser itself is written using \c flex 
and can be found in \c src/config.l. This parser is also used 
directly by \c doxywizard, so it is put in a separate library.

Each configuration option has one of 5 possible types: \c String, 
\c List, \c Enum, \c Int, or \c Bool. The values of these options are
available through the global functions \c Config_getXXX(), where \c XXX is the
type of the option. The argument of these functions is a string naming
the option as it appears in the configuration file. For instance, 
\c Config_getBool("GENERATE_TESTLIST") returns a reference to a boolean
value that is \c TRUE if the test list was enabled in the configuration file. 

The function \c readConfiguration() in \c src/doxygen.cpp 
reads the command line options and then calls the configuration parser.
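As an illustration of the typed lookup described above, the following Python
sketch models a store that keeps one value per option name and checks the type
on access. It is not doxygen's implementation: the class \c OptionStore, the
helper functions \c config_get_bool() and \c config_get_int(), and the option
\c TAB_SIZE with its value are assumptions made for this example.

```python
from typing import Any, Dict, List, Union

# An option holds a string, a list of strings, an integer, or a boolean.
# Enum options are modelled as plain strings in this sketch.
OptionValue = Union[str, List[str], int, bool]


class OptionStore:
    """Hypothetical stand-in for a singleton that owns all option values."""

    _instance = None

    def __init__(self) -> None:
        self._values: Dict[str, OptionValue] = {}

    @classmethod
    def instance(cls) -> "OptionStore":
        # Create the single shared instance on first use.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def set(self, name: str, value: OptionValue) -> None:
        self._values[name] = value

    def get(self, name: str, expected: type) -> Any:
        value = self._values[name]  # raises KeyError for unknown options
        # Compare exact types so that a bool is not accepted as an int.
        if type(value) is not expected:
            raise TypeError(f"option {name} is not of type {expected.__name__}")
        return value


def config_get_bool(name: str) -> bool:
    return OptionStore.instance().get(name, bool)


def config_get_int(name: str) -> int:
    return OptionStore.instance().get(name, int)


# Sample data; in doxygen these values would come from the parsed file.
store = OptionStore.instance()
store.set("GENERATE_TESTLIST", True)
store.set("TAB_SIZE", 4)
test_list_enabled = config_get_bool("GENERATE_TESTLIST")
```

Asking for an option through the wrong typed accessor raises an error in this
sketch, which mirrors the idea that each option has exactly one type.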

C Preprocessor

The input files mentioned in the configuration file are (by default) fed to the
C Preprocessor (after being piped through a user-defined filter, if available).

The way the preprocessor works differs somewhat from a standard C Preprocessor.
By default it does not do macro expansion, although it can be configured to
expand all macros. Typical usage is to expand only a user-specified set
of macros. This allows macro names to appear in the types of 
function parameters, for instance.
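The three modes described above (no expansion, expansion of all macros, and
expansion of a selected set) can be sketched in Python as follows. This is a
simplified model and not doxygen's actual preprocessor: it handles only
object-like macros, makes a single pass, and ignores strings and comments.
The function name \c expand_macros() and its parameters are made up for the
example.

```python
import re
from typing import Dict, Optional, Set

# Matches C identifiers, the only places where a macro name can appear.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def expand_macros(text: str,
                  macros: Dict[str, str],
                  only: Optional[Set[str]] = None) -> str:
    """Expand object-like macros in text.

    only=None expands every known macro, an empty set expands nothing,
    and any other set restricts expansion to the names it contains.
    """
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(0)
        if name in macros and (only is None or name in only):
            return macros[name]
        return name  # leave unknown or unselected identifiers untouched

    return IDENTIFIER.sub(substitute, text)


macros = {"EXPORT": "", "UINT": "unsigned int"}
line = "EXPORT void f(UINT x);"

unexpanded = expand_macros(line, macros, only=set())
selected = expand_macros(line, macros, only={"EXPORT"})
everything = expand_macros(line, macros)
```

With only \c EXPORT selected, the macro name \c UINT stays visible in the
parameter type, which is the effect the selective mode is meant to achieve.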

Another difference is that the preprocessor parses, but does not actually include, 
code when it encounters a \c \#include (with the exception of a \c \#include
found inside { ... } blocks). The reason behind this deviation from 
the standard is to prevent feeding multiple definitions of the 
same functions/classes to doxygen's parser. If all source files 
included a common header file, for instance, the class and type 
definitions (and their documentation) would be present in each 
translation unit. 

The preprocessor is written using \c flex and can be found in
\c src/pre.l. For condition blocks (\c \#if) evaluation of constant expressions 
is needed. For this a \c yacc based parser is used, which can be found 
in \c src/constexp.y and \c src/constexp.l.

The preprocessor is invoked for each file using the \c preprocessFile() 
function declared in \c src/pre.h, and will append the preprocessed result 
to a character buffer. The format of the character buffer is

\verbatim
0x06 file name 1 
0x06 preprocessed contents of file 1
...
0x06 file name n
0x06 preprocessed contents of file n
\endverbatim
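A minimal Python sketch of how a buffer in this format could be assembled and
taken apart again is shown below. It assumes that every 0x06 byte acts purely
as a separator and that neither file names nor file contents contain that
byte. The function names are hypothetical, and the real code may treat
whitespace around the markers differently.

```python
from typing import Iterable, List, Tuple

MARKER = chr(0x06)  # the separator byte used in the buffer format


def build_buffer(files: Iterable[Tuple[str, str]]) -> str:
    """Append (file name, preprocessed contents) pairs to one buffer."""
    return "".join(MARKER + name + MARKER + contents
                   for name, contents in files)


def split_buffer(buffer: str) -> List[Tuple[str, str]]:
    """Recover the (file name, contents) pairs from a buffer."""
    fields = buffer.split(MARKER)
    # A well-formed buffer starts with a marker and holds two fields per
    # file, so splitting yields one empty leading field plus an even count.
    if fields[0] != "" or len(fields) % 2 != 1:
        raise ValueError("malformed preprocessor buffer")
    return [(fields[i], fields[i + 1]) for i in range(1, len(fields), 2)]


files = [("a.h", "int a;\n"), ("b.cpp", "int b;\n")]
round_trip = split_buffer(build_buffer(files))
```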

Language parser

The preprocessed input buffer is fed to the language parser, which is 
implemented as a big state machine using \c flex. It can be found 
in the file \c src/scanner.l. There is one parser for all 
languages (C/C++/Java/IDL). The state variables \c insideIDL 
and \c insideJava are used in some places for language-specific choices. 

The task of the parser is to convert the input buffer into a tree of entries 
(basically an abstract syntax tree). An entry is defined in \c src/entry.h 
and is a blob of loosely structured information. The most important field 
is \c section, which specifies the kind of information contained in the entry.
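To make the idea of an entry tree more concrete, here is a small Python model
of it. It is only a sketch: the real \c Entry class has many more fields, and
the section labels used here ("root", "file", "class", "function") are
placeholders chosen for the example rather than the values doxygen uses.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Entry:
    """Reduced model of an entry: a kind, a name, and child entries."""
    section: str
    name: str = ""
    children: List["Entry"] = field(default_factory=list)

    def add(self, child: "Entry") -> "Entry":
        self.children.append(child)
        return child


def walk(entry: Entry) -> Iterator[Entry]:
    """Visit an entry and all of its descendants in pre-order."""
    yield entry
    for child in entry.children:
        yield from walk(child)


def names_in_section(root: Entry, section: str) -> List[str]:
    """Collect the names of all entries of one kind."""
    return [e.name for e in walk(root) if e.section == section]


# Tree for a file that defines a class with one member and a free function.
root = Entry("root")
source = root.add(Entry("file", "shape.cpp"))
shape = source.add(Entry("class", "Shape"))
shape.add(Entry("function", "area"))
source.add(Entry("function", "main"))

functions = names_in_section(root, "function")
```

Later processing steps can then select the entries they care about by looking
at the \c section field, as \c names_in_section() does here.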

Possible improvements for future versions:
 - Use one scanner/parser per language instead of one big scanner.
 - Move the first pass parsing of documentation blocks to a separate module.
 - Parse defines (these are currently gathered by the preprocessor, and
   ignored by the language parser).

Data organizer

This step consists of many smaller steps that build 
dictionaries of the extracted classes, files, namespaces, 
variables, functions, packages, pages, and groups. Besides building
dictionaries, this step also computes the relations (such as inheritance
relations) between the extracted entities.

Each step has a function defined in \c src/doxygen.cpp, which operates
on the tree of entries built during language parsing. Look at the
"Gathering information" part of \c parseInput() for details.

The result of this step is a number of dictionaries, which can be
found in the doxygen "namespace" defined in \c src/doxygen.h. Most
elements of these dictionaries are derived from the class \c Definition.
The class \c MemberDef, for instance, holds all information for a member. 
An instance of such a class can be part of a file (class \c FileDef), 
a class (class \c ClassDef), a namespace (class \c NamespaceDef), 
a group (class \c GroupDef), or a Java package (class \c PackageDef).
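The two kinds of work done in this step, building dictionaries and then
computing relations between their elements, can be sketched in Python as
shown below. The names \c ClassRecord, \c build_class_dictionary() and
\c compute_inheritance() are invented for this example, and the input is a
plain list of (class name, base names) pairs rather than a real entry tree.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class ClassRecord:
    """Hypothetical, much reduced stand-in for a class definition."""
    name: str
    bases: List[str] = field(default_factory=list)
    derived: List[str] = field(default_factory=list)


def build_class_dictionary(
        entries: Iterable[Tuple[str, List[str]]]) -> Dict[str, ClassRecord]:
    # Step 1: one dictionary element per extracted class, keyed by name.
    return {name: ClassRecord(name, list(bases)) for name, bases in entries}


def compute_inheritance(classes: Dict[str, ClassRecord]) -> None:
    # Step 2: link each class to its known bases. Bases that were not
    # extracted (for example external ones) are skipped.
    for record in classes.values():
        for base in record.bases:
            if base in classes:
                classes[base].derived.append(record.name)


classes = build_class_dictionary([
    ("Shape", []),
    ("Circle", ["Shape"]),
    ("Square", ["Shape", "Printable"]),
])
compute_inheritance(classes)
```

Separating the two steps means that the order in which classes were parsed
does not matter: all names are known before any relation is resolved.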

Tag file parser

If tag files are specified in the configuration file, these are parsed
by a SAX-based XML parser, which can be found in \c src/tagreader.cpp. 
The result of parsing a tag file is the insertion of \c Entry objects in the
entry tree. The field \c Entry::tagInfo is used to mark the entry as
external, and holds information about the tag file.
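The event-driven style of a SAX parser can be illustrated with Python's
standard \c xml.sax module. The sketch below is not a port of
\c src/tagreader.cpp. It assumes a heavily simplified tag file layout (a
\c tagfile root with \c compound elements that carry a \c kind attribute and
\c name and \c filename children) and represents each entry as a dictionary
whose \c tag_info key plays the role of the external marker.

```python
import xml.sax
from typing import Dict, List, Optional


class TagFileHandler(xml.sax.ContentHandler):
    """Collects one dictionary per compound element (simplified layout)."""

    def __init__(self, tag_file_name: str) -> None:
        super().__init__()
        self.tag_file_name = tag_file_name
        self.entries: List[Dict[str, str]] = []
        self._current: Optional[Dict[str, str]] = None
        self._depth = 0
        self._text: List[str] = []

    def startElement(self, name, attrs):
        self._depth += 1
        self._text = []  # start collecting text for the new element
        if name == "compound":
            # Remember which tag file the entry came from; this marks
            # the entry as external.
            self._current = {"kind": attrs.get("kind", ""),
                             "tag_info": self.tag_file_name}

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        if self._current is not None:
            # Only direct children of a compound (depth 3) describe it;
            # deeper name elements belong to its members.
            if name in ("name", "filename") and self._depth == 3:
                self._current[name] = "".join(self._text).strip()
            elif name == "compound":
                self.entries.append(self._current)
                self._current = None
        self._depth -= 1


def read_tag_file(xml_text: str, tag_file_name: str) -> List[Dict[str, str]]:
    handler = TagFileHandler(tag_file_name)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.entries


SAMPLE = """<?xml version="1.0"?>
<tagfile>
  <compound kind="class">
    <name>Foo</name>
    <filename>classFoo.html</filename>
    <member kind="function">
      <name>bar</name>
    </member>
  </compound>
</tagfile>
"""

entries = read_tag_file(SAMPLE, "lib.tag")
```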

Documentation parser

Special comment blocks are stored as strings in the entities that they
document. There is a string for the brief description and a string
for the detailed description. The documentation parser reads these
strings and executes the commands it finds in them (this is the second pass
in parsing the documentation). It writes the result directly to the output 
generators.

The parser is written in C++ and can be found in \c src/docparser.cpp. The
tokens that are consumed by the parser come from \c src/doctokenizer.l.
Code fragments found in the comment blocks are passed on to the source parser.

The main entry point for the documentation parser is \c validatingParseDoc(),
declared in \c src/docparser.h.  For simple texts with special 
commands \c validatingParseText() is used.
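The split between a tokenizer and a parser that executes commands can be
shown with a toy example in Python. This sketch is far simpler than the real
documentation parser: it recognizes only words and commands, it implements a
single command (the one that typesets the next word as code), and the
function names and the HTML it emits are assumptions made for illustration.

```python
import re
from typing import Iterator, Tuple

# A command is a backslash or at-sign followed by letters; anything else
# that is not whitespace counts as a word.
TOKEN = re.compile(r"[\\@](?P<command>[A-Za-z]+)|(?P<word>\S+)")


def tokenize(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, value) pairs, where kind is 'command' or 'word'."""
    for match in TOKEN.finditer(text):
        if match.group("command"):
            yield ("command", match.group("command"))
        else:
            yield ("word", match.group("word"))


def render(text: str) -> str:
    """Execute the commands found in text and return HTML-like output."""
    output = []
    tokens = tokenize(text)
    for kind, value in tokens:
        if kind == "command" and value == "c":
            # The command consumes the next token as its argument.
            _, argument = next(tokens, ("word", ""))
            output.append("<code>" + argument + "</code>")
        elif kind == "command":
            output.append("[unhandled command " + value + "]")
        else:
            output.append(value)
    return " ".join(output)


rendered = render(r"Uses \c flex heavily")
```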

Source parser

If source browsing is enabled or if code fragments are encountered in the
documentation, the source parser is invoked.

The code parser tries to cross-reference the source code it parses with
documented entities. It also does syntax highlighting of the sources. The
output is directly written to the output generators.

The main entry point for the code parser is \c parseCode(), 
declared in \c src/code.h.

Output generators

After data is gathered and cross-referenced, doxygen generates 
output in various formats. For this it uses the methods provided by 
the abstract class \c OutputGenerator. In order to generate output
for multiple formats at once, the methods of \c OutputList are called
instead. This class maintains a list of concrete output generators,
where each method called is delegated to all generators in the list.

To allow small deviations in what is written to the output for each
concrete output generator, it is possible to temporarily disable certain
generators. The \c OutputList class contains various \c disable() and \c enable()
methods for this. The methods \c OutputList::pushGeneratorState() and 
\c OutputList::popGeneratorState() are used to temporarily save the
set of enabled/disabled output generators on a stack. 
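The delegation and the state stack described above can be modelled in a few
lines of Python. This is a sketch of the pattern only: the method
\c write_text(), the two concrete generator classes, and the way generators
are identified by a name string are assumptions made for the example and do
not reflect the actual C++ interfaces.

```python
from typing import Dict, Iterable, List


class OutputGenerator:
    """Abstract interface; write_text is a hypothetical method name."""
    name = "abstract"

    def __init__(self) -> None:
        self.lines: List[str] = []

    def write_text(self, text: str) -> None:
        raise NotImplementedError


class HtmlGenerator(OutputGenerator):
    name = "html"

    def write_text(self, text: str) -> None:
        self.lines.append("<p>" + text + "</p>")


class TextGenerator(OutputGenerator):
    name = "text"

    def write_text(self, text: str) -> None:
        self.lines.append(text)


class OutputList:
    """Forwards each call to every generator that is currently enabled."""

    def __init__(self, generators: Iterable[OutputGenerator]) -> None:
        self._generators = list(generators)
        self._enabled: Dict[str, bool] = {g.name: True for g in self._generators}
        self._saved: List[Dict[str, bool]] = []

    def write_text(self, text: str) -> None:
        for generator in self._generators:
            if self._enabled[generator.name]:
                generator.write_text(text)

    def disable(self, name: str) -> None:
        self._enabled[name] = False

    def enable(self, name: str) -> None:
        self._enabled[name] = True

    def push_generator_state(self) -> None:
        # Save a copy so later enable/disable calls do not alter it.
        self._saved.append(dict(self._enabled))

    def pop_generator_state(self) -> None:
        self._enabled = self._saved.pop()


html, text = HtmlGenerator(), TextGenerator()
out = OutputList([html, text])
out.write_text("shared")
out.push_generator_state()
out.disable("text")
out.write_text("html only")
out.pop_generator_state()
out.write_text("shared again")
```

Wrapping a format-specific passage in a push/pop pair restores whatever set
of generators was active before, without the caller having to remember it.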

The XML output is generated directly from the gathered data structures. In the
future XML will be used as an intermediate language (IL). The output
generators will then use this IL as a starting point to generate the
specific output formats. The advantage of having an IL is that various
independently developed tools, written in various languages, 
could extract information from the XML output. Possible tools could be:
- an interactive source browser
- a class diagram generator
- a tool for computing code metrics.

Debugging

Since doxygen uses a lot of \c flex code, it is important to understand
how \c flex works (for this one should read the \c man page) 
and to understand what \c flex is doing when it parses some input. 
Fortunately, when \c flex is used with the `-d` option it outputs which rules
matched. This makes it quite easy to follow what is going on for a 
particular input fragment. 

To make it easier to toggle debug information for a given \c flex file I
wrote the following perl script, which automatically adds or removes `-d`
from the correct line in the \c Makefile:

\verbatim
#!/usr/bin/perl 

$file = shift @ARGV;
print "Toggle debugging mode for $file\n";
if (!-e "../src/${file}.l")
{
  print STDERR "Error: file ../src/${file}.l does not exist!\n";
  exit 1;
}
system("touch ../src/${file}.l");
unless (rename "src/CMakeFiles/_doxygen.dir/build.make","src/CMakeFiles/_doxygen.dir/build.make.old") {
  print STDERR "Error: cannot rename src/CMakeFiles/_doxygen.dir/build.make!\n";
  exit 1;
}
if (open(F,"<src/CMakeFiles/_doxygen.dir/build.make.old")) {
  unless (open(G,">src/CMakeFiles/_doxygen.dir/build.make")) {
    print STDERR "Error: opening file build.make for writing\n";
    exit 1;
  }
  print "Processing build.make...\n";
  while (<F>) {
    if ( s/flex \$\(LEX_FLAGS\) -d(.*) ${file}.l/flex \$(LEX_FLAGS)$1 ${file}.l/ ) {
      print "Disabling debug info for $file\n";
    }
    elsif ( s/flex \$\(LEX_FLAGS\)(.*) ${file}.l$/flex \$(LEX_FLAGS) -d$1 ${file}.l/ ) {
      print "Enabling debug info for $file.l\n";
    }
    print G "$_";
  }
  close F;
  close G;
  unlink "src/CMakeFiles/_doxygen.dir/build.make.old";
}
else {
  print STDERR "Warning file src/CMakeFiles/_doxygen.dir/build.make does not exist!\n"; 
}

# touch the file
$now = time;
utime $now, $now, $file;
\endverbatim
Another way to get rule matching / debugging information
from the \c flex code is to set LEX_FLAGS when running \c make (`make LEX_FLAGS=-d`).

Note that by running doxygen with `-d lex` you get information about which 
`flex codefile` is used.

\htmlonly
Return to the index.
\endhtmlonly

*/