\input texinfo
@setfilename docbook2X.info
@documentencoding us-ascii
@settitle docbook2X
@dircategory Document Preparation
@direntry
* docbook2X: (docbook2X).       Convert DocBook into man pages and Texinfo
@end direntry

@node Top, Quick start, , (dir)
@documentlanguage en
@top docbook2X
@cindex DocBook

@i{docbook2X} converts
DocBook
documents into man pages and
Texinfo documents.

It aims to support DocBook version 4.2, excepting the features
that cannot be supported or are not useful in a man page or
Texinfo document.
@cindex web site
@cindex download

For information on the latest releases of docbook2X, and downloads,
please visit the @uref{http://docbook2x.sourceforge.net/,docbook2X home page}.

@menu
* Quick start::                 Examples to get you started
* Converting to man pages::     Details on man-page conversion
* Converting to Texinfo::       Details on Texinfo conversion
* The XSLT stylesheets::        How to run the docbook2X XSLT stylesheets
* Character set conversion::    Discussion on reproducing non-ASCII
                                  characters in the converted output
* FAQ::                         Answers and tips for common problems
* Performance analysis::        Discussion on conversion speed
* How docbook2X is tested::     Discussion of correctness-testing
* To-do list::                  Ideas for future improvements
* Release history::             Changes to the package between releases
* Design notes::                Author's notes on the grand scheme of
                                  docbook2X
* Package installation::        Where to get docbook2X, and details on how
                                  to install it
* Index: Concept index.

@detailmenu
--- The Detailed Node Listing ---

Converting to man pages

* docbook2man: docbook2man wrapper script.   Convert DocBook to man pages
* db2x_manxml::                 Make man pages from Man-XML

Converting to Texinfo

* docbook2texi: docbook2texi wrapper script.   Convert DocBook to Texinfo
* db2x_texixml::                Make Texinfo files from Texi-XML

The XSLT stylesheets

* db2x_xsltproc::               XSLT processor invocation wrapper
* sgml2xml-isoent::             Convert SGML to XML with support for ISO
                                  entities

Character set conversion

* utf8trans::                   Transliterate UTF-8 characters according to
                                  a table

Package installation

* Installation::                Package install procedure
* Dependencies on other software::   Other software packages that docbook2X
                                       needs

@end detailmenu
@end menu

@node Quick start, Converting to man pages, Top, Top
@chapter Quick start
@cindex example usage
@cindex converting to man pages
@cindex converting to Texinfo

To convert to man pages, you run the command @code{docbook2man} (@pxref{docbook2man wrapper script}).  For example,

@example
$ docbook2man --solinks manpages.xml
@end example

The man pages will be output to your current directory.

The @code{--solinks} option tells @code{docbook2man} to create man page
links.  You may want to omit this option when developing documentation
so that your working directory does not explode with many stub man pages.
(If you don't know what this means, you can read about it in detail in @ref{db2x_manxml,,@code{db2x_manxml}},
or just ignore the previous two sentences and always specify this option.)

To convert to Texinfo, you run the command @code{docbook2texi} (@pxref{docbook2texi wrapper script}).  For example,

@example
$ docbook2texi tdg.xml
@end example

One (or more) Texinfo files will be output to your current directory.

The rest of this manual describes in detail all the other options
and how to customize docbook2X's output.

@node Converting to man pages, Converting to Texinfo, Quick start, Top
@chapter Converting to man pages
@cindex man pages
@cindex converting to man pages
@cindex XSLT stylesheets
@cindex Man-XML

DocBook documents are converted to man pages in two steps:

@enumerate

@item
The DocBook source is converted by an XSLT stylesheet into an
intermediate XML format, Man-XML.

Man-XML is simpler than DocBook and closer to the man page format;
it is intended to make the stylesheets' job easier.

The stylesheet for this purpose is in
@file{xslt/man/docbook.xsl}.
For portability, it should always be referred to
by the following URI:

@example
http://docbook2x.sourceforge.net/latest/xslt/man/docbook.xsl
@end example

Run this stylesheet with @ref{db2x_xsltproc,,@code{db2x_xsltproc}}.
@cindex customizing

@strong{Customizing. }
You can also customize the output by
creating your own XSLT stylesheet ---
changing parameters or adding new templates ---
and importing @file{xslt/man/docbook.xsl}.

@item
Man-XML is converted to the actual man pages by @ref{db2x_manxml,,@code{db2x_manxml}}.
@end enumerate

The @code{docbook2man} (@pxref{docbook2man wrapper script}) command does both steps automatically,
but if any problems occur, you can see the errors more clearly
if you do each step separately:

@example
$ db2x_xsltproc -s man mydoc.xml -o mydoc.mxml
$ db2x_manxml mydoc.mxml
@end example
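
The following shell commands are a sketch of the customization described in the first step.  They write a minimal customization stylesheet that imports the man-page stylesheet through its portable URI and overrides two of the @code{docbook2man} stylesheet parameters.  The file name @file{mystyle.xsl} and the parameter values are illustrative only.  New templates could be added to the same file.

@example
# Write a customization layer (mystyle.xsl is an illustrative name).
cat > mystyle.xsl <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Pull in the standard docbook2X man-page stylesheet. -->
  <xsl:import
    href="http://docbook2x.sourceforge.net/latest/xslt/man/docbook.xsl"/>
  <!-- Parameters declared here take precedence over the imported ones. -->
  <xsl:param name="uppercase-headings" select="0"/>
  <xsl:param name="function-parens" select="1"/>
</xsl:stylesheet>
EOF
@end example

The first step would then use this file in place of the built-in @samp{man} stylesheet, for example @code{db2x_xsltproc -s mystyle.xsl mydoc.xml -o mydoc.mxml}.  This assumes that @code{-s} accepts a stylesheet file name as well as the short names shown above.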

Options to the conversion stylesheet are described in
@ref{Top,,the man-pages stylesheets reference,docbook2man-xslt,docbook2X Man-pages Stylesheets Reference}.
@cindex pure XSLT

@strong{Pure XSLT conversion. }
An alternative to the @code{db2x_manxml} Perl script is the XSLT
stylesheet in
@file{xslt/backend/db2x_manxml.xsl}.
This stylesheet performs a similar function:
converting Man-XML to actual man pages.
It is useful if you want a pure XSLT
solution to man-page conversion.
However, the quality of the conversion using this stylesheet
will never be as good as that of the Perl @code{db2x_manxml},
and the stylesheet runs more slowly.
In particular, the pure XSLT version
currently does not support tables in man pages,
but its Perl counterpart does.

@menu
* docbook2man: docbook2man wrapper script.   Convert DocBook to man pages
* db2x_manxml::                 Make man pages from Man-XML
@end menu

@node docbook2man wrapper script, db2x_manxml, , Converting to man pages
@section docbook2man
@cindex man pages
@cindex converting to man pages
@cindex wrapper script
@cindex @code{docbook2man}
@subheading Name

@code{docbook2man} --- Convert DocBook to man pages
@subheading Synopsis

@quotation

@t{docbook2man [options]  xml-document }
@end quotation
@subheading Description

@code{docbook2man} converts the given DocBook XML document into man pages.
By default, the man pages will be output to the current directory.

@cindex @code{refentry}
Only the @code{refentry} content
in the DocBook document is converted.
(To convert content outside of a @code{refentry},
stylesheet customization is required.  See the docbook2X
package for details.)

The @code{docbook2man} command is a wrapper script
for a two-step conversion process.
@subheading Options

The available options are essentially the union of the options
from @ref{db2x_xsltproc,,@code{db2x_xsltproc}} and @ref{db2x_manxml,,@code{db2x_manxml}}.

Some commonly-used options are listed below:

@table @asis

@item @code{--encoding=@var{encoding}}
Sets the character encoding of the output.

@item @code{--string-param @var{parameter}=@var{value}}
Sets a stylesheet parameter (options that affect how the output looks).
See ``Stylesheet parameters'' below for the parameters that
can be set.

@item @code{--sgml}
Accept an SGML source document as input instead of XML.

@item @code{--solinks}
Make stub pages for alternate names for an output man page.
@end table
@subsubheading Stylesheet parameters
@cindex stylesheet parameters

@table @asis

@item @code{uppercase-headings}
@strong{Brief. } Make headings uppercase?

@strong{Default setting. } @samp{1} (boolean true)

If true, headings in man page content are uppercased.

@item @code{manvolnum-cite-numeral-only}
@strong{Brief. } Man page section citation should use only the number

@strong{Default setting. } @samp{1} (boolean true)

When citing other man pages, the man-page section is either given as is,
or has the letters stripped from it, citing only the number of the
section (e.g. section @samp{3x} becomes
@samp{3}).  This option selects the style; if true, only the number is cited.

@item @code{quotes-on-literals}
@strong{Brief. } Display quotes on @code{literal}
elements?

@strong{Default setting. } @samp{0} (boolean false)

If true, render @code{literal} elements
with quotes around them.

@item @code{show-comments}
@strong{Brief. } Display @code{comment} elements?

@strong{Default setting. } @samp{1} (boolean true)

If true, comments will be displayed, otherwise they are suppressed.
Comments here refers to the @code{comment} element,
which was renamed @code{remark} in DocBook V4.0,
not XML comments (@code{<!-- like this -->}), which are unavailable.

@item @code{function-parens}
@strong{Brief. } Generate parentheses after a function?

@strong{Default setting. } @samp{0} (boolean false)

If true, the formatting of
a @code{<function>} element will include
generated parentheses.

@item @code{xref-on-link}
@strong{Brief. } Should @code{link} generate a
cross-reference?

@strong{Default setting. } @samp{1} (boolean true)

Man pages cannot render the hypertext links created by @code{link}.  If this option is set, then the
stylesheet renders a cross reference to the target of the link.
Otherwise, only the content of the @code{link} is rendered and the actual link itself is
ignored; this may reduce clutter.

@item @code{header-3}
@strong{Brief. } Third header text

@strong{Default setting. } (blank)

Specifies the text of the third header of a man page,
typically the date for the man page.  If empty, the @code{date} content for the @code{refentry} is used.

@item @code{header-4}
@strong{Brief. } Fourth header text

@strong{Default setting. } (blank)

Specifies the text of the fourth header of a man page.
If empty, the @code{refmiscinfo} content for
the @code{refentry} is used.

@item @code{header-5}
@strong{Brief. } Fifth header text

@strong{Default setting. } (blank)

Specifies the text of the fifth header of a man page.
If empty, the `manual name', that is, the title of the
@code{book} or @code{reference} container is used.

@item @code{default-manpage-section}
@strong{Brief. } Default man page section

@strong{Default setting. } @samp{1}

The source document usually indicates the sections that each man page
should belong to (with @code{manvolnum} in
@code{refmeta}).  In case the source
document does not indicate man-page sections, this option specifies the
default.

@item @code{custom-localization-file}
@strong{Brief. } URI of XML document containing custom localization data

@strong{Default setting. } (blank)

This parameter specifies the URI of an XML document
that describes text translations (and other locale-specific information)
that is needed by the stylesheet to process the DocBook document.

The text translations pointed to by this parameter always
override the default text translations
(from the internal parameter @code{localization-file}).
If a particular translation is not present here,
the corresponding default translation
is used as a fallback.

This parameter is primarily for changing certain
punctuation characters used in formatting the source document.
The settings for punctuation characters are often specific
to the source document, but can also be dependent on the locale.

If you do not want custom text translations, leave this parameter
as the empty string.

@item @code{custom-l10n-data}
@strong{Brief. } XML document containing custom localization data

@strong{Default setting. } @samp{document($custom-localization-file)}

This parameter specifies the XML document
that describes text translations (and other locale-specific information)
that is needed by the stylesheet to process the DocBook document.

This parameter is internal to the stylesheet.
To point to an external XML document with a URI or a file name,
you should use the @code{custom-localization-file}
parameter instead.

However, inside a custom stylesheet
(@emph{not on the command line})
this parameter can be set to the XPath expression
@samp{document('')},
which will cause the custom translations
directly embedded inside the custom stylesheet to be read.

@item @code{author-othername-in-middle}
@strong{Brief. } Is @code{othername} in @code{author} a
middle name?

@strong{Default setting. } @samp{1}

If true, the @code{othername} of an @code{author}
appears between the @code{firstname} and
@code{surname}.  Otherwise, @code{othername}
is suppressed.
@end table
@subheading Examples
@cindex example usage

@example
$ docbook2man --solinks manpages.xml
$ docbook2man --solinks --encoding=utf-8//TRANSLIT manpages.xml
$ docbook2man --string-param header-4="Free Recode 3.6" document.xml
@end example
@subheading Limitations

@itemize

@item
Internally there is one long pipeline of programs which your
document goes through.  If any segment of the pipeline fails
(even trivially, like from mistyped program options),
the resulting errors can be difficult to decipher ---
in this case, try running the components of docbook2X
separately.
@end itemize

@node db2x_manxml, , docbook2man wrapper script, Converting to man pages
@section @code{db2x_manxml}
@cindex man pages
@cindex converting to man pages
@cindex Man-XML
@cindex stub pages
@cindex symbolic links
@cindex encoding
@cindex output directory
@cindex @code{db2x_manxml}
@subheading Name

@code{db2x_manxml} --- Make man pages from Man-XML
@subheading Synopsis

@quotation

@t{db2x_manxml [options] [xml-document]}
@end quotation
@subheading Description

@code{db2x_manxml} converts a Man-XML document into one or
more man pages.  By default, they are written in the current directory
(see @code{--output-dir} below).

If @var{xml-document} is not given, then the document
to convert is read from standard input.
@subheading Options

@table @asis

@item @code{--encoding=@var{encoding}}
Select the character encoding used for the output files.
The available encodings are those of
iconv(1).
The default encoding is @samp{us-ascii}.

The XML source may contain characters that are not representable in the encoding that
you select;  in this case the program will abort during processing, and you should
choose another encoding.
(This is guaranteed not to happen with any Unicode encoding such as
UTF-8, but unfortunately not everyone is able to
process Unicode texts.)

If you are using GNU's version of
iconv(1), you can affix
@samp{//TRANSLIT} to the end of the encoding name
to attempt transliterations of any unconvertible characters in the output.
Beware, however, that the really inconvertible characters will be turned
into another of those damned question marks.  (Aren't you sick of this?)

The suffix @samp{//TRANSLIT} applied
to a Unicode encoding --- in particular, @samp{utf-8//TRANSLIT} ---
means that the output files are to remain in Unicode,
but markup-level character translations using @code{utf8trans}
are still to be done.  So in most cases, an English-language
document, converted using
@code{--encoding=utf-8//TRANSLIT},
will actually end up as a US-ASCII document,
but any untranslatable characters
will remain as UTF-8 without any warning whatsoever.
(Note: strictly speaking this is not ``transliteration''.)
This method of conversion is a more lenient alternative to strict
@code{--encoding=us-ascii}
processing, which aborts if any untranslatable characters are
encountered.

Note that man pages and Texinfo documents
in non-ASCII encodings (including UTF-8)
may not be portable to older (non-internationalized) systems,
which is why the default value for this option is
@samp{us-ascii}.

To suppress any automatic character mapping or encoding conversion
whatsoever, pass the option
@code{--encoding=utf-8}.

@item @code{--list-files}
Write a list of all the output files to standard output,
in addition to normal processing.

@item @code{--output-dir=@var{dir}}
Specify the directory where the output files are placed.
The default is the current working directory.

This option is ignored if the output is to be written
to standard output (triggered by the
option @code{--to-stdout}).

@item @code{--to-stdout}
Write the output to standard output instead of to individual files.

If this option is used when there would otherwise be multiple
output documents, then everything is concatenated to standard output.
But beware that most other programs will not accept this concatenated
output.

This option is incompatible with @code{--list-files},
obviously.

@item @code{--help}
Show brief usage information and exit.

@item @code{--version}
Show version and exit.
@end table
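
With @code{--encoding=utf-8//TRANSLIT}, untranslatable characters can stay in the output without any warning.  You may therefore want to check the generated pages afterwards.  The following shell sketch creates two stand-in files, whose names and contents are hypothetical.  It then uses iconv(1) to report which files are not plain US-ASCII.  A conversion to @samp{US-ASCII} without the @samp{//TRANSLIT} suffix fails on the first character that US-ASCII cannot represent.

@example
# Two stand-in "man pages": one plain ASCII, one containing UTF-8 e-acute.
printf '.TH FOO 1\nplain text\n' > ascii.1
printf '.TH BAR 1\ncaf\303\251\n' > utf8.1

# iconv exits with a non-zero status if it meets a character
# that US-ASCII cannot represent.
for f in ascii.1 utf8.1; do
  if iconv -f UTF-8 -t US-ASCII "$f" >/dev/null 2>&1; then
    echo "$f: us-ascii"
  else
    echo "$f: contains non-ASCII characters"
  fi
done
@end example

With real output, the file names could come from @code{--list-files} instead of being written out by hand.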

Some man pages may be referenced under two or more
names, instead of just one.  For example,
strcpy(3)
and
strncpy(3)
often point to the same man page which describes the two functions together.
Choose one of the following options to select
how such man pages are to be generated:

@table @asis

@item @code{--symlinks}
For each alternate name of a man page,
create a symbolic link to the file that contains the real man page content.

@item @code{--solinks}
Generate stub pages (using @samp{.so} roff requests)
for the alternate names, pointing them to the real man page content.

@item @code{--no-links}
Do not make any alternative names available.
The man page can only be referenced under its principal name.
@end table
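
The following sketch builds by hand the two kinds of alternate-name files for the @code{strcpy}/@code{strncpy} example.  It shows concretely how @code{--solinks} differs from @code{--symlinks}.  It only imitates the general shape of the output.  The directory names, the one-line page content and the exact @samp{.so} path are assumptions, not necessarily what @code{db2x_manxml} writes.

@example
# Illustration only; db2x_manxml's actual file names may differ.
mkdir -p solinks symlinks

# The real page content (a hypothetical minimal roff header).
printf '.TH STRCPY 3\n' > solinks/strcpy.3
printf '.TH STRCPY 3\n' > symlinks/strcpy.3

# --solinks style: the alternate name is a one-line roff stub
# whose .so request makes the formatter read the real page.
printf '.so man3/strcpy.3\n' > solinks/strncpy.3

# --symlinks style: the alternate name is a symbolic link.
ln -sf strcpy.3 symlinks/strncpy.3
@end example

Either way, once the pages are installed in a @file{man3} directory, asking for @code{strncpy} displays the @code{strcpy} page.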

This program uses certain other programs for its operation.
If they are not in their default installed locations, then use
the following options to set their location:

@table @asis

@item @code{--utf8trans-program=@var{path}}
@itemx @code{--utf8trans-map=@var{charmap}}
Use the character map @var{charmap}
with the @ref{utf8trans,,@code{utf8trans}} program, included with docbook2X, found
under @var{path}.

@item @code{--iconv-program=@var{path}}
The location of the
iconv(1) program, used for encoding
conversions.
@end table
@subheading Notes
@cindex @code{groff}
@cindex compatibility

The man pages produced should be compatible
with most troff implementations and other tools
that process man pages.
Some backwards-compatible
groff(1) extensions
are used to make the output look nicer.
@subheading See Also

The input to @code{db2x_manxml} is defined by the XML DTD
present at @file{dtd/Man-XML} in the docbook2X
distribution.

@node Converting to Texinfo, The XSLT stylesheets, Converting to man pages, Top
@chapter Converting to Texinfo
@cindex Texinfo
@cindex converting to Texinfo
@cindex XSLT stylesheets
@cindex Texi-XML

DocBook documents are converted to Texinfo in two steps:

@enumerate

@item
The DocBook source is converted by an XSLT stylesheet into an intermediate
XML format, Texi-XML.

Texi-XML is simpler than DocBook and closer to the Texinfo format;
it is intended to make the stylesheets' job easier.

The stylesheet for this purpose is in
@file{xslt/texi/docbook.xsl}.
For portability, it should always be referred to
by the following URI:

@example
http://docbook2x.sourceforge.net/latest/xslt/texi/docbook.xsl
@end example

Run this stylesheet with @ref{db2x_xsltproc,,@code{db2x_xsltproc}}.
@cindex customizing

@strong{Customizing. }
You can also customize the output by
creating your own XSLT stylesheet ---
changing parameters or adding new templates ---
and importing @file{xslt/texi/docbook.xsl}.

@item
Texi-XML is converted to the actual Texinfo files by @ref{db2x_texixml,,@code{db2x_texixml}}.
@end enumerate

The @code{docbook2texi} (@pxref{docbook2texi wrapper script}) command does both steps automatically,
but if any problems occur, you can see the errors more clearly
if you do each step separately:

@example
$ db2x_xsltproc -s texi mydoc.xml -o mydoc.txml
$ db2x_texixml mydoc.txml
@end example
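
The customization described in the first step can be sketched in the same way for Texinfo.  The following shell commands write a customization layer that imports the Texinfo stylesheet through its portable URI and overrides two of the @code{docbook2texi} stylesheet parameters.  The file name @file{mytexi.xsl} and the values chosen are illustrative only.

@example
# Write a Texinfo customization layer (mytexi.xsl is an illustrative name).
cat > mytexi.xsl <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Pull in the standard docbook2X Texinfo stylesheet. -->
  <xsl:import
    href="http://docbook2x.sourceforge.net/latest/xslt/texi/docbook.xsl"/>
  <!-- Render formal-object captions as Texinfo headings. -->
  <xsl:param name="captions-display-as-headings" select="1"/>
  <!-- Hide comment elements in the output. -->
  <xsl:param name="show-comments" select="0"/>
</xsl:stylesheet>
EOF
@end example

The first step would then become @code{db2x_xsltproc -s mytexi.xsl mydoc.xml -o mydoc.txml}.  This assumes that @code{-s} accepts a stylesheet file name as well as the short name @samp{texi}.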

Options to the conversion stylesheet are described
in @ref{Top,,the Texinfo stylesheets reference,docbook2texi-xslt,docbook2X Texinfo Stylesheets Reference}.

@menu
* docbook2texi: docbook2texi wrapper script.   Convert DocBook to Texinfo
* db2x_texixml::                Make Texinfo files from Texi-XML
@end menu

@node docbook2texi wrapper script, db2x_texixml, , Converting to Texinfo
@section docbook2texi
@cindex Texinfo
@cindex converting to Texinfo
@cindex wrapper script
@cindex @code{docbook2texi}
@subheading Name

@code{docbook2texi} --- Convert DocBook to Texinfo
@subheading Synopsis

@quotation

@t{docbook2texi [options]  xml-document }
@end quotation
@subheading Description

@code{docbook2texi} converts the given
DocBook XML document into one or more Texinfo documents.
By default, these Texinfo documents will be output to the current
directory.

The @code{docbook2texi} command is a wrapper script
for a two-step conversion process.
@subheading Options

The available options are essentially the union of the options
for @ref{db2x_xsltproc,,@code{db2x_xsltproc}} and @ref{db2x_texixml,,@code{db2x_texixml}}.

Some commonly-used options are listed below:

@table @asis

@item @code{--encoding=@var{encoding}}
Sets the character encoding of the output.

@item @code{--string-param @var{parameter}=@var{value}}
Sets a stylesheet parameter (options that affect how the output looks).
See ``Stylesheet parameters'' below for the parameters that
can be set.

@item @code{--sgml}
Accept an SGML source document as input instead of XML.
@end table
@subsubheading Stylesheet parameters
@cindex stylesheet parameters

@table @asis

@item @code{captions-display-as-headings}
@strong{Brief. } Use heading markup for minor captions?

@strong{Default setting. } @samp{0} (boolean false)

If true, @code{title}
content in some (formal) objects is rendered with the Texinfo
@code{@@@var{heading}} commands.

If false, captions are rendered as an emphasized paragraph.

@item @code{links-use-pxref}
@strong{Brief. } Translate @code{link} using
@code{@@pxref}

@strong{Default setting. } @samp{1} (boolean true)

If true, @code{link} is translated
with the hypertext followed by the cross reference in parentheses.

Otherwise, the hypertext content serves as the cross-reference name
marked up using @code{@@ref}.  Typically info displays this
construct badly.

@item @code{explicit-node-names}
@strong{Brief. } Insist on manually constructed Texinfo node
names

@strong{Default setting. } @samp{0} (boolean false)

Elements in the source document can influence the Texinfo node name
generation by specifying either an @code{xreflabel}, or, for the sectioning elements,
a @code{title} with @code{role='texinfo-node'} in the
@code{@var{*}info} container.

However, for the majority of source documents, explicit Texinfo node
names are not available, and the stylesheet tries to generate a
reasonable one instead, e.g. from the normal title of an element.
The generated name may not be optimal.  If this option is set and the
stylesheet needs to generate a name, a warning is emitted and
@code{generate-id} is always used for the name.

When the hashtable extension is not available, the stylesheet cannot
check for node name collisions, and in this case, setting this option
and using explicit node names are recommended.

This option is not set (i.e. false) by default.

@quotation

@strong{Note}

The absolute fallback for generating node names is using the XSLT
function @code{generate-id}, and the stylesheet always
emits a warning in this case regardless of the setting of
@code{explicit-node-names}.
@end quotation

@item @code{show-comments}
@strong{Brief. } Display @code{comment} elements?

@strong{Default setting. } @samp{1} (boolean true)

If true, comments are displayed; otherwise they are suppressed.
``Comments'' here refers to the @code{comment} element,
which was renamed @code{remark} in DocBook V4.0,
not to XML comments (@samp{<!-- like this -->}), which are unavailable.

@item @code{funcsynopsis-decoration}
@strong{Brief. } Decorate elements of a FuncSynopsis?

@strong{Default setting. } @samp{1} (boolean true)

If true, elements of the FuncSynopsis will be decorated (e.g. bold or
italic).  The decoration is controlled by functions that can be redefined
in a customization layer.

@item @code{function-parens}
@strong{Brief. } Generate parentheses after a function?

@strong{Default setting. } @samp{0} (boolean false)

If true, the formatting of
a @code{<function>} element will include
generated parentheses.

@item @code{refentry-display-name}
@strong{Brief. } Output NAME header before @code{refname}(s)?

@strong{Default setting. } @samp{1} (boolean true)

If true, a ``NAME'' section title is output before the list
of @code{refname}s.

@item @code{manvolnum-in-xref}
@strong{Brief. } Output @code{manvolnum} as part of
@code{refentry} cross-reference?

@strong{Default setting. } @samp{1} (boolean true)

If true, the @code{manvolnum} is used when cross-referencing
@code{refentry}s, either with @code{xref}
or @code{citerefentry}.

@item @code{prefer-textobjects}
@strong{Brief. } Prefer @code{textobject}
over @code{imageobject}?

@strong{Default setting. } @samp{1} (boolean true)

If true, the @code{textobject} in a @code{mediaobject}
is preferred over any @code{imageobject}.

(Of course, for output formats other than Texinfo, you usually
want to prefer the @code{imageobject},
but Info is a text-only format.)

In addition to the values true and false, this parameter
may be set to @samp{2} to indicate that
both the text and the images should be output.
You may want to do this because some Texinfo viewers
can display images.  Note that the Texinfo @code{@@image}
command has its own mechanism for switching between text
and image output, but this stylesheet does not use it.

The default is true.

@item @code{semantic-decorations}
@strong{Brief. } Use Texinfo semantic inline markup?

@strong{Default setting. } @samp{1} (boolean true)

If true, the semantic inline markup of DocBook is translated into
(the closest) Texinfo equivalent.  This is the default.

However, because the Info format is limited to plain text,
the semantic inline markup is often distinguished by using
explicit quotes, which may not look good.
You can set this option to false to suppress these.
(For finer control over the inline formatting, you can
use your own stylesheet.)

@item @code{custom-localization-file}
@strong{Brief. } URI of XML document containing custom localization data

@strong{Default setting. } (blank)

This parameter specifies the URI of an XML document
that describes text translations (and other locale-specific information)
needed by the stylesheet to process the DocBook document.

The text translations pointed to by this parameter always
override the default text translations
(from the internal parameter @code{localization-file}).
If a particular translation is not present here,
the corresponding default translation
is used as a fallback.

This parameter is primarily for changing certain
punctuation characters used in formatting the source document.
The settings for punctuation characters are often specific
to the source document, but can also be dependent on the locale.

To use no custom text translations, leave this parameter
as the empty string.

@item @code{custom-l10n-data}
@strong{Brief. } XML document containing custom localization data

@strong{Default setting. } @samp{document($custom-localization-file)}

This parameter specifies the XML document
that describes text translations (and other locale-specific information)
needed by the stylesheet to process the DocBook document.

This parameter is internal to the stylesheet.
To point to an external XML document with a URI or a file name,
use the @code{custom-localization-file}
parameter instead.

However, inside a custom stylesheet
(@emph{not on the command line}),
this parameter can be set to the XPath expression
@samp{document('')},
which causes the custom translations
embedded directly in the custom stylesheet to be read.

@item @code{author-othername-in-middle}
@strong{Brief. } Is @code{othername} in @code{author} a
middle name?

@strong{Default setting. } @samp{1} (boolean true)

If true, the @code{othername} of an @code{author}
appears between the @code{firstname} and
@code{surname}.  Otherwise, @code{othername}
is suppressed.

@item @code{output-file}
@strong{Brief. } Name of the Info file

@strong{Default setting. } (blank)
@cindex Texinfo metadata

This parameter specifies the name of the final Info file,
overriding the setting in the document itself and the automatic
selection in the stylesheet.  If the document is a @code{set}, this parameter has no effect.

@quotation

@strong{Important}

Do @emph{not} include the @samp{.info}
extension in the name.
@end quotation

(Note that this parameter has nothing to do with the name of
the @emph{Texi-XML output} file written by the XSLT processor
that runs this stylesheet.)

@item @code{directory-category}
@strong{Brief. } The categorization of the document in the Info directory

@strong{Default setting. } (blank)
@cindex Texinfo metadata

This is set to the category that the document
should go under in the Info directory of installed Info files.
For example, @samp{General Commands}.

@quotation

@strong{Note}

Categories may also be set directly in the source document.
But if this parameter is not empty, then it always overrides the
setting in the source document.
@end quotation

@item @code{directory-description}
@strong{Brief. } The description of the document in the Info directory

@strong{Default setting. } (blank)
@cindex Texinfo metadata

This is a short description of the document that appears in
the Info directory of installed Info files.
For example, @samp{An Interactive Plotting Program.}

@quotation

@strong{Note}

Menu descriptions may also be set directly in the source document.
But if this parameter is not empty, then it always overrides the
setting in the source document.
@end quotation

@item @code{index-category}
@strong{Brief. } The Texinfo index to use

@strong{Default setting. } @samp{cp}

The Texinfo index for @code{indexterm}
and @code{index} is specified using the
@code{role} attribute.  If the above
elements do not have a @code{role}, then
the default specified by this parameter is used.

The predefined indices are:

@table @asis

@item @samp{c}
@itemx @samp{cp}
Concept index

@item @samp{f}
@itemx @samp{fn}
Function index

@item @samp{v}
@itemx @samp{vr}
Variable index

@item @samp{k}
@itemx @samp{ky}
Keystroke index

@item @samp{p}
@itemx @samp{pg}
Program index

@item @samp{d}
@itemx @samp{tp}
Data type index
@end table

@noindent
User-defined indices are not yet supported.

@item @code{qanda-defaultlabel}
@strong{Brief. } Sets the default for @code{defaultlabel} on @code{qandaset}.

@strong{Default setting. } (blank)

If no @code{defaultlabel} attribute is specified on a @code{qandaset}, this
value is used.  It must be one of the valid values for the @code{defaultlabel}
attribute.

@item @code{qandaset-generate-toc}
@strong{Brief. } Is a Table of Contents created for QandASets?

@strong{Default setting. } (blank)

If true, a table of contents is constructed for QandASets.
@end table
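
The rule for @code{custom-localization-file} (custom translations override the defaults, and missing entries fall back to the defaults) works like a layered lookup.  The Python sketch below illustrates only that rule.  The key names and both tables are hypothetical stand-ins; the stylesheet itself reads its translations from XML documents.

@example
from collections import ChainMap

# Hypothetical stand-ins for the default translations
# (localization-file) and the custom ones (custom-localization-file).
default_l10n = dict(open_quote="`", close_quote="'", list_separator=", ")
custom_l10n = dict(open_quote="\u201c", close_quote="\u201d")

# Lookups try custom_l10n first and fall back to default_l10n.
l10n = ChainMap(custom_l10n, default_l10n)

# An empty custom table means "use the defaults only".
defaults_only = ChainMap(dict(), default_l10n)


def gentext(key, table=l10n):
    """Return the translation for key from the layered table."""
    return table[key]
@end example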
@subheading Examples
@cindex example usage

@example
$ docbook2texi tdg.xml
$ docbook2texi --encoding=utf-8//TRANSLIT tdg.xml
$ docbook2texi --string-param semantic-decorations=0 tdg.xml
@end example
@subheading Limitations

@itemize

@item
Internally, your document goes through one long pipeline of
programs.  If any segment of the pipeline fails
(even trivially, for example because of mistyped program options),
the resulting errors can be difficult to decipher;
in this case, try running the components of docbook2X
separately.
@end itemize
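
One way to run the components separately is to split the conversion into its two main stages.  @code{db2x_xsltproc} with the @samp{texi} stylesheet produces Texi-XML, and @code{db2x_texixml} turns that into Texinfo.  The Python code below is one possible way to script this.  It is not the wrapper's actual implementation, and the wrapper may pass further options such as @code{--encoding}.  The function names and the intermediate file name are illustrative.

@example
import subprocess


def texinfo_stages(source, txml):
    """Return the command lines for the two conversion stages (sketch)."""
    return [
        # Stage 1: DocBook to Texi-XML, using the built-in texi stylesheet.
        ["db2x_xsltproc", "--stylesheet", "texi", "--output", txml, source],
        # Stage 2: Texi-XML to Texinfo file(s) in the current directory.
        ["db2x_texixml", txml],
    ]


def run_stages(stages):
    for argv in stages:
        # check=True raises CalledProcessError for the first failing stage.
        subprocess.run(argv, check=True)

# Example: run_stages(texinfo_stages("tdg.xml", "tdg.txml"))
@end example

Because each stage runs on its own, an error message can be traced to the program that produced it.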

@node db2x_texixml, , docbook2texi wrapper script, Converting to Texinfo
@section @code{db2x_texixml}
@cindex Texinfo
@cindex converting to Texinfo
@cindex Texi-XML
@cindex encoding
@cindex output directory
@cindex @code{makeinfo}
@subheading Name

@code{db2x_texixml} --- Make Texinfo files from Texi-XML
@subheading Synopsis

@quotation

@t{db2x_texixml [options]@dots{} [xml-document]}
@end quotation
@subheading Description

@code{db2x_texixml} converts a Texi-XML document into one or
more Texinfo documents.

If @var{xml-document} is not given, then the document
to convert comes from standard input.

The filenames of the Texinfo documents are determined by markup in the
Texi-XML source.  (If the filenames are not specified in the markup,
then @code{db2x_texixml} attempts to deduce them from the name of the input
file.  However, the Texi-XML source should specify the filenames, because
this deduction does not work when there are multiple output files or when the
Texi-XML source comes from standard input.)
@subheading Options

@table @asis

@item @code{--encoding=@var{encoding}}
Select the character encoding used for the output files.
The available encodings are those of
iconv(1).
The default encoding is @samp{us-ascii}.

The XML source may contain characters that are not representable in the encoding that
you select; in this case the program aborts during processing, and you should
choose another encoding.
(This is guaranteed not to happen with any Unicode encoding such as
UTF-8, but unfortunately not everyone is able to
process Unicode texts.)

If you are using GNU's version of
iconv(1), you can append
@samp{//TRANSLIT} to the encoding name
to attempt transliteration of any unconvertible characters in the output.
Beware, however, that characters which truly cannot be converted will still be turned
into yet another question mark.

The suffix @samp{//TRANSLIT} applied
to a Unicode encoding (in particular, @samp{utf-8//TRANSLIT})
means that the output files remain in Unicode,
but markup-level character translations using @code{utf8trans}
are still performed.  So in most cases, an English-language
document converted using
@code{--encoding=utf-8//TRANSLIT}
will actually end up as a US-ASCII document,
but any untranslatable characters
will remain as UTF-8 without any warning whatsoever.
(Strictly speaking, this is not ``transliteration''.)
This method of conversion is a more lenient alternative to strict
@code{--encoding=us-ascii}
processing, which aborts if any untranslatable characters are
encountered.

Note that man pages and Texinfo documents
in non-ASCII encodings (including UTF-8)
may not be portable to older (non-internationalized) systems,
which is why the default value for this option is
@samp{us-ascii}.

To suppress any automatic character mapping or encoding conversion
whatsoever, pass the option
@code{--encoding=utf-8}.

@item @code{--list-files}
Write a list of all the output files to standard output,
in addition to normal processing.

@item @code{--output-dir=@var{dir}}
Specify the directory where the output files are placed.
The default is the current working directory.

This option is ignored if the output is to be written
to standard output (triggered by the
option @code{--to-stdout}).

@item @code{--to-stdout}
Write the output to standard output instead of to individual files.

If this option is used when there are multiple
output documents, then everything is concatenated to standard output.
But beware that most other programs will not accept this concatenated
output.

This option is incompatible with @code{--list-files},
obviously.

@item @code{--info}
Pipe the Texinfo output to
makeinfo(1),
creating Info files directly instead of
Texinfo files.

@item @code{--plaintext}
Pipe the Texinfo output to @code{makeinfo --no-headers},
thereby creating plain text files.

@item @code{--help}
Show brief usage information and exit.

@item @code{--version}
Show version and exit.
@end table
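
Before you pick an output encoding, it can help to know which characters in the text would fail to convert.  The Python sketch below is only an approximation, and the function name is hypothetical.  It uses Python's codec names, which do not always match iconv(1) names.  It also ignores both the @samp{//TRANSLIT} suffix and the character mapping done by @code{utf8trans}, so it may report characters that docbook2X would in fact translate.

@example
import unicodedata


def unrepresentable(text, encoding="us-ascii"):
    """Return (char, Unicode name) pairs that cannot be encoded."""
    bad = []
    for ch in sorted(set(text)):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append((ch, unicodedata.name(ch, "UNKNOWN")))
    return bad

# Example: unrepresentable("caf\u00e9") reports the e-acute character.
@end example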

This program uses certain other programs for its operation.
If they are not in their default installed locations, then use
the following options to set their locations:

@table @asis

@item @code{--utf8trans-program=@var{path}}
@itemx @code{--utf8trans-map=@var{charmap}}
Use the character map @var{charmap}
with the @ref{utf8trans,,@code{utf8trans}} program, included with docbook2X, found
under @var{path}.

@item @code{--iconv-program=@var{path}}
The location of the
iconv(1) program, used for encoding
conversions.
@end table
@subheading Notes

@strong{Texinfo language compatibility. }
@cindex compatibility
The Texinfo files generated by @code{db2x_texixml} sometimes require
Texinfo version 4.7 (the latest version at the time of writing) to work properly.
In particular:

@itemize

@item
@code{db2x_texixml} relies on @code{makeinfo}
to automatically add punctuation after a @code{@@ref}
if it is not already there.  If @code{makeinfo} does not do this, the hyperlink will
not work in the Info reader (although
@code{makeinfo} will not emit any error).

@item
The new @code{@@comma@{@}} command is used for commas
(@samp{,}) occurring inside argument lists to
Texinfo commands, to distinguish them from the commas used
to separate different arguments.  The only alternative
would be to translate @samp{,} to
@samp{.},
which is obviously undesirable (but earlier docbook2X versions
did this).

If you cannot use version 4.7 of
@code{makeinfo}, you can still use a
@code{sed} script to perform the procedure
just outlined manually.
@end itemize

@strong{Relation of Texi-XML with the XML output format of @code{makeinfo}. }
The Texi-XML format used by docbook2X is @emph{different}
from, and incompatible with, the XML format generated by
makeinfo(1)
with its @code{--xml} option.
This situation arose partly because the Texi-XML format
of docbook2X was designed and implemented independently
before the appearance
of @code{makeinfo}'s XML format.
Also, Texi-XML is very much geared towards being
@emph{machine-generated from other XML formats},
while there seem to be no non-trivial applications
of @code{makeinfo}'s XML format.
So there is no reason at this point for docbook2X
to adopt @code{makeinfo}'s XML format
in lieu of Texi-XML.
@subheading Bugs

@itemize

@item
Text wrapping in menus is utterly broken for non-ASCII text.
It is probably also broken everywhere else in the output, but
that would be @code{makeinfo}'s fault.

@item
@code{--list-files} might not work correctly
with @code{--info}.  Specifically, when the output
Info file gets too big, @code{makeinfo} will decide
to split it into parts named
@file{@var{abc}.info-1},
@file{@var{abc}.info-2},
@file{@var{abc}.info-3}, etc.
@code{db2x_texixml} does not know exactly how many of these files
there are, though you can just do an @code{ls}
to find out.
@end itemize
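
Until @code{--list-files} accounts for split output, a script can collect the parts itself.  The Python sketch below is not part of docbook2X.  It assumes only the naming pattern described above: numbered @file{@var{abc}.info-@var{n}} parts next to the main @file{@var{abc}.info} file that @code{makeinfo} writes.  It sorts the parts numerically so that @file{-10} comes after @file{-9}.

@example
import glob
import os
import re


def info_files(directory, base):
    """Return base.info plus any split parts base.info-N, in numeric order."""
    main = os.path.join(directory, base + ".info")
    candidates = glob.glob(glob.escape(main) + "-*")
    # Keep only names ending in "-<digits>", e.g. abc.info-3.
    parts = [p for p in candidates if re.search(r"-\d+$", p)]
    parts.sort(key=lambda p: int(p.rsplit("-", 1)[1]))
    return ([main] if os.path.exists(main) else []) + parts
@end example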
@subheading See Also

The input to @code{db2x_texixml} is defined by the XML DTD
present at @file{dtd/Texi-XML} in the docbook2X
distribution.

@node The XSLT stylesheets, Character set conversion, Converting to Texinfo, Top
@chapter The XSLT stylesheets
@cindex XSLT processor
@cindex libxslt
@cindex SAXON
@cindex catalog
@cindex @code{db2x_xsltproc}

docbook2X uses an XSLT 1.0 processor to run its stylesheets.
docbook2X comes with a wrapper script,
@ref{db2x_xsltproc,,@code{db2x_xsltproc}}, that invokes the XSLT processor,
but you can invoke the XSLT processor in any other
way you wish.

The stylesheets are described in
@ref{Top,,the man-pages stylesheets reference,docbook2man-xslt,docbook2X Man-pages Stylesheets Reference}
and @ref{Top,,the Texinfo stylesheets reference,docbook2texi-xslt,docbook2X Texinfo Stylesheets Reference}.
@cindex pure XSLT
@cindex @code{xsltproc}

Pure-XSLT implementations of @code{db2x_manxml}
and @code{db2x_texixml} also exist.
They may be used as follows (assuming libxslt as the XSLT processor).
@anchor{Convert to man pages using pure-XSLT db2x_manxml}

@strong{Convert to man pages using pure-XSLT db2x_manxml}

@example
$ xsltproc -o mydoc.mxml \
    docbook2X-path/xslt/man/docbook.xsl \
    mydoc.xml
$ xsltproc \
    docbook2X-path/xslt/backend/db2x_manxml.xsl \
    mydoc.mxml
@end example

@noindent
@anchor{Convert to Texinfo using Pure-XSLT db2x_texixml}

@strong{Convert to Texinfo using Pure-XSLT db2x_texixml}

@example
$ xsltproc -o mydoc.txml \
    docbook2X-path/xslt/texi/docbook.xsl \
    mydoc.xml
$ xsltproc \
    docbook2X-path/xslt/backend/db2x_texixml.xsl \
    mydoc.txml
@end example

Here,
xsltproc(1) is used instead of @code{db2x_xsltproc} because,
if you are in a situation where you cannot use the Perl implementation
of @code{db2x_manxml} or @code{db2x_texixml}, you probably cannot use @code{db2x_xsltproc} either.

If for portability reasons you prefer not to use the file-system path
to the docbook2X files, you can use the XML catalog
provided in @file{xslt/catalog.xml}
and the global URIs contained therein.

@menu
* db2x_xsltproc::               XSLT processor invocation wrapper
* sgml2xml-isoent::             Convert SGML to XML with support for ISO
                                  entities
@end menu

@node db2x_xsltproc, sgml2xml-isoent, , The XSLT stylesheets
@section @code{db2x_xsltproc}
@cindex XSLT processor
@cindex libxslt
@cindex @code{db2x_xsltproc}
@subheading Name

@code{db2x_xsltproc} --- XSLT processor invocation wrapper
@subheading Synopsis

@quotation

@t{db2x_xsltproc [options] xml-document}
@end quotation
@subheading Description

@code{db2x_xsltproc} invokes the XSLT 1.0 processor for docbook2X.

This command applies the XSLT stylesheet
(usually given by the @code{--stylesheet} option)
to the XML document in the file @var{xml-document}.
The result is written to standard output (unless changed with
@code{--output}).

To read the source XML document from standard input,
specify @samp{-} as the input document.
@subheading Options

@table @asis

@item @code{--version}
Display the docbook2X version.
@end table
@subsubheading Transformation output options

@table @asis

@item @code{--output @var{file}}
@itemx @code{-o @var{file}}
Write output to the given file (or URI), instead of standard output.
@end table
@subsubheading Source document options

@table @asis

@item @code{--xinclude}
@itemx @code{-I}
Process XInclude directives in the source document.

@item @code{--sgml}
@itemx @code{-S}
@cindex SGML

Indicate that the input document is SGML instead of XML.
You need to set this option if @var{xml-document}
is actually an SGML file.

SGML parsing is implemented by conversion to XML via
sgml2xml(1) from the
SP package (or
osx(1) from the OpenSP package).  All tag names in the
SGML file are normalized to lowercase (i.e. the @code{-xlower}
option of
sgml2xml(1) is used).  ID attributes are available
to the stylesheet (i.e. option @code{-xid}).  In addition,
any ISO SDATA entities used in the SGML document are automatically converted
to their XML Unicode equivalents.  (This is done by a
@code{sed} filter.)

The encoding of the SGML document, if it is not
@samp{us-ascii}, must be specified with the standard
SP environment variables: @samp{SP_CHARSET_FIXED=1
SP_ENCODING=@var{encoding}}.
(Note that XML files specify their encoding with the XML declaration
@samp{<?xml version="1.0" encoding="@var{encoding}" ?>}
at the top of the file.)

The above conversion options cannot be changed.  If you desire different
conversion options, you should invoke
sgml2xml(1) manually, and then pass
the results of that conversion to this program.
@end table
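
When @code{db2x_xsltproc} runs from a script, the SP environment variables can be set for that one invocation instead of for the whole session.  Below is a minimal Python sketch.  It assumes the built-in @samp{texi} stylesheet and an encoding name that SP accepts, and the function names are hypothetical.

@example
import os
import subprocess


def sgml_command(document, output, encoding="utf-8"):
    """Build argv and environment for converting an SGML document (sketch)."""
    env = dict(os.environ)
    # Tell SP/OpenSP the document's encoding, as described above.
    env["SP_CHARSET_FIXED"] = "1"
    env["SP_ENCODING"] = encoding
    argv = ["db2x_xsltproc", "--sgml", "--stylesheet", "texi",
            "--output", output, document]
    return argv, env


def convert_sgml(document, output, encoding="utf-8"):
    argv, env = sgml_command(document, output, encoding)
    subprocess.run(argv, env=env, check=True)
@end example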
@subsubheading Retrieval options

@table @asis

@item @code{--catalogs @var{catalog-files}}
@itemx @code{-C @var{catalog-files}}
@cindex catalog

Specify additional XML catalogs to use for resolving Formal
Public Identifiers or URIs.  SGML catalogs are not supported.

These catalogs are @emph{not} used for parsing an SGML
document under the @code{--sgml} option.  Use
the environment variable @env{SGML_CATALOG_FILES} instead
to specify the catalogs for parsing the SGML document.

@item @code{--network}
@itemx @code{-N}
@code{db2x_xsltproc} will normally refuse to load
external resources from the network, for security reasons.
If you do want to load from the network, set this option.

Usually you will want to install the relevant DTDs and other
files locally, and set up catalogs for them, rather than load them automatically
from the network.
@end table
@subsubheading Stylesheet options

@table @asis

@item @code{--stylesheet @var{file}}
@itemx @code{-s @var{file}}
Specify the filename (or URI) of the stylesheet to use.
The special values @samp{man} and @samp{texi}
are accepted as abbreviations, to specify that
@var{xml-document} is in DocBook and
should be converted to man pages or Texinfo (respectively).

@item @code{--param @var{name}=@var{expr}}
@itemx @code{-p @var{name}=@var{expr}}
Add or modify a stylesheet parameter.
@var{name} is an XSLT parameter name, and
@var{expr} is an XPath expression that evaluates to
the desired value for the parameter.  (This means that strings must be
quoted, @emph{in addition} to the usual quoting of shell
arguments; use @code{--string-param} to avoid this.)

@item @code{--string-param @var{name}=@var{string}}
@itemx @code{-g @var{name}=@var{string}}
Add or modify a string-valued stylesheet parameter.

The string must be encoded in UTF-8 (regardless of the locale
character encoding).
@end table
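Because @code{--param} takes an XPath expression, a string value needs an extra layer of XPath quoting, while @code{--string-param} passes it through unchanged.  The following Python sketch shows one way a build script might construct the two kinds of argument.  The helper names @code{xpath_string_literal}, @code{param_args} and @code{string_param_args}, and the parameter name @samp{my-param}, are hypothetical and not part of docbook2X.  Passing an argument list straight to a process (rather than through a shell) removes the shell-quoting layer, leaving only the XPath quoting.

@example
# Hypothetical helpers, not part of docbook2X: build db2x_xsltproc
# arguments for a stylesheet parameter whose value is a plain string.

def xpath_string_literal(s):
    """Return an XPath 1.0 expression that evaluates to the string s.

    XPath 1.0 string literals have no escape syntax, so a string that
    contains both kinds of quote is assembled with concat().
    """
    if "'" not in s:
        return "'" + s + "'"
    if '"' not in s:
        return '"' + s + '"'
    pieces = []
    for i, part in enumerate(s.split("'")):
        if i > 0:
            pieces.append('"' + "'" + '"')   # a literal single quote
        if part:
            pieces.append("'" + part + "'")
    return "concat(" + ", ".join(pieces) + ")"


def param_args(name, value):
    # --param expects an XPath expression, so the string must be quoted.
    return ["--param", name + "=" + xpath_string_literal(value)]


def string_param_args(name, value):
    # --string-param takes the string as-is; no XPath quoting is needed.
    return ["--string-param", name + "=" + value]


# Both forms pass the same string value to the stylesheet:
print(param_args("my-param", "it's"))
print(string_param_args("my-param", "it's"))
@end example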
@subsubheading Debugging and profiling

@table @asis

@item @code{--debug}
@itemx @code{-d}
Display, to standard error, logs of what is happening during the
XSLT transformation.

@item @code{--nesting-limit @var{n}}
@itemx @code{-D @var{n}}
Change the maximum number of nested calls to XSL templates, used to
detect potential infinite loops.
If not specified, the limit is 500 (libxslt's default).

@item @code{--profile}
@itemx @code{-P}
Display profile information: the total number of calls to each template
in the stylesheet and the time taken for each.  This information is
output to standard error.

@item @code{--xslt-processor @var{processor}}
@itemx @code{-X @var{processor}}
Select the underlying XSLT processor to use.  The possible choices for
@var{processor} are: @samp{libxslt}, @samp{saxon}, @samp{xalan-j}.

The default processor is whatever was set when docbook2X was built.
libxslt is recommended because it is lean and fast,
but SAXON is much more robust and would be more helpful when
debugging stylesheets.

XML catalog support is enabled for all of the processors
(docbook2X requires it).
Note, however, that not all of the options above work with processors
other than libxslt.
@end table
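As the Notes below explain, when libxslt is used @code{db2x_xsltproc} translates its own options into those of the stock xsltproc(1).  The following Python function is a rough, hypothetical sketch of such a translation, not the Perl implementation.  It assumes the xsltproc(1) options @code{--nonet}, @code{--maxdepth}, @code{--profile}, @code{--param} and @code{--stringparam}, and it leaves out @code{--debug}, @code{--catalogs} and @code{--sgml}.

@example
# Hypothetical sketch: map db2x_xsltproc-style settings onto a stock
# xsltproc(1) command line.  The real db2x_xsltproc is a Perl script.

def to_xsltproc_args(stylesheet, document, params=(), string_params=(),
                     network=False, nesting_limit=None, profile=False):
    """params and string_params are sequences of (name, value) pairs."""
    args = ["xsltproc"]
    if not network:
        # Without -N, db2x_xsltproc refuses to load from the network.
        args.append("--nonet")
    if nesting_limit is not None:
        # -D n: maximum template nesting depth.
        args += ["--maxdepth", str(nesting_limit)]
    if profile:
        # -P: per-template profile on standard error.
        args.append("--profile")
    for name, expr in params:
        # -p name=expr: the value is an XPath expression.
        args += ["--param", name, expr]
    for name, value in string_params:
        # -g name=string: the value is a literal string.
        args += ["--stringparam", name, value]
    return args + [stylesheet, document]


print(to_xsltproc_args("ss.xsl", "doc.xml", nesting_limit=1000))
@end example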
@subheading Environment

@table @asis

@item @env{XML_CATALOG_FILES}
Specify XML catalogs.
If not specified, the standard catalog
(@file{/etc/xml/catalog}) is loaded, if available.

@item @env{DB2X_XSLT_PROCESSOR}
Specify the XSLT processor to use.
The effect is the same as the @code{--xslt-processor}
option.  The primary use of this variable is to allow you to quickly
test different XSLT processors without having to add
@code{--xslt-processor} to every script or makefile in
your documentation build system.
@end table
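For example, a Python build script could choose the processor and catalogs for a single run by setting these variables in the child process's environment, leaving every command line untouched.  This is a sketch: the helper name @code{db2x_environment} is hypothetical, and it assumes, as libxml2 does, that @env{XML_CATALOG_FILES} holds a space-separated list of catalog files.

@example
import os

# Hypothetical helper: environment for running db2x_xsltproc.
def db2x_environment(processor=None, catalogs=()):
    env = os.environ.copy()          # leave the caller's environment alone
    if processor is not None:
        if processor not in ("libxslt", "saxon", "xalan-j"):
            raise ValueError("unknown XSLT processor: " + processor)
        env["DB2X_XSLT_PROCESSOR"] = processor
    if catalogs:
        # Assumed: a space-separated list, as libxml2 expects.
        env["XML_CATALOG_FILES"] = " ".join(catalogs)
    return env

# Usage (requires docbook2X to be installed):
#   import subprocess
#   subprocess.run(["db2x_xsltproc", "-s", "man", "mydoc.xml"],
#                  env=db2x_environment("saxon"), check=True)
@end example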
@subheading Conforming to

@uref{http://www.w3.org/TR/xslt,XML Stylesheet Language -- Transformations (XSLT)@comma{} version 1.0}, a W3C Recommendation.
@subheading Notes
@cindex XSLT extensions

In its earlier versions (< 0.8.4),
docbook2X required XSLT extensions to run, and
@code{db2x_xsltproc} was a special libxslt-based processor that had these
extensions compiled in.  When the requirement for XSLT extensions
was dropped, @code{db2x_xsltproc} became a Perl script which translates
its own options into the format accepted by
the stock
xsltproc(1) which comes with libxslt.

The prime reason for the existence of this script
is backward compatibility with any scripts
or makefiles that invoke docbook2X.  However,
it also became easy to add support for invoking
other XSLT processors with a unified command-line interface.
Indeed, nothing in this script is specific to docbook2X,
or even to DocBook, and it may be used for running other sorts of
stylesheets if you desire.  Certainly the author prefers using this
command, because its invocation format is sane and easy to
use (e.g. no typing long class names for the Java-based processors!).
@subheading See Also

You may wish to consult the documentation that comes
with libxslt, SAXON, or Xalan.  The W3C XSLT 1.0 specification
is useful for writing stylesheets.

@node sgml2xml-isoent, , db2x_xsltproc, The XSLT stylesheets
@section @code{sgml2xml-isoent}
@cindex SGML
@cindex ISO entities
@cindex @code{sgml2xml-isoent}
@cindex DocBook
@subheading Name

@code{sgml2xml-isoent} --- Convert SGML to XML with support for ISO
entities
@subheading Synopsis

@quotation

@t{sgml2xml-isoent [sgml-document]}
@end quotation
@subheading Description

@code{sgml2xml-isoent} converts an SGML document to XML,
with support for the ISO entities.
This is done by using
sgml2xml(1) from the
SP package (or
osx(1) from the OpenSP package),
and adding the declaration for the XML version of the ISO entities
to the output.
This means that the output of this conversion
should work as-is with any XML tool.

This program is often used for processing SGML DocBook documents
with XML-based tools.  In particular, @ref{db2x_xsltproc,,@code{db2x_xsltproc}}
calls this program as part of its @code{--sgml}
option.  On the other hand, it is probably not helpful for
migrating a source SGML text file to XML, since the conversion
mangles the original formatting.

Since the XML versions of the ISO entities
are referred to directly, not via a DTD, this tool
also works with document types other than DocBook.
@subheading Notes

The ISO entities are referred to using the public identifiers
@samp{ISO 8879:1986//ENTITIES//@var{@dots{}}//EN//XML}.
The catalogs used when parsing the converted document should
resolve these entities to the appropriate place (on the local
filesystem).  If the entities are not resolved in the catalog,
then the fallback is to get the entity files
from the @samp{http://www.docbook.org/} Web site.
@subheading See Also

sgml2xml(1),
osx(1)

@node Character set conversion, FAQ, The XSLT stylesheets, Top
@chapter Character set conversion
@cindex character map
@cindex character sets
@cindex charsets
@cindex encoding
@cindex transliteration
@cindex re-encoding
@cindex UTF-8
@cindex Unicode
@cindex @code{utf8trans}
@cindex escapes
@cindex @code{iconv}

When translating XML to legacy ASCII-based formats
with poor support for Unicode, such as man pages and Texinfo,
there is always the problem that Unicode characters in
the source document also have to be translated somehow.

A straightforward character set conversion from Unicode
does not suffice,
because the target character set, usually US-ASCII or ISO Latin-1,
does not contain common characters such as
dashes and directional quotation marks that are widely
used in XML documents.  But document formatters (man and Texinfo)
allow such characters to be entered by a markup escape:
for example, @code{\(lq} for the left directional quote
@samp{``}.
And if a markup-level escape is not available,
an ASCII transliteration might be used: for example,
using the ASCII less-than sign @code{<} for
a left-pointing angle quotation mark.

So the Unicode character problem can be solved in two steps:

@enumerate

@item
@ref{utf8trans,,@code{utf8trans}}, a program included in docbook2X, maps
Unicode characters to markup-level escapes or transliterations.

Since there is no fixed, official mapping of Unicode characters
to such escapes, @code{utf8trans}, unlike most character set
converters, can read in user-modifiable character mappings
expressed in text files and apply them.

The files @file{charmap/man/roff.charmap}
and @file{charmaps/man/texi.charmap}
contain character maps for man-page and Texinfo conversion, respectively.
The programs @ref{db2x_manxml,,@code{db2x_manxml}} and @ref{db2x_texixml,,@code{db2x_texixml}} will apply
these character maps, or another character map specified by the user,
automatically.

@item
The rest of the Unicode text is converted to some other character set
(encoding).
For example, a French document with accented characters
(such as @samp{@'e}) might be converted to ISO Latin 1.

This step is applied after @code{utf8trans} character mapping,
using the
iconv(1) encoding conversion tool.
Both @ref{db2x_manxml,,@code{db2x_manxml}} and @ref{db2x_texixml,,@code{db2x_texixml}} can call
iconv(1) automatically when producing their output.
@end enumerate
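The two steps can be sketched in Python as follows.  This is an illustration under stated assumptions, not docbook2X's implementation: the four-entry @code{ROFF_MAP} is a hypothetical excerpt (the real maps are the charmap files named above), @code{str.translate} stands in for @code{utf8trans}, and a strict @code{str.encode} stands in for iconv(1).  Like iconv(1), it fails on any character that the target encoding lacks, which is the situation the @code{iconv} question in the FAQ describes.

@example
# Sketch of the two-step conversion; ROFF_MAP is a hypothetical
# four-entry excerpt, not the contents of roff.charmap.
ROFF_MAP = dict([
    (0x201C, "\\(lq"),   # left double quotation mark
    (0x201D, "\\(rq"),   # right double quotation mark
    (0x2014, "\\(em"),   # em dash
    (0x2039, "<"),       # single left-pointing angle quotation mark
])

def convert(text, charmap, encoding):
    # Step 1: map characters to markup escapes or transliterations.
    mapped = text.translate(charmap)
    # Step 2: re-encode the rest; strict mode raises UnicodeEncodeError
    # for any character the target encoding cannot represent.
    return mapped.encode(encoding)

# The accented letter survives in ISO Latin 1 but not in US-ASCII.
latin1 = convert("\u201cCaf\u00e9\u201d", ROFF_MAP, "latin-1")
@end example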
@menu
* utf8trans::                   Transliterate UTF-8 characters according to
                                  a table
@end menu

@node utf8trans, , , Character set conversion
@section @code{utf8trans}
@cindex character map
@cindex UTF-8
@cindex Unicode
@cindex @code{utf8trans}
@cindex escapes
@cindex transliteration
@subheading Name

@code{utf8trans} --- Transliterate UTF-8 characters according to a table
@subheading Synopsis

@quotation

@t{utf8trans  charmap  [file]@dots{}}
@end quotation
@subheading Description
@cindex utf8trans

@code{utf8trans} transliterates characters in the specified files (or
standard input, if no files are specified) and writes the output to
standard output.  All input and output is in the UTF-8 encoding.

This program is usually used to render characters in Unicode text files
as markup escapes or ASCII transliterations.
(It is not intended for general charset conversions.)
It provides functionality similar to the character maps
in XSLT 2.0 (XML Stylesheet Language -- Transformations, version 2.0).
@subheading Options

@table @asis

@item @code{-m}
@itemx @code{--modify}
Modify the given files in place with their transliterated output,
instead of sending it to standard output.

This option is useful for efficient transliteration of many files
at once.

@item @code{--help}
Show brief usage information and exit.

@item @code{--version}
Show version and exit.
@end table
@subheading Usage

The translation is done according to the rules in the character
map contained in the file @var{charmap}.  The character map
has the following format:

@enumerate

@item
Each line represents a translation entry, except for
blank lines and comment lines, which are ignored.

@item
Any amount of whitespace (space or tab) may precede
the start of an entry.

@item
Comment lines begin with @samp{#}.
Everything on the same line is ignored.

@item
Each entry consists of the Unicode codepoint of the
character to translate, in hexadecimal, followed by
@emph{one} space or tab, followed by the translation
string, up to the end of the line.

@item
The translation string is taken literally, including any
leading and trailing spaces (except the delimiter between the codepoint
and the translation string), and all types of characters.  The newline
at the end is not included.
@end enumerate
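The format rules above are simple enough to sketch as a Python parser.  This is a hypothetical re-implementation for illustration, not the source of @code{utf8trans}.  It makes two assumptions that the rules above do not state: that a comment line may be preceded by whitespace, and that an entry with no delimiter after the codepoint is an error.

@example
# Hypothetical parser for the utf8trans character-map format.
def parse_charmap(lines):
    charmap = dict()
    for lineno, line in enumerate(lines, 1):
        line = line.rstrip("\n")        # the newline is not part of an entry
        entry = line.lstrip(" \t")      # leading spaces/tabs are allowed
        if not entry or entry.startswith("#"):
            continue                    # blank or comment line (assumed)
        # The hexadecimal codepoint runs up to the first space or tab.
        end = 0
        while end < len(entry) and entry[end] not in " \t":
            end += 1
        if end == len(entry):
            raise ValueError("line %d: missing translation" % lineno)
        # Skip exactly one delimiter; everything after it, including
        # further spaces, is the literal translation string.
        charmap[int(entry[:end], 16)] = entry[end + 1:]
    return charmap

# Usage: apply the map to text with str.translate.
#   with open("roff.charmap", encoding="utf-8") as f:
#       charmap = parse_charmap(f)
#   output = text.translate(charmap)
@end example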
The above format is intended to be restrictive, to keep
@code{utf8trans} simple.  But if an XML-based format is desired,
the docbook2X distribution includes an @file{xmlcharmap2utf8trans} script
that converts character maps in XSLT 2.0 format to the @code{utf8trans}
format.
@subheading Limitations

@itemize

@item
@code{utf8trans} does not work with binary files, because malformed
UTF-8 sequences in the input are substituted with
U+FFFD characters.  However, null characters in the input
are handled correctly.  This limitation may be removed in the future.

@item
There is no way to include a newline or null in the substitution string.
@end itemize
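The conversion performed by the @file{xmlcharmap2utf8trans} script mentioned above can be pictured with the following Python sketch.  It is a hypothetical illustration, not the script shipped with docbook2X.  It reads the @code{character} and @code{string} attributes of each @code{xsl:output-character} element in an XSLT 2.0 character map and writes the codepoint in lowercase hexadecimal (an assumption, since the format above only says hexadecimal).  It also rejects translation strings containing a newline, which the second limitation above makes unrepresentable.  XML cannot contain null characters, so those need no check.

@example
from xml.dom import minidom

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical sketch: XSLT 2.0 character map -> utf8trans entry lines.
def xslt_charmap_to_utf8trans(xml_text):
    doc = minidom.parseString(xml_text)
    lines = []
    for el in doc.getElementsByTagNameNS(XSL_NS, "output-character"):
        char = el.getAttribute("character")
        string = el.getAttribute("string")
        if len(char) != 1:
            raise ValueError("character must be a single character")
        if "\n" in string:
            # utf8trans cannot express a newline in a translation.
            raise ValueError("untranslatable string for U+%04X" % ord(char))
        lines.append("%x %s" % (ord(char), string))
    return lines
@end example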
@node FAQ, Performance analysis, Character set conversion, Top
@chapter FAQ
@cindex FAQ
@cindex tips
@cindex problems
@cindex bugs

@table @asis

@item @ @ Q:
I have an SGML DocBook document.  How do I use docbook2X?
@cindex SGML

@item @ @ A:
Use the @code{--sgml} option to @code{db2x_xsltproc}.

(Formerly, we described a rather intricate hack here to convert
SGML to XML while preserving the ISO entities.  That hack
is exactly what @code{--sgml} now does.)

@item @ @ Q:
docbook2X bombs with this document!

@item @ @ A:
It is probably a bug in docbook2X (assuming that the input
document is valid DocBook in the first place).  Please file a bug
report.  In it, please include the document that causes
docbook2X to fail, a pointer to it, or a test case that reproduces
the problem.

I don't want to hear about bugs in obsolete tools (i.e., tools that are
not in the current release of docbook2X).  I'm sorry, but maintaining all
that is a lot of work that I don't have time for.

@item @ @ Q:
Must I use @code{refentry}
to write my man pages?
@cindex @code{refentry}

@item @ @ A:
Under the default settings of docbook2X: yes, you have to.
The contents of the source document
that lie outside of @code{refentry}
elements are probably written in a book/article style
that is usually not suited for the reference style of man pages.

Nevertheless, sometimes you might want to include in your man page
(small) snippets or sections of content from other parts of your book
or article.
You can achieve this by using a custom XSLT stylesheet to include
the content manually.
The docbook2X documentation demonstrates this technique:
see the
docbook2man(1)
and the
docbook2texi(1)
man pages and the stylesheet that produces them
in @file{doc/ss-man.xsl}.
@item @ @ Q:
Where have the SGML-based docbook2X tools gone?

@item @ @ A:
They are now in a separate package, docbook2man-sgmlspl.

@item @ @ Q:
I get some @code{iconv} error when converting documents.
@cindex @code{iconv}

@item @ @ A:
This happens because your document contains a Unicode character
that docbook2X fails to convert to ASCII or to a markup escape (in roff
or Texinfo).  The error is intentional, because it alerts
you to a possible loss of information in your document.  Admittedly
the message could be less cryptic, but unfortunately I can't control what
@code{iconv} says.

You can look at the partial man or Texinfo output: the offending
Unicode character should be near the point where the output is
interrupted.  Since you probably wanted that Unicode character
to be there, the proper fix is to add
a translation for that Unicode character to the @code{utf8trans} character map.
Then use the @code{--utf8trans-map} option to the Perl
docbook2X tools to use your custom character map.

Alternatively, if you want to close your eyes to the utterly broken
Unicode handling in groff and Texinfo, just use the
@code{--encoding=utf-8} option.
Note that the UTF-8 output is unlikely to display correctly everywhere.

@item @ @ Q:
Texinfo output looks ugly.

@item @ @ A:
You have to keep in mind that Info is extremely limited in its
formatting.  Try setting the various parameters to the stylesheet
(see @file{xslt/texi/param.xsl}).

Also, if you look at native Info pages, you will see there is a certain
structure that your DocBook document may not adhere to.  There is
really no fix for this.  It is possible, though, to give rendering
hints to the Texinfo stylesheet in your DocBook source, as this
manual does.  Unfortunately these hints are not yet documented in a prominent place.

@item @ @ Q:
How do I use SAXON (or Xalan-Java) with docbook2X?
@cindex SAXON
@cindex Xalan-Java

@item @ @ A:
Bob Stayton's @i{DocBook XSL: The Complete Guide}
has a nice
@uref{http://www.sagehill.net/docbookxsl/InstallingAProcessor.html, section on setting up the XSLT processors}.
It discusses Norman Walsh's DocBook XSL stylesheets,
but for docbook2X you only need to change the stylesheet
argument (any file with the extension @file{.xsl}).

If you use the Perl wrapper scripts provided with docbook2X,
you only need to ``install'' the XSLT processors (i.e., for Java, copy
the @file{*.jar} files to
@file{/usr/local/share/java}), and you don't
need to do anything else.

@item @ @ Q:
XML catalogs don't work with Xalan-Java.
(Or: Stop connecting to the Internet when running docbook2X!)
@cindex Xalan-Java
@cindex catalog

@item @ @ A:
I have no idea why, but XML catalogs with Xalan-Java don't work for me
either, no matter how hard I try.  Just use SAXON or libxslt instead
(which do work for me, at least).

@item @ @ Q:
I don't like how docbook2X renders this markup.
@cindex rendering
@cindex customizing

@item @ @ A:
The XSLT stylesheets are customizable, so assuming you have
knowledge of XSLT, you should be able to change the rendering easily.
See @file{doc/ss-texinfo.xsl} of docbook2X's own
documentation for a non-trivial example.

If your customizations could be generally useful, I would like to hear
about them.

If you don't want to muck with XSLT, you can still tell me what sort
of features you want.  Maybe other users want them too.

@item @ @ Q:
Does docbook2X support other XML document types
or output formats?
@cindex other output formats
@cindex other document types
@cindex non-DocBook document type

@item @ @ A:
No.  But if you want to create code for a new XML document type
or output format, the existing infrastructure of docbook2X may be able
to help you.

For example, if you want to convert a document in the W3C
spec DTD to Texinfo, you can write an XSLT stylesheet that outputs a
document conforming to Texi-XML, and run that through @code{db2x_texixml}
to get your Texinfo pages.  Writing such an XSLT
stylesheet should not be any more difficult than
writing a stylesheet for HTML output; in fact, it is probably even easier.

An alternative approach is to convert the source document
to DocBook first, then apply docbook2X conversion afterwards.
The stylesheet reference documentation in docbook2X uses this technique:
the documentation embedded in the XSLT stylesheets is first extracted
into a DocBook document, then that is converted to Texinfo.
This approach obviously is not ideal if the source
document does not map well into DocBook,
but it does allow you to use the standard DocBook HTML
and XSL-FO stylesheets to format the source document with little effort.

If, on the other hand, you want to get troff output
using a different macro set, you will have to rewrite both the
stylesheets and the post-processor (performing the function of
@code{db2x_manxml} but with a different macro set).
In this case some of the code in @code{db2x_manxml} may be reused, and you
can certainly reuse @code{utf8trans} and the provided roff character maps.
@end table
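To follow the advice in the @code{iconv} answer above, it helps to know exactly which characters lack a translation.  The following Python sketch lists every character in a piece of text that cannot be encoded in the target encoding, together with its offset, codepoint and Unicode name.  The helper @code{unmappable_characters} is hypothetical, not a docbook2X tool.  Characters that your character map already translates can be ignored in the report.  The sketch scans literal characters only, so characters written as numeric character references in the XML source are not detected.

@example
import unicodedata

# Hypothetical helper: find characters the target encoding cannot hold.
def unmappable_characters(text, encoding="ascii"):
    found = []
    for offset, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            name = unicodedata.name(ch, "<unnamed>")
            found.append((offset, "U+%04X" % ord(ch), name))
    return found

# Usage:
#   with open("mydoc.xml", encoding="utf-8") as f:
#       for offset, codepoint, name in unmappable_characters(f.read()):
#           print(offset, codepoint, name)
@end example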
@node Performance analysis, How docbook2X is tested, FAQ, Top
@chapter Performance analysis
@cindex speed
@cindex performance
@cindex optimize
@cindex efficiency

The performance of docbook2X,
and most other DocBook tools@footnote{with the notable exception of the
@uref{http://packages.debian.org/unstable/text/docbook-to-man,docbook-to-man tool}
based on the @code{instant} stream processor
(but this tool has many correctness problems)
}
can be summed up in a short phrase:
@emph{they are slow}.

On a modern computer producing only a few man pages
at a time,
with the right software (namely, libxslt as the XSLT processor),
the DocBook tools are fast enough.
But their slowness becomes a hindrance when
generating hundreds or even thousands of man pages
at a time.

The author of docbook2X encounters this problem
whenever he tries to do automated tests of the docbook2X package.
Presented below are some actual benchmarks, and possible approaches
to efficient DocBook to man-page conversion.

@strong{docbook2X running times on 2157
@code{refentry} documents}

@multitable @columnfractions 0.333333333333333 0.333333333333333 0.333333333333333
@item
Step@tab Time for all pages@tab Avg. time per page
@item
DocBook to Man-XML@tab 519.61s@tab 0.24s
@item
Man-XML to man-pages@tab 383.04s@tab 0.18s
@item
roff character mapping@tab 6.72s@tab 0.0031s
@item
Total@tab 909.37s@tab 0.42s
@end multitable
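Each per-page average in the table is the step's total time divided by the 2157 documents.  The following Python snippet recomputes the averages from the totals as a consistency check.  It uses only figures from the table.

@example
# Recompute the table's per-page averages from its totals.
PAGES = 2157
STEPS = [
    ("DocBook to Man-XML", 519.61),
    ("Man-XML to man-pages", 383.04),
    ("roff character mapping", 6.72),
]

total = sum(seconds for _, seconds in STEPS)
for step, seconds in STEPS + [("Total", total)]:
    # Average time per page = total time / number of pages.
    print("%-24s %8.2fs %9.4fs" % (step, seconds, seconds / PAGES))
@end example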
The above benchmark was run on 2157 documents
coming from the @uref{http://www.catb.org/~esr/doclifter/,doclifter} man-page-to-DocBook conversion tool.  The man pages
are the section 1 man pages installed on the
author's Linux system.
The XML files total 44.484 MiB, and on average are 20.6 KiB long.

The results were obtained using the test script in
@file{test/mass/test.pl},
using the default man-page conversion options.
The test script employs the obvious optimizations,
such as loading the XSLT processor, the
man-pages stylesheet, @code{db2x_manxml} and @code{utf8trans} only once.

Unfortunately, there do not seem to be any obvious ways
to improve the performance, short of re-implementing the
transformation program in a tight programming language such as C.

Some notes on possible bottlenecks:

@itemize

@item
Character mapping by @code{utf8trans} is very fast compared to
the other stages of the transformation.  Even loading @code{utf8trans}
separately for each document only doubles the running time
of the character mapping stage.

@item
Even though the XSLT processor is written in C,
XSLT processing is still comparatively slow.
It takes double the time of the Perl script@footnote{
From preliminary estimates, the Pure-XSLT solution takes only
slightly longer at this stage: 0.22s per page.}
@code{db2x_manxml},
even though the XSLT portion and the Perl portion
are processing documents of around the same size@footnote{Of course, conceptually, DocBook processing is more complicated.
So these timings also give us an estimate of the cost
of DocBook's complexity: twice the cost of a simpler document type,
which is actually not too bad.}
(DocBook @code{refentry}
documents and Man-XML documents).

In fact, profiling the stylesheets shows that a significant
amount of time is spent on the localization templates,
in particular the complex XPath navigation used there.
An obvious optimization is to use XSLT keys for the same
functionality.

However, when this was implemented,
the author found that the time used for
@emph{setting up keys} dwarfs the time savings
from avoiding the complex XPath navigation.  It adds an
extra 10s to the processing time for the 2157 documents.
Upon closer examination of the libxslt source code,
XSLT keys turn out to be implemented rather inefficiently:
@emph{each} key pattern @var{x}
causes the entire input document to be traversed once
by evaluating the XPath @samp{//@var{x}}!

@item
Perhaps a C-based XSLT processor written
with the best performance in mind (libxslt is not
especially efficiently coded) may be able to achieve
better conversion times, without losing all the nice
advantages of XSLT-based transformation.
Or failing that, one can look into efficient, stream-based
transformations (@uref{http://stx.sourceforge.net/,STX}).
@end itemize
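The key-indexing cost described above can be modelled in a few lines of Python.  This is a simplified sketch, not libxslt's code: element names stand in for XSLT key match patterns.  A traversal counter shows that building one index per pattern walks the whole tree once per pattern, whereas a single walk can fill every index at once.  The function names are hypothetical.

@example
import xml.etree.ElementTree as ET

def keys_one_pass_per_pattern(root, patterns):
    # Like evaluating //x once for each key pattern x.
    index = dict()
    traversals = 0
    for tag in patterns:
        index[tag] = list(root.iter(tag))   # a full walk of the tree
        traversals += 1
    return index, traversals

def keys_single_pass(root, patterns):
    # One walk of the tree fills the index for every pattern.
    index = dict((tag, []) for tag in patterns)
    for el in root.iter():
        if el.tag in index:
            index[el.tag].append(el)
    return index, 1

doc = ET.fromstring("<refentry><para/><sect><para/><title/></sect></refentry>")
naive, n = keys_one_pass_per_pattern(doc, ["para", "title"])
fast, m = keys_single_pass(doc, ["para", "title"])
@end example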
Packit e4b6da
Packit e4b6da
@node How docbook2X is tested, To-do list, Performance analysis, Top
Packit e4b6da
@chapter How docbook2X is tested
Packit e4b6da
@cindex testing
Packit e4b6da
@cindex correctness
Packit e4b6da
@cindex validation
Packit e4b6da
Packit e4b6da
The testing of the process of converting from DocBook to man pages, or Texinfo,
Packit e4b6da
is complicated by the fact
Packit e4b6da
that a given input (the DocBook document) usually
Packit e4b6da
does not have one specific, well-defined output.
Packit e4b6da
Variations on the output are allowed for the result to look ``nice''.
Packit e4b6da
Packit e4b6da
When docbook2X was in the early stages of development,
Packit e4b6da
the author tested it simply by running some sample DocBook documents
Packit e4b6da
through it, and visually inspecting the output.
Packit e4b6da
Packit e4b6da
Clearly, this procedure is not scaleable for testing
Packit e4b6da
a large number of documents.
Packit e4b6da
In the later 0.8.@var{x} versions
Packit e4b6da
of docbook2X, the testing has been automated
Packit e4b6da
as much as possible.
Packit e4b6da
Packit e4b6da
The testing is implemented by 
Packit e4b6da
heuristic checks on the output to see if
Packit e4b6da
it comprises a ``good'' man page or Texinfo file.
Packit e4b6da
These are the checks in particular:
Packit e4b6da
Packit e4b6da
@enumerate 
Packit e4b6da
Packit e4b6da
@item
Validation of the Man-XML or Texi-XML output
from the first-stage XSLT stylesheets
against the XML DTDs defining the formats.

@item
Running
groff(1) and
makeinfo(1)
on the output, and noting any errors
or warnings from those programs.

@item
Other heuristic checks on the output,
implemented by a Perl script.  These look for
spurious blank lines and uncollapsed whitespace
in the output that would cause a bad display.
@end enumerate
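
Checks of the second and third kind are easy to approximate.  The following
Python sketch is hypothetical: the real heuristics live in a Perl script, and
the function names here are made up.  @code{roff_problems} flags blank lines
and uncollapsed whitespace (leading spaces, or runs of spaces inside text
lines) in man-page source, since troff renders these literally.
@code{tool_diagnostics} runs an external formatter and collects whatever it
writes to standard error.

@example
import re
import subprocess


def roff_problems(text):
    """Return (line number, message) pairs for suspicious man-page source.

    Hypothetical heuristics in the spirit of the checks above.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip() == "":
            # troff turns a blank input line into unwanted vertical space.
            problems.append((number, "blank line"))
        elif line.startswith((".", "'")):
            # Requests and macro calls are left alone.
            continue
        elif line[0] in " \t":
            # Leading whitespace on a text line forces a line break.
            problems.append((number, "leading whitespace"))
        elif re.search(r"\S\s\s+\S", line):
            # troff keeps runs of spaces instead of collapsing them.
            problems.append((number, "uncollapsed whitespace"))
    return problems


def tool_diagnostics(command):
    """Run a formatter such as groff and return its non-empty stderr lines."""
    result = subprocess.run(command, capture_output=True, text=True)
    return [line for line in result.stderr.splitlines() if line.strip()]


# Example (assumes groff is installed; -ww enables all warnings and
# -z discards the formatted output):
# tool_diagnostics(["groff", "-man", "-ww", "-z", "foo.1"])
@end example

Both functions only report candidates; as noted below, some flagged pages are
otherwise correct, so the results still need a human look.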

There are about 8000 test documents,
mostly @code{refentry}
documents, that can be run
against the current version of docbook2X.
A few of them were gathered by the author
from various sources and from test cases in bug reports.
The majority come from running
@uref{http://www.catb.org/~esr/doclifter/,doclifter}
on existing man pages.
Most pages pass the above tests.

To run the tests, go to the @file{test/}
directory in the docbook2X distribution.
The command @samp{make check} will run
some tests on a few documents.

For testing using doclifter,
first generate the DocBook XML sources using doclifter,
then take a look at the @file{test/mass/test.pl}
testing script and run it.
Note that a small portion of the doclifter pages
will fail the tests, either because they do not satisfy the heuristic
tests (but are otherwise correct), or, more commonly, because
the source coming from doclifter's heuristic up-conversion
has errors.
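
A driver for this kind of mass testing mainly needs to keep the two causes of
failure apart.  The Python sketch below is hypothetical and is not
@file{test/mass/test.pl}: @code{convert} and @code{check} are caller-supplied
stand-ins for running the docbook2X conversion and the heuristic checks on a
single document.

@example
def mass_test(paths, convert, check):
    """Convert many documents and sort them by outcome.

    convert(path) should return the converted text, or raise RuntimeError
    when the DocBook source itself is unusable; check(text) should return
    a list of heuristic problems.  Both are hypothetical stand-ins.
    """
    passed, bad_source, heuristic_failures = [], [], []
    for path in sorted(paths):
        try:
            text = convert(path)
        except RuntimeError as error:
            # Per the text above, the more common failure with doclifter
            # output: errors in the up-converted source.
            bad_source.append((path, str(error)))
            continue
        problems = check(text)
        if problems:
            # The page may still be correct; the heuristics are imperfect.
            heuristic_failures.append((path, problems))
        else:
            passed.append(path)
    return passed, bad_source, heuristic_failures
@end example

Keeping the failures in separate lists makes it easy to report the pass rate
and to inspect each kind of failure on its own.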

@node To-do list, Release history, How docbook2X is tested, Top
@chapter To-do list
@cindex to-do
@cindex future
@cindex bugs
@cindex wishlist
@cindex DocBook

With regard to DocBook support:

@itemize

@item
@code{qandaset} table of contents.
Perhaps allow @code{qandadiv}
elements to be nodes in Texinfo.

@item
@code{olink}
(implement it the way the DocBook XSL stylesheets do).

@item
@code{synopfragmentref}

@item
Man pages should support @code{qandaset}, @code{footnote}, @code{mediaobject}, @code{bridgehead},
@code{synopfragmentref},
@code{sidebar},
@code{msgset},
@code{procedure}
(and there's more).

@item
Some DocBook 4.0 features,
e.g. @code{methodsynopsis}.
On the other hand, adding the DocBook 4.2 features shouldn't be that hard.

@item
@code{programlisting}
line numbering, and call-out bugs specified
using @code{area}.
This seems to need XSLT extensions, though.

@item
A template-based system for title pages, and @code{biblioentry}.

@item
Setting column widths in tables is not yet supported in man
pages, but it should be.

@item
Support for typesetting mathematics.
However, I have never seen any man pages or Texinfo manuals
that require this, obviously because math looks horrible
in ASCII text.
@end itemize

For other work items, see the `limitations' or
`bugs' sections in the individual tools' reference pages.

Other work items:

@itemize

@item
Implement tables in pure XSLT.  Probably swipe the code
that is in the DocBook XSL stylesheets to do so.

@item
Many stylesheet templates are still undocumented.

@item
Write documentation for Man-XML and Texi-XML.
Write a smaller application (smaller than DocBook, that is!)
of Man-XML and/or Texi-XML (e.g. for W3C specs).
A side benefit is that we can identify any bugs or design
misfeatures that are not noticed in the DocBook application.

@item
Need to go through the stylesheets and check/fill in
any missing DocBook functionality.  Make a table
outlining which parts of DocBook we support.

For example, we have to check that each attribute
is actually supported for an element that we claim
to support, or else at least raise a warning to the
user when that attribute is used.

Also, some DocBook elements are not rendered
very nicely even when they are supported.

@item
Fault-tolerant, complete error handling.

@item
Full localization of the output, as well as of the messages
from docbook2X programs.  (Note that
we already have internationalization for the output.)
@end itemize

@node Release history, Design notes, To-do list, Top
@chapter Release history
@cindex change log
@cindex history
@cindex release history
@cindex news
@cindex bugs

@anchor{docbook2X 0_8_8}@strong{docbook2X 0.8.8. }

@itemize

@item
Errors in the Man-XML and Texi-XML DTDs were fixed.

These DTDs are now used to validate the output coming
out of the stylesheets, as part of automated testing.
(Validation provides some assurance that the
results of the conversions are correct.)

@item
Several rendering errors were fixed after
they had been discovered through automated testing.

@item
Two HTML files in the docbook2X documentation were
accidentally omitted in the last release.
They have been added.

@item
The pure-XSLT-based man-page conversion now supports
table markup.  The implementation was copied from
the one by Michael Smith in the DocBook XSL stylesheets.
Many thanks!

@item
As requested by Daniel Leidert,
the man-pages stylesheets now support the
@code{segmentedlist},
@code{segtitle}
and @code{seg}
DocBook elements.

@item
As suggested by Matthias Kievermagel,
docbook2X now supports the @code{code}
element.
@end itemize

@anchor{docbook2X 0_8_7}@strong{docbook2X 0.8.7. }

@itemize

@item
Some stylistic improvements were made
to the man-pages output.

This includes fixing a bug that, in some cases, caused
an extra blank line to occur after lists in man pages.

@item
There is a new value @samp{utf-8//TRANSLIT}
for the @code{--encoding} option
to @code{db2x_manxml} and @code{db2x_texixml}.

@item
Added @code{-m} to @code{utf8trans} for modifying
(a large number of) files in-place.

@item
Added a section to the documentation discussing conversion
performance.

There is also a new test script,
@file{test/mass/test.pl},
that can exercise docbook2X by converting many documents
at one time, with a focus on achieving the fastest
conversion speed.

@item
The documentation has also been improved in several places.
Most notably, the
docbook2X(1)
man page has been split into two much more detailed
man pages explaining
man-page conversion and Texinfo conversion separately,
along with a reference of stylesheet parameters.

The documentation has also been re-indexed (finally!)

Also, due to an oversight, the last release omitted the stylesheet
reference documentation.  It is now included again.

@item
Craig Ruff's patches were not integrated correctly in the last
release; this has been fixed.

@item
By popular demand, man-page conversion can also be done
with XSLT alone --- i.e. no Perl scripts or compiling required,
just an XSLT processor.

If you want to convert with pure XSLT, invoke
the XSLT stylesheet in
@file{xslt/backend/db2x_manxml.xsl}
in lieu of the @code{db2x_manxml} Perl script.

@item
Made the @code{xmlcharmap2utf8trans} script
(which converts XSLT 2.0 character maps to character maps in utf8trans
format) actually work.
@end itemize

@anchor{docbook2X 0_8_6}@strong{docbook2X 0.8.6. }

@itemize

@item
Added rudimentary support for @code{entrytbl}
in man pages; patch by Craig Ruff.

@item
Added template for @code{personname}; patch by Aaron Hawley.

@item
Fixed a build problem that happened on IRIX; patch by Dirk Tilger.

@item
Better rendering of man pages in general.  Fixed
an incompatibility of some generated man pages with Solaris troff.

@item
Fixed some minor bugs in the Perl wrapper scripts.

@item
There were some fixes to the Man-XML and Texi-XML document types.
Some of these changes are backwards-incompatible with previous
docbook2X releases.  In particular, Man-XML and Texi-XML now
have their own XML namespaces, so if you were using custom XSLT
stylesheets you will need to add the appropriate namespace
declarations.
@end itemize

@anchor{docbook2X 0_8_5}@strong{docbook2X 0.8.5. }

@itemize

@item
Fixed a bug, introduced in version 0.8.4, that caused the generated Texinfo
files not to set the Info directory information correctly.
(This is exactly the patch that was on the docbook2X Web site.)

@item
Fixed a problem with @code{db2x_manxml} not calling @code{utf8trans} properly.

@item
Added heavy-duty testing to the docbook2X distribution.
@end itemize

@anchor{docbook2X 0_8_4}@strong{docbook2X 0.8.4. }

@itemize

@item
There is now an @emph{experimental}
implementation of @code{db2x_manxml} and @code{db2x_texixml} using pure XSLT,
for those who can't use the Perl one for whatever reason.
See the @file{xslt/backend/} directory.
Do not expect this to work completely yet.
In particular, tables are not yet available in man pages.
(They are, of course, still available in the Perl
implementation.)

@item
Texinfo conversion does not require XSLT extensions anymore!
See @ref{Design notes; the elimination of XSLT extensions,,Design notes: the elimination of XSLT extensions} for the full story.

As a consequence, @code{db2x_xsltproc} has been rewritten to be
a Perl wrapper script around the stock
xsltproc(1).

@item
The @code{-S} option to @code{db2x_xsltproc}
no longer uses libxml's hackish ``SGML DocBook'' parser, but now
calls
sgml2xml(1).
The corresponding long option has been renamed to
@code{--sgml} from @code{--sgml-docbook}.

@item
Fixed a heap of bugs --- that caused invalid output --- in the
XSLT stylesheets, @code{db2x_manxml} and @code{db2x_texixml}.

Some features such as @code{cmdsynopsis}
and @code{funcsynopsis}
are rendered more nicely.

@item
Man-XML and Texi-XML now have DTDs ---
these are useful when writing and debugging stylesheets.

@item
Added a @code{--plaintext} option to @code{db2x_texixml}.

@item
Updated the docbook2X manual.
Stylesheet documentation is now included.
@end itemize

@anchor{docbook2X 0_8_3}@strong{docbook2X 0.8.3. }

@itemize

@item
Incorporated Michael Smith's much-expanded roff character maps.

@item
There are some improvements to the stylesheets themselves, here and
there.

Also I made the Texinfo stylesheets adapt to the XSLT processor
automatically (with regard to the XSLT extensions).  This
might be of interest to anybody wanting to use the stylesheets
with some other XSLT processor (especially SAXON).

@item
Fixed a couple of bugs that prevented docbook2X from
working on Cygwin.

@item
Fixed a programming error in @code{utf8trans} that caused it to
segfault.  At the same time, I rewrote parts of it
to make it more efficient for large character maps
(those with more than a thousand entries).

@item
The Perl component of docbook2X has switched from using
libxml-perl (a SAX1 interface) to XML-SAX (a SAX2 interface).
I had always wanted to make the switch since libxml-perl
is not maintained, but the real impetus this time is
that XML-SAX has a pure Perl XML parser.  If you have
difficulties building @code{XML::Parser}
on Cygwin, like I did, the Perl component will automatically
fall back on the pure Perl parser.
@end itemize

@anchor{docbook2X 0_8_2}@strong{docbook2X 0.8.2. }

@itemize

@item
Added support for tables in man pages.
Almost all table features that can be supported with
@code{tbl} will work.
The rest will be fixed in a subsequent release.

@item
Copied the ``gentext'' stuff over from Norman Walsh's XSL stylesheets.
This gives (incomplete) localizations for the same languages
that are supported by Norman Walsh's XSL stylesheets.

Although incomplete, they should be sufficient for localized
man-page output, for which there are only a few strings
like ``Name'' and ``Synopsis'' that need to be translated.

If you do make non-English man pages, you will need
to revise the localization files; please send patches
to fix them afterwards.

@item
Rendering of bibliographies and other less common DocBook
elements is broken.  Actually, it was probably also
slightly broken before.  Some time will be needed to
go through the stylesheets to check/document everything in
them and to add anything that is still missing.

@item
Added @code{--info} option to @code{db2x_texixml},
to save typing the @code{makeinfo} command.

@item
Renamed the @code{--stringparam} option
in @code{db2x_xsltproc} to @code{--string-param},
though the former option name is still accepted
for compatibility.

@item
Added the stylesheet for generating the XSLT reference
documentation.  But the reference documentation is not
integrated into the main docbook2X documentation yet.

@item
docbook2X no longer uses SGML-based tools to build.
HTML documentation is now built with the DocBook XSL stylesheets.

@item
Changed the license of this package to the MIT license.
This is in case someone wants to copy snippets of the XSLT stylesheets,
since requiring the resulting stylesheet to be GPL seems too onerous.
Actually there is no real loss since no one wants to hide XSLT source
anyway.

@item
Switched to a newer version of autoconf.

@item
Fixes for portability (to non-Linux OSes).

@item
A number of small rendering bug fixes, as usual.
@end itemize

@anchor{docbook2X 0_8_1}@strong{docbook2X 0.8.1. }

@itemize

@item
Bug fixes.

@item
Texinfo menu generation has been improved: the menus now look almost
as good as human-authored Texinfo pages and include detailed node listings
(@code{@@detailmenu}) also.

@item
Added an option to process XInclude in @code{db2x_xsltproc}, just like the standard
@code{xsltproc}.
@end itemize

@anchor{docbook2X 0_8_0}@strong{docbook2X 0.8.0. }

@itemize

@item
Moved @code{docbook2man-spec.pl} to a sister package,
docbook2man-sgmlspl, since it seems to be used quite a lot.

@item
There are now XSLT stylesheets for man page conversion, superseding
@code{docbook2manxml}.  @code{docbook2manxml} had some neat code in it, but I
fear maintaining two man-page converters will take too much time in the
future, so I am dropping it now instead of later.

@item
Fixed build errors involving libxslt headers, etc. that plagued the last
release.  The libxslt wrapper (name changed to @code{db2x_xsltproc}, formerly
called @code{docbook2texi-libxslt}) has been
updated for the recent libxslt changes.
Catalog support now works.

@item
Transcoding output to non-UTF-8 charsets is automatic.

@item
Made some wrapper scripts for the two-step conversion process.
@end itemize

@anchor{docbook2X 0_7_0}@strong{docbook2X 0.7.0. }

@itemize

@item
More bug squashing and features in XSLT stylesheets and Perl scripts.
Too many to list.

@item
Added @code{docbook2texi-libxslt}, which uses libxslt.
Finally, no more Java is necessary.

@item
Added a C-based tool to translate UTF-8 characters to arbitrary (byte)
sequences, to avoid having to patch @code{recode} every time
the translation changes.  However, Christoph Spiel has ported the recode
utf8..texi patch to GNU recode 3.6 if you prefer to use recode.

@item
As usual, the documentation has been improved.

The documentation for the XSLT stylesheets can be extracted
automatically.  (Caveat: libxslt has a bug which affects this process,
so if you want to build this part of the documentation yourself you must
use some other XSLT processor.  There is no @code{jrefentry} support in docbook2X yet, so the
reference is packaged in HTML format; this will change in the next
release, hopefully.)

@item
The build system now uses autoconf and automake.
@end itemize

@anchor{docbook2X 0_6_9}@strong{docbook2X 0.6.9. }

@itemize

@item
Removed old unmaintained code such as @code{docbook2man} and
@code{docbook2texi}.
Moved Perl scripts to @file{perl/} directory and did some
renaming of the scripts to saner names.

@item
Better make system.

@item
Debugged and fixed the XSLT stylesheets further, and added libxslt invocation.

@item
Cut down the superfluity in the documentation.

@item
Fixed other bugs in @code{docbook2manxml} and the Texi-XML,
Man-XML tools.
@end itemize

@anchor{docbook2X 0_6_1}@strong{docbook2X 0.6.1. }

@itemize

@item
@code{docbook2man-spec.pl} has an option to strip or
not strip letters in man page section names, and xref may now refer to
@code{refsect@var{n}}.
I have not personally tested these options, but I am releasing them
in the interest of releasing early and often.

@item
Menu label quirks, @code{paramdef}
non-conformance, and vertical simplelists with multiple columns fixed in
@code{docbook2texixml}.

@item
Brought @code{docbook2manxml} up
to speed.  It builds its own documentation now.

@item
Arcane bugs in @code{texi_xml} and @code{man_xml}
fixed.
@end itemize

@anchor{docbook2X 0_6_0}@strong{docbook2X 0.6.0. }

@itemize

@item
Introduced Texinfo XSLT stylesheets.

@item
Bugfixes to @code{texi_xml} and
@code{docbook2texixml}.

@item
Produced a patch to GNU @code{recode} which maps Unicode
characters to the corresponding Texinfo commands or characters.
It is in @file{ucs2texi.patch}.
I have already sent this patch to the maintainer of @code{recode}.

@item
Updated documentation.
@end itemize

@anchor{docbook2X 0_5_9}@strong{docbook2X 0.5.9. }

@itemize

@item
@code{docbook2texixml} now transforms into an intermediate XML
format which closely resembles the Texinfo format, and then another
tool is used to convert this XML to the actual format.

This scheme moves all the messy whitespace, newline, and escaping issues
out of the actual transformation code.  Another benefit is that other
stylesheets (systems) can be used to do the transformation, and it
serves as a base for transformation to Texinfo from other
DTDs.

@item
Texinfo node handling has been rewritten.  Node handling used to work
back and forth between IDs and node names, which caused a lot of
confusion.  The old code also could not support DocBook
@code{set}s because it did not keep track of the Texinfo
file being processed.

As a consequence, the bug in which @code{docbook2texixml} did
not output the @samp{@@setinfofile} is fixed.
@code{xreflabel} handling is also sane now.

In the new scheme, elements are referred to by their ID (auto-generated
if necessary).  The Texinfo node names are generated before doing the
actual transformation, and subsequent calls to @code{texinode_get}
simply look up the node name when given an element.

@item
The stylesheet architecture allows internationalization to be
implemented easily, although it is not done yet.

@item
The (non-XML-based) old code is still in the CVS tree, but I'm not
really interested in maintaining it.  I'll still accept patches to it,
and will probably keep it around for reference and porting purposes.

There are some changes to the old code base in
this new release; see the old change log for details.

@item
The documentation has been revised.

@item
I am currently rewriting docbook2man using the same transform-to-XML technique.
It's not included in 0.5.9 simply because I wanted to get the improved
Texinfo tool out quickly.
Additional XSLT stylesheets will be written.
@end itemize
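
The two-stage scheme described for this release can be made concrete with a
minimal sketch of a second stage.  It is hypothetical in several ways:
docbook2X's real second stage is a Perl script, the tiny vocabulary used here
(@code{page}, @code{para}, @code{bold}) is made up rather than taken from the
Texi-XML or Man-XML DTDs, and it emits man-page (roff) markup instead of
Texinfo only because roff escaping is short to show.  The point is that
escaping and whitespace collapsing happen in one place, after the first stage
has done the structural work.

@example
import re
import xml.etree.ElementTree as ET

# Hypothetical intermediate vocabulary, for illustration only:
#   <page><para>text with <bold>bold</bold> words</para>...</page>


def escape_text(text):
    """Make arbitrary text safe for troff (only the second stage does this)."""
    return text.replace("\\", "\\e")


def render_inline(elem):
    """Render mixed content, mapping <bold> to troff font escapes."""
    pieces = [escape_text(elem.text or "")]
    for child in elem:
        inner = render_inline(child)
        if child.tag == "bold":
            inner = "\\fB" + inner + "\\fR"
        pieces.append(inner)
        pieces.append(escape_text(child.tail or ""))
    return "".join(pieces)


def to_roff(xml_text):
    """Serialize the intermediate XML as man-page source."""
    lines = []
    for para in ET.fromstring(xml_text).iter("para"):
        # Whitespace in XML text is insignificant; collapse it once, here.
        body = re.sub(r"\s+", " ", render_inline(para)).strip()
        if body.startswith((".", "'")):
            # A leading dot or quote would be read as a request; \& hides it.
            body = "\\&" + body
        lines.append(".PP")
        lines.append(body)
    return "\n".join(lines) + "\n"
@end example

For example, @code{to_roff("<page><para>.hidden</para></page>")} returns a
@code{.PP} line followed by @code{\&.hidden}.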

@node Design notes, Package installation, Release history, Top
@chapter Design notes
@cindex design
@cindex history

Lessons learned:

@itemize

@item
@cindex stream processing
@cindex tree processing

Think four times before doing stream-based XML processing, even though it
appears to be more efficient than tree-based processing.
Stream-based processing is usually more difficult.

But if you have to do stream-based processing, make sure to use robust,
fairly scalable tools like @code{XML::Templates},
@emph{not} @code{sgmlspl}.  Of course it cannot
be as pleasant as tree-based XML processing, but take a look at
@code{db2x_manxml} and @code{db2x_texixml} for an example.

@item
Do not use @code{XML::DOM} directly for stylesheets.
Your ``stylesheet'' would become seriously unmanageable.
It is also extremely slow for anything but trivial documents.

At least take a look at some of the XPath modules out there.
Better yet, see if your solution really cannot use XSLT.
A C/C++-based implementation of XSLT can be fast enough
for many tasks.

@item
@cindex XSLT extensions

Avoid XSLT extensions whenever possible.  I don't think there is
anything wrong with them intrinsically, but it is a headache
to have to compile your own XSLT processor.  (libxslt is written
in C, and the extensions must be compiled in and cannot be loaded
dynamically at runtime.)  Not to mention there seem to be a thousand
different set-ups for different XSLT processors.

@item
@cindex Perl

Perl is not as good at XML as it's hyped to be.

SAX comes from the Java world, and its port to Perl
(with all the object-orientedness, and without adopting Perl idioms)
is awkward to use.

Another problem is that Perl SAX does not seem to be well-maintained.
The implementations have various bugs; while they can be worked around,
they have been around for such a long time that it does not inspire
confidence that the Perl XML modules are reliable software.

It also seems that no one else has seriously used Perl SAX
for robust applications.  It seems to be unnecessarily hard to do
certain tasks, such as displaying error diagnostics on its
input, or processing large documents with complicated structure.

@item
@cindex Man-XML
@cindex Texi-XML

Do not be afraid to use XML intermediate formats
(e.g. Man-XML and Texi-XML) when converting to other
markup languages, with the final conversion step implemented in a scripting language.
The syntax rules of the target languages (man pages and Texinfo) are designed for
authoring by hand, not for machine generation; hence a conversion
using only tools designed for XML-to-XML conversion
requires jumping through hoops.

You might think that we could, instead, make a separate module
that abstracts all this complexity
from the rest of the conversion program.  For example,
there is nothing stopping an XSLT processor from serializing
the output document as a text document obeying the syntax
rules for man pages or Texinfo documents.

Theoretically you would get the same result,
but it is much harder to implement.  It is far easier to write plain
text manipulation code in a scripting language than in Java or C or XSLT.
Also, if the intermediate format is hidden in a Java class or
C API, output errors are harder to see.
With the intermediate-format approach, on the other hand, we can
visually examine the textual output of the XSLT processor and fix
the Perl script as we go along.

Some XSLT processors support scripting to go beyond XSLT
functionality, but these extensions are usually not portable, and not
always easy to use.
Therefore, opt for two-pass processing: XSLT for the first stage,
and a standalone script for the second.

@anchor{Design notes; the elimination of XSLT extensions}
Finally, another advantage of using intermediate XML formats
processed by a Perl script is that we can often eliminate the
use of XSLT extensions.  In particular, back when the XSLT
stylesheets first went into docbook2X, the extensions related to
Texinfo node handling could easily have been moved to the Perl script,
but I didn't realize it!  I feel stupid now.

If I had known this at the very beginning, it would have saved
a lot of development time, and docbook2X would be much more
advanced by now.

Note that even the man-pages stylesheet from the DocBook XSL
distribution essentially does two-pass processing,
just like the docbook2X solution.  That stylesheet
formerly used one-pass processing, and its authors
probably realized eventually what a mess that was.
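
To give a flavor of the plain text manipulation involved, here is a
minimal sketch, written in C99 rather than Perl purely for illustration,
of one roff syntax rule that is tedious to express in XSLT 1.0: a text
line that begins with a period or an apostrophe would be read as a
request, and the backslash is the escape character.  The function
@code{roff_escape_line} is hypothetical and not part of docbook2X; it
assumes its argument is a single line without a trailing newline, and
that the caller supplies a large enough output buffer.

@verbatim
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of docbook2X: escape one line of body
 * text (no embedded newline) for roff.  `out` must hold at least
 * 2 * strlen(line) + 3 bytes. */
static void roff_escape_line(const char *line, char *out)
{
    char *q = out;

    /* A text line beginning with '.' or '\'' would be read as a roff
     * request; the zero-width escape \& in front of it prevents that. */
    if (line[0] == '.' || line[0] == '\'') {
        *q++ = '\\';
        *q++ = '&';
    }

    for (const char *p = line; *p != '\0'; p++) {
        if (*p == '\\') {
            /* \e prints the current escape character, i.e. a backslash. */
            *q++ = '\\';
            *q++ = 'e';
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
}

int main(void)
{
    char buf[256];

    roff_escape_line(".profile is read at login", buf);
    puts(buf);  /* \&.profile is read at login */

    roff_escape_line("C:\\DOCS is a directory", buf);
    puts(buf);  /* C:\eDOCS is a directory */
    return 0;
}
@end verbatim

A real second stage would apply such rules only to character data taken
from the document, never to the requests that the script itself emits.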
@item
Design the XML intermediate format to be easy to use from the standpoint
of the conversion tool, and similar to how XML document types work in
general.  For example, mark up the paragraphs of a document as elements,
rather than marking their paragraph @emph{breaks}
(the latter is typical of traditional markup languages, but not of XML).
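
As a hypothetical illustration (the names below do not come from
docbook2X), a serializer that receives whole paragraphs can decide by
itself how a paragraph break is spelled in each target language: as a
@code{.PP} request before each paragraph of a man page, or as a blank
line between Texinfo paragraphs.  This C99 sketch assumes the caller
supplies an output buffer large enough for the result.

@verbatim
#include <stdio.h>
#include <string.h>

enum target { TARGET_MAN, TARGET_TEXINFO };

/* Append `s` at `*pos` and advance the write position. */
static void append(char **pos, const char *s)
{
    size_t len = strlen(s);
    memcpy(*pos, s, len);
    *pos += len;
}

/* Hypothetical sketch: the intermediate format delivers whole paragraphs
 * (like <para> elements); only this serializer decides how paragraph
 * breaks are written in the target language.  `out` must be big enough. */
static void emit_paragraphs(const char *const paras[], size_t n,
                            enum target t, char *out)
{
    char *pos = out;

    for (size_t i = 0; i < n; i++) {
        if (t == TARGET_MAN)
            append(&pos, ".PP\n");  /* man: a request opens each paragraph */
        else if (i > 0)
            append(&pos, "\n");     /* Texinfo: a blank line separates them */
        append(&pos, paras[i]);
        append(&pos, "\n");
    }
    *pos = '\0';
}

int main(void)
{
    const char *const paras[] = { "First paragraph.", "Second paragraph." };
    char buf[256];

    emit_paragraphs(paras, 2, TARGET_MAN, buf);
    fputs(buf, stdout);
    emit_paragraphs(paras, 2, TARGET_TEXINFO, buf);
    fputs(buf, stdout);
    return 0;
}
@end verbatim

With this arrangement, the intermediate format never needs to know the
break conventions of either target language.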
@item
I am quite impressed by some of the things that people make XSLT 1.0 do:
things that I thought were impossible, or at least unworkable
without a ``real'' scripting language.
(@code{db2x_manxml} and @code{db2x_texixml} fall into the
category of things that can be done in XSLT 1.0, but only inelegantly.)
@item
Internationalize as soon as possible.
That is much easier than adding it in later.

The same advice applies to the build system.
@item
I would advise against using build systems based
on Makefiles or any form of automake.
Of course, it is inertia that prevents people from
switching to better build systems.  But also
consider that while Makefile-based build systems
can do many of the things newer build systems are capable
of, they often require too many fragile hacks.  Developing
these hacks takes time that would be better
spent developing the program itself.

Alas, better build systems such as SCons were not available
when docbook2X was at an earlier stage.  It's too late
to switch now.
@item
Writing good documentation takes skill.  This manual
has been revised substantially at least four times@footnote{
This number is probably inflated because of the many design
mistakes made in the process.}, with the author
consciously trying to condense information each time.
@item
Table processing in the pure-XSLT man-pages conversion
is convoluted --- it goes through HTML(!) tables as an intermediary.
That is the same way that the DocBook XSL stylesheets implement
it (due to Michael Smith), and I copied the code from there
almost verbatim.  I did it this way to save myself the time and energy
of re-implementing table conversion @emph{again}.

Michael Smith also says that going through HTML is better,
because some varieties of DocBook allow the HTML table model
in addition to the CALS table model.  (I am not convinced
that this is such a good idea, but anyway.)
That way, HTML tables in DocBook can be translated to man pages
too, without much more effort.

Is this inefficient?  Probably.  But that's what you get
if you insist on using pure XSLT.  The Perl implementation
of docbook2X had already supported table conversion
for two years before that.
@item
The design of @code{utf8trans} is not the best.
It was chosen to simplify the implementation while remaining efficient.
A more general design that still retains efficiency is possible,
and I describe it below.  Unfortunately, however,
at this point changing @code{utf8trans}
would be too disruptive to users, for little gain in functionality.

Instead of working with characters, we should work with byte strings.
This means that, if all input and output is in UTF-8,
with no escape sequences, then UTF-8 decoding or encoding
is not necessary at all.  Indeed, the program becomes agnostic
to the character set used.  In addition,
multi-character matches become possible.

The translation map will be an unordered list of key-value pairs.
The key and value are both arbitrary-length byte strings,
with an explicit length attached (so null bytes in the input
and output are retained).

The program would take the translation map and transform the input file
by matching the beginning of the remaining input, seen as a sequence of bytes,
against the keys in the translation map, greedily.
(Since the matching is greedy, the translation keys need not
be prefix-free.)
Once the longest (in byte length) matching key is found,
the corresponding value (another byte string) is substituted
in the output, and processing repeats until the input is finished.
If, on the other hand, no match is found, the next byte
of the input file is copied as-is, and processing repeats
at the following byte of input.

Since bytes are 8 bits and the key strings are typically
very short (up to 3
bytes for a Unicode BMP character encoded in UTF-8),
this algorithm can be implemented with radix search.
It would be competitive, in both execution time and space,
with character codepoint hashing and sparse multi-level
arrays, the primary techniques for implementing
Unicode @emph{character} translation.
(@code{utf8trans} is implemented using sparse multi-level arrays.)

One could even try to generalize the radix searching further,
so that keys can include wildcard characters, for example.
Taken to the extreme, the design would end up being
a regular expression processor optimized for matching
many strings with common prefixes.
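
The byte-string design described above can be sketched in C99 as
follows.  This is a minimal illustration under stated assumptions, not
code taken from @code{utf8trans}: the radix search is a plain 256-way
trie with one level per input byte (simple, but memory-hungry; a serious
implementation would probably want more compact nodes), the translation
map is built in memory by repeated calls to a hypothetical
@code{map_add} function, and the sample map entries are made up for the
demonstration.

@verbatim
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One node of a 256-way trie: radix search, one level per input byte.
 * A node holds a replacement if some key of the map ends exactly here. */
struct node {
    struct node *child[256];
    const char *value;   /* replacement bytes, or NULL if no key ends here */
    size_t value_len;    /* explicit length, so NUL bytes are preserved */
};

static struct node *node_new(void)
{
    struct node *n = calloc(1, sizeof *n);
    if (n == NULL) {
        perror("calloc");
        exit(EXIT_FAILURE);
    }
    return n;
}

/* Add one key-value pair.  The map is unordered, so insertion order
 * does not matter, and keys need not be prefix-free. */
static void map_add(struct node *root, const char *key, size_t key_len,
                    const char *value, size_t value_len)
{
    struct node *n = root;
    for (size_t i = 0; i < key_len; i++) {
        unsigned char b = (unsigned char)key[i];
        if (n->child[b] == NULL)
            n->child[b] = node_new();
        n = n->child[b];
    }
    n->value = value;
    n->value_len = value_len;
}

/* Greedy translation of `in` into `out` (capacity `cap`).  At each
 * position the longest matching key wins; if no key matches, one byte
 * is copied as-is.  Returns the output length, or (size_t)-1 if `out`
 * is too small. */
static size_t translate(const struct node *root, const char *in,
                        size_t in_len, char *out, size_t cap)
{
    size_t pos = 0, o = 0;

    while (pos < in_len) {
        const struct node *n = root, *best = NULL;
        size_t best_len = 0;

        /* Walk the trie as far as the input allows, remembering the
         * deepest node that carries a value (the longest match). */
        for (size_t i = pos; i < in_len; i++) {
            n = n->child[(unsigned char)in[i]];
            if (n == NULL)
                break;
            if (n->value != NULL) {
                best = n;
                best_len = i - pos + 1;
            }
        }

        const char *src = best ? best->value : &in[pos];
        size_t src_len = best ? best->value_len : 1;
        pos += best ? best_len : 1;

        if (cap - o < src_len)
            return (size_t)-1;
        memcpy(out + o, src, src_len);
        o += src_len;
    }
    return o;
}

int main(void)
{
    struct node *root = node_new();
    char out[128];

    /* Illustrative entries only, not docbook2X's real character maps. */
    map_add(root, "\xE2\x80\x94", 3, "\\(em", 4);  /* U+2014 EM DASH */
    map_add(root, "-", 1, "\\-", 2);
    map_add(root, "--", 2, "\\(en", 4);            /* overlaps "-" */

    const char *in = "a--b\xE2\x80\x94" "c-d";
    size_t n = translate(root, in, strlen(in), out, sizeof out);
    if (n != (size_t)-1) {
        fwrite(out, 1, n, stdout);
        putchar('\n');
    }
    return 0;
}
@end verbatim

Because the trie walk remembers the deepest node that carries a value,
overlapping keys such as @samp{-} and @samp{--} need no special
ordering, and a null byte can be a key like any other.  With the sample
entries, the demonstration should print @samp{a\(enb\(emc\-d}.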
@end itemize

@node Package installation, Concept index, Design notes, Top
@appendix Package installation

@menu
* Installation::                Package install procedure
* Dependencies on other software::   Other software packages that docbook2X
                                       needs
@end menu

@node Installation, Dependencies on other software, , Package installation
@section Installation
@cindex docbook2X package
@cindex installation

After checking that you have the
necessary prerequisites (@pxref{Dependencies on other software}),
unpack the tarball, then run @samp{./configure}, and
then @samp{make}, @samp{make install},
as usual.

@quotation

@strong{Note}

@cindex pure XSLT
If you intend to use only the pure XSLT version of docbook2X,
then you do not need to compile or build the package at all.
Simply unpack the tarball, and point your XSLT processor
to the XSLT stylesheets under the @file{xslt/}
subdirectory.
@end quotation

(The last @samp{make install} step, to install
the files of the package onto the filesystem, is optional.  You may use
docbook2X from its own directory after building it, although in that case,
when invoking docbook2X, you will have to specify some paths manually
on the command-line.)

You may also want to run @samp{make check} to do some
checks that the package is working properly.  Typing
@samp{make -W docbook2X.xml man texi} in
the @file{doc/} directory will rebuild
docbook2X's own documentation, and can serve as an additional check.

You need GNU make to build docbook2X properly.
@cindex CVS

If you are using the CVS version, you will also need the autoconf and automake
tools, and must run @samp{./autogen.sh} first.  But
see also the note below about the CVS version.

@cindex HTML documentation
If you want to (re-)build HTML documentation (after having installed Norman Walsh's DocBook XSL stylesheets), pass @code{--with-html-xsl}
to @samp{./configure}.  You do not really need this,
since docbook2X releases already contain pre-built HTML documentation.

Some other packages also call their conversion programs
@code{docbook2man} and @code{docbook2texi};
you can use the @code{--program-transform-name} parameter to
@samp{./configure} if you do not want docbook2X to clobber
your existing @code{docbook2man} or
@code{docbook2texi}.

If you are using a Java-based XSLT processor,
you need to pass @code{--with-xslt-processor=saxon} for
SAXON, or @code{--with-xslt-processor=xalan-j} for
Xalan-Java.  (The default is libxslt.)
In addition, since the automatic check for the installed JARs is not
very intelligent, you will probably need to pass some options
to @samp{./configure} to tell it where the JARs are.
See @samp{./configure --help} for details.

The docbook2X package supports VPATH builds (building in a location
other than the source directory), but any newly generated documentation
will not end up in the right place for installation and redistribution.
Cross compilation is not supported at all.

@noindent
@anchor{Installation problems}
@subsection Installation problems
@cindex problems

@table @asis

@item @ @ Q:
Where is @code{XML::Handler::SGMLSpl}?

@item @ @ A:
It's included in the docbook2X package.
If Perl says it cannot find it,
then that is a bug in the docbook2X distribution.
Please report it.

In older versions of docbook2X, the SGMLSpl module
had to be installed, or specified manually on the Perl command line.
That is no longer the case.
@item @ @ Q:
@code{db2x_xsltproc} tells me that `one input document is required'
when building docbook2X.

@item @ @ A:
Use GNU make to build docbook2X (as opposed to BSD make).

I could fix this incompatibility in the docbook2X make files,
but some of the default automake rules have the same problem,
so I didn't bother.
@item @ @ Q:
When docbook2X attempts to build its documentation,
I get errors about ``attempting to load network entity'', etc.

@item @ @ A:
You will need to set up the XML catalogs for the DocBook XML DTDs correctly.
This tells the XSLT processor where to find the DocBook DTDs on your system.
Recent Linux distributions should already have this done for you.

This error (or rather, warning) is harmless in the case of docbook2X
documentation --- it does not actually require the DTD to build.
But your other DocBook documents might (mainly because they use
the ISO entities).

libxml also understands SGML catalogs, but last time I tried it
there was some bug that stopped it from working.  Your Mileage May Vary.
@item @ @ Q:
I cannot build from CVS.

@item @ @ A:
If the problem is related to HTML files, then you must
pass @code{--with-html-xsl} to @code{configure}.
The problem is that the HTML files are automatically generated
from the XML source and are not in CVS, but the Makefile still
tries to install them.  (This issue does not appear when
building from release tarballs.)
@end table

For other docbook2X problems, please also look at its main documentation.
@node Dependencies on other software, , Installation, Package installation
@section Dependencies on other software
@cindex dependencies
@cindex prerequisites
@cindex docbook2X package

To use docbook2X you need:

@table @asis

@item A reasonable Unix system, with Perl 5
@cindex Windows

docbook2X can work on Linux, FreeBSD, Solaris, and Cygwin on Windows.

A C compiler is required to compile
a small ANSI C program (@code{utf8trans}).
@item XML-NamespaceSupport, XML-SAX, XML-Parser and XML-SAX-Expat (Perl modules)
@cindex Perl
The last two are optional: they add a Perl interface to the
C-based XML parser Expat.  It is recommended that you install them
anyway; otherwise, the fallback Perl-based XML parser
makes docbook2X very slow.

You can get all the Perl modules here: @uref{http://www.cpan.org/modules/by-category/11_String_Lang_Text_Proc/XML/,CPAN XML module listing}.
@item iconv
@cindex @code{iconv}

If you are running Linux with glibc, you already have it.
Otherwise, see @uref{http://www.gnu.org/software/libiconv/,the GNU libiconv home page}.
@item XSLT 1.0 processor
@cindex SAXON
@cindex Xalan-Java
@cindex libxslt
You have a choice of:

@table @asis

@item libxslt
See the @uref{http://xmlsoft.org/, libxml2@comma{} libxslt home page}.

@item SAXON
See @uref{http://saxon.sourceforge.net/, the SAXON home page}.

@item Xalan-Java
See @uref{http://xml.apache.org/xalan-j/, the Xalan-Java home page}.
@end table

@cindex catalog
For the Java-based processors (SAXON and Xalan-Java),
you will also need@footnote{Strictly speaking, this component is not required, but without it your computer will almost certainly keep downloading large XML files from the Internet, since portable XML files do not refer directly to cached local copies of the required files.} @uref{http://xml.apache.org/commons/,the Apache XML Commons} distribution.
This adds XML catalog support to any Java-based
processor.

Of the three processors, libxslt is recommended.
(I would have added support for other XSLT processors,
but only these three seem to have proper XML catalog
support.)

Unlike in previous versions of docbook2X, these Java-based
processors can work almost out of the box.  Also, docbook2X
no longer needs to compile XSLT extensions,
so if you use an OS distribution package of libxslt,
you do not need the development versions of the
library any more.
@item DocBook XML DTD
@cindex DocBook

Make sure you set up the XML catalogs for the DTDs
you install.

The @uref{http://www.docbook.org/,@i{DocBook: The Definitive Guide} website} has more information.

You may also need the SGML DTD if your documents are SGML
rather than XML.

@item Norman Walsh's DocBook XSL stylesheets
@cindex HTML documentation

See the @uref{http://docbook.sourceforge.net/,Open DocBook Repository}.

This is optional and is only used to build documentation in HTML format.  In your XML catalog, point the URI in @file{doc/ss-html.xsl}
to a local copy of the stylesheets.
@end table

For all the items above, it will be easier for you
to install your OS's packages of the software (e.g. Debian packages)
than to install them manually.  But be aware that sometimes the OS
package may not be for an up-to-date version of the software.
@cindex Windows

If you cannot satisfy all the prerequisites above (say you are on
a vanilla Win32 system), then you will not be able to ``build''
docbook2X properly, but if you are knowledgeable, you can still
salvage its parts (e.g. the XSLT stylesheets, which can be run alone).
@node Concept index, , Package installation, Top
@unnumbered Index

@printindex cp

@bye