<sect1 id="performance">
<sect1info>
<abstract role="texinfo-node">
<para>Discussion on conversion speed</para>
</abstract>
</sect1info>

<title>Performance analysis</title>

<indexterm><primary>speed</primary></indexterm>
<indexterm><primary>performance</primary></indexterm>
<indexterm><primary>optimize</primary></indexterm>
<indexterm><primary>efficiency</primary></indexterm>

<para>
The performance of docbook2X,
and most other DocBook tools<footnote>
<para>with the notable exception of the
<ulink url="http://packages.debian.org/unstable/text/docbook-to-man">docbook-to-man tool</ulink>
based on the <command>instant</command> stream processor
(but this tool has many correctness problems)
</para>
</footnote>
can be summed up in a short phrase:
<emphasis>they are slow</emphasis>.
</para>

<para>
On a modern computer producing only a few man pages
at a time,
with the right software (namely, libxslt as the XSLT processor),
the DocBook tools are fast enough.
But their slowness becomes a hindrance when
generating hundreds or even thousands of man pages
at a time.
</para>

<para>
The author of docbook2X encounters this problem
whenever he runs automated tests of the docbook2X package.
Presented below are some actual benchmarks, and possible approaches
to efficient DocBook-to-man-page conversion.
</para>

<table>
<title>docbook2X running times on 2157
<sgmltag class="element">refentry</sgmltag> documents</title>

<tgroup cols="3" rowsep="1" colsep="1">
<thead>
<row>
<entry>Step</entry>
<entry>Time for all pages</entry>
<entry>Avg. time per page</entry>
</row>
</thead>

<tbody>
<row>
<entry>DocBook to Man-XML</entry>
<entry>519.61 s</entry>
<entry>0.24 s</entry>
</row>

<row>
<entry>Man-XML to man-pages</entry>
<entry>383.04 s</entry>
<entry>0.18 s</entry>
</row>

<row>
<entry>roff character mapping</entry>
<entry>6.72 s</entry>
<entry>0.0031 s</entry>
</row>

<row>
<entry>Total</entry>
<entry>909.37 s</entry>
<entry>0.42 s</entry>
</row>
</tbody>
</tgroup>
</table>

<para>
The above benchmark was run on 2157 documents
produced by the <ulink url="http://www.catb.org/~esr/doclifter/">doclifter</ulink> man-page-to-DocBook conversion tool. The source man pages
are the section 1 man pages installed on the
author’s Linux system.
The XML files total 44.484 MiB, and on average are 20.6 KiB long.
</para>

<para>
The results were obtained using the test script in
<filename>test/mass/test.pl</filename>,
with the default man-page conversion options.
The test script employs the obvious optimizations,
such as loading the XSLT processor, the
man-pages stylesheet, &db2x_manxml; and &utf8trans; only once.
</para>

<para>
Unfortunately, there do not seem to be any obvious ways
to improve the performance, short of re-implementing the
transformation program in a tight programming language such as C.
</para>

<para>
Some notes on possible bottlenecks:

<itemizedlist>
<listitem>
<para>
Character mapping by &utf8trans; is very fast compared to
the other stages of the transformation. Even loading &utf8trans;
separately for each document only doubles the running time
of the character mapping stage.
</para>
</listitem>

<listitem>
<para>
Even though the XSLT processor is written in C,
XSLT processing is still comparatively slow.
It takes double the time of the Perl script<footnote>
<para>
From preliminary estimates, the Pure-XSLT solution takes only
slightly longer at this stage: 0.22 s per page</para></footnote>
&db2x_manxml;,
even though the XSLT portion and the Perl portion
are processing documents of around the same size<footnote>
<para>Of course, conceptually, DocBook processing is more complicated.
So these timings also give us an estimate of the cost
of DocBook’s complexity: twice the cost of a simpler document type,
which is actually not too bad.</para></footnote>
(DocBook <sgmltag class="element">refentry</sgmltag>
documents and Man-XML documents).
</para>

<para>
In fact, profiling the stylesheets shows that a significant
amount of time is spent on the localization templates,
in particular the complex XPath navigation used there.
An obvious optimization is to use XSLT keys for the same
functionality.
</para>

<para>
However, when that was implemented,
the author found that the time used for
<emphasis>setting up keys</emphasis> dwarfed the time savings
from avoiding the complex XPath navigation. It added an
extra 10 s to the processing time for the 2157 documents.
Upon closer examination of the libxslt source code,
XSLT keys are seen to be implemented rather inefficiently:
<emphasis>each</emphasis> key pattern <replaceable>x</replaceable>
causes the entire input document to be traversed once
by evaluating the XPath <literal>//<replaceable>x</replaceable></literal>!
</para>

</listitem>

<listitem>
<para>
Perhaps a C-based XSLT processor written
with the best performance in mind (libxslt is not
the most efficiently coded) may be able to achieve
better conversion times, without losing all the nice
advantages of XSLT-based transformation.
Or failing that, one can look into efficient, stream-based
transformations (<ulink url="http://stx.sourceforge.net/">STX</ulink>).
</para>
</listitem>

</itemizedlist>
</para>
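
<para>
The key-based lookup discussed in the notes above can be sketched as follows.
This is only an illustration of the technique, not the code used in docbook2X:
the key name <literal>gentext-by-lang-key</literal>, the template name
<literal>lookup.gentext</literal>, the parameter <literal>l10n.file</literal>
and the file name <filename>l10n.xml</filename> are hypothetical.
The localization file is assumed to hold
<sgmltag class="element">l:gentext</sgmltag> elements
(with <sgmltag class="attribute">key</sgmltag> and
<sgmltag class="attribute">text</sgmltag> attributes) grouped under
<sgmltag class="element">l:l10n</sgmltag> elements that carry a
<sgmltag class="attribute">language</sgmltag> attribute,
in the style of the DocBook XSL stylesheets.
</para>

<programlisting><![CDATA[<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:l="http://docbook.sourceforge.net/xmlns/l10n/1.0"
    exclude-result-prefixes="l">

  <!-- Hypothetical location of the localization data (an assumption). -->
  <xsl:param name="l10n.file" select="'l10n.xml'"/>

  <!-- Index every l:gentext under the composite value "language#key".
       The processor builds this index once per document. -->
  <xsl:key name="gentext-by-lang-key"
           match="l:gentext"
           use="concat(../@language, '#', @key)"/>

  <!-- Hypothetical named template: returns the localized text for
       $key in $lang, or an empty string if there is no entry. -->
  <xsl:template name="lookup.gentext">
    <xsl:param name="lang" select="'en'"/>
    <xsl:param name="key"/>
    <!-- key() searches the document that contains the context node,
         so make the localization document the context first. -->
    <xsl:for-each select="document($l10n.file)">
      <xsl:value-of
          select="key('gentext-by-lang-key',
                      concat($lang, '#', $key))/@text"/>
    </xsl:for-each>
    <!-- The equivalent plain XPath navigation, without a key, would be:
         document($l10n.file)//l:l10n[@language = $lang]
                             /l:gentext[@key = $key]/@text -->
  </xsl:template>

</xsl:stylesheet>]]></programlisting>

<para>
Whether such a key pays off depends on the XSLT processor.
As noted above, with libxslt the cost of setting up the keys outweighed
the savings in the author’s test, so it is worth timing both variants
on a representative set of documents before adopting either.
</para>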
</sect1>