.\" Copyright (c) 2002 Bill C. Riemers
.\"
.\" This is free documentation; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License as
.\" GNU General Public License, either Version 2 of the license,
.\" or (at your option) any later version. The license should have
.\" published by the Free Software Foundation; either version 2 of
.\" the License, or (at your option) any later version.
.\"
.\" The GNU General Public License's references to "object code"
.\" and "executables" are to be interpreted as the output of any
.\" document formatting or typesetting system, including
.\" intermediate and printed output.
.\"
.\" This manual is distributed in the hope that it will be useful,
.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public
.\" License along with this manual. Otherwise check the web site
.\" of the Free Software Foundation at http://www.fsf.org.
.\"
.\" I, Bill C. Riemers, hereby grant all rights to this code,
.\" provided usage complies with the GPL or a written exception to
.\" the GPL granted by any of Bill C. Riemers, Leon Bottou,
.\" Yann Le Cun, or the Free Source Foundation.
.\"
.\" ------------------------------------------------------------------
.\" DjVuLibre-3.5 is derived from the DjVu(r) Reference Library from
.\" Lizardtech Software. Lizardtech Software has authorized us to
.\" replace the original DjVu(r) Reference Library notice by the following
.\" text (see doc/lizard2002.djvu and doc/lizardtech2007.djvu):
.\"
.\" ------------------------------------------------------------------
.\" | DjVu (r) Reference Library (v. 3.5)
.\" | Copyright (c) 1999-2001 LizardTech, Inc. All Rights Reserved.
.\" | The DjVu Reference Library is protected by U.S. Pat. No.
.\" | 6,058,214 and patents pending.
.\" |
.\" | This software is subject to, and may be distributed under, the
.\" | GNU General Public License, either Version 2 of the license,
.\" | or (at your option) any later version. The license should have
.\" | accompanied the software or you may obtain a copy of the license
.\" | from the Free Software Foundation at http://www.fsf.org .
.\" |
.\" | The computer code originally released by LizardTech under this
.\" | license and unmodified by other parties is deemed "the LIZARDTECH
.\" | ORIGINAL CODE." Subject to any third party intellectual property
.\" | claims, LizardTech grants recipient a worldwide, royalty-free,
.\" | non-exclusive license to make, use, sell, or otherwise dispose of
.\" | the LIZARDTECH ORIGINAL CODE or of programs derived from the
.\" | LIZARDTECH ORIGINAL CODE in compliance with the terms of the GNU
.\" | General Public License. This grant only confers the right to
.\" | infringe patent claims underlying the LIZARDTECH ORIGINAL CODE to
.\" | the extent such infringement is reasonably necessary to enable
.\" | recipient to make, have made, practice, sell, or otherwise dispose
.\" | of the LIZARDTECH ORIGINAL CODE (or portions thereof) and not to
.\" | any greater extent that may be necessary to utilize further
.\" | modifications or combinations.
.\" |
.\" | The LIZARDTECH ORIGINAL CODE is provided "AS IS" WITHOUT WARRANTY
.\" | OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED
.\" | TO ANY WARRANTY OF NON-INFRINGEMENT, OR ANY IMPLIED WARRANTY OF
.\" | MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
.\" +------------------------------------------------------------------
.TH DJVUXML 1 "11/15/2002" "DjVuLibre XML Tools" "DjVuLibre XML Tools"
.de SS
.SH \\0\\0\\0\\$*
..
.SH NAME
djvutoxml, djvuxmlparser \- DjVuLibre XML Tools.

.SH SYNOPSIS
.BI "djvutoxml [" options "] " inputdjvufile " [" outputxmlfile "]"
.br
.BI "djvuxmlparser [ -o " djvufile " ] " inputxmlfile

.SH DESCRIPTION
The DjVuLibre XML Tools provide for editing the
metadata, hyperlinks, and hidden text
associated with DjVu files. Unlike
.BR djvused (1),
the DjVuLibre XML Tools rely on XML
and can take advantage of XML editors and validators.

.SH DJVUTOXML
Program
.B djvutoxml
creates an XML file
.I outputxmlfile
containing a reference to the original DjVu document
.I inputdjvufile
as well as tags describing the metadata,
hyperlinks, and hidden text associated with the DjVu file.

The following options are supported:
.TP
.BI "--page " "pagenum"
Select a page in a multi-page document.
Without this option,
.B djvutoxml
outputs the XML
corresponding to all pages of the document.
.TP
.BI "--with-text"
Specifies that the
.B HIDDENTEXT
element for each page should be included in the output.
If specified without the
.B --with-anno
flag, then the
.B --without-anno
flag is implied. If none of the
.BR --with-text ,
.BR --without-text ,
.BR --with-anno ,
or
.B --without-anno
flags is specified, then the
.B --with-text
and
.B --with-anno
flags are implied.
.TP
.BI "--without-text"
Specifies that the
.B HIDDENTEXT
element for each page should not be included in the output.
If specified without the
.B --without-anno
flag, then the
.B --with-anno
flag is implied.
.TP
.BI "--with-anno"
Specifies that the area
.B MAP
element for each page should be included in the output.
If specified without the
.B --with-text
flag, then the
.B --without-text
flag is implied.
.TP
.BI "--without-anno"
Specifies that the area
.B MAP
element for each page should not be included in the output.
If specified without the
.B --without-text
flag, then the
.B --with-text
flag is implied.

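.PP
The implication rules for the four text and annotation flags
can be sketched as a small Python function.
This is only an illustration of the rules as documented above, not code taken from
.BR djvutoxml ;
the function name and the rejection of unknown or contradictory flags
are assumptions of the sketch.
.PP
.nf
```python
KNOWN_FLAGS = {"--with-text", "--without-text", "--with-anno", "--without-anno"}


def resolve_outputs(flags):
    """Return which per-page elements the given flags select.

    Rejecting unknown or contradictory flags is an assumption of this
    sketch, not documented behavior of djvutoxml.
    """
    flags = set(flags)
    unknown = flags - KNOWN_FLAGS
    if unknown:
        raise ValueError("unknown flags: " + ", ".join(sorted(unknown)))

    def explicit(kind):
        # True or False when stated on the command line, None otherwise.
        on = ("--with-" + kind) in flags
        off = ("--without-" + kind) in flags
        if on and off:
            raise ValueError("contradictory flags for " + kind)
        return True if on else False if off else None

    text, anno = explicit("text"), explicit("anno")
    if text is None and anno is None:
        # No flag at all: both elements are output.
        return {"HIDDENTEXT": True, "MAP": True}
    # A flag given for one element implies the opposite for the other.
    if text is None:
        text = not anno
    if anno is None:
        anno = not text
    return {"HIDDENTEXT": text, "MAP": anno}
```
.fi
.PP
Under these rules, passing only
.B --with-text
selects the
.B HIDDENTEXT
element and drops the
.B MAP
element, while passing both
.B --with-text
and
.B --with-anno
is equivalent to passing no flag.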
.SH DJVUXMLPARSER
Files produced by
.B djvutoxml
can then be modified using
either a text editor or an XML editor.
Program
.B djvuxmlparser
parses the XML file
.I inputxmlfile
in order to modify the metadata of the corresponding DjVu file.
.TP
.BI "-o " "djvufile"
By default, the target DjVu file is the file
referenced by the
.B OBJECT
element of the XML file.
This option overrides the filename
specified in the
.B OBJECT
element.

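.PP
A typical round trip exports the XML, edits it, and feeds it back.
The following Python sketch only builds the two command lines
from the synopsis above and prints them.
The helper names and the file names
.IR book.djvu ,
.IR book.xml ,
and
.I book-edited.djvu
are hypothetical.
.PP
.nf
```python
import shlex


def djvutoxml_argv(inputdjvufile, outputxmlfile=None, page=None, flags=()):
    """Build: djvutoxml [options] inputdjvufile [outputxmlfile]."""
    argv = ["djvutoxml"]
    if page is not None:
        argv += ["--page", str(page)]
    argv += list(flags)
    argv.append(inputdjvufile)
    if outputxmlfile is not None:
        argv.append(outputxmlfile)
    return argv


def djvuxmlparser_argv(inputxmlfile, djvufile=None):
    """Build: djvuxmlparser [ -o djvufile ] inputxmlfile."""
    argv = ["djvuxmlparser"]
    if djvufile is not None:
        # Override the file named by the OBJECT element.
        argv += ["-o", djvufile]
    argv.append(inputxmlfile)
    return argv


if __name__ == "__main__":
    # Hypothetical file names; the commands are printed, not executed.
    export = djvutoxml_argv("book.djvu", "book.xml", flags=["--with-text"])
    update = djvuxmlparser_argv("book.xml", djvufile="book-edited.djvu")
    print(shlex.join(export))
    print(shlex.join(update))
```
.fi
.PP
Each list can be handed to
.B subprocess.run
once the tools are installed.
Building the argument list separately keeps the sketch usable on systems
where DjVuLibre is not available.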
.SH DJVUXML DOCUMENT TYPE DEFINITION
The document type definition file (DTD)
.IP "" 2
.B DATADIR/djvu/pubtext/DjVuXML-s.dtd
.PP
defines the input and output of the DjVu XML tools.

The DjVuXML-s DTD is a simplification of the HTML DTD:
.IP "" 2
.B \%http://www.w3.org/TR/1998/REC-html40-19980424/sgml/dtd.html
.PP
with a few new attributes added specific to DjVu. Each of the
specified pages of a DjVu document is represented as an
.B OBJECT
element within the
.B BODY
element of the XML file.
Each
.B OBJECT
element may contain multiple
.B PARAM
elements to specify attributes like page name,
resolution,
and gamma factor.
Each
.B OBJECT
element may also contain one
.B HIDDENTEXT
element to specify the hidden text (usually generated with an OCR engine)
within the DjVu page. In addition, each
.B OBJECT
element may reference a single area
.B MAP
element which contains multiple
.B AREA
elements to represent all the hyperlink and highlight areas within
the DjVu page.

.SS PARAM Elements
Legal
.B PARAM
elements of a DjVu
.B OBJECT
include but are not limited to
.B PAGE
for specifying the page name,
.B GAMMA
for specifying the gamma correction factor (normally 2.2), and
.B DPI
for specifying the page resolution.

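.PP
As an illustration, the
.B PARAM
elements of each page can be collected with the Python standard library.
The sample document below is hand-written and abbreviated, not real
.B djvutoxml
output, and the function name is hypothetical;
the sketch assumes each
.B PARAM
carries
.B name
and
.B value
attributes as in HTML.
.PP
.nf
```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated sample; real files follow the DjVuXML-s DTD.
SAMPLE = """<DjVuXML>
<BODY>
<OBJECT data="file:book.djvu">
<PARAM name="PAGE" value="p0001.djvu"/>
<PARAM name="GAMMA" value="2.2"/>
<PARAM name="DPI" value="300"/>
</OBJECT>
</BODY>
</DjVuXML>"""


def page_params(xml_text):
    """Return one name-to-value dictionary per OBJECT element."""
    root = ET.fromstring(xml_text)
    pages = []
    for obj in root.iter("OBJECT"):
        params = {p.get("name"): p.get("value") for p in obj.findall("PARAM")}
        pages.append(params)
    return pages


if __name__ == "__main__":
    for number, params in enumerate(page_params(SAMPLE), start=1):
        print(number, params)
```
.fi
.PP
Values are returned as strings; converting
.B DPI
or
.B GAMMA
to numbers is left to the caller.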
.SS HIDDENTEXT Elements
The
.B HIDDENTEXT
element consists of nested
.BR PAGECOLUMNS ,
.BR REGION ,
.BR PARAGRAPH ,
.BR LINE ,
and
.B WORD
elements.
The most deeply nested element specified should carry the bounding
coordinates of the element in top-down orientation. The body of the
most deeply nested element should contain the text. Most DjVu
documents use either
.B LINE
or
.B WORD
as the lowest level element, but any element is legal as the lowest
level element. A white space is always added between
.B WORD
elements and a line feed is always added between
.B LINE
elements. Since languages such as Japanese do not use spaces between
words, it is quite common for Asian OCR engines to use
.B WORD
elements for individual characters instead.

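.PP
The spacing rule described above, a space between
.B WORD
elements and a line feed between
.B LINE
elements, can be sketched in Python as follows.
The fragment is hand-written and starts at the
.B PARAGRAPH
level for brevity; the
.B coords
attribute name, its values, and the function name are assumptions
of the sketch.
.PP
.nf
```python
import xml.etree.ElementTree as ET

LINE_FEED = chr(10)

# Hand-written fragment; coordinate values are illustrative only.
SAMPLE = """<PARAGRAPH>
<LINE>
<WORD coords="100,220,180,200">Hello</WORD>
<WORD coords="190,220,270,200">world</WORD>
</LINE>
<LINE>
<WORD coords="100,250,160,230">again</WORD>
</LINE>
</PARAGRAPH>"""


def hidden_text(element):
    """Join WORD bodies with spaces and LINE elements with line feeds."""
    lines = []
    for line in element.iter("LINE"):
        words = [w.text.strip() for w in line.iter("WORD")
                 if w.text and w.text.strip()]
        if words:
            lines.append(" ".join(words))
        elif line.text and line.text.strip():
            # LINE is itself the lowest level element and holds the text.
            lines.append(line.text.strip())
    return LINE_FEED.join(lines)


if __name__ == "__main__":
    print(hidden_text(ET.fromstring(SAMPLE)))
```
.fi
.PP
Because a space is always inserted between
.B WORD
elements, text from engines that store one character per
.B WORD
comes out with a space between characters under this rule.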
.SS MAP Elements
The body of a
.B MAP
element consists of
.B AREA
elements. In addition to the attributes listed in
.IP "" 2
.BR \%http://www.w3.org/TR/1998/REC-html40-19980424/struct/objects.html#edef-AREA ,
.PP
the attributes
.BR bordertype ,
.BR bordercolor ,
.BR border ,
and
.B highlight
have been added to specify border type, border color, border width, and
highlight color, respectively. Legal values for each of these attributes
are listed in the DjVuXML-s DTD.
In addition, the shape
.B oval
has been added to the legal list of shapes. An oval uses a rectangular
bounding box.

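.PP
To see which attributes of an
.B AREA
element are DjVu additions, a script can separate the four attributes
named above from the standard HTML ones.
The sample below is hand-written; its attribute values are illustrative
and have not been checked against the DjVuXML-s DTD,
and the function name is hypothetical.
.PP
.nf
```python
import xml.etree.ElementTree as ET

# The four attributes this manual page lists as DjVu additions.
DJVU_AREA_ATTRIBUTES = ("bordertype", "bordercolor", "border", "highlight")

# Hand-written sample; consult the DTD for the legal attribute values.
SAMPLE = """<MAP name="p0001.djvu">
<AREA shape="oval" coords="100,100,300,200" href="http://www.example.org/"
      bordertype="solid" bordercolor="#FF0000" border="1" highlight="#FFFF00"/>
<AREA shape="rect" coords="10,10,50,30" href="http://www.example.org/next"/>
</MAP>"""


def split_area_attributes(area):
    """Return (html_attributes, djvu_attributes) for one AREA element."""
    djvu = {k: v for k, v in area.attrib.items()
            if k in DJVU_AREA_ATTRIBUTES}
    html = {k: v for k, v in area.attrib.items()
            if k not in DJVU_AREA_ATTRIBUTES}
    return html, djvu


if __name__ == "__main__":
    for area in ET.fromstring(SAMPLE).findall("AREA"):
        html, djvu = split_area_attributes(area)
        print(html["shape"], sorted(djvu))
```
.fi
.PP
An
.B AREA
without any of the four attributes yields an empty second dictionary.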
.SH BUGS
Perhaps it would have been better to use CSS2 style sheets
with standard HTML elements instead of defining the
.B HIDDENTEXT
element.

.SH CREDITS
The DjVu XML tools and DTD were written
by Bill C. Riemers <docbill@sourceforge.net>
and Fred Crary.

.SH SEE ALSO
.BR djvu (1),
.BR djvused (1),
and
.BR utf8 (7).