0- INTRODUCTION.

This document describes actual and proposed changes to the DjVu
format since the release of the DjVu3 specification by Lizardtech in
November 2005.



1- AMBIGUOUS PARTS IN THE SPECIFICATION

1.1- BACKGROUND REDUCTION RATIO

The relation between the size (W,H) of the foreground image and the
size (w,h) of the background image is described on page 33 of the
specification. The requirement ceil(W/w)=ceil(H/h) does not
completely capture the fact that the same reduction factor should be
applied to both dimensions. The requirement should be read as:

  <<There should exist an integer k such
    that w=ceil(W/k) and h=ceil(H/k). Should there be several
    such integers, the smallest one shall be considered
    to rescale the background image to the foreground resolution.>>
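
The corrected requirement can be checked mechanically. The following
Python sketch searches for the smallest integer k satisfying both
equations. The function names and the search bound max(W, H) are our
own assumptions, not taken from the specification; the bound suffices
because for any k >= max(W, H) both ceilings are equal to 1.

```python
from typing import Optional


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def background_reduction(W: int, H: int, w: int, h: int) -> Optional[int]:
    """Return the smallest integer k with w == ceil(W/k) and h == ceil(H/k).

    Returns None when no single k fits both dimensions. The search bound
    max(W, H) is an assumption of this sketch: beyond it both ceilings
    are always 1, so no new solutions can appear.
    """
    for k in range(1, max(W, H) + 1):
        if ceil_div(W, k) == w and ceil_div(H, k) == h:
            return k
    return None


# A 2550x3300 foreground with an 850x1100 background: k = 3.
print(background_reduction(2550, 3300, 850, 1100))  # 3
# ceil(10/4) == ceil(12/5) == 3, yet no single k yields (4, 5).
print(background_reduction(10, 12, 4, 5))           # None
# k = 4, 5 and 6 all yield (2, 2); the smallest one is used.
print(background_reduction(7, 7, 2, 2))             # 4
```

The second call shows why the original wording is too weak: the two
ceilings agree, but no common reduction factor exists.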

1.2- ENCODING OF URLS IN ANNOTATIONS

Page 16 does not specify that URLs in annotations should be percent-encoded.
However, consistent with the encoding of strings in the annotation chunk,
non-ASCII UTF-8 characters in URL strings are interpreted as if
they were percent-encoded.
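
To illustrate this equivalence, the following Python sketch (the
function name is hypothetical) converts a URL string containing raw
UTF-8 characters into its percent-encoded form. Under the rule above,
a viewer should treat both forms identically. ASCII characters,
including existing %XX escapes, are left untouched.

```python
def percent_encode_non_ascii(url: str) -> str:
    """Percent-encode the UTF-8 bytes of every non-ASCII character in url."""
    parts = []
    for ch in url:
        if ord(ch) < 0x80:
            parts.append(ch)  # ASCII is kept as-is
        else:
            # One %XX escape per UTF-8 byte of the character.
            parts.append("".join("%%%02X" % b for b in ch.encode("utf-8")))
    return "".join(parts)


print(percent_encode_non_ascii("http://example.com/caf\u00e9.djvu"))
# http://example.com/caf%C3%A9.djvu
```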

1.3- FLAGS IN DJVUINFO CHUNKS

Page 24 specifies that {1,6,2,5} are the only four allowed
values of the "flags" field in the INFO chunk.
To maximize compatibility with earlier versions of the standard,
values other than {1,6,2,5} should be ignored
and interpreted as 1 (right-side up orientation).
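
A reader can apply this recommendation with a one-line check. In this
Python sketch the function receives the already extracted value of the
"flags" field; how that value is read out of the INFO chunk is not
shown, and the names are hypothetical.

```python
ALLOWED_FLAGS = frozenset({1, 6, 2, 5})
RIGHT_SIDE_UP = 1


def effective_orientation_flags(flags: int) -> int:
    """Return flags if it is one of the four allowed values, else 1.

    Any other value is ignored and treated as right-side up, as
    recommended above for compatibility with older files.
    """
    return flags if flags in ALLOWED_FLAGS else RIGHT_SIDE_UP
```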



2- ESCAPE SEQUENCES IN ANNOTATION CHUNK STRINGS.

The treatment of escape sequences in annotation chunk strings has
historically been slightly different in Lizardtech DjVu and
DjVuLibre. We expect that the DjVuLibre solution will
eventually become the standard.

Lizardtech DjVu uses the "old" rule described in section 8.3.4.2 of
the specification. The sequence of characters BACKSLASH DOUBLEQUOTE
represents a DOUBLEQUOTE character without terminating the string.
There are no other escape sequences. All other UTF-8 characters are
written directly. The main drawback of this approach is the inability
to write a string containing the sequence of characters BACKSLASH
DOUBLEQUOTE since there is no way to escape the first BACKSLASH
character.
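
The old rule amounts to the following string reader, sketched in
Python over raw bytes. The function name is hypothetical, and the
surrounding annotation parser that locates the opening DOUBLEQUOTE is
not shown.

```python
from typing import Tuple


def read_string_lizardtech(data: bytes, start: int) -> Tuple[bytes, int]:
    """Read an annotation string whose opening DOUBLEQUOTE is at data[start].

    BACKSLASH DOUBLEQUOTE yields a DOUBLEQUOTE without ending the string;
    every other byte is copied verbatim. Returns the decoded bytes and
    the index just past the closing DOUBLEQUOTE.
    """
    if data[start:start + 1] != b'"':
        raise ValueError("expected an opening doublequote")
    out = bytearray()
    i = start + 1
    while i < len(data):
        if data[i:i + 2] == b'\\"':   # escaped DOUBLEQUOTE
            out += b'"'
            i += 2
        elif data[i:i + 1] == b'"':   # unescaped DOUBLEQUOTE ends the string
            return bytes(out), i + 1
        else:                         # any other byte, even a lone BACKSLASH
            out.append(data[i])
            i += 1
    raise ValueError("unterminated string")


text, end = read_string_lizardtech(b'"say \\"hi\\" C:\\dir" rest', 0)
print(text, end)  # b'say "hi" C:\\dir' 19
```

Note that the BACKSLASH in "C:\dir" passes through unchanged, since it
is not followed by a DOUBLEQUOTE.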

DjVuLibre introduced a more flexible scheme a few years ago.
Annotation strings are similar to strings in the C language.
Character sequences starting with a backslash have special meaning.
A BACKSLASH followed by "a", "b", "t", "n", "v", "f", "r", BACKSLASH,
or DOUBLEQUOTE stands for the ASCII character BEL, BS, HT, LF, VT, FF,
CR, BACKSLASH, or DOUBLEQUOTE respectively. A BACKSLASH followed by one
to three octal digits stands for the byte whose octal code is expressed
by the digits. All other backslash sequences are illegal. Non-printable
ASCII characters must be escaped. Multibyte characters should either
be entered directly, or represented using octal sequences.
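
A decoder for the DjVuLibre rule can be sketched in Python as follows.
It operates on the raw bytes found between the delimiting doublequotes
and returns None when it meets an illegal sequence. The names are
hypothetical, and rejecting octal values above \377 (which do not fit
in a byte) is an assumption of this sketch, since the rule above does
not say how such values are handled.

```python
from typing import Optional

# Character following a BACKSLASH -> byte it stands for.
SIMPLE_ESCAPES = {
    ord("a"): 0x07,        # BEL
    ord("b"): 0x08,        # BS
    ord("t"): 0x09,        # HT
    ord("n"): 0x0A,        # LF
    ord("v"): 0x0B,        # VT
    ord("f"): 0x0C,        # FF
    ord("r"): 0x0D,        # CR
    ord("\\"): ord("\\"),  # BACKSLASH
    ord('"'): ord('"'),    # DOUBLEQUOTE
}
OCTAL_DIGITS = b"01234567"


def decode_djvulibre(body: bytes) -> Optional[bytes]:
    """Decode an annotation string body; None if an escape is illegal."""
    out = bytearray()
    i, n = 0, len(body)
    while i < n:
        c = body[i]
        if c != ord("\\"):
            out.append(c)              # ordinary byte, copied as-is
            i += 1
            continue
        if i + 1 == n:
            return None                # lone BACKSLASH at the end
        nxt = body[i + 1]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        elif nxt in OCTAL_DIGITS:
            j = i + 1
            while j < n and j - i <= 3 and body[j] in OCTAL_DIGITS:
                j += 1                 # consume at most three octal digits
            value = int(body[i + 1:j], 8)
            if value > 0xFF:
                return None            # assumption: codes above \377 rejected
            out.append(value)
            i = j
        else:
            return None                # any other backslash sequence
    return bytes(out)


# "\303\251" is the UTF-8 encoding of U+00E9 written as octal escapes.
print(decode_djvulibre(b"caf\\303\\251").decode("utf-8"))  # café
print(decode_djvulibre(b"C:\\dir"))                        # None
```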

DjVuLibre minimizes compatibility problems by searching for illegal
escape sequences in the annotation chunk. If any illegal sequence
is found, the Lizardtech rule is used instead of the DjVuLibre rule.
It is expected that Lizardtech will at some point adopt the improved
DjVuLibre rules. We will then be able to state that all DjVu files
with a version number greater than some constant use the new convention.
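
The compatibility heuristic can be sketched as a single scan over the
raw annotation chunk. This Python sketch uses a hypothetical function
name and simply assumes that backslashes only occur inside strings. It
does not repeat the octal range check a full decoder would perform.

```python
import re

# One legal DjVuLibre escape: BACKSLASH followed by one of the letters
# above, BACKSLASH, DOUBLEQUOTE, or one to three octal digits.
LEGAL_ESCAPE = re.compile(rb'\\(?:[abtnvfr\\"]|[0-7]{1,3})')


def choose_escape_rule(chunk: bytes) -> str:
    """Return "djvulibre" or "lizardtech" for a whole annotation chunk."""
    # Remove legal escapes left to right, so an escaped BACKSLASH is
    # consumed as one unit and cannot mask a following sequence.
    leftover = LEGAL_ESCAPE.sub(b"", chunk)
    # Any BACKSLASH still present starts an illegal sequence.
    return "lizardtech" if b"\\" in leftover else "djvulibre"


print(choose_escape_rule(b'(metadata (Title "A\\tB"))'))           # djvulibre
print(choose_escape_rule(b'(maparea "C:\\dir" "" (rect 0 0 9 9))'))  # lizardtech
```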



3- PAGE TITLES.

Each page in a DjVu document is identified by three strings named
the ID, the NAME, and the TITLE. The semantic distinction between
these three strings is no longer very clear, and DjVu libraries
do not work consistently when these strings differ. See
the comment in the table in section 8.3.2.2 of the specification.

Recent versions of DjVuLibre still require the ID and NAME
strings to be equal. The TITLE string, however, can be different and
should be used to display friendly page names. The djvused program
now features a command 'set-page-title' to install a TITLE different
from the ID string. The djview program then displays and recognizes
these page titles in lieu of the sequential page numbers.

Things get complicated when some page titles are purely numerical.
For instance, the page titled "4" might not be the fourth page of
the document, and there might be many pages titled "4".
The djview4 viewer uses the following rules:

3.1- DJVU CGI ARGUMENT "PAGE=".

The DjVu CGI argument "page=X" is interpreted as follows:
  - It first searches for a page whose ID matches X.
  - Otherwise, if X has the form +N or -N where N is a number,
    this indicates a displacement relative to the current page.
  - Otherwise, it searches for a page with TITLE X, starting
    from the current page and wrapping around.
  - Otherwise, if X is numerical and in range, this is the page number.
  - Otherwise, it searches for a page whose NAME matches X.

3.2- PAGE REFERENCES IN MAPAREA AND OUTLINE LINKS

Page references in hyperlink annotations always have the form "#X".
These references are interpreted as follows:
  - It first searches for a page whose ID matches X.
  - Otherwise, if X has the form +N or -N where N is a number,
    this indicates a displacement relative to the page containing the link.
  - Otherwise, it searches for a page with TITLE X, starting
    from the current page and wrapping around.
  - Otherwise, if X is numerical and in range, this is the page number.
  - Otherwise, it searches for a page whose NAME matches X.
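
Since the two rule lists in sections 3.1 and 3.2 differ mainly in the
page from which displacements are counted, a single resolver can cover
both. The following Python sketch is not djview4's actual code: the
Page record and the function names are hypothetical, and three details
the rules leave open are assumptions here. Out-of-range displacements
are clamped to the first or last page, the title search starts at (and
includes) the reference page, and page numbers are 1-based.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

DISPLACEMENT = re.compile(r"[+-][0-9]+")
NUMBER = re.compile(r"[0-9]+")


@dataclass
class Page:
    id: str
    name: str
    title: str


def resolve_page(pages: List[Page], x: str, origin: int) -> Optional[int]:
    """Return the 0-based index of the page designated by x, or None.

    origin is the current page for "page=X", or the page containing
    the link for "#X" references.
    """
    n = len(pages)
    if n == 0:
        return None
    # 1. A page whose ID matches x.
    for i, page in enumerate(pages):
        if page.id == x:
            return i
    # 2. +N or -N: displacement from origin (clamping is an assumption).
    if DISPLACEMENT.fullmatch(x):
        return min(max(origin + int(x), 0), n - 1)
    # 3. A page whose TITLE is x, scanning from origin and wrapping around.
    for k in range(n):
        i = (origin + k) % n
        if pages[i].title == x:
            return i
    # 4. A numerical x in range is a page number (assumed 1-based).
    if NUMBER.fullmatch(x) and 1 <= int(x) <= n:
        return int(x) - 1
    # 5. A page whose NAME matches x.
    for i, page in enumerate(pages):
        if page.name == x:
            return i
    return None


def resolve_link(pages: List[Page], link: str, link_page: int) -> Optional[int]:
    """Resolve a "#X" hyperlink found on page link_page."""
    if not link.startswith("#"):
        return None  # not a page reference (for instance a URL)
    return resolve_page(pages, link[1:], link_page)


# Two front-matter pages titled "i" and "ii", then pages titled "1".."4".
titles = ["i", "ii", "1", "2", "3", "4"]
pages = [Page(f"p{k:04d}.djvu", f"p{k:04d}.djvu", t)
         for k, t in enumerate(titles, 1)]
print(resolve_page(pages, "1", origin=0))       # 2: the page titled "1"
print(resolve_page(pages, "6", origin=0))       # 5: no such title, page number 6
print(resolve_link(pages, "#+2", link_page=1))  # 3
```

The first call illustrates the ambiguity discussed above: "1" selects
the page titled "1", which here is the third page of the document.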



4- METADATA

4.1- DJVULIBRE METADATA

DjVuLibre introduced metadata annotations a few years ago.
Metadata entries for each page are represented by key/value pairs
located in a metadata directive in the annotation chunk.
Metadata entries for the document are represented similarly
using the methods described in section 4.3.

The metadata directive has the form

    (metadata ... (key "value") ... )

Each entry is identified by a symbol <key>
representing the nature of the metadata entry.
The string <"value"> represents
the value associated with the corresponding key.

Several sets of keys are noteworthy.

  * Keys borrowed from the BibTeX bibliography system.
    These key names are always expressed
    in lowercase, such as 'year', 'booktitle', 'editor',
    'author', etc.

  * Keys borrowed from the PDF DocInfo.
    These key names start with an uppercase letter:
    'Title', 'Author', 'Subject', 'Keywords', 'Creator',
    'Producer', 'Trapped', 'CreationDate', and 'ModDate'.
    The values associated with the last two keys
    should be dates expressed according to RFC 3339.
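
As an illustration, the following Python sketch builds a metadata
directive from a dictionary, quoting values with the DjVuLibre escape
rule of section 2. The helper names are hypothetical, keys are assumed
to be valid annotation symbols, and the example mixes BibTeX-style and
DocInfo-style keys with an RFC 3339 creation date.

```python
from datetime import datetime, timezone
from typing import Dict


def quote_annotation_string(value: str) -> str:
    """Quote value following the DjVuLibre escape rule of section 2."""
    out = bytearray(b'"')
    for byte in value.encode("utf-8"):
        if byte in (0x22, 0x5C):            # DOUBLEQUOTE or BACKSLASH
            out += b"\\" + bytes([byte])
        elif byte < 0x20 or byte == 0x7F:   # non-printable ASCII -> octal
            out += b"\\%03o" % byte
        else:                               # printable ASCII, UTF-8 bytes
            out.append(byte)
    out += b'"'
    return out.decode("utf-8")


def metadata_directive(entries: Dict[str, str]) -> str:
    """Build a (metadata ...) directive from key/value pairs."""
    body = " ".join(f"({key} {quote_annotation_string(value)})"
                    for key, value in entries.items())
    return f"(metadata {body})"


created = datetime(2009, 1, 2, 3, 4, 5, tzinfo=timezone.utc).isoformat()
print(metadata_directive({
    "author": "Knuth, Donald E.",   # BibTeX-style key
    "year": "1984",                 # BibTeX-style key
    "Title": 'The "TeXbook"',       # PDF DocInfo key
    "CreationDate": created,        # RFC 3339 date
}))
# (metadata (author "Knuth, Donald E.") (year "1984") (Title "The \"TeXbook\"") (CreationDate "2009-01-02T03:04:05+00:00"))
```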

4.2- XMP METADATA

The XMP specification describes a general-purpose RDF/XML format for
metadata. Just like DjVuLibre metadata, XMP metadata is embedded in
an annotation chunk at the page or document level using the following
annotation directive:

    (xmp "<rdf:RDF xmlns:rdf=... [escaped XMP here] ...</rdf:RDF>")

The sole argument of the xmp directive is the serialized XMP data
without the "xpacket" wrapper. The "x:xmpmeta" element may also be
dropped. Only elements from "rdf:RDF" inwards are needed.
Since the XMP data is represented as a string, doublequotes and
backslashes must be escaped. Other characters may be escaped as well
(see section 2 above).
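
The following Python sketch shows one way to prepare such a directive
from a complete XMP packet. It locates the rdf:RDF element by plain
string search, which assumes the conventional "rdf" namespace prefix;
a robust tool would use an XML parser instead. The function name is
hypothetical, and only the two mandatory characters are escaped.

```python
def xmp_directive(xmp_packet: str) -> str:
    """Wrap serialized XMP in an (xmp "...") annotation directive.

    Keeps only the rdf:RDF element, dropping the xpacket wrapper and
    the x:xmpmeta element, then escapes BACKSLASH and DOUBLEQUOTE.
    """
    close = "</rdf:RDF>"
    start = xmp_packet.find("<rdf:RDF")
    end = xmp_packet.find(close)
    if start < 0 or end < start:
        raise ValueError("no rdf:RDF element found")
    rdf = xmp_packet[start:end + len(close)]
    # Escape backslashes first so the added quote escapes stay intact.
    escaped = rdf.replace("\\", "\\\\").replace('"', '\\"')
    return f'(xmp "{escaped}")'


packet = (
    '<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<rdf:Description rdf:about=""/>'
    '</rdf:RDF></x:xmpmeta><?xpacket end="w"?>'
)
print(xmp_directive(packet))
```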

The full XMP specification is available from Adobe:

    http://www.adobe.com/devnet/xmp/

To maximize interoperability with current viewers, it is recommended
that XMP manipulation programs keep the DjVuLibre metadata in sync.
This is facilitated by synchronizing the PDF DocInfo keys with XMP
properties as follows:

    DocInfo key     XMP property
    ------------    ---------------
    Title           dc:title
    Author          dc:creator
    Subject         dc:description
    Keywords        pdf:Keywords
    Producer        pdf:Producer
    Trapped         pdf:Trapped
    Creator         xmp:CreatorTool
    CreationDate    xmp:CreateDate
    ModDate         xmp:ModifyDate
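
A synchronization tool could encode this table as a pair of
dictionaries. The following Python sketch only renames keys and keeps
values as flat strings. This is a simplification: in XMP, dc:title and
dc:description are language-alternative arrays and dc:creator is an
ordered array, so a real tool must wrap and unwrap values accordingly.
The function names are hypothetical.

```python
from typing import Dict

# DocInfo key -> XMP property, transcribed from the table above.
DOCINFO_TO_XMP = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Subject": "dc:description",
    "Keywords": "pdf:Keywords",
    "Producer": "pdf:Producer",
    "Trapped": "pdf:Trapped",
    "Creator": "xmp:CreatorTool",
    "CreationDate": "xmp:CreateDate",
    "ModDate": "xmp:ModifyDate",
}
XMP_TO_DOCINFO = {prop: key for key, prop in DOCINFO_TO_XMP.items()}


def docinfo_to_xmp(metadata: Dict[str, str]) -> Dict[str, str]:
    """Rename DocInfo entries to XMP properties; other keys are skipped."""
    return {DOCINFO_TO_XMP[k]: v for k, v in metadata.items()
            if k in DOCINFO_TO_XMP}


def xmp_to_docinfo(properties: Dict[str, str]) -> Dict[str, str]:
    """Rename known XMP properties back to DocInfo keys."""
    return {XMP_TO_DOCINFO[p]: v for p, v in properties.items()
            if p in XMP_TO_DOCINFO}
```

BibTeX-style keys such as 'year' have no entry in the table and are
therefore left out of the XMP side.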

4.3- DOCUMENT ANNOTATIONS AND METADATA

The above schemes provide ways to specify metadata for each page.
But it is often useful to provide metadata that applies to the whole
document. Document-wide metadata are represented using one or
several metadata directives in the shared annotations chunk.

This scheme has a potential drawback. Since the shared annotations
chunk is included by all pages, the document-wide metadata also appear
as page metadata for all pages. This might not be adequate for some
uses. As a workaround, the djview4 viewer only displays
page metadata that differ from the document metadata.
A more definitive answer would be the definition of a document
annotation chunk located after the DIRM chunk and before any
component file. This space is already used by the NAVM chunk.
This is being considered.
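
The djview4 workaround amounts to a simple filter. This Python sketch
assumes that the metadata reported for a page already includes the
entries inherited from the shared annotations chunk; the function name
is hypothetical.

```python
from typing import Dict


def visible_page_metadata(page: Dict[str, str],
                          document: Dict[str, str]) -> Dict[str, str]:
    """Keep page entries whose value differs from the document entry
    with the same key, or that have no document counterpart."""
    return {key: value for key, value in page.items()
            if document.get(key) != value}


document = {"Title": "Collected Papers", "year": "1990"}
# Metadata seen by one page: the shared entries plus a page-specific one.
page = {"Title": "Collected Papers", "year": "1990", "pages": "45-67"}
print(visible_page_metadata(page, document))  # {'pages': '45-67'}
```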



5- CGI STYLE OPTIONS IN MAPAREA AND OUTLINE LINKS

The link targets of outline and maparea annotations are UTF-8 encoded
strings that can be interpreted as page specifications (see section 3.2)
or as percent-encoded URLs (see section 1.2).

In addition, strings starting with a question mark '?' are
interpreted as CGI style options separated by the
ampersand character '&'. These options are ignored
when the maparea link target is another window.
Otherwise these options are passed verbatim to the viewer.
This can cause portability problems because different DjVu
viewers support different sets of CGI style options.
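
A viewer might extract such options as follows. This Python sketch
only splits the string; it assumes each option has the usual
name=value shape, and the option names in the example are merely
illustrative, since which options are honored depends on the viewer.
The function name is hypothetical.

```python
from typing import List, Tuple


def link_options(link: str, other_window: bool = False) -> List[Tuple[str, str]]:
    """Split a "?opt&opt..." link into (name, value) pairs.

    Returns an empty list when the link does not start with '?', or
    when it targets another window (the options are then ignored).
    Splitting each option at its first '=' is an assumption.
    """
    if not link.startswith("?") or other_window:
        return []
    options = []
    for item in link[1:].split("&"):
        if item:                         # tolerate empty items such as "&&"
            name, _, value = item.partition("=")
            options.append((name, value))
    return options


print(link_options("?zoom=150&mode=bw"))             # [('zoom', '150'), ('mode', 'bw')]
print(link_options("?zoom=150", other_window=True))  # []
```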



6- THE "SECURE DJVU" FORMAT.

Recently Lizardtech introduced an incompatible "Secure DJVU" format.
This format encrypts DjVu data in the hope of controlling
whether users can use, copy, or print DjVu documents.
A recent specification describes the container format
but does not provide enough information to decode the content.
Providing such information would obviously provide
a means to circumvent the usage restrictions. In fact there is no durable
way to enforce such constraints besides "security through obscurity".

The current DjVuLibre library simply emits an
error message when encountering such files.

Some observations regarding these files:

  - They use the same IFF85 structure as DjVu files.
    Chunk "SINF" contains a scrambled version of the decryption
    key and describes which actions are authorized or denied.
    Chunks "CELX" encapsulate the regular DjVu chunks.

  - Each CELX chunk starts with four bytes for the original chunk
    name, and four bytes for the original chunk length.
    This is followed by encrypted data, composed of enough
    8-byte blocks to cover the original chunk length.

  - Lizardtech claims the encryption is 32-bit Blowfish.
    The data in fact appears to be composed of 64-bit blocks,
    as you would expect with Blowfish. (Blowfish always operates on
    64-bit blocks; its key length varies from 32 to 448 bits, so the
    32-bit figure presumably refers to the key length.)
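
The CELX layout observed above can be split without decrypting
anything. This Python sketch (the function name is hypothetical)
assumes that payload holds the CELX chunk data with its IFF chunk
header already stripped, and that the four-byte length is big-endian
like other IFF85 integers; neither point is confirmed by the
observations above. It makes no attempt at decryption.

```python
import struct
from typing import Tuple


def split_celx_payload(payload: bytes) -> Tuple[str, int, bytes]:
    """Split CELX chunk data into (original name, original length,
    encrypted bytes). Big-endian length is an assumption."""
    if len(payload) < 8:
        raise ValueError("CELX payload shorter than its 8-byte header")
    name = payload[:4].decode("ascii")
    (length,) = struct.unpack(">I", payload[4:8])
    padded = -(-length // 8) * 8        # round up to whole 8-byte blocks
    encrypted = payload[8:8 + padded]
    if len(encrypted) != padded:
        raise ValueError("encrypted data is truncated")
    return name, length, encrypted


# A 10-byte INFO chunk needs two 8-byte blocks, i.e. 16 encrypted bytes.
demo = b"INFO" + struct.pack(">I", 10) + bytes(16)
name, length, encrypted = split_celx_payload(demo)
print(name, length, len(encrypted))  # INFO 10 16
```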