0- INTRODUCTION.

This document describes actual and proposed changes to the DjVu
format since the release of the DjVu3 specification by Lizardtech in
November 2005.



1- AMBIGUOUS PARTS IN THE SPECIFICATION

1.1- BACKGROUND REDUCTION RATIO

The relation between the size (W,H) of the foreground image and the
size (w,h) of the background image is described on page 33 of the
specification.  The requirement ceil(W/w)=ceil(H/h) does not
completely capture the fact that the same reduction factor should be
applied to both dimensions.  The requirement should be read as:

  <<There must be an integer k such
  that w=ceil(W/k) and h=ceil(H/k).  Should there be several
  such integers, the smallest one shall be considered
  to rescale the background image to the foreground resolution.>>
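
The rule above can be illustrated with a short Python sketch.  It is
not taken from any DjVu implementation.  The function name
background_reduction and the upper bound MAX_REDUCTION = 12 are
assumptions made for this example; the bound in particular should be
checked against the specification.

```python
from typing import Optional

# Assumed upper bound on the reduction factor (not stated in the text above).
MAX_REDUCTION = 12


def background_reduction(W: int, H: int, w: int, h: int) -> Optional[int]:
    """Return the smallest k with w == ceil(W/k) and h == ceil(H/k).

    Returns None when no single integer k satisfies both equalities.
    """
    for k in range(1, MAX_REDUCTION + 1):
        # (n + k - 1) // k is ceil(n / k) for positive integers.
        if w == (W + k - 1) // k and h == (H + k - 1) // k:
            return k
    return None
```

For instance, the sizes (W,H)=(10,11) and (w,h)=(4,5) satisfy
ceil(W/w)=ceil(H/h)=3, yet no single k produces both w and h, so the
sketch returns None for them.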

1.2- ENCODING OF URLS IN ANNOTATIONS

Page 16 does not specify that URLs in annotations should be percent
encoded.  However, consistent with the encoding of strings in the
annotation chunk, UTF-8 characters in URL strings will be interpreted
properly as if they were percent encoded.

1.3- FLAGS IN DJVUINFO CHUNKS

Page 24 specifies that {1,6,2,5} are the only four allowed
values of the "flags" field in the INFO chunk.
To maximize compatibility with earlier versions of the standard,
values different from {1,6,2,5} should be ignored
and interpreted as 1, the rightside up orientation.
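
A decoder could apply this fallback as in the following minimal Python
sketch.  The names VALID_FLAGS and normalize_flags are hypothetical and
only serve the illustration.

```python
# The four values allowed by page 24 of the specification.
VALID_FLAGS = frozenset({1, 6, 2, 5})

# Value 1 means rightside up orientation.
RIGHTSIDE_UP = 1


def normalize_flags(flags: int) -> int:
    """Map any value outside the allowed set to 1 (rightside up)."""
    return flags if flags in VALID_FLAGS else RIGHTSIDE_UP
```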



2- ESCAPE SEQUENCES IN ANNOTATION CHUNK STRINGS.

The treatment of escape sequences in annotation chunk strings has
historically been slightly different in Lizardtech DjVu and
DjVuLibre.  We expect that the DjVuLibre solution will
eventually become the standard.

Lizardtech DjVu uses the "old" rule described in section 8.3.4.2.
The sequence of characters BACKSLASH DOUBLEQUOTE represents a
DOUBLEQUOTE character without terminating the string.  There are no
other escape sequences. All other UTF-8 characters are written
directly. The main drawback of this approach is the inability to
write a string containing the sequence of characters BACKSLASH
DOUBLEQUOTE since there is no way to escape the first BACKSLASH
character.

DjVuLibre introduced a more flexible scheme a few years ago.
Annotation strings are similar to strings in the C language.
Character sequences starting with a backslash have special meaning.
A BACKSLASH followed by "a", "b", "t", "n", "v", "f", "r", "\",
or DOUBLEQUOTE stands respectively for the ASCII character BEL, BS,
HT, LF, VT, FF, CR, BACKSLASH, or DOUBLEQUOTE.  A BACKSLASH followed
by one to three digits stands for the byte whose octal code is
expressed by the digits.  All other backslash sequences are illegal.
Non-printable ASCII characters must be escaped. Multibyte characters
should either be entered directly, or represented using octal
sequences.

DjVuLibre minimizes the compatibility problems by searching for
illegal escape sequences in the annotation chunk.  If any illegal
sequence is found, the Lizardtech rule is used instead of the
DjVuLibre rule.  It is expected that Lizardtech will at some point
adopt the improved DjVuLibre rules. We will then be able to state
that all DjVu files with version greater than some constant use the
new convention.
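
The two rules and the fallback between them can be sketched in Python
as follows.  This is an illustration, not the DjVuLibre code.  All
function names are hypothetical, the input is assumed to be the raw
bytes found between the delimiting doublequotes, and two details are
assumptions of the sketch: a trailing lone backslash and an octal
value above 255 are both treated as illegal.

```python
from typing import List, Optional

BACKSLASH = 0x5C
OCTAL_DIGITS = b"01234567"

# Single-character escapes of the DjVuLibre rule and the byte they stand for.
SIMPLE_ESCAPES = {
    ord("a"): 0x07,   # BEL
    ord("b"): 0x08,   # BS
    ord("t"): 0x09,   # HT
    ord("n"): 0x0A,   # LF
    ord("v"): 0x0B,   # VT
    ord("f"): 0x0C,   # FF
    ord("r"): 0x0D,   # CR
    ord("\\"): 0x5C,  # BACKSLASH
    ord('"'): 0x22,   # DOUBLEQUOTE
}


def decode_new_rule(raw: bytes) -> Optional[bytes]:
    """Decode with the DjVuLibre rule; return None on an illegal sequence."""
    out = bytearray()
    i = 0
    while i < len(raw):
        byte = raw[i]
        i += 1
        if byte != BACKSLASH:
            out.append(byte)
            continue
        if i == len(raw):
            return None  # assumption: a trailing lone backslash is illegal
        if raw[i] in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[raw[i]])
            i += 1
            continue
        start = i
        while i < len(raw) and i - start < 3 and raw[i] in OCTAL_DIGITS:
            i += 1
        if i == start:
            return None  # backslash followed by an unknown character
        value = int(raw[start:i].decode("ascii"), 8)
        if value > 0xFF:
            return None  # assumption: values that do not fit a byte are illegal
        out.append(value)
    return bytes(out)


def decode_old_rule(raw: bytes) -> bytes:
    """Decode with the Lizardtech rule: only BACKSLASH DOUBLEQUOTE is special."""
    return raw.replace(b'\\"', b'"')


def decode_chunk_strings(raws: List[bytes]) -> List[bytes]:
    """Use the new rule unless any string of the chunk contains an illegal sequence."""
    decoded = [decode_new_rule(raw) for raw in raws]
    if any(item is None for item in decoded):
        return [decode_old_rule(raw) for raw in raws]
    return decoded
```

Note that the fallback is decided for the chunk as a whole: one illegal
sequence in any string causes every string of the chunk to be decoded
with the old rule.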



3- PAGE TITLES.

Each page in a DjVu document is identified by three strings named
the ID, the NAME, and the TITLE. The semantic distinction between
these three strings is no longer very clear. DjVu libraries
do not work consistently when these strings are different.  See
the comment in the table, section 8.3.2.2 of the specification.

Recent versions of DjVuLibre still require that the ID and NAME
strings are equal. The TITLE string however can be different and
should be used to display friendly page names. The djvused program
now features a command 'set-page-title' to install a TITLE different
from the ID string. The djview program then displays and recognizes
these page titles in lieu of the sequential page numbers.

Things get complicated when some page titles are purely numerical.
For instance, the page titled "4" might not be the fourth page of
the document, and there might be many pages titled "4".
The djview4 viewer uses the following rules:

3.1- DJVU CGI ARGUMENT "PAGE=".

The djvu cgi argument "page=X" is interpreted as follows:
- The viewer first searches for a page whose ID matches X.
- Otherwise, if X has the form +N or -N where N is a number,
  this indicates a displacement relative to the current page.
- Otherwise, it searches for a page with TITLE X starting
  from the current page and wrapping around.
- Otherwise, if X is numerical and in range, this is the page number.
- Otherwise, it searches for a page whose NAME matches X.

3.2- PAGE REFERENCES IN MAPAREA AND OUTLINE LINKS

Page references in hyperlink annotations always have the form "#X".
These references are interpreted as follows:
- The viewer first searches for a page whose ID matches X.
- Otherwise, if X has the form +N or -N where N is a number,
  this indicates a displacement relative to the page containing the link.
- Otherwise, it searches for a page with TITLE X starting
  from the current page and wrapping around.
- Otherwise, if X is numerical and in range, this is the page number.
- Otherwise, it searches for a page whose NAME matches X.
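
The lookup order shared by sections 3.1 and 3.2 can be sketched in
Python as follows.  This is an illustration, not the djview4 code.
The Page class and the resolve_page function are hypothetical.  The
sketch assumes that page numbers start at 1, and that a relative
displacement falling outside the document is clamped to the first or
last page; neither point is stated above.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Page:
    id: str
    name: str
    title: str


def resolve_page(pages: Sequence[Page], x: str, current: int) -> Optional[int]:
    """Return the zero-based index of the page designated by x, or None.

    `current` is the zero-based index of the current page (section 3.1)
    or of the page containing the link (section 3.2).
    """
    n = len(pages)
    if n == 0:
        return None
    # 1. Page whose ID matches X.
    for i, page in enumerate(pages):
        if page.id == x:
            return i
    # 2. Relative displacement +N or -N (clamping is an assumption).
    if re.fullmatch(r"[+-][0-9]+", x):
        return max(0, min(n - 1, current + int(x)))
    # 3. Page whose TITLE matches X, starting at the current page and wrapping.
    for offset in range(n):
        i = (current + offset) % n
        if pages[i].title == x:
            return i
    # 4. Numerical page number in range (assumed to start at 1).
    if re.fullmatch(r"[0-9]+", x) and 1 <= int(x) <= n:
        return int(x) - 1
    # 5. Page whose NAME matches X.
    for i, page in enumerate(pages):
        if page.name == x:
            return i
    return None
```

With pages titled "i", "4", "1", "4" in that order, the reference "4"
resolves to the second page when followed from the first page, and to
the fourth page when followed from the third page.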



4- METADATA

4.1- DJVULIBRE METADATA

DjVuLibre introduced metadata annotations a few years ago.
Metadata entries for each page are represented by key/value pairs
located in a metadata directive in the annotation chunk.
Metadata entries for the document are represented similarly
using the methods described in section 4.3.

The metadata directive has the form

  (metadata ... (key "value") ... )

Each entry is identified by a symbol <key>
representing the nature of the metadata entry.
The string <"value"> represents
the value associated with the corresponding key.

Several sets of keys are noteworthy.

* Keys borrowed from the BibTeX bibliography system.
  These key names are always expressed
  in lowercase, such as 'year', 'booktitle', 'editor',
  'author', etc.

* Keys borrowed from the PDF DocInfo.
  These key names start with an uppercase letter:
  'Title', 'Author', 'Subject', 'Keywords', 'Creator',
  'Producer', 'Trapped', 'CreationDate', and 'ModDate'.
  The values associated with the last two keys
  should be dates expressed according to RFC 3339.

4.2- XMP METADATA

The XMP specification describes a general purpose RDF/XML format for
metadata. Just like DjVuLibre metadata, XMP metadata is embedded in
an annotation chunk at the page or document level using the following
annotation directive:

   (xmp "<rdf:RDF xmlns:rdf=... [escaped XMP here] ...</rdf:RDF>")

The sole argument of the xmp directive is the serialized XMP data
without the "xpacket" wrapper. The "x:xmpmeta" element may also be
dropped. Only elements from "rdf:RDF" inwards are needed.
Since the XMP data is represented as a string, doublequotes and
backslashes must be escaped.  Other characters may be escaped as well
(see section 2 above).

The full XMP specification is available from Adobe:

   http://www.adobe.com/devnet/xmp/

To maximize interoperability with current viewers, it is recommended
that XMP manipulation programs keep the DjVuLibre metadata in sync.
This is facilitated by synchronizing the PDF DocInfo keys with XMP
properties as follows:

   DocInfo key    XMP property
   ------------   ---------------
   Title          dc:title
   Author         dc:creator
   Subject        dc:description
   Keywords       pdf:Keywords
   Producer       pdf:Producer
   Trapped        pdf:Trapped
   Creator        xmp:CreatorTool
   CreationDate   xmp:CreateDate
   ModDate        xmp:ModifyDate
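
A program keeping both representations in sync might hold the table
above as a simple lookup, as in this Python sketch.  The names
DOCINFO_TO_XMP and xmp_properties_from_docinfo are hypothetical.  The
sketch only renames keys; it does not serialize the result as RDF/XML.

```python
from typing import Dict

# DocInfo key -> XMP property, copied from the table above.
DOCINFO_TO_XMP: Dict[str, str] = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Subject": "dc:description",
    "Keywords": "pdf:Keywords",
    "Producer": "pdf:Producer",
    "Trapped": "pdf:Trapped",
    "Creator": "xmp:CreatorTool",
    "CreationDate": "xmp:CreateDate",
    "ModDate": "xmp:ModifyDate",
}


def xmp_properties_from_docinfo(metadata: Dict[str, str]) -> Dict[str, str]:
    """Rename the DocInfo keys of a metadata dictionary to XMP property names.

    Keys without an entry in the table (for instance BibTeX keys) are skipped.
    """
    return {
        DOCINFO_TO_XMP[key]: value
        for key, value in metadata.items()
        if key in DOCINFO_TO_XMP
    }
```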

4.3- DOCUMENT ANNOTATIONS AND METADATA

The above schemes provide ways to specify metadata for each page.
But it is often useful to provide metadata that applies to the whole
document. Document wide metadata are represented using one or
several metadata directives in the shared annotations chunk.

This scheme has a potential drawback. Since the shared annotations
chunk is included by all pages, the document wide metadata also
appears as page metadata for all pages. This might not be adequate
for some uses. As a workaround, the djview4 viewer only displays
page metadata that differ from the document metadata.
A more definitive answer would be the definition of a document
annotation chunk located after the DIRM chunk and before any
component file. This space is already used by the NAVM chunk.
This is being considered.



5- CGI STYLE OPTIONS IN MAPAREA AND OUTLINE LINKS

The link targets of outline and maparea annotations are
UTF-8 encoded strings that can be
interpreted as page specifications (see section 3.2)
or as percent-encoded URLs (see section 1.2).

In addition, strings starting with a question mark '?' are
interpreted as CGI style options separated by the
ampersand character '&'.  These options are ignored
when the maparea link target is another window.
Otherwise these options are passed verbatim to the viewer.
This can cause portability problems because different djvu
viewers support different sets of CGI style options.
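
Recognizing and splitting such a link target could look like the
following Python sketch.  The function name parse_link_options is
hypothetical.  Splitting each option at the first '=' into a key and a
value is an assumption modeled on the "page=X" argument of section
3.1, and the option "foo" in the usage line is made up.

```python
from typing import List, Optional, Tuple


def parse_link_options(target: str) -> Optional[List[Tuple[str, str]]]:
    """Split a '?'-prefixed link target into (key, value) pairs.

    Returns None when the target does not start with '?', in which case
    it is a page specification or a URL rather than a list of options.
    """
    if not target.startswith("?"):
        return None
    options = []
    for item in target[1:].split("&"):
        if not item:
            continue  # tolerate empty items such as in "?a=1&&b=2"
        key, _, value = item.partition("=")
        options.append((key, value))
    return options
```

For example, parse_link_options("?page=4&foo=bar") yields the pairs
("page", "4") and ("foo", "bar").  No decoding is applied to the keys
or values, since the options are passed verbatim to the viewer.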



6- THE "SECURE DJVU" FORMAT.

Recently Lizardtech introduced an incompatible "Secure DJVU" format.
This format encrypts djvu data in the hope of controlling
whether users can use, copy or print djvu documents.
A recent specification describes the container format
but does not provide enough information to decode the content.
Providing such information would obviously provide
a means to avoid the usage restrictions. In fact there is no durable
way to enforce such constraints besides "security through obscurity".

The current djvulibre library simply emits an
error message when encountering such files.

Some observations regarding these files:

- They use the same IFF85 structure as djvu files.
  Chunk "SINF" contains a scrambled version of the decryption
  key and describes which actions are authorized or denied.
  Chunks "CELX" encapsulate the regular DjVu chunks.

- Each CELX chunk starts with four bytes for the original chunk
  name, and four bytes for the original chunk length.
  This is followed by encrypted data, composed of enough
  blocks of 8 bytes to cover the initial chunk length.

- Lizardtech claims the encryption is 32 bit blowfish.
  It in fact appears to operate on 64 bit blocks,
  as one would expect with blowfish.
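
The CELX layout described above can be sketched in Python as follows.
The sketch only splits the chunk payload and does not decrypt
anything.  The function names are hypothetical, and reading the
length as a big-endian integer is an assumption based on the IFF
convention, since the byte order is not stated above.

```python
import struct
from typing import Tuple

BLOCK_SIZE = 8  # encrypted data comes in blocks of 8 bytes


def encrypted_length(original_length: int) -> int:
    """Smallest multiple of 8 bytes that covers the original chunk length."""
    return (original_length + BLOCK_SIZE - 1) // BLOCK_SIZE * BLOCK_SIZE


def split_celx_payload(payload: bytes) -> Tuple[bytes, int, bytes]:
    """Return (original chunk name, original chunk length, encrypted data)."""
    if len(payload) < 8:
        raise ValueError("CELX payload shorter than its 8 byte header")
    # Assumption: the length is stored big-endian, as in IFF chunk headers.
    name, length = struct.unpack(">4sI", payload[:8])
    data = payload[8:8 + encrypted_length(length)]
    return name, length, data
```

For example, an original chunk of 10 bytes would be carried as two
blocks, that is 16 bytes of encrypted data.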