 BY THE LIZARDTECH SPECIFICATION "DJVU3SPEC.DJVU".>

    This file summarizes the file format changes
    between DjVu2 and DjVu3.


------------------------------------------------------------
1 - DJVU3 FILE STRUCTURE OVERVIEW
------------------------------------------------------------

    DjVu files are organized according to the ``EA IFF 85'' layout.  Pointers to
    the appropriate reference document are provided in section
    \Ref{IFFByteStream.h}.  IFF files are logically composed of a sequence of
    data \emph{chunks}.  Each chunk comes with a four-character \emph{chunk
    identifier} describing the type of the data stored in the chunk.  A few
    special chunk identifiers, for instance #"FORM"#, are reserved for
    so-called \emph{composite chunks} containing a sequence of data chunks.  This
    convention effectively provides IFF files with a hierarchical structure.
    Composite chunks are further identified by a \emph{secondary chunk
    identifier}.  For convenience, both identifiers are gathered as an
    extended chunk identifier such as #"FORM:DJVU"#.

    The four octets #0x41,0x54,0x26,0x54# may be inserted in front of the
    IFF-compliant byte stream.  The decoder simply ignores these four octets when
    they are present.  These four octets are not part of the IFF format and
    are not required components of a valid DjVu file.  Certain versions of MSIE
    incorrectly recognize any IFF file as a Microsoft AIFF sound file.  The
    presence of these four octets prevents this incorrect identification.

    The DjVu specification mandates that the decoder should silently
    skip chunks whose identifier is not recognized.  This mechanism
    provides a backward compatible way to extend the initial format by
    allocating new chunk identifiers.
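
    The layout described above can be illustrated with a short Python
    sketch.  It is not taken from the reference library.  It assumes the
    standard EA IFF 85 chunk header (four-character identifier, 32 bit
    big-endian length, payload padded to an even length).  The names
    #iter_chunks#, #top_level_chunks#, #make_chunk# and #RECOGNIZED# are
    hypothetical, and the sample payloads are placeholders rather than
    valid DjVu data.

```python
import struct

MAGIC = bytes([0x41, 0x54, 0x26, 0x54])   # optional prefix, not part of IFF
COMPOSITE_IDS = {b"FORM", b"LIST", b"PROP", b"CAT "}  # composite ids of EA IFF 85


def iter_chunks(data, start, end):
    """Yield (extended_id, payload_start, payload_end) for chunks in data[start:end]."""
    pos = start
    while pos + 8 <= end:
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack(">I", data[pos + 4:pos + 8])  # big-endian length
        body = pos + 8
        if chunk_id in COMPOSITE_IDS:
            # Composite chunk: the payload starts with the secondary identifier.
            secondary = data[body:body + 4]
            name = (chunk_id + b":" + secondary).decode("ascii")
            yield name, body + 4, body + size
        else:
            yield chunk_id.decode("ascii"), body, body + size
        pos = body + size + (size & 1)  # skip the pad byte after odd-sized payloads


def top_level_chunks(data):
    """Ignore the optional four octets, then list the top-level chunks."""
    start = 4 if data[:4] == MAGIC else 0
    return list(iter_chunks(data, start, len(data)))


def make_chunk(chunk_id, payload):
    """Helper for the example: serialize one chunk with its pad byte."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack(">I", len(payload)) + payload + pad


# Synthetic example: placeholder payloads, "XXXX" stands for an unknown chunk.
inner = (make_chunk(b"INFO", b"\x00" * 10)
         + make_chunk(b"XXXX", b"abc")
         + make_chunk(b"Sjbz", b"\x01\x02"))
sample = MAGIC + make_chunk(b"FORM", b"DJVU" + inner)

RECOGNIZED = {"INFO", "Sjbz"}   # hypothetical decoder that knows two chunk types
top = top_level_chunks(sample)
form_name, form_start, form_end = top[0]
children = [name for name, _, _ in iter_chunks(sample, form_start, form_end)]
decoded = [name for name in children if name in RECOGNIZED]  # unknown ids are skipped
```

    In this sketch the unknown #"XXXX"# chunk is walked over using its
    length field and then dropped, which is one way to realize the
    ``silently skip'' rule.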

------------------------------------------------------------
1.1 - DJVU3 IMAGE FILES
------------------------------------------------------------

    \textbf{Photo DjVu Image} ---

    Photo DjVu Image files are best used for
    encoding photographic images in color or in shades of gray.  The data
    compression model relies on the IW44 wavelet representation.  This format
    is designed such that the IW44 decoder is able to quickly perform
    progressive rendering of any image segment using only a small amount of
    memory.  Photo DjVu files are composed of a single #"FORM:DJVU"# composite
    chunk.  This composite chunk always begins with one #"INFO"# chunk
    describing the image size and resolution (see \Ref{DjVuInfo.h}).  One or
    more additional #"BG44"# chunks contain the image data encoded with the
    IW44 representation (see \Ref{IW44Image.h}).  The image size specified in
    the #"INFO"# chunk and the image size specified in the IW44 data must be
    equal.

    \textbf{Bilevel DjVu Image} ---

    Bilevel DjVu Image files are used to compress
    black and white images representing text and simple drawings.  The
    JB2 data compression model uses the soft pattern matching technique, which
    essentially consists of encoding each character by describing how it
    differs from a well-chosen, already encoded character.  Bilevel DjVu Files
    are composed of a single #"FORM:DJVU"# composite chunk.  This composite
    chunk always begins with one #"INFO"# chunk describing the image size and
    resolution (see \Ref{DjVuInfo.h}).  An additional #"Sjbz"# chunk contains
    the bilevel data encoded with the JB2 representation (see
    \Ref{JB2Image.h}).  The image size specified in the #"INFO"# chunk and the
    image size specified in the JB2 data must be equal.

    \textbf{Compound DjVu Image} ---

    Compound DjVu Files are an extremely
    efficient way to compress high resolution compound document images
    containing both pictures and text, such as a page of a magazine.  Compound
    DjVu Files represent the document images using two layers.  The
    \emph{background layer} is used for encoding the pictures and the
    paper texture.
    The \emph{foreground layer} is used for encoding the text and the drawings.
    Compound DjVu Files are composed of a single #"FORM:DJVU"# composite
    chunk.  This composite chunk always begins with one #"INFO"# chunk
    describing the size and the resolution of the image (see \Ref{DjVuInfo.h}).
    Additional chunks hold the components of either the foreground or the
    background layers.

    The main component of the foreground layer is a bilevel image named the
    \emph{foreground mask}.  The pixel size of the foreground mask is equal to
    the size of the DjVu image.  It contains a black-on-white representation
    of the text and the drawings.  This image is encoded by a #"Sjbz"# chunk
    using the JB2 representation.  There may also be a companion chunk
    #"Djbz"# containing a \emph{shape dictionary} that defines bilevel shapes
    referenced by the #"Sjbz"# chunk.

    The \emph{foreground colors} can be encoded according to two models:
    \begin{itemize}
    \item
      The foreground colors may be encoded using a small color image,
      the \emph{foreground color image}, encoded as a single #"FG44"#
      chunk using the
      IW44 representation (see \Ref{IW44Image.h}).  Such compound DjVu images
      are rendered by painting the foreground color image on top of the
      background color image using the foreground mask as a stencil.  The
      pixel size of the foreground color image is computed by rounding up the
      quotient of the mask size by an integer sub-sampling factor ranging from
      1 to 12.  Most Compound DjVu Images use a foreground color sub-sampling
      factor of 12.  Smaller sub-sampling factors produce very slightly better
      images.
    \item
      The foreground colors may be encoded by specifying one solid color per
      object described by the JB2 encoded mask.  These \emph{JB2 colors} are
      color-quantized and stored in a single #"FGbz"# chunk (see
      \Ref{DjVuPalette.h}).  Such compound DjVu images are rendered by
      painting each foreground object on top of the background color image
      using the solid color specified by the #"FGbz"# chunk.
    \end{itemize}

    The background layer is a color image, the \emph{background color image},
    encoded by an arbitrary number of #"BG44"# chunks containing successive
    IW44 refinements (see \Ref{IW44Image.h}).  The size of this image is
    computed by rounding up the quotient of the mask size by an integer
    sub-sampling factor ranging from 1 to 12.  Most Compound DjVu Images use a
    background sub-sampling factor equal to 3.  Smaller sub-sampling factors
    are adequate for images with a very rich paper texture.  Larger
    sub-sampling factors are adequate for images containing no pictures.
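
    The size rule shared by the foreground color image and the background
    color image can be written as a ceiling division.  The following
    Python sketch is only an illustration.  The function name
    #subsampled_size# and the mask size of 2550 x 3300 pixels are
    assumptions made for the example.

```python
def subsampled_size(mask_width, mask_height, factor):
    """Round up the quotient of the mask size by the sub-sampling factor."""
    if not 1 <= factor <= 12:
        raise ValueError("sub-sampling factor must be in the range 1..12")
    width = (mask_width + factor - 1) // factor    # integer ceiling division
    height = (mask_height + factor - 1) // factor
    return width, height


mask = (2550, 3300)                        # hypothetical foreground mask size
background = subsampled_size(*mask, 3)     # typical background factor
foreground = subsampled_size(*mask, 12)    # typical foreground color factor
```

    With these numbers the background color image is 850 x 1100 pixels
    and the foreground color image is 213 x 275 pixels, because
    2550 / 12 = 212.5 is rounded up.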

    There are no ordering or interleaving constraints on these chunks except
    that (a) the #"INFO"# chunk must appear first, and (b) the successive
    #"BG44"# refinements must appear in their natural order.  The chunk
    order simply affects the progressive rendering of DjVu images in a web
    browser.

    \textbf{IW44 Image Files} ---

    The IW44 Image file format is the native format for the IW44 wavelet
    representation.  These files are deprecated in favor of Photo DjVu
    Images.

    \textbf{Alternative encodings} ---

    Besides the JB2 and IW44 encoding schemes,
    the DjVu format supports alternative encoding methods for its components.

    \begin{itemize}
    \item
       The foreground mask may be represented by a single #"Smmr"# chunk
       instead of #"Sjbz"#.  The #"Smmr"# chunk contains a bilevel image
       encoded with the Fax-G4/MMR method.  Although the resulting files
       are typically six times larger, this capability can be useful when
       DjVu is used as a front-end for fax machines and scanners with
       embedded Fax-G4/MMR capabilities.
    \item
       The background color image may be represented by a single #"BGjp"#
       chunk instead of several #"BG44"# chunks.  The #"BGjp"# chunk contains
       a JPEG encoded color image.  The resulting files are significantly
       larger and lack the progressivity of the usual DjVu files.
       This capability is nevertheless useful because some scanners have
       embedded JPEG capabilities.
    \item
       The foreground color image may be represented by a single #"FGjp"#
       chunk instead of a single #"FG44"# chunk.  This is useful because
       some scanners have embedded JPEG capabilities.
    \end{itemize}

    In addition, the chunk names #"BG2k"# and #"FG2k"# have been reserved for
    encoding the background color image and the foreground color image using
    the forthcoming JPEG-2000 standard.  This capability is not implemented at
    the moment.  The JPEG-2000 standard may even become the preferred encoding
    method for color images in DjVu.

    \textbf{Annotations and Textual Information} ---

    All types of DjVu images may contain
    annotation chunks.  Annotation chunks are currently used to describe
    hyperlinks, to specify more closely the behavior of the viewers,
    and to hold metadata information.  Annotations are contained in #"ANTa"#
    or #"ANTz"# chunks.  The #"ANTa"# chunks contain the annotation in
    plain text.  The #"ANTz"# chunks contain the same information compressed
    with the BZZ encoder (cf. \Ref{BSByteStream.h}).

    All types of DjVu image files may also contain a
    computer readable description of the text appearing on the page.  This
    information is contained in either a #"TXTa"# chunk or a #"TXTz"# chunk.
    The #"TXTa"# chunk contains uncompressed data.  The #"TXTz"# chunk
    contains the same data compressed with the \Ref{bzz} compressor
    (cf. \Ref{BSByteStream.h}).  The #"TXTa"# chunk begins with a 24 bit
    integer (most significant byte first) describing the length of the text in
    bytes.  Then comes the ISO10646/UTF8 text.  Additional information
    indicates the position of each column/region/paragraph/line/word in the
    document.  More information about the capabilities of the chunk can be
    found in section \Ref{DjVuTXT}.  More information about the encoding of
    textual information can be found in file #"DjVuAnno.cpp"#.
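
    The beginning of an uncompressed #"TXTa"# payload can be split as
    sketched below in Python.  Only the 24 bit length and the UTF-8 text
    are handled.  The layout of the positional data that follows is not
    described here, so the sketch returns it undecoded.  The function name
    #split_txt_payload# and the sample bytes are hypothetical.  A #"TXTz"#
    payload would have to be BZZ-decompressed first, which is not shown.

```python
def split_txt_payload(payload):
    """Return (text, remaining_bytes) from an uncompressed TXTa payload."""
    if len(payload) < 3:
        raise ValueError("payload too short for the 24 bit length field")
    length = int.from_bytes(payload[:3], "big")   # most significant byte first
    text_bytes = payload[3:3 + length]
    if len(text_bytes) != length:
        raise ValueError("payload ends before the announced text length")
    # Everything after the text is positional information, left undecoded here.
    return text_bytes.decode("utf-8"), payload[3 + length:]


# Synthetic example: the two trailing bytes stand in for the positional data.
text = "DjVu d\u00e9j\u00e0 vu"
encoded = text.encode("utf-8")
sample = len(encoded).to_bytes(3, "big") + encoded + b"\x01\x02"
decoded_text, zone_data = split_txt_payload(sample)
```

    Note that the length counts bytes, not characters: the twelve
    characters of the sample text occupy fourteen bytes in UTF-8.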


------------------------------------------------------------
1.2 - DJVU3 MULTIPAGE DOCUMENTS
------------------------------------------------------------

    The DjVu3 system supports two models for multi-page documents:
    \emph{bundled} multi-page documents and \emph{indirect} multi-page documents.

    \textbf{Bundled multi-page documents} ---

    A \emph{bundled} multi-page DjVu
    document uses a single file to represent the entire document.  This single
    file contains all the pages as well as ancillary information (e.g. the
    page directory, data shared by several pages, thumbnails, etc.).  Using a
    single file format is very convenient for storing documents or for sending
    email attachments.

    A bundled multi-page document is composed of a single #"FORM:DJVM"#
    composite chunk.  This composite chunk always begins with a #"DIRM"# chunk
    containing the document directory (see \Ref{DjVmDir.h}) which represents
    the list of the \emph{component files} that compose the document.  The
    component files themselves are then encoded as IFF85 composite chunks
    following the #"DIRM"# chunk.

    \begin{itemize}
    \item
       Component files may be any valid DjVu image (see \Ref{DjVu Image Files})
       or IW44 image (see \Ref{IW44 Image Files}).  These component files
       always represent a page of a document.  The corresponding IFF85 chunk ids are
       #"FORM:DJVU"#, #"FORM:PM44"#, or #"FORM:BM44"#.
    \item
       Component files may contain shared information indirectly referenced by
       some document pages.  These \emph{shared component files} are always composed
       of a single #"FORM:DJVI"# chunk containing an arbitrary collection of
       chunks.
    \item
       Thumbnail files contain optional thumbnail images for a few consecutive
       pages of the document.  Thumbnail files consist of a single
       #"FORM:THUM"# composite chunk containing several #"TH44"# chunks
       holding IW44 encoded thumbnail images (see \Ref{IW44Image.h}).  These
       thumbnails always pertain to the first few page files following the
       thumbnail file in the document directory.
    \end{itemize}

    \textbf{Including shared information} ---

    Any DjVu image file contained in a multipage file may contain an #"INCL"#
    chunk containing the ID of a shared component file.  The decoder processes
    the chunks contained in the shared component file as if they were
    contained by the DjVu image file.

    A shared component file is composed of a single #"FORM:DJVI"# potentially
    containing any information otherwise allowed in a DjVu image file (except
    for the #"INFO"# chunk of course).

    There are many benefits associated with storing such shared information in
    separate files.  A well designed browser may keep pre-decoded copies of
    these files in a cache.  This procedure would reduce the size of the data
    transferred over the Internet and also increase the display speed.  The
    multipage DjVu compressor, for instance, identifies similar object shapes
    occurring in several pages.  These shapes are encoded in a shape dictionary
    (chunk #"Djbz"#) placed in a shared component file.  All relevant pages
    include this shared component file.  Although they appear in several
    pages, these shared shapes are encoded only once in the document.
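
    The effect of an #"INCL"# chunk can be pictured with the following
    Python sketch.  It works on a hypothetical in-memory model in which a
    file is a list of (chunk id, payload) pairs and shared component files
    are looked up by ID in a dictionary.  The function name
    #expand_includes#, the file ID #dict0001.iff# and the payload strings
    are invented for the example.  This is not how the reference library
    is organized.

```python
def expand_includes(page_chunks, shared_files):
    """Replace each INCL chunk by the chunks of the shared file it names."""
    expanded = []
    for chunk_id, payload in page_chunks:
        if chunk_id == "INCL":
            # The payload of INCL is the ID of a shared component file.
            expanded.extend(shared_files[payload])
        else:
            expanded.append((chunk_id, payload))
    return expanded


# Hypothetical document: two pages include the same shape dictionary.
shared_files = {"dict0001.iff": [("Djbz", b"<shape dictionary>")]}
page1 = [("INFO", b"<info 1>"), ("INCL", "dict0001.iff"), ("Sjbz", b"<mask 1>")]
page2 = [("INFO", b"<info 2>"), ("INCL", "dict0001.iff"), ("Sjbz", b"<mask 2>")]

ids1 = [chunk_id for chunk_id, _ in expand_includes(page1, shared_files)]
ids2 = [chunk_id for chunk_id, _ in expand_includes(page2, shared_files)]
```

    Both pages end up seeing a #"Djbz"# chunk in front of their #"Sjbz"#
    chunk, although the dictionary is stored only once.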

    \textbf{Browsing a multi-page document} ---

    You can view the pages using the DjVu plugin and a web browser.  When you
    type the URL of a multi-page document, the browser starts downloading the
    whole file, but displays the first page as soon as it is available.  You
    can immediately navigate to other pages using the DjVu toolbar.  Suppose
    however that the document is stored on a remote web server.  You can
    easily access the first page and see that this is not the document you
    wanted.  Although you will never display the other pages, the browser is
    transferring data for these pages and is wasting the bandwidth of your
    server (and the bandwidth of the Internet too).  You could also see the
    summary of the document on the first page and jump to page 100.  But page
    100 cannot be displayed until data for pages 1 to 99 has been received.
    You may have to wait for the transmission of unnecessary page data.  This
    second problem (the unnecessary wait) can be solved using the ``byte
    serving'' options of the HTTP/1.1 protocol.  This option has to be
    supported by the web server, the proxies, the caches and the browser.  We
    are getting there, but not quite yet.  Byte serving however does not solve
    the first problem (the waste of bandwidth).

    \textbf{Indirect multi-page documents} ---

    DjVu solves both problems using a
    special multi-page format named the \emph{indirect} model.  An indirect
    multi-page DjVu document is composed of several files.  The main file is
    named the \emph{index file}.  You can browse a document using the URL of
    the index file, just as you do with a bundled multi-page document.  The
    index file however is very small.  It simply contains the document
    directory and the URLs of secondary files containing the page data.  When
    you browse an indirect multi-page document, the browser only accesses data
    for the pages you are viewing.  This can be done at a reasonable speed
    because the browser maintains a cache of pages and sometimes pre-fetches a
    few pages ahead of the current page.  This model uses the web serving
    bandwidth much more effectively.  It also eliminates unnecessary delays
    when jumping ahead to pages located anywhere in a long document.

    \textbf{Obsolete Formats} ---

    The library also supports two other multipage
    formats which are now obsolete.  These formats are technologically
    inferior and should no longer be used.



------------------------------------------------------------
2 - CHUNK ENCODING
------------------------------------------------------------

    This section describes
    - the encoding of new chunks introduced with DjVu3
    - the encoding changes of chunks already present in DjVu2


------------------------------------------------------------
2.1 - CHANGES TO JB2 ( "Sjbz" AND "Djbz" CHUNKS )
------------------------------------------------------------

    Two extensions of the JB2 encoding format have been introduced
    with DjVu files version 21.  Both extensions maintain significant
    backward compatibility with previous versions of the JB2 format.
    These extensions are described below by reference to the DjVu2 spec
    dated August 1999.  Both extensions make use of the unused record
    type value #9# (cf. ICFDD page 24) which has been renamed
    #REQUIRED_DICT_OR_RESET#.

    \textbf{Shared Shape Dictionaries} --- This extension provides
    support for sharing symbol definitions between the pages of a
    document.  To achieve this objective, the JB2 image data chunk
    must be able to address symbols defined elsewhere by a JB2
    dictionary data chunk shared by all the pages of a document.

    The arithmetically encoded JB2 image data logically consists of a
    sequence of records.  The decoder processes these records in
    sequence and maintains a library of symbols which can be addressed
    by the following records.  The first record usually is a ``Start
    Of Image'' record describing the size of the image.

    Starting with version 21, a #REQUIRED_DICT_OR_RESET# (9) record
    type can appear \emph{before} the #START_OF_DATA# (0) record.  The
    record type field is followed by a single number arithmetically
    encoded (cf. ICFDD page 26) using a sixteenth context (cf. ICFDD
    page 25).  This record appears when the JB2 data chunk requires
    symbols encoded in a separate JB2 dictionary data chunk.  The
    number (the \textbf{dictionary size}) indicates how many symbols
    should have been defined by the JB2 dictionary data chunk.  The
    decoder should simply load these symbols in the symbol library and
    proceed as usual.  New symbols potentially defined by the
    subsequent JB2 image data records will therefore be numbered with
    integers greater than or equal to the dictionary size.
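
    The numbering rule can be illustrated with a small Python model of the
    symbol library.  This sketch leaves out the arithmetic decoding
    entirely.  The class #SymbolLibrary#, its methods and the string
    placeholders used as shapes are hypothetical.

```python
class SymbolLibrary:
    """Toy model of the decoder symbol library (shapes are opaque objects)."""

    def __init__(self):
        self.shapes = []

    def load_dictionary(self, dictionary_shapes, dictionary_size):
        """Preload shared symbols when the record of type 9 precedes START_OF_DATA."""
        if len(dictionary_shapes) != dictionary_size:
            raise ValueError("dictionary does not define the required symbol count")
        self.shapes.extend(dictionary_shapes)

    def add(self, shape):
        """Register a symbol defined by an image record and return its number."""
        self.shapes.append(shape)
        return len(self.shapes) - 1


# Hypothetical page: the shared dictionary defines three shapes.
library = SymbolLibrary()
library.load_dictionary(["shape-a", "shape-b", "shape-c"], dictionary_size=3)
first_new = library.add("shape-d")    # first symbol defined by the page itself
second_new = library.add("shape-e")
```

    The shared symbols keep the numbers 0 to 2, and the symbols defined by
    the page itself are numbered from 3, the dictionary size, onwards.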

    The JB2 dictionary data format is a pure subset of the JB2 image
    data format.  The #START_OF_DATA# (0) record always specifies an
    image width of zero and an image height of zero.  The only allowed
    record types are those defining library symbols only
    (#NEW_SYMBOL_LIBRARY_ONLY# (2) and #MATCHED_REFINE_LIBRARY_ONLY#
    (5) cf. ICFDD page 24) followed by a final #END_OF_DATA# (11)
    record.

    The JB2 dictionary data is usually located in a \textbf{Djbz} chunk.
    Each page \textbf{FORM:DJVU} may directly contain a \textbf{Djbz} chunk,
    or may indirectly point to such a chunk using an \textbf{INCL} chunk
    (cf. \Ref{Multipage DjVu documents}).

    \textbf{Numcoder Reset} --- This extension addresses a problem for
    hardware implementations.  The encoding of numbers (cf. ICFDD page
    26) potentially uses an unbounded number of binary coding
    contexts.  These contexts are normally allocated when they are used
    for the first time (cf. ICFDD informative note, page 27).

    Starting with version 21, a #REQUIRED_DICT_OR_RESET# (9) record
    type can appear \emph{after} the #START_OF_DATA# (0) record.  The
    decoder should proceed with the next record after \emph{clearing
    all binary contexts used for coding numbers}.  This operation
    implies that all binary contexts previously allocated for coding
    numbers can be deallocated.

    Starting with version 21, the JB2 encoder should insert a
    #REQUIRED_DICT_OR_RESET# record type whenever the number of these
    allocated binary contexts exceeds #20000#.  Only very large
    documents ever reach such a large number of allocated binary
    contexts (e.g. large maps).  Hardware implementations, however, can
    benefit greatly from a hard bound on the total number of binary
    coding contexts.  Old JB2 decoders will treat this record type as
    an #END_OF_DATA# record and cleanly stop decoding (cf. ICFDD page
    30, Image refinement data).
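
    The two meanings of record type 9 and the encoder rule can be
    summarized with the following Python sketch.  It works on plain lists
    of record type numbers and of context counts instead of arithmetically
    coded data.  The function names #interpret_record_types# and
    #plan_resets#, the labels, and the idea of counting new contexts per
    record are assumptions made for illustration.

```python
START_OF_DATA = 0
REQUIRED_DICT_OR_RESET = 9
END_OF_DATA = 11
CONTEXT_LIMIT = 20000   # threshold quoted in the text above


def interpret_record_types(record_types):
    """Label each record; type 9 depends on its position relative to START_OF_DATA."""
    seen_start = False
    labels = []
    for record_type in record_types:
        if record_type == START_OF_DATA:
            seen_start = True
            labels.append("start")
        elif record_type == REQUIRED_DICT_OR_RESET:
            labels.append("numcoder-reset" if seen_start else "required-dictionary")
        elif record_type == END_OF_DATA:
            labels.append("end")
        else:
            labels.append("other")
    return labels


def plan_resets(contexts_per_record):
    """Insert a reset before a record once the allocated contexts exceed the limit."""
    allocated = 0
    plan = []
    for index, new_contexts in enumerate(contexts_per_record):
        if allocated > CONTEXT_LIMIT:
            plan.append("reset")
            allocated = 0          # all number-coding contexts are cleared
        plan.append(index)
        allocated += new_contexts
    return plan


labels = interpret_record_types([9, 0, 1, 9, 3, 11])
plan = plan_resets([15000, 6000, 10, 10])   # hypothetical context counts
```

    In the second example the count reaches 21000 after the second record,
    so the reset is planned immediately before the third one.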




------------------------------------------------------------
2.2 - JB2 COLORS ( "FGbz" CHUNK )
------------------------------------------------------------

    To be documented.

    The #"FGbz"# chunk contains BZZ compressed data
    (cf. \Ref{BSByteStream.h}).

    The uncompressed data can be decoded using function
    #DjVuPalette::decode# defined in file #"DjVuPalette.cpp"#.




------------------------------------------------------------
2.3 - ANNOTATIONS ( "ANTa" AND "ANTz" CHUNKS )
------------------------------------------------------------

[MAY 19TH, 2005:
 New annotation types have been defined by Lizardtech.
 The authoritative documentation is now the djvused man page.]


    Annotations are contained in #"ANTa"#
    or #"ANTz"# chunks.  The #"ANTa"# chunks contain the annotation in
    plain text.  The #"ANTz"# chunks contain the same information compressed
    with the BZZ encoder (cf. \Ref{BSByteStream.h}).

    The complete annotation text is obtained by concatenating all annotation
    chunks present in the page.  Pages can share annotations using an INCL
    chunk as explained in section \Ref{Including shared information}.
    A restriction of the current reference library implementation
    limits the number of shared annotation files to one.

    The syntax of the annotation text uses a simple
    parenthesized notation.  Erroneous and unrecognized constructs are silently
    ignored.  The following constructs are recognized:

    \begin{description}
    \item[(background <color>)]
       Sets the color of the viewer area surrounding the DjVu image.
       The color argument #color# is always represented using X11
       syntax \##RRGGBB#.  For instance \##000000# is black
       and \##FFFFFF# is white.

    \item[(zoom <zoom-value>)]
       Sets the initial zoom factor of the image.  Argument #zoom-value# may
       be #stretch#, #one2one#, #width#, #page#, or composed of the letter
       #"d"# followed by a number between #1# and #999# (such as in #d300# for
       instance).

    \item[(mode <mode-value>)]
       Sets the display mode for the image.  Argument #mode-value# may
       be #color#, #bw#, #fore# or #back#.

    \item[(align <horz-align> <vert-align>)]
       Specifies how the image should be aligned on the viewer surface.
       By default the image is located in the center.  Argument #horz-align#
       may be #left#, #center#, or #right#.  Argument #vert-align# may be
       #top#, #center#, or #bottom#.
Packit df99a1
Packit df99a1
    \item[(maparea <url> <comment> <area> <...options...>]
Packit df99a1
       Defines an hyperlink for the URL specified by argument #url#.
Packit df99a1
Packit df99a1
       Argument #url# may have one of the following two forms:
Packit df99a1
       \begin{verbatim}
Packit df99a1
            "<href>"
Packit df99a1
            (url "<href>" "<target>")
Packit df99a1
       \end{verbatim}
Packit df99a1
       where #href# is a string representing the URL and #target# is a string
Packit df99a1
       representing the target frame for the hyperlink (cf. Documentation for
Packit df99a1
       the HTML tag ##).  Both strings are surrounded with double quotes.
Packit df99a1
       Argument #comment# is a string surrounded by double quotes.
Packit df99a1
       This string may be displayed as a tooltip when the user
Packit df99a1
       moves the mouse over the hyperlink.
Packit df99a1
       Argument #area# defines the shape of the hyperlink.
Packit df99a1
       The following options are supported for representing
Packit df99a1
       rectangle, circle, or polygons.
Packit df99a1
       \begin{verbatim}
Packit df99a1
            (rect <xmin> <ymin> <width> <height>)
Packit df99a1
            (oval <xmin> <ymin> <width> <height>)
Packit df99a1
            (polygon <x0> <y0> <x1> <y1> ....)
Packit df99a1
       \end{verbatim}
Packit df99a1
       All parameters are numbers representing coordinates measured in image
Packit df99a1
       pixels with the origin set at the bottom left corner of the image.  The
Packit df99a1
       remaining arguments describe options regarding the hyperlink borders.
Packit df99a1
       A first set of option define the type of the borders:
Packit df99a1
       \begin{verbatim}
Packit df99a1
            (xor)
Packit df99a1
            (border <color>
Packit df99a1
            (shadow_in [<thickness>])
Packit df99a1
            (shadow_out [<thickness>])
Packit df99a1
            (shadow_ein [<thickness>])
Packit df99a1
            (shadow_eout [<thickness>])
Packit df99a1
       \end{verbatim}
Packit df99a1
       where parameter #color# has syntax \##RRGGBB# (as above) and parameter
Packit df99a1
       #thickness# is a number from 1 to 32.  The last four border modes are
Packit df99a1
       only supported with rectangular areas. The border becomes visible when
Packit df99a1
       the user moves the mouse over the hyperlink.  The border may be made
Packit df99a1
       always visible by using the following option:
Packit df99a1
       \begin{verbatim}
Packit df99a1
            (border-avis)
Packit df99a1
       \end{verbatim}
Packit df99a1
       Finally, the following option may be used with rectangular areas only.
       The complete area is highlighted using the specified color (given
       with syntax \##RRGGBB# as usual).
Packit df99a1
       \begin{verbatim}
            (hilite <color>)
       \end{verbatim}
Packit df99a1
       This is often used with an empty URL simply to emphasize a specific
       segment of an image.
Packit df99a1
Packit df99a1
    \item[(metadata <...entries...>)]
Packit df99a1
       Defines multiple metadata entries.
Packit df99a1
Packit df99a1
       Each metadata entry has the form
       \begin{verbatim}
          (<key> "<value>")
       \end{verbatim}
       where parameter #<key># is a symbolic attribute name such as #year#,
       #booktitle#, #editor#, or #author#, and parameter #<value>#
       is a UTF-8 encoded string representing the attribute value.
       Common C escape sequences are recognized.
       It is suggested to use the same key names as
       the BibTeX bibliography system.
Packit df99a1
Packit df99a1
       Metadata pertaining to the entire document should be placed
       in a shared annotation file (and are therefore seen in all pages).
       Metadata pertaining to a particular page are usually placed
       inside an #"ANTz"# chunk in that particular page.
Packit df99a1
Packit df99a1
    \end{description}
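
    As an illustration of the metadata syntax described above, the following
    Python sketch builds a #(metadata ...)# annotation from key/value pairs.
    The function names are hypothetical, and the set of escape sequences
    handled here (backslash, double quote, newline, tab) is an assumption;
    the text above only states that common C escape sequences are recognized.

```python
def escape_value(value: str) -> str:
    """Escape a metadata value with a few common C escape sequences."""
    out = []
    for ch in value:
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\t":
            out.append("\\t")
        else:
            out.append(ch)
    return "".join(out)


def format_metadata(entries: dict) -> str:
    """Build a (metadata ...) annotation; keys are emitted as bare symbols."""
    parts = ['(%s "%s")' % (key, escape_value(val))
             for key, val in entries.items()]
    return "(metadata " + " ".join(parts) + ")"


if __name__ == "__main__":
    # BibTeX-style key names, as suggested above.
    print(format_metadata({"author": "Jane Doe", "year": "2005"}))
```

    The resulting string would still have to be stored in an annotation
    chunk by other means; that step is not shown here.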
Packit df99a1
Packit df99a1
    
Packit df99a1
Packit df99a1
Packit df99a1
Packit df99a1
------------------------------------------------------------
2.4 - HIDDEN TEXT ( "TXTa" AND "TXTz" CHUNKS )
------------------------------------------------------------
Packit df99a1
Packit df99a1
    To be documented.
Packit df99a1
Packit df99a1
    The #"TXTa"# chunk contains uncompressed data.
    The #"TXTz"# chunk contains BZZ compressed data (cf. \Ref{BSByteStream.h}).
Packit df99a1
Packit df99a1
    The uncompressed data can be decoded using function #DjVuText::decode#
    defined in file #"DjVuText.cpp"#.  Program #djvused# can display the content
    of the text chunk using a lisp syntax, and can create a text chunk from
    this lisp syntax.
Packit df99a1
Packit df99a1
Packit df99a1
------------------------------------------------------------
2.5 - MULTIPAGE DIRECTORY CHUNK ( "DIRM" CHUNK )
------------------------------------------------------------
Packit df99a1
Packit df99a1
    Multipage DjVu documents follow the EA IFF 85 format
    (cf. \Ref{IFFByteStream.h}).  A document is composed of a
    #"FORM:DJVM"# whose first chunk is a #"DIRM"# chunk containing the
    \emph{document directory}.  This directory lists all component
    files composing the given document, and helps to access every
    component file and to identify the pages of the document.
Packit df99a1
Packit df99a1
    \begin{itemize}
    \item
         In a \emph{bundled} multipage file, the component files
         are stored immediately after the #"DIRM"# chunk,
         within the #"FORM:DJVM"# composite chunk.
    \item
         In an \emph{indirect} multipage file, the component files are
         stored in different files whose URLs are composed using information
         stored in the #"DIRM"# chunk.
    \end{itemize}
Packit df99a1
Packit df99a1
    Most of the component files represent pages of a document.  Some files
    however represent data shared by several pages.  The pages refer to these
    supporting files by means of inclusion chunks (#"INCL"# chunks)
    identifying the supporting file.  Every directory record describes a
    component file.  Each component file is identified by a small string
    named the identifier (ID).  Each component file also has a
    file name and a title.
Packit df99a1
Packit df99a1
    Theoretically, IDs are used to uniquely identify each component file in
    #"INCL"# chunks, names are used to compose the URLs of the component
    files in an indirect multipage DjVu file, and titles are cosmetic names
    possibly displayed when viewing a page of a document.  There are however
    many problems with this scheme, and we \emph{strongly suggest}, with the
    current implementation, to always make the file ID, the file name and the
    file title identical.
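
    The exact rule used to compose component URLs is not spelled out here.
    The Python sketch below shows one plausible reading, under the assumption
    that the component name is resolved relative to the URL of the index file
    holding the #"DIRM"# chunk.  The function name and the example URLs are
    hypothetical.

```python
from urllib.parse import quote, urljoin


def component_url(index_url: str, name: str) -> str:
    """Resolve a component file name against the index file URL.

    Assumption: names are plain relative file names, percent-encoded
    when they contain characters that are not URL-safe.
    """
    return urljoin(index_url, quote(name))


if __name__ == "__main__":
    print(component_url("http://example.com/book/index.djvu", "p0001.djvu"))
```

    With identical IDs, names and titles, as recommended above, the same
    string serves both for #"INCL"# lookups and for URL composition.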
Packit df99a1
Packit df99a1
    \textbf{Variants} --- There are two versions of the #"DIRM"# chunk format.
    The version number is identified by the seven low bits of the first byte
    of the chunk.  Version \textbf{0} is obsolete and should never be used.  This
    section describes version \textbf{1}.  There are two major multipage DjVu
    formats supported: \emph{bundled} and \emph{indirect}.  The #"DIRM"# chunk
    indicates which format is used in the most significant bit of the first
    byte of the chunk.  The document is bundled when this bit is set.
    Otherwise the document is indirect.
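
    The bit layout of the first byte can be sketched in Python as follows.
    The function name is hypothetical; the masks follow directly from the
    description above (seven low bits for the version, most significant bit
    for the bundled flag).

```python
def split_dirm_flags(first_byte: int):
    """Return (version, bundled) from the first byte of a DIRM chunk."""
    version = first_byte & 0x7F          # seven low bits
    bundled = bool(first_byte & 0x80)    # most significant bit
    return version, bundled


if __name__ == "__main__":
    print(split_dirm_flags(0x81))  # bundled document, version 1
    print(split_dirm_flags(0x01))  # indirect document, version 1
```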
Packit df99a1
Packit df99a1
    \textbf{Unencoded data} --- 
Packit df99a1
Packit df99a1
    The #"DIRM"# chunk is composed of some unencoded
    data followed by \Ref{bzz} encoded data.  The unencoded data starts with
    the version byte and a 16 bit integer representing the number of component
    files.  All integers are encoded with the most significant byte first.
Packit df99a1
    \begin{verbatim}
          BYTE:             Flags/Version:  0b<bundled>0000001
          INT16:            Number of component files.
    \end{verbatim}
Packit df99a1
    When the document is a bundled document (i.e. the flag #bundled# is set),
    this header is followed by the offsets of each of the component files within
    the #"FORM:DJVM"#.  These offsets allow for random component file access.
    \begin{verbatim}
          INT32:            Offset of first component file.
          INT32:            Offset of second component file.
          ...
          INT32:            Offset of last component file.
    \end{verbatim}
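
    A minimal Python sketch of a reader for this unencoded header is given
    below.  It assumes that #data# holds the raw payload of the #"DIRM"#
    chunk (without the IFF chunk header); the function name and the returned
    tuple layout are hypothetical.

```python
import struct


def parse_dirm_header(data: bytes):
    """Parse the unencoded part of a DIRM chunk payload.

    Returns (version, bundled, count, offsets, pos), where pos is the
    index of the first byte of the BZZ encoded data.
    """
    # One flag/version byte, then a big-endian 16 bit file count.
    flags, count = struct.unpack_from(">BH", data, 0)
    version = flags & 0x7F
    bundled = bool(flags & 0x80)
    pos = 3
    offsets = []
    if bundled:
        # One big-endian 32 bit offset per component file.
        offsets = list(struct.unpack_from(">%dI" % count, data, pos))
        pos += 4 * count
    return version, bundled, count, offsets, pos


if __name__ == "__main__":
    sample = bytes([0x81, 0x00, 0x02]) + struct.pack(">II", 48, 1000) + b"..."
    print(parse_dirm_header(sample))
```

    The bytes starting at #pos# are BZZ encoded and must be decompressed
    before they can be interpreted; that step is not shown.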
Packit df99a1
Packit df99a1
    \textbf{BZZ encoded data} ---
Packit df99a1
Packit df99a1
    The rest of the chunk is entirely compressed
    with the BZZ general purpose compressor.  We now describe the data fed
    into (or retrieved from) the BZZ codec (cf. \Ref{BSByteStream.h}).  First
    come the sizes and the flags associated with each component file.
Packit df99a1
    \begin{verbatim}
          INT24:             Size of the first component file.
          INT24:             Size of the second component file.
          ...
          INT24:             Size of the last component file.
          BYTE:              Flag byte for the first component file.
          BYTE:              Flag byte for the second component file.
          ...
          BYTE:              Flag byte for the last component file.
    \end{verbatim}
Packit df99a1
    The flag bytes have the following format:
    \begin{verbatim}
          0b<hasname><hastitle>000000     for a file included by other files.
          0b<hasname><hastitle>000001     for a file representing a page.
          0b<hasname><hastitle>000010     for a file containing thumbnails.
    \end{verbatim}
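
    The flag byte layout above can be decoded as in the following Python
    sketch.  The names #FILE_TYPES# and #decode_file_flags# are hypothetical;
    only the three type codes listed above are mapped, and any other value
    is reported as unknown.

```python
# Type codes taken from the six low bits of the flag byte.
FILE_TYPES = {0: "include", 1: "page", 2: "thumbnails"}


def decode_file_flags(flag: int):
    """Return (has_name, has_title, kind) for one component flag byte."""
    has_name = bool(flag & 0x80)    # <hasname> is the top bit
    has_title = bool(flag & 0x40)   # <hastitle> is the next bit
    kind = FILE_TYPES.get(flag & 0x3F, "unknown")
    return has_name, has_title, kind


if __name__ == "__main__":
    print(decode_file_flags(0b10000001))
```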
Packit df99a1
    Flag #hasname# is set when the name of the file is different from the file
    ID.  Flag #hastitle# is set when the title of the file is different from
    the file ID.  These flags are used to avoid encoding the same string three
    times.  Then comes a sequence of zero-terminated strings.  There are one to
    three such strings per component file.  The first string contains the ID
    of the component file.  The second string contains the name of the
    component file.  It is only present when the flag #hasname# is set. The third
    one contains the title of the component file. It is only present when the
    flag #hastitle# is set. The \Ref{bzz} encoding system makes sure that
    all these strings will be encoded efficiently despite their possible
    redundancies.
Packit df99a1
    \begin{verbatim}
          ZSTR:     ID of the first component file.
          ZSTR:     Name of the first component file (only if #hasname# is set.)
          ZSTR:     Title of the first component file (only if #hastitle# is set.)
          ...
          ZSTR:     ID of the last component file.
          ZSTR:     Name of the last component file (only if #hasname# is set.)
          ZSTR:     Title of the last component file (only if #hastitle# is set.)
    \end{verbatim}
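
    Putting the sizes, flags and strings together, the BZZ decoded data could
    be read as in the Python sketch below.  It assumes that #payload# has
    already been decompressed by a BZZ decoder (not shown), that #count# comes
    from the unencoded header, and that strings are UTF-8; the function name
    and the record layout are hypothetical.

```python
def parse_dirm_records(payload: bytes, count: int) -> list:
    """Parse the BZZ decoded part of a version 1 DIRM chunk."""
    # INT24 sizes, most significant byte first.
    sizes = [int.from_bytes(payload[3 * i:3 * i + 3], "big")
             for i in range(count)]
    pos = 3 * count
    # One flag byte per component file.
    flags = list(payload[pos:pos + count])
    pos += count

    def read_zstr() -> str:
        nonlocal pos
        end = payload.index(b"\0", pos)  # ValueError if unterminated
        text = payload[pos:end].decode("utf-8")  # assumption: UTF-8
        pos = end + 1
        return text

    records = []
    for size, flag in zip(sizes, flags):
        file_id = read_zstr()
        # Name and title default to the ID when their flag is clear.
        name = read_zstr() if flag & 0x80 else file_id
        title = read_zstr() if flag & 0x40 else file_id
        records.append({"id": file_id, "name": name, "title": title,
                        "size": size, "flags": flag})
    return records
```

    When the ID, name and title are identical, each record contributes a
    single string, which is the case recommended earlier in this section.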
Packit df99a1
Packit df99a1
Packit df99a1
Packit df99a1
------------------------------------------------------------
2.6 - INCLUDES ( "INCL" CHUNK )
------------------------------------------------------------
Packit df99a1
Packit df99a1
    The chunk simply contains the ASCII-encoded ID
    of the included component file.
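
    As a sketch, an #"INCL"# chunk can be serialized in Python using the
    generic EA IFF 85 chunk layout (four character identifier, 32 bit
    big-endian length, data, and a pad byte when the length is odd).  The
    function name and the example ID are hypothetical.

```python
import struct


def make_incl_chunk(file_id: str) -> bytes:
    """Serialize an INCL chunk holding the ID of the included file."""
    data = file_id.encode("ascii")
    chunk = b"INCL" + struct.pack(">I", len(data)) + data
    if len(data) % 2:
        chunk += b"\0"  # IFF chunks are padded to an even length
    return chunk


if __name__ == "__main__":
    print(make_incl_chunk("dict0001.iff"))
```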
Packit df99a1
Packit df99a1
Packit df99a1
Packit df99a1
------------------------------------------------------------
2.7 - THUMBNAILS
------------------------------------------------------------
Packit df99a1
Packit df99a1
Packit df99a1
    A multipage document file can optionally contain thumbnails for some or all
    pages.  These thumbnails are stored in special component files, each
    containing thumbnails for a number of consecutive pages.
Packit df99a1
Packit df99a1
    The thumbnail component file is composed of a single #"FORM:THUM"#
    containing one or more #"TH44"# chunks.  Each #"TH44"# chunk contains one
    IW44 encoded thumbnail image for one page (cf. \Ref{IW44Image.h}).
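
    The following Python sketch extracts the raw #"TH44"# payloads from a
    thumbnail component file.  It assumes that #form# starts directly at the
    #"FORM"# identifier and relies on the generic EA IFF 85 chunk layout; the
    function names are hypothetical and no IW44 decoding is attempted.

```python
import struct


def iter_chunks(body: bytes):
    """Yield (chunk_id, data) for each chunk in a composite chunk body."""
    pos = 0
    while pos + 8 <= len(body):
        chunk_id = body[pos:pos + 4]
        (size,) = struct.unpack_from(">I", body, pos + 4)
        yield chunk_id, body[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)  # skip the pad byte after odd sizes


def thumbnail_payloads(form: bytes) -> list:
    """Return the TH44 payloads of a FORM:THUM, in file order."""
    if form[:4] != b"FORM" or form[8:12] != b"THUM":
        raise ValueError("not a FORM:THUM component file")
    (size,) = struct.unpack_from(">I", form, 4)
    body = form[12:8 + size]  # the FORM size includes the "THUM" identifier
    return [data for cid, data in iter_chunks(body) if cid == b"TH44"]
```

    Each returned payload corresponds to one page, in the order of the
    consecutive pages covered by the component file.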
Packit df99a1
Packit df99a1
Packit df99a1
------------------------------------------------------------
2.8 - OUTLINES/BOOKMARKS
------------------------------------------------------------
Packit df99a1
Packit df99a1
[MAY 19th, 2005
 Multipage files (FORM:DJVM) can contain an
 additional chunk "NAVM" located after the "DIRM" chunk.
 The NAVM chunk contains outlines and bookmarks.
 See the files libdjvu/DjVmNav.h and libdjvu/DjVmNav.cpp]