Packit	df99a1
Packit	df99a1
Packit	df99a1
Packit	df99a1
Packit	df99a1	`BY THE LIZARDTECH SPECIFICATION "DJVU3SPEC.DJVU".>`
Packit	df99a1
Packit	df99a1
Packit	df99a1
Packit	df99a1	`This file summarizes the file format changes`
Packit	df99a1	`between DjVu2 and DjVu3.`
Packit	df99a1
Packit	df99a1
Packit	df99a1
Packit	df99a1
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1	`1 - DJVU3 FILE STRUCTURE OVERVIEW`
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1
Packit	df99a1	DjVu files are organized according to the ``EA IFF 85'' layout. Pointers to
Packit	df99a1	`the appropriate reference document are provided in section`
Packit	df99a1	`\Ref{IFFByteStream.h}. IFF files are logically composed of a sequence of`
Packit	df99a1	`data \emph{chunks}. Each chunk comes with a four character \emph{chunk`
Packit	df99a1	`identifier} describing the type of the data stored in the chunk. A few`
Packit	df99a1	`special chunk identifiers, for instance #"FORM"#, are reserved for so`
Packit	df99a1	`called \emph{composite chunks} containing a sequence of data chunks. This`
Packit	df99a1	`convention effectively provides IFF files with a hierarchical structure.`
Packit	df99a1	`Composite chunks are further identified by a \emph{secondary chunk`
Packit	df99a1	`identifier}. For convenience, both identifiers are gathered as an`
Packit	df99a1	`extended chunk identifier such as #"FORM:DJVU"#.`
Packit	df99a1
Packit	df99a1	`The four octets #0x41,0x54,0x26,0x54# may be inserted in front of the IFF`
Packit	df99a1	`compliant byte stream. The decoder simply ignores these four octets when`
Packit	df99a1	`they are present. These four octets are not part of the IFF format and`
Packit	df99a1	`are not required components of a valid DjVu file. Certain versions of MSIE`
Packit	df99a1	`incorrectly recognize any IFF file as a Microsoft AIFF sound file. The`
Packit	df99a1	`presence of these four octets prevents this incorrect identification.`
Packit	df99a1
Packit	df99a1	`The DjVu specification mandates that the decoder should silently`
Packit	df99a1	`skip chunks whose identifier is not recognized. This mechanism`
Packit	df99a1	`provides a backward compatible way to extend the initial format by`
Packit	df99a1	`allocating new chunk identifiers.`
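
The chunk layout described above can be illustrated with a short Python sketch. It is only an illustration, not the reference decoder. It assumes the usual EA IFF 85 encoding (a four character identifier, a 32-bit big-endian length, and one pad byte after odd-length data), and the names `iter_chunks` and `parse_djvu` are hypothetical.

```python
import struct

# The four optional octets 0x41,0x54,0x26,0x54 that may precede the IFF data.
DJVU_MAGIC = bytes([0x41, 0x54, 0x26, 0x54])
COMPOSITE_IDS = {b"FORM"}  # this sketch only handles the FORM composite chunk


def iter_chunks(data, start=0, end=None, depth=0):
    """Yield (depth, extended_id, payload) for every chunk in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack(">I", data[pos + 4:pos + 8])  # big-endian length
        body_start = pos + 8
        body_end = min(body_start + size, end)  # tolerate truncated input
        if chunk_id in COMPOSITE_IDS:
            # A composite chunk starts with its secondary identifier.
            secondary = data[body_start:body_start + 4]
            name = (chunk_id + b":" + secondary).decode("ascii", "replace")
            yield depth, name, b""
            yield from iter_chunks(data, body_start + 4, body_end, depth + 1)
        else:
            yield depth, chunk_id.decode("ascii", "replace"), data[body_start:body_end]
        pos = body_end + (size & 1)  # skip the pad byte after odd-length data


def parse_djvu(data):
    """Drop the optional four octets, then list all chunks."""
    if data.startswith(DJVU_MAGIC):
        data = data[4:]
    return list(iter_chunks(data))
```

Because every chunk carries its own length, a reader built this way can step over a chunk whose identifier it does not recognize without understanding its payload.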
Packit	df99a1
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1	`1.1 - DJVU3 IMAGE FILES`
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1
Packit	df99a1	`\textbf{Photo DjVu Image} ---`
Packit	df99a1
Packit	df99a1	`Photo DjVu Image files are best used for`
Packit	df99a1	`encoding photographic images in colors or in shades of gray. The data`
Packit	df99a1	`compression model relies on the IW44 wavelet representation. This format`
Packit	df99a1	`is designed such that the IW44 decoder is able to quickly perform`
Packit	df99a1	`progressive rendering of any image segment using only a small amount of`
Packit	df99a1	`memory. Photo DjVu files are composed of a single #"FORM:DJVU"# composite`
Packit	df99a1	`chunk. This composite chunk always begins with one #"INFO"# chunk`
Packit	df99a1	`describing the image size and resolution (see \Ref{DjVuInfo.h}). One or`
Packit	df99a1	`more additional #"BG44"# chunks contain the image data encoded with the`
Packit	df99a1	`IW44 representation (see \Ref{IW44Image.h}). The image size specified in`
Packit	df99a1	`the #"INFO"# chunk and the image size specified in the IW44 data must be`
Packit	df99a1	`equal.`
Packit	df99a1
Packit	df99a1	`\textbf{Bilevel DjVu Image} ---`
Packit	df99a1
Packit	df99a1	`Bilevel DjVu Image files are used to compress`
Packit	df99a1	`black and white images representing text and simple drawings. The`
Packit	df99a1	`JB2 data compression model uses the soft pattern matching technique, which`
Packit	df99a1	`essentially consists of encoding each character by describing how it`
Packit	df99a1	`differs from a well chosen already encoded character. Bilevel DjVu Files`
Packit	df99a1	`are composed of a single #"FORM:DJVU"# composite chunk. This composite`
Packit	df99a1	`chunk always begins with one #"INFO"# chunk describing the image size and`
Packit	df99a1	`resolution (see \Ref{DjVuInfo.h}). An additional #"Sjbz"# chunk contains`
Packit	df99a1	`the bilevel data encoded with the JB2 representation (see`
Packit	df99a1	`\Ref{JB2Image.h}). The image size specified in the #"INFO"# chunk and the`
Packit	df99a1	`image size specified in the JB2 data must be equal.`
Packit	df99a1
Packit	df99a1	`\textbf{Compound DjVu Image} ---`
Packit	df99a1
Packit	df99a1	`Compound DjVu Files are an extremely`
Packit	df99a1	`efficient way to compress high resolution compound document images`
Packit	df99a1	`containing both pictures and text, such as a page of a magazine. Compound`
Packit	df99a1	`DjVu Files represent the document images using two layers. The`
Packit	df99a1	`\emph{background layer} is used for encoding the pictures and the`
Packit	df99a1	`paper texture.`
Packit	df99a1	`The \emph{foreground layer} is used for encoding the text and the drawings.`
Packit	df99a1	`Compound DjVu Files are composed of a single #"FORM:DJVU"# composite`
Packit	df99a1	`chunk. This composite chunk always begins with one #"INFO"# chunk`
Packit	df99a1	`describing the size and the resolution of the image (see \Ref{DjVuInfo.h}).`
Packit	df99a1	`Additional chunks hold the components of either the foreground or the`
Packit	df99a1	`background layers.`
Packit	df99a1
Packit	df99a1	`The main component of the foreground layer is a bilevel image named the`
Packit	df99a1	`\emph{foreground mask}. The pixel size of the foreground mask is equal to`
Packit	df99a1	`the size of the DjVu image. It contains a black-on-white representation`
Packit	df99a1	`of the text and the drawings. This image is encoded by a #"Sjbz"# chunk`
Packit	df99a1	`using the JB2 representation. There may also be a companion chunk`
Packit	df99a1	`#"Djbz"# containing a \emph{shape dictionary} that defines bilevel shapes`
Packit	df99a1	`referenced by the #"Sjbz"# chunk.`
Packit	df99a1
Packit	df99a1	`The \emph{foreground colors} can be encoded according to two models:`
Packit	df99a1	`\begin{itemize}`
Packit	df99a1	`\item`
Packit	df99a1	`The foreground colors may be encoded using a small color image,`
Packit	df99a1	`the \emph{foreground color image}, encoded as a single #"FG44"#`
Packit	df99a1	`chunk using the`
Packit	df99a1	`IW44 representation (see \Ref{IW44Image.h}). Such compound DjVu images`
Packit	df99a1	`are rendered by painting the foreground color image on top of the`
Packit	df99a1	`background color image using the foreground mask as a stencil. The`
Packit	df99a1	`pixel size of the foreground color image is computed by rounding up the`
Packit	df99a1	`quotient of the mask size by an integer sub-sampling factor ranging from`
Packit	df99a1	`1 to 12. Most Compound DjVu Images use a foreground color sub-sampling`
Packit	df99a1	`factor of 12. Smaller sub-sampling factors produce very slightly better`
Packit	df99a1	`images.`
Packit	df99a1	`\item`
Packit	df99a1	`The foreground colors may be encoded by specifying one solid color per`
Packit	df99a1	`object described by the JB2 encoded mask. These \emph{JB2 colors} are`
Packit	df99a1	`color-quantized and stored in a single #"FGbz"# chunk (see`
Packit	df99a1	`\Ref{DjVuPalette.h}). Such compound DjVu images are rendered by`
Packit	df99a1	`painting each foreground object on top of the background color image`
Packit	df99a1	`using the solid color specified by the #"FGbz"# chunk.`
Packit	df99a1	`\end{itemize}`
Packit	df99a1
Packit	df99a1	`The background layer is a color image, the \emph{background color image},`
Packit	df99a1	`encoded by an arbitrary number of #"BG44"# chunks containing successive`
Packit	df99a1	`IW44 refinements (see \Ref{IW44Image.h}). The size of this image is`
Packit	df99a1	`computed by rounding up the quotient of the mask size by an integer`
Packit	df99a1	`sub-sampling factor ranging from 1 to 12. Most Compound DjVu Images use a`
Packit	df99a1	`background sub-sampling factor equal to 3. Smaller sub-sampling factors`
Packit	df99a1	`are adequate for images with a very rich paper texture. Larger`
Packit	df99a1	`sub-sampling factors are adequate for images containing no pictures.`
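
The size rule stated above for both the foreground color image and the background color image (round up the quotient of the mask size by the sub-sampling factor) amounts to a ceiling division. The following minimal Python sketch shows the arithmetic; the function name `subsampled_size` is hypothetical.

```python
def subsampled_size(mask_width, mask_height, factor):
    """Return the (width, height) of a layer sub-sampled by an integer factor."""
    if not 1 <= factor <= 12:
        raise ValueError("sub-sampling factor must range from 1 to 12")
    # -(-a // b) is ceiling division on integers: the quotient rounded up.
    return (-(-mask_width // factor), -(-mask_height // factor))
```

For a 2550 x 3300 pixel mask, a factor of 3 gives 850 x 1100 pixels and a factor of 12 gives 213 x 275 pixels.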
Packit	df99a1
Packit	df99a1	`There are no ordering or interleaving constraints on these chunks except`
Packit	df99a1	`that (a) the #"INFO"# chunk must appear first, and (b) the successive`
Packit	df99a1	`#"BG44"# refinements must appear with their natural order. The chunk`
Packit	df99a1	`order simply affects the progressive rendering of DjVu images on a web`
Packit	df99a1	`browser.`
Packit	df99a1
Packit	df99a1	`\textbf{IW44 Image Files} ---`
Packit	df99a1
Packit	df99a1	`The IW44 Image file format is the native format for the IW44 wavelet`
Packit	df99a1	`representation. These files are deprecated in favor of Photo DjVu`
Packit	df99a1	`Images.`
Packit	df99a1
Packit	df99a1	`\textbf{Alternative encodings} ---`
Packit	df99a1
Packit	df99a1	`Besides the JB2 and IW44 encoding schemes,`
Packit	df99a1	`the DjVu format supports alternative encoding methods for its components.`
Packit	df99a1
Packit	df99a1	`\begin{itemize}`
Packit	df99a1	`\item`
Packit	df99a1	`The foreground mask may be represented by a single #"Smmr"# chunk`
Packit	df99a1	`instead of #"Sjbz"#. The #"Smmr"# chunk contains a bilevel image`
Packit	df99a1	`encoded with the Fax-G4/MMR method. Although the resulting files`
Packit	df99a1	`are typically six times larger, this capability can be useful when`
Packit	df99a1	`DjVu is used as a front-end for fax machines and scanners with`
Packit	df99a1	`embedded Fax-G4/MMR capabilities.`
Packit	df99a1	`\item`
Packit	df99a1	`The background color image may be represented by a single #"BGjp"#`
Packit	df99a1	`chunk instead of several #"BG44"# chunks. The #"BGjp"# chunk contains`
Packit	df99a1	`a JPEG encoded color image. The resulting files are significantly`
Packit	df99a1	`larger and lack the progressivity of the usual DjVu files.`
Packit	df99a1	`This is useful because some scanners have embedded JPEG capabilities.`
Packit	df99a1	`\item`
Packit	df99a1	`The foreground color image may be represented by a single #"FGjp"#`
Packit	df99a1	`chunk instead of a single #"FG44"# chunk. This is useful because`
Packit	df99a1	`some scanners have embedded JPEG capabilities.`
Packit	df99a1	`\end{itemize}`
Packit	df99a1
Packit	df99a1	`In addition, the chunk names #"BG2k"# and #"FG2k"# have been reserved for`
Packit	df99a1	`encoding the background color image and the foreground color image using`
Packit	df99a1	`the forthcoming JPEG-2000 standard. This capability is not implemented at`
Packit	df99a1	`the moment. The JPEG-2000 standard may even become the preferred encoding`
Packit	df99a1	`method for color images in DjVu.`
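
As a rough illustration of how the chunk identifiers listed in this section relate to the three kinds of DjVu image, the Python sketch below guesses the image type from the identifiers found in one #"FORM:DJVU"#. It is a heuristic under stated assumptions, not a rule taken from the specification: the function name `classify_djvu_image` is hypothetical, and a real decoder also has to cope with included and unknown chunks.

```python
MASK_IDS = {"Sjbz", "Smmr"}                   # foreground mask encodings
BACKGROUND_IDS = {"BG44", "BGjp"}             # background color image encodings
FOREGROUND_COLOR_IDS = {"FG44", "FGbz", "FGjp"}


def classify_djvu_image(chunk_ids):
    """Guess the image kind from the ordered chunk identifiers of one FORM:DJVU."""
    if not chunk_ids or chunk_ids[0] != "INFO":
        raise ValueError("the INFO chunk must appear first")
    ids = set(chunk_ids[1:])
    has_mask = bool(ids & MASK_IDS)
    has_background = bool(ids & BACKGROUND_IDS)
    has_foreground_color = bool(ids & FOREGROUND_COLOR_IDS)
    if has_mask and (has_background or has_foreground_color):
        return "compound"
    if has_mask:
        return "bilevel"
    if has_background:
        return "photo"
    return "unknown"
```

The only ordering check made here is that #"INFO"# comes first. The natural order of successive #"BG44"# refinements cannot be verified from the identifiers alone.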
Packit	df99a1
Packit	df99a1	`\textbf{Annotations and Textual Information} ---`
Packit	df99a1
Packit	df99a1	`All types of DjVu images may contain`
Packit	df99a1	`annotation chunks. Annotation chunks are currently used to describe`
Packit	df99a1	`hyperlinks, to specify more closely the behavior of the viewers,`
Packit	df99a1	`and to hold metadata information. Annotations are contained in #"ANTa"#`
Packit	df99a1	`or #"ANTz"# chunks. The #"ANTa"# chunks contain the annotation in`
Packit	df99a1	`plain text. The #"ANTz"# chunks contain the same information compressed`
Packit	df99a1	`with the BZZ encoder (cf. \Ref{BSByteStream.h}).`
Packit	df99a1
Packit	df99a1	`All types of DjVu image files may also contain a`
Packit	df99a1	`computer readable description of the text appearing on the page. This`
Packit	df99a1	`information is contained by either a #"TXTa"# chunk or #"TXTz"# chunk.`
Packit	df99a1	`The #"TXTa"# chunk contains uncompressed data. The #"TXTz"# chunk`
Packit	df99a1	`contains the same data compressed with the \Ref{bzz} compressor`
Packit	df99a1	`(cf. \Ref{BSByteStream.h}). The #"TXTa"# chunk begins with a 24-bit`
Packit	df99a1	`integer (most significant byte first) describing the length of the text in`
Packit	df99a1	`bytes. Then comes the ISO10646/UTF8 text. Additional information`
Packit	df99a1	`indicates the position of each column/region/paragraph/line/word in the`
Packit	df99a1	`document. More information about the capabilities of the chunk can be`
Packit	df99a1	`found in section \Ref{DjVuTXT}. More information about the encoding of`
Packit	df99a1	`textual information can be found in file #"DjVuAnno.cpp"#.`
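
The start of an uncompressed #"TXTa"# payload, as described above, can be read with a few lines of Python. This is a minimal sketch: the function name `read_txt_text` is hypothetical, and the position information that follows the text is returned undecoded because its layout is not described here.

```python
def read_txt_text(payload):
    """Split a TXTa payload into its UTF-8 text and the remaining bytes."""
    if len(payload) < 3:
        raise ValueError("payload too short for the 24-bit length field")
    length = int.from_bytes(payload[:3], "big")  # most significant byte first
    text_bytes = payload[3:3 + length]
    if len(text_bytes) != length:
        raise ValueError("payload shorter than the announced text length")
    return text_bytes.decode("utf-8"), payload[3 + length:]
```

A #"TXTz"# chunk would first have to be decompressed with a BZZ decoder before this function applies.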
Packit	df99a1
Packit	df99a1
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1	`1.2 - DJVU3 MULTIPAGE DOCUMENTS`
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1
Packit	df99a1	`The DjVu3 system supports two models for multi-page documents:`
Packit	df99a1	`\emph{bundled} multi-page documents and \emph{indirect} multi-page documents.`
Packit	df99a1
Packit	df99a1	`\textbf{Bundled multi-page documents} ---`
Packit	df99a1
Packit	df99a1	`A \emph{bundled} multi-page DjVu`
Packit	df99a1	`document uses a single file to represent the entire document. This single`
Packit	df99a1	`file contains all the pages as well as ancillary information (e.g. the`
Packit	df99a1	`page directory, data shared by several pages, thumbnails, etc.). Using a`
Packit	df99a1	`single file format is very convenient for storing documents or for sending`
Packit	df99a1	`email attachments.`
Packit	df99a1
Packit	df99a1	`A bundled multi-page document is composed of a single #"FORM:DJVM"#`
Packit	df99a1	`composite chunk. This composite chunk always begins with a #"DIRM"# chunk`
Packit	df99a1	`containing the document directory (see \Ref{DjVmDir.h}) which represents`
Packit	df99a1	`the list of the \emph{component files} that compose the document. The`
Packit	df99a1	`component files themselves are then encoded as IFF85 composite chunks`
Packit	df99a1	`following the #"DIRM"# chunk.`
Packit	df99a1
Packit	df99a1	`\begin{itemize}`
Packit	df99a1	`\item`
Packit	df99a1	`Component files may be any valid DjVu image (see \Ref{DjVu Image Files})`
Packit	df99a1	`or IW44 image (see \Ref{IW44 Image Files}.) These component files`
Packit	df99a1	`always represent a page of a document. The corresponding IFF85 chunk ids are`
Packit	df99a1	`#"FORM:DJVU"#, #"FORM:PM44"#, or #"FORM:BM44"#.`
Packit	df99a1	`\item`
Packit	df99a1	`Component files may contain shared information indirectly referenced by`
Packit	df99a1	`some document pages. These \emph{shared component files} are always composed`
Packit	df99a1	`of a single #"FORM:DJVI"# chunk containing an arbitrary collection of`
Packit	df99a1	`chunks.`
Packit	df99a1	`\item`
Packit	df99a1	`Thumbnail files contain optional thumbnail images for a few consecutive`
Packit	df99a1	`pages of the document. Thumbnail files consist of a single`
Packit	df99a1	`#"FORM:THUM"# composite chunk containing several #"TH44"# chunks`
Packit	df99a1	`containing IW44 encoded thumbnail images (see \Ref{IW44Image.h}). These`
Packit	df99a1	`thumbnails always pertain to the first few page files following the`
Packit	df99a1	`thumbnail file in the document directory.`
Packit	df99a1	`\end{itemize}`
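
The three kinds of component files listed above, and the rule that thumbnails pertain to the page files following the thumbnail file, can be sketched in Python as follows. The data model is an assumption made for the example: each component is a hypothetical tuple `(name, form_id, th44_count)` given in directory order, and the function names are not part of any library.

```python
PAGE_FORMS = {"FORM:DJVU", "FORM:PM44", "FORM:BM44"}


def component_kind(form_id):
    """Map the extended chunk identifier of a component file to its role."""
    if form_id in PAGE_FORMS:
        return "page"
    if form_id == "FORM:DJVI":
        return "shared"
    if form_id == "FORM:THUM":
        return "thumbnails"
    return "unknown"


def thumbnail_targets(components):
    """Map each thumbnail file name to the names of the pages it covers."""
    result = {}
    for index, (name, form_id, th44_count) in enumerate(components):
        if component_kind(form_id) != "thumbnails":
            continue
        # Thumbnails pertain to the first page files that follow this file.
        following_pages = [
            n for n, f, _ in components[index + 1:] if component_kind(f) == "page"
        ]
        result[name] = following_pages[:th44_count]
    return result
```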
Packit	df99a1
Packit	df99a1	`\textbf{Including shared information} ---`
Packit	df99a1
Packit	df99a1	`Any DjVu image file contained in a multipage file may contain an #"INCL"#`
Packit	df99a1	`chunk containing the ID of a shared component file. The decoder processes`
Packit	df99a1	`the chunks contained in the shared component file as if they were`
Packit	df99a1	`contained by the DjVu image file.`
Packit	df99a1
Packit	df99a1	`A shared component file is composed of a single #"FORM:DJVI"# potentially`
Packit	df99a1	`containing any information otherwise allowed in a DjVu image file (except`
Packit	df99a1	`for the #"INFO"# chunk of course).`
Packit	df99a1
Packit	df99a1	`There are many benefits associated with storing such shared information in`
Packit	df99a1	`separate files. A well designed browser may keep pre-decoded copies of`
Packit	df99a1	`these files in a cache. This procedure would reduce the size of the data`
Packit	df99a1	`transferred over the Internet and also increase the display speed. The`
Packit	df99a1	`multipage DjVu compressor, for instance, identifies similar object shapes`
Packit	df99a1	`occurring in several pages. These shapes are encoded in a shape dictionary`
Packit	df99a1	`(chunk #"Djbz"#) placed in a shared component file. All relevant pages`
Packit	df99a1	`include this shared component file. Although they appear in several`
Packit	df99a1	`pages, these shared shapes are encoded only once in the document.`
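
The inclusion mechanism can be pictured as a simple substitution. The Python sketch below works on an assumed in-memory model, where a file is a list of `(chunk_id, payload)` pairs and `shared_files` maps component IDs to such lists. It also assumes that the #"INCL"# payload is the component ID as UTF-8 text. The function name `expand_includes` is hypothetical.

```python
def expand_includes(chunks, shared_files, _seen=None):
    """Return the chunks of a page with every INCL chunk replaced by the
    chunks of the shared component file it names."""
    seen = set() if _seen is None else _seen
    expanded = []
    for chunk_id, payload in chunks:
        if chunk_id != "INCL":
            expanded.append((chunk_id, payload))
            continue
        file_id = payload.decode("utf-8")
        if file_id in seen:
            continue  # design choice of this sketch: expand each file once
        seen.add(file_id)
        expanded.extend(expand_includes(shared_files[file_id], shared_files, seen))
    return expanded
```

With a shared #"Djbz"# dictionary, for example, the decoder sees the dictionary chunk in the page before the #"Sjbz"# chunk that refers to it.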
Packit	df99a1
Packit	df99a1	`\textbf{Browsing a multi-page document} ---`
Packit	df99a1
Packit	df99a1	`You can view the pages using the DjVu plugin and a web browser. When you`
Packit	df99a1	`type the URL of a multi-page document, the browser starts downloading the`
Packit	df99a1	`whole file, but displays the first page as soon as it is available. You`
Packit	df99a1	`can immediately navigate to other pages using the DjVu toolbar. Suppose`
Packit	df99a1	`however that the document is stored on a remote web server. You can`
Packit	df99a1	`easily access the first page and see that this is not the document you`
Packit	df99a1	`wanted. Although you will never display the other pages, the browser is`
Packit	df99a1	`transferring data for these pages and is wasting the bandwidth of your`
Packit	df99a1	`server (and the bandwidth of the Internet too). You could also see the`
Packit	df99a1	`summary of the document on the first page and jump to page 100. But page`
Packit	df99a1	`100 cannot be displayed until data for pages 1 to 99 has been received.`
Packit	df99a1	`You may have to wait for the transmission of unnecessary page data. This`
Packit	df99a1	second problem (the unnecessary wait) can be solved using the ``byte
Packit	df99a1	`serving'' option of the HTTP/1.1 protocol. This option has to be`
Packit	df99a1	`supported by the web server, the proxies, the caches and the browser. We`
Packit	df99a1	`are getting there, but not quite yet. Byte serving, however, does not solve`
Packit	df99a1	`the first problem (the waste of bandwidth).`
Packit	df99a1
Packit	df99a1	`\textbf{Indirect multi-page documents} ---`
Packit	df99a1
Packit	df99a1	`DjVu solves both problems using a`
Packit	df99a1	`special multi-page format named the \emph{indirect} model. An indirect`
Packit	df99a1	`multi-page DjVu document is composed of several files. The main file is`
Packit	df99a1	`named the \emph{index file}. You can browse a document using the URL of`
Packit	df99a1	`the index file, just like you do with a bundled multi-page document. The`
Packit	df99a1	`index file however is very small. It simply contains the document`
Packit	df99a1	`directory and the URLs of secondary files containing the page data. When`
Packit	df99a1	`you browse an indirect multi-page document, the browser only accesses data`
Packit	df99a1	`for the pages you are viewing. This can be done at a reasonable speed`
Packit	df99a1	`because the browser maintains a cache of pages and sometimes pre-fetches a`
Packit	df99a1	`few pages ahead of the current page. This model uses the web serving`
Packit	df99a1	`bandwidth much more effectively. It also eliminates unnecessary delays`
Packit	df99a1	`when jumping ahead to pages located anywhere in a long document.`
Packit	df99a1
Packit	df99a1	`\textbf{Obsolete Formats} ---`
Packit	df99a1
Packit	df99a1	`The library also supports two other multipage`
Packit	df99a1	`formats which are now obsolete. These formats are technologically`
Packit	df99a1	`inferior and should no longer be used.`
Packit	df99a1
Packit	df99a1
Packit	df99a1
Packit	df99a1
Packit	df99a1
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1	`2 - CHUNK ENCODING`
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1
Packit	df99a1	`This section describes`
Packit	df99a1	`- the encoding of new chunks introduced with DjVu3`
Packit	df99a1	`- the encoding changes of chunks already present in DjVu2`
Packit	df99a1
Packit	df99a1
Packit	df99a1
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1	`2.1 - CHANGES TO JB2 ( "Sjbz" AND "Djbz" CHUNKS )`
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1
Packit	df99a1	`Two extensions of the JB2 encoding format have been introduced`
Packit	df99a1	`with DjVu files version 21. Both extensions maintain significant`
Packit	df99a1	`backward compatibility with previous versions of the JB2 format.`
Packit	df99a1	`These extensions are described below by reference to the DjVu2 spec`
Packit	df99a1	`dated August 1999. Both extensions make use of the unused record`
Packit	df99a1	`type value #9# (cf. ICFDD page 24) which has been renamed`
Packit	df99a1	`#REQUIRED_DICT_OR_RESET#.`
Packit	df99a1
Packit	df99a1	`\textbf{Shared Shape Dictionaries} --- This extension provides`
Packit	df99a1	`support for sharing symbol definitions between the pages of a`
Packit	df99a1	`document. To achieve this objective, the JB2 image data chunk`
Packit	df99a1	`must be able to address symbols defined elsewhere by a JB2`
Packit	df99a1	`dictionary data chunk shared by all the pages of a document.`
Packit	df99a1
Packit	df99a1	`The arithmetically encoded JB2 image data logically consist of a`
Packit	df99a1	`sequence of records. The decoder processes these records in`
Packit	df99a1	`sequence and maintains a library of symbols which can be addressed`
Packit	df99a1	by the following records. The first record usually is a ``Start
Packit	df99a1	`Of Image'' record describing the size of the image.`
Packit	df99a1
Packit	df99a1	`Starting with version 21, a #REQUIRED_DICT_OR_RESET# (9) record`
Packit	df99a1	`type can appear \emph{before} the #START_OF_DATA# (0) record. The`
Packit	df99a1	`record type field is followed by a single number arithmetically`
Packit	df99a1	`encoded (cf. ICFDD page 26) using a sixteenth context (cf. ICFDD`
Packit	df99a1	`page 25). This record appears when the JB2 data chunk requires`
Packit	df99a1	`symbols encoded in a separate JB2 dictionary data chunk. The`
Packit	df99a1	`number (the \textbf{dictionary size}) indicates how many symbols`
Packit	df99a1	`should have been defined by the JB2 dictionary data chunk. The`
Packit	df99a1	`decoder should simply load these symbols in the symbol library and`
Packit	df99a1	`proceed as usual. New symbols potentially defined by the`
Packit	df99a1	`subsequent JB2 image data records will therefore be numbered with`
Packit	df99a1	`integers greater than or equal to the dictionary size.`
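
The numbering rule above can be pictured with a small Python model of the symbol library. This is only a sketch of the bookkeeping. It does not decode JB2 data, and the class name `SymbolLibrary` and its methods are hypothetical.

```python
class SymbolLibrary:
    """Symbol library preloaded from a shared dictionary."""

    def __init__(self, shared_symbols=(), dictionary_size=0):
        shared_symbols = list(shared_symbols)
        if len(shared_symbols) < dictionary_size:
            raise ValueError("the shared dictionary defines too few symbols")
        # Load the symbols announced by the REQUIRED_DICT_OR_RESET record.
        self.symbols = shared_symbols[:dictionary_size]
        self.dictionary_size = dictionary_size

    def add(self, shape):
        """Register a symbol defined by the image data and return its number."""
        self.symbols.append(shape)
        return len(self.symbols) - 1
```

With a dictionary size of 3, the shared symbols occupy numbers 0 to 2 and the first symbol defined by the page itself receives number 3.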
Packit	df99a1
Packit	df99a1	`The JB2 dictionary data format is a pure subset of the JB2 image`
Packit	df99a1	`data format. The #START_OF_DATA# (0) record always specifies an`
Packit	df99a1	`image width of zero and an image height of zero. The only allowed`
Packit	df99a1	`record types are those defining library symbols only`
Packit	df99a1	`(#NEW_SYMBOL_LIBRARY_ONLY# (2) and #MATCHED_REFINE_LIBRARY_ONLY#`
Packit	df99a1	`(5) cf. ICFDD page 24) followed by a final #END_OF_DATA# (11)`
Packit	df99a1	`record.`
Packit	df99a1
Packit	df99a1	`The JB2 dictionary data is usually located in a \textbf{Djbz} chunk.`
Packit	df99a1	`Each page \textbf{FORM:DJVU} may directly contain a \textbf{Djbz} chunk,`
Packit	df99a1	`or may indirectly point to such a chunk using an \textbf{INCL} chunk`
Packit	df99a1	`(cf. \Ref{Multipage DjVu documents}).`
Packit	df99a1
Packit	df99a1	`\textbf{Numcoder Reset} --- This extension addresses a problem for`
Packit	df99a1	`hardware implementations. The encoding of numbers (cf. ICFDD page`
Packit	df99a1	`26) potentially uses an unbounded number of binary coding`
Packit	df99a1	`contexts. These contexts are normally allocated when they are used`
Packit	df99a1	`for the first time (cf. ICFDD informative note, page 27).`
Packit	df99a1
Packit	df99a1	`Starting with version 21, a #REQUIRED_DICT_OR_RESET# (9) record`
Packit	df99a1	`type can appear \emph{after} the #START_OF_DATA# (0) record. The`
Packit	df99a1	`decoder should proceed with the next record after \emph{clearing`
Packit	df99a1	`all binary contexts used for coding numbers}. This operation`
Packit	df99a1	`implies that all binary contexts previously allocated for coding`
Packit	df99a1	`numbers can be deallocated.`
Packit	df99a1
Packit	df99a1	`Starting with version 21, the JB2 encoder should insert a`
Packit	df99a1	`#REQUIRED_DICT_OR_RESET# record type whenever the number of these`
Packit	df99a1	`allocated binary contexts exceeds #20000#. Only very large`
Packit	df99a1	`documents ever reach such a large number of allocated binary`
Packit	df99a1	`contexts (e.g. large maps). Hardware implementations, however, can`
Packit	df99a1	`benefit greatly from a hard bound on the total number of binary`
Packit	df99a1	`coding contexts. Old JB2 decoders will treat this record type as`
Packit	df99a1	`an #END_OF_DATA# record and cleanly stop decoding (cf. ICFDD page`
Packit	df99a1	`30, Image refinement data).`
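
The encoder-side rule above (insert a reset once more than 20000 number-coding contexts are allocated) can be simulated with the Python sketch below. It models only the bookkeeping, under assumptions made for the example. Each record is a hypothetical pair `(record_type, context_keys)` naming the contexts it touches, the threshold is checked after each record, and no arithmetic coding takes place.

```python
REQUIRED_DICT_OR_RESET = 9
MAX_NUMCODER_CONTEXTS = 20000


class NumberContexts:
    """Binary contexts for coding numbers, allocated on first use."""

    def __init__(self):
        self._contexts = {}

    def get(self, key):
        # Allocate the context the first time it is used.
        return self._contexts.setdefault(key, [0])

    def __len__(self):
        return len(self._contexts)

    def reset(self):
        # Clearing the contexts allows all of them to be deallocated.
        self._contexts.clear()


def plan_resets(records):
    """Return the record types to emit, with reset records inserted."""
    contexts = NumberContexts()
    emitted = []
    for record_type, context_keys in records:
        for key in context_keys:
            contexts.get(key)
        emitted.append(record_type)
        if len(contexts) > MAX_NUMCODER_CONTEXTS:
            emitted.append(REQUIRED_DICT_OR_RESET)
            contexts.reset()
    return emitted
```

A decoder following the same scheme would call `reset()` whenever it meets record type 9 after the #START_OF_DATA# record.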
Packit	df99a1
Packit	df99a1
Packit	df99a1
Packit	df99a1
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1	`2.2 - JB2 COLORS ( "FGbz" CHUNK )`
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1
Packit	df99a1	`To be documented.`
Packit	df99a1
Packit	df99a1	`The #"FGbz"# chunk contains BZZ compressed data`
Packit	df99a1	`(cf. \Ref{BSByteStream.h}).`
Packit	df99a1
Packit	df99a1	`The uncompressed data can be decoded using function`
Packit	df99a1	`#DjVuPalette::decode# defined in file #"DjVuPalette.cpp"#.`
Packit	df99a1
Packit	df99a1
Packit	df99a1
Packit	df99a1
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1	`2.3 - ANNOTATIONS ( "ANTa" AND "ANTz" CHUNKS )`
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1
Packit	df99a1	`[MAY 19TH, 2005:`
Packit	df99a1	`New annotation types have been defined by Lizardtech.`
Packit	df99a1	`The authoritative documentation is now the djvused man page.]`
Packit	df99a1
Packit	df99a1
Packit	df99a1	`Annotations are contained in #"ANTa"#`
Packit	df99a1	`or #"ANTz"# chunks. The #"ANTa"# chunks contain the annotation in`
Packit	df99a1	`plain text. The #"ANTz"# chunks contain the same information compressed`
Packit	df99a1	`with the BZZ encoder (cf. \Ref{BSByteStream.h}).`
Packit	df99a1
Packit	df99a1	`The complete annotation text is obtained by concatenating all annotation`
Packit	df99a1	`chunks present in the page. Pages can share annotations using an INCL`
Packit	df99a1	`chunk as explained in section \Ref{Including shared information}.`
Packit	df99a1	`A restriction of the current reference library implementation`
Packit	df99a1	`limits the number of shared annotation files to one.`
Packit	df99a1
Packit	df99a1	`The syntax of the annotation text uses a simple`
Packit	df99a1	`parenthesized notation. Erroneous and unrecognized constructs are silently`
Packit	df99a1	`ignored. The following constructs are recognized:`
Packit	df99a1
Packit	df99a1	`\begin{description}`
Packit	df99a1	`\item[(background <color>)]`
Packit	df99a1	`Sets the color of the viewer area surrounding the DjVu image.`
Packit	df99a1	`The color argument #color# is always represented using X11`
Packit	df99a1	`syntax \##RRGGBB#. For instance \##000000# is black`
Packit	df99a1	`and \##FFFFFF# is white.`
Packit	df99a1
Packit	df99a1	`\item[(zoom <zoom-value>)]`
Packit	df99a1	`Sets the initial zoom factor of the image. Argument #zoom-value# may`
Packit	df99a1	`be #stretch#, #one2one#, #width#, #page#, or composed of the letter`
Packit	df99a1	`#"d"# followed by a number between #1# and #999# (such as in #d300# for`
Packit	df99a1	`instance.)`
Packit	df99a1
Packit	df99a1	`\item[(mode <mode-value>)]`
Packit	df99a1	`Sets the display mode for the image. Argument #mode-value# may`
Packit	df99a1	`be #color#, #bw#, #fore# or #back#.`
Packit	df99a1
Packit	df99a1	`\item[(align <horz-align> <vert-align>)]`
Packit	df99a1	`Specifies how the image should be aligned on the viewer surface.`
Packit	df99a1	`By default the image is located in the center. Argument #horz-align#`
Packit	df99a1	`may be #left#, #center#, or #right#. Argument #vert-align# may be`
Packit	df99a1	`#top#, #center#, or #bottom#.`
Packit	df99a1
Packit	df99a1	`\item[(maparea <url> <comment> <area> <...options...>)]`
Packit	df99a1	`Defines a hyperlink for the URL specified by argument #url#.`
Packit	df99a1
Packit	df99a1	`Argument #url# may have one of the following two forms:`
Packit	df99a1	`\begin{verbatim}`
Packit	df99a1	`"<href>"`
Packit	df99a1	`(url "<href>" "<target>")`
Packit	df99a1	`\end{verbatim}`
Packit	df99a1	`where #href# is a string representing the URL and #target# is a string`
Packit	df99a1	`representing the target frame for the hyperlink (cf. Documentation for`
Packit	df99a1	`the HTML tag #<A>#). Both strings are surrounded with double quotes.`
Packit	df99a1	`Argument #comment# is a string surrounded by double quotes.`
Packit	df99a1	`This string may be displayed as a tooltip when the user`
Packit	df99a1	`moves the mouse over the hyperlink.`
Packit	df99a1	`Argument #area# defines the shape of the hyperlink.`
Packit	df99a1	`The following options are supported for representing`
Packit	df99a1	`rectangles, ovals, or polygons.`
Packit	df99a1	`\begin{verbatim}`
Packit	df99a1	`(rect <xmin> <ymin> <width> <height>)`
Packit	df99a1	`(oval <xmin> <ymin> <width> <height>)`
Packit	df99a1	`(polygon <x0> <y0> <x1> <y1> ....)`
Packit	df99a1	`\end{verbatim}`
Packit	df99a1	`All parameters are numbers representing coordinates measured in image`
Packit	df99a1	`pixels with the origin set at the bottom left corner of the image. The`
Packit	df99a1	`remaining arguments describe options regarding the hyperlink borders.`
Packit	df99a1	`A first set of options defines the type of the borders:`
Packit	df99a1	`\begin{verbatim}`
Packit	df99a1	`(xor)`
Packit	df99a1	`(border <color>)`
Packit	df99a1	`(shadow_in [<thickness>])`
Packit	df99a1	`(shadow_out [<thickness>])`
Packit	df99a1	`(shadow_ein [<thickness>])`
Packit	df99a1	`(shadow_eout [<thickness>])`
Packit	df99a1	`\end{verbatim}`
Packit	df99a1	`where parameter #color# has syntax \##RRGGBB# (as above) and parameter`
Packit	df99a1	`#thickness# is a number from 1 to 32. The last four border modes are`
Packit	df99a1	`only supported with rectangular areas. The border becomes visible when`
Packit	df99a1	`the user moves the mouse over the hyperlink. The border may be made`
Packit	df99a1	`always visible by using the following option:`
Packit	df99a1	`\begin{verbatim}`
Packit	df99a1	`(border_avis)`
Packit	df99a1	`\end{verbatim}`
Packit	df99a1	`Finally the following option may be used with rectangular areas only.`
Packit	df99a1	`The complete area will be highlighted using the specified color (specified`
Packit	df99a1	`with syntax \##RRGGBB# as usual).`
Packit	df99a1	`\begin{verbatim}`
Packit	df99a1	`(hilite <color>)`
Packit	df99a1	`\end{verbatim}`
Packit	df99a1	`This is often used with an empty URL for simply emphasizing a specific`
Packit	df99a1	`segment of an image.`
Packit	df99a1
Packit	df99a1	`\item[(metadata <...entries...>)]`
Packit	df99a1	`Defines multiple metadata entries.`
Packit	df99a1
Packit	df99a1	`Each metadata entry has the form`
Packit	df99a1	`\begin{verbatim}`
Packit	df99a1	`(<key> "<value>")`
Packit	df99a1	`\end{verbatim}`
Packit	df99a1	`where parameter #<key># is a symbolic attribute name such as #year#,`
Packit	df99a1	`#booktitle#, #editor#, #author#, and parameter #<value>#`
Packit	df99a1	`is a UTF-8 encoded string representing the attribute value.`
Packit	df99a1	`Common C escape sequences are recognized.`
Packit	df99a1	`It is suggested to use the same key names as`
Packit	df99a1	`the BibTeX bibliography system.`
Packit	df99a1
Packit	df99a1	`Metadata pertaining to the entire document should be placed`
Packit	df99a1	`in a shared annotation file (and therefore are seen in all pages).`
Packit	df99a1	`Metadata pertaining to a particular page are usually placed`
Packit	df99a1	`inside an #"ANTz"# chunk in this particular page.`
Packit	df99a1
Packit	df99a1	`\end{description}`
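
The metadata entry syntax described above can be illustrated with a small
Python sketch. The function name #format_metadata# is hypothetical, and the
set of escapes it applies (backslash, double quote and newline only) is an
assumption made for this example; it is not taken from the DjVu tools.

```python
def format_metadata(entries):
    """Format a {key: value} mapping as a (metadata ...) annotation string."""
    def escape(value):
        # Escape the backslash first so that the escapes added
        # afterwards are not escaped a second time.
        return (value.replace("\\", "\\\\")
                     .replace('"', '\\"')
                     .replace("\n", "\\n"))

    parts = ['(%s "%s")' % (key, escape(value))
             for key, value in entries.items()]
    return "(metadata " + " ".join(parts) + ")"
```

The result is a Python string. It would still have to be encoded as UTF-8
before being stored in an annotation chunk.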
Packit	df99a1
Packit	df99a1
Packit	df99a1
Packit	df99a1
Packit	df99a1
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1	`2.4 - HIDDEN TEXT ( "TXTa" AND "TXTz" CHUNKS )`
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1
Packit	df99a1	`To be documented.`
Packit	df99a1
Packit	df99a1	`The #"TXTa"# chunk contains uncompressed data.`
Packit	df99a1	`The #"TXTz"# chunk contains BZZ compressed data (cf. \Ref{BSByteStream.h}).`
Packit	df99a1
Packit	df99a1	`The uncompressed data can be decoded using function #DjVuText::decode#`
Packit	df99a1	`defined in file #"DjVuText.cpp"#. Program #djvused# can display the content`
Packit	df99a1	`of the text chunk using a lisp syntax, and can create a text chunk from`
Packit	df99a1	`this lisp syntax.`
Packit	df99a1
Packit	df99a1
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1	`2.5 - MULTIPAGE DIRECTORY CHUNK ( "DIRM" CHUNK )`
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1
Packit	df99a1	`Multipage DjVu documents follow the EA`
Packit	df99a1	`IFF 85 format (cf. \Ref{IFFByteStream.h}). A document is composed of a`
Packit	df99a1	`#"FORM:DJVM"# whose first chunk is a #"DIRM"# chunk containing the`
Packit	df99a1	`\emph{document directory}. This directory lists all component`
Packit	df99a1	`files composing`
Packit	df99a1	`the given document, helps to access every component file and identify the`
Packit	df99a1	`pages of the document.`
Packit	df99a1
Packit	df99a1	`\begin{itemize}`
Packit	df99a1	`\item`
Packit	df99a1	`In a \emph{bundled} multipage file, the component files`
Packit	df99a1	`are stored immediately after the #"DIRM"# chunk,`
Packit	df99a1	`within the #"FORM:DJVM"# composite chunk.`
Packit	df99a1	`\item`
Packit	df99a1	`In an \emph{indirect} multipage file, the component files are`
Packit	df99a1	`stored in different files whose URLs are composed using information`
Packit	df99a1	`stored in the #"DIRM"# chunk.`
Packit	df99a1	`\end{itemize}`
Packit	df99a1
Packit	df99a1	`Most of the component files represent pages of a document. Some files`
Packit	df99a1	`however represent data shared by several pages. The pages refer to these`
Packit	df99a1	`supporting files by means of inclusion chunks (#"INCL"# chunks)`
Packit	df99a1	`identifying the supporting file. Every directory record describes a`
Packit	df99a1	`component file. Each component file is identified by a small string`
Packit	df99a1	`named the identifier (ID). Each component file also has a`
Packit	df99a1	`file name and a title.`
Packit	df99a1
Packit	df99a1	`Theoretically, IDs are used to uniquely identify each component file in`
Packit	df99a1	`#"INCL"# chunks, names are used to compose the URLs of the component`
Packit	df99a1	`files in an indirect multipage DjVu file, and titles are cosmetic names`
Packit	df99a1	`possibly displayed when viewing a page of a document. There are however`
Packit	df99a1	`many problems with this scheme, and we \emph{strongly suggest}, with the`
Packit	df99a1	`current implementation, always making the file ID, the file name and the`
Packit	df99a1	`file title identical.`
Packit	df99a1
Packit	df99a1	`\textbf{Variants} --- There are two versions of the #"DIRM"# chunk format.`
Packit	df99a1	`The version number is identified by the seven low bits of the first byte`
Packit	df99a1	`of the chunk. Version \textbf{0} is obsolete and should never be used. This`
Packit	df99a1	`section describes version \textbf{1}. There are two major multipage DjVu`
Packit	df99a1	`formats supported: \emph{bundled} and \emph{indirect}. The #"DIRM"# chunk`
Packit	df99a1	`indicates which format is used in the most significant bit of the first`
Packit	df99a1	`byte of the chunk. The document is bundled when this bit is set.`
Packit	df99a1	`Otherwise the document is indirect.`
Packit	df99a1
Packit	df99a1	`\textbf{Unencoded data} ---`
Packit	df99a1
Packit	df99a1	`The #"DIRM"# chunk is composed of some unencoded`
Packit	df99a1	`data followed by \Ref{bzz} encoded data. The unencoded data starts with`
Packit	df99a1	`the version byte and a 16 bit integer representing the number of component`
Packit	df99a1	`files. All integers are encoded with the most significant byte first.`
Packit	df99a1	`\begin{verbatim}`
Packit	df99a1	`BYTE: Flags/Version: 0b<bundled>0000001`
Packit	df99a1	`INT16: Number of component files.`
Packit	df99a1	`\end{verbatim}`
Packit	df99a1	`When the document is a bundled document (i.e. the flag #bundled# is set),`
Packit	df99a1	`this header is followed by the offsets of each of the component files within`
Packit	df99a1	`the #"FORM:DJVM"#. These offsets allow for random component file access.`
Packit	df99a1	`\begin{verbatim}`
Packit	df99a1	`INT32: Offset of first component file.`
Packit	df99a1	`INT32: Offset of second component file.`
Packit	df99a1	`...`
Packit	df99a1	`INT32: Offset of last component file.`
Packit	df99a1	`\end{verbatim}`
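
As a minimal sketch of the layout above, the following Python function
splits the unencoded part of a #"DIRM"# chunk from the BZZ encoded
remainder. The function name #parse_dirm_header# is hypothetical, and
reading the offsets as unsigned values is an assumption of this example.
The argument is the chunk payload, without the IFF chunk header.

```python
import struct


def parse_dirm_header(data):
    """Parse the unencoded part of a "DIRM" chunk payload.

    Returns (version, bundled, count, offsets, rest), where rest is
    the BZZ encoded remainder of the chunk (not decoded here).
    """
    # ">" selects big-endian byte order without alignment padding.
    flags, count = struct.unpack_from(">BH", data, 0)
    version = flags & 0x7F        # seven low bits of the first byte
    bundled = bool(flags & 0x80)  # most significant bit of the first byte
    pos = 3
    offsets = []
    if bundled:
        # One 32 bit offset per component file (assumed unsigned).
        offsets = list(struct.unpack_from(">%dI" % count, data, pos))
        pos += 4 * count
    return version, bundled, count, offsets, data[pos:]
```

For an indirect document the offset table is absent, so the BZZ encoded
data starts right after the three header bytes.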
Packit	df99a1
Packit	df99a1	`\textbf{BZZ encoded data} ---`
Packit	df99a1
Packit	df99a1	`The rest of the chunk is entirely compressed`
Packit	df99a1	`with the BZZ general purpose compressor. We now describe the data fed`
Packit	df99a1	`into (or retrieved from) the BZZ codec (cf. \Ref{BSByteStream.h}). First`
Packit	df99a1	`come the sizes and the flags associated with each component file.`
Packit	df99a1	`\begin{verbatim}`
Packit	df99a1	`INT24: Size of the first component file.`
Packit	df99a1	`INT24: Size of the second component file.`
Packit	df99a1	`...`
Packit	df99a1	`INT24: Size of the last component file.`
Packit	df99a1	`BYTE: Flag byte for the first component file.`
Packit	df99a1	`BYTE: Flag byte for the second component file.`
Packit	df99a1	`...`
Packit	df99a1	`BYTE: Flag byte for the last component file.`
Packit	df99a1	`\end{verbatim}`
Packit	df99a1	`The flag bytes have the following format:`
Packit	df99a1	`\begin{verbatim}`
Packit	df99a1	`0b<hasname><hastitle>000000 for a file included by other files.`
Packit	df99a1	`0b<hasname><hastitle>000001 for a file representing a page.`
Packit	df99a1	`0b<hasname><hastitle>000010 for a file containing thumbnails.`
Packit	df99a1	`\end{verbatim}`
Packit	df99a1	`Flag #hasname# is set when the name of the file is different from the file`
Packit	df99a1	`ID. Flag #hastitle# is set when the title of the file is different from`
Packit	df99a1	`the file ID. These flags are used to avoid encoding the same string three`
Packit	df99a1	`times. Then comes a sequence of zero terminated strings. There are one to`
Packit	df99a1	`three such strings per component file. The first string contains the ID`
Packit	df99a1	`of the component file. The second string contains the name of the`
Packit	df99a1	`component file. It is only present when the flag #hasname# is set. The third`
Packit	df99a1	`one contains the title of the component file. It is only present when the`
Packit	df99a1	`flag #hastitle# is set. The \Ref{bzz} encoding system makes sure that`
Packit	df99a1	`all these strings will be encoded efficiently despite their possible`
Packit	df99a1	`redundancies.`
Packit	df99a1	`\begin{verbatim}`
Packit	df99a1	`ZSTR: ID of the first component file.`
Packit	df99a1	`ZSTR: Name of the first component file (only if #hasname# is set.)`
Packit	df99a1	`ZSTR: Title of the first component file (only if #hastitle# is set.)`
Packit	df99a1	`...`
Packit	df99a1	`ZSTR: ID of the last component file.`
Packit	df99a1	`ZSTR: Name of the last component file (only if #hasname# is set.)`
Packit	df99a1	`ZSTR: Title of the last component file (only if #hastitle# is set.)`
Packit	df99a1	`\end{verbatim}`
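
The record layout above can be sketched in Python as follows. The input
is assumed to be the data already retrieved from the BZZ codec, since this
example does not implement BZZ decoding. The function name
#parse_dirm_records#, the dictionary keys, the kind labels and the choice
of decoding the strings as UTF-8 are all assumptions of this sketch.

```python
def parse_dirm_records(payload, count):
    """Parse the BZZ-decoded part of a "DIRM" chunk for count files."""
    pos = 0
    sizes = []
    for _ in range(count):
        # INT24, most significant byte first.
        sizes.append(int.from_bytes(payload[pos:pos + 3], "big"))
        pos += 3
    flags = list(payload[pos:pos + count])
    pos += count

    def read_zstr():
        # Read one zero terminated string and advance past the zero.
        nonlocal pos
        end = payload.index(b"\0", pos)
        text = payload[pos:end].decode("utf-8")  # encoding is assumed
        pos = end + 1
        return text

    kinds = {0: "include", 1: "page", 2: "thumbnails"}
    records = []
    for size, flag in zip(sizes, flags):
        ident = read_zstr()
        # Name and title default to the ID when their flag is clear.
        name = read_zstr() if flag & 0x80 else ident   # hasname
        title = read_zstr() if flag & 0x40 else ident  # hastitle
        records.append({"id": ident, "name": name, "title": title,
                        "size": size,
                        "kind": kinds.get(flag & 0x3F, "unknown")})
    return records
```

The argument #count# is the number of component files read from the
unencoded header of the chunk.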
Packit	df99a1
Packit	df99a1
Packit	df99a1
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1	`2.6 - INCLUDES ( "INCL" CHUNK )`
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1
Packit	df99a1	`The chunk simply contains the ASCII encoded ID`
Packit	df99a1	`of the included component file.`
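
For illustration, here is a minimal Python sketch that extracts the ID
from one raw #"INCL"# chunk. It assumes the usual IFF chunk layout (a four
character identifier followed by a 32 bit length, most significant byte
first, then the payload). The function name #read_incl_id# is hypothetical.

```python
import struct


def read_incl_id(chunk):
    """Return the component file ID stored in one raw "INCL" chunk."""
    # Chunk header: 4 byte identifier, then big-endian payload length.
    ident, length = struct.unpack_from(">4sI", chunk, 0)
    if ident != b"INCL":
        raise ValueError("not an INCL chunk: %r" % ident)
    # The length excludes the pad byte that may follow an odd payload.
    payload = chunk[8:8 + length]
    return payload.decode("ascii")
```

The returned string can then be looked up among the IDs listed in the
#"DIRM"# chunk.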
Packit	df99a1
Packit	df99a1
Packit	df99a1
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1	`2.7 - THUMBNAILS`
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1
Packit	df99a1
Packit	df99a1	`A multipage document file can optionally contain thumbnails for some or all`
Packit	df99a1	`pages. These thumbnails are stored in special component files, each`
Packit	df99a1	`containing thumbnails for a number of consecutive pages.`
Packit	df99a1
Packit	df99a1	`The thumbnail component file is composed of a single #"FORM:THUM"#`
Packit	df99a1	`containing one or more #"TH44"# chunks. Each #"TH44"# chunk contains one`
Packit	df99a1	`IW44 encoded thumbnail image for one page (cf. \Ref{IW44Image.h}).`
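
The structure of a thumbnail component file can be walked with a short
Python sketch such as the one below. It assumes the usual IFF conventions
(lengths stored most significant byte first, chunks padded to an even
length) and that the data starts directly at the #"FORM"# identifier. The
function name #list_th44_chunks# is hypothetical, and the IW44 payloads
are returned undecoded.

```python
import struct


def list_th44_chunks(data):
    """Return the payloads of the "TH44" chunks in a "FORM:THUM"."""
    ident, length, secondary = struct.unpack_from(">4sI4s", data, 0)
    if ident != b"FORM" or secondary != b"THUM":
        raise ValueError("not a FORM:THUM composite chunk")
    end = 8 + length  # the length includes the secondary identifier
    pos = 12
    thumbnails = []
    while pos + 8 <= end:
        chunk_id, size = struct.unpack_from(">4sI", data, pos)
        pos += 8
        if chunk_id == b"TH44":
            thumbnails.append(data[pos:pos + size])
        # Skip the payload and the pad byte after an odd sized payload.
        pos += size + (size & 1)
    return thumbnails
```

Each returned payload corresponds to one page, in the order in which the
chunks appear in the file.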
Packit	df99a1
Packit	df99a1
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1	`2.8 - OUTLINES/BOOKMARKS`
Packit	df99a1	`------------------------------------------------------------`
Packit	df99a1
Packit	df99a1	`[MAY 19th, 2005`
Packit	df99a1	`Multipage files (FORM:DJVM) can contain an`
Packit	df99a1	`additional chunk "NAVM" located after the "DIRM" chunk.`
Packit	df99a1	`The NAVM chunk contains outlines and bookmarks.`
Packit	df99a1	`See the files libdjvu/DjVmNav.h and libdjvu/DjVmNav.cpp]`
