.\" Copyright (c) 2001-2003 Leon Bottou, Yann Le Cun, Patrick Haffner,
.\" Copyright (c) 2001 AT&T Corp., and Lizardtech, Inc.
.\"
.\" This is free documentation; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License as
.\" published by the Free Software Foundation; either version 2 of
.\" the License, or (at your option) any later version.
.\"
.\" The GNU General Public License's references to "object code"
.\" and "executables" are to be interpreted as the output of any
.\" document formatting or typesetting system, including
.\" intermediate and printed output.
.\"
.\" This manual is distributed in the hope that it will be useful,
.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public
.\" License along with this manual. Otherwise check the web site
.\" of the Free Software Foundation at http://www.fsf.org.
.TH DJVU 1 "10/11/2001" "DjVuLibre-3.5" "DjVuLibre-3.5"
.de SS
.SH \\0\\0\\0\\$*
..
.SH NAME
DjVu \- DjVu and DjVuLibre.

.SH INTRODUCTION
Although the Internet has given us a worldwide infrastructure on which to
build the universal library, much of the world's knowledge, history, and
literature is still trapped on paper in the basements of the world's
traditional libraries. Many libraries and content owners are in the process of
digitizing their collections.  While many such efforts involve the painstaking
process of converting paper documents to computer-friendly form, such as
.SM SGML
based formats, the high cost of such conversions limits their
extent. Scanning documents and distributing the resulting images
electronically is not only considerably cheaper, but also more faithful to the
original document because it preserves its visual aspect.
.PP
Despite the quickly improving speed of network connections and computers, the
number of scanned document images accessible on the Web today is relatively
small. There are several reasons for this.
.PP
The first reason is the relatively high cost of scanning anything other than
unbound sheets in black and white. This problem is slowly going away with the
appearance of fast and low-cost color scanners with sheet feeders.
.PP
The second reason is that long-established image compression standards and
file formats have proved inadequate for distributing scanned documents at high
resolution, particularly color documents.  Not only are the file sizes and
download times impractical, the decoding and rendering times are also
prohibitive.  A typical magazine page scanned in color at 100 dpi in
.SM JPEG
would occupy 100
.SM KB
to 200
.SM KB
, but the text would be hardly readable: insufficient for screen viewing and
totally unacceptable for printing. The same page at 300 dpi would have
sufficient quality for viewing and printing, but the file size would be 300
.SM KB
to 1000
.SM KB
at best, which is impractical for remote access. Another major problem is that
a fully decoded 300 dpi color image of a letter-size page occupies 24
.SM MB
of memory and easily causes disk swapping.
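.PP
As a rough check of the 24
.SM MB
figure, assume a US letter page of 8.5 by 11 inches stored as uncompressed
24-bit color, that is, 3 bytes per pixel.  These are assumptions; the text
above does not state the page dimensions or the pixel depth.
.PP
.nf
    8.5 in x 300 dpi  =  2550 pixels wide
    11  in x 300 dpi  =  3300 pixels high
    2550 x 3300       =  8,415,000 pixels
    8,415,000 x 3     =  25,245,000 bytes
    25,245,000 / 2^20 =  about 24.1 MB
.fi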
.PP
The third reason is that digital documents are more than just a collection of
individual page images. Pages in a scanned document have a natural serial
order. Special provision must be made to ensure that flipping pages is
instantaneous and effortless so as to maintain a good user experience. Even
more important, most existing document formats force users to download the
entire document before displaying a chosen page.  However, users often
want to jump to individual pages of the document without waiting for the
entire document to download.  Efficient browsing requires efficient random
page access, fast sequential page flipping, and quick rendering. This can be
achieved with a combination of advanced compression, pre-fetching,
pre-decoding, caching, and progressive rendering. DjVu decomposes each page
into multiple components (text, backgrounds, images, libraries of common
shapes...)  that may be shared by several pages and downloaded on demand.  All
these requirements call for a very sophisticated but parsimonious control
mechanism to handle on-demand downloading, pre-fetching, decoding, caching,
and progressive rendering of the page images.  What is being considered here
is not just a document image compression technique, but a whole platform for
document delivery.
.PP
DjVu is an image compression technique, a document format, and a software
platform for delivering document images over the Internet that fulfills the
above requirements.

.SH DJVU IMAGE COMPRESSION
DjVu image compression is based on three technologies:
.SS DjVuPhoto
DjVuPhoto, also known as
.SM IW44,
is a wavelet-based continuous-tone image
compression technique with progressive decoding/rendering.  It is best used
for encoding photographic images in color or in shades of gray.  Images are
typically half the size of
.SM JPEG
images for the same distortion.
.SS DjVuBitonal
DjVuBitonal, also known as
.SM JB2,
is a bitonal image compression technique that takes
advantage of repetitions of nearly identical shapes on the page (such as
characters) to efficiently compress text images.  It is best used to compress
black and white images representing text and simple drawings.  A typical
300 dpi page in DjVuBitonal occupies 5 to 25
.SM KB
(3 to 8 times better than
.SM TIFF-G4
or
.SM PDF
).
.SS DjVuDocument
DjVuDocument is a compression technique specifically designed for color
digital document images containing both pictures and text, such as a page of
a magazine.  DjVuDocument represents images as separately compressed layers.
The foreground layer is usually compressed with DjVuBitonal and contains the
text and drawings.  The background layer is usually compressed with DjVuPhoto
and contains the background texture and the pictures at lower resolution.

.SH DJVU DOCUMENT DELIVERY PLATFORM
The DjVu technology is designed from the ground up to support the efficient
delivery of digital documents over the Internet.  It provides various ways to
deal with multi-page documents, and various ways to enrich the content with
hyper-links, meta-data, searchable text, etc.

.SS MIME types
The DjVu format has an official MIME type of
.BR image/vnd.djvu ,
which is the preferred content-type to be given by HTTP servers for
DjVu files.  Unofficial MIME types used historically are
.B image/x.djvu
and
.BR image/x-djvu ,
which may still be encountered.  Ideally, clients should be configured
to handle all three.  (For web server configuration help, see
.BR http://www.djvuzone.org/support/tutorial/chapter-authoring1.html .)
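.PP
As a minimal sketch, assuming an Apache httpd server (the choice of
server and the file extensions are assumptions, not taken from this
page), the official type can be declared with the
.B AddType
directive:
.PP
.nf
    # httpd.conf or .htaccess: serve DjVu files with the official type
    AddType image/vnd.djvu .djvu .djv
.fi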
.SS Bundled multi-page documents
A bundled multi-page DjVu document uses a single file to represent the entire
document.  This single file contains all the pages as well as ancillary
information (e.g. the page directory, data shared by several pages,
thumbnails, etc.).  Using a single file format is very convenient for storing
documents or for sending email attachments.
.PP
When you type the
.SM URL
of a multi-page document, the DjVu browser plugin starts
downloading the whole file, but displays the first page as soon as it is
available.  You can immediately navigate to other pages using the DjVu
toolbar.  Suppose however that the document is stored on a remote web server.
You can easily access the first page and see that this is not the document you
wanted.  Although you will never display the other pages, the browser is
transferring data for these pages and is wasting the bandwidth of your server
(and the bandwidth of the Internet too).  You could also see the summary of the
document on the first page and jump to page 100.  But page 100 cannot be
displayed until data for pages 1 to 99 has been received.  You may have to
wait for the transmission of unnecessary page data.  This second problem (the
unnecessary wait) can be solved using the ``byte serving'' option of the
.SM HTTP/1.1
protocol.  This option has to be supported by the web server, the
proxies, the caches and the browser.  Byte serving however does not solve the
first problem (the waste of bandwidth).
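.PP
Byte serving corresponds to
.SM HTTP/1.1
range requests.  As an illustration only, with a hypothetical host, file
name, and byte offsets, a client could ask for one slice of a bundled
file and a server supporting ranges could answer with a partial response:
.PP
.nf
    GET /docs/book.djvu HTTP/1.1
    Host: www.example.org
    Range: bytes=1048576-1114111

    HTTP/1.1 206 Partial Content
    Content-Range: bytes 1048576-1114111/8388608
    Content-Length: 65536
.fi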
.SS Indirect multi-page documents
Indirect multi-page DjVu documents solve both problems.  An indirect
multi-page DjVu document is composed of several files.  The main file is named
the index file.  You can browse a document using the
.SM URL
of the index file, just like you do with a bundled multi-page document.  The
index file however is very small.  It simply contains the document directory
and the
.SM URLs
of secondary files containing the page data.  When you browse an indirect
multi-page document, the browser only accesses data for the pages you are
viewing.  This can be done at a reasonable speed because the browser maintains
a cache of pages and sometimes pre-fetches a few pages ahead of the current
page.  This model uses the web serving bandwidth much more effectively.  It
also eliminates unnecessary delays when jumping ahead to pages located
anywhere in a long document.
.SS Annotations
Every DjVu image optionally includes so-called annotation chunks.  The
annotation chunk is often used to define hyper-links to other document pages or
to arbitrary web pages.  Annotation chunks can also be used for other purposes
such as setting the initial viewing mode of a page, defining highlighted zones, or
storing arbitrary meta-data about the page or the document.
.SS Hidden text
Every DjVu image optionally includes a hidden text layer that associates
graphical features with the corresponding text.  The hidden text layer is
usually generated by running Optical Character Recognition software.  This
textual information makes it possible to index DjVu documents and to copy and
paste text from DjVu page images.
.SS Thumbnails
DjVu documents sometimes contain pre-computed page thumbnails.
.SS Outline
DjVu documents sometimes contain a navigation chunk
containing an outline, that is, a hierarchical
table of contents with pointers to the corresponding
document pages.

.SH DJVUZONE AND DJVULIBRE
The DjVu technology was initially created by a few researchers at AT&T Labs
between 1995 and 1999.  Lizardtech, Inc.
.RB ( http://www.lizardtech.com )
then obtained a commercial license from AT&T and continued
the development.  They now have a variety of solutions for producing
and distributing documents using the DjVu technology.
.PP
The DjVuZone web site
.RB ( http://www.djvuzone.org )
is managed by the few AT&T Labs researchers who created the
DjVu technology in the first place.  We promote the DjVu technology
by providing an independent source of information about DjVu.
.PP
Understanding how little room there is for a proprietary document format,
Lizardtech released the DjVu Reference Library under the
.SM GNU
General Public License in December 2000.  This library entirely defines the
compression format and the elementary codecs.  Six months later, Lizardtech
released an updated DjVu Reference Library as well as the source code of the
Unix viewer.
.PP
These two releases form the basis of our initial DjVuLibre software.  We
modified the build system to comply with the expectations of the open source
community.  Various bugs and portability issues have been fixed.  We also
tried to make it simpler to use and install, while preserving the essential
structure of the Lizardtech releases.
.PP
The DjVuLibre software contains the following components:
.TP
.BR bzz (1)
A general purpose compression command line program.  Many internal DjVu data
structures are compressed using this technique.
.TP
.BR c44 (1)
A DjVuPhoto command line encoder. This state-of-the-art wavelet compressor
produces DjVuPhoto images from PPM or JPEG images.
.TP
.BR cjb2 (1)
A DjVuBitonal command line encoder. This soft-pattern-matching compressor
produces DjVuBitonal images from PBM images.  It can encode images without loss,
or introduce small changes in order to improve the compression ratio.  The
lossless encoding mode is competitive with that of the Lizardtech commercial
encoders.
.TP
.BR cpaldjvu (1)
A DjVuDocument command line encoder for images with few colors.  This encoder
is well suited to compressing images with a small number of distinct colors
(e.g. screen-shots).  The dominant color is encoded by the background layer.
The other colors are encoded by the foreground layer.
.TP
.BR csepdjvu (1)
A DjVuDocument command line encoder for separated images.  This encoder takes
a file containing pre-segmented foreground and background images and produces
a DjVuDocument image.
.TP
.BR ddjvu (1)
A command line decoder for DjVu images.  This program produces a
.SM PNM
image representing any segment of any page of a DjVu document at any
resolution.
.TP
.BR djview (1)
A stand-alone viewer for DjVu images.  This sophisticated viewer displays DjVu
documents.  It implements document navigation as well as fast zooming and
panning.
.TP
.BR nsdejavu (1)
A web browser plugin for viewing DjVu images.  This small plugin allows for
viewing DjVu documents from web browsers.  It internally uses djview to
perform the actual work.
.TP
.BR djvups (1)
A command line tool for converting DjVu documents into
PostScript.
.TP
.BR djvm (1)
A command line tool for manipulating bundled multi-page DjVu documents.  This
program is often used to collect individual pages and produce a bundled
document.
.TP
.BR djvmcvt (1)
A command line tool for converting bundled documents to indirect documents and
conversely.
.TP
.BR djvused (1)
A powerful command line tool for manipulating multi-page documents, creating
or editing annotation chunks, creating or editing hidden text layers,
pre-computing thumbnail images, and more...
.TP
.BR djvutxt (1)
A command line tool to extract the hidden text from DjVu documents.
.TP
.BR djvudump (1)
A command line tool for inspecting DjVu files and displaying their internal
structure.
.TP
.BR djvuextract (1)
A command line tool for disassembling DjVu image files.
.TP
.BR djvumake (1)
A command line tool for assembling DjVu image files.
.TP
.BR djvuserve (1)
A
.SM CGI
program for generating indirect multi-page DjVu documents
on the fly.
.TP
.BR djvutoxml "(1), " djvuxmlparser (1)
Command line tools to edit DjVu metadata as XML files.

.SH DJVU ENCODERS AND ANY2DJVU
DjVuLibre comes with a variety of specialized encoders,
.BR c44 (1)
for photographic images,
.BR cjb2 (1)
for bitonal images, and
.BR cpaldjvu (1)
for images with few distinct colors.
Although these encoders perform well in their specialized domains,
they cannot handle complex tasks involving segmentation and
multi-page encoding.
.PP
The Lizardtech commercial products
.BR "" "(see " "http://www.lizardtech.com/solutions/document" )
can perform these complex encoding tasks.
.PP
Another solution is provided by the compression server at
.BR http://any2djvu.djvuzone.org .
This machine uses pre-Lizardtech prototype encoders from AT&T Labs and
performs almost as well as the commercial Lizardtech encoders.  Please note
that the Any2DjVu compression server comes with no guarantee, that
nothing is done to ensure that your documents will remain confidential, and
that there is only one computer working for the whole planet.

.SH CREDITS
Numerous people have contributed to the DjVu source code during the
last five years.  Please submit a SourceForge bug report to update the
following list.
.IP "" 3
Yoshua Bengio,
L\('eon Bottou,
Chakradhar Chandaluri,
Regis M\. Chaplin,
Ming Chen,
Parag Deshmukh,
Royce Edwards,
Andrew Erofeev,
Praveen Guduru,
Patrick Haffner,
Paul G\. Howard,
Orlando Keise,
Yann Le Cun,
Artem Mikheev,
Florin Nicsa,
Joseph M\. Orost,
Steven Pigeon,
Bill Riemers,
Patrice Simard,
Jeffery Triggs,
Luc Vincent,
Pascal Vincent.