<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"[

<!ENTITY % common-ents SYSTEM "entities.xml">
 %common-ents;
]>

<book lang="en" id="userman" xreflabel="bzip2 Manual">

 <bookinfo>
  <title>bzip2 and libbzip2, version 1.0.6</title>
  <subtitle>A program and library for data compression</subtitle>
  <copyright>
   <year>&bz-lifespan;</year>
   <holder>Julian Seward</holder>
  </copyright>
  <releaseinfo>Version &bz-version; of &bz-date;</releaseinfo>

  <authorgroup>
   <author>
    <firstname>Julian</firstname>
    <surname>Seward</surname>
    <affiliation>
     <orgname>&bz-url;</orgname>
    </affiliation>
   </author>
  </authorgroup>

  <legalnotice>

  <para>This program, <computeroutput>bzip2</computeroutput>, the
  associated library <computeroutput>libbzip2</computeroutput>, and
  all documentation, are copyright © &bz-lifespan; Julian Seward.
  All rights reserved.</para>

  <para>Redistribution and use in source and binary forms, with
  or without modification, are permitted provided that the
  following conditions are met:</para>

  <itemizedlist mark='bullet'>

   <listitem><para>Redistributions of source code must retain the
   above copyright notice, this list of conditions and the
   following disclaimer.</para></listitem>

   <listitem><para>The origin of this software must not be
   misrepresented; you must not claim that you wrote the original
   software.  If you use this software in a product, an
   acknowledgment in the product documentation would be
   appreciated but is not required.</para></listitem>

   <listitem><para>Altered source versions must be plainly marked
   as such, and must not be misrepresented as being the original
   software.</para></listitem>

   <listitem><para>The name of the author may not be used to
   endorse or promote products derived from this software without
   specific prior written permission.</para></listitem>

  </itemizedlist>

  <para>THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY
  EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
  THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
  PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE
  AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
  EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
  TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
  DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
  ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
  IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
  THE POSSIBILITY OF SUCH DAMAGE.</para>

 <para>PATENTS: To the best of my knowledge,
 <computeroutput>bzip2</computeroutput> and
 <computeroutput>libbzip2</computeroutput> do not use any patented
 algorithms.  However, I do not have the resources to carry
 out a patent search.  Therefore I cannot give any guarantee of
 the above statement.
 </para>

</legalnotice>

</bookinfo>



<chapter id="intro" xreflabel="Introduction">
<title>Introduction</title>

<para><computeroutput>bzip2</computeroutput> compresses files
using the Burrows-Wheeler block-sorting text compression
algorithm, and Huffman coding.  Compression is generally
considerably better than that achieved by more conventional
LZ77/LZ78-based compressors, and approaches the performance of
the PPM family of statistical compressors.</para>

<para><computeroutput>bzip2</computeroutput> is built on top of
<computeroutput>libbzip2</computeroutput>, a flexible library for
handling compressed data in the
<computeroutput>bzip2</computeroutput> format.  This manual
describes both how to use the program and how to work with the
library interface.  Most of the manual is devoted to this
library, not the program, which is good news if your interest is
only in the program.</para>

<itemizedlist mark='bullet'>

 <listitem><para><xref linkend="using"/> describes how to use
 <computeroutput>bzip2</computeroutput>; this is the only part
 you need to read if you just want to know how to operate the
 program.</para></listitem>

 <listitem><para><xref linkend="libprog"/> describes the
 programming interfaces in detail, and</para></listitem>

 <listitem><para><xref linkend="misc"/> records some
 miscellaneous notes which I thought ought to be recorded
 somewhere.</para></listitem>

</itemizedlist>

</chapter>


<chapter id="using" xreflabel="How to use bzip2">
<title>How to use bzip2</title>

<para>This chapter contains a copy of the
<computeroutput>bzip2</computeroutput> man page, and nothing
else.</para>

<sect1 id="name" xreflabel="NAME">
<title>NAME</title>

<itemizedlist mark='bullet'>

 <listitem><para><computeroutput>bzip2</computeroutput>,
  <computeroutput>bunzip2</computeroutput> - a block-sorting file
  compressor, v1.0.6</para></listitem>

 <listitem><para><computeroutput>bzcat</computeroutput> -
   decompresses files to stdout</para></listitem>

 <listitem><para><computeroutput>bzip2recover</computeroutput> -
   recovers data from damaged bzip2 files</para></listitem>

</itemizedlist>

</sect1>


<sect1 id="synopsis" xreflabel="SYNOPSIS">
<title>SYNOPSIS</title>

<itemizedlist mark='bullet'>

 <listitem><para><computeroutput>bzip2</computeroutput> [
  -cdfkqstvzVL123456789 ] [ filenames ...  ]</para></listitem>

 <listitem><para><computeroutput>bunzip2</computeroutput> [
  -fkvsVL ] [ filenames ...  ]</para></listitem>

 <listitem><para><computeroutput>bzcat</computeroutput> [ -s ] [
  filenames ...  ]</para></listitem>

 <listitem><para><computeroutput>bzip2recover</computeroutput>
  filename</para></listitem>

</itemizedlist>

</sect1>


<sect1 id="description" xreflabel="DESCRIPTION">
<title>DESCRIPTION</title>

<para><computeroutput>bzip2</computeroutput> compresses files
using the Burrows-Wheeler block-sorting text compression
algorithm, and Huffman coding.  Compression is generally
considerably better than that achieved by more conventional
LZ77/LZ78-based compressors, and approaches the performance of
the PPM family of statistical compressors.</para>

<para>The command-line options are deliberately very similar to
those of GNU <computeroutput>gzip</computeroutput>, but they are
not identical.</para>

<para><computeroutput>bzip2</computeroutput> expects a list of
file names to accompany the command-line flags.  Each file is
replaced by a compressed version of itself, with the name
<computeroutput>original_name.bz2</computeroutput>.  Each
compressed file has the same modification date, permissions, and,
when possible, ownership as the corresponding original, so that
these properties can be correctly restored at decompression time.
File name handling is naive in the sense that there is no
mechanism for preserving original file names, permissions,
ownerships or dates in filesystems which lack these concepts, or
have serious file name length restrictions, such as
MS-DOS.</para>

<para><computeroutput>bzip2</computeroutput> and
<computeroutput>bunzip2</computeroutput> will by default not
overwrite existing files.  If you want this to happen, specify
the <computeroutput>-f</computeroutput> flag.</para>

<para>If no file names are specified,
<computeroutput>bzip2</computeroutput> compresses from standard
input to standard output.  In this case,
<computeroutput>bzip2</computeroutput> will decline to write
compressed output to a terminal, as this would be entirely
incomprehensible and therefore pointless.</para>

<para><computeroutput>bunzip2</computeroutput> (or
<computeroutput>bzip2 -d</computeroutput>) decompresses all
specified files.  Files which were not created by
<computeroutput>bzip2</computeroutput> will be detected and
ignored, and a warning issued.
<computeroutput>bzip2</computeroutput> attempts to guess the
filename for the decompressed file from that of the compressed
file as follows:</para>

<itemizedlist mark='bullet'>

 <listitem><para><computeroutput>filename.bz2 </computeroutput>
  becomes
  <computeroutput>filename</computeroutput></para></listitem>

 <listitem><para><computeroutput>filename.bz </computeroutput>
  becomes
  <computeroutput>filename</computeroutput></para></listitem>

 <listitem><para><computeroutput>filename.tbz2</computeroutput>
  becomes
  <computeroutput>filename.tar</computeroutput></para></listitem>

 <listitem><para><computeroutput>filename.tbz </computeroutput>
  becomes
  <computeroutput>filename.tar</computeroutput></para></listitem>

 <listitem><para><computeroutput>anyothername </computeroutput>
  becomes
  <computeroutput>anyothername.out</computeroutput></para></listitem>

</itemizedlist>

<para>If the file does not end in one of the recognised endings,
<computeroutput>.bz2</computeroutput>,
<computeroutput>.bz</computeroutput>,
<computeroutput>.tbz2</computeroutput> or
<computeroutput>.tbz</computeroutput>,
<computeroutput>bzip2</computeroutput> complains that it cannot
guess the name of the original file, and uses the original name
with <computeroutput>.out</computeroutput> appended.</para>
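
<para>The naming rules above can be sketched in a few lines of
Python.  This is an illustration of the rules as listed, not
<computeroutput>bzip2</computeroutput>'s own code; the names
<computeroutput>SUFFIX_MAP</computeroutput> and
<computeroutput>guess_output_name</computeroutput> are made up for
the example.</para>

<programlisting>
# Hypothetical helper: maps a compressed file name to the expected
# name of the decompressed file, following the list above.
SUFFIX_MAP = [
    (".tbz2", ".tar"),
    (".tbz", ".tar"),
    (".bz2", ""),
    (".bz", ""),
]

def guess_output_name(compressed_name):
    """Return the output name implied by the documented suffix rules."""
    for suffix, replacement in SUFFIX_MAP:
        if compressed_name.endswith(suffix):
            return compressed_name[:-len(suffix)] + replacement
    # Unrecognised ending: keep the name and append ".out".
    return compressed_name + ".out"

for name in ("notes.bz2", "notes.bz", "src.tbz2", "src.tbz", "data.raw"):
    print(name, "becomes", guess_output_name(name))
</programlisting>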

<para>As with compression, supplying no filenames causes
decompression from standard input to standard output.</para>

<para><computeroutput>bunzip2</computeroutput> will correctly
decompress a file which is the concatenation of two or more
compressed files.  The result is the concatenation of the
corresponding uncompressed files.  Integrity testing
(<computeroutput>-t</computeroutput>) of concatenated compressed
files is also supported.</para>
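
<para>The concatenation property can be tried out without the
command-line program.  The sketch below assumes Python's standard
<computeroutput>bz2</computeroutput> module, which wraps
<computeroutput>libbzip2</computeroutput> and accepts multi-stream
input; it is an illustration only and says nothing about how
<computeroutput>bunzip2</computeroutput> itself is
implemented.</para>

<programlisting>
import bz2

first = bz2.compress(b"first file\n")
second = bz2.compress(b"second file\n")

# Joining two complete compressed streams gives a valid input whose
# decompressed form is the two originals joined together.
combined = first + second
restored = bz2.decompress(combined)
print(restored == b"first file\nsecond file\n")
</programlisting>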

<para>You can also compress or decompress files to the standard
output by giving the <computeroutput>-c</computeroutput> flag.
Multiple files may be compressed and decompressed like this.  The
resulting outputs are fed sequentially to stdout.  Compression of
multiple files in this manner generates a stream containing
multiple compressed file representations.  Such a stream can be
decompressed correctly only by
<computeroutput>bzip2</computeroutput> version 0.9.0 or later.
Earlier versions of <computeroutput>bzip2</computeroutput> will
stop after decompressing the first file in the stream.</para>

<para><computeroutput>bzcat</computeroutput> (or
<computeroutput>bzip2 -dc</computeroutput>) decompresses all
specified files to the standard output.</para>

<para><computeroutput>bzip2</computeroutput> will read arguments
from the environment variables
<computeroutput>BZIP2</computeroutput> and
<computeroutput>BZIP</computeroutput>, in that order, and will
process them before any arguments read from the command line.
This gives a convenient way to supply default arguments.</para>

<para>Compression is always performed, even if the compressed
file is slightly larger than the original.  Files of less than
about one hundred bytes tend to get larger, since the compression
mechanism has a constant overhead in the region of 50 bytes.
Random data (including the output of most file compressors) is
coded at about 8.05 bits per byte, giving an expansion of around
0.5%.</para>
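
<para>Both effects are easy to observe.  The following sketch
assumes Python's standard <computeroutput>bz2</computeroutput>
module as a stand-in for the program; exact byte counts depend on
the input, so none are quoted here.</para>

<programlisting>
import bz2
import os

# A very small input: the fixed overhead dominates.
tiny = b"hello"
print(len(tiny), "bytes in,", len(bz2.compress(tiny)), "bytes out")

# Random data: nothing to exploit, so the output is slightly larger.
random_data = os.urandom(200000)
packed = bz2.compress(random_data)
growth = 100.0 * (len(packed) - len(random_data)) / len(random_data)
print("expansion: %.2f%%" % growth)
</programlisting>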

<para>As a self-check for your protection,
<computeroutput>bzip2</computeroutput> uses 32-bit CRCs to make
sure that the decompressed version of a file is identical to the
original.  This guards against corruption of the compressed data,
and against undetected bugs in
<computeroutput>bzip2</computeroutput> (hopefully very unlikely).
The chance of data corruption going undetected is microscopic,
about one chance in four billion for each file processed.  Be
aware, though, that the check occurs upon decompression, so it
can only tell you that something is wrong.  It can't help you
recover the original uncompressed data.  You can use
<computeroutput>bzip2recover</computeroutput> to try to recover
data from damaged files.</para>
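
<para>A small experiment shows the check at work.  The sketch
assumes Python's standard <computeroutput>bz2</computeroutput>
module, and it assumes a detail of the file layout that this
chapter does not state: that the stored CRC of the first block
occupies bytes 10 to 13 of the stream, after a 4-byte stream header
and a 6-byte block marker.</para>

<programlisting>
import bz2

original = b"some data worth protecting " * 100
packed = bytearray(bz2.compress(original))

# Assumed layout: byte 10 is the first byte of the first block's
# stored CRC.  Damaging it leaves the data decodable but makes the
# CRC comparison fail at the end of the block.
packed[10] ^= 0xFF

try:
    bz2.decompress(bytes(packed))
    outcome = "no error reported"
except OSError as exc:
    outcome = "corruption detected: %s" % exc
print(outcome)
</programlisting>

<para>As the text says, the failure only reports that something is
wrong; the exception carries no information that would help restore
the original data.</para>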

<para>Return values: 0 for a normal exit, 1 for environmental
problems (file not found, invalid flags, I/O errors, etc.), 2
to indicate a corrupt compressed file, 3 for an internal
consistency error (e.g., a bug) which caused
<computeroutput>bzip2</computeroutput> to panic.</para>
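
<para>A script that runs <computeroutput>bzip2</computeroutput> may
want to turn these values into messages.  A minimal sketch, in
which the table and the function name
<computeroutput>describe_exit_status</computeroutput> are
hypothetical and simply restate the list above:</para>

<programlisting>
# Exit statuses as documented above.
EXIT_MEANINGS = {
    0: "normal exit",
    1: "environmental problem (file not found, invalid flags, I/O error)",
    2: "corrupt compressed file",
    3: "internal consistency error (a bug)",
}

def describe_exit_status(status):
    """Translate an exit status into the wording used in this section."""
    return EXIT_MEANINGS.get(status, "undocumented status %d" % status)

print(describe_exit_status(2))
</programlisting>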

</sect1>


<sect1 id="options" xreflabel="OPTIONS">
<title>OPTIONS</title>

<variablelist>

 <varlistentry>
 <term><computeroutput>-c --stdout</computeroutput></term>
 <listitem><para>Compress or decompress to standard
  output.</para></listitem>
 </varlistentry>

 <varlistentry>
 <term><computeroutput>-d --decompress</computeroutput></term>
 <listitem><para>Force decompression.
  <computeroutput>bzip2</computeroutput>,
  <computeroutput>bunzip2</computeroutput> and
  <computeroutput>bzcat</computeroutput> are really the same
  program, and the decision about what actions to take is done on
  the basis of which name is used.  This flag overrides that
  mechanism, and forces bzip2 to decompress.</para></listitem>
 </varlistentry>

 <varlistentry>
 <term><computeroutput>-z --compress</computeroutput></term>
 <listitem><para>The complement to
  <computeroutput>-d</computeroutput>: forces compression,
  regardless of the invocation name.</para></listitem>
 </varlistentry>

 <varlistentry>
 <term><computeroutput>-t --test</computeroutput></term>
 <listitem><para>Check integrity of the specified file(s), but
  don't decompress them.  This really performs a trial
  decompression and throws away the result.</para></listitem>
 </varlistentry>

 <varlistentry>
 <term><computeroutput>-f --force</computeroutput></term>
 <listitem><para>Force overwrite of output files.  Normally,
  <computeroutput>bzip2</computeroutput> will not overwrite
  existing output files.  Also forces
  <computeroutput>bzip2</computeroutput> to break hard links to
  files, which it otherwise wouldn't do.</para>
  <para><computeroutput>bzip2</computeroutput> normally declines
  to decompress files which don't have the correct magic header
  bytes. If forced (<computeroutput>-f</computeroutput>),
  however, it will pass such files through unmodified. This is
  how GNU <computeroutput>gzip</computeroutput> behaves.</para>
 </listitem>
 </varlistentry>

 <varlistentry>
 <term><computeroutput>-k --keep</computeroutput></term>
 <listitem><para>Keep (don't delete) input files during
  compression or decompression.</para></listitem>
 </varlistentry>

 <varlistentry>
 <term><computeroutput>-s --small</computeroutput></term>
 <listitem><para>Reduce memory usage, for compression,
  decompression and testing.  Files are decompressed and tested
  using a modified algorithm which only requires 2.5 bytes per
  block byte.  This means any file can be decompressed in 2300k
  of memory, albeit at about half the normal speed.</para>
  <para>During compression, <computeroutput>-s</computeroutput>
  selects a block size of 200k, which limits memory use to around
  the same figure, at the expense of your compression ratio.  In
  short, if your machine is low on memory (8 megabytes or less),
  use <computeroutput>-s</computeroutput> for everything.  See
  <xref linkend="memory-management"/> below.</para></listitem>
 </varlistentry>

 <varlistentry>
 <term><computeroutput>-q --quiet</computeroutput></term>
 <listitem><para>Suppress non-essential warning messages.
  Messages pertaining to I/O errors and other critical events
  will not be suppressed.</para></listitem>
 </varlistentry>

 <varlistentry>
 <term><computeroutput>-v --verbose</computeroutput></term>
 <listitem><para>Verbose mode -- show the compression ratio for
  each file processed.  Further
  <computeroutput>-v</computeroutput>'s increase the verbosity
  level, spewing out lots of information which is primarily of
  interest for diagnostic purposes.</para></listitem>
 </varlistentry>

 <varlistentry>
 <term><computeroutput>-L --license -V --version</computeroutput></term>
 <listitem><para>Display the software version, license terms and
  conditions.</para></listitem>
 </varlistentry>

 <varlistentry>
 <term><computeroutput>-1</computeroutput> (or
 <computeroutput>--fast</computeroutput>) to
 <computeroutput>-9</computeroutput> (or
 <computeroutput>--best</computeroutput>)</term>
 <listitem><para>Set the block size to 100 k, 200 k ...  900 k
  when compressing.  Has no effect when decompressing.  See
  <xref linkend="memory-management"/> below.  The
  <computeroutput>--fast</computeroutput> and
  <computeroutput>--best</computeroutput> aliases are primarily
  for GNU <computeroutput>gzip</computeroutput> compatibility.
  In particular, <computeroutput>--fast</computeroutput> doesn't
  make things significantly faster.  And
  <computeroutput>--best</computeroutput> merely selects the
  default behaviour.</para></listitem>
 </varlistentry>

 <varlistentry>
 <term><computeroutput>--</computeroutput></term>
 <listitem><para>Treats all subsequent arguments as file names,
  even if they start with a dash.  This is so you can handle
  files with names beginning with a dash, for example:
  <computeroutput>bzip2 --
  -myfilename</computeroutput>.</para></listitem>
 </varlistentry>

 <varlistentry>
 <term><computeroutput>--repetitive-fast</computeroutput></term>
 <term><computeroutput>--repetitive-best</computeroutput></term>
 <listitem><para>These flags are redundant in versions 0.9.5 and
  above.  They provided some coarse control over the behaviour of
  the sorting algorithm in earlier versions, which was sometimes
  useful.  0.9.5 and above have an improved algorithm which
  renders these flags irrelevant.</para></listitem>
 </varlistentry>

</variablelist>

</sect1>


<sect1 id="memory-management" xreflabel="MEMORY MANAGEMENT">
<title>MEMORY MANAGEMENT</title>

<para><computeroutput>bzip2</computeroutput> compresses large
files in blocks.  The block size affects both the compression
ratio achieved, and the amount of memory needed for compression
and decompression.  The flags <computeroutput>-1</computeroutput>
through <computeroutput>-9</computeroutput> specify the block
size to be 100,000 bytes through 900,000 bytes (the default)
respectively.  At decompression time, the block size used for
compression is read from the header of the compressed file, and
<computeroutput>bunzip2</computeroutput> then allocates itself
just enough memory to decompress the file.  Since block sizes are
stored in compressed files, it follows that the flags
<computeroutput>-1</computeroutput> to
<computeroutput>-9</computeroutput> are irrelevant to and so
ignored during decompression.</para>
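
<para>The stored block size can be read back from a compressed
stream.  The sketch below assumes Python's standard
<computeroutput>bz2</computeroutput> module and one detail of the
file format not spelled out in this chapter: that a stream begins
with the ASCII bytes <computeroutput>BZh</computeroutput> followed
by a single digit from 1 to 9 giving the block size in units of
100,000 bytes.  The function name is made up for the
example.</para>

<programlisting>
import bz2

def block_size_from_header(data):
    """Return the block size in bytes recorded in a stream header.

    Assumes the header is "BZh" plus one digit, 1 to 9.
    """
    level = data[3:4].decode("ascii", "replace")
    if data[:3] != b"BZh" or level not in list("123456789"):
        raise ValueError("not a bzip2 stream")
    return int(level) * 100000

for level in (1, 5, 9):
    packed = bz2.compress(b"example", level)
    print(level, block_size_from_header(packed))
</programlisting>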

<para>Compression and decompression requirements, in bytes, can be
estimated as:</para>
<programlisting>
Compression:   400k + ( 8 x block size )

Decompression: 100k + ( 4 x block size ), or
               100k + ( 2.5 x block size )
</programlisting>
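
<para>The estimates above are simple enough to compute directly.
A minimal sketch, with a hypothetical function name, working in
units of k and taking the block size to be 100k times the flag
digit:</para>

<programlisting>
def memory_estimate_k(level, small=False):
    """Estimated memory use in k for block-size flags -1 to -9.

    Returns (compression, decompression).  The decompression figure
    uses the 2.5x multiplier when small is True (the -s flag) and
    the 4x multiplier otherwise.
    """
    if level not in range(1, 10):
        raise ValueError("level must be between 1 and 9")
    block_k = 100 * level
    compress_k = 400 + 8 * block_k
    decompress_k = 100 + (2.5 if small else 4) * block_k
    return compress_k, decompress_k

print(memory_estimate_k(9))              # (7600, 3700)
print(memory_estimate_k(9, small=True))  # (7600, 2350.0)
</programlisting>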

<para>Larger block sizes give rapidly diminishing marginal
returns.  Most of the compression comes from the first two or
three hundred k of block size, a fact worth bearing in mind when
using <computeroutput>bzip2</computeroutput> on small machines.
It is also important to appreciate that the decompression memory
requirement is set at compression time by the choice of block
size.</para>

<para>For files compressed with the default 900k block size,
<computeroutput>bunzip2</computeroutput> will require about 3700
kbytes to decompress.  To support decompression of any file on a
4 megabyte machine, <computeroutput>bunzip2</computeroutput> has
an option to decompress using approximately half this amount of
memory, about 2300 kbytes.  Decompression speed is also halved,
so you should use this option only where necessary.  The relevant
flag is <computeroutput>-s</computeroutput>.</para>

<para>In general, try to use the largest block size memory
constraints allow, since that maximises the compression achieved.
Compression and decompression speed are virtually unaffected by
block size.</para>

<para>Another significant point applies to files which fit in a
single block -- that means most files you'd encounter using a
large block size.  The amount of real memory touched is
proportional to the size of the file, since the file is smaller
than a block.  For example, compressing a file 20,000 bytes long
with the flag <computeroutput>-9</computeroutput> will cause the
compressor to allocate around 7600k of memory, but only touch
400k + 20000 * 8 = 560 kbytes of it.  Similarly, the decompressor
will allocate 3700k but only touch 100k + 20000 * 4 = 180
kbytes.</para>
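
<para>The same arithmetic can be written as a small function.
This is a sketch with a made-up name, and it assumes, as the worked
figures above do, that 1k means 1000 bytes:</para>

<programlisting>
def touched_memory_k(file_bytes, level=9):
    """Estimate memory actually touched, in k, for one file.

    Only the part of the block that holds data is touched, so the
    file size replaces the block size when the file is smaller.
    """
    block_bytes = 100000 * level
    used = min(file_bytes, block_bytes)
    compress_k = 400 + 8 * used / 1000
    decompress_k = 100 + 4 * used / 1000
    return compress_k, decompress_k

print(touched_memory_k(20000))  # (560.0, 180.0)
</programlisting>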

<para>Here is a table which summarises the maximum memory usage
for different block sizes.  Also recorded is the total compressed
size for 14 files of the Calgary Text Compression Corpus
totalling 3,141,622 bytes.  This column gives some feel for how
compression varies with block size.  These figures tend to
understate the advantage of larger block sizes for larger files,
since the Corpus is dominated by smaller files.</para>

<programlisting>
        Compress   Decompress   Decompress   Corpus
Flag     usage      usage       -s usage     Size

 -1      1200k       500k         350k      914704
 -2      2000k       900k         600k      877703
 -3      2800k      1300k         850k      860338
 -4      3600k      1700k        1100k      846899
 -5      4400k      2100k        1350k      845160
 -6      5200k      2500k        1600k      838626
 -7      6000k      2900k        1850k      834096
 -8      6800k      3300k        2100k      828642
 -9      7600k      3700k        2350k      828642
</programlisting>
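
<para>To compare the rows more directly, the Corpus column can be
converted to bits per byte.  The sketch below only re-expresses the
figures from the table; the variable names are made up for the
example.</para>

<programlisting>
# Figures copied from the table above.
CORPUS_BYTES = 3141622
COMPRESSED = {1: 914704, 2: 877703, 3: 860338, 4: 846899, 5: 845160,
              6: 838626, 7: 834096, 8: 828642, 9: 828642}

for level, size in COMPRESSED.items():
    bits_per_byte = 8.0 * size / CORPUS_BYTES
    percent = 100.0 * size / CORPUS_BYTES
    print("-%d  %.3f bits/byte  %.1f%% of original"
          % (level, bits_per_byte, percent))
</programlisting>

<para>The result falls from roughly 2.33 bits per byte at
<computeroutput>-1</computeroutput> to roughly 2.11 at
<computeroutput>-9</computeroutput>, which illustrates the
diminishing returns mentioned earlier.</para>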

</sect1>


<sect1 id="recovering" xreflabel="RECOVERING DATA FROM DAMAGED FILES">
<title>RECOVERING DATA FROM DAMAGED FILES</title>

<para><computeroutput>bzip2</computeroutput> compresses files in
blocks, usually 900kbytes long.  Each block is handled
independently.  If a media or transmission error causes a
multi-block <computeroutput>.bz2</computeroutput> file to become
damaged, it may be possible to recover data from the undamaged
blocks in the file.</para>

<para>The compressed representation of each block is delimited by
a 48-bit pattern, which makes it possible to find the block
boundaries with reasonable certainty.  Each block also carries
its own 32-bit CRC, so damaged blocks can be distinguished from
undamaged ones.</para>
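
<para>The search for block boundaries can be imitated in a few
lines.  This sketch is not the method used by
<computeroutput>bzip2recover</computeroutput>; it assumes Python's
standard <computeroutput>bz2</computeroutput> module, and it
assumes that the 48-bit pattern is the value
<computeroutput>0x314159265359</computeroutput>, which comes from
the file format and is not stated in this chapter.  Blocks need
not start on a byte boundary, so the search is done bit by
bit.</para>

<programlisting>
import bz2

# Assumed 48-bit block-start pattern, written out as a bit string.
BLOCK_MARKER = format(0x314159265359, "048b")

def find_block_offsets(data):
    """Return the bit offsets at which the block-start pattern occurs."""
    bits = "".join(format(byte, "08b") for byte in data)
    offsets = []
    position = bits.find(BLOCK_MARKER)
    while position != -1:
        offsets.append(position)
        position = bits.find(BLOCK_MARKER, position + 1)
    return offsets

# 256,000 bytes at the smallest block size should span several blocks.
sample = bytes(range(256)) * 1000
packed = bz2.compress(sample, 1)
offsets = find_block_offsets(packed)
print(offsets)
</programlisting>

<para>The first offset is 32 bits, directly after the 4-byte
stream header; later offsets are generally not multiples of
8.</para>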

<para><computeroutput>bzip2recover</computeroutput> is a simple
program whose purpose is to search for blocks in
<computeroutput>.bz2</computeroutput> files, and write each block
out into its own <computeroutput>.bz2</computeroutput> file.  You
can then use <computeroutput>bzip2 -t</computeroutput> to test
the integrity of the resulting files, and decompress those which
are undamaged.</para>

<para><computeroutput>bzip2recover</computeroutput> takes a
single argument, the name of the damaged file, and writes a
number of files <computeroutput>rec0001file.bz2</computeroutput>,
<computeroutput>rec0002file.bz2</computeroutput>, etc., containing
the extracted blocks.  The output filenames are designed so that
the use of wildcards in subsequent processing -- for example,
<computeroutput>bzip2 -dc rec*file.bz2 >
recovered_data</computeroutput> -- lists the files in the correct
order.</para>
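
<para>The reason the names sort correctly is the zero-padded block
number.  A small sketch of the idea, in which the function name and
the padding width of four digits are assumptions taken from the
example names above:</para>

<programlisting>
def recovered_name(block_number, original="file.bz2", width=4):
    """Build a name in the style rec0001file.bz2."""
    return "rec" + str(block_number).zfill(width) + original

# Deliberately out of order, to show that sorting restores block order.
names = [recovered_name(n) for n in (10, 2, 1, 100)]
print(sorted(names))
</programlisting>

<para>Without the padding, a name for block 10 would sort before
the name for block 2, and a wildcard would present the blocks out
of order.</para>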

<para><computeroutput>bzip2recover</computeroutput> should be of
most use when dealing with large <computeroutput>.bz2</computeroutput>
files, as these will contain many blocks.  It is clearly futile
to use it on damaged single-block files, since a damaged block
cannot be recovered.  If you wish to minimise any potential data
loss through media or transmission errors, you might consider
compressing with a smaller block size.</para>

</sect1>


<sect1 id="performance" xreflabel="PERFORMANCE NOTES">
<title>PERFORMANCE NOTES</title>

<para>The sorting phase of compression gathers together similar
strings in the file.  Because of this, files containing very long
runs of repeated symbols, like "aabaabaabaab ..."  (repeated
several hundred times) may compress more slowly than normal.
Versions 0.9.5 and above fare much better than previous versions
in this respect.  The ratio between worst-case and average-case
compression time is in the region of 10:1.  For previous
versions, this figure was more like 100:1.  You can use the
<computeroutput>-vvvv</computeroutput> option to monitor progress
in great detail, if you want.</para>

<para>Decompression speed is unaffected by these
phenomena.</para>

<para><computeroutput>bzip2</computeroutput> usually allocates
several megabytes of memory to operate in, and then charges all
over it in a fairly random fashion.  This means that performance,
both for compressing and decompressing, is largely determined by
the speed at which your machine can service cache misses.
Because of this, small changes to the code to reduce the miss
rate have been observed to give disproportionately large
performance improvements.  I imagine
<computeroutput>bzip2</computeroutput> will perform best on
machines with very large caches.</para>

</sect1>


<sect1 id="caveats" xreflabel="CAVEATS">
<title>CAVEATS</title>

<para>I/O error messages are not as helpful as they could be.
<computeroutput>bzip2</computeroutput> tries hard to detect I/O
errors and exit cleanly, but the details of what the problem is
sometimes seem rather misleading.</para>

<para>This manual page pertains to version &bz-version; of
<computeroutput>bzip2</computeroutput>.  Compressed data created by
this version is entirely forwards and backwards compatible with the
previous public releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0,
1.0.1, 1.0.2 and 1.0.3, but with the following exception: 0.9.0 and
above can correctly decompress multiple concatenated compressed files.
0.1pl2 cannot do this; it will stop after decompressing just the first
file in the stream.</para>

<para><computeroutput>bzip2recover</computeroutput> versions
prior to 1.0.2 used 32-bit integers to represent bit positions in
compressed files, so they could not handle compressed files more
than 512 megabytes long.  Versions 1.0.2 and above use 64-bit integers
on some platforms which support them (GNU supported targets, and
Windows).  To establish whether or not
<computeroutput>bzip2recover</computeroutput> was built with such
a limitation, run it without arguments.  In any event you can
build yourself an unlimited version if you can recompile it with
<computeroutput>MaybeUInt64</computeroutput> set to be an
unsigned 64-bit integer.</para>

</sect1>


<sect1 id="author" xreflabel="AUTHOR">
<title>AUTHOR</title>

<para>Julian Seward,
<computeroutput>&bz-email;</computeroutput></para>

<para>The ideas embodied in
<computeroutput>bzip2</computeroutput> are due to (at least) the
following people: Michael Burrows and David Wheeler (for the
block sorting transformation), David Wheeler (again, for the
Huffman coder), Peter Fenwick (for the structured coding model in
the original <computeroutput>bzip</computeroutput>, and many
refinements), and Alistair Moffat, Radford Neal and Ian Witten
(for the arithmetic coder in the original
<computeroutput>bzip</computeroutput>).  I am much indebted for
their help, support and advice.  See the manual in the source
distribution for pointers to sources of documentation.  Christian
von Roques encouraged me to look for faster sorting algorithms,
so as to speed up compression.  Bela Lubkin encouraged me to
improve the worst-case compression performance.
Donna Robinson XMLised the documentation.
Many people sent
patches, helped with portability problems, lent machines, gave
advice and were generally helpful.</para>

</sect1>

</chapter>



<chapter id="libprog" xreflabel="Programming with libbzip2">
<title>
Programming with <computeroutput>libbzip2</computeroutput>
</title>

<para>This chapter describes the programming interface to
<computeroutput>libbzip2</computeroutput>.</para>

<para>For general background information, particularly about
memory use and performance aspects, you'd be well advised to read
<xref linkend="using"/> as well.</para>


<sect1 id="top-level" xreflabel="Top-level structure">
<title>Top-level structure</title>

<para><computeroutput>libbzip2</computeroutput> is a flexible
library for compressing and decompressing data in the
<computeroutput>bzip2</computeroutput> data format.  Although
packaged as a single entity, it helps to regard the library as
three separate parts: the low-level interface, the high-level
interface, and some utility functions.</para>

<para>The structure of
<computeroutput>libbzip2</computeroutput>'s interfaces is similar
to that of Jean-loup Gailly's and Mark Adler's excellent
<computeroutput>zlib</computeroutput> library.</para>

<para>All externally visible symbols have names beginning
<computeroutput>BZ2_</computeroutput>.  This is new in version
1.0.  The intention is to minimise pollution of the namespaces of
library clients.</para>

<para>To use any part of the library, you need to
<computeroutput>#include &lt;bzlib.h&gt;</computeroutput>
into your sources.</para>


<sect2 id="ll-summary" xreflabel="Low-level summary">
<title>Low-level summary</title>

<para>This interface provides services for compressing and
decompressing data in memory.  There's no provision for dealing
with files, streams or any other I/O mechanisms, just straight
memory-to-memory work.  In fact, this part of the library can be
compiled without inclusion of
<computeroutput>stdio.h</computeroutput>, which may be helpful
for embedded applications.</para>

<para>The low-level part of the library has no global variables
and is therefore thread-safe.</para>

<para>Six routines make up the low level interface:
<computeroutput>BZ2_bzCompressInit</computeroutput>,
<computeroutput>BZ2_bzCompress</computeroutput>, and
<computeroutput>BZ2_bzCompressEnd</computeroutput> for
compression, and a corresponding trio
<computeroutput>BZ2_bzDecompressInit</computeroutput>,
<computeroutput>BZ2_bzDecompress</computeroutput> and
<computeroutput>BZ2_bzDecompressEnd</computeroutput> for
decompression.  The <computeroutput>*Init</computeroutput>
functions allocate memory for compression/decompression and do
other initialisations, whilst the
<computeroutput>*End</computeroutput> functions close down
operations and release memory.</para>

<para>The real work is done by
<computeroutput>BZ2_bzCompress</computeroutput> and
<computeroutput>BZ2_bzDecompress</computeroutput>.  These
compress and decompress data from a user-supplied input buffer to
a user-supplied output buffer.  These buffers can be any size;
arbitrary quantities of data are handled by making repeated calls
to these functions.  This is a flexible mechanism allowing a
consumer-pull style of activity, or producer-push, or a mixture
of both.</para>

</sect2>


<sect2 id="hl-summary" xreflabel="High-level summary">
<title>High-level summary</title>

<para>This interface provides some handy wrappers around the
low-level interface to facilitate reading and writing
<computeroutput>bzip2</computeroutput> format files
(<computeroutput>.bz2</computeroutput> files).  The routines
provide hooks to facilitate reading files in which the
<computeroutput>bzip2</computeroutput> data stream is embedded
within some larger-scale file structure, or where there are
multiple <computeroutput>bzip2</computeroutput> data streams
concatenated end-to-end.</para>

<para>For reading files,
<computeroutput>BZ2_bzReadOpen</computeroutput>,
<computeroutput>BZ2_bzRead</computeroutput>,
<computeroutput>BZ2_bzReadClose</computeroutput> and
<computeroutput>BZ2_bzReadGetUnused</computeroutput> are
supplied.  For writing files,
<computeroutput>BZ2_bzWriteOpen</computeroutput>,
<computeroutput>BZ2_bzWrite</computeroutput> and
<computeroutput>BZ2_bzWriteClose</computeroutput> are
available.</para>

<para>As with the low-level library, no global variables are used
so the library is per se thread-safe.  However, if I/O errors
occur whilst reading or writing the underlying compressed files,
you may have to consult <computeroutput>errno</computeroutput> to
determine the cause of the error.  In that case, you'd need a C
library which correctly supports
<computeroutput>errno</computeroutput> in a multithreaded
environment.</para>

<para>To make the library a little simpler and more portable,
<computeroutput>BZ2_bzReadOpen</computeroutput> and
<computeroutput>BZ2_bzWriteOpen</computeroutput> require you to
pass them file handles (<computeroutput>FILE*</computeroutput>s)
which have previously been opened for reading or writing
respectively.  That avoids portability problems associated with
file operations and file attributes, whilst not being much of an
imposition on the programmer.</para>

</sect2>


<sect2 id="util-fns-summary" xreflabel="Utility functions summary">
<title>Utility functions summary</title>

<para>For very simple needs,
<computeroutput>BZ2_bzBuffToBuffCompress</computeroutput> and
<computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput> are
provided.  These compress and decompress data in memory from one
buffer to another buffer in a single function call.  You should assess
whether these functions fulfill your memory-to-memory
compression/decompression requirements before investing effort in
understanding the more general but more complex low-level
interface.</para>

<para>Yoshioka Tsuneo
(<computeroutput>tsuneo@rr.iij4u.or.jp</computeroutput>) has
contributed some functions to give better
<computeroutput>zlib</computeroutput> compatibility.  These
functions are <computeroutput>BZ2_bzopen</computeroutput>,
<computeroutput>BZ2_bzread</computeroutput>,
<computeroutput>BZ2_bzwrite</computeroutput>,
<computeroutput>BZ2_bzflush</computeroutput>,
<computeroutput>BZ2_bzclose</computeroutput>,
<computeroutput>BZ2_bzerror</computeroutput> and
<computeroutput>BZ2_bzlibVersion</computeroutput>.  You may find
these functions more convenient for simple file reading and
writing, than those in the high-level interface.  These functions
are not (yet) officially part of the library, and are minimally
documented here.  If they break, you get to keep all the pieces.
I hope to document them properly when time permits.</para>

<para>Yoshioka also contributed modifications to allow the
library to be built as a Windows DLL.</para>

</sect2>

</sect1>


<sect1 id="err-handling" xreflabel="Error handling">
<title>Error handling</title>

<para>The library is designed to recover cleanly in all
situations, including the worst-case situation of decompressing
random data.  I'm not 100% sure that it can always do this, so
you might want to add a signal handler to catch segmentation
violations during decompression if you are feeling especially
paranoid.  I would be interested in hearing more about the
robustness of the library to corrupted compressed data.</para>

<para>Version 1.0.3 is more robust in this respect than any
previous version.  Investigations with Valgrind (a tool for detecting
problems with memory management) indicate
that, at least for the few files I tested, all single-bit errors
in the decompressed data are caught properly, with no
segmentation faults, no uses of uninitialised data, no out of
range reads or writes, and no infinite looping in the decompressor.
So it's certainly pretty robust, although
I wouldn't claim it to be totally bombproof.</para>

<para>The file <computeroutput>bzlib.h</computeroutput> contains
all definitions needed to use the library.  In particular, you
should definitely not include
<computeroutput>bzlib_private.h</computeroutput>.</para>

<para>In <computeroutput>bzlib.h</computeroutput>, the various
return values are defined.  The following list is not intended as
an exhaustive description of the circumstances in which a given
value may be returned -- those descriptions are given later.
Rather, it is intended to convey the rough meaning of each return
value.  The first five values are normal and not intended to
denote an error situation.</para>

<variablelist>

 <varlistentry>
  <term><computeroutput>BZ_OK</computeroutput></term>
  <listitem><para>The requested action was completed
   successfully.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_RUN_OK, BZ_FLUSH_OK,
    BZ_FINISH_OK</computeroutput></term>
  <listitem><para>In
   <computeroutput>BZ2_bzCompress</computeroutput>, the requested
   flush/finish/nothing-special action was completed
   successfully.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_STREAM_END</computeroutput></term>
  <listitem><para>Compression of data was completed, or the
   logical stream end was detected during
   decompression.</para></listitem>
 </varlistentry>

</variablelist>

<para>The following return values indicate an error of some
kind.</para>

<variablelist>

 <varlistentry>
  <term><computeroutput>BZ_CONFIG_ERROR</computeroutput></term>
  <listitem><para>Indicates that the library has been improperly
   compiled on your platform -- a major configuration error.
   Specifically, it means that
   <computeroutput>sizeof(char)</computeroutput>,
   <computeroutput>sizeof(short)</computeroutput> and
   <computeroutput>sizeof(int)</computeroutput> are not 1, 2 and
   4 respectively, as they should be.  Note that the library
   should still work properly on 64-bit platforms which follow
   the LP64 programming model -- that is, where
   <computeroutput>sizeof(long)</computeroutput> and
   <computeroutput>sizeof(void*)</computeroutput> are 8.  Under
   LP64, <computeroutput>sizeof(int)</computeroutput> is still 4,
   so <computeroutput>libbzip2</computeroutput>, which doesn't
   use the <computeroutput>long</computeroutput> type, is
   OK.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_SEQUENCE_ERROR</computeroutput></term>
  <listitem><para>When using the library, it is important to call
   the functions in the correct sequence and with data structures
   (buffers etc) in the correct states.
   <computeroutput>libbzip2</computeroutput> checks as much as it
   can to ensure this is happening, and returns
   <computeroutput>BZ_SEQUENCE_ERROR</computeroutput> if not.
   Code which complies precisely with the function semantics, as
   detailed below, should never receive this value; such an event
   denotes buggy code which you should
   investigate.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_PARAM_ERROR</computeroutput></term>
  <listitem><para>Returned when a parameter to a function call is
   out of range or otherwise manifestly incorrect.  As with
   <computeroutput>BZ_SEQUENCE_ERROR</computeroutput>, this
   denotes a bug in the client code.  The distinction between
   <computeroutput>BZ_PARAM_ERROR</computeroutput> and
   <computeroutput>BZ_SEQUENCE_ERROR</computeroutput> is a bit
   hazy, but still worth making.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_MEM_ERROR</computeroutput></term>
  <listitem><para>Returned when a request to allocate memory
   failed.  Note that the quantity of memory needed to decompress
   a stream cannot be determined until the stream's header has
   been read.  So
   <computeroutput>BZ2_bzDecompress</computeroutput> and
   <computeroutput>BZ2_bzRead</computeroutput> may return
   <computeroutput>BZ_MEM_ERROR</computeroutput> even though some
   of the compressed data has been read.  The same is not true
   for compression; once
   <computeroutput>BZ2_bzCompressInit</computeroutput> or
   <computeroutput>BZ2_bzWriteOpen</computeroutput> have
   successfully completed,
   <computeroutput>BZ_MEM_ERROR</computeroutput> cannot
   occur.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_DATA_ERROR</computeroutput></term>
  <listitem><para>Returned when a data integrity error is
   detected during decompression.  Most importantly, this means
   when stored and computed CRCs for the data do not match.  This
   value is also returned upon detection of any other anomaly in
   the compressed data.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_DATA_ERROR_MAGIC</computeroutput></term>
  <listitem><para>As a special case of
   <computeroutput>BZ_DATA_ERROR</computeroutput>, it is
   sometimes useful to know when the compressed stream does not
   start with the correct magic bytes (<computeroutput>'B' 'Z'
   'h'</computeroutput>).</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_IO_ERROR</computeroutput></term>
  <listitem><para>Returned by
   <computeroutput>BZ2_bzRead</computeroutput> and
   <computeroutput>BZ2_bzWrite</computeroutput> when there is an
   error reading or writing in the compressed file, and by
   <computeroutput>BZ2_bzReadOpen</computeroutput> and
   <computeroutput>BZ2_bzWriteOpen</computeroutput> for attempts
   to use a file for which the error indicator (viz,
   <computeroutput>ferror(f)</computeroutput>) is set.  On
   receipt of <computeroutput>BZ_IO_ERROR</computeroutput>, the
   caller should consult <computeroutput>errno</computeroutput>
   and/or <computeroutput>perror</computeroutput> to acquire
   operating-system specific information about the
   problem.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_UNEXPECTED_EOF</computeroutput></term>
  <listitem><para>Returned by
   <computeroutput>BZ2_bzRead</computeroutput> when the
   compressed file finishes before the logical end of stream is
   detected.</para></listitem>
 </varlistentry>

 <varlistentry>
  <term><computeroutput>BZ_OUTBUFF_FULL</computeroutput></term>
  <listitem><para>Returned by
   <computeroutput>BZ2_bzBuffToBuffCompress</computeroutput> and
   <computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput> to
   indicate that the output data will not fit into the output
   buffer provided.</para></listitem>
 </varlistentry>

</variablelist>

</sect1>



<sect1 id="low-level" xreflabel=">Low-level interface">
<title>Low-level interface</title>


<sect2 id="bzcompress-init" xreflabel="BZ2_bzCompressInit">
<title>BZ2_bzCompressInit</title>

<programlisting>
typedef struct {
  char *next_in;
  unsigned int avail_in;
  unsigned int total_in_lo32;
  unsigned int total_in_hi32;

  char *next_out;
  unsigned int avail_out;
  unsigned int total_out_lo32;
  unsigned int total_out_hi32;

  void *state;

  void *(*bzalloc)(void *,int,int);
  void (*bzfree)(void *,void *);
  void *opaque;
} bz_stream;

int BZ2_bzCompressInit ( bz_stream *strm, 
                         int blockSize100k, 
                         int verbosity,
                         int workFactor );
</programlisting>

<para>Prepares for compression.  The
<computeroutput>bz_stream</computeroutput> structure holds all
data pertaining to the compression activity.  A
<computeroutput>bz_stream</computeroutput> structure should be
allocated and initialised prior to the call.  The fields of
<computeroutput>bz_stream</computeroutput> comprise the entirety
of the user-visible data.  <computeroutput>state</computeroutput>
is a pointer to the private data structures required for
compression.</para>

<para>Custom memory allocators are supported, via fields
<computeroutput>bzalloc</computeroutput>,
<computeroutput>bzfree</computeroutput>, and
<computeroutput>opaque</computeroutput>.  The value
<computeroutput>opaque</computeroutput> is passed as the first
argument to all calls to <computeroutput>bzalloc</computeroutput>
and <computeroutput>bzfree</computeroutput>, but is otherwise
ignored by the library.  The call <computeroutput>bzalloc (
opaque, n, m )</computeroutput> is expected to return a pointer
<computeroutput>p</computeroutput> to <computeroutput>n *
m</computeroutput> bytes of memory, and <computeroutput>bzfree (
opaque, p )</computeroutput> should free that memory.</para>
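
<para>By way of example, the following sketch shows a pair of
allocator hooks that count allocations through
<computeroutput>opaque</computeroutput>.  The names
<computeroutput>alloc_stats</computeroutput>,
<computeroutput>counting_alloc</computeroutput> and
<computeroutput>counting_free</computeroutput> are invented for
illustration, and the counting itself is only one of many things a
custom allocator might do.  The hooks match the
<computeroutput>bzalloc</computeroutput> and
<computeroutput>bzfree</computeroutput> signatures shown in the
structure above.</para>

<programlisting><![CDATA[
```c
#include <stdlib.h>

/* Hypothetical bookkeeping record, reached through 'opaque'. */
struct alloc_stats {
    long live;    /* blocks currently allocated */
    long total;   /* blocks ever allocated */
};

/* Candidate for the bzalloc field: return n * m bytes, or NULL. */
void *counting_alloc(void *opaque, int n, int m)
{
    struct alloc_stats *st = opaque;
    void *p;

    if (n < 0 || m < 0) return NULL;
    p = malloc((size_t)n * (size_t)m);
    if (p != NULL) {
        st->live++;
        st->total++;
    }
    return p;
}

/* Candidate for the bzfree field: release a block from counting_alloc. */
void counting_free(void *opaque, void *p)
{
    struct alloc_stats *st = opaque;

    if (p == NULL) return;
    free(p);
    st->live--;
}

/* To install the hooks (assuming a bz_stream called strm and an
   alloc_stats called stats), set, before the *Init call:
       strm.bzalloc = counting_alloc;
       strm.bzfree  = counting_free;
       strm.opaque  = &stats;                                   */
```
]]></programlisting>

<para>If every block is released again, <computeroutput>live</computeroutput>
should be back at zero once the matching
<computeroutput>*End</computeroutput> function has returned.</para>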

<para>If you don't want to use a custom memory allocator, set
<computeroutput>bzalloc</computeroutput>,
<computeroutput>bzfree</computeroutput> and
<computeroutput>opaque</computeroutput> to
<computeroutput>NULL</computeroutput>, and the library will then
use the standard <computeroutput>malloc</computeroutput> /
<computeroutput>free</computeroutput> routines.</para>

<para>Before calling
<computeroutput>BZ2_bzCompressInit</computeroutput>, fields
<computeroutput>bzalloc</computeroutput>,
<computeroutput>bzfree</computeroutput> and
<computeroutput>opaque</computeroutput> should be filled
appropriately, as just described.  Upon return, the internal
state will have been allocated and initialised, and
<computeroutput>total_in_lo32</computeroutput>,
<computeroutput>total_in_hi32</computeroutput>,
<computeroutput>total_out_lo32</computeroutput> and
<computeroutput>total_out_hi32</computeroutput> will have been
set to zero.  These four fields are used by the library to inform
the caller of the total amount of data passed into and out of the
library, respectively.  You should not try to change them.  As of
version 1.0, 64-bit counts are maintained, even on 32-bit
platforms, using the <computeroutput>_hi32</computeroutput>
fields to store the upper 32 bits of the count.  So, for example,
the total amount of data in is <computeroutput>(total_in_hi32
&lt;&lt; 32) + total_in_lo32</computeroutput>.</para>
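
<para>Taken literally in C, that expression needs a little care:
where <computeroutput>unsigned int</computeroutput> is 32 bits wide,
shifting it left by 32 is undefined, so the high half should be
widened first.  A minimal sketch follows; the helper name
<computeroutput>bz_total64</computeroutput> is hypothetical, and it
assumes a compiler which provides
<computeroutput>uint64_t</computeroutput>.</para>

<programlisting><![CDATA[
```c
#include <stdint.h>

/* Hypothetical helper: combine the _hi32 and _lo32 halves of a
   bz_stream counter into one 64-bit value.  The cast must happen
   before the shift so that the shift is done in 64-bit arithmetic. */
uint64_t bz_total64(unsigned int hi32, unsigned int lo32)
{
    return ((uint64_t)hi32 << 32) | (uint64_t)lo32;
}
```
]]></programlisting>

<para>For instance, <computeroutput>bz_total64(strm.total_in_hi32,
strm.total_in_lo32)</computeroutput> would give the number of input
bytes consumed so far.</para>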

<para>Parameter <computeroutput>blockSize100k</computeroutput>
specifies the block size to be used for compression.  It should
be a value between 1 and 9 inclusive, and the actual block size
used is 100000 x this figure.  9 gives the best compression but
takes most memory.</para>

<para>Parameter <computeroutput>verbosity</computeroutput> should
be set to a number between 0 and 4 inclusive.  0 is silent, and
greater numbers give increasingly verbose monitoring/debugging
output.  If the library has been compiled with
<computeroutput>-DBZ_NO_STDIO</computeroutput>, no such output
will appear for any verbosity setting.</para>

<para>Parameter <computeroutput>workFactor</computeroutput>
controls how the compression phase behaves when presented with
worst case, highly repetitive, input data.  If compression runs
into difficulties caused by repetitive data, the library switches
from the standard sorting algorithm to a fallback algorithm.  The
fallback is slower than the standard algorithm by perhaps a
factor of three, but always behaves reasonably, no matter how bad
the input.</para>

<para>Lower values of <computeroutput>workFactor</computeroutput>
reduce the amount of effort the standard algorithm will expend
before resorting to the fallback.  You should set this parameter
carefully: too low, and many inputs will be handled by the
fallback algorithm and so compress rather slowly; too high, and
your average-to-worst case compression times can become very
large.  The default value of 30 gives reasonable behaviour over a
wide range of circumstances.</para>

<para>Allowable values range from 0 to 250 inclusive.  0 is a
special case, equivalent to using the default value of 30.</para>

<para>Note that the compressed output generated is the same
regardless of whether or not the fallback algorithm is
used.</para>

<para>Be aware also that this parameter may disappear entirely in
future versions of the library.  In principle it should be
possible to devise a good way to automatically choose which
algorithm to use.  Such a mechanism would render the parameter
obsolete.</para>

<para>Possible return values:</para>

<programlisting>
BZ_CONFIG_ERROR
  if the library has been mis-compiled
BZ_PARAM_ERROR
  if strm is NULL 
  or blockSize100k &lt; 1 or blockSize100k &gt; 9
  or verbosity &lt; 0 or verbosity &gt; 4
  or workFactor &lt; 0 or workFactor &gt; 250
BZ_MEM_ERROR 
  if not enough memory is available
BZ_OK 
  otherwise
</programlisting>
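
<para>The library performs these range checks itself, so the
following is purely illustrative: a sketch of how a caller might
validate settings earlier, say when reading a configuration file.
Both function names are hypothetical.  The ranges and the treatment
of a <computeroutput>workFactor</computeroutput> of 0 simply mirror
the description above.</para>

<programlisting><![CDATA[
```c
/* Hypothetical helper: returns 1 if the three tuning parameters lie
   within the documented ranges, 0 otherwise. */
int compress_params_in_range(int blockSize100k, int verbosity,
                             int workFactor)
{
    if (blockSize100k < 1 || blockSize100k > 9) return 0;
    if (verbosity < 0 || verbosity > 4) return 0;
    if (workFactor < 0 || workFactor > 250) return 0;
    return 1;
}

/* Hypothetical helper: the documented default of 30 stands in for a
   workFactor of 0; any other value is used as given. */
int effective_work_factor(int workFactor)
{
    return workFactor == 0 ? 30 : workFactor;
}
```
]]></programlisting>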

<para>Allowable next actions:</para>

<programlisting>
BZ2_bzCompress
  if BZ_OK is returned
  no specific action needed in case of error
</programlisting>

</sect2>


<sect2 id="bzCompress" xreflabel="BZ2_bzCompress">
<title>BZ2_bzCompress</title>

<programlisting>
int BZ2_bzCompress ( bz_stream *strm, int action );
</programlisting>

<para>Provides more input and/or output buffer space for the
library.  The caller maintains input and output buffers, and
calls <computeroutput>BZ2_bzCompress</computeroutput> to transfer
data between them.</para>
Packit 71fd91
Packit 71fd91
<para>Before each call to
Packit 71fd91
<computeroutput>BZ2_bzCompress</computeroutput>,
Packit 71fd91
<computeroutput>next_in</computeroutput> should point at the data
Packit 71fd91
to be compressed, and <computeroutput>avail_in</computeroutput>
Packit 71fd91
should indicate how many bytes the library may read.
Packit 71fd91
<computeroutput>BZ2_bzCompress</computeroutput> updates
Packit 71fd91
<computeroutput>next_in</computeroutput>,
Packit 71fd91
<computeroutput>avail_in</computeroutput> and
Packit 71fd91
<computeroutput>total_in</computeroutput> to reflect the number
Packit 71fd91
of bytes it has read.</para>
Packit 71fd91
Packit 71fd91
<para>Similarly, <computeroutput>next_out</computeroutput> should
Packit 71fd91
point to a buffer in which the compressed data is to be placed,
Packit 71fd91
with <computeroutput>avail_out</computeroutput> indicating how
Packit 71fd91
much output space is available.
Packit 71fd91
<computeroutput>BZ2_bzCompress</computeroutput> updates
Packit 71fd91
<computeroutput>next_out</computeroutput>,
Packit 71fd91
<computeroutput>avail_out</computeroutput> and
Packit 71fd91
<computeroutput>total_out</computeroutput> to reflect the number
Packit 71fd91
of bytes output.</para>
Packit 71fd91
Packit 71fd91
<para>You may provide and remove as little or as much data as you
Packit 71fd91
like on each call of
Packit 71fd91
<computeroutput>BZ2_bzCompress</computeroutput>.  In the limit,
Packit 71fd91
it is acceptable to supply and remove data one byte at a time,
Packit 71fd91
although this would be terribly inefficient.  You should always
Packit 71fd91
ensure that at least one byte of output space is available at
Packit 71fd91
each call.</para>
Packit 71fd91
Packit 71fd91
<para>A second purpose of
Packit 71fd91
<computeroutput>BZ2_bzCompress</computeroutput> is to request a
Packit 71fd91
change of mode of the compressed stream.</para>
Packit 71fd91
Packit 71fd91
<para>Conceptually, a compressed stream can be in one of four
Packit 71fd91
states: IDLE, RUNNING, FLUSHING and FINISHING.  Before
Packit 71fd91
initialisation
Packit 71fd91
(<computeroutput>BZ2_bzCompressInit</computeroutput>) and after
Packit 71fd91
termination (<computeroutput>BZ2_bzCompressEnd</computeroutput>),
Packit 71fd91
a stream is regarded as IDLE.</para>
Packit 71fd91
Packit 71fd91
<para>Upon initialisation
Packit 71fd91
(<computeroutput>BZ2_bzCompressInit</computeroutput>), the stream
Packit 71fd91
is placed in the RUNNING state.  Subsequent calls to
Packit 71fd91
<computeroutput>BZ2_bzCompress</computeroutput> should pass
Packit 71fd91
<computeroutput>BZ_RUN</computeroutput> as the requested action;
Packit 71fd91
other actions are illegal and will result in
Packit 71fd91
<computeroutput>BZ_SEQUENCE_ERROR</computeroutput>.</para>
Packit 71fd91
Packit 71fd91
<para>At some point, the calling program will have provided all
Packit 71fd91
the input data it wants to.  It will then want to finish up -- in
Packit 71fd91
effect, asking the library to process any data it might have
Packit 71fd91
buffered internally.  In this state,
Packit 71fd91
<computeroutput>BZ2_bzCompress</computeroutput> will no longer
Packit 71fd91
attempt to read data from
Packit 71fd91
<computeroutput>next_in</computeroutput>, but it will want to
Packit 71fd91
write data to <computeroutput>next_out</computeroutput>.  Because
Packit 71fd91
the output buffer supplied by the user can be arbitrarily small,
Packit 71fd91
the finishing-up operation cannot necessarily be done with a
Packit 71fd91
single call of
Packit 71fd91
<computeroutput>BZ2_bzCompress</computeroutput>.</para>
Packit 71fd91
Packit 71fd91
<para>Instead, the calling program passes
Packit 71fd91
<computeroutput>BZ_FINISH</computeroutput> as an action to
Packit 71fd91
<computeroutput>BZ2_bzCompress</computeroutput>.  This changes
Packit 71fd91
the stream's state to FINISHING.  Any remaining input (ie,
Packit 71fd91
<computeroutput>next_in[0 .. avail_in-1]</computeroutput>) is
Packit 71fd91
compressed and transferred to the output buffer.  To do this,
Packit 71fd91
<computeroutput>BZ2_bzCompress</computeroutput> must be called
Packit 71fd91
repeatedly until all the output has been consumed.  At that
Packit 71fd91
point, <computeroutput>BZ2_bzCompress</computeroutput> returns
Packit 71fd91
<computeroutput>BZ_STREAM_END</computeroutput>, and the stream's
Packit 71fd91
state is set back to IDLE.
Packit 71fd91
<computeroutput>BZ2_bzCompressEnd</computeroutput> should then be
Packit 71fd91
called.</para>
Packit 71fd91
Packit 71fd91
<para>Just to make sure the calling program does not cheat, the
Packit 71fd91
library makes a note of <computeroutput>avail_in</computeroutput>
Packit 71fd91
at the time of the first call to
Packit 71fd91
<computeroutput>BZ2_bzCompress</computeroutput> which has
Packit 71fd91
<computeroutput>BZ_FINISH</computeroutput> as an action (ie, at
Packit 71fd91
the time the program has announced its intention to not supply
Packit 71fd91
any more input).  By comparing this value with that of
Packit 71fd91
<computeroutput>avail_in</computeroutput> over subsequent calls
Packit 71fd91
to <computeroutput>BZ2_bzCompress</computeroutput>, the library
Packit 71fd91
can detect any attempts to slip in more data to compress.  Any
Packit 71fd91
calls for which this is detected will return
Packit 71fd91
<computeroutput>BZ_SEQUENCE_ERROR</computeroutput>.  This
Packit 71fd91
indicates a programming mistake which should be corrected.</para>
Packit 71fd91
Packit 71fd91
<para>Instead of asking to finish, the calling program may ask
Packit 71fd91
<computeroutput>BZ2_bzCompress</computeroutput> to take all the
Packit 71fd91
remaining input, compress it and terminate the current
Packit 71fd91
(Burrows-Wheeler) compression block.  This could be useful for
Packit 71fd91
error control purposes.  The mechanism is analogous to that for
Packit 71fd91
finishing: call <computeroutput>BZ2_bzCompress</computeroutput>
Packit 71fd91
with an action of <computeroutput>BZ_FLUSH</computeroutput>,
Packit 71fd91
remove output data, and persist with the
Packit 71fd91
<computeroutput>BZ_FLUSH</computeroutput> action until the value
<computeroutput>BZ_RUN_OK</computeroutput> is returned.  As with
Packit 71fd91
finishing, <computeroutput>BZ2_bzCompress</computeroutput>
Packit 71fd91
detects any attempt to provide more input data once the flush has
Packit 71fd91
begun.</para>
Packit 71fd91
Packit 71fd91
<para>Once the flush is complete, the stream returns to the
Packit 71fd91
normal RUNNING state.</para>
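
<para>A flush could be driven along the following lines.  This is
a sketch only: the helper name
<computeroutput>flush_block</computeroutput>, the 4096-byte chunk
size and the use of a <computeroutput>FILE*</computeroutput> as
the destination are assumptions for the example.</para>

<programlisting><![CDATA[
#include <stdio.h>
#include <bzlib.h>

/* Hypothetical helper: end the current block of a RUNNING stream
   and write everything compressed so far to out.
   Returns 0 once the stream is RUNNING again, -1 on error. */
int flush_block(bz_stream *strm, FILE *out)
{
    char chunk[4096];
    int ret;

    do {
        size_t n;
        strm->next_out  = chunk;
        strm->avail_out = sizeof chunk;
        /* No new input may be supplied between these calls. */
        ret = BZ2_bzCompress(strm, BZ_FLUSH);
        if (ret != BZ_FLUSH_OK && ret != BZ_RUN_OK) return -1;
        n = sizeof chunk - strm->avail_out;
        if (fwrite(chunk, 1, n, out) != n) return -1;
    } while (ret == BZ_FLUSH_OK);   /* BZ_RUN_OK: flush complete */

    return 0;
}
]]></programlisting>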
Packit 71fd91
Packit 71fd91
<para>This all sounds pretty complex, but isn't really.  Here's a
Packit 71fd91
table which shows which actions are allowable in each state, what
Packit 71fd91
action will be taken, what the next state is, and what the
Packit 71fd91
non-error return values are.  Note that you can't explicitly ask
Packit 71fd91
what state the stream is in, but nor do you need to -- it can be
Packit 71fd91
inferred from the values returned by
Packit 71fd91
<computeroutput>BZ2_bzCompress</computeroutput>.</para>
Packit 71fd91
Packit 71fd91
<programlisting>
Packit 71fd91
IDLE/any
Packit 71fd91
  Illegal.  IDLE state only exists after BZ2_bzCompressEnd or
Packit 71fd91
  before BZ2_bzCompressInit.
Packit 71fd91
  Return value = BZ_SEQUENCE_ERROR
Packit 71fd91
Packit 71fd91
RUNNING/BZ_RUN
Packit 71fd91
  Compress from next_in to next_out as much as possible.
Packit 71fd91
  Next state = RUNNING
Packit 71fd91
  Return value = BZ_RUN_OK
Packit 71fd91
Packit 71fd91
RUNNING/BZ_FLUSH
Packit 71fd91
  Remember current value of avail_in. Compress from next_in
Packit 71fd91
  to next_out as much as possible, but do not accept any more input.
Packit 71fd91
  Next state = FLUSHING
Packit 71fd91
  Return value = BZ_FLUSH_OK
Packit 71fd91
Packit 71fd91
RUNNING/BZ_FINISH
Packit 71fd91
  Remember current value of avail_in. Compress from next_in
Packit 71fd91
  to next_out as much as possible, but do not accept any more input.
Packit 71fd91
  Next state = FINISHING
Packit 71fd91
  Return value = BZ_FINISH_OK
Packit 71fd91
Packit 71fd91
FLUSHING/BZ_FLUSH
Packit 71fd91
  Compress from next_in to next_out as much as possible, 
Packit 71fd91
  but do not accept any more input.
Packit 71fd91
  If all the existing input has been used up and all compressed
Packit 71fd91
  output has been removed
Packit 71fd91
    Next state = RUNNING; Return value = BZ_RUN_OK
Packit 71fd91
  else
Packit 71fd91
    Next state = FLUSHING; Return value = BZ_FLUSH_OK
Packit 71fd91
Packit 71fd91
FLUSHING/other     
Packit 71fd91
  Illegal.
Packit 71fd91
  Return value = BZ_SEQUENCE_ERROR
Packit 71fd91
Packit 71fd91
FINISHING/BZ_FINISH
Packit 71fd91
  Compress from next_in to next_out as much as possible,
Packit 71fd91
  but do not accept any more input.
Packit 71fd91
  If all the existing input has been used up and all compressed
Packit 71fd91
  output has been removed
Packit 71fd91
    Next state = IDLE; Return value = BZ_STREAM_END
Packit 71fd91
  else
Packit 71fd91
    Next state = FINISHING; Return value = BZ_FINISH_OK
Packit 71fd91
Packit 71fd91
FINISHING/other
Packit 71fd91
  Illegal.
Packit 71fd91
  Return value = BZ_SEQUENCE_ERROR
Packit 71fd91
</programlisting>
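
<para>The table above can also be read as a transition function.
The following toy model mirrors the table as written, not the
library's internals; every name in it is hypothetical, and the
<computeroutput>drained</computeroutput> flag stands in for the
condition "all the existing input has been used up and all
compressed output has been removed", which the real library works
out for itself.</para>

<programlisting><![CDATA[
/* Toy model of the state table; none of these names exist in libbzip2. */
typedef enum { S_IDLE, S_RUNNING, S_FLUSHING, S_FINISHING } State;
typedef enum { A_RUN, A_FLUSH, A_FINISH } Action;
typedef enum { R_SEQUENCE_ERROR, R_RUN_OK, R_FLUSH_OK,
               R_FINISH_OK, R_STREAM_END } Result;

/* Apply one action to *st.  drained is nonzero once all pending
   input has been used up and all compressed output removed. */
Result step(State *st, Action act, int drained)
{
    switch (*st) {
    case S_RUNNING:
        if (act == A_RUN)   { return R_RUN_OK; }
        if (act == A_FLUSH) { *st = S_FLUSHING; return R_FLUSH_OK; }
        *st = S_FINISHING;
        return R_FINISH_OK;
    case S_FLUSHING:
        if (act != A_FLUSH) return R_SEQUENCE_ERROR;
        if (drained) { *st = S_RUNNING; return R_RUN_OK; }
        return R_FLUSH_OK;
    case S_FINISHING:
        if (act != A_FINISH) return R_SEQUENCE_ERROR;
        if (drained) { *st = S_IDLE; return R_STREAM_END; }
        return R_FINISH_OK;
    default:            /* S_IDLE: every action is illegal */
        return R_SEQUENCE_ERROR;
    }
}
]]></programlisting>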
Packit 71fd91
Packit 71fd91
Packit 71fd91
<para>That still looks complicated?  Well, fair enough.  The
Packit 71fd91
usual sequence of calls for compressing a load of data is:</para>
Packit 71fd91
Packit 71fd91
<orderedlist>
Packit 71fd91
Packit 71fd91
 <listitem><para>Get started with
Packit 71fd91
  <computeroutput>BZ2_bzCompressInit</computeroutput>.</para></listitem>
Packit 71fd91
Packit 71fd91
 <listitem><para>Shovel data in and shlurp out its compressed form
Packit 71fd91
  using zero or more calls of
Packit 71fd91
  <computeroutput>BZ2_bzCompress</computeroutput> with action =
Packit 71fd91
  <computeroutput>BZ_RUN</computeroutput>.</para></listitem>
Packit 71fd91
Packit 71fd91
 <listitem><para>Finish up. Repeatedly call
Packit 71fd91
  <computeroutput>BZ2_bzCompress</computeroutput> with action =
Packit 71fd91
  <computeroutput>BZ_FINISH</computeroutput>, copying out the
Packit 71fd91
  compressed output, until
Packit 71fd91
  <computeroutput>BZ_STREAM_END</computeroutput> is
Packit 71fd91
  returned.</para></listitem>

 <listitem><para>Close up and go home.  Call
  <computeroutput>BZ2_bzCompressEnd</computeroutput>.</para></listitem>
Packit 71fd91
Packit 71fd91
</orderedlist>
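
<para>The four steps above might be put together roughly as
follows.  This is one possible arrangement rather than a
prescribed one: the helper name
<computeroutput>compress_stream</computeroutput>, the buffer
sizes and the use of <computeroutput>FILE*</computeroutput>
handles for both ends are assumptions for the example.</para>

<programlisting><![CDATA[
#include <stdio.h>
#include <bzlib.h>

/* Hypothetical helper: compress everything readable from in and
   write the result to out.  Returns 0 on success, -1 on failure. */
int compress_stream(FILE *in, FILE *out)
{
    bz_stream strm;
    char inbuf[4096], outbuf[4096];
    int action = BZ_RUN, ret = BZ_RUN_OK;

    strm.bzalloc = NULL; strm.bzfree = NULL; strm.opaque = NULL;
    if (BZ2_bzCompressInit(&strm, 9, 0, 0) != BZ_OK) return -1; /* step 1 */
    strm.avail_in = 0;

    while (ret != BZ_STREAM_END) {
        size_t n;
        if (action == BZ_RUN && strm.avail_in == 0) {
            /* Refill the input buffer; switch to BZ_FINISH at EOF. */
            strm.next_in  = inbuf;
            strm.avail_in = (unsigned int)fread(inbuf, 1, sizeof inbuf, in);
            if (ferror(in)) { ret = BZ_IO_ERROR; break; }
            if (strm.avail_in == 0) action = BZ_FINISH;
        }
        strm.next_out  = outbuf;
        strm.avail_out = sizeof outbuf;
        ret = BZ2_bzCompress(&strm, action);            /* steps 2 and 3 */
        if (ret != BZ_RUN_OK && ret != BZ_FINISH_OK && ret != BZ_STREAM_END)
            break;
        n = sizeof outbuf - strm.avail_out;
        if (fwrite(outbuf, 1, n, out) != n) { ret = BZ_IO_ERROR; break; }
    }

    BZ2_bzCompressEnd(&strm);                           /* step 4 */
    return ret == BZ_STREAM_END ? 0 : -1;
}
]]></programlisting>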
Packit 71fd91
Packit 71fd91
<para>If the data you want to compress fits into your input
Packit 71fd91
buffer all at once, you can skip the calls of
Packit 71fd91
<computeroutput>BZ2_bzCompress ( ..., BZ_RUN )</computeroutput>
Packit 71fd91
and just do the <computeroutput>BZ2_bzCompress ( ..., BZ_FINISH
Packit 71fd91
)</computeroutput> calls.</para>
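
<para>For that single-buffer case, a minimal sketch could look
like this.  The helper name
<computeroutput>compress_buffer</computeroutput> and its
convention of returning -1 when the output buffer is too small
are assumptions for the example.</para>

<programlisting><![CDATA[
#include <stddef.h>
#include <bzlib.h>

/* Hypothetical helper: compress src[0 .. src_len-1] into dst.
   Returns the number of compressed bytes, or -1 on error or if
   dst_cap bytes were not enough. */
long compress_buffer(char *src, unsigned int src_len,
                     char *dst, unsigned int dst_cap)
{
    bz_stream strm;
    long produced;
    int ret;

    strm.bzalloc = NULL; strm.bzfree = NULL; strm.opaque = NULL;
    if (BZ2_bzCompressInit(&strm, 9, 0, 0) != BZ_OK) return -1;

    strm.next_in  = src;  strm.avail_in  = src_len;
    strm.next_out = dst;  strm.avail_out = dst_cap;

    /* All the input is already in place, so no BZ_RUN calls are
       needed: go straight to BZ_FINISH and repeat until done. */
    do {
        ret = BZ2_bzCompress(&strm, BZ_FINISH);
    } while (ret == BZ_FINISH_OK && strm.avail_out > 0);

    produced = (long)(dst_cap - strm.avail_out);
    BZ2_bzCompressEnd(&strm);
    return ret == BZ_STREAM_END ? produced : -1;
}
]]></programlisting>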
Packit 71fd91
Packit 71fd91
<para>All required memory is allocated by
Packit 71fd91
<computeroutput>BZ2_bzCompressInit</computeroutput>.  The
Packit 71fd91
compression library can accept any data at all (obviously).  So
Packit 71fd91
you shouldn't get any error return values from the
Packit 71fd91
<computeroutput>BZ2_bzCompress</computeroutput> calls.  If you
Packit 71fd91
do, they will be
Packit 71fd91
<computeroutput>BZ_SEQUENCE_ERROR</computeroutput>, and indicate
Packit 71fd91
a bug in your programming.</para>
Packit 71fd91
Packit 71fd91
<para>Trivial other possible return values:</para>
Packit 71fd91
Packit 71fd91
<programlisting>
Packit 71fd91
BZ_PARAM_ERROR
Packit 71fd91
  if strm is NULL, or strm->s is NULL
Packit 71fd91
</programlisting>
Packit 71fd91
Packit 71fd91
</sect2>
Packit 71fd91
Packit 71fd91
Packit 71fd91
<sect2 id="bzCompress-end" xreflabel="BZ2_bzCompressEnd">
Packit 71fd91
<title>BZ2_bzCompressEnd</title>
Packit 71fd91
Packit 71fd91
<programlisting>
Packit 71fd91
int BZ2_bzCompressEnd ( bz_stream *strm );
Packit 71fd91
</programlisting>
Packit 71fd91
Packit 71fd91
<para>Releases all memory associated with a compression
Packit 71fd91
stream.</para>
Packit 71fd91
Packit 71fd91
<para>Possible return values:</para>
Packit 71fd91
Packit 71fd91
<programlisting>
Packit 71fd91
BZ_PARAM_ERROR  if strm is NULL or strm->s is NULL
Packit 71fd91
BZ_OK           otherwise
Packit 71fd91
</programlisting>
Packit 71fd91
Packit 71fd91
</sect2>
Packit 71fd91
Packit 71fd91
Packit 71fd91
<sect2 id="bzDecompress-init" xreflabel="BZ2_bzDecompressInit">
Packit 71fd91
<title>BZ2_bzDecompressInit</title>
Packit 71fd91
Packit 71fd91
<programlisting>
Packit 71fd91
int BZ2_bzDecompressInit ( bz_stream *strm, int verbosity, int small );
Packit 71fd91
</programlisting>
Packit 71fd91
Packit 71fd91
<para>Prepares for decompression.  As with
Packit 71fd91
<computeroutput>BZ2_bzCompressInit</computeroutput>, a
Packit 71fd91
<computeroutput>bz_stream</computeroutput> record should be
Packit 71fd91
allocated and initialised before the call.  Fields
Packit 71fd91
<computeroutput>bzalloc</computeroutput>,
Packit 71fd91
<computeroutput>bzfree</computeroutput> and
Packit 71fd91
<computeroutput>opaque</computeroutput> should be set if a custom
Packit 71fd91
memory allocator is required, or made
Packit 71fd91
<computeroutput>NULL</computeroutput> for the normal
Packit 71fd91
<computeroutput>malloc</computeroutput> /
Packit 71fd91
<computeroutput>free</computeroutput> routines.  Upon return, the
Packit 71fd91
internal state will have been initialised, and
Packit 71fd91
<computeroutput>total_in</computeroutput> and
Packit 71fd91
<computeroutput>total_out</computeroutput> will be zero.</para>
Packit 71fd91
Packit 71fd91
<para>For the meaning of parameter
Packit 71fd91
<computeroutput>verbosity</computeroutput>, see
Packit 71fd91
<computeroutput>BZ2_bzCompressInit</computeroutput>.</para>
Packit 71fd91
Packit 71fd91
<para>If <computeroutput>small</computeroutput> is nonzero, the
Packit 71fd91
library will use an alternative decompression algorithm which
Packit 71fd91
uses less memory but at the cost of decompressing more slowly
Packit 71fd91
(roughly speaking, half the speed, but the maximum memory
Packit 71fd91
requirement drops to around 2300k).  See <xref linkend="using"/>
Packit 71fd91
for more information on memory management.</para>
Packit 71fd91
Packit 71fd91
<para>Note that the amount of memory needed to decompress a
Packit 71fd91
stream cannot be determined until the stream's header has been
Packit 71fd91
read, so even if
Packit 71fd91
<computeroutput>BZ2_bzDecompressInit</computeroutput> succeeds, a
Packit 71fd91
subsequent <computeroutput>BZ2_bzDecompress</computeroutput>
Packit 71fd91
could fail with
Packit 71fd91
<computeroutput>BZ_MEM_ERROR</computeroutput>.</para>
Packit 71fd91
Packit 71fd91
<para>Possible return values:</para>
Packit 71fd91
Packit 71fd91
<programlisting>
Packit 71fd91
BZ_CONFIG_ERROR
Packit 71fd91
  if the library has been mis-compiled
Packit 71fd91
BZ_PARAM_ERROR
Packit 71fd91
  if ( small != 0 && small != 1 )
Packit 71fd91
  or (verbosity < 0 || verbosity > 4)
Packit 71fd91
BZ_MEM_ERROR
  if insufficient memory is available
BZ_OK
  otherwise
Packit 71fd91
</programlisting>
Packit 71fd91
Packit 71fd91
<para>Allowable next actions:</para>
Packit 71fd91
Packit 71fd91
<programlisting>
Packit 71fd91
BZ2_bzDecompress
Packit 71fd91
  if BZ_OK was returned
Packit 71fd91
  no specific action required in case of error
Packit 71fd91
</programlisting>
Packit 71fd91
Packit 71fd91
</sect2>
Packit 71fd91
Packit 71fd91
Packit 71fd91
<sect2 id="bzDecompress" xreflabel="BZ2_bzDecompress">
Packit 71fd91
<title>BZ2_bzDecompress</title>
Packit 71fd91
Packit 71fd91
<programlisting>
Packit 71fd91
int BZ2_bzDecompress ( bz_stream *strm );
Packit 71fd91
</programlisting>
Packit 71fd91
Packit 71fd91
<para>Provides more input and/or output buffer space for the
Packit 71fd91
library.  The caller maintains input and output buffers, and uses
Packit 71fd91
<computeroutput>BZ2_bzDecompress</computeroutput> to transfer
Packit 71fd91
data between them.</para>
Packit 71fd91
Packit 71fd91
<para>Before each call to
Packit 71fd91
<computeroutput>BZ2_bzDecompress</computeroutput>,
Packit 71fd91
<computeroutput>next_in</computeroutput> should point at the
Packit 71fd91
compressed data, and <computeroutput>avail_in</computeroutput>
Packit 71fd91
should indicate how many bytes the library may read.
Packit 71fd91
<computeroutput>BZ2_bzDecompress</computeroutput> updates
Packit 71fd91
<computeroutput>next_in</computeroutput>,
Packit 71fd91
<computeroutput>avail_in</computeroutput> and
Packit 71fd91
<computeroutput>total_in</computeroutput> to reflect the number
Packit 71fd91
of bytes it has read.</para>
Packit 71fd91
Packit 71fd91
<para>Similarly, <computeroutput>next_out</computeroutput> should
Packit 71fd91
point to a buffer in which the uncompressed output is to be
Packit 71fd91
placed, with <computeroutput>avail_out</computeroutput>
Packit 71fd91
indicating how much output space is available.
Packit 71fd91
<computeroutput>BZ2_bzDecompress</computeroutput> updates
Packit 71fd91
<computeroutput>next_out</computeroutput>,
Packit 71fd91
<computeroutput>avail_out</computeroutput> and
Packit 71fd91
<computeroutput>total_out</computeroutput> to reflect the number
Packit 71fd91
of bytes output.</para>
Packit 71fd91
Packit 71fd91
<para>You may provide and remove as little or as much data as you
Packit 71fd91
like on each call of
Packit 71fd91
<computeroutput>BZ2_bzDecompress</computeroutput>.  In the limit,
Packit 71fd91
it is acceptable to supply and remove data one byte at a time,
Packit 71fd91
although this would be terribly inefficient.  You should always
Packit 71fd91
ensure that at least one byte of output space is available at
Packit 71fd91
each call.</para>
Packit 71fd91
Packit 71fd91
<para>Use of <computeroutput>BZ2_bzDecompress</computeroutput> is
Packit 71fd91
simpler than
Packit 71fd91
<computeroutput>BZ2_bzCompress</computeroutput>.</para>
Packit 71fd91
Packit 71fd91
<para>You should provide input and remove output as described
Packit 71fd91
above, and repeatedly call
Packit 71fd91
<computeroutput>BZ2_bzDecompress</computeroutput> until
Packit 71fd91
<computeroutput>BZ_STREAM_END</computeroutput> is returned.
Packit 71fd91
Appearance of <computeroutput>BZ_STREAM_END</computeroutput>
Packit 71fd91
denotes that <computeroutput>BZ2_bzDecompress</computeroutput>
Packit 71fd91
has detected the logical end of the compressed stream.
Packit 71fd91
<computeroutput>BZ2_bzDecompress</computeroutput> will not
Packit 71fd91
produce <computeroutput>BZ_STREAM_END</computeroutput> until all
Packit 71fd91
output data has been placed into the output buffer, so once
Packit 71fd91
<computeroutput>BZ_STREAM_END</computeroutput> appears, you are
Packit 71fd91
guaranteed to have available all the decompressed output, and
Packit 71fd91
<computeroutput>BZ2_bzDecompressEnd</computeroutput> can safely
Packit 71fd91
be called.</para>
Packit 71fd91
Packit 71fd91
<para>In case of an error return value, you should call
Packit 71fd91
<computeroutput>BZ2_bzDecompressEnd</computeroutput> to clean up
Packit 71fd91
and release memory.</para>
Packit 71fd91
Packit 71fd91
<para>Possible return values:</para>
Packit 71fd91
Packit 71fd91
<programlisting>
Packit 71fd91
BZ_PARAM_ERROR
Packit 71fd91
  if strm is NULL or strm->s is NULL
Packit 71fd91
  or strm->avail_out < 1
Packit 71fd91
BZ_DATA_ERROR
Packit 71fd91
  if a data integrity error is detected in the compressed stream
Packit 71fd91
BZ_DATA_ERROR_MAGIC
Packit 71fd91
  if the compressed stream doesn't begin with the right magic bytes
Packit 71fd91
BZ_MEM_ERROR
Packit 71fd91
  if there wasn't enough memory available
Packit 71fd91
BZ_STREAM_END
Packit 71fd91
  if the logical end of the data stream was detected and all
  output has been consumed, eg strm->avail_out > 0
Packit 71fd91
BZ_OK
Packit 71fd91
  otherwise
Packit 71fd91
</programlisting>
Packit 71fd91
Packit 71fd91
<para>Allowable next actions:</para>
Packit 71fd91
Packit 71fd91
<programlisting>
Packit 71fd91
BZ2_bzDecompress
Packit 71fd91
  if BZ_OK was returned
Packit 71fd91
BZ2_bzDecompressEnd
Packit 71fd91
  otherwise
Packit 71fd91
</programlisting>
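
<para>Putting the pieces of this section together, a
single-buffer decompression might be sketched as below.  The
helper name <computeroutput>decompress_buffer</computeroutput>
and its -1 error convention are assumptions for the example; a
real caller would usually refill the input and drain the output
inside the loop instead of giving up.</para>

<programlisting><![CDATA[
#include <stddef.h>
#include <bzlib.h>

/* Hypothetical helper: decompress src[0 .. src_len-1] into dst.
   Returns the number of decompressed bytes, or -1 on error, on
   truncated input, or if dst_cap bytes were not enough. */
long decompress_buffer(char *src, unsigned int src_len,
                       char *dst, unsigned int dst_cap, int small)
{
    bz_stream strm;
    long produced;
    int ret;

    strm.bzalloc = NULL; strm.bzfree = NULL; strm.opaque = NULL;
    if (BZ2_bzDecompressInit(&strm, 0, small) != BZ_OK) return -1;

    strm.next_in  = src;  strm.avail_in  = src_len;
    strm.next_out = dst;  strm.avail_out = dst_cap;

    /* BZ_OK means "call again"; stop if no further progress is
       possible because input or output space has run out. */
    do {
        ret = BZ2_bzDecompress(&strm);
    } while (ret == BZ_OK && strm.avail_in > 0 && strm.avail_out > 0);

    produced = (long)(dst_cap - strm.avail_out);
    BZ2_bzDecompressEnd(&strm);   /* after success and after errors */
    return ret == BZ_STREAM_END ? produced : -1;
}
]]></programlisting>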
Packit 71fd91
Packit 71fd91
</sect2>
Packit 71fd91
Packit 71fd91
Packit 71fd91
<sect2 id="bzDecompress-end" xreflabel="BZ2_bzDecompressEnd">
Packit 71fd91
<title>BZ2_bzDecompressEnd</title>
Packit 71fd91
Packit 71fd91
<programlisting>
Packit 71fd91
int BZ2_bzDecompressEnd ( bz_stream *strm );
Packit 71fd91
</programlisting>
Packit 71fd91
Packit 71fd91
<para>Releases all memory associated with a decompression
Packit 71fd91
stream.</para>
Packit 71fd91
Packit 71fd91
<para>Possible return values:</para>
Packit 71fd91
Packit 71fd91
<programlisting>
Packit 71fd91
BZ_PARAM_ERROR
Packit 71fd91
  if strm is NULL or strm->s is NULL
Packit 71fd91
BZ_OK
Packit 71fd91
  otherwise
Packit 71fd91
</programlisting>
Packit 71fd91
Packit 71fd91
<para>Allowable next actions:</para>
Packit 71fd91
Packit 71fd91
<programlisting>
Packit 71fd91
  None.
Packit 71fd91
</programlisting>
Packit 71fd91
Packit 71fd91
</sect2>
Packit 71fd91
Packit 71fd91
</sect1>
Packit 71fd91
Packit 71fd91
Packit 71fd91
<sect1 id="hl-interface" xreflabel="High-level interface">
Packit 71fd91
<title>High-level interface</title>
Packit 71fd91
Packit 71fd91
<para>This interface provides functions for reading and writing
Packit 71fd91
<computeroutput>bzip2</computeroutput> format files.  First, some
Packit 71fd91
general points.</para>
Packit 71fd91
Packit 71fd91
<itemizedlist mark='bullet'>
Packit 71fd91
Packit 71fd91
 <listitem><para>All of the functions take an
Packit 71fd91
  <computeroutput>int*</computeroutput> first argument,
Packit 71fd91
  <computeroutput>bzerror</computeroutput>.  After each call,
Packit 71fd91
  <computeroutput>bzerror</computeroutput> should be consulted
Packit 71fd91
  first to determine the outcome of the call.  If
Packit 71fd91
  <computeroutput>bzerror</computeroutput> is
Packit 71fd91
  <computeroutput>BZ_OK</computeroutput>, the call completed
Packit 71fd91
  successfully, and only then should the return value of the
Packit 71fd91
  function (if any) be consulted.  If
Packit 71fd91
  <computeroutput>bzerror</computeroutput> is
Packit 71fd91
  <computeroutput>BZ_IO_ERROR</computeroutput>, there was an
Packit 71fd91
  error reading/writing the underlying compressed file, and you
Packit 71fd91
  should then consult <computeroutput>errno</computeroutput> /
Packit 71fd91
  <computeroutput>perror</computeroutput> to determine the cause
Packit 71fd91
  of the difficulty.  <computeroutput>bzerror</computeroutput>
Packit 71fd91
  may also be set to various other values; precise details are
Packit 71fd91
  given on a per-function basis below.</para></listitem>
Packit 71fd91
Packit 71fd91
 <listitem><para>If <computeroutput>bzerror</computeroutput> indicates
Packit 71fd91
  an error (ie, anything except
Packit 71fd91
  <computeroutput>BZ_OK</computeroutput> and
Packit 71fd91
  <computeroutput>BZ_STREAM_END</computeroutput>), you should
Packit 71fd91
  immediately call
Packit 71fd91
  <computeroutput>BZ2_bzReadClose</computeroutput> (or
Packit 71fd91
  <computeroutput>BZ2_bzWriteClose</computeroutput>, depending on
Packit 71fd91
  whether you are attempting to read or to write) to free up all
Packit 71fd91
  resources associated with the stream.  Once an error has been
Packit 71fd91
  indicated, behaviour of all calls except
Packit 71fd91
  <computeroutput>BZ2_bzReadClose</computeroutput>
Packit 71fd91
  (<computeroutput>BZ2_bzWriteClose</computeroutput>) is
Packit 71fd91
  undefined.  The implication is that (1)
Packit 71fd91
  <computeroutput>bzerror</computeroutput> should be checked
Packit 71fd91
  after each call, and (2) if
Packit 71fd91
  <computeroutput>bzerror</computeroutput> indicates an error,
Packit 71fd91
  <computeroutput>BZ2_bzReadClose</computeroutput>
Packit 71fd91
  (<computeroutput>BZ2_bzWriteClose</computeroutput>) should then
Packit 71fd91
  be called to clean up.</para></listitem>
Packit 71fd91
Packit 71fd91
 <listitem><para>The <computeroutput>FILE*</computeroutput> arguments
Packit 71fd91
  passed to <computeroutput>BZ2_bzReadOpen</computeroutput> /
Packit 71fd91
  <computeroutput>BZ2_bzWriteOpen</computeroutput> should be set
Packit 71fd91
  to binary mode.  Most Unix systems will do this by default, but
Packit 71fd91
  other platforms, including Windows and Mac, will not.  If you
Packit 71fd91
  omit this, you may encounter problems when moving code to new
Packit 71fd91
  platforms.</para></listitem>
Packit 71fd91
Packit 71fd91
 <listitem><para>Memory allocation requests are handled by
Packit 71fd91
  <computeroutput>malloc</computeroutput> /
Packit 71fd91
  <computeroutput>free</computeroutput>.  At present there is no
Packit 71fd91
  facility for user-defined memory allocators in the file I/O
Packit 71fd91
  functions (could easily be added, though).</para></listitem>
Packit 71fd91
Packit 71fd91
</itemizedlist>
Packit 71fd91
Packit 71fd91
Packit 71fd91
Packit 71fd91
<sect2 id="bzreadopen" xreflabel="BZ2_bzReadOpen">
Packit 71fd91
<title>BZ2_bzReadOpen</title>
Packit 71fd91
Packit 71fd91
<programlisting>
Packit 71fd91
typedef void BZFILE;
Packit 71fd91
Packit 71fd91
BZFILE *BZ2_bzReadOpen( int *bzerror, FILE *f, 
Packit 71fd91
                        int verbosity, int small,
Packit 71fd91
                        void *unused, int nUnused );
Packit 71fd91
</programlisting>
Packit 71fd91
Packit 71fd91
<para>Prepare to read compressed data from file handle
Packit 71fd91
<computeroutput>f</computeroutput>.
Packit 71fd91
<computeroutput>f</computeroutput> should refer to a file which
Packit 71fd91
has been opened for reading, and for which the error indicator
Packit 71fd91
(<computeroutput>ferror(f)</computeroutput>) is not set.  If
Packit 71fd91
<computeroutput>small</computeroutput> is 1, the library will try
Packit 71fd91
to decompress using less memory, at the expense of speed.</para>
Packit 71fd91
Packit 71fd91
<para>For reasons explained below,
Packit 71fd91
<computeroutput>BZ2_bzRead</computeroutput> will decompress the
Packit 71fd91
<computeroutput>nUnused</computeroutput> bytes starting at
Packit 71fd91
<computeroutput>unused</computeroutput>, before starting to read
Packit 71fd91
from the file <computeroutput>f</computeroutput>.  At most
Packit 71fd91
<computeroutput>BZ_MAX_UNUSED</computeroutput> bytes may be
Packit 71fd91
supplied like this.  If this facility is not required, you should
Packit 71fd91
pass <computeroutput>NULL</computeroutput> and
Packit 71fd91
<computeroutput>0</computeroutput> for
Packit 71fd91
<computeroutput>unused</computeroutput> and
Packit 71fd91
<computeroutput>nUnused</computeroutput> respectively.</para>
Packit 71fd91
Packit 71fd91
<para>For the meaning of parameters
Packit 71fd91
<computeroutput>small</computeroutput> and
Packit 71fd91
<computeroutput>verbosity</computeroutput>, see
Packit 71fd91
<computeroutput>BZ2_bzDecompressInit</computeroutput>.</para>
Packit 71fd91
Packit 71fd91
<para>The amount of memory needed to decompress a file cannot be
Packit 71fd91
determined until the file's header has been read.  So it is
Packit 71fd91
possible that <computeroutput>BZ2_bzReadOpen</computeroutput>
Packit 71fd91
returns <computeroutput>BZ_OK</computeroutput> but a subsequent
Packit 71fd91
call of <computeroutput>BZ2_bzRead</computeroutput> will return
Packit 71fd91
<computeroutput>BZ_MEM_ERROR</computeroutput>.</para>
Packit 71fd91
Packit 71fd91
<para>Possible assignments to
Packit 71fd91
<computeroutput>bzerror</computeroutput>:</para>
Packit 71fd91
Packit 71fd91
<programlisting>
Packit 71fd91
BZ_CONFIG_ERROR
Packit 71fd91
  if the library has been mis-compiled
Packit 71fd91
BZ_PARAM_ERROR
Packit 71fd91
  if f is NULL
Packit 71fd91
  or small is neither 0 nor 1
Packit 71fd91
  or ( unused == NULL && nUnused != 0 )
Packit 71fd91
  or ( unused != NULL && !(0 <= nUnused <= BZ_MAX_UNUSED) )
Packit 71fd91
BZ_IO_ERROR
Packit 71fd91
  if ferror(f) is nonzero
Packit 71fd91
BZ_MEM_ERROR
Packit 71fd91
  if insufficient memory is available
Packit 71fd91
BZ_OK
Packit 71fd91
  otherwise.
Packit 71fd91
</programlisting>
Packit 71fd91
Packit 71fd91
<para>Possible return values:</para>
Packit 71fd91
Packit 71fd91
<programlisting>
Packit 71fd91
Pointer to an abstract BZFILE
Packit 71fd91
  if bzerror is BZ_OK
Packit 71fd91
NULL
Packit 71fd91
  otherwise
Packit 71fd91
</programlisting>
Packit 71fd91
Packit 71fd91
<para>Allowable next actions:</para>
Packit 71fd91
Packit 71fd91
<programlisting>
Packit 71fd91
BZ2_bzRead
Packit 71fd91
  if bzerror is BZ_OK
Packit 71fd91
BZ2_bzReadClose
Packit 71fd91
  otherwise
Packit 71fd91
</programlisting>
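
<para>As an illustration, opening a file for reading might be
wrapped up as follows.  The helper name
<computeroutput>open_bz2_for_reading</computeroutput> and its
habit of handing the <computeroutput>FILE*</computeroutput> back
through an out-parameter are assumptions for the example.</para>

<programlisting><![CDATA[
#include <stdio.h>
#include <bzlib.h>

/* Hypothetical helper: open path as a bzip2 stream for reading.
   On success returns the BZFILE and stores the underlying FILE
   in *fp_out (the caller closes both).  On failure returns NULL
   with everything already released. */
BZFILE *open_bz2_for_reading(const char *path, FILE **fp_out)
{
    int bzerror;
    BZFILE *b;
    FILE *f = fopen(path, "rb");          /* binary mode */
    if (f == NULL) return NULL;

    /* verbosity 0, small 0, no leftover bytes to decompress first */
    b = BZ2_bzReadOpen(&bzerror, f, 0, 0, NULL, 0);
    if (bzerror != BZ_OK) {
        BZ2_bzReadClose(&bzerror, b);
        fclose(f);
        return NULL;
    }

    *fp_out = f;
    return b;
}
]]></programlisting>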
Packit 71fd91
Packit 71fd91
</sect2>
Packit 71fd91
Packit 71fd91
Packit 71fd91
<sect2 id="bzread" xreflabel="BZ2_bzRead">
Packit 71fd91
<title>BZ2_bzRead</title>
Packit 71fd91
Packit 71fd91
<programlisting>
Packit 71fd91
int BZ2_bzRead ( int *bzerror, BZFILE *b, void *buf, int len );
Packit 71fd91
</programlisting>
Packit 71fd91
Packit 71fd91
<para>Reads up to <computeroutput>len</computeroutput>
Packit 71fd91
(uncompressed) bytes from the compressed file
Packit 71fd91
<computeroutput>b</computeroutput> into the buffer
Packit 71fd91
<computeroutput>buf</computeroutput>.  If the read was
Packit 71fd91
successful, <computeroutput>bzerror</computeroutput> is set to
Packit 71fd91
<computeroutput>BZ_OK</computeroutput> and the number of bytes
Packit 71fd91
read is returned.  If the logical end-of-stream was detected,
Packit 71fd91
<computeroutput>bzerror</computeroutput> will be set to
Packit 71fd91
<computeroutput>BZ_STREAM_END</computeroutput>, and the number of
Packit 71fd91
bytes read is returned.  All other
Packit 71fd91
<computeroutput>bzerror</computeroutput> values denote an
Packit 71fd91
error.</para>
Packit 71fd91
Packit 71fd91
<para><computeroutput>BZ2_bzRead</computeroutput> will supply
Packit 71fd91
<computeroutput>len</computeroutput> bytes, unless the logical
Packit 71fd91
stream end is detected or an error occurs.  Because of this, it
Packit 71fd91
is possible to detect the stream end by observing when the number
Packit 71fd91
of bytes returned is less than the number requested.
Packit 71fd91
Nevertheless, this is regarded as inadvisable; you should instead
Packit 71fd91
check <computeroutput>bzerror</computeroutput> after every call
Packit 71fd91
and watch out for
Packit 71fd91
<computeroutput>BZ_STREAM_END</computeroutput>.</para>
Packit 71fd91
Packit 71fd91
<para>Internally, <computeroutput>BZ2_bzRead</computeroutput>
Packit 71fd91
copies data from the compressed file in chunks of size
Packit 71fd91
<computeroutput>BZ_MAX_UNUSED</computeroutput> bytes before
Packit 71fd91
decompressing it.  If the file contains more bytes than strictly
Packit 71fd91
needed to reach the logical end-of-stream,
Packit 71fd91
<computeroutput>BZ2_bzRead</computeroutput> will almost certainly
Packit 71fd91
read some of the trailing data before signalling
Packit 71fd91
<computeroutput>BZ_STREAM_END</computeroutput>.  To collect the
read but unused data once
<computeroutput>BZ_STREAM_END</computeroutput> has appeared,
Packit 71fd91
call <computeroutput>BZ2_bzReadGetUnused</computeroutput>
Packit 71fd91
immediately before
Packit 71fd91
<computeroutput>BZ2_bzReadClose</computeroutput>.</para>

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_PARAM_ERROR
  if b is NULL or buf is NULL or len < 0
BZ_SEQUENCE_ERROR
  if b was opened with BZ2_bzWriteOpen
BZ_IO_ERROR
  if there is an error reading from the compressed file
BZ_UNEXPECTED_EOF
  if the compressed file ended before
  the logical end-of-stream was detected
BZ_DATA_ERROR
  if a data integrity error was detected in the compressed stream
BZ_DATA_ERROR_MAGIC
  if the stream does not begin with the requisite header bytes
  (i.e., is not a bzip2 data file).  This is really
  a special case of BZ_DATA_ERROR.
BZ_MEM_ERROR
  if insufficient memory was available
BZ_STREAM_END
  if the logical end of stream was detected.
BZ_OK
  otherwise.
</programlisting>

<para>Possible return values:</para>

<programlisting>
number of bytes read
  if bzerror is BZ_OK or BZ_STREAM_END
undefined
  otherwise
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
collect data from buf, then BZ2_bzRead or BZ2_bzReadClose
  if bzerror is BZ_OK
collect data from buf, then BZ2_bzReadClose or BZ2_bzReadGetUnused
  if bzerror is BZ_STREAM_END
BZ2_bzReadClose
  otherwise
</programlisting>

</sect2>


<sect2 id="bzreadgetunused" xreflabel="BZ2_bzReadGetUnused">
<title>BZ2_bzReadGetUnused</title>

<programlisting>
void BZ2_bzReadGetUnused( int* bzerror, BZFILE *b,
                          void** unused, int* nUnused );
</programlisting>

<para>Returns data which was read from the compressed file but
was not needed to get to the logical end-of-stream.
<computeroutput>*unused</computeroutput> is set to the address of
the data, and <computeroutput>*nUnused</computeroutput> to the
number of bytes.  <computeroutput>*nUnused</computeroutput> will
be set to a value between <computeroutput>0</computeroutput> and
<computeroutput>BZ_MAX_UNUSED</computeroutput> inclusive.</para>

<para>This function may only be called after
<computeroutput>BZ2_bzRead</computeroutput> has signalled
<computeroutput>BZ_STREAM_END</computeroutput> and before
<computeroutput>BZ2_bzReadClose</computeroutput>.</para>

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_PARAM_ERROR
  if b is NULL
  or unused is NULL or nUnused is NULL
BZ_SEQUENCE_ERROR
  if BZ_STREAM_END has not been signalled
  or if b was opened with BZ2_bzWriteOpen
BZ_OK
  otherwise
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
BZ2_bzReadClose
</programlisting>

</sect2>


<sect2 id="bzreadclose" xreflabel="BZ2_bzReadClose">
<title>BZ2_bzReadClose</title>

<programlisting>
void BZ2_bzReadClose ( int *bzerror, BZFILE *b );
</programlisting>

<para>Releases all memory pertaining to the compressed file
<computeroutput>b</computeroutput>.
<computeroutput>BZ2_bzReadClose</computeroutput> does not call
<computeroutput>fclose</computeroutput> on the underlying file
handle, so you should do that yourself if appropriate.
<computeroutput>BZ2_bzReadClose</computeroutput> should be called
to clean up after all error situations.</para>

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_SEQUENCE_ERROR
  if b was opened with BZ2_bzWriteOpen
BZ_OK
  otherwise
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
none
</programlisting>

</sect2>


<sect2 id="bzwriteopen" xreflabel="BZ2_bzWriteOpen">
<title>BZ2_bzWriteOpen</title>

<programlisting>
BZFILE *BZ2_bzWriteOpen( int *bzerror, FILE *f,
                         int blockSize100k, int verbosity,
                         int workFactor );
</programlisting>

<para>Prepare to write compressed data to file handle
<computeroutput>f</computeroutput>.
<computeroutput>f</computeroutput> should refer to a file which
has been opened for writing, and for which the error indicator
(<computeroutput>ferror(f)</computeroutput>) is not set.</para>

<para>For the meaning of parameters
<computeroutput>blockSize100k</computeroutput>,
<computeroutput>verbosity</computeroutput> and
<computeroutput>workFactor</computeroutput>, see
<computeroutput>BZ2_bzCompressInit</computeroutput>.</para>

<para>All required memory is allocated at this stage, so if the
call completes successfully,
<computeroutput>BZ_MEM_ERROR</computeroutput> cannot be signalled
by a subsequent call to
<computeroutput>BZ2_bzWrite</computeroutput>.</para>

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_CONFIG_ERROR
  if the library has been mis-compiled
BZ_PARAM_ERROR
  if f is NULL
  or blockSize100k < 1 or blockSize100k > 9
BZ_IO_ERROR
  if ferror(f) is nonzero
BZ_MEM_ERROR
  if insufficient memory is available
BZ_OK
  otherwise
</programlisting>

<para>Possible return values:</para>

<programlisting>
Pointer to an abstract BZFILE
  if bzerror is BZ_OK
NULL
  otherwise
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
BZ2_bzWrite
  if bzerror is BZ_OK
  (you could go directly to BZ2_bzWriteClose, but this would be pretty pointless)
BZ2_bzWriteClose
  otherwise
</programlisting>

</sect2>


<sect2 id="bzwrite" xreflabel="BZ2_bzWrite">
<title>BZ2_bzWrite</title>

<programlisting>
void BZ2_bzWrite ( int *bzerror, BZFILE *b, void *buf, int len );
</programlisting>

<para>Absorbs <computeroutput>len</computeroutput> bytes from the
buffer <computeroutput>buf</computeroutput>, eventually to be
compressed and written to the file.</para>

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_PARAM_ERROR
  if b is NULL or buf is NULL or len < 0
BZ_SEQUENCE_ERROR
  if b was opened with BZ2_bzReadOpen
BZ_IO_ERROR
  if there is an error writing the compressed file.
BZ_OK
  otherwise
</programlisting>

</sect2>


<sect2 id="bzwriteclose" xreflabel="BZ2_bzWriteClose">
<title>BZ2_bzWriteClose</title>

<programlisting>
void BZ2_bzWriteClose( int *bzerror, BZFILE* b,
                       int abandon,
                       unsigned int* nbytes_in,
                       unsigned int* nbytes_out );

void BZ2_bzWriteClose64( int *bzerror, BZFILE* b,
                         int abandon,
                         unsigned int* nbytes_in_lo32,
                         unsigned int* nbytes_in_hi32,
                         unsigned int* nbytes_out_lo32,
                         unsigned int* nbytes_out_hi32 );
</programlisting>

<para>Compresses and flushes to the compressed file all data so
far supplied by <computeroutput>BZ2_bzWrite</computeroutput>.
The logical end-of-stream markers are also written, so subsequent
calls to <computeroutput>BZ2_bzWrite</computeroutput> are
illegal.  All memory associated with the compressed file
<computeroutput>b</computeroutput> is released.
<computeroutput>fflush</computeroutput> is called on the
compressed file, but it is not
<computeroutput>fclose</computeroutput>'d.</para>

<para>If <computeroutput>BZ2_bzWriteClose</computeroutput> is
called to clean up after an error, the only action is to release
the memory.  The library records the error codes issued by
previous calls, so this situation will be detected automatically.
There is no attempt to complete the compression operation, nor to
<computeroutput>fflush</computeroutput> the compressed file.  You
can force this behaviour to happen even in the case of no error,
by passing a nonzero value to
<computeroutput>abandon</computeroutput>.</para>

<para>If <computeroutput>nbytes_in</computeroutput> is non-null,
<computeroutput>*nbytes_in</computeroutput> will be set to the
total volume of uncompressed data handled.  Similarly, if
<computeroutput>nbytes_out</computeroutput> is non-null,
<computeroutput>*nbytes_out</computeroutput> will be set to the
total volume of compressed data written.  For compatibility with
older versions of the library,
<computeroutput>BZ2_bzWriteClose</computeroutput> only yields the
lower 32 bits of these counts.  Use
<computeroutput>BZ2_bzWriteClose64</computeroutput> if you want
the full 64-bit counts.  These two functions are otherwise
absolutely identical.</para>
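
<para>As a minimal sketch of how the two halves reported by
<computeroutput>BZ2_bzWriteClose64</computeroutput> might be
joined into a single count, the helper below shifts the high half
up by 32 bits and ORs in the low half.  The name
<computeroutput>bz_count64</computeroutput> is an assumption made
for this illustration; it is not part of the library.</para>

```c
#include <stdint.h>

/* Join the two 32-bit halves reported by BZ2_bzWriteClose64 into one
   64-bit count.  hi32 carries bits 63..32, lo32 carries bits 31..0.
   (bz_count64 is a hypothetical helper, not a libbzip2 function.) */
static uint64_t bz_count64(unsigned int hi32, unsigned int lo32)
{
    return ((uint64_t)hi32 << 32) | (uint64_t)(lo32 & 0xFFFFFFFFu);
}
```

<para>After a call such as
<computeroutput>BZ2_bzWriteClose64(&bzerror, b, 0, &in_lo, &in_hi, &out_lo, &out_hi)</computeroutput>,
the uncompressed total would then be
<computeroutput>bz_count64(in_hi, in_lo)</computeroutput>.</para>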

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_SEQUENCE_ERROR
  if b was opened with BZ2_bzReadOpen
BZ_IO_ERROR
  if there is an error writing the compressed file
BZ_OK
  otherwise
</programlisting>

</sect2>


<sect2 id="embed" xreflabel="Handling embedded compressed data streams">
<title>Handling embedded compressed data streams</title>

<para>The high-level library facilitates use of
<computeroutput>bzip2</computeroutput> data streams which form
some part of a surrounding, larger data stream.</para>

<itemizedlist mark='bullet'>

 <listitem><para>For writing, the library takes an open file handle,
  writes compressed data to it,
  <computeroutput>fflush</computeroutput>es it but does not
  <computeroutput>fclose</computeroutput> it.  The calling
  application can write its own data before and after the
  compressed data stream, using that same file handle.</para></listitem>

 <listitem><para>Reading is more complex, and the facilities are not as
  general as they could be since generality is hard to reconcile
  with efficiency.  <computeroutput>BZ2_bzRead</computeroutput>
  reads from the compressed file in blocks of size
  <computeroutput>BZ_MAX_UNUSED</computeroutput> bytes, and in
  doing so probably will overshoot the logical end of compressed
  stream.  To recover this data once decompression has ended,
  call <computeroutput>BZ2_bzReadGetUnused</computeroutput> after
  the last call of <computeroutput>BZ2_bzRead</computeroutput>
  (the one returning
  <computeroutput>BZ_STREAM_END</computeroutput>) but before
  calling
  <computeroutput>BZ2_bzReadClose</computeroutput>.</para></listitem>

</itemizedlist>

<para>This mechanism makes it easy to decompress multiple
<computeroutput>bzip2</computeroutput> streams placed end-to-end.
At the end of one stream, when
<computeroutput>BZ2_bzRead</computeroutput> returns
<computeroutput>BZ_STREAM_END</computeroutput>, call
<computeroutput>BZ2_bzReadGetUnused</computeroutput> to collect
the unused data (copy it into your own buffer somewhere).  That
data forms the start of the next compressed stream.  To start
uncompressing that next stream, call
<computeroutput>BZ2_bzReadOpen</computeroutput> again, feeding in
the unused data via the <computeroutput>unused</computeroutput> /
<computeroutput>nUnused</computeroutput> parameters.  Keep doing
this until <computeroutput>BZ_STREAM_END</computeroutput> return
coincides with the physical end of file
(<computeroutput>feof(f)</computeroutput>).  In this situation
<computeroutput>BZ2_bzReadGetUnused</computeroutput> will of
course return no data.</para>

<para>This should give some feel for how the high-level interface
can be used.  If you require extra flexibility, you'll have to
bite the bullet and get to grips with the low-level
interface.</para>

</sect2>


<sect2 id="std-rdwr" xreflabel="Standard file-reading/writing code">
<title>Standard file-reading/writing code</title>

<para>Here's how you'd write data to a compressed file:</para>

<programlisting>
FILE*   f;
BZFILE* b;
int     nBuf;
char    buf[ /* whatever size you like */ ];
int     bzerror;

f = fopen ( "myfile.bz2", "wb" );
if ( !f ) {
 /* handle error */
}
/* blockSize100k = 9, verbosity = 0, workFactor = 30 */
b = BZ2_bzWriteOpen( &bzerror, f, 9, 0, 30 );
if (bzerror != BZ_OK) {
 BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL );
 /* handle error */
}

while ( /* condition */ ) {
 /* get data to write into buf, and set nBuf appropriately */
 BZ2_bzWrite ( &bzerror, b, buf, nBuf );   /* returns void */
 if (bzerror == BZ_IO_ERROR) {
   BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL );
   /* handle error */
 }
}

/* abandon = 0: finish the stream; byte counts not wanted */
BZ2_bzWriteClose( &bzerror, b, 0, NULL, NULL );
if (bzerror == BZ_IO_ERROR) {
 /* handle error */
}
</programlisting>

<para>And to read from a compressed file:</para>

<programlisting>
FILE*   f;
BZFILE* b;
int     nBuf;
char    buf[ /* whatever size you like */ ];
int     bzerror;

f = fopen ( "myfile.bz2", "rb" );
if ( !f ) {
  /* handle error */
}
/* verbosity = 0, small = 0, no unused data carried over */
b = BZ2_bzReadOpen ( &bzerror, f, 0, 0, NULL, 0 );
if ( bzerror != BZ_OK ) {
  BZ2_bzReadClose ( &bzerror, b );
  /* handle error */
}

bzerror = BZ_OK;
while ( bzerror == BZ_OK && /* arbitrary other conditions */) {
  nBuf = BZ2_bzRead ( &bzerror, b, buf, /* size of buf */ );
  /* the final call returns BZ_STREAM_END and may still deliver data */
  if ( bzerror == BZ_OK || bzerror == BZ_STREAM_END ) {
    /* do something with buf[0 .. nBuf-1] */
  }
}
if ( bzerror != BZ_STREAM_END ) {
   BZ2_bzReadClose ( &bzerror, b );
   /* handle error */
} else {
   BZ2_bzReadClose ( &bzerror, b );
}
</programlisting>

</sect2>

</sect1>


<sect1 id="util-fns" xreflabel="Utility functions">
<title>Utility functions</title>


<sect2 id="bzbufftobuffcompress" xreflabel="BZ2_bzBuffToBuffCompress">
<title>BZ2_bzBuffToBuffCompress</title>

<programlisting>
int BZ2_bzBuffToBuffCompress( char*         dest,
                              unsigned int* destLen,
                              char*         source,
                              unsigned int  sourceLen,
                              int           blockSize100k,
                              int           verbosity,
                              int           workFactor );
</programlisting>

<para>Attempts to compress the data in <computeroutput>source[0
.. sourceLen-1]</computeroutput> into the destination buffer,
<computeroutput>dest[0 .. *destLen-1]</computeroutput>.  If the
destination buffer is big enough,
<computeroutput>*destLen</computeroutput> is set to the size of
the compressed data, and <computeroutput>BZ_OK</computeroutput>
is returned.  If the compressed data won't fit,
<computeroutput>*destLen</computeroutput> is unchanged, and
<computeroutput>BZ_OUTBUFF_FULL</computeroutput> is
returned.</para>

<para>Compression in this manner is a one-shot event, done with a
single call to this function.  The resulting compressed data is a
complete <computeroutput>bzip2</computeroutput> format data
stream.  There is no mechanism for making additional calls to
provide extra input data.  If you want that kind of mechanism,
use the low-level interface.</para>

<para>For the meaning of parameters
<computeroutput>blockSize100k</computeroutput>,
<computeroutput>verbosity</computeroutput> and
<computeroutput>workFactor</computeroutput>, see
<computeroutput>BZ2_bzCompressInit</computeroutput>.</para>

<para>To guarantee that the compressed data will fit in its
buffer, allocate an output buffer of size 1% larger than the
uncompressed data, plus six hundred extra bytes.</para>
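
<para>The sizing rule above can be written as a small helper.
This is only a sketch: the name
<computeroutput>bz_compress_bound</computeroutput> is an
assumption made for this example and is not a library function,
and the 1% term is rounded up so that the result never falls
below the stated bound.</para>

```c
#include <stddef.h>

/* Worst-case destination size suggested by the manual: the source
   length, plus 1% of it (rounded up), plus 600 bytes.
   (bz_compress_bound is a hypothetical helper, not part of libbzip2.) */
static size_t bz_compress_bound(size_t sourceLen)
{
    return sourceLen + (sourceLen + 99) / 100 + 600;
}
```

<para>Since <computeroutput>destLen</computeroutput> is an
<computeroutput>unsigned int</computeroutput>, a caller using
such a helper would still need to check that the computed size
fits in that type before passing it on.</para>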

<para><computeroutput>BZ2_bzBuffToBuffCompress</computeroutput>
will not write data at or beyond
<computeroutput>dest[*destLen]</computeroutput>, even in case of
buffer overflow.</para>

<para>Possible return values:</para>

<programlisting>
BZ_CONFIG_ERROR
  if the library has been mis-compiled
BZ_PARAM_ERROR
  if dest is NULL or destLen is NULL
  or blockSize100k < 1 or blockSize100k > 9
  or verbosity < 0 or verbosity > 4
  or workFactor < 0 or workFactor > 250
BZ_MEM_ERROR
  if insufficient memory is available
BZ_OUTBUFF_FULL
  if the size of the compressed data exceeds *destLen
BZ_OK
  otherwise
</programlisting>

</sect2>


<sect2 id="bzbufftobuffdecompress" xreflabel="BZ2_bzBuffToBuffDecompress">
<title>BZ2_bzBuffToBuffDecompress</title>

<programlisting>
int BZ2_bzBuffToBuffDecompress( char*         dest,
                                unsigned int* destLen,
                                char*         source,
                                unsigned int  sourceLen,
                                int           small,
                                int           verbosity );
</programlisting>

<para>Attempts to decompress the data in <computeroutput>source[0
.. sourceLen-1]</computeroutput> into the destination buffer,
<computeroutput>dest[0 .. *destLen-1]</computeroutput>.  If the
destination buffer is big enough,
<computeroutput>*destLen</computeroutput> is set to the size of
the uncompressed data, and <computeroutput>BZ_OK</computeroutput>
is returned.  If the uncompressed data won't fit,
<computeroutput>*destLen</computeroutput> is unchanged, and
<computeroutput>BZ_OUTBUFF_FULL</computeroutput> is
returned.</para>

<para><computeroutput>source</computeroutput> is assumed to hold
a complete <computeroutput>bzip2</computeroutput> format data
stream.
<computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput> tries
to decompress the entirety of the stream into the output
buffer.</para>

<para>For the meaning of parameters
<computeroutput>small</computeroutput> and
<computeroutput>verbosity</computeroutput>, see
<computeroutput>BZ2_bzDecompressInit</computeroutput>.</para>

<para>Because the compression ratio of the compressed data cannot
be known in advance, there is no easy way to guarantee that the
output buffer will be big enough.  You may of course make
arrangements in your code to record the size of the uncompressed
data, but such a mechanism is beyond the scope of this
library.</para>
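
<para>One such arrangement, sketched here purely as an
assumption about what an application might do, is to store the
uncompressed length in a fixed 4-byte little-endian header in
front of the compressed bytes and read it back before allocating
the output buffer.  The helper names
<computeroutput>put_u32le</computeroutput> and
<computeroutput>get_u32le</computeroutput> are hypothetical, and
the header is the application's own framing, not part of the
<computeroutput>bzip2</computeroutput> format.</para>

```c
#include <stdint.h>

/* Store v as 4 little-endian bytes at p (application-defined header;
   put_u32le is a hypothetical helper, not a libbzip2 function). */
static void put_u32le(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)((v >> 8) & 0xFFu);
    p[2] = (unsigned char)((v >> 16) & 0xFFu);
    p[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Read back a length written by put_u32le. */
static uint32_t get_u32le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

<para>A reader would call
<computeroutput>get_u32le</computeroutput> on the first four
bytes, allocate that many bytes for
<computeroutput>dest</computeroutput>, and pass the remainder of
the buffer as <computeroutput>source</computeroutput>.</para>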

<para><computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput>
will not write data at or beyond
<computeroutput>dest[*destLen]</computeroutput>, even in case of
buffer overflow.</para>

<para>Possible return values:</para>

<programlisting>
BZ_CONFIG_ERROR
  if the library has been mis-compiled
BZ_PARAM_ERROR
  if dest is NULL or destLen is NULL
  or small != 0 && small != 1
  or verbosity < 0 or verbosity > 4
BZ_MEM_ERROR
  if insufficient memory is available
BZ_OUTBUFF_FULL
  if the size of the uncompressed data exceeds *destLen
BZ_DATA_ERROR
  if a data integrity error was detected in the compressed data
BZ_DATA_ERROR_MAGIC
  if the compressed data doesn't begin with the right magic bytes
BZ_UNEXPECTED_EOF
  if the compressed data ends unexpectedly
BZ_OK
  otherwise
</programlisting>

</sect2>

</sect1>


<sect1 id="zlib-compat" xreflabel="zlib compatibility functions">
<title>zlib compatibility functions</title>

<para>Yoshioka Tsuneo has contributed some functions to give
better <computeroutput>zlib</computeroutput> compatibility.
These functions are <computeroutput>BZ2_bzopen</computeroutput>,
<computeroutput>BZ2_bzdopen</computeroutput>,
<computeroutput>BZ2_bzread</computeroutput>,
<computeroutput>BZ2_bzwrite</computeroutput>,
<computeroutput>BZ2_bzflush</computeroutput>,
<computeroutput>BZ2_bzclose</computeroutput>,
<computeroutput>BZ2_bzerror</computeroutput> and
<computeroutput>BZ2_bzlibVersion</computeroutput>.  These
functions are not (yet) officially part of the library.  If they
break, you get to keep all the pieces.  Nevertheless, I think
they work ok.</para>

<programlisting>
typedef void BZFILE;

const char * BZ2_bzlibVersion ( void );
</programlisting>

<para>Returns a string indicating the library version.</para>

<programlisting>
BZFILE * BZ2_bzopen  ( const char *path, const char *mode );
BZFILE * BZ2_bzdopen ( int        fd,    const char *mode );
</programlisting>

<para>Opens a <computeroutput>.bz2</computeroutput> file for
reading or writing, using either its name or a pre-existing file
descriptor.  Analogous to <computeroutput>fopen</computeroutput>
and <computeroutput>fdopen</computeroutput>.</para>

<programlisting>
int BZ2_bzread  ( BZFILE* b, void* buf, int len );
int BZ2_bzwrite ( BZFILE* b, void* buf, int len );
</programlisting>

<para>Reads/writes data from/to a previously opened
<computeroutput>BZFILE</computeroutput>.  Analogous to
<computeroutput>fread</computeroutput> and
<computeroutput>fwrite</computeroutput>.</para>

<programlisting>
int  BZ2_bzflush ( BZFILE* b );
void BZ2_bzclose ( BZFILE* b );
</programlisting>

<para>Flushes/closes a <computeroutput>BZFILE</computeroutput>.
<computeroutput>BZ2_bzflush</computeroutput> doesn't actually do
anything.  Analogous to <computeroutput>fflush</computeroutput>
and <computeroutput>fclose</computeroutput>.</para>

<programlisting>
const char * BZ2_bzerror ( BZFILE *b, int *errnum );
</programlisting>

<para>Returns a string describing the most recent error status of
<computeroutput>b</computeroutput>, and also sets
<computeroutput>*errnum</computeroutput> to its numerical
value.</para>

</sect1>


<sect1 id="stdio-free"
       xreflabel="Using the library in a stdio-free environment">
<title>Using the library in a stdio-free environment</title>


<sect2 id="stdio-bye" xreflabel="Getting rid of stdio">
<title>Getting rid of stdio</title>

<para>In a deeply embedded application, you might want to use
just the memory-to-memory functions.  You can do this
conveniently by compiling the library with preprocessor symbol
<computeroutput>BZ_NO_STDIO</computeroutput> defined.  Doing this
gives you a library containing only the following eight
functions:</para>

<para><computeroutput>BZ2_bzCompressInit</computeroutput>,
<computeroutput>BZ2_bzCompress</computeroutput>,
<computeroutput>BZ2_bzCompressEnd</computeroutput>,
<computeroutput>BZ2_bzDecompressInit</computeroutput>,
<computeroutput>BZ2_bzDecompress</computeroutput>,
<computeroutput>BZ2_bzDecompressEnd</computeroutput>,
<computeroutput>BZ2_bzBuffToBuffCompress</computeroutput>,
<computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput></para>

<para>When compiled like this, all functions will ignore
<computeroutput>verbosity</computeroutput> settings.</para>

</sect2>


<sect2 id="critical-error" xreflabel="Critical error handling">
<title>Critical error handling</title>

<para><computeroutput>libbzip2</computeroutput> contains a number
of internal assertion checks which should, needless to say, never
be activated.  Nevertheless, if an assertion should fail,
behaviour depends on whether or not the library was compiled with
<computeroutput>BZ_NO_STDIO</computeroutput> set.</para>

<para>For a normal compile, an assertion failure yields the
message:</para>

<blockquote>
<para>bzip2/libbzip2: internal error number N.</para>
<para>This is a bug in bzip2/libbzip2, &bz-version; of &bz-date;.
Please report it to me at: &bz-email;.  If this happened
when you were using some program which uses libbzip2 as a
component, you should also report this bug to the author(s)
of that program.  Please make an effort to report this bug;
timely and accurate bug reports eventually lead to higher
quality software.  Thanks.  Julian Seward, &bz-date;.
</para>
</blockquote>

<para>where <computeroutput>N</computeroutput> is some error code
number.  If <computeroutput>N == 1007</computeroutput>, it also
prints some extra text advising the reader that unreliable memory
is often associated with internal error 1007. (This is a
frequently observed phenomenon with versions 1.0.0/1.0.1).</para>

<para><computeroutput>exit(3)</computeroutput> is then
called.</para>

<para>For a <computeroutput>stdio</computeroutput>-free library,
assertion failures result in a call to a function declared
as:</para>

<programlisting>
extern void bz_internal_error ( int errcode );
</programlisting>

<para>The relevant code is passed as a parameter.  You should
supply such a function.</para>
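
<para>A minimal sketch of such a function is shown below.  It
only records the code so that the application can notice the
failure afterwards; the variable name
<computeroutput>bz_last_internal_error</computeroutput> is an
assumption made for this example, and a real embedded system
might prefer to log the code or halt instead.</para>

```c
/* Last assertion code reported by the library, or 0 if none so far.
   (bz_last_internal_error is a name invented for this sketch.) */
static volatile int bz_last_internal_error = 0;

/* Supplied by the application for a BZ_NO_STDIO build of libbzip2.
   This version just remembers the code; after it has been called, the
   bz_stream records involved should be treated as invalid. */
void bz_internal_error(int errcode)
{
    bz_last_internal_error = errcode;
}
```

<para>The application would then check
<computeroutput>bz_last_internal_error</computeroutput> after
each library call and abandon the affected stream if it is
nonzero.</para>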

<para>In either case, once an assertion failure has occurred, any
<computeroutput>bz_stream</computeroutput> records involved can
be regarded as invalid.  You should not attempt to resume normal
operation with them.</para>

<para>You may, of course, change critical error handling to suit
your needs.  As I said above, critical errors indicate bugs in
the library and should not occur.  All "normal" error situations
are indicated via error return codes from functions, and can be
recovered from.</para>

</sect2>

</sect1>


<sect1 id="win-dll" xreflabel="Making a Windows DLL">
<title>Making a Windows DLL</title>

<para>Everything related to Windows has been contributed by
Yoshioka Tsuneo
(<computeroutput>tsuneo@rr.iij4u.or.jp</computeroutput>), so
you should send your queries to him (but perhaps Cc: me,
<computeroutput>&bz-email;</computeroutput>).</para>

<para>My vague understanding of what to do is: using Visual C++
5.0, open the project file
<computeroutput>libbz2.dsp</computeroutput>, and build.  That's
all.</para>

<para>If you can't open the project file for some reason, make a
new one, naming these files:
<computeroutput>blocksort.c</computeroutput>,
<computeroutput>bzlib.c</computeroutput>,
<computeroutput>compress.c</computeroutput>,
<computeroutput>crctable.c</computeroutput>,
<computeroutput>decompress.c</computeroutput>,
<computeroutput>huffman.c</computeroutput>,
<computeroutput>randtable.c</computeroutput> and
<computeroutput>libbz2.def</computeroutput>.  You will also need
to name the header files <computeroutput>bzlib.h</computeroutput>
and <computeroutput>bzlib_private.h</computeroutput>.</para>

<para>If you don't use VC++, you may need to define the
preprocessor symbol
<computeroutput>_WIN32</computeroutput>.</para>

<para>Finally, <computeroutput>dlltest.c</computeroutput> is a
sample program using the DLL.  It has a project file,
<computeroutput>dlltest.dsp</computeroutput>.</para>

<para>If you just want a makefile for Visual C, have a look at
<computeroutput>makefile.msc</computeroutput>.</para>

<para>Be aware that if you compile
<computeroutput>bzip2</computeroutput> itself on Win32, you must
set <computeroutput>BZ_UNIX</computeroutput> to 0 and
<computeroutput>BZ_LCCWIN32</computeroutput> to 1, in the file
<computeroutput>bzip2.c</computeroutput>, before compiling.
Otherwise the resulting binary won't work correctly.</para>

<para>I haven't tried any of this stuff myself, but it all looks
plausible.</para>

</sect1>

</chapter>



<chapter id="misc" xreflabel="Miscellanea">
<title>Miscellanea</title>

<para>These are just some random thoughts of mine.  Your mileage
may vary.</para>


<sect1 id="limits" xreflabel="Limitations of the compressed file format">
<title>Limitations of the compressed file format</title>

<para><computeroutput>bzip2-1.0.X</computeroutput>,
<computeroutput>0.9.5</computeroutput> and
<computeroutput>0.9.0</computeroutput> use exactly the same file
format as the original version,
<computeroutput>bzip2-0.1</computeroutput>.  This decision was
made in the interests of stability.  Creating yet another
incompatible compressed file format would create further
confusion and disruption for users.</para>

<para>Nevertheless, this is not a painless decision.  Development
work since the release of
<computeroutput>bzip2-0.1</computeroutput> in August 1997 has
shown complexities in the file format which slow down
decompression and, in retrospect, are unnecessary.  These
are:</para>

<itemizedlist mark='bullet'>

 <listitem><para>The run-length encoder, which is the first of the
   compression transformations, is entirely irrelevant.  The
   original purpose was to protect the sorting algorithm from the
   very worst case input: a string of repeated symbols.  But
   algorithm steps Q6a and Q6b in the original Burrows-Wheeler
   technical report (SRC-124) show how repeats can be handled
   without difficulty in block sorting.</para></listitem>

 <listitem><para>The randomisation mechanism doesn't really need to be
   there.  Udi Manber and Gene Myers published a suffix array
   construction algorithm a few years back, which can be employed
   to sort any block, no matter how repetitive, in O(N log N)
   time.  Subsequent work by Kunihiko Sadakane has produced a
   derivative O(N (log N)^2) algorithm which usually outperforms
   the Manber-Myers algorithm.</para>

   <para>I could have changed to Sadakane's algorithm, but I find
   it to be slower than <computeroutput>bzip2</computeroutput>'s
   existing algorithm for most inputs, and the randomisation
   mechanism protects adequately against bad cases.  I didn't
   think it was a good tradeoff to make.  Partly this is due to
   the fact that I was not flooded with email complaints about
   <computeroutput>bzip2-0.1</computeroutput>'s performance on
   repetitive data, so perhaps it isn't a problem for real
   inputs.</para>

   <para>Probably the best long-term solution, and the one I have
   incorporated into 0.9.5 and above, is to use the existing
   sorting algorithm initially, and fall back to an O(N (log N)^2)
   algorithm if the standard algorithm gets into
   difficulties.</para></listitem>

  <listitem><para>The compressed file format was never designed to be
   handled by a library, and I have had to jump through some hoops
   to produce an efficient implementation of decompression.  It's
   a bit hairy.  Try passing
   <computeroutput>decompress.c</computeroutput> through the C
   preprocessor and you'll see what I mean.  Much of this
   complexity could have been avoided if the compressed size of
   each block of data was recorded in the data stream.</para></listitem>

 <listitem><para>An Adler-32 checksum, rather than a CRC32 checksum,
   would be faster to compute.</para></listitem>

</itemizedlist>
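
<para>The first item above can be made concrete with a minimal
sketch of the initial run-length step as it is usually described for
this file format: a run of four to 255 equal bytes is written as four
copies of the byte followed by one count byte holding the run length
minus four, and shorter runs are copied unchanged.  The function name
<computeroutput>rle1_encode_sketch</computeroutput> is hypothetical,
and the code is an illustration under those assumptions rather than
the encoder used in the library.</para>

<programlisting><![CDATA[
#include <stddef.h>

/* Encode src[0..n) into dst and return the number of bytes written.
   dst must have room for at least n + n/4 + 1 bytes, because every
   run of four or more may grow by one count byte. */
size_t rle1_encode_sketch ( const unsigned char *src, size_t n,
                            unsigned char *dst )
{
   size_t i = 0, o = 0;
   while (i < n) {
      unsigned char c = src[i];
      size_t run = 1, k;
      /* Measure the run starting at i, capped at 255 bytes. */
      while (i + run < n && src[i + run] == c && run < 255) run++;
      if (run < 4) {
         for (k = 0; k < run; k++) dst[o++] = c;  /* short run: literal */
      } else {
         for (k = 0; k < 4; k++) dst[o++] = c;    /* four copies ...    */
         dst[o++] = (unsigned char)(run - 4);     /* ... plus a count   */
      }
      i += run;
   }
   return o;
}
]]></programlisting>

<para>Under these assumptions 300 identical bytes shrink to ten, which
is how this step keeps very long runs away from the sorter.  A run of
exactly four bytes grows to five.</para>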

<para>It would be fair to say that the
<computeroutput>bzip2</computeroutput> format was frozen before I
properly and fully understood the performance consequences of
doing so.</para>
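
<para>For the checksum item in the list above, the difference in
per-byte work can be seen in a minimal sketch.  It assumes Adler-32
as defined in RFC 1950, and an MSB-first CRC-32 with polynomial
<computeroutput>0x04C11DB7</computeroutput>, initial value and final
XOR of all ones, which is the variant commonly catalogued as
CRC-32/BZIP2.  Both function names are hypothetical.  The CRC is
written bit by bit for clarity, not as the library computes it.</para>

<programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

/* Adler-32: two running sums modulo 65521. */
uint32_t adler32_sketch ( const unsigned char *buf, size_t n )
{
   uint32_t a = 1, b = 0;
   size_t i;
   for (i = 0; i < n; i++) {
      a = (a + buf[i]) % 65521u;
      b = (b + a) % 65521u;
   }
   return (b << 16) | a;
}

/* MSB-first CRC-32, one bit at a time. */
uint32_t crc32_msb_sketch ( const unsigned char *buf, size_t n )
{
   uint32_t crc = 0xFFFFFFFFu;
   size_t i;
   int bit;
   for (i = 0; i < n; i++) {
      crc ^= (uint32_t)buf[i] << 24;
      for (bit = 0; bit < 8; bit++) {
         if (crc & 0x80000000u) crc = (crc << 1) ^ 0x04C11DB7u;
         else                   crc = (crc << 1);
      }
   }
   return ~crc;
}
]]></programlisting>

<para>Adler-32 needs two additions and two reductions per byte, while
the bitwise CRC loops eight times per byte.  Table-driven CRC code
narrows that gap considerably.</para>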

<para>Improvements which I was able to incorporate into 0.9.0,
despite using the same file format, are:</para>

<itemizedlist mark='bullet'>

 <listitem><para>Single array implementation of the inverse BWT.  This
  significantly speeds up decompression, presumably because it
  reduces the number of cache misses.</para></listitem>

 <listitem><para>Faster inverse MTF transform for large MTF values.
  The new implementation is based on the notion of sliding blocks
  of values.</para></listitem>

 <listitem><para><computeroutput>bzip2-0.9.0</computeroutput> now reads
  and writes files with <computeroutput>fread</computeroutput>
  and <computeroutput>fwrite</computeroutput>; version 0.1 used
  <computeroutput>putc</computeroutput> and
  <computeroutput>getc</computeroutput>.  Duh!  Well, you live
  and learn.</para></listitem>

</itemizedlist>
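
<para>The single array idea in the first item above can be sketched
as follows.  Each 32-bit word holds one byte of the transformed block
in its low 8 bits and a link to the next position in its upper 24
bits, so one array serves as both data and linked list.  This is a
minimal illustration, not a copy of
<computeroutput>decompress.c</computeroutput>.  The name
<computeroutput>inverse_bwt_sketch</computeroutput> and its parameters
are hypothetical, and the block is assumed to be shorter than 2^24
bytes.</para>

<programlisting><![CDATA[
#include <stddef.h>
#include <stdint.h>

/* last:     last column of the sorted rotations (the BWT output)
   n:        block length, 0 < n < 2^24
   orig_ptr: row of the original string among the sorted rotations
   tt:       caller-supplied scratch array of n words
   out:      receives the n original bytes
   Returns 0 on success, -1 if the arguments are out of range. */
int inverse_bwt_sketch ( const unsigned char *last, size_t n,
                         size_t orig_ptr, uint32_t *tt,
                         unsigned char *out )
{
   uint32_t cftab[256];
   uint32_t sum = 0, t_pos;
   size_t i;
   int c;

   if (n == 0 || n >= ((size_t)1 << 24) || orig_ptr >= n) return -1;

   /* Count each byte value, then turn counts into start offsets. */
   for (c = 0; c < 256; c++) cftab[c] = 0;
   for (i = 0; i < n; i++) cftab[last[i]]++;
   for (c = 0; c < 256; c++) {
      uint32_t count = cftab[c];
      cftab[c] = sum;
      sum += count;
   }

   /* Low 8 bits: the byte.  Upper 24 bits: link, filled in below. */
   for (i = 0; i < n; i++) tt[i] = last[i];
   for (i = 0; i < n; i++) {
      unsigned char uc = (unsigned char)(tt[i] & 0xff);
      tt[cftab[uc]] |= (uint32_t)i << 8;
      cftab[uc]++;
   }

   /* Follow the links, emitting one byte per step. */
   t_pos = tt[orig_ptr] >> 8;
   for (i = 0; i < n; i++) {
      t_pos = tt[t_pos];
      out[i] = (unsigned char)(t_pos & 0xff);
      t_pos >>= 8;
   }
   return 0;
}
]]></programlisting>

<para>Each output byte costs a single array read in this layout,
which fits the guess above about cache misses.</para>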

<para>Further ahead, it would be nice to be able to do random
access into files.  This will require some careful design of
compressed file formats.</para>

</sect1>


<sect1 id="port-issues" xreflabel="Portability issues">
<title>Portability issues</title>

<para>After some consideration, I have decided not to use GNU
<computeroutput>autoconf</computeroutput> to configure 0.9.5 or
1.0.</para>

<para><computeroutput>autoconf</computeroutput>, admirable and
wonderful though it is, mainly assists with portability problems
between Unix-like platforms.  But
<computeroutput>bzip2</computeroutput> doesn't have much in the
way of portability problems on Unix; most of the difficulties
appear when porting to the Mac, or to Microsoft's operating
systems.  <computeroutput>autoconf</computeroutput> doesn't help
in those cases, and brings in a whole load of new
complexity.</para>

<para>Most people should be able to compile the library and
program under Unix straight out-of-the-box, so to speak,
especially if you have a version of GNU C available.</para>

<para>There are a couple of
<computeroutput>__inline__</computeroutput> directives in the
code.  GNU C (<computeroutput>gcc</computeroutput>) should be
able to handle them.  If you're not using GNU C, your C compiler
shouldn't see them at all.  If your compiler does, for some
reason, see them and doesn't like them, just
<computeroutput>#define</computeroutput>
<computeroutput>__inline__</computeroutput> to be
<computeroutput>/* */</computeroutput>.  One easy way to do this
is to compile with the flag
<computeroutput>-D__inline__=</computeroutput>, which should be
understood by most Unix compilers.</para>
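
<para>The effect of that definition can be sketched as follows.  The
guard on <computeroutput>__GNUC__</computeroutput> is an assumption
about how one might arrange it in a header, and
<computeroutput>bz_sketch_max</computeroutput> is a hypothetical
function that exists only to show the keyword in use.</para>

<programlisting><![CDATA[
/* If the compiler is not GNU C (or compatible), make __inline__
   expand to nothing so that declarations using it still compile. */
#ifndef __GNUC__
#define __inline__  /* */
#endif

/* Hypothetical helper, present only to show the keyword in use. */
static __inline__ int bz_sketch_max ( int a, int b )
{
   return a > b ? a : b;
}
]]></programlisting>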

<para>If you still have difficulties, try compiling with the
macro <computeroutput>BZ_STRICT_ANSI</computeroutput> defined.
This should enable you to build the library in a strictly ANSI
compliant environment.  Building the program itself like this is
dangerous and not supported, since you remove
<computeroutput>bzip2</computeroutput>'s checks against
compressing directories, symbolic links, devices, and other
not-really-a-file entities.  This could cause filesystem
corruption!</para>
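
<para>A check of this kind cannot be written in strictly ANSI C,
because it has to ask the operating system what a path refers to.  A
minimal sketch, assuming a POSIX system that provides
<computeroutput>lstat</computeroutput> and
<computeroutput>S_ISREG</computeroutput>, is shown below.  The name
<computeroutput>is_plain_file_sketch</computeroutput> is hypothetical
and this is not the code in
<computeroutput>bzip2.c</computeroutput>.</para>

<programlisting><![CDATA[
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* ask for lstat() in strict modes */
#endif
#include <sys/types.h>
#include <sys/stat.h>

/* Returns 1 if path names a regular file, and 0 for directories,
   symbolic links, devices, missing paths and any lstat() failure. */
int is_plain_file_sketch ( const char *path )
{
   struct stat sb;
   if (lstat ( path, &sb ) != 0) return 0;
   return S_ISREG ( sb.st_mode ) ? 1 : 0;
}
]]></programlisting>

<para><computeroutput>lstat</computeroutput> is used rather than
<computeroutput>stat</computeroutput> so that a symbolic link is
reported as a link and not as whatever it points to.</para>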

<para>One other thing: if you create a
<computeroutput>bzip2</computeroutput> binary for public distribution,
please consider linking it statically (<computeroutput>gcc
-static</computeroutput>).  This avoids all sorts of library-version
issues that others may encounter later on.</para>

<para>If you build <computeroutput>bzip2</computeroutput> on
Win32, you must set <computeroutput>BZ_UNIX</computeroutput> to 0
and <computeroutput>BZ_LCCWIN32</computeroutput> to 1, in the
file <computeroutput>bzip2.c</computeroutput>, before compiling.
Otherwise the resulting binary won't work correctly.</para>
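
<para>As a sketch, the two edited definitions in
<computeroutput>bzip2.c</computeroutput> would then read as shown
below.  The <computeroutput>#if</computeroutput> guard is an addition
for illustration only and is assumed not to be in the distributed
source.</para>

<programlisting><![CDATA[
/* Settings for a Win32 build of the bzip2 program. */
#define BZ_UNIX      0
#define BZ_LCCWIN32  1

/* Illustrative guard: exactly one of the two should be set to 1. */
#if BZ_UNIX == BZ_LCCWIN32
#error "Set exactly one of BZ_UNIX and BZ_LCCWIN32 to 1"
#endif
]]></programlisting>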

</sect1>


<sect1 id="bugs" xreflabel="Reporting bugs">
<title>Reporting bugs</title>

<para>I tried pretty hard to make sure
<computeroutput>bzip2</computeroutput> is bug free, both by
design and by testing.  Hopefully you'll never need to read this
section for real.</para>

<para>Nevertheless, if <computeroutput>bzip2</computeroutput> dies
with a segmentation fault, a bus error or an internal assertion
failure, it will ask you to email me a bug report.  Experience from
years of feedback from bzip2 users indicates that almost all these
problems can be traced to either compiler bugs or hardware
problems.</para>

<itemizedlist mark='bullet'>

 <listitem><para>Recompile the program with no optimisation, and
  see if it works.  And/or try a different compiler.  I have heard all
  sorts of stories about various flavours of GNU C (and other
  compilers) generating bad code for
  <computeroutput>bzip2</computeroutput>, and I've run across two
  such examples myself.</para>

  <para>2.7.X versions of GNU C are known to generate bad code
  from time to time, at high optimisation levels.  If you get
  problems, try using the flags
  <computeroutput>-O2</computeroutput>
  <computeroutput>-fomit-frame-pointer</computeroutput>
  <computeroutput>-fno-strength-reduce</computeroutput>.  You
  should specifically <emphasis>not</emphasis> use
  <computeroutput>-funroll-loops</computeroutput>.</para>

  <para>You may notice that the Makefile runs six tests as part
  of the build process.  If the program passes all of these, it's
  a pretty good (but not 100%) indication that the compiler has
  done its job correctly.</para></listitem>

 <listitem><para>If <computeroutput>bzip2</computeroutput>
  crashes randomly, and the crashes are not repeatable, you may
  have a flaky memory subsystem.
  <computeroutput>bzip2</computeroutput> really hammers your
  memory hierarchy, and if it's a bit marginal, you may get these
  problems.  Ditto if your disk or I/O subsystem is slowly
  failing.  Yup, this really does happen.</para>

  <para>Try using a different machine of the same type, and see
  if you can repeat the problem.</para></listitem>

  <listitem><para>This isn't really a bug, but ... If
  <computeroutput>bzip2</computeroutput> tells you your file is
  corrupted on decompression, and you obtained the file via FTP,
  there is a possibility that you forgot to tell FTP to do a
  binary mode transfer.  That absolutely will cause the file to
  be non-decompressible.  You'll have to transfer it
  again.</para></listitem>

</itemizedlist>

<para>If you've incorporated
<computeroutput>libbzip2</computeroutput> into your own program
and are getting problems, please, please, please check that the
parameters you are passing in calls to the library are correct
and in accordance with what the documentation says is allowable.
I have tried to make the library robust against such problems,
but I'm sure I haven't succeeded.</para>

<para>Finally, if the above comments don't help, you'll have to
send me a bug report.  Now, it's just amazing how many people
will send me a bug report saying something like:</para>

<programlisting>
bzip2 crashed with segmentation fault on my machine
</programlisting>

<para>and absolutely nothing else.  Needless to say, such a
report is <emphasis>totally, utterly, completely and
comprehensively 100% useless; a waste of your time, my time, and
net bandwidth</emphasis>.  With no details at all, there's no way
I can possibly begin to figure out what the problem is.</para>

<para>The rules of the game are: facts, facts, facts.  Don't omit
them because "oh, they won't be relevant".  At the bare
minimum:</para>

<programlisting>
Machine type.  Operating system version.
Exact version of bzip2 (do bzip2 -V).
Exact version of the compiler used.
Flags passed to the compiler.
</programlisting>

<para>However, the most important single thing that will help me
is the file that you were trying to compress or decompress at the
time the problem happened.  Without that, my ability to do
anything more than speculate about the cause is limited.</para>

</sect1>


<sect1 id="package" xreflabel="Did you get the right package?">
<title>Did you get the right package?</title>

<para><computeroutput>bzip2</computeroutput> is a resource hog.
It soaks up large amounts of CPU cycles and memory.  Also, it
gives very large latencies.  In the worst case, you can feed many
megabytes of uncompressed data into the library before getting
any compressed output, so this probably rules out applications
requiring interactive behaviour.</para>

<para>These aren't faults of my implementation, I hope, but more
an intrinsic property of the Burrows-Wheeler transform
(unfortunately).  Maybe this isn't what you want.</para>

<para>If you want a compressor and/or library which is faster,
uses less memory but gets pretty good compression, and has
minimal latency, consider Jean-loup Gailly's and Mark Adler's
work, <computeroutput>zlib-1.2.1</computeroutput> and
<computeroutput>gzip-1.2.4</computeroutput>.  Look for them at
<ulink url="http://www.zlib.org">http://www.zlib.org</ulink> and
<ulink url="http://www.gzip.org">http://www.gzip.org</ulink>
respectively.</para>

<para>For something faster and lighter still, you might try Markus F
X J Oberhumer's <computeroutput>LZO</computeroutput> real-time
compression/decompression library, at
<ulink url="http://www.oberhumer.com/opensource">http://www.oberhumer.com/opensource</ulink>.</para>

</sect1>



<sect1 id="reading" xreflabel="Further Reading">
<title>Further Reading</title>

<para><computeroutput>bzip2</computeroutput> is not research
work, in the sense that it doesn't present any new ideas.
Rather, it's an engineering exercise based on existing
ideas.</para>

<para>Four documents describe essentially all the ideas behind
<computeroutput>bzip2</computeroutput>:</para>

<literallayout>Michael Burrows and D. J. Wheeler:
  "A block-sorting lossless data compression algorithm"
   10th May 1994.
   Digital SRC Research Report 124.
   ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz
   If you have trouble finding it, try searching at the
   New Zealand Digital Library, http://www.nzdl.org.

Daniel S. Hirschberg and Debra A. Lelewer
  "Efficient Decoding of Prefix Codes"
   Communications of the ACM, April 1990, Vol 33, Number 4.
   You might be able to get an electronic copy of this
   from the ACM Digital Library.

David J. Wheeler
   Program bred3.c and accompanying document bred3.ps.
   This contains the idea behind the multi-table Huffman coding scheme.
   ftp://ftp.cl.cam.ac.uk/users/djw3/

Jon L. Bentley and Robert Sedgewick
  "Fast Algorithms for Sorting and Searching Strings"
   Available from Sedgewick's web page,
   www.cs.princeton.edu/~rs
</literallayout>

<para>The following paper gives valuable additional insights into
the algorithm, but is not immediately the basis of any code used
in bzip2.</para>

<literallayout>Peter Fenwick:
   Block Sorting Text Compression
   Proceedings of the 19th Australasian Computer Science Conference,
     Melbourne, Australia.  Jan 31 - Feb 2, 1996.
   ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps</literallayout>

<para>Kunihiko Sadakane's sorting algorithm, mentioned above, is
available from:</para>

<literallayout>http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz
</literallayout>

<para>The Manber-Myers suffix array construction algorithm is
described in a paper available from:</para>

<literallayout>http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps
</literallayout>

<para>Finally, the following papers document some
investigations I made into the performance of sorting
and decompression algorithms:</para>

<literallayout>Julian Seward
   On the Performance of BWT Sorting Algorithms
   Proceedings of the IEEE Data Compression Conference 2000
     Snowbird, Utah.  28-30 March 2000.

Julian Seward
   Space-time Tradeoffs in the Inverse B-W Transform
   Proceedings of the IEEE Data Compression Conference 2001
     Snowbird, Utah.  27-29 March 2001.
</literallayout>

</sect1>

</chapter>

</book>