.PU
.TH bzip2 1
.SH NAME
bzip2, bunzip2 \- a block-sorting file compressor, v1.0.6
.br
bzcat \- decompresses files to stdout
.br
bzip2recover \- recovers data from damaged bzip2 files

.SH SYNOPSIS
.ll +8
.B bzip2
.RB [ " \-cdfkqstvzVL123456789 " ]
[
.I "filenames \&..."
]
.ll -8
.br
.B bunzip2
.RB [ " \-fkvsVL " ]
[
.I "filenames \&..."
]
.br
.B bzcat
.RB [ " \-s " ]
[
.I "filenames \&..."
]
.br
.B bzip2recover
.I "filename"

.SH DESCRIPTION
.I bzip2
compresses files using the Burrows-Wheeler block sorting
text compression algorithm, and Huffman coding.  Compression is
generally considerably better than that achieved by more conventional
LZ77/LZ78-based compressors, and approaches the performance of the PPM
family of statistical compressors.

The command-line options are deliberately very similar to
those of
.IR "GNU gzip" ,
but they are not identical.

.I bzip2
expects a list of file names to accompany the
command-line flags.  Each file is replaced by a compressed version of
itself, with the name "original_name.bz2".
Each compressed file
has the same modification date, permissions, and, when possible,
ownership as the corresponding original, so that these properties can
be correctly restored at decompression time.  File name handling is
naive in the sense that there is no mechanism for preserving original
file names, permissions, ownerships or dates in filesystems which lack
these concepts, or have serious file name length restrictions, such as
MS-DOS.

By default,
.I bzip2
and
.I bunzip2
will not overwrite existing
files.  To allow this, specify the \-f flag.

If no file names are specified,
.I bzip2
compresses from standard
input to standard output.  In this case,
.I bzip2
will decline to
write compressed output to a terminal, as this would be entirely
incomprehensible and therefore pointless.

.I bunzip2
(or
.IR "bzip2 \-d" )
decompresses all
specified files.  Files which were not created by
.I bzip2
will be detected and ignored, and a warning issued.
.I bzip2
attempts to guess the filename for the decompressed file
from that of the compressed file as follows:

       filename.bz2    becomes   filename
       filename.bz     becomes   filename
       filename.tbz2   becomes   filename.tar
       filename.tbz    becomes   filename.tar
       anyothername    becomes   anyothername.out

If the file does not end in one of the recognised endings,
.IR .bz2 ,
.IR .bz ,
.I .tbz2
or
.IR .tbz ,
.I bzip2
complains that it cannot
guess the name of the original file, and uses the original name
with
.I .out
appended.
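
The suffix rule above can be sketched in C as follows.  This is an
illustration written for this page under the assumption that only the
four listed endings are recognised; it is not the code used by
.I bzip2
itself, and the names guess_output_name and ends_with are hypothetical.
.nf
```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: nonzero if name ends with suffix and has a
   non-empty stem before it. */
static int ends_with(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    return n > s && strcmp(name + n - s, suffix) == 0;
}

/* Sketch of the output-name rule described above.  Writes the guessed
   name into out (capacity cap).  Returns 0 if a recognised suffix was
   replaced, or 1 if ".out" was appended instead. */
int guess_output_name(const char *in, char *out, size_t cap)
{
    /* Compressed suffix and its replacement, per the table above. */
    static const char *const map[][2] = {
        { ".bz2", "" }, { ".bz", "" },
        { ".tbz2", ".tar" }, { ".tbz", ".tar" }
    };
    size_t i;

    for (i = 0; i < sizeof map / sizeof map[0]; i++) {
        if (ends_with(in, map[i][0])) {
            int stem = (int)(strlen(in) - strlen(map[i][0]));
            snprintf(out, cap, "%.*s%s", stem, in, map[i][1]);
            return 0;
        }
    }
    snprintf(out, cap, "%s.out", in);
    return 1;
}
```
.fi
For example, guess_output_name("data.tbz2", buf, sizeof buf) stores
"data.tar" in buf and returns 0.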

As with compression, supplying no
filenames causes decompression from
standard input to standard output.

.I bunzip2
will correctly decompress a file which is the
concatenation of two or more compressed files.  The result is the
concatenation of the corresponding uncompressed files.  Integrity
testing (\-t)
of concatenated
compressed files is also supported.

You can also compress or decompress files to the standard output by
giving the \-c flag.  Multiple files may be compressed and
decompressed like this.  The resulting outputs are fed sequentially to
stdout.  Compression of multiple files
in this manner generates a stream
containing multiple compressed file representations.  Such a stream
can be decompressed correctly only by
.I bzip2
version 0.9.0 or
later.  Earlier versions of
.I bzip2
will stop after decompressing
the first file in the stream.

.I bzcat
(or
.IR "bzip2 \-dc" )
decompresses all specified files to
the standard output.

.I bzip2
will read arguments from the environment variables
.I BZIP2
and
.IR BZIP ,
in that order, and will process them
before any arguments read from the command line.  This gives a
convenient way to supply default arguments.

Compression is always performed, even if the compressed
file is slightly
larger than the original.  Files of less than about one hundred bytes
tend to get larger, since the compression mechanism has a constant
overhead in the region of 50 bytes.  Random data (including the output
of most file compressors) is coded at about 8.05 bits per byte, giving
an expansion of around 0.5%.

As a self-check for your protection,
.I bzip2
uses 32-bit CRCs to
make sure that the decompressed version of a file is identical to the
original.  This guards against corruption of the compressed data, and
against undetected bugs in
.I bzip2
(hopefully very unlikely).  The
chance of data corruption going undetected is microscopic, about one
chance in four billion for each file processed.  Be aware, though, that
the check occurs upon decompression, so it can only tell you that
something is wrong.  It can't help you
recover the original uncompressed
data.  You can use
.I bzip2recover
to try to recover data from
damaged files.
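
The following C sketch shows a bit-at-a-time 32-bit CRC of the kind
described above.  Its parameters (polynomial 0x04C11DB7, most significant
bit first, initial value and final XOR of 0xFFFFFFFF) are those of the
variant usually catalogued as CRC-32/BZIP2; this page does not state them,
so treat them as an assumption.  The name crc32_bzip2 is hypothetical.
.nf
```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 with the parameters assumed in the text: polynomial
   0x04C11DB7, processed most significant bit first (no reflection),
   initial value 0xFFFFFFFF and final XOR 0xFFFFFFFF.  A production
   version would normally use a 256-entry lookup table instead. */
uint32_t crc32_bzip2(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;   /* feed the byte into the top bits */
        for (bit = 0; bit < 8; bit++) {
            if (crc & 0x80000000u)
                crc = (crc << 1) ^ 0x04C11DB7u;
            else
                crc <<= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```
.fi
With these parameters, the conventional check string "123456789"
should give 0xFC891918.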

Return values: 0 for a normal exit, 1 for environmental problems (file
not found, invalid flags, I/O errors, etc.), 2 to indicate a corrupt
compressed file, 3 for an internal consistency error (e.g., a bug) which
caused
.I bzip2
to panic.
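
A program that runs
.I bzip2
as a child process could translate these values with a table like the
C sketch below.  The enumeration and describe_bzip2_exit are hypothetical
names; on POSIX systems the status itself would typically come from
waitpid and WEXITSTATUS.
.nf
```c
/* Hypothetical names for the exit statuses listed above. */
enum bzip2_exit_status {
    BZIP2_EXIT_OK       = 0, /* normal exit */
    BZIP2_EXIT_ENV      = 1, /* missing file, invalid flag, I/O error, ... */
    BZIP2_EXIT_CORRUPT  = 2, /* corrupt compressed file */
    BZIP2_EXIT_INTERNAL = 3  /* internal consistency error (panic) */
};

/* Maps an exit status to a short description for diagnostics. */
const char *describe_bzip2_exit(int status)
{
    switch (status) {
    case BZIP2_EXIT_OK:       return "ok";
    case BZIP2_EXIT_ENV:      return "environmental problem";
    case BZIP2_EXIT_CORRUPT:  return "corrupt compressed file";
    case BZIP2_EXIT_INTERNAL: return "internal error";
    default:                  return "unexpected status";
    }
}
```
.fi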

.SH OPTIONS
.TP
.B \-c \-\-stdout
Compress or decompress to standard output.
.TP
.B \-d \-\-decompress
Force decompression.
.IR bzip2 ,
.I bunzip2
and
.I bzcat
are
really the same program, and the action taken is decided by the name
under which it is invoked.  This flag overrides that
mechanism, and forces
.I bzip2
to decompress.
.TP
.B \-z \-\-compress
The complement to \-d: forces compression, regardless of the
invocation name.
.TP
.B \-t \-\-test
Check integrity of the specified file(s), but don't decompress them.
This really performs a trial decompression and throws away the result.
.TP
.B \-f \-\-force
Force overwrite of output files.  Normally,
.I bzip2
will not overwrite
existing output files.  Also forces
.I bzip2
to break hard links
to files, which it otherwise wouldn't do.

.I bzip2
normally declines to decompress files which don't have the
correct magic header bytes.  If forced (\-f), however, it will pass
such files through unmodified.  This is how GNU gzip behaves.
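
A magic-header test of this kind can be sketched in C as below.  It
assumes the usual bzip2 stream signature, the bytes "BZh" followed by a
block-size digit from 1 to 9 (matching the \-1 to \-9 flags); this page
does not spell the signature out, and looks_like_bzip2 is a hypothetical
name.
.nf
```c
#include <stddef.h>

/* Hypothetical check for the assumed bzip2 signature: 'B', 'Z', 'h',
   then a block-size digit '1'..'9'. */
int looks_like_bzip2(const unsigned char *buf, size_t len)
{
    return len >= 4
        && buf[0] == 'B' && buf[1] == 'Z' && buf[2] == 'h'
        && buf[3] >= '1' && buf[3] <= '9';
}
```
.fi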
.TP
.B \-k \-\-keep
Keep (don't delete) input files during compression
or decompression.
.TP
.B \-s \-\-small
Reduce memory usage, for compression, decompression and testing.  Files
are decompressed and tested using a modified algorithm which only
requires 2.5 bytes per block byte.  This means any file can be
decompressed in 2300k of memory, albeit at about half the normal speed.

During compression, \-s selects a block size of 200k, which limits
memory use to around the same figure, at the expense of your compression
ratio.  In short, if your machine is low on memory (8 megabytes or
less), use \-s for everything.  See MEMORY MANAGEMENT below.
.TP
.B \-q \-\-quiet
Suppress non-essential warning messages.  Messages pertaining to
I/O errors and other critical events will not be suppressed.
.TP
.B \-v \-\-verbose
Verbose mode: show the compression ratio for each file processed.
Further \-v's increase the verbosity level, spewing out lots of
information which is primarily of interest for diagnostic purposes.
.TP
.B \-L \-\-license \-V \-\-version
Display the software version, license terms and conditions.
.TP
.B \-1 (or \-\-fast) to \-9 (or \-\-best)
Set the block size to 100k, 200k, ..., 900k when compressing.  Has no
effect when decompressing.  See MEMORY MANAGEMENT below.
The \-\-fast and \-\-best aliases are primarily for GNU gzip
compatibility.  In particular, \-\-fast doesn't make things
significantly faster.
And \-\-best merely selects the default behaviour.
.TP
.B \-\-
Treats all subsequent arguments as file names, even if they start
with a dash.  This is so you can handle files with names beginning
with a dash, for example: bzip2 \-\- \-myfilename.
.TP
.B \-\-repetitive\-fast \-\-repetitive\-best
These flags are redundant in versions 0.9.5 and above.  They provided
some coarse control over the behaviour of the sorting algorithm in
earlier versions, which was sometimes useful.  0.9.5 and above have an
improved algorithm which renders these flags irrelevant.

.SH MEMORY MANAGEMENT
.I bzip2
compresses large files in blocks.  The block size affects
both the compression ratio achieved, and the amount of memory needed for
compression and decompression.  The flags \-1 through \-9
specify the block size to be 100,000 bytes through 900,000 bytes (the
default) respectively.  At decompression time, the block size used for
compression is read from the header of the compressed file, and
.I bunzip2
then allocates itself just enough memory to decompress
the file.  Since block sizes are stored in compressed files, it follows
that the flags \-1 to \-9 are irrelevant to and so ignored
during decompression.

Compression and decompression requirements,
in bytes, can be estimated as:

       Compression:   400k + ( 8 x block size )

       Decompression: 100k + ( 4 x block size ), or
                      100k + ( 2.5 x block size ) with \-s

Larger block sizes give rapidly diminishing marginal returns.  Most of
the compression comes from the first two or three hundred k of block
size, a fact worth bearing in mind when using
.I bzip2
on small machines.
It is also important to appreciate that the decompression memory
requirement is set at compression time by the choice of block size.

For files compressed with the default 900k block size,
.I bunzip2
will require about 3700 kbytes to decompress.  To support decompression
of any file on a 4 megabyte machine,
.I bunzip2
has an option to
decompress using approximately half this amount of memory, about 2300
kbytes.  Decompression speed is also halved, so you should use this
option only where necessary.  The relevant flag is \-s.

In general, try to use the largest block size memory constraints allow,
since that maximises the compression achieved.  Compression and
decompression speed are virtually unaffected by block size.

Another significant point applies to files which fit in a single block,
which means most files you'd encounter using a large block size.  The
amount of real memory touched is proportional to the size of the file,
since the file is smaller than a block.  For example, compressing a file
20,000 bytes long with the flag \-9 will cause the compressor to
allocate around 7600k of memory, but only touch
400k + ( 8 x 20,000 bytes ) = 560 kbytes of it.  Similarly, the decompressor
will allocate 3700k but only touch 100k + ( 4 x 20,000 bytes ) = 180 kbytes.
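
The estimates above can be expressed as the following C sketch.  The
function names are hypothetical, k is taken as 1000 bytes as in the
example, and the results are the rough figures quoted here rather than
exact allocation sizes.
.nf
```c
/* Memory estimates in kbytes (k = 1000 bytes), following the formulas
   above.  level is the -1..-9 flag digit; block size = level x 100k. */
long compress_kb(int level)
{
    return 400 + 8L * 100 * level;
}

long decompress_kb(int level)
{
    return 100 + 4L * 100 * level;
}

/* With -s, decompression needs 2.5 bytes per block byte: 250k per level. */
long decompress_small_kb(int level)
{
    return 100 + 250L * level;
}

/* For an input smaller than one block, only memory proportional to the
   input size is actually touched (input_kb is the file size in kbytes). */
long compress_touched_kb(long input_kb)
{
    return 400 + 8 * input_kb;
}

long decompress_touched_kb(long input_kb)
{
    return 100 + 4 * input_kb;
}
```
.fi
For instance, compress_kb(9) gives 7600 and compress_touched_kb(20)
gives 560, matching the figures in the preceding paragraph.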

Here is a table which summarises the maximum memory usage for different
block sizes.  Also recorded is the total compressed size for 14 files of
the Calgary Text Compression Corpus totalling 3,141,622 bytes.  This
column gives some feel for how compression varies with block size.
These figures tend to understate the advantage of larger block sizes for
larger files, since the Corpus is dominated by smaller files.

           Compress   Decompress   Decompress   Corpus
    Flag     usage      usage       -s usage     Size

     -1      1200k       500k         350k      914704
     -2      2000k       900k         600k      877703
     -3      2800k      1300k         850k      860338
     -4      3600k      1700k        1100k      846899
     -5      4400k      2100k        1350k      845160
     -6      5200k      2500k        1600k      838626
     -7      6000k      2900k        1850k      834096
     -8      6800k      3300k        2100k      828642
     -9      7600k      3700k        2350k      828642

.SH RECOVERING DATA FROM DAMAGED FILES
.I bzip2
compresses files in blocks, usually 900 kbytes long.  Each
block is handled independently.  If a media or transmission error causes
a multi-block .bz2
file to become damaged, it may be possible to
recover data from the undamaged blocks in the file.

The compressed representation of each block is delimited by a 48-bit
pattern, which makes it possible to find the block boundaries with
reasonable certainty.  Each block also carries its own 32-bit CRC, so
damaged blocks can be distinguished from undamaged ones.
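
Because blocks are not aligned to byte boundaries, locating them means
sliding a 48-bit window along the file one bit at a time, as in the C
sketch below.  It assumes the delimiter value 0x314159265359 (the
binary-coded decimal digits of pi), the value commonly documented for
bzip2 block headers but not stated on this page.  The name
find_block_magic is hypothetical, and the sketch ignores the separate
end-of-stream marker that a full recovery tool must also handle.
.nf
```c
#include <stddef.h>
#include <stdint.h>

/* Assumed 48-bit block delimiter (BCD digits of pi); not stated on
   this page. */
#define BLOCK_MAGIC 0x314159265359ULL
#define MASK48      0xFFFFFFFFFFFFULL

/* Hypothetical helper: returns the bit offset at which the first
   delimiter at or after bit position start begins, or -1 if there is
   none.  Bits are read most significant first within each byte. */
long long find_block_magic(const unsigned char *buf, size_t len,
                           long long start)
{
    uint64_t window = 0;
    long long nbits = (long long)len * 8;
    long long pos;

    for (pos = start; pos < nbits; pos++) {
        int bit = (buf[pos / 8] >> (7 - pos % 8)) & 1;
        window = ((window << 1) | (uint64_t)bit) & MASK48;
        /* Only compare once the window holds 48 bits read from start on. */
        if (pos - start >= 47 && window == BLOCK_MAGIC)
            return pos - 47;
    }
    return -1;
}
```
.fi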

.I bzip2recover
is a simple program whose purpose is to search for
blocks in .bz2 files, and write each block out into its own .bz2
file.  You can then use
.I bzip2
\-t
to test the
integrity of the resulting files, and decompress those which are
undamaged.

.I bzip2recover
takes a single argument, the name of the damaged file,
and writes a number of files "rec00001file.bz2",
"rec00002file.bz2", etc., containing the extracted blocks.
The output filenames are designed so that wildcards in
subsequent processing (for example,
"bzip2 \-dc rec*file.bz2 > recovered_data") match the files in
the correct order.
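
The naming scheme can be sketched in C as follows, assuming the
"rec" prefix and five-digit counter shown above.  recovered_name is a
hypothetical helper, and it ignores any directory part of the input path.
.nf
```c
#include <stdio.h>

/* Hypothetical helper: builds the name of the n-th recovered block
   file as "rec" + a five-digit zero-padded counter + the damaged
   file's name.  Zero padding keeps the lexical order used by shell
   wildcards equal to block order (up to 99999 blocks). */
int recovered_name(char *out, size_t cap, int n, const char *damaged)
{
    return snprintf(out, cap, "rec%05d%s", n, damaged);
}
```
.fi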

.I bzip2recover
is most useful when dealing with large .bz2
files, as these will contain many blocks.  It is clearly
futile to use it on damaged single-block files, since a
damaged block cannot be recovered.  If you wish to minimise
any potential data loss through media or transmission errors,
you might consider compressing with a smaller
block size.

.SH PERFORMANCE NOTES
The sorting phase of compression gathers together similar strings in the
file.  Because of this, files containing very long runs of repeated
symbols, like "aabaabaabaab ..." (repeated several hundred times) may
compress more slowly than normal.  Versions 0.9.5 and above fare much
better than previous versions in this respect.  The ratio between
worst-case and average-case compression time is in the region of 10:1.
For previous versions, this figure was more like 100:1.  You can use the
\-vvvv option to monitor progress in great detail, if you want.

Decompression speed is unaffected by these phenomena.

.I bzip2
usually allocates several megabytes of memory to operate
in, and then charges all over it in a fairly random fashion.  This means
that performance, both for compressing and decompressing, is largely
determined by the speed at which your machine can service cache misses.
Because of this, small changes to the code to reduce the miss rate have
been observed to give disproportionately large performance improvements.
I imagine
.I bzip2
will perform best on machines with very large caches.

.SH CAVEATS
I/O error messages are not as helpful as they could be.
.I bzip2
tries hard to detect I/O errors and exit cleanly, but the details of
what the problem is sometimes seem rather misleading.

This manual page pertains to version 1.0.6 of
.IR bzip2 .
Compressed data created by this version is entirely forwards and
backwards compatible with the previous public releases, versions
0.1pl2, 0.9.0, 0.9.5, 1.0.0, 1.0.1, 1.0.2 and above, but with the following
exception: 0.9.0 and above can correctly decompress multiple
concatenated compressed files.  0.1pl2 cannot do this; it will stop
after decompressing just the first file in the stream.

.I bzip2recover
versions prior to 1.0.2 used 32-bit integers to represent
bit positions in compressed files, so they could not handle compressed
files more than 512 megabytes long.  Versions 1.0.2 and above use
64-bit ints on some platforms which support them (GNU-supported
targets, and Windows).  To establish whether or not bzip2recover was
built with such a limitation, run it without arguments.  In any event
you can build yourself an unlimited version if you can recompile it
with MaybeUInt64 set to be an unsigned 64-bit integer.

.SH AUTHOR
Julian Seward, jseward@bzip.org.

http://www.bzip.org

The ideas embodied in
.I bzip2
are due to (at least) the following
people: Michael Burrows and David Wheeler (for the block sorting
transformation), David Wheeler (again, for the Huffman coder), Peter
Fenwick (for the structured coding model in the original
.IR bzip ,
and many refinements), and Alistair Moffat, Radford Neal and Ian Witten
(for the arithmetic coder in the original
.IR bzip ).
I am much
indebted for their help, support and advice.  See the manual in the
source distribution for pointers to sources of documentation.  Christian
von Roques encouraged me to look for faster sorting algorithms, so as to
speed up compression.  Bela Lubkin encouraged me to improve the
worst-case compression performance.
Donna Robinson XMLised the documentation.
The bz* scripts are derived from those of GNU gzip.
Many people sent patches, helped
with portability problems, lent machines, gave advice and were generally
helpful.