.\" Copyright (c) 2003-2007 Tim Kientzle
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd January 26, 2011
.Dt LIBARCHIVE_INTERNALS 3
.Os
.Sh NAME
.Nm libarchive_internals
.Nd description of libarchive internal interfaces
.Sh OVERVIEW
The
.Nm libarchive
library provides a flexible interface for reading and writing
streaming archive files such as tar and cpio.
Internally, it follows a modular layered design that should
make it easy to add new archive and compression formats.
.Sh GENERAL ARCHITECTURE
Externally, libarchive exposes most operations through an
opaque, object-style interface.
The
.Xr archive_entry 3
objects store information about a single filesystem object.
The rest of the library provides facilities to write
.Xr archive_entry 3
objects to archive files,
read them from archive files,
and write them to disk.
(There are plans to add a facility to read
.Xr archive_entry 3
objects from disk as well.)
.Pp
The read and write APIs each have four layers: a public API
layer, a format layer that understands the archive file format,
a compression layer, and an I/O layer.
The I/O layer is completely exposed to clients, who can replace
it entirely with their own functions.
.Pp
In order to provide as much consistency as possible for clients,
some public functions are virtualized.
Eventually, it should be possible for clients to open
an archive or disk writer, and then use a single set of
code to select and write entries, regardless of the target.
.Sh READ ARCHITECTURE
From the outside, clients use the
.Xr archive_read 3
API to manipulate an
.Nm archive
object to read entries and bodies from an archive stream.
Internally, the
.Nm archive
object is cast to an
.Nm archive_read
object, which holds all read-specific data.
The API has four layers:
The lowest layer is the I/O layer.
This layer can be overridden by clients, but most clients use
the packaged I/O callbacks provided, for example, by
.Xr archive_read_open_memory 3
and
.Xr archive_read_open_fd 3 .
The compression layer calls the I/O layer to
read bytes and decompresses them for the format layer.
The format layer unpacks a stream of uncompressed bytes and
creates
.Nm archive_entry
objects from the incoming data.
The API layer tracks overall state
(for example, it prevents clients from reading data before reading a header)
and invokes the format and compression layer operations
through registered function pointers.
In particular, the API layer drives the format-detection process:
When opening the archive, it reads an initial block of data
and offers it to each registered compression handler.
The one with the highest bid is initialized with the first block.
Similarly, the format handlers are polled to see which handler
is the best for each archive.
(Prior to 2.4.0, the format bidders were invoked for each
entry, but this design hindered error recovery.)
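.Pp
The bidding step can be pictured with the following minimal sketch.
It is not libarchive's code:
.Vt struct bidder ,
.Fn pick_best_bidder ,
and the two sample bid functions are hypothetical names chosen for illustration.
The sketch assumes only that each handler returns an integer bid for the
initial block and that the highest positive bid wins.
.Bd -literal -offset indent
```c
#include <stddef.h>

/* Hypothetical bidder record; not libarchive's actual structure. */
struct bidder {
    const char *name;
    int (*bid)(const unsigned char *buf, size_t len);
};

/* Sample bidder: claims streams that start with the byte 'A'. */
int
bid_starts_with_a(const unsigned char *buf, size_t len)
{
    return (len >= 1 && buf[0] == 'A') ? 8 : 0;
}

/* Sample bidder: claims streams that start with the bytes "AB". */
int
bid_starts_with_ab(const unsigned char *buf, size_t len)
{
    return (len >= 2 && buf[0] == 'A' && buf[1] == 'B') ? 16 : 0;
}

/*
 * Offer the initial block to every registered bidder and return the
 * one with the highest bid, or NULL if no bidder bids above zero.
 */
const struct bidder *
pick_best_bidder(const struct bidder *bidders, size_t count,
    const unsigned char *buf, size_t len)
{
    const struct bidder *best = NULL;
    int best_bid = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        int b = bidders[i].bid(buf, len);

        if (b > best_bid) {
            best_bid = b;
            best = &bidders[i];
        }
    }
    return best;
}
```
.Ed
.Pp
In this sketch a stream beginning with "AB" goes to the second bidder,
because it verified more of the input than the first one did.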
.Ss I/O Layer and Client Callbacks
The read API goes to some lengths to be nice to clients.
As a result, there are few restrictions on the behavior of
the client callbacks.
.Pp
The client read callback is expected to provide a block
of data on each call.
A zero-length return does indicate end of file, but otherwise
blocks may be as small as one byte or as large as the entire file.
In particular, blocks may be of different sizes.
.Pp
The client skip callback returns the number of bytes actually
skipped, which may be much smaller than the skip requested.
The only requirement is that the skip not be larger.
In particular, clients are allowed to return zero for any
skip that they don't want to handle.
The skip callback must never be invoked with a negative value.
.Pp
Keep in mind that not all clients are reading from disk:
clients reading from networks may provide different-sized
blocks on every request and cannot skip at all;
advanced clients may use
.Xr mmap 2
to read the entire file into memory at once and return the
entire file to libarchive as a single block;
other clients may begin asynchronous I/O operations for the
next block on each request.
.Ss Decompression Layer
The decompression layer not only handles decompression
but also buffers data so that the format handlers see a
much nicer I/O model.
The decompression API is a two-stage peek/consume model.
A read_ahead request specifies a minimum read amount;
the decompression layer must provide a pointer to at least
that much data.
If more data is immediately available, it should return more:
the format layer handles bulk data reads by asking for a minimum
of one byte and then copying as much data as is available.
.Pp
A subsequent call to the
.Fn consume
function advances the read pointer.
Note that data returned from a
.Fn read_ahead
call is guaranteed to remain in place until
the next call to
.Fn read_ahead .
Intervening calls to
.Fn consume
should not cause the data to move.
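.Pp
As a rough illustration of the peek/consume model, the sketch below
implements both operations over a fixed in-memory buffer.
The names
.Vt struct peek_source ,
.Fn peek_read_ahead ,
and
.Fn peek_consume
are assumptions made for this example and do not match the internal
libarchive signatures; a real decompressor would refill its buffer from
the client callbacks instead of failing when too little data remains.
.Bd -literal -offset indent
```c
#include <stddef.h>

/* Hypothetical in-memory source standing in for a decompressor buffer. */
struct peek_source {
    const unsigned char *data;  /* start of buffered data */
    size_t size;                /* total bytes buffered */
    size_t pos;                 /* current read position */
};

/*
 * Peek: return a pointer to at least `min` bytes without advancing,
 * and report through `avail` how many bytes are actually available.
 * Returns NULL if fewer than `min` bytes remain.
 */
const unsigned char *
peek_read_ahead(const struct peek_source *src, size_t min, size_t *avail)
{
    size_t remaining = src->size - src->pos;

    *avail = remaining;
    if (remaining < min)
        return NULL;
    return src->data + src->pos;
}

/* Consume: advance the read position, clamped to the bytes remaining. */
size_t
peek_consume(struct peek_source *src, size_t n)
{
    size_t remaining = src->size - src->pos;

    if (n > remaining)
        n = remaining;
    src->pos += n;
    return n;
}
```
.Ed
.Pp
A caller that asks for a minimum of one byte learns how much data is
available, uses as much of it as it wants, and then consumes exactly
that amount.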
.Pp
Skip requests must always be handled exactly.
Decompression handlers that cannot seek forward should
not register a skip handler;
the API layer fills in a generic skip handler that reads and discards data.
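.Pp
One way such a read-and-discard fallback could work is sketched below.
This is an assumption about the general technique, not the library's
implementation:
.Vt struct chunked_stream
and the three functions are hypothetical, and the stream deliberately
hands out data in small chunks to show that the skip still lands on the
exact byte requested.
.Bd -literal -offset indent
```c
#include <stddef.h>

/* Hypothetical stream that yields at most `chunk` bytes per peek. */
struct chunked_stream {
    const unsigned char *data;
    size_t size;
    size_t pos;
    size_t chunk;
};

/* Peek at the bytes currently available without advancing. */
const unsigned char *
stream_read_ahead(const struct chunked_stream *s, size_t *avail)
{
    size_t remaining = s->size - s->pos;

    *avail = remaining < s->chunk ? remaining : s->chunk;
    return s->data + s->pos;
}

/* Advance past `n` bytes that were previously peeked. */
void
stream_consume(struct chunked_stream *s, size_t n)
{
    s->pos += n;
}

/*
 * Generic skip: repeatedly peek, then consume no more than is still
 * owed, so the stream ends up exactly `request` bytes further along.
 * Returns the bytes skipped; a short count means end of data.
 */
size_t
generic_skip(struct chunked_stream *s, size_t request)
{
    size_t skipped = 0;

    while (skipped < request) {
        size_t avail;

        (void)stream_read_ahead(s, &avail);
        if (avail == 0)
            break;              /* end of data */
        if (avail > request - skipped)
            avail = request - skipped;
        stream_consume(s, avail);
        skipped += avail;
    }
    return skipped;
}
```
.Ed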
.Pp
A decompression handler has a specific lifecycle:
.Bl -tag -compact -width indent
.It Registration/Configuration
When the client invokes the public support function,
the decompression handler invokes the internal
.Fn __archive_read_register_compression
function to provide bid and initialization functions.
This function returns
.Cm NULL
on error or else a pointer to a
.Cm struct decompressor_t .
This structure contains a
.Va void * config
slot that can be used for storing any customization information.
.It Bid
The bid function is invoked with a pointer to a block of data and its size.
The decompressor can access its config data
through the
.Va decompressor
element of the
.Cm archive_read
object.
The bid function is otherwise stateless.
In particular, it must not perform any I/O operations.
.Pp
The value returned by the bid function indicates its suitability
for handling this data stream.
A bid of zero will ensure that this decompressor is never invoked.
Return zero if magic number checks fail.
Otherwise, your initial implementation should return the number of bits
actually checked.
For example, if you verify two full bytes and three bits of another
byte, bid 19.
Note that the initial block may be very short;
be careful to only inspect the data you are given.
(The current decompressors require two bytes for correct bidding.)
.It Initialize
The winning bidder will have its init function called.
This function should initialize the remaining slots of the
.Va struct decompressor_t
object pointed to by the
.Va decompressor
element of the
.Va archive_read
object.
In particular, it should allocate any working data it needs
in the
.Va data
slot of that structure.
The init function is called with the block of data that
was used for bidding.
At this point, the decompressor is responsible for all I/O
requests to the client callbacks.
The decompressor is free to read more data as and when
necessary.
.It Satisfy I/O requests
The format handler will invoke the
.Va read_ahead ,
.Va consume ,
and
.Va skip
functions as needed.
.It Finish
The finish method is called only once when the archive is closed.
It should release anything stored in the
.Va data
and
.Va config
slots of the
.Va decompressor
object.
It should not invoke the client close callback.
.El
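.Pp
The bidding rule from the list above (return the number of bits checked,
zero on a failed magic check, and never read past the supplied block)
might look like the following sketch for gzip data.
The function name and signature are hypothetical and this is not
libarchive's gzip bidder.
The magic bytes 0x1f 0x8b and the deflate method code 8 are taken from
the gzip file format.
.Bd -literal -offset indent
```c
#include <stddef.h>

/*
 * Hypothetical bid function for gzip data.
 * Returns the number of bits verified, or zero if this is not gzip.
 */
int
example_gzip_bid(const unsigned char *buf, size_t len)
{
    int bits = 0;

    /* Both magic bytes are required; never read past `len`. */
    if (len < 2)
        return 0;
    if (buf[0] != 0x1f || buf[1] != 0x8b)
        return 0;
    bits += 16;

    /* If a third byte was supplied, it must name the deflate method. */
    if (len >= 3) {
        if (buf[2] != 8)
            return 0;
        bits += 8;
    }
    return bits;
}
```
.Ed
.Pp
With only two bytes available the sketch bids 16; with a third byte that
matches it bids 24, so a longer initial block yields a more confident bid.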
.Ss Format Layer
The read formats have a similar lifecycle to the decompression handlers:
.Bl -tag -compact -width indent
.It Registration
Allocate your private data and initialize your pointers.
.It Bid
Formats bid by invoking the
.Fn read_ahead
decompression method but not calling the
.Fn consume
method.
This allows each bidder to look ahead in the input stream.
Bidders should not look further ahead than necessary, as long
look aheads put pressure on the decompression layer to buffer
lots of data.
Most formats only require a few hundred bytes of look ahead;
look aheads of a few kilobytes are reasonable.
(The ISO9660 reader sometimes looks ahead by 48k, which
should be considered an upper limit.)
.It Read header
The header read is usually the most complex part of any format.
There are a few strategies worth mentioning:
For formats such as tar or cpio, reading and parsing the header is
straightforward since headers alternate with data.
For formats that store all header data at the beginning of the file,
the first header read request may have to read all headers into
memory and store that data, sorted by the location of the file
data.
Subsequent header read requests will skip forward to the
beginning of the file data and return the corresponding header.
.It Read Data
The read data interface supports sparse files; this requires that
each call return a block of data specifying the file offset and
size.
This may require you to carefully track the location so that you
can return accurate file offsets for each read.
Remember that the decompressor will return as much data as it has.
Generally, you will want to request one byte,
examine the return value to see how much data is available, and
possibly trim that to the amount you can use.
You should invoke consume for each block just before you return it.
.It Skip All Data
The skip data call should skip over all file data and trailing padding.
This is called automatically by the API layer just before each
header read.
It is also called in response to the client calling the public
.Fn data_skip
function.
.It Cleanup
On cleanup, the format should release all of its allocated memory.
.El
.Ss API Layer
XXX to do XXX
.Sh WRITE ARCHITECTURE
The write API has a similar set of four layers:
an API layer, a format layer, a compression layer, and an I/O layer.
The registration here is much simpler because only
one format and one compression can be registered at a time.
.Ss I/O Layer and Client Callbacks
XXX To be written XXX
.Ss Compression Layer
XXX To be written XXX
.Ss Format Layer
XXX To be written XXX
.Ss API Layer
XXX To be written XXX
.Sh WRITE_DISK ARCHITECTURE
The write_disk API is intended to look just like the write API
to clients.
Since it does not handle multiple formats or compression, it
is not layered internally.
.Sh GENERAL SERVICES
The
.Nm archive_read ,
.Nm archive_write ,
and
.Nm archive_write_disk
objects all contain an initial
.Nm archive
object which provides common support for a set of standard services.
(Recall that ANSI/ISO C90 guarantees that you can cast freely between
a pointer to a structure and a pointer to the first member of that
structure.)
The
.Nm archive
object has a magic value that indicates which API this object
is associated with,
slots for storing error information,
and function pointers for virtualized API functions.
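.Pp
The layout described above can be sketched as follows.
Every identifier here (the
.Li ex_
prefix, the magic values, and the field names) is hypothetical and was
chosen for illustration; the real structures have different members.
The sketch shows the three ideas in the text: a common base object as
the first member, a magic value checked before an API-specific
operation, and a function pointer used for a virtualized operation.
.Bd -literal -offset indent
```c
#include <stddef.h>

#define EX_READ_MAGIC   0x1001U  /* arbitrary illustrative values */
#define EX_WRITE_MAGIC  0x1002U

/* Hypothetical common base object. */
struct ex_archive {
    unsigned int magic;                 /* identifies the owning API */
    int error_code;                     /* last error, 0 if none */
    const char *error_msg;
    int (*close)(struct ex_archive *);  /* virtualized operation */
};

/* Hypothetical read object; the base must be the first member. */
struct ex_archive_read {
    struct ex_archive base;
    int entries_read;
};

int
ex_read_close(struct ex_archive *a)
{
    /* Valid because `base` is the first member of ex_archive_read. */
    struct ex_archive_read *r = (struct ex_archive_read *)a;

    r->entries_read = 0;
    return 0;
}

void
ex_read_init(struct ex_archive_read *r)
{
    r->base.magic = EX_READ_MAGIC;
    r->base.error_code = 0;
    r->base.error_msg = NULL;
    r->base.close = ex_read_close;
    r->entries_read = 0;
}

/* Read-only operation: reject objects that belong to another API. */
int
ex_read_count(struct ex_archive *a)
{
    if (a->magic != EX_READ_MAGIC) {
        a->error_code = -1;
        a->error_msg = "object used with the wrong API";
        return -1;
    }
    return ((struct ex_archive_read *)a)->entries_read;
}

/* Virtualized operation: dispatches on the object's own pointer. */
int
ex_close(struct ex_archive *a)
{
    return a->close(a);
}
```
.Ed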
.Sh MISCELLANEOUS NOTES
Connecting existing archiving libraries into libarchive is generally
quite difficult.
In particular, many existing libraries strongly assume that you
are reading from a file; they seek forwards and backwards as necessary
to locate various pieces of information.
In contrast, libarchive never seeks backwards in its input, which
sometimes requires very different approaches.
.Pp
For example, libarchive's ISO9660 support operates very differently
from most ISO9660 readers.
The libarchive support utilizes a work-queue design that
keeps a list of known entries sorted by their location in the input.
Whenever libarchive's ISO9660 implementation is asked for the next
header, it checks this list to find the next item on the disk.
Directories are parsed when they are encountered and new
items are added to the list.
This design relies heavily on the ISO9660 image being optimized so that
directories always occur earlier on the disk than the files they
describe.
.Pp
Depending on the specific format, such approaches may not be possible.
The ZIP format specification, for example, allows archivers to store
key information only at the end of the file.
In theory, it is possible to create ZIP archives that cannot
be read without seeking.
Fortunately, such archives are very rare, and libarchive can read
most ZIP archives, though it cannot always extract as much information
as a dedicated ZIP program.
.Sh SEE ALSO
.Xr archive_entry 3 ,
.Xr archive_read 3 ,
.Xr archive_write 3 ,
.Xr archive_write_disk 3 ,
.Xr libarchive 3
.Sh HISTORY
The
.Nm libarchive
library first appeared in
.Fx 5.3 .
.Sh AUTHORS
.An -nosplit
The
.Nm libarchive
library was written by
.An Tim Kientzle Aq kientzle@acm.org .