"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta name="generator" content="groff -Thtml, see www.gnu.org">
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
<meta name="Content-Style" content="text/css">
<style type="text/css">
       p       { margin-top: 0; margin-bottom: 0; vertical-align: top }
       pre     { margin-top: 0; margin-bottom: 0; vertical-align: top }
       table   { margin-top: 0; margin-bottom: 0; vertical-align: top }
       h1      { text-align: center }
</style>
<title></title>
</head>
<body>

LIBARCHIVE_INTERNALS(3) BSD Library Functions Manual LIBARCHIVE_INTERNALS(3)

NAME

libarchive_internals -- description of libarchive internal interfaces

OVERVIEW

The libarchive library provides a flexible interface for reading and
writing streaming archive files such as tar and cpio. Internally, it
follows a modular layered design that should make it easy to add new
archive and compression formats.

GENERAL ARCHITECTURE

Externally, libarchive exposes most operations through an opaque,
object-style interface. The archive_entry(3) objects store information
about a single filesystem object. The rest of the library provides
facilities to write archive_entry(3) objects to archive files, read
them from archive files, and write them to disk. (There are plans to
add a facility to read archive_entry(3) objects from disk as well.)

The read and write APIs each have four layers: a public API layer, a
format layer that understands the archive file format, a compression
layer, and an I/O layer. The I/O layer is completely exposed to
clients, who can replace it entirely with their own functions.

In order to provide as much consistency as possible for clients, some
public functions are virtualized. Eventually, it should be possible
for clients to open an archive or disk writer, and then use a single
set of code to select and write entries, regardless of the target.

READ ARCHITECTURE

From the outside, clients use the archive_read(3) API to manipulate an
archive object to read entries and bodies from an archive stream.
Internally, the archive object is cast to an archive_read object,
which holds all read-specific data. The API has four layers. The
lowest layer is the I/O layer. This layer can be overridden by
clients, but most clients use the packaged I/O callbacks provided, for
example, by archive_read_open_memory(3) and archive_read_open_fd(3).
The compression layer calls the I/O layer to read bytes and
decompresses them for the format layer. The format layer unpacks a
stream of uncompressed bytes and creates archive_entry objects from
the incoming data. The API layer tracks overall state (for example, it
prevents clients from reading data before reading a header) and
invokes the format and compression layer operations through registered
function pointers.

In particular, the API layer drives the format-detection process: when
opening the archive, it reads an initial block of data and offers it
to each registered compression handler. The one with the highest bid
is initialized with the first block. Similarly, the format handlers
are polled to see which handler is the best for each archive. (Prior
to 2.4.0, the format bidders were invoked for each entry, but this
design hindered error recovery.)
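
The bidding step can be pictured with a small, self-contained sketch.
It is not libarchive's code: struct bidder, bid_gzip(), bid_none() and
pick_best() are names assumed for this illustration, and the only
outside fact relied on is that gzip streams begin with the bytes 0x1f
0x8b.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler record: a name plus a stateless bid function. */
struct bidder {
    const char *name;
    int (*bid)(const unsigned char *buf, size_t len);
};

/* gzip streams start with 0x1f 0x8b; bid the 16 bits that were checked. */
static int
bid_gzip(const unsigned char *buf, size_t len)
{
    if (len < 2 || buf[0] != 0x1f || buf[1] != 0x8b)
        return 0;
    return 16;
}

/* "No compression" accepts anything, with the lowest non-zero bid. */
static int
bid_none(const unsigned char *buf, size_t len)
{
    (void)buf;
    (void)len;
    return 1;
}

/*
 * Offer the initial block to every bidder and return the one with the
 * highest bid, or NULL if every bidder declined with zero.
 */
static const struct bidder *
pick_best(const struct bidder *b, size_t n,
    const unsigned char *buf, size_t len)
{
    const struct bidder *best = NULL;
    int best_bid = 0;
    size_t i;

    assert(b != NULL || n == 0);
    for (i = 0; i < n; i++) {
        int bid = b[i].bid(buf, len);
        if (bid > best_bid) {
            best_bid = bid;
            best = &b[i];
        }
    }
    return best;
}
```

Because the comparison is strict, the first of two equal bids wins in
this sketch; how libarchive itself breaks ties is not stated here.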

I/O Layer and Client Callbacks

The read API goes to some lengths to accommodate clients. As a result,
there are few restrictions on the behavior of the client callbacks.

The client read callback is expected to provide a block of data on
each call. A zero-length return indicates end of file, but otherwise
blocks may be as small as one byte or as large as the entire file. In
particular, blocks may be of different sizes.
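
As an illustration, here is a minimal sketch of a read callback that
hands out blocks of varying size from memory. The callback shape is
only loosely modeled on the public read callback (see
archive_read_open(3) for the real signature); struct mem_source,
mem_read() and the long return type are assumptions made to keep the
example self-contained.

```c
#include <assert.h>
#include <stddef.h>

struct archive; /* opaque; only forward-declared for this sketch */

/* Hypothetical client state: an in-memory source read in uneven blocks. */
struct mem_source {
    const unsigned char *data;
    size_t size;
    size_t pos;
    size_t next_block; /* size of the next block; must be non-zero */
};

/*
 * Point *buffer at the next block and return its length.
 * A return of zero means end of file.
 */
static long
mem_read(struct archive *a, void *client_data, const void **buffer)
{
    struct mem_source *src = client_data;
    size_t remaining = src->size - src->pos;
    size_t n;

    (void)a;
    assert(src->next_block > 0); /* zero would look like a false EOF */
    n = src->next_block < remaining ? src->next_block : remaining;
    *buffer = src->data + src->pos;
    src->pos += n;
    src->next_block = src->next_block * 2 + 1; /* 1, 3, 7, ... */
    return (long)n;
}
```

The growing block size is arbitrary; it only demonstrates that a
caller cannot assume any particular block length.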

The client skip callback returns the number of bytes actually skipped,
which may be much smaller than the skip requested. The only
requirement is that the skip not be larger. In particular, clients are
allowed to return zero for any skip that they do not want to handle.
The skip callback must never be invoked with a negative value.
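
A skip callback that follows these rules might look like the sketch
below. struct skip_source, its max_skip field and the simplified
client_skip() signature are assumptions for illustration; the public
callback takes additional arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical client state for a seekable source. */
struct skip_source {
    size_t size;     /* total bytes in the source */
    size_t pos;      /* current position */
    size_t max_skip; /* largest skip this client will do in one call */
};

/*
 * Return the number of bytes actually skipped. The result may be
 * smaller than the request (even zero) but is never larger.
 */
static long long
client_skip(void *client_data, long long request)
{
    struct skip_source *src = client_data;
    size_t left = src->size - src->pos;
    size_t want;

    assert(request >= 0); /* callers never pass a negative value */
    want = (unsigned long long)request > src->max_skip
        ? src->max_skip : (size_t)request;
    if (want > left)
        want = left;
    src->pos += want;
    return (long long)want;
}
```

A client that cannot skip at all would simply set max_skip to zero and
let the caller read and discard instead.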

Keep in mind that not all clients are reading from disk: clients
reading from networks may provide different-sized blocks on every
request and cannot skip at all; advanced clients may use mmap(2) to
read the entire file into memory at once and return the entire file to
libarchive as a single block; other clients may begin asynchronous I/O
operations for the next block on each request.

Decompression Layer

The decompression layer not only handles decompression, it also
buffers data so that the format handlers see a much nicer I/O model.
The decompression API is a two-stage peek/consume model. A read_ahead
request specifies a minimum read amount; the decompression layer must
provide a pointer to at least that much data. If more data is
immediately available, it should return more: the format layer handles
bulk data reads by asking for a minimum of one byte and then copying
as much data as is available.

A subsequent call to the consume() function advances the read pointer.
Note that data returned from a read_ahead() call is guaranteed to
remain in place until the next call to read_ahead(). Intervening calls
to consume() should not cause the data to move.
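
The peek/consume contract can be sketched with a small fixed-size
buffer. This is an illustration under stated assumptions, not the
library's implementation: struct stream, BUF_CAP and the chunked
in-memory "upstream" are invented, and only the names read_ahead() and
consume() come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_CAP 64

/* Hypothetical buffered stream on top of an in-memory upstream. */
struct stream {
    const unsigned char *src; /* upstream data */
    size_t src_len, src_pos;
    size_t chunk;             /* upstream bytes per fill; must be > 0 */
    unsigned char buf[BUF_CAP];
    size_t start, end;        /* unread bytes live in buf[start..end) */
};

/*
 * Peek: return a pointer to at least `min` unread bytes and store the
 * number actually available in *avail. Returns NULL when `min` bytes
 * cannot be provided (end of input, or min larger than BUF_CAP).
 */
static const unsigned char *
read_ahead(struct stream *s, size_t min, size_t *avail)
{
    if (min > BUF_CAP) {
        *avail = 0;
        return NULL;
    }
    /* Buffered data is only ever moved here, never in consume(). */
    if (s->end - s->start < min && s->start > 0) {
        memmove(s->buf, s->buf + s->start, s->end - s->start);
        s->end -= s->start;
        s->start = 0;
    }
    while (s->end - s->start < min && s->src_pos < s->src_len) {
        size_t n = s->src_len - s->src_pos;
        if (n > s->chunk)
            n = s->chunk;
        if (n > BUF_CAP - s->end)
            n = BUF_CAP - s->end;
        memcpy(s->buf + s->end, s->src + s->src_pos, n);
        s->end += n;
        s->src_pos += n;
    }
    *avail = s->end - s->start;
    return *avail >= min ? s->buf + s->start : NULL;
}

/* Consume: advance the read pointer; buffered bytes stay in place. */
static void
consume(struct stream *s, size_t n)
{
    assert(n <= s->end - s->start);
    s->start += n;
}
```

Keeping all data movement inside read_ahead() is what lets a pointer
returned by one peek stay valid across any number of consume() calls.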

Skip requests must always be handled exactly. Decompression handlers
that cannot seek forward should not register a skip handler; the API
layer fills in a generic skip handler that reads and discards data.
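
A read-and-discard skip of this kind could be sketched as follows.
struct reader, generic_skip() and the toy backend are hypothetical
names for this example; the real generic handler lives inside the API
layer and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of a decompressor that only offers peek/consume. */
struct reader {
    void *state;
    /* Returns the number of bytes available now; 0 means end of input. */
    size_t (*read_ahead)(void *state, size_t min);
    void (*consume)(void *state, size_t n);
};

/*
 * Skip exactly `request` bytes by peeking and consuming in a loop.
 * The result is short only if the input ends first.
 */
static size_t
generic_skip(struct reader *r, size_t request)
{
    size_t skipped = 0;

    assert(r->read_ahead != NULL && r->consume != NULL);
    while (skipped < request) {
        size_t avail = r->read_ahead(r->state, 1);
        if (avail == 0)
            break;
        if (avail > request - skipped)
            avail = request - skipped;
        r->consume(r->state, avail);
        skipped += avail;
    }
    return skipped;
}

/* Toy backend: `left` bytes remain, at most three visible per peek. */
struct toy { size_t left; };

static size_t
toy_read_ahead(void *state, size_t min)
{
    struct toy *t = state;
    (void)min;
    return t->left < 3 ? t->left : 3;
}

static void
toy_consume(void *state, size_t n)
{
    ((struct toy *)state)->left -= n;
}
```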

A decompression handler has a specific lifecycle:

Registration/Configuration

When the client invokes the public support function, the decompression
handler invokes the internal __archive_read_register_compression()
function to provide bid and initialization functions. This function
returns NULL on error or else a pointer to a struct decompressor_t.
This structure contains a void * config slot that can be used for
storing any customization information.

Bid

The bid function is invoked with a pointer and size of a block of
data. The decompressor can access its config data through the
decompressor element of the archive_read object. The bid function is
otherwise stateless. In particular, it must not perform any I/O
operations.

The value returned by the bid function indicates its suitability for
handling this data stream. A bid of zero will ensure that this
decompressor is never invoked. Return zero if magic number checks
fail. Otherwise, your initial implementation should return the number
of bits actually checked. For example, if you verify two full bytes
and three bits of another byte, bid 19. Note that the initial block
may be very short; be careful to only inspect the data you are given.
(The current decompressors require two bytes for correct bidding.)
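
The "bid 19" case can be made concrete with a sketch for a made-up
format. The magic bytes 0xCA 0xFE and the rule that the top three bits
of the third byte must be zero are invented for this example, as is
the name example_bid(); no real compression format is implied.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bid for an imaginary format: two magic bytes, then a flags byte
 * whose top three bits are reserved and must be zero. Returns the
 * number of bits verified, or 0 to decline.
 */
static int
example_bid(const unsigned char *buf, size_t len)
{
    int bits = 0;

    assert(buf != NULL || len == 0);
    if (len < 2)
        return 0;          /* too short to check the magic number */
    if (buf[0] != 0xCA || buf[1] != 0xFE)
        return 0;
    bits += 16;            /* two full bytes verified */
    if (len >= 3) {        /* inspect the third byte only if present */
        if ((buf[2] & 0xE0) != 0)
            return 0;
        bits += 3;         /* three reserved bits verified */
    }
    return bits;
}
```

Every access is guarded by a length check, so a very short initial
block lowers the bid instead of causing a read past the buffer.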

Initialize

The winning bidder will have its init function called. This function
should initialize the remaining slots of the struct decompressor_t
object pointed to by the decompressor element of the archive_read
object. In particular, it should allocate any working data it needs in
the data slot of that structure. The init function is called with the
block of data that was used for tasting. At this point, the
decompressor is responsible for all I/O requests to the client
callbacks. The decompressor is free to read more data as and when
necessary.

Satisfy I/O requests

The format handler will invoke the read_ahead, consume, and skip
functions as needed.

Finish

The finish method is called only once when the archive is closed. It
should release anything stored in the data and config slots of the
decompressor object. It should not invoke the client close callback.
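
The lifecycle above can be walked through with a stand-in structure.
Everything in this sketch is assumed for illustration: struct
decompressor is not the layout of struct decompressor_t, and
my_register(), my_bid(), my_init() and my_finish() are hypothetical.
Only the roles of the config and data slots follow the description.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the decompressor record. */
struct decompressor {
    void *config; /* customization stored at registration time */
    void *data;   /* working state allocated by init */
    int (*bid)(const void *buf, size_t len);
    int (*init)(struct decompressor *self, const void *buf, size_t len);
    int (*finish)(struct decompressor *self);
};

struct my_config { int level; };
struct my_state { unsigned char first[16]; size_t first_len; };

/* Stateless bid: accept any non-empty block with a minimal bid. */
static int
my_bid(const void *buf, size_t len)
{
    (void)buf;
    return len > 0 ? 1 : 0;
}

/* Finish: release the data and config slots; no client callbacks. */
static int
my_finish(struct decompressor *self)
{
    free(self->data);
    free(self->config);
    self->data = NULL;
    self->config = NULL;
    return 0;
}

/* Init: fill the remaining slots and keep the block used for tasting. */
static int
my_init(struct decompressor *self, const void *buf, size_t len)
{
    struct my_state *st;

    assert(self->data == NULL);
    st = calloc(1, sizeof(*st));
    if (st == NULL)
        return -1;
    st->first_len = len < sizeof(st->first) ? len : sizeof(st->first);
    if (st->first_len > 0)
        memcpy(st->first, buf, st->first_len);
    self->data = st;
    self->finish = my_finish;
    return 0;
}

/* Registration: provide bid and init, and stash the configuration. */
static int
my_register(struct decompressor *d, int level)
{
    struct my_config *cfg = malloc(sizeof(*cfg));

    if (cfg == NULL)
        return -1;
    cfg->level = level;
    d->config = cfg;
    d->data = NULL;
    d->bid = my_bid;
    d->init = my_init;
    d->finish = NULL; /* filled in by init, once this handler wins */
    return 0;
}
```

In this sketch a handler that never wins the bid keeps only its config
allocation; how the library releases that case is not covered here.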

Format Layer

The read formats have a similar lifecycle to the decompression
handlers:

Registration

Allocate your private data and initialize your pointers.

Bid

Formats bid by invoking the read_ahead() decompression method but not
calling the consume() method. This allows each bidder to look ahead in
the input stream. Bidders should not look further ahead than
necessary, as long look-aheads put pressure on the decompression layer
to buffer lots of data. Most formats only require a few hundred bytes
of look-ahead; look-aheads of a few kilobytes are reasonable. (The
ISO9660 reader sometimes looks ahead by 48k, which should be
considered an upper limit.)
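
A format bid that peeks without consuming might be sketched like this.
The example checks for the string "ustar" at offset 257 of a 512-byte
tar header block, which is where the ustar format places its magic;
struct peeker, peek() and ustar_bid() are hypothetical names, and the
bid of 40 simply counts the five bytes verified.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical peek-only view of the decompressed stream. */
struct peeker {
    const unsigned char *data;
    size_t len;
    size_t consumed; /* never changed while bidding */
};

/* Return a pointer to at least `min` unread bytes, or NULL. */
static const unsigned char *
peek(struct peeker *p, size_t min, size_t *avail)
{
    assert(p->consumed <= p->len);
    *avail = p->len - p->consumed;
    return *avail >= min ? p->data + p->consumed : NULL;
}

/* Bid for ustar by looking ahead one header block; nothing is consumed. */
static int
ustar_bid(struct peeker *p)
{
    size_t avail;
    const unsigned char *h = peek(p, 512, &avail);

    if (h == NULL)
        return 0;  /* not even one full header block available */
    if (memcmp(h + 257, "ustar", 5) != 0)
        return 0;
    return 40;     /* five bytes, 40 bits, verified */
}
```

Because the bidder never consumes, the next bidder and the eventual
winner all see the stream from the same starting position.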

Read header

The header read is usually the most complex part of any format. There
are a few strategies worth mentioning: For formats such as tar or
cpio, reading and parsing the header is straightforward since headers
alternate with data. For formats that store all header data at the
beginning of the file, the first header read request may have to read
all headers into memory and store that data, sorted by the location of
the file data. Subsequent header read requests will skip forward to
the beginning of the file data and return the corresponding header.
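
For the second strategy, a minimal sketch of sorting the stored
headers by data location and computing the forward skip is shown here.
struct hdr, sort_headers() and bytes_to_skip() are assumptions for
illustration and do not correspond to any particular format reader.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical in-memory header for a format with a leading directory. */
struct hdr {
    const char *name;
    unsigned long data_offset; /* where this entry's body starts */
    unsigned long size;
};

/* qsort comparator: ascending by body position in the stream. */
static int
by_data_offset(const void *a, const void *b)
{
    const struct hdr *x = a;
    const struct hdr *y = b;

    return (x->data_offset > y->data_offset) -
        (x->data_offset < y->data_offset);
}

/* First header read: order all headers by where their data lives. */
static void
sort_headers(struct hdr *h, size_t n)
{
    qsort(h, n, sizeof(*h), by_data_offset);
}

/* Later header reads: how far to skip forward to reach the next body. */
static unsigned long
bytes_to_skip(const struct hdr *next, unsigned long stream_pos)
{
    assert(stream_pos <= next->data_offset); /* input never goes backwards */
    return next->data_offset - stream_pos;
}
```

Sorting by data offset is what makes a forward-only pass possible:
entries are returned in stream order rather than directory order.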

Read Data

The read data interface supports sparse files; this requires that each
call return a block of data specifying the file offset and size. This
may require you to carefully track the location so that you can return
accurate file offsets for each read. Remember that the decompressor
will return as much data as it has. Generally, you will want to
request one byte, examine the return value to see how much data is
available, and possibly trim that to the amount you can use. You
should invoke consume for each block just before you return it.
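
The offset tracking can be sketched for an entry whose body is stored
as a list of data regions. This is a simplified model, not a format
reader: struct extent, struct body_reader and read_data() are invented
names, and the lower layer is reduced to a memory buffer that exposes
at most max_avail bytes per request.

```c
#include <assert.h>
#include <stddef.h>

/* One data region of a sparse file: where it belongs and how long it is. */
struct extent { long long offset; size_t length; };

struct body_reader {
    const unsigned char *stream; /* packed data bytes as stored */
    size_t stream_len, stream_pos;
    size_t max_avail;            /* bytes the lower layer exposes at once */
    const struct extent *ext;
    size_t n_ext, cur;           /* extent list and current index */
    size_t done;                 /* bytes returned from the current extent */
};

/* Returns 1 and fills buf/size/offset, or 0 at the end of the entry. */
static int
read_data(struct body_reader *r, const void **buf, size_t *size,
    long long *offset)
{
    size_t avail, want;

    assert(r->stream_pos <= r->stream_len);
    /* Move past extents that have been fully returned. */
    while (r->cur < r->n_ext && r->done == r->ext[r->cur].length) {
        r->cur++;
        r->done = 0;
    }
    if (r->cur == r->n_ext)
        return 0;
    /* Ask for whatever is there, then trim to what this extent needs. */
    avail = r->stream_len - r->stream_pos;
    if (avail > r->max_avail)
        avail = r->max_avail;
    if (avail == 0)
        return 0; /* input ended early */
    want = r->ext[r->cur].length - r->done;
    if (avail > want)
        avail = want;
    *buf = r->stream + r->stream_pos;
    *size = avail;
    *offset = r->ext[r->cur].offset + (long long)r->done;
    r->stream_pos += avail; /* consume just before returning the block */
    r->done += avail;
    return 1;
}
```

Each returned block carries its own file offset, so the gap between
one extent and the next never has to be materialized as zero bytes.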

Skip All Data

The skip data call should skip over all file data and trailing
padding. This is called automatically by the API layer just before
each header read. It is also called in response to the client calling
the public data_skip() function.

Cleanup

On cleanup, the format should release all of its allocated memory.

API Layer

XXX to do XXX

WRITE ARCHITECTURE

The write API has a similar set of four layers: an API layer, a format
layer, a compression layer, and an I/O layer. The registration here is
much simpler because only one format and one compression can be
registered at a time.

I/O Layer and Client Callbacks

XXX To be written XXX

Compression Layer

XXX To be written XXX

Format Layer

XXX To be written XXX

API Layer

XXX To be written XXX

WRITE_DISK ARCHITECTURE

The write_disk API is intended to look just like the write API to
clients. Since it does not handle multiple formats or compression, it
is not layered internally.

GENERAL SERVICES

The archive_read, archive_write, and archive_write_disk objects all
contain an initial archive object which provides common support for a
set of standard services. (Recall that ANSI/ISO C90 guarantees that
you can cast freely between a pointer to a structure and a pointer to
the first member of that structure.) The archive object has a magic
value that indicates which API this object is associated with, slots
for storing error information, and function pointers for virtualized
API functions.
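
The embedding and the magic-value check can be illustrated with a
small sketch. The structures, the MAGIC_READ and MAGIC_WRITE values
and the function names are all made up for this example; they do not
reproduce libarchive's private definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up magic values; the library's real constants are not shown. */
#define MAGIC_READ  0x52454144U
#define MAGIC_WRITE 0x57524954U

/* Stand-in for the common object at the start of every API object. */
struct base {
    unsigned int magic;                 /* identifies the owning API */
    int error;                          /* slot for error information */
    const char *(*kind)(struct base *); /* a virtualized function */
};

/* The common object is the first member, so pointers convert freely. */
struct reader_obj { struct base base; size_t bytes_read; };
struct writer_obj { struct base base; size_t bytes_written; };

static const char *
reader_kind(struct base *b)
{
    (void)b;
    return "read";
}

static const char *
writer_kind(struct base *b)
{
    (void)b;
    return "write";
}

/* Downcast guarded by the magic value; records an error on mismatch. */
static struct reader_obj *
as_reader(struct base *b)
{
    assert(b != NULL);
    if (b->magic != MAGIC_READ) {
        b->error = -1;
        return NULL;
    }
    return (struct reader_obj *)b;
}
```

A caller holding only a struct base pointer can invoke b->kind(b)
without knowing which API object it belongs to, which is the sense in
which such functions are virtualized.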

MISCELLANEOUS NOTES

Connecting existing archiving libraries into libarchive is generally
quite difficult. In particular, many existing libraries strongly
assume that you are reading from a file; they seek forwards and
backwards as necessary to locate various pieces of information. In
contrast, libarchive never seeks backwards in its input, which
sometimes requires very different approaches.

For example, libarchive’s ISO9660 support operates very differently
from most ISO9660 readers. The libarchive support utilizes a
work-queue design that keeps a list of known entries sorted by their
location in the input. Whenever libarchive’s ISO9660 implementation is
asked for the next header, it checks this list to find the next item
on the disk. Directories are parsed when they are encountered and new
items are added to the list. This design relies heavily on the ISO9660
image being optimized so that directories always occur earlier on the
disk than the files they describe.
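
A work queue of this kind can be sketched as a small array kept in
ascending order of location. This is only an illustration of the idea
under simple assumptions (a fixed capacity and insertion sort); struct
pending, struct queue, queue_add() and queue_next() are hypothetical
and say nothing about how the ISO9660 reader is actually written.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 32

/* A known entry that has not been returned to the client yet. */
struct pending {
    unsigned long offset; /* location in the input */
    int is_dir;           /* directories add new items when parsed */
};

struct queue {
    struct pending item[QUEUE_CAP];
    size_t count;
};

/* Insert in ascending offset order. Returns 0, or -1 if the queue is full. */
static int
queue_add(struct queue *q, unsigned long offset, int is_dir)
{
    size_t i;

    if (q->count == QUEUE_CAP)
        return -1;
    i = q->count++;
    while (i > 0 && q->item[i - 1].offset > offset) {
        q->item[i] = q->item[i - 1];
        i--;
    }
    q->item[i].offset = offset;
    q->item[i].is_dir = is_dir;
    return 0;
}

/* Remove the entry that occurs earliest in the input. Returns 0 if empty. */
static int
queue_next(struct queue *q, struct pending *out)
{
    size_t i;

    assert(out != NULL);
    if (q->count == 0)
        return 0;
    *out = q->item[0];
    for (i = 1; i < q->count; i++)
        q->item[i - 1] = q->item[i];
    q->count--;
    return 1;
}
```

Items discovered while parsing a directory can be added at any time;
as long as they lie later in the input than the current position, the
reader never has to seek backwards.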

Depending on the specific format, such approaches may not be possible.
The ZIP format specification, for example, allows archivers to store
key information only at the end of the file. In theory, it is possible
to create ZIP archives that cannot be read without seeking.
Fortunately, such archives are very rare, and libarchive can read most
ZIP archives, though it cannot always extract as much information as a
dedicated ZIP program.

SEE ALSO

archive_entry(3), archive_read(3), archive_write(3),
archive_write_disk(3), libarchive(3)

HISTORY

The libarchive library first appeared in FreeBSD 5.3.

AUTHORS

The libarchive library was written by Tim Kientzle <kientzle@acm.org>.

BSD January 26, 2011 BSD
</body>
</html>