LIBARCHIVE_INTERNALS(3) manual page
== NAME ==
'''libarchive_internals'''
- description of libarchive internal interfaces
== OVERVIEW ==
The
'''libarchive'''
library provides a flexible interface for reading and writing
streaming archive files such as tar and cpio.
Internally, it follows a modular layered design that should
make it easy to add new archive and compression formats.
== GENERAL ARCHITECTURE ==
Externally, libarchive exposes most operations through an
opaque, object-style interface.
The
[[ManPageArchiveEntry3]]
objects store information about a single filesystem object.
The rest of the library provides facilities to write
[[ManPageArchiveEntry3]]
objects to archive files,
read them from archive files,
and write them to disk.
(There are plans to add a facility to read
[[ManPageArchiveEntry3]]
objects from disk as well.)

The read and write APIs each have four layers: a public API
layer, a format layer that understands the archive file format,
a compression layer, and an I/O layer.
The I/O layer is completely exposed to clients, who can replace
it entirely with their own functions.

In order to provide as much consistency as possible for clients,
some public functions are virtualized.
Eventually, it should be possible for clients to open
an archive or disk writer, and then use a single set of
code to select and write entries, regardless of the target.
== READ ARCHITECTURE ==
From the outside, clients use the
[[ManPageArchiveRead3]]
API to manipulate an
'''archive'''
object to read entries and bodies from an archive stream.
Internally, the
'''archive'''
object is cast to an
'''archive_read'''
object, which holds all read-specific data.
The API has four layers:

The lowest layer is the I/O layer.
This layer can be overridden by clients, but most clients use
the packaged I/O callbacks provided, for example, by
[[ManPageArchiveReadOpenMemory3]]
and
[[ManPageArchiveReadOpenFd3]].

The compression layer calls the I/O layer to
read bytes and decompresses them for the format layer.

The format layer unpacks a stream of uncompressed bytes and
creates
'''archive_entry'''
objects from the incoming data.

The API layer tracks overall state
(for example, it prevents clients from reading data before reading a header)
and invokes the format and compression layer operations
through registered function pointers.
In particular, the API layer drives the format-detection process:
When opening the archive, it reads an initial block of data
and offers it to each registered compression handler.
The one with the highest bid is initialized with the first block.
Similarly, the format handlers are polled to see which handler
is the best for each archive.
(Prior to 2.4.0, the format bidders were invoked for each
entry, but this design hindered error recovery.)
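
The bidding step described above can be sketched in a few lines of C.
This is only an illustration: the
'''example_bidder'''
structure, the two toy bid functions, and
'''example_pick_handler'''()
are hypothetical names invented here and are not part of libarchive.
The sketch assumes that ties go to the handler registered first.

```c
#include <stddef.h>

/* Hypothetical handler record; the real internal structures differ. */
struct example_bidder {
    const char *name;
    /* Returns 0 to decline, or a positive score (bits verified). */
    int (*bid)(const void *buf, size_t len);
};

/* Toy bidder: verifies one byte, so it bids 8 bits. */
int
example_bid_a(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    return (len >= 1 && p[0] == 'A') ? 8 : 0;
}

/* Toy bidder: verifies two bytes, so it bids 16 bits. */
int
example_bid_ab(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    return (len >= 2 && p[0] == 'A' && p[1] == 'B') ? 16 : 0;
}

/*
 * Offer the initial block to every registered bidder and return the
 * index of the highest bidder, or -1 if every handler bids zero.
 */
int
example_pick_handler(const struct example_bidder *handlers, size_t count,
    const void *buf, size_t len)
{
    int best = -1;
    int best_bid = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        int b = handlers[i].bid(buf, len);
        if (b > best_bid) {
            best_bid = b;
            best = (int)i;
        }
    }
    return best;
}
```

Because a more specific check yields a larger bid, the handler that verified more of the input wins without the API layer knowing anything about the formats involved.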
=== I/O Layer and Client Callbacks ===
The read API goes to some lengths to be nice to clients.
As a result, there are few restrictions on the behavior of
the client callbacks.

The client read callback is expected to provide a block
of data on each call.
A zero-length return does indicate end of file, but otherwise
blocks may be as small as one byte or as large as the entire file.
In particular, blocks may be of different sizes.

The client skip callback returns the number of bytes actually
skipped, which may be much smaller than the skip requested.
The only requirement is that the skip not be larger.
In particular, clients are allowed to return zero for any
skip that they don't want to handle.
The skip callback must never be invoked with a negative value.
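
As a rough illustration of these rules, the following sketch shows a memory-backed client.
The
'''example_client'''
structure and both functions are hypothetical stand-ins with simplified signatures; they are not libarchive's own callback typedefs.
The skip function deliberately skips only whole blocks, to show that a short skip is acceptable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client state: an in-memory "file" read in small blocks. */
struct example_client {
    const unsigned char *data;
    size_t size;
    size_t pos;
    size_t block;   /* largest block handed out per read; must be > 0 */
};

/*
 * Read callback: point *buf at the next block and return its length.
 * A return of zero signals end of file.
 */
long
example_read(struct example_client *c, const void **buf)
{
    size_t n = c->size - c->pos;

    if (n > c->block)
        n = c->block;
    *buf = c->data + c->pos;
    c->pos += n;
    return (long)n;
}

/*
 * Skip callback: may skip less than requested, but never more.
 * This version only skips whole blocks and declines the rest.
 */
int64_t
example_skip(struct example_client *c, int64_t request)
{
    size_t remaining = c->size - c->pos;
    size_t n;

    if (request <= 0)
        return 0;
    n = ((uint64_t)request > remaining) ? remaining : (size_t)request;
    n -= n % c->block;   /* round down to a block boundary */
    c->pos += n;
    return (int64_t)n;
}
```

A caller that asks to skip five bytes with a four-byte block size gets four bytes skipped and must read and discard the remainder itself.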

Keep in mind that not all clients are reading from disk:
clients reading from networks may provide different-sized
blocks on every request and cannot skip at all;
advanced clients may use
[[mmap(2)|http://www.freebsd.org/cgi/man.cgi?query=mmap&sektion=2]]
to read the entire file into memory at once and return the
entire file to libarchive as a single block;
other clients may begin asynchronous I/O operations for the
next block on each request.
=== Decompression Layer ===
The decompression layer not only handles decompression,
it also buffers data so that the format handlers see a
much nicer I/O model.
The decompression API is a two-stage peek/consume model.
A read_ahead request specifies a minimum read amount;
the decompression layer must provide a pointer to at least
that much data.
If more data is immediately available, it should return more:
the format layer handles bulk data reads by asking for a minimum
of one byte and then copying as much data as is available.

A subsequent call to the
'''consume'''()
function advances the read pointer.
Note that data returned from a
'''read_ahead'''()
call is guaranteed to remain in place until
the next call to
'''read_ahead'''().
Intervening calls to
'''consume'''()
should not cause the data to move.
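
The peek/consume model can be illustrated with a minimal sketch over an in-memory buffer.
The
'''example_stream'''
type and the two functions below are hypothetical and greatly simplified; a real decompressor would refill and decompress data rather than walk a fixed buffer.

```c
#include <stddef.h>

/* Hypothetical stream: a window over data that is already in memory. */
struct example_stream {
    const unsigned char *buf;   /* next unconsumed byte */
    size_t avail;               /* bytes not yet consumed */
};

/*
 * Peek: return a pointer to at least `min` bytes and report how many
 * bytes are actually available.  Returns NULL if fewer than `min`
 * bytes remain.  The read position does not move.
 */
const void *
example_read_ahead(struct example_stream *s, size_t min, size_t *avail)
{
    *avail = s->avail;
    if (s->avail < min)
        return NULL;
    return s->buf;
}

/* Consume: advance the read position; previously returned data stays put. */
void
example_consume(struct example_stream *s, size_t n)
{
    if (n > s->avail)
        n = s->avail;
    s->buf += n;
    s->avail -= n;
}
```

Calling the peek function twice without consuming returns the same pointer, which is what lets several bidders inspect the same bytes.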

Skip requests must always be handled exactly.
Decompression handlers that cannot seek forward should
not register a skip handler;
the API layer fills in a generic skip handler that reads and discards data.
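
A generic read-and-discard skip might look like the sketch below.
It is an assumption about the general shape only, not a copy of libarchive's handler: the
'''example_stream'''
type, its
''window''
field (which limits how much data each peek exposes), and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stream that exposes at most `window` bytes per peek. */
struct example_stream {
    const unsigned char *buf;
    size_t left;     /* bytes remaining in the stream */
    size_t window;   /* must be > 0 */
};

/* Peek with a minimum of one byte; NULL means end of input. */
const void *
example_peek1(struct example_stream *s, size_t *avail)
{
    if (s->left == 0) {
        *avail = 0;
        return NULL;
    }
    *avail = (s->left < s->window) ? s->left : s->window;
    return s->buf;
}

void
example_consume(struct example_stream *s, size_t n)
{
    s->buf += n;
    s->left -= n;
}

/*
 * Generic skip: repeatedly peek and discard until exactly `request`
 * bytes are gone.  A shorter result means the input ended early and
 * the caller should treat it as an error.
 */
int64_t
example_generic_skip(struct example_stream *s, int64_t request)
{
    int64_t skipped = 0;

    while (skipped < request) {
        size_t avail, n;

        if (example_peek1(s, &avail) == NULL)
            break;
        n = avail;
        if ((int64_t)n > request - skipped)
            n = (size_t)(request - skipped);
        example_consume(s, n);
        skipped += (int64_t)n;
    }
    return skipped;
}
```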

A decompression handler has a specific lifecycle:

Registration/Configuration
When the client invokes the public support function,
the decompression handler invokes the internal
'''__archive_read_register_compression'''()
function to provide bid and initialization functions.
This function returns
'''NULL'''
on error or else a pointer to a
'''struct''' decompressor_t.
This structure contains a
''void'' * config
slot that can be used for storing any customization information.

Bid
The bid function is invoked with a pointer to a block of data and its size.
The decompressor can access its config data
through the
''decompressor''
element of the
'''archive_read'''
object.
The bid function is otherwise stateless.
In particular, it must not perform any I/O operations.

The value returned by the bid function indicates its suitability
for handling this data stream.
A bid of zero will ensure that this decompressor is never invoked.
Return zero if magic number checks fail.
Otherwise, your initial implementation should return the number of bits
actually checked.
For example, if you verify two full bytes and three bits of another
byte, bid 19.
Note that the initial block may be very short;
be careful to only inspect the data you are given.
(The current decompressors require two bytes for correct bidding.)
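
The bit-counting rule can be illustrated with a bid function for gzip data.
The function name and signature are hypothetical; the checks themselves follow the gzip header layout in RFC 1952 (magic bytes 0x1f 0x8b, compression method 8, and three reserved flag bits that must be zero).
This is a sketch of the technique, not libarchive's own bidder.

```c
#include <stddef.h>

/*
 * Hypothetical gzip bidder.  Returns 0 if any check fails, otherwise
 * the number of bits that were actually verified.  Only bytes that are
 * present in the block are inspected.
 */
int
example_gzip_bid(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    int bits = 0;

    if (len < 2)
        return 0;               /* need both magic bytes */
    if (p[0] != 0x1f || p[1] != 0x8b)
        return 0;
    bits += 16;                 /* two full bytes verified */

    if (len >= 3) {
        if (p[2] != 8)          /* compression method: deflate */
            return 0;
        bits += 8;
    }
    if (len >= 4) {
        if ((p[3] & 0xE0) != 0) /* reserved flag bits must be zero */
            return 0;
        bits += 3;              /* only three bits of this byte checked */
    }
    return bits;
}
```

With a four-byte block this bids 27; with only the two magic bytes available it bids 16, so a short initial block lowers the bid rather than causing an out-of-bounds read.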

Initialize
The winning bidder will have its init function called.
This function should initialize the remaining slots of the
''struct'' decompressor_t
object pointed to by the
''decompressor''
element of the
''archive_read''
object.
In particular, it should allocate any working data it needs
in the
''data''
slot of that structure.
The init function is called with the block of data that
was used for tasting.
At this point, the decompressor is responsible for all I/O
requests to the client callbacks.
The decompressor is free to read more data as and when
necessary.

Satisfy I/O requests
The format handler will invoke the
''read_ahead'',
''consume'',
and
''skip''
functions as needed.

Finish
The finish method is called only once when the archive is closed.
It should release anything stored in the
''data''
and
''config''
slots of the
''decompressor''
object.
It should not invoke the client close callback.
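
The init and finish steps can be sketched as follows.
The
'''example_decompressor'''
structure only imitates the
''config''
and
''data''
slots described above; its layout, the
'''example_state'''
type, and the function names are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the decompressor object. */
struct example_decompressor {
    void *config;   /* set at registration time */
    void *data;     /* working state, allocated by init */
    int (*finish)(struct example_decompressor *);
};

/* Hypothetical working state: keeps a copy of the tasted block. */
struct example_state {
    unsigned char *pending;
    size_t pending_len;
};

/* Finish: release both slots; does not touch the client callbacks. */
int
example_finish(struct example_decompressor *d)
{
    struct example_state *st = d->data;

    if (st != NULL) {
        free(st->pending);
        free(st);
    }
    free(d->config);
    d->data = NULL;
    d->config = NULL;
    return 0;
}

/* Init: allocate working data and remember the block used for bidding. */
int
example_init(struct example_decompressor *d, const void *first, size_t len)
{
    struct example_state *st = calloc(1, sizeof(*st));

    if (st == NULL)
        return -1;
    if (len > 0) {
        st->pending = malloc(len);
        if (st->pending == NULL) {
            free(st);
            return -1;
        }
        memcpy(st->pending, first, len);
        st->pending_len = len;
    }
    d->data = st;
    d->finish = example_finish;
    return 0;
}
```

Keeping a copy of the first block matters because that data has already been read from the client and cannot be requested again.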
=== Format Layer ===
The read formats have a similar lifecycle to the decompression handlers:

Registration
Allocate your private data and initialize your pointers.

Bid
Formats bid by invoking the
'''read_ahead'''()
decompression method but not calling the
'''consume'''()
method.
This allows each bidder to look ahead in the input stream.
Bidders should not look further ahead than necessary, as long
look-aheads put pressure on the decompression layer to buffer
lots of data.
Most formats only require a few hundred bytes of look-ahead;
look-aheads of a few kilobytes are reasonable.
(The ISO9660 reader sometimes looks ahead by 48k, which
should be considered an upper limit.)

Read header
The header read is usually the most complex part of any format.
There are a few strategies worth mentioning:

For formats such as tar or cpio, reading and parsing the header is
straightforward since headers alternate with data.

For formats that store all header data at the beginning of the file,
the first header read request may have to read all headers into
memory and store that data, sorted by the location of the file
data.
Subsequent header read requests will skip forward to the
beginning of the file data and return the corresponding header.

Read Data
The read data interface supports sparse files; this requires that
each call return a block of data specifying the file offset and
size.
This may require you to carefully track the location so that you
can return accurate file offsets for each read.
Remember that the decompressor will return as much data as it has.
Generally, you will want to request one byte,
examine the return value to see how much data is available, and
possibly trim that to the amount you can use.
You should invoke consume for each block just before you return it.
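
The read-data pattern (ask for one byte, trim, record the offset, consume) might be sketched like this.
All types and names are hypothetical, and the stream is a plain memory buffer standing in for the decompression layer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the decompression layer. */
struct example_stream {
    const unsigned char *buf;
    size_t left;
};

/* Hypothetical per-entry state kept by the format handler. */
struct example_entry_state {
    int64_t offset;      /* file offset of the next block */
    int64_t remaining;   /* body bytes not yet returned */
};

/*
 * Returns 1 and fills in block/size/offset when data is returned,
 * 0 at the end of the entry body, and -1 if the input is truncated.
 */
int
example_read_data(struct example_stream *s, struct example_entry_state *e,
    const void **block, size_t *size, int64_t *offset)
{
    size_t n;

    *offset = e->offset;
    if (e->remaining == 0) {
        *block = NULL;
        *size = 0;
        return 0;
    }
    if (s->left < 1)              /* "read ahead" with a minimum of one byte */
        return -1;
    n = s->left;                  /* the stream offers everything it has */
    if ((int64_t)n > e->remaining)
        n = (size_t)e->remaining; /* trim to what belongs to this entry */

    *block = s->buf;
    *size = n;

    /* Consume just before returning; the bytes themselves stay in place. */
    s->buf += n;
    s->left -= n;
    e->offset += (int64_t)n;
    e->remaining -= (int64_t)n;
    return 1;
}
```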

Skip All Data
The skip data call should skip over all file data and trailing padding.
This is called automatically by the API layer just before each
header read.
It is also called in response to the client calling the public
'''data_skip'''()
function.

Cleanup
On cleanup, the format should release all of its allocated memory.
=== API Layer ===
XXX to do XXX
== WRITE ARCHITECTURE ==
The write API has a similar set of four layers:
an API layer, a format layer, a compression layer, and an I/O layer.
The registration here is much simpler because only
one format and one compression can be registered at a time.
=== I/O Layer and Client Callbacks ===
XXX To be written XXX
=== Compression Layer ===
XXX To be written XXX
=== Format Layer ===
XXX To be written XXX
=== API Layer ===
XXX To be written XXX
== WRITE_DISK ARCHITECTURE ==
The write_disk API is intended to look just like the write API
to clients.
Since it does not handle multiple formats or compression, it
is not layered internally.
== GENERAL SERVICES ==
The
'''archive_read''',
'''archive_write''',
and
'''archive_write_disk'''
objects all contain an initial
'''archive'''
object which provides common support for a set of standard services.
(Recall that ANSI/ISO C90 guarantees that you can cast freely between
a pointer to a structure and a pointer to the first element of that
structure.)
The
'''archive'''
object has a magic value that indicates which API this object
is associated with,
slots for storing error information,
and function pointers for virtualized API functions.
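
The embedding and casting idiom can be sketched as follows.
The structure layouts, the magic constants, and the function names are all hypothetical and chosen only to show the pattern; libarchive's real structures contain many more fields.

```c
#include <stddef.h>

/* Hypothetical magic values; the real constants differ. */
#define EXAMPLE_READ_MAGIC   0x52454144U
#define EXAMPLE_WRITE_MAGIC  0x57524954U

/* Hypothetical common base object. */
struct example_archive {
    unsigned int magic;                       /* which API owns this object */
    int error_code;                           /* error information slots */
    const char *error_msg;
    int (*close)(struct example_archive *);   /* virtualized function */
};

/* Hypothetical read object: the base must be the first member. */
struct example_archive_read {
    struct example_archive base;
    size_t bytes_read;
};

/* Read-specific close: check the magic, then cast down. */
int
example_read_close(struct example_archive *a)
{
    struct example_archive_read *r;

    if (a->magic != EXAMPLE_READ_MAGIC) {
        a->error_code = -1;
        a->error_msg = "object is not an archive_read";
        return -1;
    }
    r = (struct example_archive_read *)a;   /* valid: base is first member */
    r->bytes_read = 0;
    return 0;
}

/* Public entry point: dispatches through the function pointer. */
int
example_close(struct example_archive *a)
{
    return a->close(a);
}
```

The magic check turns a mismatched object into a reported error instead of an invalid cast.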
== MISCELLANEOUS NOTES ==
Connecting existing archiving libraries into libarchive is generally
quite difficult.
In particular, many existing libraries strongly assume that you
are reading from a file; they seek forwards and backwards as necessary
to locate various pieces of information.
In contrast, libarchive never seeks backwards in its input, which
sometimes requires very different approaches.

For example, libarchive's ISO9660 support operates very differently
from most ISO9660 readers.
The libarchive support utilizes a work-queue design that
keeps a list of known entries sorted by their location in the input.
Whenever libarchive's ISO9660 implementation is asked for the next
header, it checks this list to find the next item on the disk.
Directories are parsed when they are encountered and new
items are added to the list.
This design relies heavily on the ISO9660 image being optimized so that
directories always occur earlier on the disk than the files they
describe.
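
A work queue ordered by input location can be sketched with a small sorted array.
This illustrates the idea only; the fixed capacity, the
'''example_item'''
fields, and the function names are assumptions and do not reflect how libarchive's ISO9660 reader stores its entries.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_QUEUE_MAX 64

/* Hypothetical pending entry, keyed by its position in the input. */
struct example_item {
    uint64_t location;
    int is_dir;
};

struct example_queue {
    struct example_item items[EXAMPLE_QUEUE_MAX];
    size_t count;
};

/* Insert in ascending order of location.  Returns -1 if the queue is full. */
int
example_queue_add(struct example_queue *q, struct example_item it)
{
    size_t i;

    if (q->count == EXAMPLE_QUEUE_MAX)
        return -1;
    i = q->count;
    while (i > 0 && q->items[i - 1].location > it.location) {
        q->items[i] = q->items[i - 1];
        i--;
    }
    q->items[i] = it;
    q->count++;
    return 0;
}

/* Remove the item that appears earliest in the input.  Returns 0 if empty. */
int
example_queue_next(struct example_queue *q, struct example_item *out)
{
    size_t i;

    if (q->count == 0)
        return 0;
    *out = q->items[0];
    for (i = 1; i < q->count; i++)
        q->items[i - 1] = q->items[i];
    q->count--;
    return 1;
}
```

Items discovered while parsing a directory are added as they are found, and the reader always takes the earliest one next, so the input is only ever traversed forwards.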

Depending on the specific format, such approaches may not be possible.
The ZIP format specification, for example, allows archivers to store
key information only at the end of the file.
In theory, it is possible to create ZIP archives that cannot
be read without seeking.
Fortunately, such archives are very rare, and libarchive can read
most ZIP archives, though it cannot always extract as much information
as a dedicated ZIP program.
== SEE ALSO ==
[[ManPageArchiveEntry3]],
[[ManPageArchiveRead3]],
[[ManPageArchiveWrite3]],
[[ManPageArchiveWriteDisk3]],
[[ManPageLibarchive3]]
== HISTORY ==
The
'''libarchive'''
library first appeared in
FreeBSD 5.3.
== AUTHORS ==
The
'''libarchive'''
library was written by
Tim Kientzle <kientzle@acm.org>.