The libarchive project develops a portable, efficient C library that
can read and write streaming archives in a variety of formats. It
also includes implementations of the common
command-line tools that use the libarchive library.
This distribution bundle includes the following major components:
The top-level directory contains the following information files:
configure script, you can try to construct it by running the script in
The following files in the top-level directory are used by the 'configure' script:
configure.ac - used to build this distribution, only needed by maintainers
config.h.in - templates used by configure script
In addition to the informational articles and documentation in the online libarchive Wiki, the distribution also includes a number of manual pages:
The manual pages above are provided in the 'doc' directory in a number of different formats.
You should also read the copious comments in
archive.h and the
source code for the sample programs for more details. Please let us
know about any errors or omissions you find.
Currently, the library automatically detects and reads the following formats:
* Old V7 tar archives
* POSIX ustar
* GNU tar format (including GNU long filenames, long link names, and sparse files)
* Solaris 9 extended tar format (including ACLs)
* POSIX pax interchange format
* POSIX octet-oriented cpio
* SVR4 ASCII cpio
* Binary cpio (big-endian or little-endian)
* ISO9660 CD-ROM images (with optional Rockridge or Joliet extensions)
* ZIP archives (with uncompressed or "deflate" compressed entries, including support for encrypted Zip archives)
* GNU and BSD 'ar' archives
* 'mtree' format
* 7-Zip archives
* Microsoft CAB format
* LHA and LZH archives
* RAR archives (with some limitations due to RAR's proprietary status)
* XAR archives
The library also detects and handles any of the following before evaluating the archive:
* uuencoded files
* files with RPM wrapper
* gzip compression
* bzip2 compression
* compress/LZW compression
* lzma, lzip, and xz compression
* lz4 compression
* lzop compression
* zstandard compression
The library can create archives in any of the following formats:
* POSIX ustar
* POSIX pax interchange format
* "restricted" pax format, which will create ustar archives except for entries that require pax extensions (for long filenames, ACLs, etc).
* Old GNU tar format
* Old V7 tar format
* POSIX octet-oriented cpio
* SVR4 "newc" cpio
* shar archives
* ZIP archives (with uncompressed or "deflate" compressed entries)
* GNU and BSD 'ar' archives
* 'mtree' format
* ISO9660 format
* 7-Zip archives
* XAR archives
When creating archives, the result can be filtered with any of the following:
* uuencode
* gzip compression
* bzip2 compression
* compress/LZW compression
* lzma, lzip, and xz compression
* lz4 compression
* lzop compression
* zstandard compression
The following notes address many of the most common questions we are asked about libarchive:
This is a heavily stream-oriented system. That means that it is optimized to read or write the archive in a single pass from beginning to end. For example, this allows libarchive to process archives too large to store on disk by processing them on-the-fly as they are read from or written to a network or tape drive. This also makes libarchive useful for tools that need to produce archives on-the-fly (such as web servers that provide archived contents of a user's account).
In-place modification and random access to the contents of an archive are not directly supported. For some formats, this is not an issue: For example, tar.gz archives are not designed for random access. In some other cases, libarchive can re-open an archive and scan it from the beginning quickly enough to provide the needed abilities even without true random access. Of course, some applications do require true random access; those applications should consider alternatives to libarchive.
The library is designed to be extended with new compression and archive formats. The only requirements are that the format be readable or writable as a stream and that each archive entry be independent. There are articles on the libarchive Wiki explaining how to extend libarchive.
On read, compression and format are always detected automatically.
The same API is used for all formats; it should be very easy for software using libarchive to transparently handle any of libarchive's archiving formats.
Libarchive's automatic support for decompression can be used without archiving by explicitly selecting the "raw" and "empty" formats.
I've attempted to minimize static link pollution. If you don't explicitly invoke a particular feature (such as support for a particular compression or format), it won't get pulled into statically-linked programs. In particular, if you don't explicitly enable support for a particular compression or decompression method, you won't need to link against the corresponding compression or decompression libraries. This also reduces the size of statically-linked binaries in environments where that matters.
The library is generally thread-safe, depending on the platform: it does not define any global variables of its own. However, some platforms do not provide fully thread-safe versions of key C library functions. On those platforms, libarchive will use the non-thread-safe functions. Patches to improve this are of great interest to us.
In particular, libarchive's modules to read or write a directory
tree do use
chdir() to optimize the directory traversals. This
can cause problems for programs that expect to do disk access from
multiple threads. Of course, those modules are completely
optional and you can use the rest of libarchive without them.
The library is not thread aware, however. It does no locking or thread management of any kind. If you create a libarchive object and need to access it from multiple threads, you will need to provide your own locking.
On read, the library accepts whatever blocks you hand it. Your read callback is free to pass the library a byte at a time or mmap the entire archive and give it to the library at once. On write, the library always produces correctly-blocked output.
The object-style approach allows you to have multiple archive streams open at once. bsdtar uses this in its "@archive" extension.
The archive itself is read/written using callback functions. You can read an archive directly from an in-memory buffer or write it to a socket, if you wish. There are also utility functions that provide easy-to-use capabilities such as "open file."
The read/write APIs are designed to allow individual entries to be read or written to any data source: You can create a block of data in memory and add it to a tar archive without first writing a temporary file. You can also read an entry from an archive and write the data directly to a socket. If you want to read/write entries to disk, there are convenience functions to make this especially easy.
Note: The "pax interchange format" is a POSIX standard extended tar format that should be used when the older ustar format is not appropriate. It has many advantages over other tar formats (including the legacy GNU tar format) and is widely supported by current tar implementations.