The libatomic_ops_gpl library includes a simple, almost lock-free malloc
implementation.

This is intended as a safe way to allocate memory from a signal handler,
or to allocate memory in the context of a library that does not know what
thread library it will be used with.  In either case, locking is impossible.

Note that the operations are only guaranteed to be 1-lock-free, i.e., a
single blocked thread will not prevent progress, but multiple blocked
threads may.  To use these operations safely in a signal handler,
the handler should be non-reentrant, i.e., it should not be interruptible
by another handler that uses these operations.  Furthermore, use outside
of signal handlers in a multithreaded application should be protected
by a lock, so that at most one such invocation can be interrupted by a
signal.  The header defines the macro "AO_MALLOC_IS_LOCK_FREE" on
platforms on which AO_malloc is completely lock-free, and hence these
restrictions do not apply.

In the presence of threads, but absence of contention, the time performance
of this package should be as good as, or slightly better than, most system
malloc implementations.  Its space performance is theoretically optimal
(to within a constant factor), but probably quite poor in practice.  In
particular, no attempt is made to coalesce small free memory blocks.
Something like Doug Lea's malloc is likely to use significantly less
memory for complex applications.
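The arithmetic below makes the space trade-off concrete. It assumes a size-class scheme in which each object carries one pointer-sized header and is rounded up to a power of two. That scheme and the names size_class_log(), reserved_bytes() and wasted_bytes() are illustrative assumptions. They do not describe this package's internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed scheme, for illustration only: each object is preceded by a
   one-pointer header and its total size is rounded up to a power of two. */
#define HEADER_SIZE sizeof(uintptr_t)

/* Smallest k such that 2^k >= sz + HEADER_SIZE. */
static inline unsigned size_class_log(size_t sz)
{
  size_t need;
  unsigned k = 0;

  assert(sz <= ((size_t)1 << 30));   /* keep the shifts well in range */
  need = sz + HEADER_SIZE;
  while (((size_t)1 << k) < need)
    ++k;
  return k;
}

/* Bytes actually reserved for a request of sz bytes. */
static inline size_t reserved_bytes(size_t sz)
{
  return (size_t)1 << size_class_log(sz);
}

/* Bytes the caller cannot use: the header plus rounding slack. */
static inline size_t wasted_bytes(size_t sz)
{
  return reserved_bytes(sz) - sz;
}
```

Under this assumption, a 100-byte request occupies a 128-byte block. A 4100-byte request occupies an 8192-byte block, wasting 4092 bytes. Without coalescing, a freed block is never merged with its neighbors to serve a larger request.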

Performance on platforms without an efficient compare-and-swap implementation
will be poor.
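Compare-and-swap matters because each push onto or pop from a shared free list is a CAS retry loop. The sketch below is a generic version-tagged free list (a Treiber stack) over a static pool of block indices. It is written with C11 <stdatomic.h> rather than atomic_ops.h primitives, and the pool layout and function names are hypothetical. If the platform has no native 64-bit CAS, the C11 implementation may emulate it with a lock, and atomic_is_lock_free(&head) will report that.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NBLOCKS    64u          /* hypothetical pool size */
#define BLOCK_SIZE 64u          /* hypothetical block size in bytes */
#define NIL        UINT32_MAX   /* "no block" / empty-list marker */

static unsigned char pool[NBLOCKS][BLOCK_SIZE];  /* block storage */
static _Atomic uint32_t next_of[NBLOCKS];        /* free-list links */

/* Head word: high 32 bits = version tag, low 32 bits = block index.
   Every successful CAS bumps the tag.  A pop that read a stale head
   therefore fails even if the same index was popped and pushed back
   in the meantime (ABA), barring 2^32 intervening updates. */
static _Atomic uint64_t head = NIL;              /* tag 0, empty list */

static inline uint64_t pack(uint32_t tag, uint32_t idx)
{
  return ((uint64_t)tag << 32) | idx;
}
static inline uint32_t idx_of(uint64_t w) { return (uint32_t)w; }
static inline uint32_t tag_of(uint64_t w) { return (uint32_t)(w >> 32); }

/* Returns block i to the free list. */
static inline void push_block(uint32_t i)
{
  uint64_t old = atomic_load(&head);

  assert(i < NBLOCKS);
  do {
    atomic_store(&next_of[i], idx_of(old));   /* link to current top */
  } while (!atomic_compare_exchange_weak(&head, &old,
                                         pack(tag_of(old) + 1, i)));
}

/* Takes a block off the free list; returns NIL if the list is empty. */
static inline uint32_t pop_block(void)
{
  uint64_t old = atomic_load(&head);

  for (;;) {
    uint32_t i = idx_of(old);
    if (i == NIL)
      return NIL;
    /* Reading next_of[i] is safe even if another thread takes block i
       meanwhile: the array is static, and the tag makes our CAS fail. */
    uint32_t nxt = atomic_load(&next_of[i]);
    if (atomic_compare_exchange_weak(&head, &old,
                                     pack(tag_of(old) + 1, nxt)))
      return i;
    /* On failure, old now holds the current head; retry. */
  }
}

static inline void *block_addr(uint32_t i) { return pool[i]; }

/* Puts every block on the free list; call once before use. */
static inline void init_pool(void)
{
  for (uint32_t i = 0; i < NBLOCKS; ++i)
    push_block(i);
}
```

The list is LIFO, so after init_pool() the first pop_block() returns the last block pushed.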

This package was not designed to scale across processors in the face of
high allocation rates.  If all threads happen to allocate objects of
different sizes, you might get lucky.  Otherwise, expect contention and
false-sharing problems.  If this is an issue, something like Maged
Michael's algorithm (PLDI 2004) would be a technically far better choice.
If you are concerned only with scalability, and not signal-safety, you
might also consider using Hoard instead.  Under contention, we have seen
this package run 3 to 4 times slower than the standard glibc malloc
implementation, even in cases where it was faster without contention.
(To make the implementation more scalable, one would need to replicate
at least the free list headers, so that concurrent access is possible
without cache conflicts.)
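The parenthetical remark about replicating free list headers can be pictured with a small C11 sketch. Each head sits on its own cache line, assumed here to be 64 bytes. The whole array of per-size-class heads is replicated NSHARDS times, and a thread picks its replica from some per-thread number. All constants and names are hypothetical. The sketch omits the harder part, moving free blocks between replicas.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache-line size in bytes */
#define NSHARDS    8    /* hypothetical number of replicas */
#define NCLASSES   17   /* hypothetical number of size classes */

/* A free-list head padded out to a full cache line, so a CAS on one
   head does not invalidate the line holding another (no false sharing). */
struct padded_head {
  alignas(CACHE_LINE) _Atomic(void *) top;   /* top of this free list */
};

/* Replicated headers: each shard has its own complete set of heads. */
static struct padded_head heads[NSHARDS][NCLASSES];

/* Picks the head for size class cls from the replica selected by a
   per-thread number (e.g. a small thread index).  Threads whose numbers
   differ modulo NSHARDS never share a head. */
static inline struct padded_head *head_for(unsigned thread_no, unsigned cls)
{
  assert(cls < NCLASSES);
  return &heads[thread_no % NSHARDS][cls];
}
```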

Unfortunately, there is no portable async-signal-safe way to obtain large
chunks of memory from the OS.  Based on a reading of the source code,
mmap-based allocation appears safe under Linux, and probably under BSD
variants.  It is probably unsafe on operating systems built on Mach, such
as Apple's Darwin.  Without mmap, the allocator is limited to a
fixed-size, statically preallocated heap (2 MB by default), and will fail
to allocate objects above a certain size (just under 64 KB by default).
Using mmap to lift these limitations requires an explicit call to
AO_malloc_enable_mmap().

The entire interface to the AO_malloc package currently consists of:

#include <atomic_ops_malloc.h> /* includes atomic_ops.h */

void *AO_malloc(size_t sz);
void AO_free(void *p);
void AO_malloc_enable_mmap(void);