Blame doc/README_details.txt

Packit fe9d6e
Usage:
Packit fe9d6e
Packit fe9d6e
0) If possible, do this on a multiprocessor, especially if you are planning
Packit fe9d6e
on modifying or enhancing the package.  It will work on a uniprocessor,
Packit fe9d6e
but the tests are much more likely to pass in the presence of serious problems.
Packit fe9d6e
Packit fe9d6e
1) Type ./configure --prefix=<install dir>; make; make check
Packit fe9d6e
in the directory containing unpacked source.  The usual GNU build machinery
Packit fe9d6e
is used, except that only static, but position-independent, libraries
Packit fe9d6e
are normally built.  On Windows, read README_win32.txt instead.
Packit fe9d6e
Packit fe9d6e
2) Applications should include atomic_ops.h.  Nearly all operations
Packit fe9d6e
are implemented by header files included from it.  It is sometimes
Packit fe9d6e
necessary, and always recommended to also link against libatomic_ops.a.
Packit fe9d6e
To use the almost non-blocking stack or malloc implementations,
Packit fe9d6e
see the corresponding README files, and also link against libatomic_gpl.a
Packit fe9d6e
before linking against libatomic_ops.a.
Packit fe9d6e
Packit fe9d6e
OVERVIEW:
Packit fe9d6e
Atomic_ops.h defines a large collection of operations, each one of which is
Packit fe9d6e
a combination of an (optional) atomic memory operation, and a memory barrier.
Packit fe9d6e
Also defines associated feature-test macros to determine whether a particular
Packit fe9d6e
operation is available on the current target hardware (either directly or
Packit fe9d6e
by synthesis).  This is an attempt to replace various existing files with
Packit fe9d6e
similar goals, since they usually do not handle differences in memory
Packit fe9d6e
barrier styles with sufficient generality.
Packit fe9d6e
Packit fe9d6e
If this is included after defining AO_REQUIRE_CAS, then the package makes
Packit fe9d6e
an attempt to emulate AO_compare_and_swap* (single-width) in a way that (at
Packit fe9d6e
least on Linux) should still be async-signal-safe.  As a result, most other
Packit fe9d6e
atomic operations will then be defined using the compare-and-swap
Packit fe9d6e
emulation.  This emulation is slow, since it needs to disable signals.
Packit fe9d6e
And it needs to block in case of contention.  If you care about performance
Packit fe9d6e
on a platform that can't directly provide compare-and-swap, there are
Packit fe9d6e
probably better alternatives.  But this allows easy ports to some such
Packit fe9d6e
platforms (e.g. PA_RISC).  The option is ignored if compare-and-swap
Packit fe9d6e
can be implemented directly.
Packit fe9d6e
Packit fe9d6e
If atomic_ops.h is included after defining AO_USE_PTHREAD_DEFS, then all
Packit fe9d6e
atomic operations will be emulated with pthread locking.  This is NOT
Packit fe9d6e
async-signal-safe.  And it is slow.  It is intended primarily for debugging
Packit fe9d6e
of the atomic_ops package itself.
Packit fe9d6e
Packit fe9d6e
Note that the implementation reflects our understanding of real processor
Packit fe9d6e
behavior.  This occasionally diverges from the documented behavior.  (E.g.
Packit fe9d6e
the documented X86 behavior seems to be weak enough that it is impractical
Packit fe9d6e
to use.  Current real implementations appear to be much better behaved.)
Packit fe9d6e
We of course are in no position to guarantee that future processors
Packit fe9d6e
(even HPs) will continue to behave this way, though we hope they will.
Packit fe9d6e
Packit fe9d6e
This is a work in progress.  Corrections/additions for other platforms are
Packit fe9d6e
greatly appreciated.  It passes rudimentary tests on X86, Itanium, and
Packit fe9d6e
Alpha.
Packit fe9d6e
Packit fe9d6e
OPERATIONS:
Packit fe9d6e
Packit fe9d6e
Most operations operate on values of type AO_t, which are unsigned integers
Packit fe9d6e
whose size matches that of pointers on the given architecture.  Exceptions
Packit fe9d6e
are:
Packit fe9d6e
Packit fe9d6e
- AO_test_and_set operates on AO_TS_t, which is whatever size the hardware
Packit fe9d6e
supports with good performance.  In some cases this is the length of a cache
Packit fe9d6e
line.  In some cases it is a byte.  In many cases it is equivalent to AO_t.
Packit fe9d6e
Packit fe9d6e
- A few operations are implemented on smaller or larger size integers.
Packit fe9d6e
Such operations are indicated by the appropriate prefix:
Packit fe9d6e
Packit fe9d6e
AO_char_... Operates on unsigned char values.
Packit fe9d6e
AO_short_... Operates on unsigned short values.
Packit fe9d6e
AO_int_... Operates on unsigned int values.
Packit fe9d6e
Packit fe9d6e
(Currently a very limited selection of these is implemented.  We're
Packit fe9d6e
working on it.)
Packit fe9d6e
Packit fe9d6e
The defined operations are all of the form AO_[<size>_]<op><barrier>(<args>).
Packit fe9d6e
Packit fe9d6e
The <op> component specifies an atomic memory operation.  It may be
Packit fe9d6e
one of the following, where the corresponding argument and result types
Packit fe9d6e
are also specified:
Packit fe9d6e
Packit fe9d6e
void nop()
Packit fe9d6e
        No atomic operation.  The barrier may still be useful.
Packit fe9d6e
AO_t load(const volatile AO_t * addr)
Packit fe9d6e
        Atomic load of *addr.
Packit fe9d6e
void store(volatile AO_t * addr, AO_t new_val)
Packit fe9d6e
        Atomically store new_val to *addr.
Packit fe9d6e
AO_t fetch_and_add(volatile AO_t *addr, AO_t incr)
Packit fe9d6e
        Atomically add incr to *addr, and return the original value of *addr.
Packit fe9d6e
AO_t fetch_and_add1(volatile AO_t *addr)
Packit fe9d6e
        Equivalent to AO_fetch_and_add(addr, 1).
Packit fe9d6e
AO_t fetch_and_sub1(volatile AO_t *addr)
Packit fe9d6e
        Equivalent to AO_fetch_and_add(addr, (AO_t)(-1)).
Packit fe9d6e
void and(volatile AO_t *addr, AO_t value)
Packit fe9d6e
        Atomically 'and' value into *addr.
Packit fe9d6e
void or(volatile AO_t *addr, AO_t value)
Packit fe9d6e
        Atomically 'or' value into *addr.
Packit fe9d6e
void xor(volatile AO_t *addr, AO_t value)
Packit fe9d6e
        Atomically 'xor' value into *addr.
Packit fe9d6e
int compare_and_swap(volatile AO_t * addr, AO_t old_val, AO_t new_val)
Packit fe9d6e
        Atomically compare *addr to old_val, and replace *addr by new_val
Packit fe9d6e
        if the first comparison succeeds.  Returns nonzero if the comparison
Packit fe9d6e
        succeeded and *addr was updated.
Packit fe9d6e
AO_t fetch_compare_and_swap(volatile AO_t * addr, AO_t old_val, AO_t new_val)
Packit fe9d6e
        Atomically compare *addr to old_val, and replace *addr by new_val
Packit fe9d6e
        if the first comparison succeeds; returns the original value of *addr.
Packit fe9d6e
AO_TS_VAL_t test_and_set(volatile AO_TS_t * addr)
Packit fe9d6e
        Atomically read the binary value at *addr, and set it.  AO_TS_VAL_t
Packit fe9d6e
        is an enumeration type which includes two values AO_TS_SET and
Packit fe9d6e
        AO_TS_CLEAR.  An AO_TS_t location is capable of holding an
Packit fe9d6e
        AO_TS_VAL_t, but may be much larger, as dictated by hardware
Packit fe9d6e
        constraints.  Test_and_set logically sets the value to AO_TS_SET.
Packit fe9d6e
        It may be reset to AO_TS_CLEAR with the AO_CLEAR(AO_TS_t *) macro.
Packit fe9d6e
        AO_TS_t locations should be initialized to AO_TS_INITIALIZER.
Packit fe9d6e
        The values of AO_TS_SET and AO_TS_CLEAR are hardware dependent.
Packit fe9d6e
        (On PA-RISC, AO_TS_SET is zero!)
Packit fe9d6e
Packit fe9d6e
Test_and_set is a more limited version of compare_and_swap.  Its only
Packit fe9d6e
advantage is that it is more easily implementable on some hardware.  It
Packit fe9d6e
should thus be used if only binary test-and-set functionality is needed.
Packit fe9d6e
Packit fe9d6e
If available, we also provide compare_and_swap operations that operate
Packit fe9d6e
on wider values.  Since standard data types for double width values
Packit fe9d6e
may not be available, these explicitly take pairs of arguments for the
Packit fe9d6e
new and/or old value.  Unfortunately, there are two common variants,
Packit fe9d6e
neither of which can easily and efficiently emulate the other.
Packit fe9d6e
The first performs a comparison against the entire value being replaced,
Packit fe9d6e
where the second replaces a double-width replacement, but performs
Packit fe9d6e
a single-width comparison:
Packit fe9d6e
Packit fe9d6e
int compare_double_and_swap_double(volatile AO_double_t * addr,
Packit fe9d6e
                                   AO_t old_val1, AO_t old_val2,
Packit fe9d6e
                                   AO_t new_val1, AO_t new_val2);
Packit fe9d6e
Packit fe9d6e
int compare_and_swap_double(volatile AO_double_t * addr,
Packit fe9d6e
                            AO_t old_val1,
Packit fe9d6e
                            AO_t new_val1, AO_t new_val2);
Packit fe9d6e
Packit fe9d6e
where AO_double_t is a structure containing AO_val1 and AO_val2 fields,
Packit fe9d6e
both of type AO_t.  For compare_and_swap_double, we compare against
Packit fe9d6e
the val1 field.  AO_double_t exists only if AO_HAVE_double_t
Packit fe9d6e
is defined.
Packit fe9d6e
Packit fe9d6e
Please note that AO_double_t (and AO_stack_t) variables should be properly
Packit fe9d6e
aligned (8-byte alignment on 32-bit targets, 16-byte alignment on 64-bit ones)
Packit fe9d6e
otherwise the behavior of a double-wide atomic primitive might be undefined
Packit fe9d6e
(or an assertion violation might occur) if such a misaligned variable is
Packit fe9d6e
passed (as a reference) to the primitive.  Global and static variables should
Packit fe9d6e
already have proper alignment automatically but automatic variables (i.e.
Packit fe9d6e
located on the stack) might be misaligned because the stack might be
Packit fe9d6e
word-aligned (e.g. 4-byte stack alignment is the default one for x86).
Packit fe9d6e
Luckily, stack-allocated AO variables is a rare case in practice.
Packit fe9d6e
Packit fe9d6e
ORDERING CONSTRAINTS:
Packit fe9d6e
Packit fe9d6e
Each operation name also includes a suffix that specifies the associated
Packit fe9d6e
ordering semantics.  The ordering constraint limits reordering of this
Packit fe9d6e
operation with respect to other atomic operations and ordinary memory
Packit fe9d6e
references.  The current implementation assumes that all memory references
Packit fe9d6e
are to ordinary cacheable memory; the ordering guarantee is with respect
Packit fe9d6e
to other threads or processes, not I/O devices.  (Whether or not this
Packit fe9d6e
distinction is important is platform-dependent.)
Packit fe9d6e
Packit fe9d6e
Ordering suffixes are one of the following:
Packit fe9d6e
Packit fe9d6e
<none>: No memory barrier.  A plain AO_nop() really does nothing.
Packit fe9d6e
_release: Earlier operations must become visible to other threads
Packit fe9d6e
          before the atomic operation.
Packit fe9d6e
_acquire: Later operations must become visible after this operation.
Packit fe9d6e
_read: Subsequent reads must become visible after reads included in
Packit fe9d6e
       the atomic operation or preceding it.  Rarely useful for clients?
Packit fe9d6e
_write: Earlier writes become visible before writes during or after
Packit fe9d6e
        the atomic operation.  Rarely useful for clients?
Packit fe9d6e
_full: The associated operation is ordered with respect to both earlier and
Packit fe9d6e
       later memory ops.  If the associated operation is nop, then this orders
Packit fe9d6e
       all earlier memory operations with respect to subsequent ones.
Packit fe9d6e
       AO_store_full or AO_nop_full are the normal ways to force a store
Packit fe9d6e
       to be ordered with respect to a later load.
Packit fe9d6e
_release_write: Ordered with respect to earlier writes.  This is
Packit fe9d6e
                normally implemented as either a _write or _release
Packit fe9d6e
                barrier.
Packit fe9d6e
_acquire_read: Ordered with respect to later reads. This is
Packit fe9d6e
                normally implemented as either a _read or _acquire barrier.
Packit fe9d6e
_dd_acquire_read: Ordered with respect to later reads that are data
Packit fe9d6e
               dependent on this one.  This is needed on
Packit fe9d6e
               a pointer read, which is later dereferenced to read a
Packit fe9d6e
               second value, with the expectation that the second
Packit fe9d6e
               read is ordered after the first one.  On most architectures,
Packit fe9d6e
               this is equivalent to no barrier.  (This is very
Packit fe9d6e
               hard to define precisely.  It should probably be avoided.
Packit fe9d6e
               A major problem is that optimizers tend to try to
Packit fe9d6e
               eliminate dependencies from the generated code, since
Packit fe9d6e
               dependencies force the hardware to execute the code
Packit fe9d6e
               serially.)
Packit fe9d6e
Packit fe9d6e
We assume that if a store is data-dependent on a previous load, then
Packit fe9d6e
the two are always implicitly ordered.
Packit fe9d6e
Packit fe9d6e
It is possible to test whether AO_<op><barrier> is available on the
Packit fe9d6e
current platform by checking whether AO_HAVE_<op>_<barrier> is defined
Packit fe9d6e
as a macro.
Packit fe9d6e
Packit fe9d6e
Note that we generally don't implement operations that are either
Packit fe9d6e
meaningless (e.g. AO_nop_acquire, AO_nop_release) or which appear to
Packit fe9d6e
have no clear use (e.g. AO_load_release, AO_store_acquire, AO_load_write,
Packit fe9d6e
AO_store_read).  On some platforms (e.g. PA-RISC) many operations
Packit fe9d6e
will remain undefined unless AO_REQUIRE_CAS is defined before including
Packit fe9d6e
the package.
Packit fe9d6e
Packit fe9d6e
When typed in the package build directory, the following command
Packit fe9d6e
will print operations that are unimplemented on the platform:
Packit fe9d6e
Packit fe9d6e
make test_atomic; ./test_atomic
Packit fe9d6e
Packit fe9d6e
The following command generates a file "list_atomic.i" containing the
Packit fe9d6e
macro expansions of all implemented operations on the platform:
Packit fe9d6e
Packit fe9d6e
make list_atomic.i
Packit fe9d6e
Packit fe9d6e
Known issues include:
Packit fe9d6e
Packit fe9d6e
We should be more precise in defining the semantics of the ordering
Packit fe9d6e
constraints, and if and how we can guarantee sequential consistency.
Packit fe9d6e
Packit fe9d6e
Dd_acquire_read is very hard or impossible to define in a way that cannot
Packit fe9d6e
be invalidated by reasonably standard compiler transformations.
Packit fe9d6e
Packit fe9d6e
There is probably no good reason to provide operations on standard
Packit fe9d6e
integer types, since those may have the wrong alignment constraints.
Packit fe9d6e
Packit fe9d6e
Packit fe9d6e
Example:
Packit fe9d6e
Packit fe9d6e
If you want to initialize an object, and then "publish" a pointer to it
Packit fe9d6e
in a global location p, such that other threads reading the new value of
Packit fe9d6e
p are guaranteed to see an initialized object, it suffices to use
Packit fe9d6e
AO_release_write(p, ...) to write the pointer to the object, and to
Packit fe9d6e
retrieve it in other threads with AO_acquire_read(p).
Packit fe9d6e
Packit fe9d6e
Platform notes:
Packit fe9d6e
Packit fe9d6e
All X86: We quietly assume 486 or better.
Packit fe9d6e
Packit fe9d6e
Gcc on x86:
Packit fe9d6e
Define AO_USE_PENTIUM4_INSTRS to use the Pentium 4 mfence instruction.
Packit fe9d6e
Currently this is appears to be of marginal benefit.