|
Packit |
fe9d6e |
Usage:
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
0) If possible, do this on a multiprocessor, especially if you are planning
|
|
Packit |
fe9d6e |
on modifying or enhancing the package. It will work on a uniprocessor,
|
|
Packit |
fe9d6e |
but the tests are much more likely to pass in the presence of serious problems.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
1) Type ./configure --prefix=<install dir>; make; make check
|
|
Packit |
fe9d6e |
in the directory containing unpacked source. The usual GNU build machinery
|
|
Packit |
fe9d6e |
is used, except that only static, but position-independent, libraries
|
|
Packit |
fe9d6e |
are normally built. On Windows, read README_win32.txt instead.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
2) Applications should include atomic_ops.h. Nearly all operations
|
|
Packit |
fe9d6e |
are implemented by header files included from it. It is sometimes
|
|
Packit |
fe9d6e |
necessary, and always recommended to also link against libatomic_ops.a.
|
|
Packit |
fe9d6e |
To use the almost non-blocking stack or malloc implementations,
|
|
Packit |
fe9d6e |
see the corresponding README files, and also link against libatomic_gpl.a
|
|
Packit |
fe9d6e |
before linking against libatomic_ops.a.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
OVERVIEW:
|
|
Packit |
fe9d6e |
Atomic_ops.h defines a large collection of operations, each one of which is
|
|
Packit |
fe9d6e |
a combination of an (optional) atomic memory operation, and a memory barrier.
|
|
Packit |
fe9d6e |
Also defines associated feature-test macros to determine whether a particular
|
|
Packit |
fe9d6e |
operation is available on the current target hardware (either directly or
|
|
Packit |
fe9d6e |
by synthesis). This is an attempt to replace various existing files with
|
|
Packit |
fe9d6e |
similar goals, since they usually do not handle differences in memory
|
|
Packit |
fe9d6e |
barrier styles with sufficient generality.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
If this is included after defining AO_REQUIRE_CAS, then the package makes
|
|
Packit |
fe9d6e |
an attempt to emulate AO_compare_and_swap* (single-width) in a way that (at
|
|
Packit |
fe9d6e |
least on Linux) should still be async-signal-safe. As a result, most other
|
|
Packit |
fe9d6e |
atomic operations will then be defined using the compare-and-swap
|
|
Packit |
fe9d6e |
emulation. This emulation is slow, since it needs to disable signals.
|
|
Packit |
fe9d6e |
And it needs to block in case of contention. If you care about performance
|
|
Packit |
fe9d6e |
on a platform that can't directly provide compare-and-swap, there are
|
|
Packit |
fe9d6e |
probably better alternatives. But this allows easy ports to some such
|
|
Packit |
fe9d6e |
platforms (e.g. PA_RISC). The option is ignored if compare-and-swap
|
|
Packit |
fe9d6e |
can be implemented directly.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
If atomic_ops.h is included after defining AO_USE_PTHREAD_DEFS, then all
|
|
Packit |
fe9d6e |
atomic operations will be emulated with pthread locking. This is NOT
|
|
Packit |
fe9d6e |
async-signal-safe. And it is slow. It is intended primarily for debugging
|
|
Packit |
fe9d6e |
of the atomic_ops package itself.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
Note that the implementation reflects our understanding of real processor
|
|
Packit |
fe9d6e |
behavior. This occasionally diverges from the documented behavior. (E.g.
|
|
Packit |
fe9d6e |
the documented X86 behavior seems to be weak enough that it is impractical
|
|
Packit |
fe9d6e |
to use. Current real implementations appear to be much better behaved.)
|
|
Packit |
fe9d6e |
We of course are in no position to guarantee that future processors
|
|
Packit |
fe9d6e |
(even HPs) will continue to behave this way, though we hope they will.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
This is a work in progress. Corrections/additions for other platforms are
|
|
Packit |
fe9d6e |
greatly appreciated. It passes rudimentary tests on X86, Itanium, and
|
|
Packit |
fe9d6e |
Alpha.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
OPERATIONS:
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
Most operations operate on values of type AO_t, which are unsigned integers
|
|
Packit |
fe9d6e |
whose size matches that of pointers on the given architecture. Exceptions
|
|
Packit |
fe9d6e |
are:
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
- AO_test_and_set operates on AO_TS_t, which is whatever size the hardware
|
|
Packit |
fe9d6e |
supports with good performance. In some cases this is the length of a cache
|
|
Packit |
fe9d6e |
line. In some cases it is a byte. In many cases it is equivalent to AO_t.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
- A few operations are implemented on smaller or larger size integers.
|
|
Packit |
fe9d6e |
Such operations are indicated by the appropriate prefix:
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
AO_char_... Operates on unsigned char values.
|
|
Packit |
fe9d6e |
AO_short_... Operates on unsigned short values.
|
|
Packit |
fe9d6e |
AO_int_... Operates on unsigned int values.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
(Currently a very limited selection of these is implemented. We're
|
|
Packit |
fe9d6e |
working on it.)
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
The defined operations are all of the form AO_[<size>_]<op><barrier>(<args>).
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
The <op> component specifies an atomic memory operation. It may be
|
|
Packit |
fe9d6e |
one of the following, where the corresponding argument and result types
|
|
Packit |
fe9d6e |
are also specified:
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
void nop()
|
|
Packit |
fe9d6e |
No atomic operation. The barrier may still be useful.
|
|
Packit |
fe9d6e |
AO_t load(const volatile AO_t * addr)
|
|
Packit |
fe9d6e |
Atomic load of *addr.
|
|
Packit |
fe9d6e |
void store(volatile AO_t * addr, AO_t new_val)
|
|
Packit |
fe9d6e |
Atomically store new_val to *addr.
|
|
Packit |
fe9d6e |
AO_t fetch_and_add(volatile AO_t *addr, AO_t incr)
|
|
Packit |
fe9d6e |
Atomically add incr to *addr, and return the original value of *addr.
|
|
Packit |
fe9d6e |
AO_t fetch_and_add1(volatile AO_t *addr)
|
|
Packit |
fe9d6e |
Equivalent to AO_fetch_and_add(addr, 1).
|
|
Packit |
fe9d6e |
AO_t fetch_and_sub1(volatile AO_t *addr)
|
|
Packit |
fe9d6e |
Equivalent to AO_fetch_and_add(addr, (AO_t)(-1)).
|
|
Packit |
fe9d6e |
void and(volatile AO_t *addr, AO_t value)
|
|
Packit |
fe9d6e |
Atomically 'and' value into *addr.
|
|
Packit |
fe9d6e |
void or(volatile AO_t *addr, AO_t value)
|
|
Packit |
fe9d6e |
Atomically 'or' value into *addr.
|
|
Packit |
fe9d6e |
void xor(volatile AO_t *addr, AO_t value)
|
|
Packit |
fe9d6e |
Atomically 'xor' value into *addr.
|
|
Packit |
fe9d6e |
int compare_and_swap(volatile AO_t * addr, AO_t old_val, AO_t new_val)
|
|
Packit |
fe9d6e |
Atomically compare *addr to old_val, and replace *addr by new_val
|
|
Packit |
fe9d6e |
if the first comparison succeeds. Returns nonzero if the comparison
|
|
Packit |
fe9d6e |
succeeded and *addr was updated.
|
|
Packit |
fe9d6e |
AO_t fetch_compare_and_swap(volatile AO_t * addr, AO_t old_val, AO_t new_val)
|
|
Packit |
fe9d6e |
Atomically compare *addr to old_val, and replace *addr by new_val
|
|
Packit |
fe9d6e |
if the first comparison succeeds; returns the original value of *addr.
|
|
Packit |
fe9d6e |
AO_TS_VAL_t test_and_set(volatile AO_TS_t * addr)
|
|
Packit |
fe9d6e |
Atomically read the binary value at *addr, and set it. AO_TS_VAL_t
|
|
Packit |
fe9d6e |
is an enumeration type which includes two values AO_TS_SET and
|
|
Packit |
fe9d6e |
AO_TS_CLEAR. An AO_TS_t location is capable of holding an
|
|
Packit |
fe9d6e |
AO_TS_VAL_t, but may be much larger, as dictated by hardware
|
|
Packit |
fe9d6e |
constraints. Test_and_set logically sets the value to AO_TS_SET.
|
|
Packit |
fe9d6e |
It may be reset to AO_TS_CLEAR with the AO_CLEAR(AO_TS_t *) macro.
|
|
Packit |
fe9d6e |
AO_TS_t locations should be initialized to AO_TS_INITIALIZER.
|
|
Packit |
fe9d6e |
The values of AO_TS_SET and AO_TS_CLEAR are hardware dependent.
|
|
Packit |
fe9d6e |
(On PA-RISC, AO_TS_SET is zero!)
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
Test_and_set is a more limited version of compare_and_swap. Its only
|
|
Packit |
fe9d6e |
advantage is that it is more easily implementable on some hardware. It
|
|
Packit |
fe9d6e |
should thus be used if only binary test-and-set functionality is needed.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
If available, we also provide compare_and_swap operations that operate
|
|
Packit |
fe9d6e |
on wider values. Since standard data types for double width values
|
|
Packit |
fe9d6e |
may not be available, these explicitly take pairs of arguments for the
|
|
Packit |
fe9d6e |
new and/or old value. Unfortunately, there are two common variants,
|
|
Packit |
fe9d6e |
neither of which can easily and efficiently emulate the other.
|
|
Packit |
fe9d6e |
The first performs a comparison against the entire value being replaced,
|
|
Packit |
fe9d6e |
where the second replaces a double-width replacement, but performs
|
|
Packit |
fe9d6e |
a single-width comparison:
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
int compare_double_and_swap_double(volatile AO_double_t * addr,
|
|
Packit |
fe9d6e |
AO_t old_val1, AO_t old_val2,
|
|
Packit |
fe9d6e |
AO_t new_val1, AO_t new_val2);
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
int compare_and_swap_double(volatile AO_double_t * addr,
|
|
Packit |
fe9d6e |
AO_t old_val1,
|
|
Packit |
fe9d6e |
AO_t new_val1, AO_t new_val2);
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
where AO_double_t is a structure containing AO_val1 and AO_val2 fields,
|
|
Packit |
fe9d6e |
both of type AO_t. For compare_and_swap_double, we compare against
|
|
Packit |
fe9d6e |
the val1 field. AO_double_t exists only if AO_HAVE_double_t
|
|
Packit |
fe9d6e |
is defined.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
Please note that AO_double_t (and AO_stack_t) variables should be properly
|
|
Packit |
fe9d6e |
aligned (8-byte alignment on 32-bit targets, 16-byte alignment on 64-bit ones)
|
|
Packit |
fe9d6e |
otherwise the behavior of a double-wide atomic primitive might be undefined
|
|
Packit |
fe9d6e |
(or an assertion violation might occur) if such a misaligned variable is
|
|
Packit |
fe9d6e |
passed (as a reference) to the primitive. Global and static variables should
|
|
Packit |
fe9d6e |
already have proper alignment automatically but automatic variables (i.e.
|
|
Packit |
fe9d6e |
located on the stack) might be misaligned because the stack might be
|
|
Packit |
fe9d6e |
word-aligned (e.g. 4-byte stack alignment is the default one for x86).
|
|
Packit |
fe9d6e |
Luckily, stack-allocated AO variables is a rare case in practice.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
ORDERING CONSTRAINTS:
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
Each operation name also includes a suffix that specifies the associated
|
|
Packit |
fe9d6e |
ordering semantics. The ordering constraint limits reordering of this
|
|
Packit |
fe9d6e |
operation with respect to other atomic operations and ordinary memory
|
|
Packit |
fe9d6e |
references. The current implementation assumes that all memory references
|
|
Packit |
fe9d6e |
are to ordinary cacheable memory; the ordering guarantee is with respect
|
|
Packit |
fe9d6e |
to other threads or processes, not I/O devices. (Whether or not this
|
|
Packit |
fe9d6e |
distinction is important is platform-dependent.)
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
Ordering suffixes are one of the following:
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
<none>: No memory barrier. A plain AO_nop() really does nothing.
|
|
Packit |
fe9d6e |
_release: Earlier operations must become visible to other threads
|
|
Packit |
fe9d6e |
before the atomic operation.
|
|
Packit |
fe9d6e |
_acquire: Later operations must become visible after this operation.
|
|
Packit |
fe9d6e |
_read: Subsequent reads must become visible after reads included in
|
|
Packit |
fe9d6e |
the atomic operation or preceding it. Rarely useful for clients?
|
|
Packit |
fe9d6e |
_write: Earlier writes become visible before writes during or after
|
|
Packit |
fe9d6e |
the atomic operation. Rarely useful for clients?
|
|
Packit |
fe9d6e |
_full: The associated operation is ordered with respect to both earlier and
|
|
Packit |
fe9d6e |
later memory ops. If the associated operation is nop, then this orders
|
|
Packit |
fe9d6e |
all earlier memory operations with respect to subsequent ones.
|
|
Packit |
fe9d6e |
AO_store_full or AO_nop_full are the normal ways to force a store
|
|
Packit |
fe9d6e |
to be ordered with respect to a later load.
|
|
Packit |
fe9d6e |
_release_write: Ordered with respect to earlier writes. This is
|
|
Packit |
fe9d6e |
normally implemented as either a _write or _release
|
|
Packit |
fe9d6e |
barrier.
|
|
Packit |
fe9d6e |
_acquire_read: Ordered with respect to later reads. This is
|
|
Packit |
fe9d6e |
normally implemented as either a _read or _acquire barrier.
|
|
Packit |
fe9d6e |
_dd_acquire_read: Ordered with respect to later reads that are data
|
|
Packit |
fe9d6e |
dependent on this one. This is needed on
|
|
Packit |
fe9d6e |
a pointer read, which is later dereferenced to read a
|
|
Packit |
fe9d6e |
second value, with the expectation that the second
|
|
Packit |
fe9d6e |
read is ordered after the first one. On most architectures,
|
|
Packit |
fe9d6e |
this is equivalent to no barrier. (This is very
|
|
Packit |
fe9d6e |
hard to define precisely. It should probably be avoided.
|
|
Packit |
fe9d6e |
A major problem is that optimizers tend to try to
|
|
Packit |
fe9d6e |
eliminate dependencies from the generated code, since
|
|
Packit |
fe9d6e |
dependencies force the hardware to execute the code
|
|
Packit |
fe9d6e |
serially.)
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
We assume that if a store is data-dependent on a previous load, then
|
|
Packit |
fe9d6e |
the two are always implicitly ordered.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
It is possible to test whether AO_<op><barrier> is available on the
|
|
Packit |
fe9d6e |
current platform by checking whether AO_HAVE_<op>_<barrier> is defined
|
|
Packit |
fe9d6e |
as a macro.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
Note that we generally don't implement operations that are either
|
|
Packit |
fe9d6e |
meaningless (e.g. AO_nop_acquire, AO_nop_release) or which appear to
|
|
Packit |
fe9d6e |
have no clear use (e.g. AO_load_release, AO_store_acquire, AO_load_write,
|
|
Packit |
fe9d6e |
AO_store_read). On some platforms (e.g. PA-RISC) many operations
|
|
Packit |
fe9d6e |
will remain undefined unless AO_REQUIRE_CAS is defined before including
|
|
Packit |
fe9d6e |
the package.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
When typed in the package build directory, the following command
|
|
Packit |
fe9d6e |
will print operations that are unimplemented on the platform:
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
make test_atomic; ./test_atomic
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
The following command generates a file "list_atomic.i" containing the
|
|
Packit |
fe9d6e |
macro expansions of all implemented operations on the platform:
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
make list_atomic.i
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
Known issues include:
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
We should be more precise in defining the semantics of the ordering
|
|
Packit |
fe9d6e |
constraints, and if and how we can guarantee sequential consistency.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
Dd_acquire_read is very hard or impossible to define in a way that cannot
|
|
Packit |
fe9d6e |
be invalidated by reasonably standard compiler transformations.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
There is probably no good reason to provide operations on standard
|
|
Packit |
fe9d6e |
integer types, since those may have the wrong alignment constraints.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
Example:
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
If you want to initialize an object, and then "publish" a pointer to it
|
|
Packit |
fe9d6e |
in a global location p, such that other threads reading the new value of
|
|
Packit |
fe9d6e |
p are guaranteed to see an initialized object, it suffices to use
|
|
Packit |
fe9d6e |
AO_release_write(p, ...) to write the pointer to the object, and to
|
|
Packit |
fe9d6e |
retrieve it in other threads with AO_acquire_read(p).
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
Platform notes:
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
All X86: We quietly assume 486 or better.
|
|
Packit |
fe9d6e |
|
|
Packit |
fe9d6e |
Gcc on x86:
|
|
Packit |
fe9d6e |
Define AO_USE_PENTIUM4_INSTRS to use the Pentium 4 mfence instruction.
|
|
Packit |
fe9d6e |
Currently this is appears to be of marginal benefit.
|