Single-threaded implementation of RMA for shared memory

------------------------------------------------------------------------

Basic Assumptions

* All of the local windows associated with the specified window object
  are located in and accessible through shared memory.

* All processors involved in the communicator are homogeneous.

* Only basic datatypes are supported.

------------------------------------------------------------------------

General Notes

------------------------------------------------------------------------

Data Structures

------------------------------------------------------------------------

MPID_shm_Win_create

* If the shared memory is not cache coherent, then initialize the
  preceding put flag

  If the local window is located in non-cache-coherent shared memory,
  then we need to track put operations to the local window which
  (might) have occurred since the last fence. This tracking is
  required so that cache lines associated with the local window can be
  invalidated, ensuring that the local process sees the changes.

  Q: Can puts happen before the first fence? In other words, is an
  exposure epoch implicitly opened as part of the window creation
  process?

* Initialize the inter-process (shared memory) mutex

  Mutexes are required in order to ensure that accumulate operations
  on any given element (basic datatype) in the local window are
  atomic.

  NOTE: multiple mutexes may be needed if the local window is broken
  into multiple regions. For details, see the discussion in
  MPID_shm_Accumulate().
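
NOTE: The two initialization steps above could look roughly like the
following sketch, assuming POSIX process-shared mutexes are available.
The structure shm_win_t, its fields, and shm_win_init() are hypothetical
names chosen for illustration only. The control block is assumed to
live in the shared memory segment so that every process maps the same
mutex. The initial flag value of 0 assumes that no puts can precede the
first fence, which the question above leaves open.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-window control block, assumed to be placed in the
 * shared memory segment alongside the local window. */
typedef struct shm_win {
    void           *base;           /* start of the local window */
    size_t          size;           /* size of the local window in bytes */
    int             cache_coherent; /* nonzero if shared memory is coherent */
    volatile int    preceding_put;  /* puts may have occurred since last fence */
    pthread_mutex_t mutex;          /* serializes accumulate element updates */
} shm_win_t;

/* Initialize the control block.  Returns 0 on success or a pthread
 * error code on failure. */
int shm_win_init(shm_win_t *win, void *base, size_t size, int cache_coherent)
{
    pthread_mutexattr_t attr;
    int rc;

    assert(win != NULL);
    win->base = base;
    win->size = size;
    win->cache_coherent = cache_coherent;

    /* Assumption: no exposure epoch is open before the first fence. */
    win->preceding_put = 0;

    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    /* Allow the mutex to be locked by any process mapping the segment. */
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutex_init(&win->mutex, &attr);

    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

Only the process that creates the segment would call shm_win_init();
the other processes would attach to the already initialized block.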

------------------------------------------------------------------------

MPID_shm_Win_fence

* If the shared memory is not cache coherent, flush cache and/or write
  buffer as necessary

  If the shared memory is not cache coherent and stores were
  performed to the local window, then (depending on the
  architecture specifics and the RMA implementation) we might need
  to perform the following operations.

  1) if the system is using a write-back caching strategy, then flush
     the cache

  2) flush the write buffer

  NOTE: It may be possible to defer these operations when the
  MPI_MODE_NOSUCCEED assertion is also supplied. It's currently
  unclear if this would be beneficial.

* barrier

  We need a barrier to ensure that all remote puts and local stores to
  the local window have completed so the results are available to
  operations performed after the fence operation. We also
  need to ensure that any remote gets and local loads from the local
  window are complete before any future remote puts or local stores
  are allowed to affect the local window.

* If the shared memory is not cache coherent

  * invalidate cache

    If the shared memory is not cache coherent and RMA puts were
    performed to the local window, then (depending on the
    architecture specifics and the RMA implementation) we might need to
    invalidate any cache lines associated with the shared memory
    bound to this window.

  * set (or clear) preceding put flag based on the assertions

  NOTE: To reduce unnecessary cache and write buffer flushes, the
  barrier (above) could be replaced with an alltoall gather of the
  operations occurring between node pairs. Using this information, we
  could eliminate flushes except when an operation actually affected
  the local window.
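
NOTE: The ordering of the fence steps above can be sketched as follows.
Cache and write buffer control is architecture specific, so the sketch
takes it as a table of hypothetical hooks. All names here
(shm_fence_ops_t, shm_fence_state_t, SHM_FENCE_NOPUT, the trace_*
functions) are assumptions made for illustration. SHM_FENCE_NOPUT
stands in for the MPI_MODE_NOPUT assertion, and clearing the preceding
put flag when it is given is only one reading of "based on the
assertions".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the MPI_MODE_NOPUT fence assertion. */
#define SHM_FENCE_NOPUT 0x1

/* Hypothetical architecture hooks; real versions are platform specific. */
typedef struct shm_fence_ops {
    void (*flush_cache)(void *ctx);
    void (*flush_write_buffer)(void *ctx);
    void (*barrier)(void *ctx);
    void (*invalidate_cache)(void *ctx);
    void *ctx;                      /* passed to every hook */
} shm_fence_ops_t;

/* Hypothetical per-window state consulted by the fence. */
typedef struct shm_fence_state {
    int cache_coherent;  /* nonzero if shared memory is cache coherent */
    int write_back;      /* nonzero if the cache is write-back */
    int local_stores;    /* local stores were made since the last fence */
    int preceding_put;   /* puts may have occurred since the last fence */
} shm_fence_state_t;

void shm_win_fence(shm_fence_state_t *st, const shm_fence_ops_t *ops,
                   int assert_flags)
{
    /* Step 1: push local stores out to shared memory. */
    if (!st->cache_coherent && st->local_stores) {
        if (st->write_back)
            ops->flush_cache(ops->ctx);
        ops->flush_write_buffer(ops->ctx);
    }
    st->local_stores = 0;

    /* Step 2: wait until every process has finished its accesses. */
    ops->barrier(ops->ctx);

    /* Step 3: discard stale lines and record whether puts may follow. */
    if (!st->cache_coherent) {
        if (st->preceding_put)
            ops->invalidate_cache(ops->ctx);
        st->preceding_put = (assert_flags & SHM_FENCE_NOPUT) ? 0 : 1;
    }
}

/* Example hooks that only record the order in which they are called;
 * ctx must point to a zero-terminated buffer with enough room. */
void trace_flush_cache(void *ctx)        { strcat((char *)ctx, "C"); }
void trace_flush_write_buffer(void *ctx) { strcat((char *)ctx, "W"); }
void trace_barrier(void *ctx)            { strcat((char *)ctx, "B"); }
void trace_invalidate_cache(void *ctx)   { strcat((char *)ctx, "I"); }
```

With the tracing hooks installed, a fence on a non-coherent, write-back
system with pending local stores and a set preceding put flag records
the sequence flush cache, flush write buffer, barrier, invalidate. On a
cache-coherent system only the barrier is recorded.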

------------------------------------------------------------------------

MPID_shm_Get

* Copy data directly from the target buffer (located in shared memory)
  to the origin buffer.

------------------------------------------------------------------------

MPID_shm_Put

* Copy data directly from the origin buffer to the target buffer
  (located in shared memory).
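
NOTE: Under the basic assumptions (homogeneous processors, basic
datatypes only), both Get and Put reduce to a bounds-checked memory
copy. The sketch below illustrates this for a contiguous run of bytes.
The names shm_target_t, shm_get(), and shm_put() are hypothetical and
chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of a target's local window in shared memory. */
typedef struct shm_target {
    char  *base;  /* start of the target window */
    size_t size;  /* size of the target window in bytes */
} shm_target_t;

/* Return nonzero if [disp, disp + nbytes) lies inside the window. */
static int shm_in_bounds(const shm_target_t *t, size_t disp, size_t nbytes)
{
    return disp <= t->size && nbytes <= t->size - disp;
}

/* Copy nbytes from the target window at offset disp to origin.
 * Returns 0 on success, -1 if the range is outside the window. */
int shm_get(void *origin, const shm_target_t *t, size_t disp, size_t nbytes)
{
    assert(origin != NULL && t != NULL);
    if (!shm_in_bounds(t, disp, nbytes))
        return -1;
    memcpy(origin, t->base + disp, nbytes);
    return 0;
}

/* Copy nbytes from origin to the target window at offset disp.
 * Returns 0 on success, -1 if the range is outside the window. */
int shm_put(const void *origin, shm_target_t *t, size_t disp, size_t nbytes)
{
    assert(origin != NULL && t != NULL);
    if (!shm_in_bounds(t, disp, nbytes))
        return -1;
    memcpy(t->base + disp, origin, nbytes);
    return 0;
}
```

No data conversion is performed, which is only valid because the
processors are assumed to be homogeneous.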

------------------------------------------------------------------------

MPID_shm_Accumulate

* Lock target local window

  The standard says that operations on elements (basic datatypes) need
  to be atomic, but the entire accumulate operation need not be atomic
  with respect to other accumulate operations. The simple solution is
  to lock the whole window when performing an operation; however, this
  serializes all operations, which will seriously hurt
  performance when multiple processes/threads are attempting to
  accumulate data into a single window (or even a single large buffer
  in that window).

  TODO: Develop an algorithm for performing the operations when the
  local window is broken into multiple regions, with a mutex per
  region. Care must be taken to ensure that if an element spans two
  regions, then the mutexes for both regions are locked before the
  operation is performed on that element. Performing these lock
  operations is likely to be somewhat expensive, so we will want a
  tunable parameter for specifying the minimum size of a region.

  Q: Do inter-process mutexes also ensure mutual exclusion for threads
  within the same process? If not, then we need to acquire both a
  thread lock and a process lock. We probably want to acquire the
  thread lock first to minimize the contention at the process lock.

* Perform requested accumulation

  We need an algorithm for performing accumulations when the
  datatypes are non-contiguous. Ideally, the two dataloops and the
  accumulation operations could be processed without requiring any
  extra copying, packing, or temporary buffers.

  NOTE: While it may be possible to write a function to perform the
  requested operations, it is likely that such functionality will need
  to be inlined so that appropriate locking of local window regions
  occurs as data is being processed. Also, the dataloops will need to
  be optimized so that it is not necessary to acquire a region's mutex
  more than once per request.

* Unlock target local window
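
NOTE: One possible shape for the lock / accumulate / unlock sequence
above is sketched below. It is a simplification, not the algorithm the
TODO asks for. It handles only a sum of ints over simple strided
layouts, and it locks every region covering the whole target extent up
front rather than locking regions as the data is processed. It does
show three of the stated goals: an element spanning two regions has
both mutexes held, each region mutex is taken once per request, and the
origin and target layouts are walked in lockstep without packing or
temporary buffers. All names (shm_regions_t, shm_vec_t,
shm_region_span(), shm_accumulate_sum_int()) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical region table: the local window is divided into nregions
 * pieces of region_size bytes, each guarded by its own mutex.  A real
 * implementation would keep process-shared mutexes in shared memory. */
typedef struct shm_regions {
    size_t           region_size;  /* tunable minimum region size, bytes */
    size_t           nregions;     /* number of regions in the window */
    pthread_mutex_t *mutex;        /* array of nregions mutexes */
} shm_regions_t;

/* Hypothetical strided layout: count elements, stride in elements. */
typedef struct shm_vec {
    size_t count;
    size_t stride;
} shm_vec_t;

/* Find the regions touched by the byte range [off, off + len).  A range
 * that straddles a region boundary yields *first != *last. */
void shm_region_span(const shm_regions_t *r, size_t off, size_t len,
                     size_t *first, size_t *last)
{
    assert(len > 0 && r->region_size > 0);
    *first = off / r->region_size;
    *last  = (off + len - 1) / r->region_size;
    assert(*last < r->nregions);
}

/* Add origin elements into the window starting at byte offset disp,
 * which is assumed to be suitably aligned for int. */
void shm_accumulate_sum_int(shm_regions_t *r, char *win_base, size_t disp,
                            const int *origin, shm_vec_t olay, shm_vec_t tlay)
{
    size_t first, last, extent, i;
    int *target = (int *)(win_base + disp);

    assert(olay.count == tlay.count && tlay.count > 0 && tlay.stride > 0);

    /* Bytes from the first target element to the end of the last one. */
    extent = ((tlay.count - 1) * tlay.stride + 1) * sizeof(int);
    shm_region_span(r, disp, extent, &first, &last);

    /* Lock in ascending order so concurrent callers cannot deadlock. */
    for (i = first; i <= last; i++)
        pthread_mutex_lock(&r->mutex[i]);

    /* Walk both layouts in lockstep; no intermediate buffer is used. */
    for (i = 0; i < tlay.count; i++)
        target[i * tlay.stride] += origin[i * olay.stride];

    /* Unlock in reverse order. */
    for (i = last + 1; i > first; i--)
        pthread_mutex_unlock(&r->mutex[i - 1]);
}
```

Locking the full extent trades some concurrency for simplicity. A
finer-grained version would skip regions that contain only the gaps
between strided elements.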

------------------------------------------------------------------------