Single-threaded implementation of RMA for shared memory

------------------------------------------------------------------------

Basic Assumptions

* All of the local windows associated with the specified window object
  are located in and accessible through shared memory.

* All processors involved in the communicator are homogeneous, so
  basic datatypes have the same representation on every process and
  data can be copied without conversion.

* Only basic datatypes are supported.

------------------------------------------------------------------------

General Notes

------------------------------------------------------------------------

Data Structures

------------------------------------------------------------------------

MPID_shm_Win_create

* If the shared memory is not cache coherent, then initialize the
  preceding put flag

  If the local window is located in non-cache-coherent shared memory,
  then we need to track put operations to the local window that
  (might) have occurred since the last fence.  This tracking is
  required so that cache lines associated with the local window can be
  invalidated, ensuring that the local process sees the changes.

  Q: Can puts happen before the first fence?  In other words, is an
  exposure epoch implicitly opened as part of the window creation
  process?

* Initialize the inter-process (shared memory) mutex

  Mutexes are required in order to ensure that accumulate operations
  on any given element (basic datatype) in the local window are
  atomic.

  NOTE: Multiple mutexes may be needed if the local window is broken
  into multiple regions.  For details, see the discussion in
  MPID_shm_Accumulate().
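
  A minimal C sketch of the two initialization steps, assuming POSIX
  threads.  The per-window state (a hypothetical shm_win_state_t that
  holds the preceding put flag and a single mutex) is assumed to have
  been placed by the caller in a segment mapped by every process, and
  the mutex is made inter-process with pthread_mutexattr_setpshared().
  The names are illustrative only, not MPICH's.

```c
#include <pthread.h>

/* Hypothetical per-window state; the caller must place it in a shared
   memory segment that every process using the window has mapped. */
typedef struct {
    int cache_coherent;    /* nonzero if the shared memory is cache coherent */
    int preceding_put;     /* puts might have hit the window since the last fence */
    pthread_mutex_t mutex; /* inter-process mutex guarding accumulate operations */
} shm_win_state_t;

/* Initialize state that already lives in shared memory.
   Returns 0 on success or a pthread error code. */
int shm_win_state_init(shm_win_state_t *st, int cache_coherent)
{
    pthread_mutexattr_t attr;
    int err;

    st->cache_coherent = cache_coherent;
    /* Assumes no puts precede the first fence (see the open question above). */
    st->preceding_put = 0;

    err = pthread_mutexattr_init(&attr);
    if (err != 0)
        return err;
    /* PTHREAD_PROCESS_SHARED lets threads of any process that maps this
       memory operate on the mutex. */
    err = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (err == 0)
        err = pthread_mutex_init(&st->mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return err;
}

int shm_win_state_destroy(shm_win_state_t *st)
{
    return pthread_mutex_destroy(&st->mutex);
}
```

  If the window is split into several regions, the mutex part of the
  initialization would simply be repeated once per region.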

------------------------------------------------------------------------

MPID_shm_Win_fence

* If the shared memory is not cache coherent, flush the cache and/or
  write buffer as necessary

  If the shared memory is not cache coherent and stores were
  performed to the local window, then (depending on the
  architecture specifics and the RMA implementation) we might need
  to perform the following operations:

  1) if the system uses a write-back caching strategy, flush the
  cache

  2) flush the write buffer

  NOTE: It may be possible to defer these operations when the
  NOSUCCEED assertion is also supplied.  It is currently unclear
  whether this would be beneficial.

* barrier

  We need a barrier to ensure that all remote puts and local stores to
  the local window have completed so the results are available to
  operations performed after the fence operation.  We also
  need to ensure that any remote gets and local loads from the local
  window are complete before any future remote puts or local stores
  are allowed to affect the local window.

* If the shared memory is not cache coherent

  * invalidate cache

    If the shared memory is not cache coherent and RMA puts were
    performed to the local window, then (depending on the
    architecture specifics and the RMA implementation) we might need to
    invalidate any cache lines associated with the shared memory
    bound to this window.

  * set (or clear) preceding put flag based on the assertions

    NOTE: To reduce unnecessary cache and write buffer flushes, the
    barrier (above) could be replaced with an alltoall exchange of the
    operations occurring between node pairs.  Using this information, we
    could eliminate flushes except when an operation actually affected
    the local window.
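
  The fence steps above might be arranged as in the following C sketch.
  It assumes hypothetical assertion bits that mirror MPI_MODE_NOSTORE,
  MPI_MODE_NOPUT and MPI_MODE_NOSUCCEED, a caller-supplied barrier, and
  placeholder functions that only count the architecture-specific flush
  and invalidate operations.  None of the names come from MPICH.

```c
#include <stdatomic.h>

/* Hypothetical bits standing in for MPI_MODE_NOSTORE, MPI_MODE_NOPUT and
   MPI_MODE_NOSUCCEED; real code would use the values from mpi.h. */
enum { FENCE_NOSTORE = 1, FENCE_NOPUT = 2, FENCE_NOSUCCEED = 4 };

typedef struct {
    int cache_coherent;
    int preceding_put; /* puts may reach the window in the current epoch */
    int flushes;       /* counts of the arch-specific operations, which */
    int invalidates;   /* this sketch records instead of performing */
} fence_win_t;

typedef void (*barrier_fn)(void *arg);

/* Placeholder for "flush cache (if write-back) and flush write buffer".
   A seq_cst fence orders this process's stores; it does not by itself
   write back caches on non-coherent hardware. */
static void flush_local_stores(fence_win_t *w)
{
    atomic_thread_fence(memory_order_seq_cst);
    w->flushes++;
}

/* Placeholder for invalidating the cache lines that cover the window. */
static void invalidate_window_cache(fence_win_t *w)
{
    w->invalidates++;
}

void shm_win_fence(fence_win_t *w, int assert_bits, barrier_fn barrier,
                   void *arg)
{
    /* 1. Push out local stores unless the caller asserted there were none. */
    if (!w->cache_coherent && !(assert_bits & FENCE_NOSTORE))
        flush_local_stores(w);

    /* 2. Wait until every process has finished the closing epoch. */
    barrier(arg);

    if (!w->cache_coherent) {
        /* 3. Drop stale lines if puts may have landed in the closing epoch. */
        if (w->preceding_put)
            invalidate_window_cache(w);
        /* 4. Puts may arrive in the next epoch unless the assertions say
              no puts will follow or no epoch follows at all. */
        w->preceding_put = !(assert_bits & (FENCE_NOPUT | FENCE_NOSUCCEED));
    }
}

/* Trivial barrier, usable when only one process takes part. */
void single_process_barrier(void *arg)
{
    (void)arg;
}
```

  Keeping the flag in the window state lets a fence skip invalidation
  when the previous fence asserted that no puts would follow it.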

------------------------------------------------------------------------

MPID_shm_Get

* Copy data directly from the target buffer (located in shared memory)
  to the origin buffer.
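
  Under the assumptions above, a get is a single memcpy.  The sketch
  assumes the target's local window is mapped into this process at
  target_base and computes the source address with MPI's rule (window
  base + target_disp * disp_unit); shm_get and shm_target_addr are
  hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Address of displacement disp in a window mapped at base, using MPI's
   rule base + disp * disp_unit. */
static const void *shm_target_addr(const void *base, int disp_unit,
                                   ptrdiff_t disp)
{
    return (const char *)base + disp * (ptrdiff_t)disp_unit;
}

/* Get: copy nbytes from the target window directly into the origin
   buffer.  Valid only under this note's assumptions (shared memory,
   homogeneous processors, contiguous basic datatypes). */
void shm_get(void *origin, const void *target_base, int disp_unit,
             ptrdiff_t disp, size_t nbytes)
{
    memcpy(origin, shm_target_addr(target_base, disp_unit, disp), nbytes);
}
```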

------------------------------------------------------------------------

MPID_shm_Put

* Copy data directly from the origin buffer to the target buffer
  (located in shared memory).
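
  Put is the mirror image of get.  This hypothetical shm_put assumes the
  target window is mapped at target_base and that the data are count
  contiguous elements of one basic datatype of size type_size.  Per the
  fence design in this note, on memory that is not cache coherent the
  target may not see the data until after the next fence.

```c
#include <stddef.h>
#include <string.h>

/* Put: copy count elements of a basic datatype (type_size bytes each)
   from the origin buffer into the target window mapped at target_base,
   at byte offset disp * disp_unit. */
void shm_put(const void *origin, void *target_base, int disp_unit,
             ptrdiff_t disp, size_t count, size_t type_size)
{
    char *target = (char *)target_base + disp * (ptrdiff_t)disp_unit;
    memcpy(target, origin, count * type_size);
}
```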

------------------------------------------------------------------------

MPID_shm_Accumulate

* Lock target local window

  The standard says that operations on elements (basic datatypes) need
  to be atomic, but the entire accumulate operation need not be atomic
  with respect to other accumulate operations.  The simple solution is
  to lock the whole window when performing an operation; however, this
  serializes all accumulate operations on the window, which will
  seriously hurt performance when multiple processes/threads are
  attempting to accumulate data into a single window (or even a single
  large buffer in that window).

  TODO: Develop an algorithm for performing the operations when the
  local window is broken into multiple regions, with a mutex per
  region.  Care must be taken to ensure that if an element spans two
  regions, then the mutexes for both regions are locked before the
  operation is performed on that element.  Performing these lock
  operations is likely to be somewhat expensive, so we will want a
  tunable parameter for specifying the minimum size of a region.

  Q: Do inter-process mutexes also ensure mutual exclusion for threads
  within the same process?  If not, then we need to acquire both
  thread and process locks.  We probably want to acquire the thread
  lock first to minimize the contention at the process lock.
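
  A sketch of the per-region locking the TODO describes, assuming a
  hypothetical region_locks_t with fixed-size regions and one mutex per
  region.  An element that straddles a region boundary takes both
  mutexes, always in ascending region order; that ordering is an
  assumed, conventional way to keep two processes from deadlocking on
  the same pair of regions.  Regarding the question above: if the
  mutexes are POSIX mutexes initialized with PTHREAD_PROCESS_SHARED,
  they also exclude other threads of the same process, because a
  pthread mutex serializes every thread that locks it.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical: the local window is split into regions of region_size
   bytes, each protected by its own (process-shared) mutex. */
typedef struct {
    size_t region_size;       /* bytes per region (the tunable minimum) */
    pthread_mutex_t *mutexes; /* one mutex per region */
} region_locks_t;

/* First and last region overlapped by the element at byte offset off. */
static void element_regions(const region_locks_t *rl, size_t off,
                            size_t elem_size, size_t *first, size_t *last)
{
    *first = off / rl->region_size;
    *last = (off + elem_size - 1) / rl->region_size;
}

/* Lock every region the element overlaps, in ascending order.
   Returns the number of mutexes acquired. */
size_t lock_element(region_locks_t *rl, size_t off, size_t elem_size)
{
    size_t first, last, r;
    element_regions(rl, off, elem_size, &first, &last);
    for (r = first; r <= last; r++)
        pthread_mutex_lock(&rl->mutexes[r]);
    return last - first + 1;
}

/* Release the mutexes taken by lock_element for the same element. */
void unlock_element(region_locks_t *rl, size_t off, size_t elem_size)
{
    size_t first, last, r;
    element_regions(rl, off, elem_size, &first, &last);
    for (r = first; r <= last; r++)
        pthread_mutex_unlock(&rl->mutexes[r]);
}
```

  For example, with 16-byte regions an 8-byte element at offset 12
  covers bytes 12 to 19 and therefore needs the mutexes of regions 0
  and 1.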

* Perform requested accumulation

  We need an algorithm for performing accumulations when the
  datatypes are non-contiguous.  Ideally, the two dataloops (origin
  and target) and the accumulation operation could be processed
  without requiring any extra copying, packing, or temporary buffers.

  NOTE: While it may be possible to write a function to perform the
  requested operations, it is likely that such functionality will need
  to be inlined so that appropriate locking of local window regions
  occurs as data is being processed.  Also, the dataloops will need to
  be optimized so that it is not necessary to acquire a region's mutex
  more than once per request.
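
  For the simplest case the assumptions allow (one basic datatype,
  contiguous on both sides, sum as the operation), the inlined loop the
  NOTE describes might look like the C sketch below.  It assumes a
  hypothetical int_window_t whose regions hold a whole number of ints
  and whose base is int-aligned, so no element straddles two regions;
  non-contiguous dataloops are not handled.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical region-locked window viewed as an array of ints. */
typedef struct {
    int *base;                /* local window */
    size_t ints_per_region;   /* region size, in elements */
    pthread_mutex_t *mutexes; /* one mutex per region */
} int_window_t;

/* Sum count contiguous ints from origin into the window, starting at
   element index disp.  The target range is walked one region at a
   time, so each region's mutex is taken once per request. */
void accumulate_sum(int_window_t *w, size_t disp, const int *origin,
                    size_t count)
{
    size_t i = 0;
    while (i < count) {
        size_t idx = disp + i;
        size_t region = idx / w->ints_per_region;
        /* Elements left in this region, capped by what the request needs. */
        size_t n = (region + 1) * w->ints_per_region - idx;
        if (n > count - i)
            n = count - i;

        pthread_mutex_lock(&w->mutexes[region]);
        for (size_t k = 0; k < n; k++)
            w->base[idx + k] += origin[i + k];
        pthread_mutex_unlock(&w->mutexes[region]);

        i += n;
    }
}
```

  Walking by region rather than by element keeps lock traffic at one
  acquire per region touched, at the cost of holding each mutex for
  the whole run of elements that falls in that region.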

* Unlock target local window

------------------------------------------------------------------------