Blame doc/notes/rma/pt-rma.txt

Below is the current design for passive target RMA on top of CH3.

We assume that there is some asynchronous agent (thread) that
periodically pokes the progress engine, i.e., periodically calls
MPID_Progress_test(). The poke must be a general one, rather than
something specific to RMA, because we don't know in advance whether
there will be any passive target RMA, and other communication may be
going on at the same time.

This thread is created only when MPI_Win_create is called and the user
did not pass an info object with the key "no_locks" set to "true". (As
an aside, I wish this were an assert instead of an info key. An assert
can be passed easily, whereas users are not likely to go through the
trouble of creating an info object to say no_locks, even if they only
plan to use fence.)

Assuming that such a thread exists, passive target RMA is implemented
in the CH3 channel as follows:

On the source side, MPI_Win_lock and all the RMA operations after it
are simply queued until MPI_Win_unlock is called (similar to what we
do with fence and start/complete). At MPI_Win_unlock, a "lock" packet
is sent to the target, containing the lock type and the rank of the
source. The source then waits for a "lock granted" reply from the
target. (A single put, get, or accumulate is handled differently. See
"Optimization for Single Puts, Gets, Accs" below.)

When the target receives a lock packet, it examines the current lock
information for that window (described later below) and either grants
the lock by sending a "lock granted" packet to the source or just
queues up the lock request.

When the source receives a "lock granted" reply, it performs the RMA
operations exactly as in the fence or start/complete case. The last
RMA operation also releases the lock on the window at the target.
No separate unlock packet needs to be sent.

Note that RMA requests that the target receives (put, get, acc) are
always satisfied, because they won't be sent in the first place unless
the source has been authorized to send them.

If none of the RMA operations is a get, the target must send an
acknowledgement to the source when the last RMA operation has
completed. If any one of the operations is a get, we reorder the
operations and perform the get last. In this case, since the source
must wait to receive the data anyway, the acknowledgement is not
needed, assuming that data transfer is ordered. If data transfer is
not ordered, an acknowledgement is needed even if the last operation
is a get.
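The reordering and acknowledgement rule above can be sketched in C as
follows. This is only an illustration under stated assumptions: the
names rma_op_kind_t, move_a_get_last, and needs_explicit_ack are
hypothetical and do not come from the CH3 sources, and the operations
are reduced to their kind so that the rule itself is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operation kinds; not taken from the CH3 code. */
typedef enum { OP_PUT, OP_GET, OP_ACC } rma_op_kind_t;

/* Move the last get in ops[0..n) to the final slot, keeping the
 * relative order of all other operations. Returns true if the list
 * contains a get, i.e., if the last operation is now a get. */
static bool move_a_get_last(rma_op_kind_t *ops, size_t n)
{
    assert(ops != NULL || n == 0);

    size_t i = n;
    while (i > 0 && ops[i - 1] != OP_GET)
        i--;
    if (i == 0)
        return false;               /* no get in the list */

    /* ops[i-1] is the last get: shift the tail left by one slot
     * and place the get at the end. */
    rma_op_kind_t get = ops[i - 1];
    memmove(&ops[i - 1], &ops[i], (n - i) * sizeof ops[0]);
    ops[n - 1] = get;
    return true;
}

/* The target's acknowledgement can be skipped only when the last
 * operation is a get AND data transfer is ordered. */
static bool needs_explicit_ack(bool last_op_is_get, bool transfer_ordered)
{
    return !(last_op_is_get && transfer_ordered);
}
```

For example, the list put, get, acc becomes put, acc, get, and with
ordered data transfer no separate acknowledgement is needed.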

The MPI_Win object needs the following lock info:
   int current_lock_type;    /* no_lock, shared, exclusive */
   int shared_lock_ref_cnt;  /* count of active shared locks */
   a queue of unsatisfied lock requests;

When a lock request arrives at the target, it looks at the incoming
and existing lock types and takes the following action:

Incoming           Existing             Action
--------           --------             ------
Shared             Exclusive            Queue it
Shared             NoLock/Shared        Grant it
Exclusive          NoLock               Grant it
Exclusive          Exclusive/Shared     Queue it
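The table can be expressed as a small decision function. The sketch
below is an assumption-laden illustration, not the CH3 code: the enum
values, the win_lock_state_t type, and try_grant_lock are hypothetical
names, and only the two fields listed above are modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three lock states named in the note. */
typedef enum { LOCK_NONE, LOCK_SHARED, LOCK_EXCLUSIVE } lock_type_t;

typedef struct {
    lock_type_t current_lock_type;  /* no_lock, shared, exclusive */
    int shared_lock_ref_cnt;        /* count of active shared locks */
} win_lock_state_t;

/* Returns true and updates the window state if the incoming request
 * can be granted now; returns false if the caller must queue it. */
static bool try_grant_lock(win_lock_state_t *w, lock_type_t incoming)
{
    assert(incoming == LOCK_SHARED || incoming == LOCK_EXCLUSIVE);

    if (incoming == LOCK_SHARED && w->current_lock_type != LOCK_EXCLUSIVE) {
        /* Shared on NoLock/Shared: grant it. */
        w->current_lock_type = LOCK_SHARED;
        w->shared_lock_ref_cnt++;
        return true;
    }
    if (incoming == LOCK_EXCLUSIVE && w->current_lock_type == LOCK_NONE) {
        /* Exclusive on NoLock: grant it. */
        w->current_lock_type = LOCK_EXCLUSIVE;
        return true;
    }
    /* Shared on Exclusive, or Exclusive on Exclusive/Shared: queue it. */
    return false;
}
```

A false return corresponds to the "Queue it" rows; the caller would
then append the request to the window's queue of unsatisfied lock
requests.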

No change needs to be made to the existing code in the progress engine
for handling put/get/accumulate, except that when the last RMA
operation from a source is completed, the target grants the next
queued lock request, if there is one, and changes the lock type if
necessary. This can be done even if the synchronization model is fence
or post/start, because in that case there will be no lock request to
grant. Therefore we don't need to know whether the current
synchronization model is lock/unlock or not.
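A minimal sketch of this release-and-grant step is shown below. It
assumes a FIFO linked list for the queue (the note only says "a
queue"), and all type and function names are hypothetical. Following
the text, it grants at most one queued request per call, and it is
safe to call when no lock is held, as in the fence or post/start case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { LOCK_NONE, LOCK_SHARED, LOCK_EXCLUSIVE } lock_type_t;

/* One queued lock request; the caller owns the storage. */
typedef struct lock_req {
    int source_rank;
    lock_type_t type;
    struct lock_req *next;
} lock_req_t;

typedef struct {
    lock_type_t current_lock_type;
    int shared_lock_ref_cnt;
    lock_req_t *head, *tail;        /* assumed FIFO of unsatisfied requests */
} win_lock_info_t;

static void enqueue_lock_req(win_lock_info_t *w, lock_req_t *r)
{
    r->next = NULL;
    if (w->tail != NULL)
        w->tail->next = r;
    else
        w->head = r;
    w->tail = r;
}

/* Called when the last RMA operation from a source has completed.
 * Releases that source's lock and returns the newly granted request,
 * or NULL if nothing was granted. */
static lock_req_t *release_and_grant_next(win_lock_info_t *w)
{
    if (w->current_lock_type == LOCK_SHARED) {
        assert(w->shared_lock_ref_cnt > 0);
        if (--w->shared_lock_ref_cnt > 0)
            return NULL;            /* other shared holders remain */
    }
    w->current_lock_type = LOCK_NONE;

    lock_req_t *r = w->head;
    if (r == NULL)
        return NULL;                /* e.g. fence or post/start: nothing queued */

    w->head = r->next;
    if (w->head == NULL)
        w->tail = NULL;

    w->current_lock_type = r->type; /* change the lock type if necessary */
    if (r->type == LOCK_SHARED)
        w->shared_lock_ref_cnt = 1;
    return r;
}
```

The caller would send a "lock granted" packet to the source_rank of
the returned request. Granting several compatible shared requests in
one step would be a possible extension, but it is not described in
this note.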


Optimization for Single Puts, Gets, Accs
----------------------------------------
For the case where the lock/unlock is for a single short
put/accumulate or get, we can send the put data (or get info) along
with the lock packet. If the lock needs to be queued, it will be
queued with this data or info. When the lock is granted, no "lock
granted" reply needs to be sent. Instead, the put data is simply
copied or the get data is sent over. Except in the case of get
operations, Win_unlock must block until it receives an acknowledgement
from the target that the RMA operation has completed (for both shared
and exclusive locks).
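
The two decisions in this optimization can be sketched as predicates.
This is an illustration only: the note does not define what counts as
"short", so SHORT_MSG_LIMIT is an assumed placeholder, and the type
and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { OP_PUT, OP_GET, OP_ACC } rma_op_kind_t;

/* Assumed threshold for a "short" message; the note gives no value. */
#define SHORT_MSG_LIMIT 1024

/* True if the epoch consists of exactly one operation whose put data
 * (or get info) is small enough to travel with the lock packet. */
static bool can_piggyback_on_lock(size_t n_ops, size_t nbytes)
{
    return n_ops == 1 && nbytes <= SHORT_MSG_LIMIT;
}

/* With the optimization, Win_unlock blocks for an acknowledgement
 * unless the single operation is a get. */
static bool unlock_waits_for_ack(rma_op_kind_t kind)
{
    assert(kind == OP_PUT || kind == OP_GET || kind == OP_ACC);
    return kind != OP_GET;
}
```

For a get, the unlock still has to wait for the requested data to
arrive, which is why no separate acknowledgement is needed.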