Single-threaded implementation of RMA for distributed memory

------------------------------------------------------------------------

Base Assumptions

* All of the local windows are located in process-local (not shared or
  remotely accessible) memory.

* Only basic datatypes are supported for the target.

* Only active (fence) synchronization is supported.

* The application is single-threaded.

* The MPI runtime system is single-threaded.

------------------------------------------------------------------------

General Notes

* "Lessons Learned from Implementing BSP" by J. Hill and
  D.B. Skillicorn suggests that we should not perform RMA operations
  as they are requested, but rather queue the entire set of
  operations and perform them at the next synchronization
  operation.
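
The queue-then-apply idea can be sketched in plain C without any MPI
calls. This is only an illustration under stated assumptions: the names
rma_op, rma_queue, rma_enqueue_put, and rma_flush are hypothetical and
not part of MPICH, the queue has a fixed capacity, and the "window" is
just a local int array.

```c
#include <assert.h>
#include <stddef.h>

#define RMA_QUEUE_MAX 16   /* arbitrary capacity chosen for the sketch */

/* One deferred put: write `value` at element `offset` of the window. */
typedef struct {
    size_t offset;
    int    value;
} rma_op;

typedef struct {
    rma_op ops[RMA_QUEUE_MAX];
    size_t count;
} rma_queue;

/* Record the operation instead of performing it.
 * Returns 0 on success, -1 if the queue is full. */
int rma_enqueue_put(rma_queue *q, size_t offset, int value)
{
    assert(q != NULL);
    if (q->count == RMA_QUEUE_MAX)
        return -1;
    q->ops[q->count].offset = offset;
    q->ops[q->count].value  = value;
    q->count++;
    return 0;
}

/* Called from the synchronization point: apply every queued operation
 * in request order, then empty the queue. Operations whose offset is
 * outside the window are skipped. Returns the number applied. */
size_t rma_flush(rma_queue *q, int *window, size_t window_len)
{
    size_t applied = 0;
    assert(q != NULL && window != NULL);
    for (size_t i = 0; i < q->count; i++) {
        if (q->ops[i].offset < window_len) {
            window[q->ops[i].offset] = q->ops[i].value;
            applied++;
        }
    }
    q->count = 0;
    return applied;
}
```

In this sketch nothing touches the window until rma_flush runs, which
is where a fence would call it.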

------------------------------------------------------------------------

Data Structures

* MPID_Win

  * struct MPIR_Win

  * handles - an array of local window handles (one per process)

  Q: Do we really need local window IDs? We need to be able to map
  remote handler calls back to a particular window, but we might be
  able to do this using an attribute on a communicator. Would an
  attribute lookup be too slow?
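
One way to picture the per-process handle array, as a hedged sketch
rather than the actual MPICH definition. Only the handles array comes
from the notes above; the type name win_sketch, the fields comm_size,
base, and size, and the helper win_remote_handle are assumptions made
for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the window object; not the real MPID_Win. */
typedef struct {
    int    comm_size;   /* number of processes sharing the window (assumed) */
    void  *base;        /* start of this process's local window (assumed) */
    size_t size;        /* size of the local window in bytes (assumed) */
    int   *handles;     /* handles[r] = local window handle on rank r */
} win_sketch;

/* Look up the handle that rank `target` uses for this window, so that a
 * remote handler call can name the right window on the target.
 * Returns -1 if `target` is out of range. */
int win_remote_handle(const win_sketch *win, int target)
{
    assert(win != NULL && win->handles != NULL);
    if (target < 0 || target >= win->comm_size)
        return -1;
    return win->handles[target];
}
```

The lookup is a single array index, which is the cost an attribute
lookup on the communicator would have to be compared against.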

------------------------------------------------------------------------

MPID_Win_fence

* Since remote handler calls (RHCs) might be sent on another socket or
  processed in another thread, no natural synchronization occurs
  between RHCs and the collective operations. Therefore, we need to
  know how many RHCs we should expect so that we don't prematurely
  return from the fence. Likewise, we need to tell the other
  processes how many RHCs we have made.

* We need to block until all incoming RHCs have been handled and all
  local requests and flags have completed.

  Q: What is the right interface for this blocking operation? The
  operation should block, but it needs to guarantee that forward
  progress is being made on both the incoming RHCs and locally posted
  operations.

  NOTE: We either need to pass dwin to a function or declare/cast the
  counters used in the while statement as volatile; otherwise the
  compiler may not generate instructions to reload the counter values
  before each iteration of the while loop.

  Q: It would be useful if the MPID layer could increment a counter
  (or call a non-blocking function) when the asynchronous request or
  RHC completed. This seems like a much better interface than
  requests and flags, at least for RMA. Might something of this
  nature be possible without putting undue burden on the device or
  significantly complicating the ADI?

* Wait for all other processes in the window to complete

  Q: Should we perform a barrier here? If we eliminate the barrier,
  then all processes still waiting for operations to complete will
  have to enqueue incoming requests from the next epoch until the
  operations from the current epoch are complete. Not performing the
  barrier complicates the RMA operations, but the performance benefit
  may be significant for some cases. (What are they? How common are
  they?)
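
The blocking step and the volatile counters can be sketched as follows.
This is an assumption-laden illustration, not the MPICH code: the names
fence_state, progress_fn, fence_wait, and toy_progress are hypothetical,
the exchange of expected RHC counts between processes is left out, and
the progress engine is a toy callback so the loop can run standalone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-window fence state. The counters are volatile so the
 * compiler reloads them on every iteration of the wait loop. */
typedef struct fence_state {
    volatile int rhcs_expected;    /* RHCs other processes said they sent us */
    volatile int rhcs_processed;   /* incremented by each RHC handler */
    volatile int local_pending;    /* local requests/flags not yet complete */
} fence_state;

/* Progress callback: makes one step of progress on incoming RHCs and
 * locally posted operations. Supplied by the caller in this sketch. */
typedef void (*progress_fn)(fence_state *st);

/* Block until all expected RHCs are handled and all local operations
 * are complete. Returns the number of progress calls that were needed. */
int fence_wait(fence_state *st, progress_fn make_progress)
{
    int polls = 0;
    assert(st != NULL && make_progress != NULL);
    while (st->rhcs_processed < st->rhcs_expected || st->local_pending > 0) {
        make_progress(st);
        polls++;
    }
    return polls;
}

/* Toy progress engine for demonstration only: completes one pending
 * item per call, local operations first. */
void toy_progress(fence_state *st)
{
    if (st->local_pending > 0)
        st->local_pending--;
    else if (st->rhcs_processed < st->rhcs_expected)
        st->rhcs_processed++;
}
```

Passing the state through a function pointer also addresses the NOTE
about reloads, since the compiler cannot assume the callback leaves the
counters unchanged.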

------------------------------------------------------------------------

MPID_Get

* If the target and origin ranks are the same, then copy the data from
  the target buffer to the origin buffer.

* Otherwise, we are attempting to get data from a remote node

  * Post an asynchronous receive for the data

    NOTE: the tag must be unique for this epoch so as to ensure that
    the soon-to-be incoming message is matched with this receive.

    NOTE: the request needs to be allocated from the window's active
    requests object so that it can be tracked.

  * Issue a remote handler call requesting the data from the remote
    process

    NOTE: the local completion flag needs to be allocated from the
    window's active flags object so that it can be tracked.
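
The two branches above can be sketched as a single origin-side
function. All names here (get_win, get_path, get_sketch, and the
fields) are hypothetical. The remote branch only performs the
bookkeeping the NOTEs call for, because the actual receive and remote
handler call depend on the device layer and are not specified in these
notes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { GET_LOCAL_COPY, GET_REMOTE_ISSUED } get_path;

/* Hypothetical window state for the sketch. */
typedef struct {
    int   my_rank;
    char *local_base;        /* this process's local window */
    int   active_requests;   /* receives posted this epoch, tracked for the fence */
    int   active_flags;      /* RHC completion flags outstanding this epoch */
    int   next_tag;          /* per-epoch tag counter, so each receive is unique */
} get_win;

/* Origin side of a get of `len` bytes at byte `offset` in the target
 * window. On the remote path, *tag_out receives the tag chosen for the
 * receive; on the local path it is set to -1. */
get_path get_sketch(get_win *win, void *origin_buf, int target_rank,
                    size_t offset, size_t len, int *tag_out)
{
    assert(win != NULL && origin_buf != NULL && tag_out != NULL);
    if (target_rank == win->my_rank) {
        /* Target and origin are the same process: plain copy. */
        memcpy(origin_buf, win->local_base + offset, len);
        *tag_out = -1;
        return GET_LOCAL_COPY;
    }
    /* 1. Post an asynchronous receive with a tag unique within the epoch
     *    (the receive itself is not shown). */
    *tag_out = win->next_tag++;
    win->active_requests++;
    /* 2. Issue the remote handler call asking the target to send the
     *    data (not shown), and track its local completion flag. */
    win->active_flags++;
    return GET_REMOTE_ISSUED;
}
```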

------------------------------------------------------------------------

MPIDI_Win_get_hdlr

* Post an asynchronous send of the requested data to the origin process

* Increment the "number of RHCs processed" counter

------------------------------------------------------------------------

MPID_Put

* If the target and origin ranks are the same, then copy the data from
  the origin buffer to the target buffer.

* Otherwise, if the origin and target buffers are contiguous and data
  conversion is not required

  NOTE: What I would like to do here is use MPID_Put_contig, but that
  would require that I communicate with the remote process in order to
  agree on a flag. It would be much better if the target completion
  flag were a counter so that the counter could be prearranged and
  used for all Put operations.

* Otherwise, if the data is sufficiently small

  * If the data is not contiguous, then pack the data into a temporary
    buffer.

    NOTE: This assumes that MPID_Pack() does not add a header to the
    packed data.

  * Issue a remote handler call (MPIDI_Win_put_eager_hdlr) requesting
    that the data be written to the target's local window

* Otherwise, the data is large enough to send in one or more separate
  messages

  * Issue a remote handler call (MPIDI_Win_put_hdlr) letting the
    target know that data is being sent that needs to be written into
    the target's local window

  * Post an asynchronous send of the origin buffer

    Q: Instead of using MPI_Isend(), should we instead use segments and
    multiple RHCs to send the data? Would doing so imply that the RMA
    subsystem now needs to do flow control?

  Q: Should we have yet another case, where a rendezvous occurs,
  guaranteeing that the target is able to post a receive before the send
  is issued? This would allow us to use MPID_Irsend(), potentially
  eliminating an extra message. Rather than having another case, should
  we use this technique anytime the data is larger than the eager
  message threshold?
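
The case analysis above amounts to a four-way decision, sketched here
as a pure function. The enum put_path, the function put_select, and the
threshold PUT_EAGER_LIMIT are hypothetical; the notes do not say what
"sufficiently small" means, so the value 1024 is only a placeholder.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    PUT_LOCAL_COPY,    /* origin == target: copy origin buffer to target buffer */
    PUT_CONTIG,        /* contiguous, no conversion: candidate for a direct put */
    PUT_EAGER,         /* small: data travels inside the RHC */
    PUT_SEPARATE_MSG   /* large: RHC announces the data, then a separate send */
} put_path;

/* Placeholder threshold in bytes; not taken from the notes. */
#define PUT_EAGER_LIMIT 1024

/* Choose the put path in the same order as the cases are listed. */
put_path put_select(int origin_rank, int target_rank, int contiguous,
                    int needs_conversion, size_t nbytes)
{
    assert(origin_rank >= 0 && target_rank >= 0);
    if (origin_rank == target_rank)
        return PUT_LOCAL_COPY;
    if (contiguous && !needs_conversion)
        return PUT_CONTIG;
    if (nbytes <= PUT_EAGER_LIMIT)
        return PUT_EAGER;
    return PUT_SEPARATE_MSG;
}
```

A rendezvous case, as raised in the last question, would add a fifth
enumerator or replace PUT_SEPARATE_MSG; the sketch does not take a
position on that.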

------------------------------------------------------------------------

MPIDI_Win_put_eager_hdlr

* Unpack the data into the local window buffer, performing data
  conversion if necessary

  Q: How are the header and data obtained? Depending on the interface
  and the datatype, we should be able to read the header directly into
  the window buffer.

* Increment the "number of RHCs processed" counter
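
A minimal sketch of such a handler, assuming the RHC delivers a small
header followed by the payload bytes. The header layout (put_eager_hdr),
the window type (put_win), and the function name are hypothetical, and
data conversion is omitted, so this only covers the homogeneous case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical RHC header for an eager put; the real layout is not
 * specified in the notes. */
typedef struct {
    size_t offset;   /* byte offset into the target's local window */
    size_t nbytes;   /* number of payload bytes that follow the header */
} put_eager_hdr;

typedef struct {
    char  *base;             /* local window buffer */
    size_t size;             /* window size in bytes */
    int    rhcs_processed;   /* "number of RHCs processed" counter */
} put_win;

/* Copy the payload into the window and count the RHC. Returns 0 on
 * success, -1 if the write would fall outside the window. The RHC is
 * counted either way so that a fence waiting on the counter cannot hang. */
int put_eager_hdlr_sketch(put_win *win, const put_eager_hdr *hdr,
                          const void *payload)
{
    int rc = 0;
    assert(win != NULL && hdr != NULL && payload != NULL);
    if (hdr->offset > win->size || hdr->nbytes > win->size - hdr->offset)
        rc = -1;
    else
        memcpy(win->base + hdr->offset, payload, hdr->nbytes);
    win->rhcs_processed++;
    return rc;
}
```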

------------------------------------------------------------------------

MPIDI_Win_put_hdlr

* Post an asynchronous receive of data into the window buffer
  defined in the RHC header

  Q: What should we do if a communication failure occurs? Is the
  origin somehow notified of the failure?

* Increment the "number of RHCs processed" counter

------------------------------------------------------------------------

MPI_Accumulate

* If the target and origin ranks are the same, then perform the
  requested operation locally, accumulating the data from the origin
  memory region into the target memory region.

  NOTE: For now, we are assuming the application and the message agent
  are single-threaded so we do not need to hold a mutex before
  performing the operation.

* Otherwise, if the data is sufficiently small:

  * If the data is not contiguous, then pack the data into a temporary
    buffer.

    NOTE: This assumes that MPID_Pack() does not add a header to the
    packed data.

  * Issue a remote handler call (MPIDI_Win_acc_eager_hdlr) requesting
    that the enclosed data be accumulated into the target buffer using
    the specified operation.

* Otherwise, the data is large enough that it needs to be segmented.

  Q: On the target side, we don't want to have to unpack the segment
  into a temporary buffer first. We would like to do the data
  conversion and accumulation directly from the segment that will be
  received. Does this make it impossible to use the MPIR_Segment API?

------------------------------------------------------------------------

MPIDI_Win_acc_eager_hdlr

* Perform the requested operation, converting the data on the fly if
  necessary

* Increment the "number of RHCs processed" counter
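
The handler's two steps can be sketched for int data as follows. The
names acc_op, acc_win, and acc_eager_hdlr_sketch are hypothetical, only
three illustrative operations are included, and data conversion is
omitted. The point of the sketch is that each incoming element is
combined directly into the window, with no temporary buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of reduction operations, for int data only. */
typedef enum { ACC_SUM, ACC_MAX, ACC_REPLACE } acc_op;

typedef struct {
    int   *base;             /* local window viewed as ints */
    size_t count;            /* window length in elements */
    int    rhcs_processed;   /* "number of RHCs processed" counter */
} acc_win;

/* Combine `n` incoming elements into the window starting at element
 * `offset`, one element at a time. Returns 0 on success, -1 if the
 * range falls outside the window. The RHC is counted either way. */
int acc_eager_hdlr_sketch(acc_win *win, size_t offset, const int *data,
                          size_t n, acc_op op)
{
    int rc = 0;
    assert(win != NULL && data != NULL);
    if (offset > win->count || n > win->count - offset) {
        rc = -1;
    } else {
        for (size_t i = 0; i < n; i++) {
            int *t = &win->base[offset + i];
            switch (op) {
            case ACC_SUM:     *t += data[i]; break;
            case ACC_MAX:     if (data[i] > *t) *t = data[i]; break;
            case ACC_REPLACE: *t = data[i]; break;
            }
        }
    }
    win->rhcs_processed++;
    return rc;
}
```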

------------------------------------------------------------------------