|
Packit Service |
c5cf8c |
* Definitions
- MPI buffer - count, datatype, memory pointer
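
A minimal C sketch of the triple defined above. The names `mpi_buffer`
and `mpi_buffer_extent_bytes` are hypothetical, and the datatype is
reduced to an element size in bytes. A real MPI datatype can describe
non-contiguous layouts, which this sketch deliberately ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for an "MPI buffer": the (memory pointer,
 * count, datatype) triple a user passes to calls such as MPI_Send(). */
typedef struct {
    void  *addr;       /* memory pointer: start of the user's data     */
    int    count;      /* number of elements of the given datatype     */
    size_t elem_size;  /* stand-in for the datatype: bytes per element;
                          assumes a contiguous basic type              */
} mpi_buffer;

/* Bytes of payload described by the buffer, under the contiguity
 * assumption above. A non-positive count describes no data. */
static size_t mpi_buffer_extent_bytes(const mpi_buffer *buf)
{
    assert(buf != NULL);
    if (buf->count <= 0)
        return 0;
    return (size_t)buf->count * buf->elem_size;
}
```

For example, a buffer of four doubles is `{ data, 4, sizeof(double) }`
and describes `4 * sizeof(double)` bytes.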
* Communication subsystem capabilities (requirements)
- general MPI messaging (MPI_*send() and MPI_*recv())

  - sending messages

    - if send-side MPI buffer is sufficiently contiguous, send data
      directly from MPI buffer

    - if RMA capabilities exist and MPI receive buffer is
      sufficiently contiguous, write message data directly into MPI
      receive buffer

  - receiving messages

    - match incoming messages with posted receives

    - handle posting and matching of wildcard (source) receives

    - special handling for already posted receives

      - if MPI buffer is sufficiently contiguous, receive directly
        into MPI buffer

      - if user buffer is non-contiguous, unpack data as portions of
        message data are received
- persistent MPI messaging (MPI_*_init())

  - for some network interfaces, we should be able to perform
    one-time initialization to eliminate unnecessary data copies
    (manipulating the MPI buffer directly)
- collective operations

  - send portions of an MPI buffer

  - receive portions of an MPI buffer

  - forward portions of an incoming message

    Use pipelining instead of store and forward to increase network
    utilization (and thus performance).

    Potentially multicast the same portion to multiple remote
    processes. Nick's prototype shows this is a big win for TCP and
    vMPI. I suspect it is a big win in general.

  - share buffers between methods to avoid copying data during forward
    operations

  - perform MPI computations (as defined by MPI_Reduce()) while
    receiving/unpacking data

    Computations may need to be performed at intermediate processes
    (processes not receiving any of the results), which implies that
    computations may need to be performed without the presence of a
    user-provided MPI buffer (or datatype).

  - handle multiple simultaneous collective operations on the same
    communicator (multi-threaded MPI)

    We should be able to use the tag field and a rolling counter to
    separate messages from independent collective operations. This
    would allow us to use the same matching mechanisms that we use
    for general MPI messaging.
- remote memory operations

  - aggregate operations (from the same exposure epoch?) into a
    single communication

  - perform MPI computations on remote memory

  - match communicated operations with exposure epochs (either
    explicit or implied)

    Is context sufficient for this? Do we need a tag to separate
    independent access/exposure epochs?
- unreliable communication and QoS

  Theoretically, an MPI communicator could be tagged to allow
  unreliable delivery, QoS, etc. We haven't thought much about what
  impact this has on our design, but we probably don't want to
  prevent these capabilities.
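
The tag-plus-rolling-counter idea noted under collective operations
could look roughly like the C sketch below. Everything in it is
hypothetical: the `comm_state` struct, the 16-bit counter width, and the
reserved high bit that marks collective traffic are illustration only.
It also assumes every process starts collectives on a communicator in
the same order, so that all participants draw the same counter value
for the same operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COLL_TAG_FLAG 0x40000000u  /* hypothetical: marks collective traffic */
#define COLL_SEQ_MASK 0x0000FFFFu  /* hypothetical: 16-bit rolling counter   */

/* Hypothetical per-communicator state. */
typedef struct {
    uint32_t context_id;     /* separates communicators from each other */
    uint32_t next_coll_seq;  /* rolling counter, wraps at 2^16          */
} comm_state;

/* Reserve a tag for one collective operation. Tags repeat only after
 * 65536 acquisitions, so messages from overlapping collectives on the
 * same communicator do not match each other's receives. In a
 * multi-threaded MPI the counter update would need a lock or an atomic
 * increment, which this sketch omits. */
static uint32_t coll_tag_acquire(comm_state *comm)
{
    assert(comm != NULL);
    uint32_t seq = comm->next_coll_seq & COLL_SEQ_MASK;
    comm->next_coll_seq = (comm->next_coll_seq + 1u) & COLL_SEQ_MASK;
    return COLL_TAG_FLAG | seq;
}

/* True if a tag was produced by coll_tag_acquire(). */
static int tag_is_collective(uint32_t tag)
{
    return (tag & COLL_TAG_FLAG) != 0;
}
```

Because the result is an ordinary tag, the receive side can reuse the
matching path used for general MPI messaging.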
* Communication subsystem components
- virtual connections

  allows late binding to a communication device (method)

  provides a function table for all connection/communication
  related interfaces

- progress engine

- matching incoming messages to posted requests

- message level flow control

- shared network buffer management

- network communication

- network flow control
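
One way to picture a virtual connection with late binding and a
function table is the C sketch below. The type and function names
(`method_ops`, `vc`, `vc_send`, `loopback_method`) are hypothetical and
not taken from any existing interface. The "loopback" method exists
only to exercise the table.

```c
#include <assert.h>
#include <stddef.h>

struct vc;  /* forward declaration */

/* Hypothetical function table supplied by a communication method. */
typedef struct {
    const char *name;
    int (*connect)(struct vc *vc);
    int (*send)(struct vc *vc, const void *data, size_t len);
} method_ops;

/* Hypothetical virtual connection to one remote process. */
typedef struct vc {
    int               remote_rank;
    const method_ops *ops;         /* NULL until a method is bound */
    size_t            bytes_sent;  /* bookkeeping for the sketch   */
} vc;

/* Late binding: the method is chosen on first use, not at creation. */
static int vc_send(vc *c, const method_ops *candidate,
                   const void *data, size_t len)
{
    assert(c != NULL && candidate != NULL);
    if (c->ops == NULL) {
        if (candidate->connect(c) != 0)
            return -1;             /* binding failed; stay unbound */
        c->ops = candidate;
    }
    return c->ops->send(c, data, len);
}

/* A stand-in method that only counts the bytes it was asked to send. */
static int loop_connect(vc *c) { (void)c; return 0; }
static int loop_send(vc *c, const void *data, size_t len)
{
    (void)data;
    c->bytes_sent += len;
    return 0;
}
static const method_ops loopback_method = {
    "loopback", loop_connect, loop_send
};
```

Upper layers call only `vc_send()`. Which method sits behind the table
is decided when the connection is first used.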
* Message level flow control
For simple messaging operations, message envelope meta-data must be
sent to the remote process immediately. Failure to do so may cause
the remote process to block indefinitely awaiting a particular
message. However, the method also needs to balance messaging
performance (sending the entire message immediately) with the memory
used by the remote process to buffer messages for which it has not
yet posted a receive.
Messages are typically converted (by the method?) to one of three
types to obtain this balance: short, eager, and rendezvous.
Conversion to a particular message type may depend on the memory
availability of the remote process.
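
As a rough illustration of that conversion, the C sketch below picks a
message type from the payload size and the remote buffer space. It
assumes the usual meaning of the three terms: a short message carries
its data with the envelope, an eager message sends its data at once and
may be buffered by the receiver, and a rendezvous message sends only
the envelope until the receive is matched. The thresholds and the
`remote_free_bytes` parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { MSG_SHORT, MSG_EAGER, MSG_RENDEZVOUS } msg_type;

/* Hypothetical thresholds; real values would be tuned per method. */
#define SHORT_LIMIT_BYTES 128u           /* payload fits with the envelope */
#define EAGER_LIMIT_BYTES (64u * 1024u)  /* largest eagerly sent payload   */

/* Choose a message type from the payload size and the buffer space the
 * remote process is believed to have free (however the method learns it). */
static msg_type choose_msg_type(size_t len, size_t remote_free_bytes)
{
    assert(SHORT_LIMIT_BYTES <= EAGER_LIMIT_BYTES);
    if (len <= SHORT_LIMIT_BYTES)
        return MSG_SHORT;       /* data rides along with the envelope */
    if (len <= EAGER_LIMIT_BYTES && len <= remote_free_bytes)
        return MSG_EAGER;       /* send now; receiver may have to buffer */
    return MSG_RENDEZVOUS;      /* envelope only; data follows once the
                                   receive has been matched */
}
```

In every case the envelope goes out immediately, which preserves the
requirement that the remote process never waits on an envelope.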
NOTE: Some communication interfaces such as vendor MPI will do this
automatically, which means we shouldn't force message level flow
control upon the method.
* Method
- definition of a method

  A method presents an interface which allows upper layers to convey
  the actions they wish the method to perform in the context of a
  virtual connection. These actions consist of sending and
  receiving messages, performing remote memory operations, and
  providing data and buffers to other methods.

- flow control at the message level

- flow control at the network buffer (packet) level

  Some methods need to worry about network buffer availability at
  the remote process.

- reliability

  Under a default environment, MPI messages are inherently reliable,
  which means that some methods may need to concern themselves with
  acknowledgments and retransmission if the underlying network does
  not guarantee reliability.

- matching incoming messages to requests
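
Matching an incoming message against posted receives, including
wildcard receives, can be sketched in C as follows. The `envelope` and
`posted_recv` types and the `ANY_SOURCE` / `ANY_TAG` constants are
hypothetical stand-ins (the latter for MPI_ANY_SOURCE and MPI_ANY_TAG).
The sketch assumes a singly linked queue kept in posting order and
matches on context, source, and tag.

```c
#include <assert.h>
#include <stddef.h>

#define ANY_SOURCE (-1)  /* hypothetical stand-in for MPI_ANY_SOURCE */
#define ANY_TAG    (-1)  /* hypothetical stand-in for MPI_ANY_TAG    */

typedef struct { int context_id, source, tag; } envelope;

typedef struct posted_recv {
    envelope            want;  /* source and tag may be wildcards */
    struct posted_recv *next;  /* queue is kept in posting order  */
} posted_recv;

static int envelope_matches(const envelope *want, const envelope *in)
{
    return want->context_id == in->context_id
        && (want->source == ANY_SOURCE || want->source == in->source)
        && (want->tag == ANY_TAG || want->tag == in->tag);
}

/* Find, unlink, and return the earliest posted receive that matches the
 * incoming envelope, or NULL if the message is unexpected. */
static posted_recv *match_posted(posted_recv **head, const envelope *in)
{
    assert(head != NULL && in != NULL);
    for (posted_recv **link = head; *link != NULL; link = &(*link)->next) {
        if (envelope_matches(&(*link)->want, in)) {
            posted_recv *hit = *link;
            *link = hit->next;
            hit->next = NULL;
            return hit;
        }
    }
    return NULL;
}
```

A NULL result means no receive has been posted yet, which is the case
where message level flow control decides how much data to buffer.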