The tools in this directory are intended to stress a fabric in different ways
in order to provide information about how a fabric is running.

mpi_groupstress:

USAGE:
    -v/--verbose       Verbose.
    -g/--group <arg>   Group size. Should be an even number between 2 and 128
    -l/--min <arg>     Minimum Message Size. Should be between 16384 and
                       (1<<22)
    -u/--max <arg>     Maximum Message Size. Should be between 16384 and
                       (1<<22)
    -n/--num <arg>     Number of times to repeat the test. Enter -1 to run
                       forever.
    -h/--help          Provides this help text.
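The limits in the usage text above can be checked before launching a long run.
The following is a minimal Python sketch of such a check. The helper name
check_groupstress_args is hypothetical and is not part of these tools, and the
rule that the minimum size must not exceed the maximum is an assumption that
the usage text does not state.

```python
MIN_MSG = 16384        # smallest message size named in the usage text
MAX_MSG = 1 << 22      # largest message size named in the usage text


def check_groupstress_args(group, min_size, max_size):
    """Return a list of problems with the option values (empty if none)."""
    problems = []
    if group % 2 != 0 or not 2 <= group <= 128:
        problems.append("group size should be an even number between 2 and 128")
    if not MIN_MSG <= min_size <= MAX_MSG:
        problems.append("minimum message size should be between 16384 and 1<<22")
    if not MIN_MSG <= max_size <= MAX_MSG:
        problems.append("maximum message size should be between 16384 and 1<<22")
    # Assumption: the usage text does not say so, but a minimum above the
    # maximum is treated as an error here.
    if min_size > max_size:
        problems.append("minimum message size exceeds maximum message size")
    return problems


if __name__ == "__main__":
    for problem in check_groupstress_args(18, 16384, 1 << 22):
        print(problem)
```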

The first tool, mpi_groupstress, breaks the nodes into groups and then runs the
OSU bandwidth benchmark on pairs of nodes within each group.

This is useful for stressing the fabric in specific ways. For example, consider
a fabric where all the nodes are connected to the core switch via leaf switches,
with 18 nodes per leaf switch. If you list the nodes in the mpi_hosts file in
topological order and run mpi_groupstress with a group size of 18, you can
stress all the leaf-to-node connections without sending traffic over the core
switch. If you want to test leaf-to-leaf connections, doubling the group size
to 36 will ensure that every single test will pass through an inter-switch link.

Note that, as mentioned above, the order of the nodes in the hosts file is very
important. mpi_groupstress has no knowledge of the fabric topology, so that
knowledge must be embedded in the hosts file.
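As an illustration of embedding the topology in the hosts file, here is a
minimal Python sketch that lists nodes leaf by leaf. The leaf and node names
are hypothetical, the one-hostname-per-line layout is an assumption about the
hosts file format, and the sketch assumes that groups are formed from
consecutive entries in the file.

```python
def topological_hosts(leaf_to_nodes):
    """Flatten {leaf switch: [nodes]} into one list, grouped by leaf switch."""
    hosts = []
    for leaf in sorted(leaf_to_nodes):
        hosts.extend(leaf_to_nodes[leaf])
    return hosts


def consecutive_groups(hosts, group_size):
    """Split the host list into consecutive groups (assumed grouping rule)."""
    return [hosts[i:i + group_size] for i in range(0, len(hosts), group_size)]


# Hypothetical fabric: three leaf switches with 18 nodes each.
fabric = {
    f"leaf{leaf}": [f"node{leaf}-{n:02d}" for n in range(18)]
    for leaf in range(1, 4)
}

hosts = topological_hosts(fabric)
hosts_file_text = "\n".join(hosts) + "\n"   # assumed format: one host per line

if __name__ == "__main__":
    for group in consecutive_groups(hosts, 18):
        leaves = sorted({name.split("-")[0] for name in group})
        print(len(group), "nodes behind", leaves)
```

Under these assumptions, a group size of 18 keeps every group behind a single
leaf switch, while a group size of 36 makes each full group span two leaf
switches.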

A third use case might be to stress a single link as hard as possible. For
example, if each node has 16 cores, and you want to stress the path between
two nodes, list each node 16 times in the hostfile, then run mpi_groupstress
with a group size of 32.
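The single-link case can be sketched in Python as follows. The node names are
hypothetical. The text does not say in which order the repeated entries should
appear, so listing all entries for one node followed by all entries for the
other is an assumption.

```python
def single_link_hosts(node_a, node_b, cores_per_node):
    """Build host entries that place one process per core on two nodes."""
    # Assumption: entries for node_a come first, then entries for node_b.
    return [node_a] * cores_per_node + [node_b] * cores_per_node


entries = single_link_hosts("nodeA", "nodeB", 16)   # hypothetical node names
group_size = len(entries)                            # 2 nodes x 16 cores

if __name__ == "__main__":
    print("\n".join(entries))
    print("group size:", group_size)
```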



mpi_latencystress:

USAGE:
    -v/--verbose       Verbose. Outputs some debugging information.
                       Use multiple times for more detailed information.
    -s/--size          Message Size. Should be between 0 and (1<<22)
    -n/--num <arg>     Number of times to repeat the test. Enter -1 to
                       run forever.
    -h/--help          Provides this help text.

mpi_latencystress iterates through every possible pair of nodes in the fabric,
looking for slow links. Unlike similar tools, it will do as many pair-wise
tests in parallel as it can, to reduce the total run time of the test.
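One way to see how every pair can be covered while many pairs run at once is a
round-robin schedule built with the circle method. The Python sketch below is
only an illustration of that general idea. It is not taken from
mpi_latencystress, whose actual scheduling may differ, and the function name
round_robin_rounds is hypothetical.

```python
def round_robin_rounds(nodes):
    """Return rounds of disjoint pairs that together cover every pair once."""
    ring = list(nodes)
    if len(ring) % 2:
        ring.append(None)          # placeholder: its partner sits out a round
    n = len(ring)
    rounds = []
    for _ in range(n - 1):
        pairs = []
        for i in range(n // 2):
            a, b = ring[i], ring[n - 1 - i]
            if a is not None and b is not None:
                pairs.append((a, b))
        rounds.append(pairs)
        # Keep the first entry fixed and rotate the rest by one position.
        ring = [ring[0], ring[-1]] + ring[1:-1]
    return rounds


if __name__ == "__main__":
    for number, pairs in enumerate(round_robin_rounds(["n0", "n1", "n2", "n3"])):
        print("round", number, pairs)
```

With an even number of nodes N, this schedule needs N-1 rounds of N/2
simultaneous tests, compared with N*(N-1)/2 steps when pairs are tested one at
a time.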