The tools in this directory are intended to stress a fabric in different ways
in order to provide information about how a fabric is running.

mpi_groupstress:

USAGE:
  -v/--verbose            Verbose.
  -g/--group    <arg>     Group size. Should be an even number between 2 and 128
  -l/--min      <arg>     Minimum Message Size. Should be between 16384 and
                          (1<<22)
  -u/--max      <arg>     Maximum Message Size. Should be between 16384 and
                          (1<<22)
  -n/--num      <arg>     Number of times to repeat the test. Enter -1 to run
                          forever.
  -h/--help               Provides this help text.

The first tool, mpi_groupstress, breaks the nodes into groups and then runs the
OSU bandwidth benchmark on pairs of nodes within each group.

This is useful for stressing the fabric in specific ways. For example, consider
a fabric where all the nodes are connected to the core switch via leaf switches,
with 18 nodes per leaf switch. If you list the nodes in the mpi_hosts file in
topological order and run mpi_groupstress with a group size of 18, you can
stress all the leaf-to-node connections without sending traffic over the core
switch. If you want to test leaf-to-leaf connections, doubling the group size
to 36 ensures that every test passes through an inter-switch link.
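
The grouping in the example above can be sketched in a few lines of Python.
This is only an illustration, not the tool's code. The host names and leaf
layout are hypothetical, and the pairing rule (each rank in the first half of a
group is paired with the rank at the same offset in the second half) is an
assumption: it is what the examples in this README imply, but it is not stated
here.

```python
def make_groups(hosts, group_size):
    """Split an ordered host list into consecutive groups of group_size."""
    if group_size < 2 or group_size % 2:
        raise ValueError("group size must be an even number >= 2")
    return [hosts[i:i + group_size] for i in range(0, len(hosts), group_size)]


def pair_within_group(group):
    """ASSUMED pairing: offset k in the first half with offset k in the second."""
    half = len(group) // 2
    return list(zip(group[:half], group[half:half * 2]))


def leaf_of(host):
    """Leaf switch name encoded in the hypothetical host names below."""
    return host.split("-")[0]


# Hypothetical fabric: 2 leaf switches with 18 nodes each, listed in
# topological order, as they would appear in the mpi_hosts file.
hosts = [f"leaf{leaf}-node{n:02d}" for leaf in range(2) for n in range(18)]

# Group size 18: check whether each pair stays under one leaf switch.
same_leaf = [leaf_of(a) == leaf_of(b)
             for g in make_groups(hosts, 18) for a, b in pair_within_group(g)]

# Group size 36: check whether each pair spans two leaf switches.
cross_leaf = [leaf_of(a) != leaf_of(b)
              for g in make_groups(hosts, 36) for a, b in pair_within_group(g)]

print(all(same_leaf), all(cross_leaf))
```

If the hosts were listed in any other order, the same group sizes would mix
leaf-local and cross-leaf pairs, which is why the order of the hosts file
matters.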

Note that, as mentioned above, the order in which nodes are listed in the hosts
file is very important. mpi_groupstress has no knowledge of the fabric
topology, so that knowledge must be embedded in the hosts file.

A third use case might be to stress a single link as hard as possible. For
example, if each node has 16 cores and you want to stress the path between
two nodes, list each node 16 times in the hosts file, then run mpi_groupstress
with a group size of 32.
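
A hosts file for this case can be generated rather than typed by hand. The
sketch below assumes a plain one-host-per-line file in which a repeated name
means one more process on that node; check the hosts file format of your MPI
before relying on it. The node names are hypothetical.

```python
def repeated_hosts(nodes, per_node):
    """Return hosts-file lines listing each node per_node times, node by node."""
    return [node for node in nodes for _ in range(per_node)]


# Hypothetical node names; 16 entries each, matching 16 cores per node.
lines = repeated_hosts(["nodeA", "nodeB"], 16)

# 2 nodes x 16 entries = 32 lines, so a group size of 32 covers both nodes.
group_size = len(lines)

# Text to save as the hosts file (one host name per line is an assumption).
hosts_file_text = "\n".join(lines) + "\n"
print(hosts_file_text, end="")
```

Listing all entries for the first node before those of the second keeps each
node's processes together in the file, so the two halves of the group sit on
different nodes.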


mpi_latencystress:

USAGE:
  -v/--verbose            Verbose. Outputs some debugging information.
                          Use multiple times for more detailed information.
  -s/--size               Message Size. Should be between 0 and (1<<22)
  -n/--num      <arg>     Number of times to repeat the test. Enter -1 to
                          run forever.
  -h/--help               Provides this help text.

mpi_latencystress iterates through every possible pair of nodes in the fabric,
looking for slow links. Unlike similar tools, it runs as many pairwise tests
in parallel as it can, to reduce the total run time of the test.
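
This README does not describe how mpi_latencystress schedules its tests. One
standard way to cover every pair while keeping all nodes busy is a round-robin
(circle method) schedule, sketched below as an illustration only. For an even
number of nodes n it yields n-1 rounds of n/2 disjoint pairs, so all tests in a
round can run in parallel.

```python
def round_robin_rounds(nodes):
    """Yield rounds of disjoint pairs that together cover every pair once."""
    ring = list(nodes)
    if len(ring) % 2:
        ring.append(None)  # odd count: one node sits out each round
    n = len(ring)
    for _ in range(n - 1):
        # Pair the ends of the ring and work inward.
        pairs = [(ring[i], ring[n - 1 - i]) for i in range(n // 2)]
        yield [(a, b) for a, b in pairs if a is not None and b is not None]
        # Keep the first entry fixed and rotate the rest by one position.
        ring = [ring[0]] + [ring[-1]] + ring[1:-1]


# Hypothetical 6-node fabric, nodes numbered 0..5.
rounds = list(round_robin_rounds(range(6)))
for r in rounds:
    print(r)
```

Testing the pairs one at a time would take n*(n-1)/2 steps; a schedule like
this one needs only n-1 rounds.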