MPICH Release 3.3.2

MPICH is a high-performance and widely portable implementation of the
MPI-3.1 standard from the Argonne National Laboratory. This release
has all MPI-3.1 functions and features required by the standard with
the exception of support for the "external32" portable I/O format and
user-defined data representations for I/O.

This README file should contain enough information to get you started
with MPICH. More extensive installation and user guides can be found
in the doc/installguide/install.pdf and doc/userguide/user.pdf files,
respectively. Additional information regarding the contents of the
release can be found in the CHANGES file in the top-level directory,
and in the RELEASE_NOTES file, where certain restrictions are
detailed. Finally, the MPICH web site, http://www.mpich.org, contains
information on bug fixes and new releases.


 1. Getting Started
 2. Reporting Installation or Usage Problems
 3. Compiler Flags
 4. Alternate Channels and Devices
 5. Alternate Process Managers
 6. Alternate Configure Options
 7. Testing the MPICH installation
 8. Fault Tolerance
 9. Developer Builds
10. Multiple Fortran compiler support
11. ABI Compatibility
12. Capability Sets
13. Threads


-------------------------------------------------------------------------

1. Getting Started
==================

The following instructions take you through a sequence of steps to get
the default configuration (ch3 device, nemesis channel (with TCP and
shared memory), Hydra process management) of MPICH up and running.

(a) You will need the following prerequisites.

    - REQUIRED: This tar file mpich-3.3.2.tar.gz

    - REQUIRED: A C compiler (C99 support is required. See
      https://wiki.mpich.org/mpich/index.php/Shifting_toward_C99)

    - OPTIONAL: A C++ compiler, if C++ applications are to be used
      (g++, etc.). If you do not require support for C++ applications,
      you can disable this support using the configure option
      --disable-cxx (configuring MPICH is described in step 1(d)
      below).

    - OPTIONAL: A Fortran compiler, if Fortran applications are to be
      used (gfortran, ifort, etc.). If you do not require support for
      Fortran applications, you can disable this support using
      --disable-fortran (configuring MPICH is described in step 1(d)
      below).

    Also, you need to know which shell you are using, since different
    shells have different command syntax. The command "echo $SHELL"
    prints your login shell, which is normally the shell used by your
    terminal program.

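    Before configuring, it can help to confirm which of the
    prerequisites above are present. The following is a minimal sketch
    for Bourne-like shells, not something shipped with MPICH: the helper
    name "have" and the list of compiler names probed are assumptions
    chosen for illustration, so adjust them to the compilers you intend
    to use.

```shell
#!/bin/sh
# have NAME: succeed if NAME is found on the current PATH.
# (Hypothetical helper for illustration; not part of MPICH.)
have() {
    command -v "$1" >/dev/null 2>&1
}

# Report the login shell, then probe for a few common compilers.
echo "login shell: ${SHELL:-unknown}"
for tool in cc gcc g++ gfortran ifort; do
    if have "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

    A missing C++ or Fortran compiler is not fatal; it only means the
    matching --disable-cxx or --disable-fortran option is needed.
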
(b) Unpack the tar file and go to the top-level directory:

      tar xzf mpich-3.3.2.tar.gz
      cd mpich-3.3.2

    If your tar doesn't accept the z option, use

      gunzip mpich-3.3.2.tar.gz
      tar xf mpich-3.3.2.tar
      cd mpich-3.3.2

(c) Choose an installation directory, say
    /home/<USERNAME>/mpich-install, which is assumed to be non-existent
    or empty. It will be most convenient if this directory is shared
    by all of the machines where you intend to run processes. If not,
    you will have to duplicate it on the other machines after
    installation.

(d) Configure MPICH specifying the installation directory:

    for csh and tcsh:

      ./configure --prefix=/home/<USERNAME>/mpich-install |& tee c.txt

    for bash and sh:

      ./configure --prefix=/home/<USERNAME>/mpich-install 2>&1 | tee c.txt

    Bourne-like shells, sh and bash, accept "2>&1 |". Csh-like shells,
    csh and tcsh, accept "|&". If a failure occurs, the configure
    command will display the error. Most errors are straightforward
    to follow. For example, if the configure command fails with:

       "No Fortran compiler found. If you don't need to build any
        Fortran programs, you can disable Fortran support using
        --disable-fortran. If you do want to build Fortran programs,
        you need to install a Fortran compiler such as gfortran or
        ifort before you can proceed."

    ... it means that you don't have a Fortran compiler :-). You will
    need to either install one, or disable Fortran support in MPICH.

    If you are unable to understand what went wrong, please go to
    section 2 below to report the issue to the MPICH developers and
    other users.

(e) Build MPICH:

    for csh and tcsh:

      make |& tee m.txt

    for bash and sh:

      make 2>&1 | tee m.txt

    This step should succeed if there were no problems with the
    preceding step. Check the file m.txt. If there were problems, do a
    "make clean" and then run make again with V=1, which prints the
    full command lines to the log:

      make V=1 |& tee m.txt       (for csh and tcsh)

      OR

      make V=1 2>&1 | tee m.txt   (for bash and sh)

    Then go to section 2 below to report the issue to the MPICH
    developers and other users.

(f) Install the MPICH commands:

    for csh and tcsh:

      make install |& tee mi.txt

    for bash and sh:

      make install 2>&1 | tee mi.txt

    This step collects all required executables and scripts in the bin
    subdirectory of the directory specified by the prefix argument to
    configure.

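    Steps (d) through (f) pipe each command through tee, so in sh and
    bash the pipeline reports the exit status of tee rather than that
    of configure or make. A minimal sketch for Bourne-like shells that
    keeps the log file and still returns the command's own status
    follows. The function name "log_step" is hypothetical, and the
    mktemp utility is assumed to be available.

```shell
#!/bin/sh
# log_step LOGFILE COMMAND [ARGS...]
# Run COMMAND, copy its stdout and stderr to LOGFILE (and the terminal),
# and return COMMAND's exit status instead of tee's.
# (Hypothetical helper for illustration; not part of MPICH.)
log_step() {
    log_file=$1
    shift
    status_file=$(mktemp)
    {
        rc=0
        "$@" 2>&1 || rc=$?
        echo "$rc" > "$status_file"   # hand the status out of the pipeline
    } | tee "$log_file"
    status=$(cat "$status_file")
    rm -f "$status_file"
    return "$status"
}

# Intended use, mirroring steps (d), (e) and (f):
#   log_step c.txt  ./configure --prefix=/home/<USERNAME>/mpich-install &&
#   log_step m.txt  make &&
#   log_step mi.txt make install
```

    With this wrapper a failed configure stops the chain, while c.txt,
    m.txt and mi.txt are still produced for a problem report.
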
(g) Add the bin subdirectory of the installation directory to your
    path in your startup script (.bashrc for bash, .cshrc for csh,
    etc.):

    for csh and tcsh:

      setenv PATH /home/<USERNAME>/mpich-install/bin:$PATH

    for bash and sh:

      PATH=/home/<USERNAME>/mpich-install/bin:$PATH ; export PATH

    Check that everything is in order at this point by doing:

      which mpicc
      which mpiexec

    Both commands should print paths inside the bin subdirectory of
    your install directory.

    IMPORTANT NOTE: The install directory has to be visible at exactly
    the same path on all machines you want to run your applications
    on. This is typically achieved by installing MPICH on a shared
    NFS file-system. If you do not have a shared NFS directory, you
    will need to manually copy the install directory to all machines
    at exactly the same location.

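    If a startup script is sourced more than once, a plain assignment
    prepends the same directory to PATH each time. The sketch below,
    for Bourne-like shells, adds the directory only when it is not
    already present. The helper name "prepend_path" is hypothetical,
    and $HOME/mpich-install stands in for the installation directory
    chosen in step (c).

```shell
#!/bin/sh
# prepend_path DIR: put DIR at the front of PATH unless it is already listed.
# (Hypothetical helper for illustration; not part of MPICH.)
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present: leave PATH alone
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Assumed install location; substitute the prefix given to configure.
prepend_path "$HOME/mpich-install/bin"
```
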
(h) MPICH uses a process manager for starting MPI applications. The
    process manager provides the "mpiexec" executable, together with
    other utility executables. MPICH comes packaged with multiple
    process managers; the default is called Hydra.

    Now we will run an MPI job, using the mpiexec command as specified
    in the MPI standard. There are some examples in the install
    directory, which you have already put in your path, as well as in
    the directory mpich-3.3.2/examples. One of them is the classic
    CPI example, which computes the value of pi by numerical
    integration in parallel.

    To run the CPI example with 'n' processes on your local machine,
    you can use:

      mpiexec -n <number> ./examples/cpi

    Test that you can run an 'n' process CPI job on multiple nodes:

      mpiexec -f machinefile -n <number> ./examples/cpi

    The 'machinefile' is of the form:

      host1
      host2:2
      host3:4   # Random comments
      host4:1

    'host1', 'host2', 'host3' and 'host4' are the hostnames of the
    machines you want to run the job on. The ':2', ':4', ':1' segments
    specify the number of processes you want to run on each node. If
    nothing is specified, ':1' is assumed.

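    To choose a sensible value for -n, it can be useful to add up the
    process counts listed in a machinefile. The sketch below does so
    with awk, relying only on the format described above (one host per
    line, an optional ':count' suffix, '#' comments). The function name
    "count_slots" is hypothetical, and any machinefile syntax beyond
    that description is not handled.

```shell
#!/bin/sh
# count_slots FILE: print the total number of processes listed in a
# machinefile of the form "host" or "host:count", ignoring # comments.
# (Hypothetical helper for illustration; not part of MPICH or Hydra.)
count_slots() {
    awk '
        { sub(/#.*/, "") }            # drop trailing comments
        NF == 0 { next }              # skip blank lines
        {
            n = split($1, part, ":")
            total += (n > 1 ? part[2] : 1)   # no suffix means one process
        }
        END { print total + 0 }
    ' "$1"
}

# Example using the machinefile shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
host1
host2:2
host3:4   # Random comments
host4:1
EOF
count_slots "$sample"    # prints 8
rm -f "$sample"
```
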
    More details on interacting with Hydra can be found at
    http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager

If you have completed all of the above steps, you have successfully
installed MPICH and run an MPI example.

-------------------------------------------------------------------------

2. Reporting Installation or Usage Problems
===========================================

[VERY IMPORTANT: PLEASE COMPRESS ALL FILES BEFORE SENDING THEM TO
US. DO NOT SPAM THE MAILING LIST WITH LARGE ATTACHMENTS.]

We have tested the distribution on a variety of machines in our own
environments and at our partner institutes. If you have problems
with the installation or usage of MPICH, please follow these steps:

1. First see the Frequently Asked Questions (FAQ) page at
   http://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions to
   see if the problem you are facing has a simple solution. Many common
   problems and their solutions are listed there.

2. If you cannot find an answer on the FAQ page, look through previous
   email threads on the discuss@mpich.org mailing list archive
   (https://lists.mpich.org/mailman/listinfo/discuss). It is likely
   that someone else had a similar problem, which has already been
   resolved.

3. If neither of the above steps works, please send an email to
   discuss@mpich.org. You need to subscribe to this list
   (https://lists.mpich.org/mailman/listinfo/discuss) before sending an
   email.

   Your email should contain the following files. ONCE AGAIN, PLEASE
   COMPRESS BEFORE SENDING, AS THE FILES CAN BE LARGE. Note that,
   depending on the step at which the build failed, some of the files
   might not exist.

       mpich-3.3.2/c.txt (generated in step 1(d) above)
       mpich-3.3.2/m.txt (generated in step 1(e) above)
       mpich-3.3.2/mi.txt (generated in step 1(f) above)
       mpich-3.3.2/config.log (generated in step 1(d) above)
       mpich-3.3.2/src/openpa/config.log (generated in step 1(d) above)
       mpich-3.3.2/src/mpl/config.log (generated in step 1(d) above)
       mpich-3.3.2/src/pm/hydra/config.log (generated in step 1(d) above)
       mpich-3.3.2/src/pm/hydra/tools/topo/hwloc/hwloc/config.log (generated in step 1(d) above)

   DID WE MENTION? DO NOT FORGET TO COMPRESS THESE FILES!

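   One way to gather and compress whichever of these files exist is
   sketched below for Bourne-like shells. The function name
   "collect_logs" and the archive name in the usage comment are
   assumptions for illustration; the file list is the one given above,
   relative to the mpich-3.3.2 source directory.

```shell
#!/bin/sh
# collect_logs SRC_DIR ARCHIVE
# Pack the build logs that exist under SRC_DIR into a gzip-compressed
# tar archive. Files that were never generated are skipped.
# (Hypothetical helper for illustration; not part of MPICH.)
collect_logs() {
    src_dir=$1
    archive=$2
    files=""
    for f in c.txt m.txt mi.txt config.log \
             src/openpa/config.log src/mpl/config.log \
             src/pm/hydra/config.log \
             src/pm/hydra/tools/topo/hwloc/hwloc/config.log
    do
        if [ -f "$src_dir/$f" ]; then
            files="$files $f"
        fi
    done
    if [ -z "$files" ]; then
        echo "no log files found in $src_dir" >&2
        return 1
    fi
    # $files is deliberately unquoted: the names contain no spaces.
    tar -czf "$archive" -C "$src_dir" $files
}

# Intended use, run from the directory that contains mpich-3.3.2:
#   collect_logs mpich-3.3.2 "$PWD/mpich-logs.tar.gz"
```
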
   If you have compiled MPICH and are having trouble running an
   application, please provide the output of the following command in
   your email.

       mpiexec -info

   Finally, please include the actual error you are seeing when running
   the application, including the mpiexec command used, and the host
   file. If possible, please try to reproduce the error with a smaller
   application or benchmark and send that along in your bug report.

4. If you have found a bug in MPICH, you can report it on our GitHub
   page (https://github.com/pmodels/mpich/issues).


-------------------------------------------------------------------------

3. Compiler Flags
=================

MPICH allows several sets of compiler flags to be used. The first
three sets are configure-time options for MPICH, while the fourth is
only relevant when compiling applications with mpicc and friends.

(a) CFLAGS, CPPFLAGS, CXXFLAGS, FFLAGS, FCFLAGS, LDFLAGS and LIBS
(abbreviated as xFLAGS): Setting these flags results in the MPICH
library being compiled/linked with these flags, and in the flags
being used internally by mpicc and friends.

(b) MPICHLIB_CFLAGS, MPICHLIB_CPPFLAGS, MPICHLIB_CXXFLAGS,
MPICHLIB_FFLAGS, and MPICHLIB_FCFLAGS (abbreviated as
MPICHLIB_xFLAGS): Setting these flags results in the MPICH
library being compiled with these flags. However, these flags will
*not* be used by mpicc and friends.

(c) MPICH_MPICC_CFLAGS, MPICH_MPICC_CPPFLAGS, MPICH_MPICC_LDFLAGS,
MPICH_MPICC_LIBS, and so on for MPICXX, MPIF77 and MPIFORT
(abbreviated as MPICH_MPIX_FLAGS): These flags do *not* affect the
compilation of the MPICH library itself, but will be used internally
by mpicc and friends.


+--------------------------------------------------------------------+
|                    |                      |                        |
|                    |    MPICH library     |   mpicc and friends    |
|                    |                      |                        |
+--------------------+----------------------+------------------------+
|                    |                      |                        |
|  xFLAGS            |         Yes          |          Yes           |
|                    |                      |                        |
+--------------------+----------------------+------------------------+
|                    |                      |                        |
|  MPICHLIB_xFLAGS   |         Yes          |           No           |
|                    |                      |                        |
+--------------------+----------------------+------------------------+
|                    |                      |                        |
|  MPICH_MPIX_FLAGS  |          No          |          Yes           |
|                    |                      |                        |
+--------------------+----------------------+------------------------+


All these flags can be set as part of the configure command or through
environment variables.


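As a quick reference, the three flag families above can be told apart
by name. The sketch below encodes that mapping in a small shell
function; the function name "flag_scope" and the wording of its output
are assumptions for illustration, and it covers only the variable
names listed in this section.

```shell
#!/bin/sh
# flag_scope NAME: print where a flag variable takes effect, following
# the three families described in this section.
# (Hypothetical helper for illustration; not part of MPICH.)
flag_scope() {
    case $1 in
        MPICHLIB_*)
            echo "library only" ;;
        MPICH_MPI*_*)
            echo "wrappers only" ;;
        CFLAGS|CPPFLAGS|CXXFLAGS|FFLAGS|FCFLAGS|LDFLAGS|LIBS)
            echo "library and wrappers" ;;
        *)
            echo "unknown" ;;
    esac
}

flag_scope CFLAGS               # prints "library and wrappers"
flag_scope MPICHLIB_CFLAGS      # prints "library only"
flag_scope MPICH_MPICC_LDFLAGS  # prints "wrappers only"
```
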
Default flags
--------------
By default, MPICH automatically adds certain compiler optimizations
to MPICHLIB_CFLAGS. The currently used optimization level is -O2.

** IMPORTANT NOTE: Remember that this only affects the compilation of
the MPICH library and is not used in the wrappers (mpicc and friends)
that are used to compile your applications or other libraries.

This optimization level can be changed with the --enable-fast option
passed to configure. For example, to build an MPICH environment with
-O3 for all language bindings, one can simply do:

  ./configure --enable-fast=O3

Or to disable all compiler optimizations, one can do:

  ./configure --disable-fast

For more details of --enable-fast, see the output of "configure
--help".

For performance testing, we recommend the following flags:

  ./configure --enable-fast=O3,ndebug --disable-error-checking --without-timing \
              --without-mpit-pvars


Examples
--------

Example 1:

  ./configure --disable-fast MPICHLIB_CFLAGS=-O3 MPICHLIB_FFLAGS=-O3 \
        MPICHLIB_CXXFLAGS=-O3 MPICHLIB_FCFLAGS=-O3

This will cause the MPICH libraries to be built with -O3, and -O3
will *not* be included in mpicc and the other MPI wrapper scripts.

Example 2:

  ./configure --disable-fast CFLAGS=-O3 FFLAGS=-O3 CXXFLAGS=-O3 FCFLAGS=-O3

This will cause the MPICH libraries to be built with -O3, and -O3
will be included in mpicc and the other MPI wrapper scripts.

-------------------------------------------------------------------------

4. Alternate Channels and Devices
=================================

The communication mechanisms in MPICH are called "devices". MPICH
supports ch3 (default) and ch4 (experimental), as well as many
third-party devices that are released and maintained by other
institutes.

*************************************

ch3 device
**********
The ch3 device contains different internal communication options
called "channels". We currently support the nemesis (default) and
sock channels.

nemesis channel
---------------
Nemesis provides communication using different networks (tcp, mx) as
well as various shared-memory optimizations. To configure MPICH with
nemesis, you can use the following configure option:

  --with-device=ch3:nemesis

Shared-memory optimizations are enabled by default to improve
performance for multi-processor/multi-core platforms. They can be
disabled (at the cost of performance) either by setting the
environment variable MPICH_NO_LOCAL to 1, or using the following
configure option:

  --enable-nemesis-dbg-nolocal

The --with-shared-memory= configure option allows you to choose how
Nemesis allocates shared memory. The options are "auto", "sysv", and
"mmap". Using "sysv" will allocate shared memory using the System V
shmget(), shmat(), etc. functions. Using "mmap" will allocate shared
memory by creating a file (in /dev/shm if it exists, otherwise /tmp)
and then calling mmap() on the file. The default is "auto". Note that
System V shared memory has limits on the size of shared memory
segments, so using this for Nemesis may limit the number of processes
that can be started on a single node.

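To see which directory the "mmap" method would use on a given node,
the rule stated above (/dev/shm if it exists, otherwise /tmp) can be
checked from the shell. This is only a sketch of that rule as
described here, not a query of MPICH itself; the function name
"shm_backing_dir" is hypothetical.

```shell
#!/bin/sh
# shm_backing_dir: print the directory in which an mmap-backed shared
# memory file would be created, per the rule described above.
# (Hypothetical helper for illustration; it does not ask MPICH.)
shm_backing_dir() {
    if [ -d /dev/shm ]; then
        echo /dev/shm
    else
        echo /tmp
    fi
}

echo "mmap-backed shared memory would use: $(shm_backing_dir)"

# To disable the shared-memory optimizations for a single run instead
# of at configure time, the environment variable named above can be
# set on the command line (shown as a comment, since it needs an
# installed MPICH):
#   MPICH_NO_LOCAL=1 mpiexec -n 2 ./examples/cpi
```
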
mxm network module
``````````````````
The mxm netmod provides support for Mellanox InfiniBand adapters. It
can be built with the following configure option:

  --with-device=ch3:nemesis:mxm

If your MXM library is installed in a non-standard location, you might
need to help configure find it using the following configure option
(assuming the libraries are present in /path/to/mxm/lib and the
include headers are present in /path/to/mxm/include):

  --with-mxm=/path/to/mxm

(or)

  --with-mxm-lib=/path/to/mxm/lib
  --with-mxm-include=/path/to/mxm/include

By default, the mxm library emits warnings when the system has not
enabled certain features, because their absence might hurt
performance. These are important warnings that might indicate
performance degradation on your system, but you might need root
privileges to fix some of them. If you would like to disable such
warnings, you can set the MXM log level to "error" instead of the
default "warn" by using:

  MXM_LOG_LEVEL=error
  export MXM_LOG_LEVEL


portals4 network module
```````````````````````
The portals4 netmod provides support for the Portals 4 network
programming interface. To enable, configure with the following option:

  --with-device=ch3:nemesis:portals4

If the Portals 4 include files and libraries are not in the normal
search paths, you can specify them with the following options:

  --with-portals4-include= and --with-portals4-lib=

... or, if lib/ and include/ are in the same directory, you can use
the following option:

  --with-portals4=

If the Portals libraries are shared libraries, they need to be in the
shared library search path. This can be done by adding the path to
/etc/ld.so.conf, or by setting the LD_LIBRARY_PATH variable in your
environment. It's also possible to set the shared library search path
in the binary. If you're using gcc, you can do this by adding

  LD_LIBRARY_PATH=/path/to/lib

  (and)

  LDFLAGS="-Wl,-rpath -Wl,/path/to/lib"

... as arguments to configure.

Currently, MPI_ANY_SOURCE and MPI dynamic processes are unsupported
with the portals4 netmod.

ofi network module
|
|
Packit Service |
c5cf8c |
```````````````````
|
|
Packit Service |
c5cf8c |
The ofi netmod provides support for the OFI network programming interface.
|
|
Packit Service |
c5cf8c |
To enable, configure with the following option:
|
|
Packit Service |
c5cf8c |
|
|
Packit Service |
c5cf8c |
--with-device=ch3:nemesis:ofi
|
|
Packit Service |
c5cf8c |
|
|
Packit Service |
c5cf8c |
If the OFI include files and libraries are not in the normal search paths,
|
|
Packit Service |
c5cf8c |
you can specify them with the following options:
|
|
Packit Service |
c5cf8c |
|
|
Packit Service |
c5cf8c |
--with-ofi-include= and --with-ofi-lib=
|
|
Packit Service |
c5cf8c |
|
|
Packit Service |
c5cf8c |
... or the if lib/ and include/ are in the same directory, you can use
|
|
Packit Service |
c5cf8c |
the following option:
|
|
Packit Service |
c5cf8c |
|
|
Packit Service |
c5cf8c |
--with-ofi=
|
|
Packit Service |
c5cf8c |
|
|
Packit Service |
c5cf8c |
If the OFI libraries are shared libraries, they need to be in the
|
|
Packit Service |
c5cf8c |
shared library search path. This can be done by adding the path to
|
|
Packit Service |
c5cf8c |
/etc/ld.so.conf, or by setting the LD_LIBRARY_PATH variable in your
|
|
Packit Service |
c5cf8c |
environment. It's also possible to set the shared library search path
|
|
Packit Service |
c5cf8c |
in the binary. If you're using gcc, you can do this by adding
|
|
Packit Service |
c5cf8c |
|
|
Packit Service |
c5cf8c |
LD_LIBRARY_PATH=/path/to/lib
|
|
Packit Service |
c5cf8c |
|
|
Packit Service |
c5cf8c |
(and)
|
|
Packit Service |
c5cf8c |
|
|
Packit Service |
c5cf8c |
LDFLAGS="-Wl,-rpath -Wl,/path/to/lib"
|
|
Packit Service |
c5cf8c |
|
|
Packit Service |
c5cf8c |
... as arguments to configure.
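
As a rough sketch of how these options fit together, the fragment below
assembles one configure command for an OFI installation that keeps lib/
and include/ under a single prefix. The prefix /opt/ofi is a made-up
example, and the command is only printed (a dry run); run the printed
line yourself once the path matches your system.

```shell
# Hypothetical OFI install prefix with lib/ and include/ beneath it.
OFI_DIR=/opt/ofi

# Options taken from the text above: device string, OFI prefix, and the
# gcc rpath flags that record the library directory in the binaries.
CONFIGURE_CMD="./configure --with-device=ch3:nemesis:ofi --with-ofi=${OFI_DIR}"
CONFIGURE_CMD="${CONFIGURE_CMD} LD_LIBRARY_PATH=${OFI_DIR}/lib"
CONFIGURE_CMD="${CONFIGURE_CMD} LDFLAGS=\"-Wl,-rpath -Wl,${OFI_DIR}/lib\""

# Dry run: show the command instead of executing it.
echo "${CONFIGURE_CMD}"
```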


sock channel
------------
sock is the traditional TCP-sockets-based communication channel. It
uses TCP/IP sockets for all communication, including intra-node
communication. As a result, although this channel performs worse
than nemesis, it should work on almost every platform. This
channel can be configured using the following option:

    --with-device=ch3:sock


ch4 device
**********
The ch4 device contains different network and shared memory modules
for communication. We currently support the ofi and ucx network
modules, and the posix shared memory module.

ofi network module
``````````````````
The ofi netmod provides support for the OFI network programming interface.
To enable, configure with the following option:

    --with-device=ch4:ofi

If the OFI include files and libraries are not in the normal search paths,
you can specify them with the following options:

    --with-libfabric-include= and --with-libfabric-lib=

... or, if lib/ and include/ are in the same directory, you can use
the following option:

    --with-libfabric=

ucx network module
``````````````````
The ucx netmod provides support for the Unified Communication X
library. It can be built with the following configure option:

    --with-device=ch4:ucx

If the UCX include files and libraries are not in the normal search paths,
you can specify them with the following options:

    --with-ucx-include= and --with-ucx-lib=

... or, if lib/ and include/ are in the same directory, you can use
the following option:

    --with-ucx=

By default, the UCX library emits warnings when the system does not
enable certain features, and the lack of those features might hurt
performance. These warnings are important because they point to
settings that might degrade performance on your system, but you might
need root privileges to fix some of them. If you would like to disable
such warnings, you can set the UCX log level to "error" instead of the
default "warn" by using:

    UCX_LOG_LEVEL=error
    export UCX_LOG_LEVEL
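
If the quieter log level is wanted for a single run rather than the
whole session, the standard shell form VAR=value command scopes the
setting to that one command. A minimal sketch, in which sh -c stands in
for the real launch line (for example mpiexec -n 4 ./app, where ./app is
a placeholder program name):

```shell
# Apply UCX_LOG_LEVEL=error to one command only; the surrounding shell
# keeps whatever value (or no value) it had before.
LEVEL_SEEN=$(UCX_LOG_LEVEL=error sh -c 'echo "${UCX_LOG_LEVEL}"')

echo "level seen by the command: ${LEVEL_SEEN}"
```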

-------------------------------------------------------------------------

5. Alternate Process Managers
=============================

hydra
-----
Hydra is the default process management framework that uses existing
daemons on nodes (e.g., ssh, pbs, slurm, sge) to start MPI
processes. More information on Hydra can be found at
http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager

gforker
-------
gforker is a process manager that creates processes on a single
machine by having mpiexec fork and exec them directly. gforker is
intended mainly as a research platform and for debugging, since it
supports only single-node systems.

slurm
-----
SLURM is an external process manager not distributed with
MPICH. MPICH's default process manager, hydra, has native support
for slurm and you can directly use it in slurm environments (it will
automatically detect slurm and use slurm capabilities). However, if
you want to use the slurm-provided "srun" process manager, you can use
the "--with-pmi=slurm --with-pm=no" options with configure. Note that
the "srun" process manager that comes with slurm uses an older PMI
standard which does not have some of the performance enhancements that
hydra provides in slurm environments.
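
For illustration, a build that relies on srun might look like the sketch
below. The task count and the program name ./app are placeholders, and
both commands are only printed, so the fragment is safe to run anywhere.

```shell
# Configure MPICH to use slurm's PMI and to build no process manager.
CONFIGURE_CMD="./configure --with-pmi=slurm --with-pm=no"

# Launch with srun in place of mpiexec (4 tasks; ./app is a placeholder).
LAUNCH_CMD="srun -n 4 ./app"

# Dry run: print both commands instead of executing them.
printf '%s\n' "${CONFIGURE_CMD}" "${LAUNCH_CMD}"
```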

-------------------------------------------------------------------------

6. Alternate Configure Options
==============================

MPICH has a number of other features. If you are exploring MPICH as
part of a development project, you might want to tweak the MPICH
build with the following configure options. A complete list of
configuration options can be found using:

    ./configure --help

-------------------------------------------------------------------------

7. Testing the MPICH installation
=================================

The MPICH test suite is packaged with the MPICH
distribution. You can run the test suite using:

    make testing

The results summary will be placed in test/summary.xml.

-------------------------------------------------------------------------

8. Fault Tolerance
==================

MPICH has some tolerance to process failures, and supports
checkpointing and restart.

Tolerance to Process Failures
-----------------------------

The features described in this section should be considered
experimental, which means that they have not been fully tested and
their behavior may change in future releases. The notes below are
guidelines on what to expect from these features:

 - ERROR RETURNS: Communication failures in MPICH are not fatal
   errors. This means that if the user sets the error handler to
   MPI_ERRORS_RETURN, MPICH will return an appropriate error code in
   the event of a communication failure. When a process detects a
   failure when communicating with another process, it will consider
   the other process as having failed and will no longer attempt to
   communicate with that process. The user can, however, continue
   making communication calls to other processes. Any outstanding
   send or receive operations to a failed process, or wildcard
   receives (i.e., with MPI_ANY_SOURCE) posted to communicators with a
   failed process, will be immediately completed with an appropriate
   error code.

 - COLLECTIVES: For collective operations performed on communicators
   with a failed process, the collective returns an error on
   some, but not necessarily all, processes. A collective call
   returning MPI_SUCCESS on a given process means that the part of the
   collective performed by that process has been successful.

 - PROCESS MANAGER: When MPICH is used with the hydra process manager,
   hydra detects failed processes and notifies the MPICH library. Users
   can query the list of failed processes using MPIX_Comm_group_failed().
   This function returns a group consisting of the failed processes
   in the communicator. The function MPIX_Comm_remote_group_failed()
   is provided for querying failed processes in the remote processes
   of an intercommunicator.

   Note that hydra by default will abort the entire application when
   any process terminates before calling MPI_Finalize. In order to
   allow an application to continue running despite failed processes,
   you will need to pass the -disable-auto-cleanup option to mpiexec.

 - FAILURE NOTIFICATION: THIS IS AN UNSUPPORTED FEATURE AND WILL
   ALMOST CERTAINLY CHANGE IN THE FUTURE!

   In the current release, hydra notifies the MPICH library of failed
   processes by sending a SIGUSR1 signal. The application can catch
   this signal to be notified of failed processes. If the application
   replaces the library's signal handler with its own, the application
   must be sure to call the library's handler from its own
   handler. Note that you cannot call any MPI function from inside a
   signal handler.

Checkpoint and Restart
----------------------

MPICH supports checkpointing and restart fault-tolerance using BLCR.

CONFIGURATION

First, you need to have BLCR version 0.8.2 or later installed on your
machine. If it's installed in the default system location, you don't
need to do anything.

If BLCR is not installed in the default system location, you'll need
to tell MPICH's configure where to find it. You might also need to
set the LD_LIBRARY_PATH environment variable so that BLCR's shared
libraries can be found. In this case, add the following options to
your configure command:

    --with-blcr=<BLCR_INSTALL_DIR>
    LD_LIBRARY_PATH=<BLCR_INSTALL_DIR>/lib

where <BLCR_INSTALL_DIR> is the directory where BLCR has been
installed (whatever was specified in --prefix when BLCR was
configured).

After it's configured, compile as usual (e.g., make; make install).

Note that checkpointing is only supported with the Hydra process manager.


VERIFYING CHECKPOINTING SUPPORT

Make sure MPICH is correctly configured with BLCR. You can do this
using:

    mpiexec -info

This should display 'BLCR' under 'Checkpointing libraries available'.


CHECKPOINTING THE APPLICATION

There are two ways to cause the application to checkpoint. You can ask
mpiexec to periodically checkpoint the application using the mpiexec
option -ckpoint-interval (seconds):

    mpiexec -ckpointlib blcr -ckpoint-prefix /tmp/app.ckpoint \
        -ckpoint-interval 3600 -f hosts -n 4 ./app

Alternatively, you can manually force a checkpoint by sending a
SIGUSR1 signal to mpiexec.

The checkpoint/restart parameters can also be controlled with the
environment variables HYDRA_CKPOINTLIB, HYDRA_CKPOINT_PREFIX and
HYDRA_CKPOINT_INTERVAL.
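
As a sketch of the environment-variable route, the fragment below sets
the same values used in the mpiexec example above and wraps the manual
trigger in a small helper. The helper name request_checkpoint is made up
for illustration; its argument is the process ID of the running mpiexec.

```shell
# Same settings as the -ckpointlib, -ckpoint-prefix and -ckpoint-interval
# options, supplied through the environment instead.
export HYDRA_CKPOINTLIB=blcr
export HYDRA_CKPOINT_PREFIX=/tmp/app.ckpoint
export HYDRA_CKPOINT_INTERVAL=3600

# Hypothetical helper: force a checkpoint by signalling mpiexec.
# $1 is the PID of the mpiexec process.
request_checkpoint() {
    kill -USR1 "$1"
}
```

With these variables exported, the launch line should reduce to
"mpiexec -f hosts -n 4 ./app", and "request_checkpoint <pid>" asks for an
extra checkpoint between intervals.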

To restart a process:

    mpiexec -ckpointlib blcr -ckpoint-prefix /tmp/app.ckpoint -f hosts -n 4 -ckpoint-num <N>

where <N> is the checkpoint number you want to restart from.

These instructions can also be found on the MPICH wiki:

    http://wiki.mpich.org/mpich/index.php/Checkpointing

-------------------------------------------------------------------------

9. Developer Builds
===================
For MPICH developers who want to directly work on the primary version
control system, there are a few additional steps involved (people
using the release tarballs do not have to follow these steps). Details
about these steps can be found here:
http://wiki.mpich.org/mpich/index.php/Getting_And_Building_MPICH

-------------------------------------------------------------------------

10. Multiple Fortran compiler support
=====================================

If the C compiler that is used to build MPICH libraries supports both
multiple weak symbols and multiple aliases of common symbols, the
Fortran binding can support multiple Fortran compilers. The
multiple weak symbols support allows MPICH to provide the different name
mangling schemes (of subroutine names) required by different Fortran
compilers. The multiple aliases of common symbols support enables
MPICH to equate the different common block symbols of the MPI Fortran
constants, e.g. MPI_IN_PLACE and MPI_STATUS_IGNORE, so that they are
understood by different Fortran compilers.

Since the support of multiple aliases of common symbols is
new/experimental, users can disable the feature by using the configure
option --disable-multi-aliases if it causes any undesirable effect,
e.g. linker warnings of different sizes of common symbols, MPIFCMB*
(the warning should be harmless).

We have only tested this support on a limited set of
platforms/compilers. On Linux, if the C compiler that builds MPICH is
either gcc or icc, the above support will be enabled by configure. At
the time of this writing, pgcc does not seem to support multiple
aliases of common symbols, so configure will detect the deficiency and
disable the feature automatically. The tested Fortran compilers
include the GNU Fortran compiler (gfortran), Intel Fortran compiler
(ifort), Portland Group Fortran compiler (pgfortran), Absoft Fortran
compiler (af90), and IBM XL Fortran compiler (xlf). What this means
is that if MPICH is built by gcc/gfortran, the resulting MPICH library
can be used to link a Fortran program compiled/linked by another
Fortran compiler, say pgf90, through mpifort -fc=pgf90. As long
as the Fortran program is linked without any errors by one of these
compilers, the program should run fine.

-------------------------------------------------------------------------

11. ABI Compatibility
=====================

The MPICH ABI compatibility initiative was announced at SC 2014
(http://www.mpich.org/abi). As a part of this initiative, Argonne,
Intel, IBM and Cray have committed to maintaining ABI compatibility
with each other.

As a first step in this initiative, starting with version 3.1, MPICH
is binary (ABI) compatible with Intel MPI 5.0. This means you can
build your program with one MPI implementation and run with the other.
Specifically, binary-only applications that were built and distributed
with one of these MPI implementations can now be executed with the
other MPI implementation.

Some setup is required to achieve this. Suppose you have MPICH
installed in /path/to/mpich and Intel MPI installed in /path/to/impi.

You can run your application with MPICH using:

    % export LD_LIBRARY_PATH=/path/to/mpich/lib:$LD_LIBRARY_PATH
    % mpiexec -np 100 ./foo

or with Intel MPI using:

    % export LD_LIBRARY_PATH=/path/to/impi/lib:$LD_LIBRARY_PATH
    % mpiexec -np 100 ./foo

This works irrespective of which MPI implementation your application
was compiled with, as long as you use one of the MPI implementations
in the ABI compatibility initiative.
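
One wrinkle in the export lines above: when LD_LIBRARY_PATH starts out
empty, the result ends in a trailing colon, and the dynamic linker
treats an empty entry as the current directory. A minimal sketch of a
helper that avoids this follows. The function name use_mpi is made up
for illustration, and the paths are the placeholder paths from the text.

```shell
# Hypothetical helper: put one MPI installation's lib directory first in
# the shared library search path. $1 is the install prefix.
use_mpi() {
    # ${VAR:+...} appends ":old value" only when VAR is set and non-empty.
    LD_LIBRARY_PATH="$1/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
    export LD_LIBRARY_PATH
}

use_mpi /path/to/mpich
echo "${LD_LIBRARY_PATH}"
```

On Linux, running "ldd ./foo" afterwards is one way to check which MPI
shared library the binary will pick up.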

-------------------------------------------------------------------------

12. Capability Sets
===================

The CH4 device contains a feature called "capability sets" to simplify
configuration of MPICH on systems using the OFI netmod. This feature
configures MPICH to use a predetermined set of OFI features based on the
provider being used. Capability sets can be configured at compile time or
runtime. Compile time configuration provides better performance by
reducing unnecessary code branches, but at the cost of flexibility.

To configure at compile time, the device string should be amended to include
the OFI provider with the following option:

    --with-device=ch4:ofi:sockets

This will set up the OFI netmod to use the optimal configuration for the
sockets provider, and will set various compile time constants. These settings
cannot be changed at runtime.

If runtime configuration is needed, continue to use the device string as before
(without the OFI provider extension) and set various environment variables to
achieve a similar configuration. To select the desired provider:

    % export MPIR_CVAR_OFI_USE_PROVIDER=sockets

This will select the OFI provider and the associated MPICH capability set. To
change the preset configuration, there exists an extended set of environment
variables. As an example, the immediate data fields can be disabled by using
the environment variable:

    % export MPIR_CVAR_OFI_ENABLE_DATA=0

A full list of capability set configuration variables can be found in the
environment variables file, README.envvar.
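
To make the two styles concrete, the sketch below places the
compile-time and runtime variants side by side. Nothing is built: the
configure lines are only printed. The device strings and variables are
the ones named above.

```shell
# Compile-time capability set: the provider is fixed in the device string.
STATIC_CONFIGURE="./configure --with-device=ch4:ofi:sockets"

# Runtime capability set: generic device string at build time ...
RUNTIME_CONFIGURE="./configure --with-device=ch4:ofi"

# ... with the provider and one preset override chosen in the environment.
MPIR_CVAR_OFI_USE_PROVIDER=sockets
MPIR_CVAR_OFI_ENABLE_DATA=0
export MPIR_CVAR_OFI_USE_PROVIDER MPIR_CVAR_OFI_ENABLE_DATA

# Dry run: print the build commands instead of executing them.
printf '%s\n' "${STATIC_CONFIGURE}" "${RUNTIME_CONFIGURE}"
```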

-------------------------------------------------------------------------

13. Threads
===========

MPICH supports multiple threading packages. The default is posix
threads (pthreads), but solaris threads, windows threads and argobots
are also supported.

To configure MPICH to work with argobots threads, use the following
configure options:

    --with-thread-package=argobots \
        CFLAGS="-I<path_to_argobots/include>" \
        LDFLAGS="-L<path_to_argobots/lib>"
|