MPICH Release 3.2.1

MPICH is a high-performance and widely portable implementation of the
MPI-3.1 standard from the Argonne National Laboratory. This release
has all MPI-3.1 functions and features required by the standard with
the exception of support for the "external32" portable I/O format and
user-defined data representations for I/O.

This README file should contain enough information to get you started
with MPICH. More extensive installation and user guides can be found
in the doc/installguide/install.pdf and doc/userguide/user.pdf files,
respectively. Additional information regarding the contents of the
release can be found in the CHANGES file in the top-level directory,
and in the RELEASE_NOTES file, where certain restrictions are
detailed. Finally, the MPICH web site, http://www.mpich.org, contains
information on bug fixes and new releases.

1. Getting Started
2. Reporting Installation or Usage Problems
3. Compiler Flags
4. Alternate Channels and Devices
5. Alternate Process Managers
6. Alternate Configure Options
7. Testing the MPICH installation
8. Fault Tolerance
9. Developer Builds
10. Multiple Fortran compiler support
11. ABI Compatibility

-------------------------------------------------------------------------

1. Getting Started
==================

The following instructions take you through a sequence of steps to get
the default configuration of MPICH (ch3 device, nemesis channel with
TCP and shared memory, and Hydra process management) up and running.

(a) You will need the following prerequisites.

    - REQUIRED: This tar file mpich-3.2.1.tar.gz

    - REQUIRED: A C compiler (gcc is sufficient)

    - OPTIONAL: A C++ compiler, if C++ applications are to be used
      (g++, etc.). If you do not require support for C++ applications,
      you can disable this support using the configure option
      --disable-cxx (configuring MPICH is described in step 1(d)
      below).

    - OPTIONAL: A Fortran compiler, if Fortran applications are to be
      used (gfortran, ifort, etc.). If you do not require support for
      Fortran applications, you can disable this support using
      --disable-fortran (configuring MPICH is described in step 1(d)
      below).

    Also, you need to know which shell you are using, since different
    shells have different command syntax. The command "echo $SHELL"
    prints the current shell used by your terminal program.
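
Because the steps below give separate commands for Bourne-like and
csh-like shells, it can help to map the output of "echo $SHELL" to the
matching redirection syntax. A minimal POSIX sh sketch; pipe_syntax_for
is a hypothetical helper (not part of MPICH), and the list of shell
names it recognizes is an assumption:

```shell
# Hypothetical helper (not part of MPICH): print the syntax this README
# uses to send both stdout and stderr through a pipe, based on the
# basename of a shell path such as the value of $SHELL.
pipe_syntax_for() {
    case "${1##*/}" in
        csh|tcsh)
            echo '|&' ;;                  # csh-like shells
        sh|bash|dash|ksh|zsh)
            echo '2>&1 |' ;;              # Bourne-like shells
        *)
            echo 'unknown' ;;             # check your shell's manual
    esac
}
```

For example, "pipe_syntax_for $SHELL" prints "2>&1 |" when your login
shell is /bin/bash, and "|&" when it is /bin/tcsh.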

(b) Unpack the tar file and go to the top-level directory:

        tar xzf mpich-3.2.1.tar.gz
        cd mpich-3.2.1

    If your tar doesn't accept the z option, use:

        gunzip mpich-3.2.1.tar.gz
        tar xf mpich-3.2.1.tar
        cd mpich-3.2.1

(c) Choose an installation directory, say
    /home/<USERNAME>/mpich-install, which is assumed to be non-existent
    or empty. It will be most convenient if this directory is shared
    by all of the machines where you intend to run processes. If not,
    you will have to duplicate it on the other machines after
    installation.

(d) Configure MPICH, specifying the installation directory:

    for csh and tcsh:

        ./configure --prefix=/home/<USERNAME>/mpich-install |& tee c.txt

    for bash and sh:

        ./configure --prefix=/home/<USERNAME>/mpich-install 2>&1 | tee c.txt

    Bourne-like shells, sh and bash, accept "2>&1 |". Csh-like shells,
    csh and tcsh, accept "|&". If a failure occurs, the configure
    command will display the error. Most errors are straightforward
    to follow. For example, if the configure command fails with:

        "No Fortran compiler found. If you don't need to build any
        Fortran programs, you can disable Fortran support using
        --disable-fortran. If you do want to build Fortran programs,
        you need to install a Fortran compiler such as gfortran or
        ifort before you can proceed."

    ... it means that you don't have a Fortran compiler :-). You will
    need to either install one, or disable Fortran support in MPICH.

    If you are unable to understand what went wrong, please go to
    section 2 below to report the issue to the MPICH developers and
    other users.

(e) Build MPICH:

    for csh and tcsh:

        make |& tee m.txt

    for bash and sh:

        make 2>&1 | tee m.txt

    This step should succeed if there were no problems with the
    preceding step. Check the file m.txt. If there were problems, do a
    "make clean" and then run make again with V=1:

        make V=1 |& tee m.txt        (for csh and tcsh)

      OR

        make V=1 2>&1 | tee m.txt    (for bash and sh)

    Then go to section 2 below to report the issue to the MPICH
    developers and other users.

(f) Install the MPICH commands:

    for csh and tcsh:

        make install |& tee mi.txt

    for bash and sh:

        make install 2>&1 | tee mi.txt

    This step collects all required executables and scripts in the bin
    subdirectory of the directory specified by the prefix argument to
    configure.
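
Steps 1(d) to 1(f) all pipe their output through tee. When scripting
them in sh or bash, note that the exit status of "cmd 2>&1 | tee log"
is the status of tee, not of cmd, so a failed step can go unnoticed. A
minimal POSIX sh sketch that keeps the log but returns the command's
own status; run_logged is a hypothetical helper, not part of MPICH:

```shell
# Hypothetical helper (not part of MPICH): run a command, copy its stdout
# and stderr to the terminal and to a log file via tee, and return the
# command's own exit status instead of tee's.
run_logged() {
    log=$1
    shift
    status_file=$(mktemp) || return 1
    # The brace group records the command's status in a temporary file,
    # because the status of a pipeline is that of its last command (tee).
    { "$@" 2>&1; echo "$?" > "$status_file"; } | tee "$log"
    status=$(cat "$status_file")
    rm -f "$status_file"
    return "$status"
}
```

Under those assumptions, the three steps could be chained so that the
sequence stops at the first failing step:

    run_logged c.txt ./configure --prefix=/home/<USERNAME>/mpich-install &&
    run_logged m.txt make &&
    run_logged mi.txt make install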

(g) Add the bin subdirectory of the installation directory to your
    path in your startup script (.bashrc for bash, .cshrc for csh,
    etc.):

    for csh and tcsh:

        setenv PATH /home/<USERNAME>/mpich-install/bin:$PATH

    for bash and sh:

        PATH=/home/<USERNAME>/mpich-install/bin:$PATH ; export PATH

    Check that everything is in order at this point by doing:

        which mpicc
        which mpiexec

    These commands should display the path to the bin subdirectory of
    your install directory.

    IMPORTANT NOTE: The install directory has to be visible at exactly
    the same path on all machines you want to run your applications
    on. This is typically achieved by installing MPICH on a shared
    NFS file-system. If you do not have a shared NFS directory, you
    will need to manually copy the install directory to all machines
    at exactly the same location.
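
The "which" check in step (g) can also be scripted, for example at the
top of a job script on each machine. A minimal POSIX sh sketch;
check_mpich_path is a hypothetical helper, the prefix argument is
assumed to be exactly what you passed to --prefix (without a trailing
slash), and it uses "command -v", the POSIX way to locate a command,
instead of which:

```shell
# Hypothetical helper (not part of MPICH): succeed only if the mpicc and
# mpiexec found first on PATH both live in PREFIX/bin.
check_mpich_path() {
    prefix=$1
    for tool in mpicc mpiexec; do
        found=$(command -v "$tool") || {
            echo "$tool: not found in PATH" >&2
            return 1
        }
        if [ "$found" != "$prefix/bin/$tool" ]; then
            echo "$tool resolves to $found, expected $prefix/bin/$tool" >&2
            return 1
        fi
    done
    echo "mpicc and mpiexec found in $prefix/bin"
}
```

For example:

    check_mpich_path /home/<USERNAME>/mpich-install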

(h) MPICH uses a process manager for starting MPI applications. The
    process manager provides the "mpiexec" executable, together with
    other utility executables. MPICH comes packaged with multiple
    process managers; the default is called Hydra.

    Now we will run an MPI job, using the mpiexec command as specified
    in the MPI standard. There are some examples in the install
    directory, whose bin subdirectory you have already put in your
    path, as well as in the directory mpich-3.2.1/examples. One of them
    is the classic CPI example, which computes the value of pi by
    numerical integration in parallel.

    To run the CPI example with 'n' processes on your local machine,
    you can use:

        mpiexec -n <number> ./examples/cpi

    Test that you can run an n-process CPI job on multiple nodes:

        mpiexec -f machinefile -n <number> ./examples/cpi

    The 'machinefile' is of the form:

        host1
        host2:2
        host3:4    # Random comments
        host4:1

    'host1', 'host2', 'host3' and 'host4' are the hostnames of the
    machines you want to run the job on. The ':2', ':4', ':1' segments
    give the number of processes you want to run on each node. If
    nothing is specified, ':1' is assumed.

    More details on interacting with Hydra can be found at
    http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager
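
When choosing the value for -n in the multi-node example of step (h),
it can help to add up the process counts in the machinefile. A minimal
sketch in POSIX sh and awk that follows only the format described there
(one host per line, an optional ':count' that defaults to 1, and '#'
starting a comment); count_slots is a hypothetical helper, not part of
Hydra, and it does not validate host names:

```shell
# Hypothetical helper (not part of MPICH or Hydra): print the total number
# of process slots in a machinefile of the "host[:count]" form, treating a
# missing count as 1 and ignoring "#" comments and blank lines.
count_slots() {
    awk '
        { sub(/#.*/, "") }                  # strip comments (re-splits fields)
        NF == 0 { next }                    # skip blank lines
        {
            n = split($1, parts, ":")       # "host" or "host:count"
            total += (n > 1) ? parts[2] : 1
        }
        END { print total + 0 }
    ' "$1"
}
```

With the example machinefile from step (h), count_slots reports 8, so
the two could be combined as:

    mpiexec -f machinefile -n "$(count_slots machinefile)" ./examples/cpi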

If you have completed all of the above steps, you have successfully
installed MPICH and run an MPI example.

-------------------------------------------------------------------------

2. Reporting Installation or Usage Problems
===========================================

[VERY IMPORTANT: PLEASE COMPRESS ALL FILES BEFORE SENDING THEM TO
US. DO NOT SPAM THE MAILING LIST WITH LARGE ATTACHMENTS.]

The distribution has been tested by us on a variety of machines in our
environments as well as at our partner institutes. If you have problems
with the installation or usage of MPICH, please follow these steps:

1. First see the Frequently Asked Questions (FAQ) page at
   http://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions to
   see if the problem you are facing has a simple solution. Many common
   problems and their solutions are listed there.

2. If you cannot find an answer on the FAQ page, look through previous
   email threads on the discuss@mpich.org mailing list archive
   (https://lists.mpich.org/mailman/listinfo/discuss). It is likely
   that someone else has had a similar problem that has already been
   resolved.

3. If neither of the above steps works, please send an email to
   discuss@mpich.org. You need to subscribe to this list
   (https://lists.mpich.org/mailman/listinfo/discuss) before sending an
   email.

   Your email should contain the following files. ONCE AGAIN, PLEASE
   COMPRESS BEFORE SENDING, AS THE FILES CAN BE LARGE. Note that,
   depending on the step at which the build failed, some of the files
   might not exist.

       mpich-3.2.1/c.txt (generated in step 1(d) above)
       mpich-3.2.1/m.txt (generated in step 1(e) above)
       mpich-3.2.1/mi.txt (generated in step 1(f) above)
       mpich-3.2.1/config.log (generated in step 1(d) above)
       mpich-3.2.1/src/openpa/config.log (generated in step 1(d) above)
       mpich-3.2.1/src/mpl/config.log (generated in step 1(d) above)
       mpich-3.2.1/src/pm/hydra/config.log (generated in step 1(d) above)
       mpich-3.2.1/src/pm/hydra/tools/topo/hwloc/hwloc/config.log (generated in step 1(d) above)

   DID WE MENTION? DO NOT FORGET TO COMPRESS THESE FILES!

   If you have compiled MPICH and are having trouble running an
   application, please provide the output of the following command in
   your email:

       mpiexec -info

   Finally, please include the actual error you are seeing when running
   the application, including the mpiexec command used, and the host
   file. If possible, please try to reproduce the error with a smaller
   application or benchmark and send that along in your bug report.

4. If you have found a bug in MPICH, we request that you report it at
   our bug tracking system
   (https://trac.mpich.org/projects/mpich/newticket). Even if you believe
   you have found a bug, we recommend that you send an email to
   discuss@mpich.org first.
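
For the compression requested in step 3, a minimal POSIX sh sketch that
packs whichever of the listed log files exist into a single attachment.
bundle_mpich_logs is a hypothetical helper, not part of MPICH; it
assumes it is run from the directory that contains mpich-3.2.1/:

```shell
# Hypothetical helper (not part of MPICH): pack whichever of the log files
# listed in step 3 exist into one gzip-compressed tar archive.
#   $1: name of the archive to create, e.g. mpich-logs.tar.gz
#   $2: MPICH source directory (defaults to mpich-3.2.1)
bundle_mpich_logs() {
    out=$1
    src=${2:-mpich-3.2.1}
    set --                              # reuse "$@" as the file list
    for f in c.txt m.txt mi.txt config.log \
             src/openpa/config.log src/mpl/config.log \
             src/pm/hydra/config.log \
             src/pm/hydra/tools/topo/hwloc/hwloc/config.log; do
        # Files from steps that never ran do not exist; skip them.
        if [ -f "$src/$f" ]; then
            set -- "$@" "$src/$f"
        fi
    done
    if [ "$#" -eq 0 ]; then
        echo "bundle_mpich_logs: no log files found under $src" >&2
        return 1
    fi
    tar czf "$out" "$@"
}
```

For example, "bundle_mpich_logs mpich-logs.tar.gz" produces one archive
to attach. If your tar does not accept the z option (see step 1(b)),
create the archive with "tar cf" and compress it with gzip instead.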

-------------------------------------------------------------------------

3. Compiler Flags
=================

MPICH allows several sets of compiler flags to be used. The first
three sets are configure-time options for MPICH, while the fourth is
only relevant when compiling applications with mpicc and friends.

(a) CFLAGS, CPPFLAGS, CXXFLAGS, FFLAGS, FCFLAGS, LDFLAGS and LIBS
    (abbreviated as xFLAGS): Setting these flags causes the MPICH
    library to be compiled/linked with these flags, and the flags are
    also used internally by mpicc and friends.

(b) MPICHLIB_CFLAGS, MPICHLIB_CPPFLAGS, MPICHLIB_CXXFLAGS,
    MPICHLIB_FFLAGS, and MPICHLIB_FCFLAGS (abbreviated as
    MPICHLIB_xFLAGS): Setting these flags causes the MPICH library to
    be compiled with these flags. However, these flags will *not* be
    used by mpicc and friends.

(c) MPICH_MAKE_CFLAGS: These flags are not used by MPICH's configure
    tests, but are used by the Makefiles. This is a temporary hack for
    certain cases that advanced developers might be interested in, but
    which break existing configure tests (e.g., -Werror). These are
    NOT recommended for regular users.

(d) MPICH_MPICC_CFLAGS, MPICH_MPICC_CPPFLAGS, MPICH_MPICC_LDFLAGS,
    MPICH_MPICC_LIBS, and so on for MPICXX, MPIF77 and MPIFORT
    (abbreviated as MPICH_MPIX_FLAGS): These flags do *not* affect the
    compilation of the MPICH library itself, but will be internally used
    by mpicc and friends.

+--------------------+----------------------+------------------------+
|                    |    MPICH library     |   mpicc and friends    |
+--------------------+----------------------+------------------------+
| xFLAGS             |         Yes          |          Yes           |
+--------------------+----------------------+------------------------+
| MPICHLIB_xFLAGS    |         Yes          |           No           |
+--------------------+----------------------+------------------------+
| MPICH_MAKE_xFLAGS  |         Yes          |           No           |
+--------------------+----------------------+------------------------+
| MPICH_MPIX_FLAGS   |          No          |          Yes           |
+--------------------+----------------------+------------------------+

All these flags can be set either as arguments to the configure
command or through environment variables.
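
For example, the two invocations below should be equivalent in a
Bourne-like shell: the first passes the variables as configure
arguments, the second through configure's environment. Per the table
above, -O3 here would reach only the MPICH library and -g only mpicc
and friends; the flag values themselves are just illustrations:

```shell
# Flags passed as arguments to configure
./configure --prefix=/home/<USERNAME>/mpich-install \
    MPICHLIB_CFLAGS=-O3 MPICH_MPICC_CFLAGS=-g

# The same flags passed through the environment (sh and bash syntax)
MPICHLIB_CFLAGS=-O3 MPICH_MPICC_CFLAGS=-g \
    ./configure --prefix=/home/<USERNAME>/mpich-install
```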

Default flags
-------------
By default, MPICH automatically adds certain compiler optimizations
to MPICHLIB_CFLAGS. The currently used optimization level is -O2.

** IMPORTANT NOTE: Remember that this only affects the compilation of
the MPICH library and is not used in the wrappers (mpicc and friends)
that are used to compile your applications or other libraries.

This optimization level can be changed with the --enable-fast option
passed to configure. For example, to build an MPICH environment with
-O3 for all language bindings, one can simply do:

    ./configure --enable-fast=O3

Or to disable all compiler optimizations, one can do:

    ./configure --disable-fast

For more details of --enable-fast, see the output of "configure
--help".

For performance testing, we recommend the following flags:

    ./configure --enable-fast=O3,ndebug --disable-error-checking --without-timing \
                --without-mpit-pvars

Examples
--------

Example 1:

    ./configure --disable-fast MPICHLIB_CFLAGS=-O3 MPICHLIB_FFLAGS=-O3 \
                MPICHLIB_CXXFLAGS=-O3 MPICHLIB_FCFLAGS=-O3

This will cause the MPICH libraries to be built with -O3, and -O3
will *not* be included in the mpicc and other MPI wrapper scripts.

Example 2:

    ./configure --disable-fast CFLAGS=-O3 FFLAGS=-O3 CXXFLAGS=-O3 FCFLAGS=-O3

This will cause the MPICH libraries to be built with -O3, and -O3
will be included in the mpicc and other MPI wrapper scripts.

Example 3:

There are certain compiler flags that should not be used with MPICH's
configure, e.g. gcc's -Werror, which would confuse configure and cause
certain configure tests to fail to detect the correct system features.
To use -Werror in building MPICH libraries, you can pass the compiler
flags during the make step through the Makefile variable
MPICH_MAKE_CFLAGS as follows:

    make MPICH_MAKE_CFLAGS="-Wall -Werror"

The content of MPICH_MAKE_CFLAGS is appended to the CFLAGS in all
relevant Makefiles.

-------------------------------------------------------------------------

4. Alternate Channels and Devices
=================================

The communication mechanisms in MPICH are called "devices". MPICH
supports ch3 (default), as well as many third-party devices that are
released and maintained by other institutes.

*************************************

ch3 device
**********
The ch3 device contains different internal communication options
called "channels". We currently support nemesis (default) and sock
channels.

nemesis channel
---------------
Nemesis provides communication using different networks (tcp, mx) as
well as various shared-memory optimizations. To configure MPICH with
nemesis, you can use the following configure option:

    --with-device=ch3:nemesis

The TCP network module is configured in by default.

Shared-memory optimizations are enabled by default to improve
performance for multi-processor/multi-core platforms. They can be
disabled (at the cost of performance) either by setting the
environment variable MPICH_NO_LOCAL to 1, or by using the following
configure option:

    --enable-nemesis-dbg-nolocal
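
Because MPICH_NO_LOCAL is an environment variable, it can be set for a
single run without rebuilding, for example to compare performance with
and without the shared-memory path on one node. A sketch, assuming the
variable must reach the MPI processes; Hydra normally passes the
launching environment on, and its -genv option sets a variable for all
processes explicitly. The process count is only an example:

```shell
# bash and sh: disable the shared-memory optimizations for this run only
MPICH_NO_LOCAL=1 mpiexec -n 4 ./examples/cpi

# Any shell, including csh and tcsh, via env(1)
env MPICH_NO_LOCAL=1 mpiexec -n 4 ./examples/cpi

# Alternatively, ask Hydra to set the variable for every process
mpiexec -genv MPICH_NO_LOCAL 1 -n 4 ./examples/cpi
```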

The --with-shared-memory= configure option allows you to choose how
Nemesis allocates shared memory. The options are "auto", "sysv", and
"mmap". Using "sysv" will allocate shared memory using the System V
shmget(), shmat(), etc. functions. Using "mmap" will allocate shared
memory by creating a file (in /dev/shm if it exists, otherwise /tmp)
and then calling mmap() on it. The default is "auto". Note that
System V shared memory has limits on the size of shared memory
segments, so using it for Nemesis may limit the number of processes
that can be started on a single node.
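
Before choosing "sysv", it may be worth checking the System V limit on
your nodes. A minimal Linux-only sketch in POSIX sh; shm_report is a
hypothetical helper, and /proc/sys/kernel/shmmax (the maximum size of a
single System V segment, in bytes) is a Linux kernel interface rather
than anything MPICH provides:

```shell
# Hypothetical helper (not part of MPICH), Linux only: show where the "mmap"
# option would create its backing file, following the rule described above,
# and the System V per-segment size limit that constrains the "sysv" option.
shm_report() {
    if [ -d /dev/shm ]; then
        echo "mmap backing directory: /dev/shm"
    else
        echo "mmap backing directory: /tmp"
    fi
    if [ -r /proc/sys/kernel/shmmax ]; then
        echo "System V shmmax: $(cat /proc/sys/kernel/shmmax) bytes"
    else
        echo "System V shmmax: not available on this system"
    fi
}
```

If the limit looks too small, the allocation method can be fixed at
configure time, e.g. "./configure --with-shared-memory=mmap".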

mxm network module
``````````````````
The mxm netmod provides support for Mellanox InfiniBand adapters. It
can be built with the following configure option:

    --with-device=ch3:nemesis:mxm

If your MXM library is installed in a non-standard location, you might
need to help configure find it using one of the following configure
options (assuming the libraries are present in /path/to/mxm/lib and
the include headers are present in /path/to/mxm/include):

    --with-mxm=/path/to/mxm

(or)

    --with-mxm-lib=/path/to/mxm/lib
    --with-mxm-include=/path/to/mxm/include

By default, the mxm library prints warnings when the system does not
enable certain features whose absence might hurt performance. These
warnings are important, as they point to conditions that can degrade
performance on your system, but you might need root privileges to fix
some of them. If you would like to disable such warnings, you can set
the MXM log level to "error" instead of the default "warn" by using
(for bash and sh):

    MXM_LOG_LEVEL=error
    export MXM_LOG_LEVEL
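
The commands above use Bourne shell syntax; in csh and tcsh, the
equivalent setting should be:

```shell
# csh and tcsh: lower the MXM log level from "warn" to "error"
setenv MXM_LOG_LEVEL error
```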

portals4 network module
```````````````````````
The portals4 netmod provides support for the Portals 4 network
programming interface. To enable it, configure with the following
option:

    --with-device=ch3:nemesis:portals4

If the Portals 4 include files and libraries are not in the normal
search paths, you can specify them with the following options:

    --with-portals4-include= and --with-portals4-lib=

... or if lib/ and include/ are in the same directory, you can use
the following option:

    --with-portals4=

If the Portals libraries are shared libraries, they need to be in the
shared library search path. This can be done by adding the path to
/etc/ld.so.conf, or by setting the LD_LIBRARY_PATH variable in your
environment. It's also possible to set the shared library search path
in the binary. If you're using gcc, you can do this by adding

    LD_LIBRARY_PATH=/path/to/lib

(and)

    LDFLAGS="-Wl,-rpath -Wl,/path/to/lib"

... as arguments to configure.
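
Putting these options together, a build against a Portals 4
installation in a non-standard location might look like the following;
/path/to/portals4 is a placeholder, and the rpath flags are the gcc
form described above:

```shell
# Portals 4 netmod, with headers and libraries under one placeholder prefix
# and the library directory recorded in the binaries via rpath (gcc syntax)
./configure --with-device=ch3:nemesis:portals4 \
    --with-portals4=/path/to/portals4 \
    LDFLAGS="-Wl,-rpath -Wl,/path/to/portals4/lib"
```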

Currently, MPI_ANY_SOURCE and MPI dynamic processes are not supported
with the portals4 netmod.

ofi network module
``````````````````
The ofi netmod provides support for the OFI network programming
interface. To enable it, configure with the following option:

    --with-device=ch3:nemesis:ofi

If the OFI include files and libraries are not in the normal search
paths, you can specify them with the following options:

    --with-ofi-include= and --with-ofi-lib=

... or if lib/ and include/ are in the same directory, you can use
the following option:

    --with-ofi=

If the OFI libraries are shared libraries, they need to be in the
shared library search path. This can be done by adding the path to
/etc/ld.so.conf, or by setting the LD_LIBRARY_PATH variable in your
environment. It's also possible to set the shared library search path
in the binary. If you're using gcc, you can do this by adding

    LD_LIBRARY_PATH=/path/to/lib

(and)

    LDFLAGS="-Wl,-rpath -Wl,/path/to/lib"

... as arguments to configure.
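
As an illustration of the separate include/lib form, combined with the
LD_LIBRARY_PATH alternative to an rpath, a build might look like the
following; the /path/to/ofi directories are placeholders:

```shell
# OFI netmod, with headers and libraries in separate placeholder directories
./configure --with-device=ch3:nemesis:ofi \
    --with-ofi-include=/path/to/ofi/include \
    --with-ofi-lib=/path/to/ofi/lib

# Without an rpath, make the shared libraries findable at run time
# (bash and sh syntax)
LD_LIBRARY_PATH=/path/to/ofi/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH
```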

sock channel
------------
sock is the traditional TCP-sockets-based communication channel. It
uses TCP/IP sockets for all communication, including intra-node
communication. Although the performance of this channel is worse
than that of nemesis, it should work on almost every platform. This
channel can be configured using the following option:

    --with-device=ch3:sock

pamid device
************
This is the device used on the IBM Blue Gene/Q system. The following
configure options can be used:

    ./configure --host=powerpc64-bgq-linux \
                --with-device=pamid:BGQ \
                --with-file-system=bg+bglockless

The Blue Gene/Q cross compilers must either be in the $PATH, or be
explicitly specified using environment variables, before running
configure. For example:

    PATH=$PATH:/bgsys/drivers/ppcfloor/gnu-linux/bin

or

    CC=/bgsys/drivers/ppcfloor/gnu-linux/bin/powerpc64-bgq-linux-gcc
    CXX=...
    ...

There are several other configure options that are specific to building
on a Blue Gene/Q system. See the wiki page for more information:

    https://wiki.mpich.org/mpich/index.php/BGQ

-------------------------------------------------------------------------

5. Alternate Process Managers
=============================

hydra
-----
Hydra is the default process management framework; it uses existing
daemons on nodes (e.g., ssh, pbs, slurm, sge) to start MPI
processes. More information on Hydra can be found at
http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager

gforker
-------
gforker is a process manager that creates processes on a single
machine by having mpiexec fork and exec them directly. gforker is
mostly meant as a research platform and for debugging purposes, as it
is only meant for single-node systems.
|
|
Packit |
0848f5 |
|
|
Packit |
0848f5 |
slurm
|
|
Packit |
0848f5 |
-----
|
|
Packit |
0848f5 |
SLURM is an external process manager not distributed with
|
|
Packit |
0848f5 |
MPICH. MPICH's default process manager, hydra, has native support
|
|
Packit |
0848f5 |
for slurm and you can directly use it in slurm environments (it will
|
|
Packit |
0848f5 |
automatically detect slurm and use slurm capabilities). However, if
|
|
Packit |
0848f5 |
you want to use the slurm provided "srun" process manager, you can use
|
|
Packit |
0848f5 |
the "--with-pmi=slurm --with-pm=no" option with configure. Note that
|
|
Packit |
0848f5 |
the "srun" process manager that comes with slurm uses an older PMI
|
|
Packit |
0848f5 |
standard which does not have some of the performance enhancements that
|
|
Packit |
0848f5 |
hydra provides in slurm environments.
|
|
Packit |
0848f5 |
|
|
-------------------------------------------------------------------------

6. Alternate Configure Options
==============================

MPICH has a number of other features. If you are exploring MPICH as
part of a development project, you might want to tweak the MPICH
build with additional configure options. A complete list of
configuration options can be found using:

   ./configure --help

-------------------------------------------------------------------------

7. Testing the MPICH installation
=================================

To test MPICH, we package the MPICH test suite in the MPICH
distribution. You can run the test suite using:

   make testing

The results summary will be placed in test/summary.xml.

-------------------------------------------------------------------------

8. Fault Tolerance
==================

MPICH has some tolerance to process failures, and supports
checkpointing and restart.

Tolerance to Process Failures
-----------------------------

The features described in this section should be considered
experimental, which means that they have not been fully tested and
their behavior may change in future releases. The notes below are
guidelines on what can be expected from this feature:

- ERROR RETURNS: Communication failures in MPICH are not fatal
  errors. This means that if the user sets the error handler to
  MPI_ERRORS_RETURN, MPICH will return an appropriate error code in
  the event of a communication failure. When a process detects a
  failure while communicating with another process, it will consider
  the other process as having failed and will no longer attempt to
  communicate with that process. The user can, however, continue
  making communication calls to other processes. Any outstanding
  send or receive operations to a failed process, or wildcard
  receives (i.e., with MPI_ANY_SOURCE) posted to communicators with a
  failed process, will be immediately completed with an appropriate
  error code.

- COLLECTIVES: For collective operations performed on communicators
  with a failed process, the collective may return an error on
  some, but not necessarily all, processes. A collective call
  returning MPI_SUCCESS on a given process means only that the part
  of the collective performed by that process has succeeded.

- PROCESS MANAGER: If used with the hydra process manager, hydra will
  detect failed processes and notify the MPICH library. Users can
  query the list of failed processes using MPIX_Comm_group_failed().
  This function returns a group consisting of the failed processes
  in the communicator. The function MPIX_Comm_remote_group_failed()
  is provided for querying failed processes in the remote group of
  an intercommunicator.

  Note that hydra by default will abort the entire application when
  any process terminates before calling MPI_Finalize. In order to
  allow an application to continue running despite failed processes,
  you will need to pass the -disable-auto-cleanup option to mpiexec.

- FAILURE NOTIFICATION: THIS IS AN UNSUPPORTED FEATURE AND WILL
  ALMOST CERTAINLY CHANGE IN THE FUTURE!

  In the current release, hydra notifies the MPICH library of failed
  processes by sending a SIGUSR1 signal. The application can catch
  this signal to be notified of failed processes. If the application
  replaces the library's signal handler with its own, the application
  must be sure to call the library's handler from its own handler.
  Note that you cannot call any MPI function from inside a signal
  handler.
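
The FAILURE NOTIFICATION note asks an application that installs its own SIGUSR1 handler to keep calling the library's handler. Below is a minimal POSIX sketch of that chaining. It assumes the handler is installed after MPI_Init, so the disposition being replaced is the library's, and it carries the caveat that this notification mechanism is unsupported and may change. The names install_chained_sigusr1_handler, previous_sigusr1 and failure_notified are hypothetical. The handler only sets a flag and makes no MPI calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flag set when a failure notification arrives; the main
 * program polls it outside the handler, where MPI calls are allowed. */
static volatile sig_atomic_t failure_notified = 0;

/* SIGUSR1 disposition in place before ours (the MPI library's handler
 * when this is installed after MPI_Init). */
static struct sigaction previous_sigusr1;

static void app_sigusr1_handler(int sig, siginfo_t *info, void *context)
{
    failure_notified = 1; /* async-signal-safe; no MPI calls here */

    /* Chain to the previous handler so the library still learns about the
     * failed processes. SIG_DFL and SIG_IGN are not callable, so skip them. */
    if (previous_sigusr1.sa_flags & SA_SIGINFO) {
        if (previous_sigusr1.sa_sigaction != NULL)
            previous_sigusr1.sa_sigaction(sig, info, context);
    } else if (previous_sigusr1.sa_handler != SIG_DFL &&
               previous_sigusr1.sa_handler != SIG_IGN) {
        previous_sigusr1.sa_handler(sig);
    }
}

/* Install the chaining handler, saving the old one; 0 on success, -1 on error. */
static int install_chained_sigusr1_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = app_sigusr1_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, &previous_sigusr1);
}
```

Saving the old disposition through the third argument of sigaction() is what makes the chaining possible. Calling signal() afterwards would discard the library's handler without giving the application a way to invoke it.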

Checkpoint and Restart
----------------------

MPICH supports checkpoint/restart fault tolerance using BLCR.

CONFIGURATION

First, you need to have BLCR version 0.8.2 or later installed on your
machine. If it is installed in the default system location, you don't
need to do anything.

If BLCR is not installed in the default system location, you will need
to tell MPICH's configure where to find it. You might also need to
set the LD_LIBRARY_PATH environment variable so that BLCR's shared
libraries can be found. In this case, add the following options to
your configure command:

   --with-blcr=<BLCR_INSTALL_DIR>
   LD_LIBRARY_PATH=<BLCR_INSTALL_DIR>/lib

where <BLCR_INSTALL_DIR> is the directory where BLCR has been
installed (whatever was specified in --prefix when BLCR was
configured).

After MPICH is configured, compile as usual (e.g., make; make install).

Note that checkpointing is only supported with the Hydra process
manager.

VERIFYING CHECKPOINTING SUPPORT

Make sure MPICH is correctly configured with BLCR. You can do this
using:

   mpiexec -info

This should display 'BLCR' under 'Checkpointing libraries available'.


CHECKPOINTING THE APPLICATION

There are two ways to cause the application to checkpoint. You can ask
mpiexec to periodically checkpoint the application using the mpiexec
option -ckpoint-interval (in seconds):

   mpiexec -ckpointlib blcr -ckpoint-prefix /tmp/app.ckpoint \
           -ckpoint-interval 3600 -f hosts -n 4 ./app

Alternatively, you can manually force a checkpoint by sending a
SIGUSR1 signal to mpiexec.

The checkpoint/restart parameters can also be controlled with the
environment variables HYDRA_CKPOINTLIB, HYDRA_CKPOINT_PREFIX and
HYDRA_CKPOINT_INTERVAL.

To restart the application from a checkpoint:

   mpiexec -ckpointlib blcr -ckpoint-prefix /tmp/app.ckpoint -f hosts -n 4 -ckpoint-num <N>

where <N> is the checkpoint number you want to restart from.

These instructions can also be found on the MPICH wiki:

   http://wiki.mpich.org/mpich/index.php/Checkpointing

-------------------------------------------------------------------------

9. Developer Builds
===================

For MPICH developers who want to work directly on the primary version
control system, there are a few additional steps involved (people
using the release tarballs do not have to follow these steps). Details
about these steps can be found here:
http://wiki.mpich.org/mpich/index.php/Getting_And_Building_MPICH

-------------------------------------------------------------------------

10. Multiple Fortran compiler support
=====================================

If the C compiler that is used to build the MPICH libraries supports
both multiple weak symbols and multiple aliases of common symbols, the
Fortran binding can support multiple Fortran compilers. Multiple weak
symbol support allows MPICH to provide the different name-mangling
schemes (of subroutine names) required by different Fortran compilers.
Multiple aliases of common symbols allow MPICH to make the different
common block symbols used for MPI Fortran constants, e.g. MPI_IN_PLACE
and MPI_STATUS_IGNORE, refer to the same object, so that these
constants are understood by different Fortran compilers.

Since the support for multiple aliases of common symbols is
new/experimental, users can disable the feature with the configure
option --disable-multi-aliases if it causes any undesirable effects,
e.g. linker warnings about different sizes of the common symbols
MPIFCMB* (these warnings should be harmless).

We have only tested this support on a limited set of
platforms/compilers. On Linux, if the C compiler that builds MPICH is
either gcc or icc, the above support will be enabled by configure. At
the time of this writing, pgcc does not seem to support multiple
aliases of common symbols, so configure will detect the deficiency and
disable the feature automatically. The tested Fortran compilers
include the GNU Fortran compiler (gfortran), the Intel Fortran compiler
(ifort), the Portland Group Fortran compiler (pgfortran), the Absoft
Fortran compiler (af90), and the IBM XL Fortran compiler (xlf). This
means that if MPICH is built with gcc/gfortran, the resulting MPICH
library can be used to link a Fortran program compiled/linked by
another Fortran compiler, e.g. pgf90 through mpifort -fc=pgf90. As
long as the Fortran program links without errors using one of these
compilers, it should run correctly.
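
The name-mangling half of this mechanism can be illustrated with GCC's weak alias attribute. One implementation is exported under several spellings, so a Fortran compiler that expects any of them finds a definition at link time. This is a minimal sketch that assumes gcc or clang on an ELF platform such as Linux (the alias attribute is not available on macOS). The demo_send names are hypothetical, not MPICH's actual symbols, and the common-block aliasing described above is not shown.

```c
/* Hypothetical counter so the aliases can be seen reaching one body. */
static int demo_send_calls = 0;

/* The single real implementation. */
void demo_send_impl(void)
{
    demo_send_calls++;
}

/* Weak aliases: the same code under spellings that different Fortran
 * compilers may expect for a subroutine named DEMO_SEND. Because they are
 * weak, a strong definition elsewhere would take precedence at link time. */
void demo_send(void)   __attribute__((weak, alias("demo_send_impl"))); /* lowercase, no underscore */
void demo_send_(void)  __attribute__((weak, alias("demo_send_impl"))); /* one trailing underscore, e.g. gfortran */
void demo_send__(void) __attribute__((weak, alias("demo_send_impl"))); /* two trailing underscores */
void DEMO_SEND(void)   __attribute__((weak, alias("demo_send_impl"))); /* uppercase */
```

All four names resolve to the same address, so no wrapper functions are needed. The weak binding also lets a library or application override any single spelling without a duplicate-symbol error.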

-------------------------------------------------------------------------

11. ABI Compatibility
=====================

The MPICH ABI compatibility initiative was announced at SC 2014
(http://www.mpich.org/abi). As part of this initiative, Argonne,
Intel, IBM and Cray have committed to maintaining ABI compatibility
with each other.

As a first step in this initiative, starting with version 3.1, MPICH
is binary (ABI) compatible with Intel MPI 5.0. This means you can
build your program with one MPI implementation and run it with the
other. Specifically, binary-only applications that were built and
distributed with one of these MPI implementations can now be executed
with the other MPI implementation.

Some setup is required to achieve this. Suppose you have MPICH
installed in /path/to/mpich and Intel MPI installed in /path/to/impi.

You can run your application with MPICH using:

   % export LD_LIBRARY_PATH=/path/to/mpich/lib:$LD_LIBRARY_PATH
   % mpiexec -np 100 ./foo

or with Intel MPI using:

   % export LD_LIBRARY_PATH=/path/to/impi/lib:$LD_LIBRARY_PATH
   % mpiexec -np 100 ./foo

This works irrespective of which MPI implementation your application
was compiled with, as long as you use one of the MPI implementations
in the ABI compatibility initiative.