/*
* File:    INSTALL.txt
* CVS:     $Id$
* Author:  Kevin London
*          london@cs.utk.edu
* Mods:    Dan Terpstra
*          terpstra@cs.utk.edu
* Mods:    Philip Mucci
*          mucci@cs.utk.edu
* Mods:
*
*/

*****************************************************************************
HOW TO INSTALL PAPI ONTO YOUR SYSTEM
*****************************************************************************

On some of the systems that PAPI supports, you can install PAPI right
out of the box without any additional setup. Others require drivers or
patches to be installed first.

The general installation steps are below, but first find your particular
Operating System's section for any additional steps that may be necessary.
NOTE: the configure and make files are located in the papi/src directory.

General Installation

1.  % ./configure
    % make

2.  Check for errors.

    a) Run a simple test case: (This will run ctests/zero)

    % make test

    If you get good counts, you can optionally run all the test programs
    with the included test harness. This will run the tests in quiet mode,
    which will print PASSED, FAILED, or SKIPPED. Tests are SKIPPED if the
    functionality being tested is not supported by that platform.

    % make fulltest (This will run ./run_tests.sh)

    To run the tests in verbose mode:

    % ./run_tests.sh -v

3.  Create a PAPI binary distribution or install PAPI directly.

    a) To install PAPI libraries and header files from the build tree:

    % make install

    b) To install PAPI manual pages from the build tree:

    % make install-man

    c) To install PAPI test programs from the build tree:

    % make install-tests

    d) To install all of the above in one step from the build tree:

    % make install-all

    e) To create a binary kit, papi-.tgz:

    % make dist

*****************************************************************************
MORE ABOUT CONFIGURE OPTIONS
*****************************************************************************

There is an extensive array of options available from the configure
command-line.
These can differ significantly from version to version of PAPI. For
complete details on the command-line options, use:

    % ./configure --help

*****************************************************************************
DOCUMENTATION BY DOXYGEN
*****************************************************************************

PAPI now ships with documentation generated by doxygen.
Documentation for the public APIs can be created by running doxygen
from the doc directory.
More complete documentation of all internal APIs and structures can be
generated with:

    % doxygen Doxyfile-html

Doxygen documentation for the currently released version of PAPI is also
available on the website.

*****************************************************************************
Operating System Specific Installation Steps (In Alphabetical Order by OS)
*****************************************************************************

AIX - IBM POWER5 and POWER6 and POWER7
*****************************************************************************
PAPI is supported on AIX 5.x for POWER5 and POWER6.
PAPI is also tested on AIX 6.1 for POWER7.
Use ./configure to select the desired make options for your system,
specifying --with-bitmode=32 or --with-bitmode=64 to select wordlength.
32 bits is the default.

1.  On AIX 5.x, bos.pmapi is a product level fileset (part of the OS).
    However, it is not installed by default. Consult your sysadmin to
    make sure it is installed.
2.  Follow the general instructions for installing PAPI.

WARNING: PAPI requires XLC version 6 or greater.
Your version can be determined by running 'lslpp -a -l | grep -i xlc'.

BG/P
*****************************************************************************
BG/P is a cross-compiled environment. The machine on which PAPI is compiled
is not the machine on which PAPI runs.
To compile PAPI on BG/P, specify the BG/P environment as shown below:

    % ./configure --with-OS=bgp
    % make

NOTE: ./configure might fail if the cross compiler is not in your path.
If that is the case, just add it to your path and everything should work:

    % export PATH=$PATH:/bgsys/drivers/ppcfloor/gnu-linux/bin

By default this will make a subset of tests in the ctests directory and all
tests in the ftests directory. There is an additional C test program provided
for the BG/P environment that exercises the specific BG/P events and
demonstrates how to intermix the PAPI and BG/P UPC native calls. This test
program is built with the normal make sequence and can be found in the
ctests/bgp directory.

The testing targets in the make file will not work in the BG/P environment.
Since BG/P supports multiple queuing systems, you must manually execute
individual programs in the ctests and ftests directories to check for
successful library creation. You can also manually edit the run_tests.sh
script to automate testing for your installation.

Most PAPI utilities work for BGP, including papi_avail, papi_native_avail,
and papi_command_line. Many ctests pass for BGP, but many others produce
errors due to the non-traditional architecture of BGP. In particular,
PAPI_TOT_CYC always seems to produce 0 counts, although PAPI_get_virt_usec
and PAPI_get_real_usec appear to work.

The IBM RedPaper:
    http://www.redbooks.ibm.com/abstracts/redp4256.html
provides further discussion about PAPI on BGP along with other performance
issues.

BG/Q
*****************************************************************************
Five new components have been added to PAPI to support hardware performance
monitoring for the BG/Q platform; in particular the BG/Q network, the I/O
system, and the Compute Node Kernel, in addition to the processing core.
There are no specific component configure scripts for L2unit, IOunit,
NWunit, CNKunit.
In order to configure PAPI for BG/Q, use the following configure options at
the papi/src level:

    % ./configure --prefix=< your_choice > \
        --with-OS=bgq \
        --with-bgpm_installdir=/bgsys/drivers/ppcfloor \
        CC=/bgsys/drivers/ppcfloor/gnu-linux/bin/powerpc64-bgq-linux-gcc \
        F77=/bgsys/drivers/ppcfloor/gnu-linux/bin/powerpc64-bgq-linux-gfortran \
        --with-components="bgpm/L2unit bgpm/CNKunit bgpm/IOunit bgpm/NWunit"

CLE - Cray XT and XE Opteron
*****************************************************************************
The Cray XT/XE is a cross-compiled environment. You must specify the
perfmon version to configure as shown below.

Before running configure to create the makefile that supports a Cray XT/XE
CLE build of PAPI, execute the following module commands:

    % module purge
    % module load gcc

Note: do not load the programming environment module (e.g. PrgEnv-gnu) but
the compiler module (e.g. gcc) as shown above.

Check the CLE compute nodes for the version of perfmon2 that they support:

    % aprun -b -a xt cat /sys/kernel/perfmon/version

and use this version when configuring PAPI for a perfmon2 substrate:

    % configure CFLAGS="-D__crayxt" \
        --with-perfmon=2.82 --prefix= \
        --with-virtualtimer=times --with-tls=__thread \
        --with-walltimer=cycle --with-ffsll --with-shared-lib=no \
        --with-static-tools

Configure PAPI for a perf events substrate:

    % configure CFLAGS="-D__crayxt" \
        --with-perf-events --with-pe-incdir= \
        --with-assumed-kernel=2.6.34 --prefix= \
        --with-virtualtimer=times --with-tls=__thread \
        --with-walltimer=cycle --with-ffsll --with-shared-lib=no \
        --with-static-tools

Invoke make accordingly:

    % make CONFIG_PFMLIB_ARCH_CRAYXT=y CONFIG_PFMLIB_SHARED=n
    % make CONFIG_PFMLIB_ARCH_CRAYXT=y CONFIG_PFMLIB_SHARED=n install

The testing targets in the makefile will not work in the XT/XE CLE
environment. It is necessary to log into an interactive session and run the
tests manually through the job submission system.
For example, instead of:

    % make test

use:

    % aprun -n1 ctests/zero

and instead of:

    % make fulltest

use:

    % ./run_cat_tests.sh

after substituting "aprun -n1" for "yod -sz 1" in run_cat_tests.sh.

FreeBSD - i386 & amd64
*****************************************************************************
PAPI requires FreeBSD 6 or higher to work. The kernel needs some
modifications to provide PAPI access to the performance monitoring counters.
Simply add "options HWPMC_HOOKS" and "device hwpmc" to the kernel
configuration file. For i386 systems, also add "device apic". (You can
obtain more information in hwpmc(4); see NOTE 1 to check the supported HW.)
After this step, just recompile the kernel and boot it.

FreeBSD 7 (or greater) does not ship with a Fortran compiler. To compile the
Fortran tests you will need to install a Fortran compiler first (e.g.
installing it from /usr/ports/lang/gcc42), and set the F77 environment
variable to the compiler you want to use (e.g. gfortran42).

Fortran compilers may issue errors due to "Integer too big for its kind *".
Add to the FFLAGS environment variable a compiler option to use int*8 by
default (in gfortran42 it is -fdefault-integer-8).

Follow the "General Installation" steps.

NOTE 1:
--
The HWPMC driver supports the following processors: Intel Pentium 2,
Intel Pentium Pro, Intel Pentium 3, Intel Pentium M, Intel Celeron,
Intel Pentium 4, AMD K7 (AMD Athlon) and AMD K8 (AMD Athlon64 / Opteron).

FreeBSD 8 also adds support for Core/Core2/Core-i[357]/Atom processors.
There is also a patch for FreeBSD 7/7.1 in http://wiki.freebsd.org/PmcTools

Linux - Xeon Phi [MIC, KNC, Knight's Corner]
*****************************************************************************
Full PAPI support of the MIC card requires MPSS Gold Update 2 or above, and
a cross-compilation toolchain from Intel. The Intel C compiler is also
supported.

The compiler
-----------------------------------------------------------------------------
* Download one of the MPSS full source bundles at
  [http://software.intel.com/en-us/articles/intel-manycore-platform-software-stack-mpss]
* Untar the download.
* Extract gpl/package-cross-k1om.tar.bz2

Building PAPI - gcc cross compiler
-----------------------------------------------------------------------------
* Add usr/linux-k1om-4.7/bin or equivalent to your PATH so PAPI can find
  the cross-build utils. (See above for instructions on acquiring the
  cross compilation toolchain.)
* You will need to invoke configure with options:
    > ./configure --with-mic --host=x86_64-k1om-linux --with-arch=k1om
  This sets up cross-compilation and sets options needed by PAPI.
* Run make to build the library.

Building PAPI - icc
-----------------------------------------------------------------------------
If icc is in your path,
    > ./configure --with-mic
You may have to provide additional configuration options... try
    > ./configure --with-mic --with-ffsll --with-walltimer=cycle --with-tls=__thread --with-virtualtimer=clock_thread_cputime_id
This builds a MIC-native version of the library.

Offload Code
------------
To use PAPI in MIC offload code, build a MIC-native version of PAPI as
detailed above.

The PAPI utility programs can be run on the MIC using the micnativeloadex
tool provided by Intel.

The MIC events may require additional qualifiers to set the exclude_guest
and exclude_host bits to 0 (eventname:mg=1:mh=1).

For example, get a list of events available on the MIC by calling:
    micnativeloadex ./utils/papi_native_avail

Then get an event count while setting the appropriate qualifiers:
    micnativeloadex ./utils/papi_command_line -a "CPU_CLK_UNHALTED:mg=1:mh=1"

To add offload code into your program, wrap the papi.h header as follows:
    #pragma offload_attribute (push,target(mic))
    #include "papi.h"
    #pragma offload_attribute (pop)

Make PAPI calls from offload code as normal.
Finally add
    -offload-option,mic,ld,$(path_to_papi)/libpapi.a
to your compile incantation or, if that does not recognise the PAPI library,
try adding
    -offload-option,mic,compiler,"-lpapi -L"
to your compile incantation.

Linux - Itanium II & Montecito
*****************************************************************************
PAPI on Itanium Linux links to the perfmon library. The library version and
the Itanium version are automatically determined by configure.
If you wish to override the defaults, a number of pfm options are available
to configure. Use:
    % ./configure --help
to learn more about these options.
Follow the general installation instructions to complete your installation.

PLATFORM NOTES:
The earprofile test fails under perfmon for Itanium II. It has been
reconfigured to work on the upcoming perfmon2 interface.

Linux - PPC64 (POWER5, POWER5+, POWER6 and PowerPC970)
*****************************************************************************
Linux/PPC64 requires that the kernel be patched and recompiled with the
PerfCtr patch if the kernel is version 2.6.30 or older. The required patches
and complete installation instructions are provided in the
papi/src/perfctr-2.7.x directory. PPC64 is the ONLY platform that REQUIRES
use of PerfCtr 2.7.x.

*- IF YOU HAVE ALREADY PATCHED YOUR KERNEL AND/OR INSTALLED PERFCTR -*

WARNING: You should always use a PerfCtr distribution that has been
distributed with a version of PAPI or your build will fail. The reason for
this is that PAPI builds a shared library of the Perfctr runtime, on which
libpapi.so depends. PAPI also depends on the .a file, which it decomposes
into component object files and includes in the libpapi.a file for
convenience. If you install a new perfctr, even a shared library, YOU MUST
REBUILD PAPI to get a proper, working libpapi.a.

There are several options in configure to allow you to specify your perfctr
version and location. Use:
    % ./configure --help
to learn more about these options.

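The 2.6.30 threshold stated for Linux/PPC64 can be checked before deciding
whether the PerfCtr patch is needed. The following is only a minimal sketch:
the function name kernel_needs_perfctr is hypothetical and not part of PAPI,
and it assumes the kernel release string starts with dotted numeric fields,
as "uname -r" normally reports.

```shell
# kernel_needs_perfctr: hypothetical helper, not shipped with PAPI.
# Succeeds (exit status 0) when the kernel release is 2.6.30 or older.
# With no argument it inspects the running kernel via "uname -r".
kernel_needs_perfctr() {
    release="${1:-$(uname -r)}"
    # Drop any suffix such as "-194.el5", then split the rest on dots.
    printf '%s\n' "${release%%-*}" | {
        IFS=. read -r major minor patch _rest
        # Keep only the leading digits of each field.
        major="${major%%[!0-9]*}"
        minor="${minor%%[!0-9]*}"
        patch="${patch%%[!0-9]*}"
        # Fold the three fields into one integer and compare with 2.6.30.
        [ $(( ${major:-0} * 1000000 + ${minor:-0} * 1000 + ${patch:-0} )) -le 2006030 ]
    }
}
```

For example, "kernel_needs_perfctr && echo patch required" prints the
message only on a kernel at or below 2.6.30.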
Follow the general installation instructions to complete your installation.

Linux Perf Events ( with kernel 2.6.32 and newer )
*****************************************************************************
Performance counter support has been merged as the "Perf Events" subsystem
as of Linux 2.6.32. This means that PAPI can be built without patching the
kernel on new enough systems.

Perf Events support is new, and certain functionality does not work.
If you need any of the functionality listed below, we recommend you install
the PerfCtr patchset and use that in conjunction with PAPI.

+ PAPI requires at least Linux kernel 2.6.32, as the earlier 2.6.31 version
  had some significant API changes.
+ Kernels before 2.6.33 have extra overhead when determining whether events
  conflict or not.
+ Counter multiplexing is handled by PAPI (rather than perf_events) on
  kernels before 2.6.33 due to a bug in the kernel perf_events code.
+ Nehalem EX support requires kernel 2.6.34 or newer.
+ Pentium 4 support requires kernel 2.6.35 or newer.

The PAPI configure script should auto-detect the availability of Perf Events
on new enough distributions (this mainly requires that perf_event.h be
available in /usr/include/linux).

On older distributions (even ones that include the 2.6.32 kernel) the
perf_event.h file might not be there. One fix is to install your
distribution's Linux kernel headers package, which is often an optional
package not installed by default.

If you cannot install the kernel headers, you can obtain the perf_event.h
file from your kernel and run configure as follows:
    ./configure --with-pe-incdir=INCDIR
replacing INCDIR with the directory that perf_event.h is in.

Linux PerfCtr (requires patching the kernel)
*****************************************************************************
When using Linux kernels before 2.6.32 the kernel must be patched with the
PerfCtr patch set.
(This patchset can also be used on more recent kernels if the support
provided by Perf Events is not enough for your workload.)
The required patches and complete installation instructions are provided in
the papi/src/perfctr-x.y directory. Please see the INSTALL file in that
directory.

Do not forget that you also need to build your kernel with APIC support in
order for hardware overflow to work. This is very important for accurate
statistical profiling a la gprof via the hardware counters.

So, when you configure your kernel to build with PERFCTR as above, make
sure you turn on APIC support in the "Processor type and features" section.
This should be enabled by default if you are on an SMP, but it is disabled
by default on a UP.

In our 2.4.x kernels:
> grep PIC /usr/src/linux/.config
/usr/src/linux/.config:CONFIG_X86_GOOD_APIC=y
/usr/src/linux/.config:CONFIG_X86_UP_APIC=y
/usr/src/linux/.config:CONFIG_X86_UP_IOAPIC=y
/usr/src/linux/.config:CONFIG_X86_LOCAL_APIC=y
/usr/src/linux/.config:CONFIG_X86_IO_APIC=y

You can verify the APIC is working after rebooting with the new kernel
by running the 'perfex -i' command found in the perfctr/examples/perfex
directory.

PAPI on x86 assumes PerfCtr 2.6.x.
NOTE: THE VERSIONS OF PERFCTR DO NOT CORRESPOND TO LINUX KERNEL VERSIONS.

*- IF YOU HAVE ALREADY PATCHED YOUR KERNEL AND/OR INSTALLED PERFCTR -*

WARNING: You should always use a PerfCtr distribution that has been
distributed with a version of PAPI or your build may fail. Newer versions
with backward compatibility may also work. PAPI builds a shared library of
the Perfctr runtime, on which libpapi.so depends. PAPI also depends on the
.a file, which it decomposes into component object files and includes in
the libpapi.a file for convenience. If you install a new PerfCtr, even a
shared library, YOU MUST REBUILD PAPI to get a proper, working libpapi.a.

There are several options in configure to allow you to specify your perfctr
version and location.
Use:
    % ./configure --help
to learn more about these options.

Follow the general installation instructions to complete your installation.

*- IF PERFCTR IS INSTALLED BUT PAPI FAILS TO INITIALIZE -*

You may be running udev, which is not smart enough to know the permissions
of dynamically created devices. To fix this, find your udev/devices
directory, often /lib/udev/devices or /etc/udev/devices, and perform the
following actions:

    mknod perfctr c 10 182
    chmod 644 perfctr

On Ubuntu 6.06 (and probably other Debian distros), add a line to
/etc/udev/rules.d/40-permissions.rules like this:

    KERNEL=="perfctr", MODE="0666"

On SuSE, you may need to add something like the following to
/etc/udev/rules.d/50-udev-default.rules:
(SuSE does not have the 40-permissions.rules file in it.)

    # cpu devices
    KERNEL=="cpu[0-9]*", NAME="cpu/%n/cpuid"
    KERNEL=="msr[0-9]*", NAME="cpu/%n/msr"
    KERNEL=="microcode", NAME="cpu/microcode", MODE="0600"
    KERNEL=="perfctr", NAME="perfctr", MODE="0644"

These lines tell udev to always create the device file with the appropriate
permissions. Use 'perfex -i' from the perfctr distribution to test this fix.

PLATFORM NOTES:
Opteron fails the matrix-hl test because the default definition of
PAPI_FP_OPS overcounts speculative floating point operations.

Solaris 8 - Ultrasparc
*****************************************************************************
The only requirement for Solaris is that you must be running version 2.8 or
newer. As long as that requirement is met, no additional steps are required
to install PAPI and you can follow the general installation guide.

Solaris 10 - UltraSPARC T2/Niagara 2
*****************************************************************************
PAPI supports the Niagara 2 on Solaris 10. The substrate offers support for
common basic operations like adding/reading/etc and the advanced features
multiplexing (see below), overflow handling and profiling.
The implementation for Solaris 10 is based on libcpc 2, which offers access
to the underlying performance counters. Performance counters for the
UltraSPARC architecture are described in the UltraSPARC architecture manual
in general, with detailed descriptions in the actual processor manual. For
this substrate the documentation for performance counters can be found at:

    - http://www.opensparc.net/publications/specifications/

In order to install PAPI on this platform make sure the packages SUNWcpc
and SUNWcpcu are installed. Sun Studio 12 was used for compilation while the
substrate was being developed. GNU GCC has not been tested and would require
modifying the makefiles Makefile.solaris-niagara2 (32 bit) and
Makefile.solaris-niagara2-64bit (64 bit).

The steps required for installation are as follows:

    ./configure --with-bitmode=[32|64] --prefix=/is/optional

If no --with-bitmode parameter is present a default of 32 bit is assumed.

If no --prefix is used, a default of /usr/local is assumed.

    make
    make install

If you want to link your application against your installation you should
make sure to include at least the following linker options:

    -lpapi -lcpc

PLEASE NOTE: This is the first revision of Niagara 2/libcpc 2/Solaris 10
support and needs further testing! Contributions, especially for the preset
definitions, would be much appreciated.

MULTIPLEXING: As the Niagara 2 offers no native event to count the cycles
elapsed, a "synthetic event" was created offering access to the cycle count.
This event is not as accurate as the native events, nor should it be used
for anything other than the multiplexing mode, which needs the cycle count
in order to work. Therefore multiplexing and the preset PAPI_TOT_CYC should
be used only with caution. BEWARE OF WRONG COUNTER RESULTS!

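The Solaris 10 configure defaults described above (32 bit when no
--with-bitmode is given, /usr/local when no --prefix is given) can be made
explicit in a wrapper. This is only a sketch: papi_configure_args is a
hypothetical helper name, not something PAPI provides, and it only prints
the arguments rather than running configure.

```shell
# papi_configure_args: hypothetical helper, not shipped with PAPI.
# Usage: papi_configure_args [BITMODE] [PREFIX]
# Prints the configure arguments, filling in the documented defaults.
papi_configure_args() {
    bitmode="${1:-32}"          # documented default: 32 bit
    prefix="${2:-/usr/local}"   # documented default: /usr/local
    case "$bitmode" in
        32|64) ;;
        *) echo "unsupported bitmode: $bitmode" >&2; return 1 ;;
    esac
    printf '%s\n' "--with-bitmode=$bitmode --prefix=$prefix"
}
```

For example, "./configure $(papi_configure_args 64 /opt/papi)" would request
a 64 bit build; the /opt/papi prefix is only an illustration.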
Windows XP/2000/Server 2003 - Intel Pentium III or AMD Athlon / Opteron
*****************************************************************************
Please use PAPI 3.7
(http://icl.cs.utk.edu/projects/papi/downloads/papi-3.7.2.tar.gz)

The Windows source tree comes with Microsoft Visual Studio Version 8
projects to build a graphical shell application, the PAPI library as a DLL,
a kernel driver to provide access to the counters, and a collection of
C test programs.

The WinPMC driver must be installed with administrator privileges. See the
winpmc.html file in the papi/win2k/winpmc directory for details on building
and installing this driver.

The general installation instructions are irrelevant for Windows.

Other Platforms
*****************************************************************************
PAPI can be compiled and installed on most platforms that have GNU compilers
regardless of operating system or hardware. This includes, for example,
Macintosh systems running recent versions of OSX. However, PAPI can only
provide access to the CPU hardware counters on platforms that are directly
supported. On unsupported platforms PAPI will run, but will only provide
basic timing functions and potentially access to some non-CPU components.

*****************************************************************************
CREATING AND RUNNING COMPONENTS
*****************************************************************************

Basic instructions on how to create a new component can be found in
src/components/README. The components directory contains several components
developed by the PAPI team along with a simple yet functional "example"
component which can be used as a guide to aid third-party developers.
Assuming components are developed according to the specified guidelines,
they will function within the PAPI framework without requiring any changes
to PAPI source code.

Before running any component that requires configuration, the configure
script for that component must be executed in order to generate the Makefile
which contains the configuration settings. Normally, the script will only
need to be executed once. Depending on the component, configure may require
that one or more configuration settings be specified by the user.

The components to be added to PAPI are specified during the configuration of
PAPI by adding the --with-components= command line option to configure.
For example, to add the acpi, lustre, and net components, the option would
be:

    % ./configure --with-components="acpi lustre net"

Attempting to add a component to PAPI which requires configuration and has
not been configured will result in a compilation error because the PAPI
build environment will be unable to find the Makefile for that component.
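
A missing component Makefile can be caught before the PAPI build fails. The
sketch below lists components that have a configure script but no generated
Makefile. It is an illustration only: unconfigured_components is a
hypothetical helper, and the file names "configure" and "Makefile" inside
each component directory are assumptions, so check the component's own
directory for the names it actually uses.

```shell
# unconfigured_components: hypothetical helper, not shipped with PAPI.
# Usage: unconfigured_components COMPONENTS_DIR NAME...
# Prints each named component that has a configure script but no Makefile.
# Assumes the files are literally named "configure" and "Makefile".
unconfigured_components() {
    dir="$1"
    shift
    for name in "$@"; do
        if [ -f "$dir/$name/configure" ] && [ ! -f "$dir/$name/Makefile" ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

For example, "unconfigured_components src/components acpi lustre net" run
from the papi directory prints the components that still need their own
configure step; no output means none of the named components is flagged.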