Example of using LD_PRELOAD with the CUDA component.
Asim YarKhan (2015)

A short example of using LD_PRELOAD on a Linux system to intercept
function calls and PAPI-enable an un-instrumented CUDA binary.

Several CUDA events (e.g. SM PM counters) require a CUcontext handle
to be provided since they are context switched. This means that we
cannot use PAPI_attach from an external process to measure those
events in a preexisting executable.  These events can only be measured
from within the CUcontext, that is, within the CUDA-enabled code we
are trying to measure.  If the user is unable to change the source
code, they may be able to use LD_PRELOAD's ability to trap functions
and measure the events from within the executable.
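
The trapping mechanism described above can be sketched as follows.
This is a minimal illustration, not the code shipped with this
example.  It assumes glibc's dlsym(RTLD_NEXT, ...) lookup, treats
cudaError_t as an int, and assumes the target binary links the CUDA
runtime dynamically.  The call counter, its accessor, and the -1
fallback value are hypothetical and exist only for this sketch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* asks glibc's <dlfcn.h> for RTLD_NEXT */
#endif
#include <dlfcn.h>
#include <stddef.h>

#ifndef RTLD_NEXT                /* fallback: the value glibc uses */
#define RTLD_NEXT ((void *) -1L)
#endif

/* Assumption: cudaError_t is ABI-compatible with int. */
typedef int (*set_device_fn)(int);

/* Hypothetical bookkeeping: number of intercepted cudaSetDevice calls. */
static int set_device_calls = 0;

int preload_set_device_calls(void)
{
    return set_device_calls;
}

/* Same name as the real function, so a preloaded copy is found first. */
int cudaSetDevice(int device)
{
    static set_device_fn real_set_device = NULL;

    if (real_set_device == NULL) {
        /* Look up the next definition of the symbol in load order. */
        *(void **)(&real_set_device) = dlsym(RTLD_NEXT, "cudaSetDevice");
    }

    set_device_calls++;
    /* Measurement set-up for this device would go here. */

    if (real_set_device == NULL) {
        return -1;   /* hypothetical error: no CUDA runtime was found */
    }
    return real_set_device(device);
}
```

Built as a shared object (for instance with
gcc -shared -fPIC sketch.c -o sketch.so -ldl, where sketch.c is a
placeholder file name) and listed in LD_PRELOAD, the wrapper is
resolved ahead of the real cudaSetDevice and forwards to it.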

This example is designed to work with the simpleMultiGPU_no_counters
binary in the PAPI CUDA component tests directory.  We use ltrace to
figure out where to attach the PAPI_start, PAPI eventset management
and PAPI_stop.  Please note that this is a rough example; return codes
are not checked and other changes may be required to make sure that
the calls are intercepted at the right moment.

First, the library calls in the simpleMultiGPU_no_counters binary were
traced using ltrace.  Note that the CUDA C API calls in the ltrace
output differ from the CUDA calls visible to nvcc.  Next, appropriate
places to attach the PAPI calls were chosen.  The initialization is
attached to the first entry to cudaSetDevice.  Each cudaSetDevice is
also used to set up the PAPI events for that device.  It was harder to
figure out where to attach the PAPI_start.  After running some tests,
I attached it to the 18th invocation of gettimeofday (kind of
arbitrary! Sorry! May need tweaking).  The PAPI_stop was attached to
the first invocation of cudaFreeHost.
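
The attach points above amount to a few call counters.  A minimal
sketch of that decision logic follows.  All names here are
hypothetical; each function is meant to be called from the matching
intercepted wrapper, and the returned action says which PAPI step the
wrapper would perform.  No PAPI or CUDA calls are made in the sketch
itself, and the start threshold is a parameter because the value 18
may need tweaking for other binaries.

```c
#include <stddef.h>

/* Hypothetical action codes telling a wrapper what to do next. */
typedef enum {
    ACT_NONE,
    ACT_INIT_AND_ADD_EVENTS,  /* first cudaSetDevice: init, then add events */
    ACT_ADD_EVENTS,           /* later cudaSetDevice: add events for device */
    ACT_START,                /* chosen gettimeofday call: start counting   */
    ACT_STOP                  /* first cudaFreeHost: stop counting          */
} hook_action;

/* Threshold used in this README; found by experiment. */
enum { DEFAULT_START_CALL = 18 };

static int set_device_calls;
static int gettimeofday_calls;
static int free_host_calls;

/* Called on every intercepted cudaSetDevice. */
hook_action on_cuda_set_device(void)
{
    set_device_calls++;
    return (set_device_calls == 1) ? ACT_INIT_AND_ADD_EVENTS
                                   : ACT_ADD_EVENTS;
}

/* Called on every intercepted gettimeofday; fires once at start_call. */
hook_action on_gettimeofday(int start_call)
{
    gettimeofday_calls++;
    return (gettimeofday_calls == start_call) ? ACT_START : ACT_NONE;
}

/* Called on every intercepted cudaFreeHost; fires on the first call only. */
hook_action on_cuda_free_host(void)
{
    free_host_calls++;
    return (free_host_calls == 1) ? ACT_STOP : ACT_NONE;
}
```

A gettimeofday wrapper would pass DEFAULT_START_CALL as start_call.
Keeping the counters apart from the wrappers makes it easy to change
the threshold after inspecting a new ltrace output.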

[Note: There are other events that do not require a CUcontext.  The PM
counters for TEX, L2, and FB are not context switched so it would be
possible to sample these values from any context as long as the
context is on the same CUDA device. These events could be measured
using a PAPI_attach from another process using the same CUDA device.]

--------------------------------------------------
How to use this example... please read carefully to make sense of the following.

Build:
make cuda_ld_preload_example.so

Trace the executable using ltrace to figure out where to intercept the calls:
# Do the tracing with a small example!
# ( export PAPI_DIR=`pwd`/../../.. && export LIBPFM_LIBDIR=`pwd`/../../../libpfm4/lib && export LD_LIBRARY_PATH=./:${PAPI_DIR}:${LIBPFM_LIBDIR}:${LD_LIBRARY_PATH}  && ltrace --output ltrace.out --library /usr/lib64/libcuda.so.1 ./simpleMultiGPU_no_counters  )
# ( export PAPI_DIR=`pwd`/../../.. && export LIBPFM_LIBDIR=`pwd`/../../../libpfm4/lib && export LD_LIBRARY_PATH=./:${PAPI_DIR}:${LIBPFM_LIBDIR}:${LD_LIBRARY_PATH} && LD_PRELOAD=./cuda_ld_preload_example.so ltrace ./simpleMultiGPU_no_counters )

Run using dynamic linking to find the correct libraries:
( export PAPI_DIR=`pwd`/../../.. && export LIBPFM_LIBDIR=`pwd`/../../../libpfm4/lib && export LD_LIBRARY_PATH=./:${PAPI_DIR}:${LIBPFM_LIBDIR}:${LD_LIBRARY_PATH} && LD_PRELOAD=./cuda_ld_preload_example.so ./simpleMultiGPU_no_counters )

Build and run in one step:
make cuda_ld_preload_example.so && ( export PAPI_DIR=`pwd`/../../.. && export LIBPFM_LIBDIR=`pwd`/../../../libpfm4/lib && export LD_LIBRARY_PATH=./:${PAPI_DIR}:${LIBPFM_LIBDIR}:${LD_LIBRARY_PATH} && LD_PRELOAD=./cuda_ld_preload_example.so ./simpleMultiGPU_no_counters )