Example of using LD_PRELOAD with the CUDA component.
Asim YarKhan (2015)

A short example of using LD_PRELOAD on a Linux system to intercept
function calls and PAPI-enable an un-instrumented CUDA binary.

Several CUDA events (e.g. SM PM counters) require a CUcontext handle
to be provided, since they are context switched. This means that we
cannot use PAPI_attach from an external process to measure those
events in a preexisting executable. These events can only be measured
from within the CUcontext, that is, within the CUDA-enabled code we
are trying to measure. If the user is unable to change the source
code, they may be able to use LD_PRELOAD's ability to trap functions
and measure the events from within the executable.
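
For readers unfamiliar with the mechanism, the sketch below shows the
general shape of an LD_PRELOAD interposer. It is not the code shipped
with this example. The counter name calls_seen is made up for
illustration, and int is assumed to be a compatible stand-in for
cudaError_t so that no CUDA headers are needed. The wrapper has the same
name as the runtime function, does its own work, then forwards to the
real function located with dlsym(RTLD_NEXT, ...).

```c
/* Sketch of an LD_PRELOAD interposer; an illustration, not PAPI code. */
#define _GNU_SOURCE            /* needed for RTLD_NEXT on glibc */
#include <dlfcn.h>
#include <stdio.h>

/* Assumption: cudaError_t is an int-sized enum, so int stands in for it. */
typedef int (*set_device_fn)(int);

static int calls_seen = 0;     /* hypothetical counter, for illustration */

/* Same name as the CUDA runtime function, so the dynamic linker resolves
 * the application's calls to this definition first. */
int cudaSetDevice(int device)
{
    static set_device_fn real_fn = NULL;

    if (real_fn == NULL) {
        /* Find the next definition in the library search order. */
        real_fn = (set_device_fn)dlsym(RTLD_NEXT, "cudaSetDevice");
    }

    calls_seen++;
    fprintf(stderr, "intercepted cudaSetDevice(%d), call %d\n",
            device, calls_seen);
    /* Measurement setup (for example, PAPI calls) would go here. */

    if (real_fn == NULL) {
        return -1;             /* placeholder error: no real function found */
    }
    return real_fn(device);    /* forward to the real CUDA runtime */
}
```

A wrapper like this is built as a shared object (for instance with
gcc -shared -fPIC, plus -ldl on older glibc) and listed in LD_PRELOAD.
It only takes effect if the target binary resolves the function through
the dynamic linker.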

This example is designed to work with the simpleMultiGPU_no_counters
binary in the PAPI CUDA component tests directory. We use ltrace to
figure out where to attach PAPI_start, the PAPI event set management,
and PAPI_stop. Please note that this is a rough example; return codes
are not checked, and other changes may be required to make sure that
the calls are intercepted at the right moment.

First, the library calls in the simpleMultiGPU_no_counters binary
were traced using ltrace. Note in the ltrace output that the CUDA C
APIs are different from the CUDA calls visible to nvcc. Then the
appropriate places to attach the PAPI calls were chosen. The
initialization is attached to the first entry to cudaSetDevice. Each
cudaSetDevice call is also used to set up the PAPI events for that
device. It was harder to figure out where to attach PAPI_start. After
running some tests, I attached it to the 18th invocation of gettimeofday
(kind of arbitrary, sorry; this may need tweaking). PAPI_stop is
attached to the first invocation of cudaFreeHost.
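
The "attach to the Nth invocation" idea can be sketched as a small
counter that fires a hook exactly once. This is an illustration only:
on_intercepted_call, start_measurement, and the variable names are
hypothetical, the PAPI_start call is represented by a placeholder, and
the counter is not thread-safe.

```c
/* Sketch: run a hook once, on the Nth call of an intercepted function. */
#include <stdio.h>

#define START_ON_CALL 18       /* the count the text arrived at by testing */

static int timer_calls = 0;    /* calls to the intercepted function so far */
static int measuring = 0;      /* set once the start hook has run */

/* Placeholder for the real work (PAPI_start on the event set). */
static void start_measurement(void)
{
    measuring = 1;
    fprintf(stderr, "measurement started on call %d\n", timer_calls);
}

/* Intended to be called at the top of the intercepted function's wrapper. */
static void on_intercepted_call(void)
{
    timer_calls++;
    if (timer_calls == START_ON_CALL && !measuring) {
        start_measurement();
    }
}
```

In a preload library, on_intercepted_call() would run at the top of the
wrapper for the traced function, before forwarding to the real one. The
trigger count is specific to one binary and one run pattern, which is
why it may need tweaking.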

[Note: There are other events that do not require a CUcontext. The PM
counters for TEX, L2, and FB are not context switched, so it would be
possible to sample these values from any context as long as the
context is on the same CUDA device. These events could be measured
using PAPI_attach from another process using the same CUDA device.]

--------------------------------------------------
How to use this example... please read carefully to make sense of the following.

Build:
make cuda_ld_preload_example.so

Trace the executable using ltrace to figure out where to intercept the calls:
# Do the tracing with a small example!
# ( export PAPI_DIR=`pwd`/../../.. && export LIBPFM_LIBDIR=`pwd`/../../../libpfm4/lib && export LD_LIBRARY_PATH=./:${PAPI_DIR}:${LIBPFM_LIBDIR}:${LD_LIBRARY_PATH} && ltrace --output ltrace.out --library /usr/lib64/libcuda.so.1 ./simpleMultiGPU_no_counters )
# ( export PAPI_DIR=`pwd`/../../.. && export LIBPFM_LIBDIR=`pwd`/../../../libpfm4/lib && export LD_LIBRARY_PATH=./:${PAPI_DIR}:${LIBPFM_LIBDIR}:${LD_LIBRARY_PATH} && LD_PRELOAD=./cuda_ld_preload_example.so ltrace ./simpleMultiGPU_no_counters )

Run using dynamic linking to find the correct libraries:
( export PAPI_DIR=`pwd`/../../.. && export LIBPFM_LIBDIR=`pwd`/../../../libpfm4/lib && export LD_LIBRARY_PATH=./:${PAPI_DIR}:${LIBPFM_LIBDIR}:${LD_LIBRARY_PATH} && LD_PRELOAD=./cuda_ld_preload_example.so ./simpleMultiGPU_no_counters )

Build and run in one step:
make cuda_ld_preload_example.so && ( export PAPI_DIR=`pwd`/../../.. && export LIBPFM_LIBDIR=`pwd`/../../../libpfm4/lib && export LD_LIBRARY_PATH=./:${PAPI_DIR}:${LIBPFM_LIBDIR}:${LD_LIBRARY_PATH} && LD_PRELOAD=./cuda_ld_preload_example.so ./simpleMultiGPU_no_counters )