########################################
Author: Sangamesh Ragate
Date : Nov 12th 2015
email : sragate@vols.utk.edu
INNOVATIVE COMPUTING LABORATORY, UTK
#######################################


Description : This utility helps configure CUPTI to perform PC sampling
              on Maxwell GPUs and on other GPUs that support PC sampling.
              It is a standalone tool that works like nvprof and can be
              used to get PC sampling results for any CUDA application
              without rebuilding the application.

************************************************************************************************************
To Compile:

This utility is compiled automatically when the CUDA component of PAPI is compiled.
The papi_sampling_cuda executable is generated in the src/utils directory of PAPI.

************************************************************************************************************
To run the utility:

./papi_sampling_cuda [-d <GPU Device ID>] [-s <sampling period>] cuda_app [its arguments]

-d : Optional. Supplies the GPU device ID, which must be an
     integer >= 0. The default is GPU device ID 0.
-s : Optional. Supplies the PC sampling period, in the range 0 to 5.
     Refer to "enum CUpti_ActivityPCSamplingPeriod" in the
     CUPTI user manual. The default is 5.

cuda_app : The CUDA application on which PC sampling is performed.
           All arguments that follow it belong to cuda_app.
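The option rules above can be sketched as a small wrapper. This is only an illustrative Python helper and not part of PAPI: the function name build_command and its parameters are assumptions, and it does nothing more than assemble and check the argument list described in this README (optional -d and -s first, then cuda_app and its arguments).

```python
def build_command(cuda_app, app_args=(), device=None, period=None,
                  tool="./papi_sampling_cuda"):
    """Assemble the argument list for papi_sampling_cuda (hypothetical helper)."""
    cmd = [tool]
    if device is not None:
        # The README gives 0 as the default device, so IDs start at 0.
        if not isinstance(device, int) or device < 0:
            raise ValueError("device ID must be an integer >= 0")
        cmd += ["-d", str(device)]
    if period is not None:
        # The README documents the sampling period range as 0 to 5.
        if not isinstance(period, int) or not 0 <= period <= 5:
            raise ValueError("sampling period must be an integer from 0 to 5")
        cmd += ["-s", str(period)]
    # Everything after cuda_app is passed through to the application.
    cmd.append(cuda_app)
    cmd += [str(arg) for arg in app_args]
    return cmd


if __name__ == "__main__":
    print(build_command("./matmul", device=0, period=2))
    # To launch for real, pass the list to subprocess.run(..., check=True).
```

Omitting device or period leaves the corresponding switch out, so the utility falls back to its own defaults (device 0, period 5).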


************************************************************************************************************
Example to try:

After successfully compiling PAPI with the CUDA component:
> cd src/components/cuda/sampling/test


NOTE: Make sure the PAPI and CUDA shared libraries are in LD_LIBRARY_PATH before you run the test.
matmul is a CUDA application that performs a 512x512 matrix multiplication.

Try:
> ./papi_sampling_cuda matmul
> ./papi_sampling_cuda -d 0 -s 0 matmul
> ./papi_sampling_cuda -d 0 -s 5 matmul
> ./papi_sampling_cuda -d 0 matmul
> ./papi_sampling_cuda -s 2 matmul
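One way to act on the NOTE about LD_LIBRARY_PATH is to check the variable before launching the test. The sketch below is a hypothetical Python helper, not something shipped with PAPI; the library file names it looks for (libpapi.so, libcupti.so) are assumptions about a typical Linux install and may differ on your system.

```python
import os


def missing_libraries(names=("libpapi.so", "libcupti.so"), env=None):
    """Return the library names not found in any LD_LIBRARY_PATH directory."""
    env = os.environ if env is None else env
    dirs = [d for d in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    missing = []
    for name in names:
        # A match is any file whose name starts with the library name,
        # so versioned files (libpapi.so.N) also count.
        found = any(
            os.path.isdir(d) and any(f.startswith(name) for f in os.listdir(d))
            for d in dirs
        )
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_libraries():
        print(f"{name} not found in LD_LIBRARY_PATH")
```

The dynamic loader also searches other locations (for example the system library cache), so a library reported here is a hint to check your setup rather than proof that the run will fail.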


************************************************************************************************************
Output:

>Kernel activity record : This gives information about the CUDA kernel that was launched for PC sampling
>Activity Kind record   : This gives information about the CUDA kernel that was launched for PC sampling
>PC_SAMPLING record     : Kernel identification, PC value, samples, stall reason
>Source locator record  : This is generated if cuda_app is compiled with "-lineinfo" in nvcc
>STALL SUMMARY          : This gives the histogram of stall reason vs. number of samples due to the
                          corresponding stall.

NOTE: To better understand the generated output, the user should be familiar with the "Activity API"
records of CUPTI, specifically the KERNEL, SOURCE_LOCATOR, and PC_SAMPLING activity records described in the
CUPTI manual.
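To illustrate what the STALL SUMMARY represents, the sketch below totals sample counts per stall reason. It is a minimal Python illustration and not the utility's own code: the function name stall_summary, the record tuples, and the stall reason labels are made-up placeholders rather than actual CUPTI output.

```python
from collections import Counter


def stall_summary(records):
    """Sum the sample counts of (stall_reason, samples) pairs per reason."""
    summary = Counter()
    for reason, samples in records:
        summary[reason] += samples
    return summary


if __name__ == "__main__":
    # Placeholder data: one (stall_reason, samples) pair per PC sampling record.
    records = [
        ("memory_dependency", 12),
        ("exec_dependency", 5),
        ("memory_dependency", 7),
        ("sync", 3),
    ]
    # Print the histogram, largest contributor first.
    for reason, total in stall_summary(records).most_common():
        print(f"{reason:20s} {total}")
```

Sorting by total puts the dominant stall reason at the top, which is usually the first thing to look at when reading the summary.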
************************************************************************************************************
Additional Feature:

The utility also generates a SASS dump that can be used to trace a stall back to the source code line in the
CUDA application. To get source line information, recompile only your cuda_app with the "-lineinfo" flag of
nvcc.


************************************************************************************************************