########################################
Author: Sangamesh Ragate
Date  : Nov 12th 2015
email : sragate@vols.utk.edu
INNOVATIVE COMPUTING LABORATORY, UTK
########################################

Description:
This utility configures CUPTI to perform PC sampling on Maxwell GPUs and
on other GPUs that support PC sampling. It is a standalone tool that works
like nvprof and can collect PC sampling results from any CUDA application
without rebuilding the application.

************************************************************************************************************
To compile:
This utility is compiled automatically when the cuda component of PAPI is compiled.
The papi_sampling_cuda executable is generated in the src/utils directory of PAPI.

************************************************************************************************************
To run the utility:

./papi_sampling_cuda [-d ] [-s ] cuda_app [its arguments]

-d       : Optional. Supplies the GPU device ID, which must be an integer >= 0.
           Default is GPU device ID 0.
-s       : Optional. Supplies the PC sampling period. Range is 0 to 5; refer to
           "enum CUpti_ActivityPCSamplingPeriod" in the CUPTI user manual.
           Default is 5.
cuda_app : The CUDA application on which PC sampling is performed.
           All arguments that follow it belong to cuda_app.

************************************************************************************************************
Example to try:
After successful compilation of PAPI with the cuda component:

> cd src/components/cuda/sampling/test

NOTE: Make sure the papi and cuda shared libraries are in LD_LIBRARY_PATH before you run the test.
matmul is a CUDA application that performs 512x512 matrix multiplication.

Try:
> ./papi_sampling_cuda matmul
> ./papi_sampling_cuda -d 0 -s 0 matmul
> ./papi_sampling_cuda -d 0 -s 5 matmul
> ./papi_sampling_cuda -d 0 matmul
> ./papi_sampling_cuda -s 2 matmul

************************************************************************************************************
Output:

>Kernel activity record : Gives information about the CUDA kernel that was launched for PC sampling.
>Activity Kind record   : Gives information about the CUDA kernel that was launched for PC sampling.
>PC_SAMPLING record     : Kernel identification, PC value, samples, stall reason.
>Source locator record  : Generated only if cuda_app is compiled with "-lineinfo" in nvcc.
>STALL SUMMARY          : Histogram of stall reason vs. number of samples attributed to that stall reason.

NOTE: To better understand the output, the user should be familiar with the
"Activity API" records of CUPTI, specifically the KERNEL, SOURCE_LOCATOR and
PC_SAMPLING activity records described in the CUPTI manual.

************************************************************************************************************
Additional feature:
The utility also generates a SASS dump that can be used to trace a stall to the
source code line in the CUDA application. To get the source line information,
recompile only your cuda_app with the "-lineinfo" flag in nvcc.

************************************************************************************************************
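Sketch of the STALL SUMMARY:
The STALL SUMMARY listed under Output is a histogram of stall reason vs. number
of samples. The C sketch below shows one way such a histogram could be built
from PC sampling records. It is an illustration only, not the utility's actual
code: pc_sample_t, NUM_STALL_REASONS, stall_names, build_stall_histogram and
print_stall_summary are hypothetical names, the record is a simplified stand-in
for the CUPTI PC_SAMPLING activity record, and the labels are placeholders
rather than CUPTI's stall reason names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed number of stall reasons, for this sketch only. */
#define NUM_STALL_REASONS 4

/* Hypothetical, simplified record. This is NOT the CUPTI structure;
   it keeps only the two fields the histogram needs. */
typedef struct {
    unsigned int stall_reason; /* index of the stall reason */
    unsigned int samples;      /* samples reported for one PC with that reason */
} pc_sample_t;

/* Placeholder labels chosen for illustration, not CUPTI names. */
static const char *const stall_names[NUM_STALL_REASONS] = {
    "reason_0", "reason_1", "reason_2", "reason_3"
};

/* Add each record's sample count to the bucket of its stall reason.
   Records whose reason is out of range are skipped.
   Returns the total number of samples that were counted. */
unsigned long build_stall_histogram(const pc_sample_t *recs, size_t n,
                                    unsigned long hist[NUM_STALL_REASONS])
{
    unsigned long total = 0;
    size_t i;

    assert(hist != NULL);
    assert(recs != NULL || n == 0);

    for (i = 0; i < NUM_STALL_REASONS; i++)
        hist[i] = 0;

    for (i = 0; i < n; i++) {
        if (recs[i].stall_reason >= NUM_STALL_REASONS)
            continue; /* reason unknown to this sketch */
        hist[recs[i].stall_reason] += recs[i].samples;
        total += recs[i].samples;
    }
    return total;
}

/* Print one line per stall reason: label, then sample count. */
void print_stall_summary(const unsigned long hist[NUM_STALL_REASONS])
{
    size_t i;

    for (i = 0; i < NUM_STALL_REASONS; i++)
        printf("%-10s %lu\n", stall_names[i], hist[i]);
}
```

Calling build_stall_histogram on the records collected for one kernel and then
print_stall_summary on the result prints one line per stall reason. The buckets
with the largest counts indicate where the kernel spent most of its sampled
stall time.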