Blob Blame History Raw
########################################
Author: Sangamesh Ragate
Date  : Nov 12th 2015
email : sragate@vols.utk.edu
INNOVATIVE COMPUTING LABORATORY, UTK
#######################################


Descripttion :	This Utility helps in configuring CUPTI for performing PC_SAMPLING
				on MAXWELL GPUS and those which have PC_SAMPLING support. It is a 
				standalone tool that works like nvprof and can be used to get the 
				PC sampling result on any cuda application without re building 
				the application. 

************************************************************************************************************
To Compile:

	This utility get compiled automatically when the cuda component for PAPI is compiled
	The *papi_sampling_cuda executable is generated in the src/utils diretory of PAPI

************************************************************************************************************
To run the utility:

	./papi_sampling_cuda [-d <GPU Device ID>] [-s <sampling period>] cuda_app [its arguments]

	-d : This switch is optional and is used to supply GPU device ID, should 
		 be integer > 0, Default is GPU device ID 0
	-s : This switch is optional and is used to supply PC sampling period. 
	     Range from 0 to 5, refer "enum CUpti_ActivityPCSamplingPeriod" in
		 CUPTI user manual. Default is set to 5.

    cuda_app : this is the cuda applicationf for which PC SAMPLING is performed
	All the arguments that come next belong to the cuda_app.
	

************************************************************************************************************
Example to try:
	
	After successful compilation of PAPI with cuda component
	> cd src/components/cuda/sampling/test


	NOTE: Make sure papi and cuda shared libraries are in the LD_LIBRARY_PATH before you run the test.
	matmul is a cuda application which performs 512x512 matrix multiplication

	Try:
	> ./papi_sampling_cuda matmul
	> ./papi_sampling_cuda -d 0 -s 0 matmul
	> ./papi_sampling_cuda -d 0 -s 5 matmul
	> ./papi_sampling_cuda -d 0 matmul
	> ./papi_sampling_cuda -s 2 matmul


************************************************************************************************************
Output:

	>Kernel activity record : This gives information about the cuda kernel that was launched for PC SAMPLING
	>Activity Kind record   : This gives information about the cuda kernel that was launched for PC SAMPLING
	>PC_SAMPLING record     : Kernel identification, PC value, samples, stall reason
	>Source locator record  : This is generated if cuda_app is compiled using "-lineinfo" in nvcc
	>STALL SUMMARY          : This gives the histogram of Stall reason Vs Number of samples due to the 
							  corresponding stall.

	NOTE: To better understand the output generated, the user should be familiar with the "Activity API" 
	Records of cupti, more specifically KERNEL,SOURCE_LOACTOR,PC_SAMPLING activity records mentioned in the 
	CUPT manual.
************************************************************************************************************
Additional Feature:

	The utility also generates SASS dump that can be used to trace the stall to the source code line in the 
	CUDA application. To get the source code line info, recompile only your cuda_app using "-lineinfo" flag in the
	nvcc.


************************************************************************************************************