########################################
Author: Sangamesh Ragate
Date  : Nov 12th 2015
email : sragate@vols.utk.edu
INNOVATIVE COMPUTING LABORATORY, UTK
#######################################

Description :	This utility configures CUPTI to perform PC_SAMPLING on MAXWELL GPUs
				and on other GPUs that support PC_SAMPLING. It is a standalone tool
				that works like nvprof and can collect PC sampling results from any
				CUDA application without rebuilding the application.

************************************************************************************************************
To Compile:

	This utility is compiled automatically when the CUDA component of PAPI is compiled.
	The papi_sampling_cuda executable is generated in the src/utils directory of PAPI.

************************************************************************************************************
To run the utility:

	./papi_sampling_cuda [-d <GPU Device ID>] [-s <sampling period>] cuda_app [its arguments]

	-d : Optional. Supplies the GPU device ID, which must be an
	     integer >= 0. Default is GPU device ID 0.
	-s : Optional. Supplies the PC sampling period, in the range 0 to 5;
	     refer to "enum CUpti_ActivityPCSamplingPeriod" in the CUPTI
	     user manual. Default is 5.

	cuda_app : The CUDA application on which PC SAMPLING is performed.
	All arguments that follow belong to cuda_app.

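	As a sketch of how the -s range can be explored, the loop below runs the
	utility once per documented sampling period (0 to 5) on device 0 and keeps
	each run in its own log. The TOOL and APP variables and the log file names
	are illustrative assumptions, not part of the utility, and the redirection
	assumes the report is written to stdout or stderr. If the executable is not
	found, the loop only prints the command it would have run.

```shell
TOOL="${TOOL:-./papi_sampling_cuda}"   # assumed location of the utility
APP="${APP:-matmul}"                   # assumed CUDA application to profile
runs=0
for period in 0 1 2 3 4 5; do
    log="pc_sampling_period_${period}.log"   # hypothetical log file name
    if [ -x "$TOOL" ]; then
        # Device 0, one sampling period per run; keep all output in the log.
        "$TOOL" -d 0 -s "$period" "$APP" > "$log" 2>&1
    else
        echo "would run: $TOOL -d 0 -s $period $APP > $log"
    fi
    runs=$((runs + 1))
done
echo "sampling periods covered: $runs"
```
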
************************************************************************************************************
Example to try:

	After successfully compiling PAPI with the CUDA component:
	> cd src/components/cuda/sampling/test

	NOTE: Make sure the PAPI and CUDA shared libraries are in LD_LIBRARY_PATH before you run the test.
	matmul is a CUDA application that performs 512x512 matrix multiplication.

	Try:
	> ./papi_sampling_cuda matmul
	> ./papi_sampling_cuda -d 0 -s 0 matmul
	> ./papi_sampling_cuda -d 0 -s 5 matmul
	> ./papi_sampling_cuda -d 0 matmul
	> ./papi_sampling_cuda -s 2 matmul

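	The LD_LIBRARY_PATH note above could be satisfied along these lines. This is
	only a sketch: PAPI_DIR and CUDA_DIR are hypothetical variables, the default
	prefixes are guesses at common install locations, and the CUPTI directory is
	added on the assumption that the CUPTI library lives under extras/CUPTI in
	the CUDA toolkit. Adjust all three paths to your installation.

```shell
# Hypothetical install prefixes; override PAPI_DIR and CUDA_DIR as needed.
PAPI_LIB="${PAPI_DIR:-/usr/local}/lib"
CUDA_LIB="${CUDA_DIR:-/usr/local/cuda}/lib64"
CUPTI_LIB="${CUDA_DIR:-/usr/local/cuda}/extras/CUPTI/lib64"

# Prepend the three directories and keep any value that was already set.
export LD_LIBRARY_PATH="$PAPI_LIB:$CUDA_LIB:$CUPTI_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"
```
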
************************************************************************************************************
Output:

	>Kernel activity record : This gives information about the CUDA kernel that was launched for PC SAMPLING
	>Activity Kind record   : This gives information about the CUDA kernel that was launched for PC SAMPLING
	>PC_SAMPLING record     : Kernel identification, PC value, samples, stall reason
	>Source locator record  : This is generated if cuda_app is compiled using "-lineinfo" in nvcc
	>STALL SUMMARY          : This gives the histogram of stall reason vs. number of samples due to the
							  corresponding stall.

	NOTE: To better understand the output, the user should be familiar with the "Activity API"
	records of CUPTI, specifically the KERNEL, SOURCE_LOCATOR, and PC_SAMPLING activity records
	described in the CUPTI manual.
************************************************************************************************************
Additional Feature:

	The utility also generates a SASS dump that can be used to trace a stall to the source code line in the
	CUDA application. To get the source code line information, recompile only your cuda_app with the
	"-lineinfo" flag of nvcc.

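	A minimal sketch of that rebuild step follows. The source file name
	matmul.cu and the SRC and OUT variables are assumptions made for
	illustration; substitute the source and output names of your own cuda_app.
	The script skips the rebuild when nvcc or the source file is missing.

```shell
SRC="${SRC:-matmul.cu}"   # assumed source file of the CUDA application
OUT="${OUT:-matmul}"      # assumed name of the rebuilt executable

# Rebuild only the application with line information enabled.
if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ] && nvcc -lineinfo -o "$OUT" "$SRC"; then
    rebuilt=yes
    echo "rebuilt $OUT with -lineinfo"
else
    rebuilt=no
    echo "nvcc or $SRC unavailable, or the build failed; nothing rebuilt" >&2
fi
```

	After a successful rebuild, rerunning the utility on the new executable
	should also produce the Source locator records described under Output.
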
************************************************************************************************************