.TH LIBPFM 3 "January, 2009" "" "Linux Programmer's Manual"
.SH NAME
libpfm_nehalem - support for Intel Nehalem processor family
.SH SYNOPSIS
.nf
.B #include <perfmon/pfmlib.h>
.B #include <perfmon/pfmlib_intel_nhm.h>
.fi
.sp
.SH DESCRIPTION
The libpfm library provides full support for the Intel Nehalem processor family, such as
Intel Core i7. The interface is defined in \fBpfmlib_intel_nhm.h\fR. It consists of a set
of functions and structures describing the Intel Nehalem processor-specific PMU features.
The Intel Nehalem processor is a quad-core, dual-thread processor. It includes two types
of PMU: core and uncore. The latter measures events at the socket level and is therefore
not tied to any of the four cores. The core PMU implements Intel architectural
perfmon version 3 with four generic counters and three fixed counters. The uncore has
eight generic counters and one fixed counter. Each Intel Nehalem core also implements
a 16-deep branch trace buffer, called Last Branch Record (LBR), which can be used in
combination with the core PMU. Intel Nehalem implements a newer version of the
Precise Event-Based Sampling (PEBS) mechanism, which can capture
where cache misses occur.
.sp
When Intel Nehalem processor-specific features are needed to support a measurement, their
descriptions must be passed as model-specific input arguments to the
\fBpfm_dispatch_events()\fR function. The Intel Nehalem processor-specific input
arguments are described in the \fBpfmlib_nhm_input_param_t\fR structure. No
output parameters are currently defined. The input parameters are defined as follows:
.sp
.nf
typedef struct {
    unsigned long cnt_mask;
    unsigned int  flags;
} pfmlib_nhm_counter_t;

typedef struct {
    unsigned int lbr_used;
    unsigned int lbr_plm;
    unsigned int lbr_filter;
} pfmlib_nhm_lbr_t;

typedef struct {
    unsigned int pebs_used;
    unsigned int ld_lat_thres;
} pfmlib_nhm_pebs_t;

typedef struct {
    pfmlib_nhm_counter_t pfp_nhm_counters[PMU_NHM_NUM_COUNTERS];
    pfmlib_nhm_pebs_t    pfp_nhm_pebs;
    pfmlib_nhm_lbr_t     pfm_nhm_lbr;
    uint64_t             reserved[4];
} pfmlib_nhm_input_param_t;
.fi
.sp
.sp
The Intel Nehalem processor provides a few additional per-event features for
counters: thresholding, inversion, edge detection, monitoring of both
threads, and occupancy reset. They can be set using the \fBpfp_nhm_counters\fR data
structure for each event. The \fBflags\fR field can be initialized with
the following values, depending on the event:
.TP
.B PFMLIB_NHM_SEL_INV
Inverts the result of the \fBcnt_mask\fR comparison when set. This
flag is supported for core and uncore PMU events.
.TP
.B PFMLIB_NHM_SEL_EDGE
Enables edge detection of events. This
flag is supported for core and uncore PMU events.
.TP
.B PFMLIB_NHM_SEL_ANYTHR
Enables measuring the event on either of the two processor threads, assuming hyper-threading
is enabled. By default, only the current thread is measured. This flag is restricted
to core PMU events.
.TP
.B PFMLIB_NHM_SEL_OCC_RST
When set, the queue occupancy counter associated with the event is cleared. This flag
is only available to uncore PMU events.
.LP
The \fBcnt_mask\fR field is used to set the event threshold.
The counter is incremented for each cycle in which the
number of occurrences of the event is greater than or equal to the value of
the field. Thus, the event is modified to actually measure the number
of qualifying cycles. When zero, all occurrences are counted (this is the default).
This field is supported for core and uncore PMU events.
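.PP
The threshold and inversion rules above can be modelled in a few lines of C.
This is only a sketch: the structure is a local copy of the one shown earlier,
the helper \fBnhm_counter_increment()\fR is hypothetical, and the numeric value
given to \fBPFMLIB_NHM_SEL_INV\fR is a placeholder rather than the value from the
library header. It also assumes that inversion has no effect when \fBcnt_mask\fR is zero.
.nf
```c
/* Local copy of the structure shown above; real programs would include
 * <perfmon/pfmlib_intel_nhm.h> instead of redefining it. */
typedef struct {
    unsigned long cnt_mask;
    unsigned int  flags;
} pfmlib_nhm_counter_t;

/* Placeholder bit value for illustration, NOT taken from the header. */
#define PFMLIB_NHM_SEL_INV 0x1u

/*
 * Hypothetical helper: how much the counter advances in one cycle in
 * which the event occurred n times.
 */
static unsigned long
nhm_counter_increment(const pfmlib_nhm_counter_t *c, unsigned long n)
{
    int qualifies;

    /* No threshold: every occurrence is counted (the default). */
    if (c->cnt_mask == 0)
        return n;

    /* Threshold set: the cycle qualifies if n >= cnt_mask ... */
    qualifies = n >= c->cnt_mask;

    /* ... and the inversion flag flips the result of that comparison. */
    if (c->flags & PFMLIB_NHM_SEL_INV)
        qualifies = !qualifies;

    /* A qualifying cycle adds exactly one to the counter. */
    return qualifies ? 1 : 0;
}
```
.fi
In this model, a threshold of 2 turns an event count into a count of cycles with at
least two occurrences, and adding the inversion flag counts the remaining cycles instead.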
.sp
.SH Support for Precise Event-Based Sampling (PEBS)
The library can be used to set up the PMC registers associated with PEBS. In this case,
the \fBpfmlib_nhm_pebs_t\fR structure must be used and the \fBpebs_used\fR field must
be set to 1.
.sp
To enable the PEBS load latency filtering capability, it is necessary to program the
\fBMEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD\fR event into one generic counter. The
latency threshold must be passed to the library in the \fBld_lat_thres\fR field.
It is expressed in core cycles and \fBmust\fR be greater than 3. Note that \fBpebs_used\fR
must be set as well.
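.PP
A minimal sketch of filling in the PEBS settings for load latency filtering
follows. The structure is a local copy of the one shown in the DESCRIPTION
section, and \fBnhm_pebs_set_load_latency()\fR is a hypothetical helper, not a
library function. Programming the event itself into a generic counter is not shown.
.nf
```c
#include <stddef.h>

/* Local copy of the structure shown in DESCRIPTION; real programs would
 * include <perfmon/pfmlib_intel_nhm.h> instead of redefining it. */
typedef struct {
    unsigned int pebs_used;
    unsigned int ld_lat_thres;
} pfmlib_nhm_pebs_t;

/*
 * Hypothetical helper: request PEBS with a load latency threshold given
 * in core cycles. Returns 0 on success, -1 if the arguments are invalid.
 */
static int
nhm_pebs_set_load_latency(pfmlib_nhm_pebs_t *pebs, unsigned int cycles)
{
    /* The threshold must be greater than 3 core cycles. */
    if (pebs == NULL || cycles <= 3)
        return -1;

    pebs->pebs_used = 1;         /* PEBS must be enabled as well */
    pebs->ld_lat_thres = cycles; /* latency threshold in core cycles */
    return 0;
}
```
.fi
Checking the threshold before calling the library is a design choice of this sketch;
it leaves the structure untouched when the value is rejected.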
.SH Support for Last Branch Record (LBR)
The library can be used to set up LBR registers. On Intel Nehalem processors, the
LBR is 16 entries deep and it is possible to filter branches based on privilege level
or type. To configure the LBR, the \fBpfmlib_nhm_lbr_t\fR structure must be used.
.sp
Like core PMU counters, LBR only distinguishes two privilege levels, 0 and the rest (1, 2, 3).
When running Linux natively, the kernel is at privilege level 0 and applications are at level 3.
It is possible to specify the privilege level of LBR using the \fBlbr_plm\fR field. Any attempt
to pass \fBPFM_PLM1\fR or \fBPFM_PLM2\fR will be rejected. If \fBlbr_plm\fR is 0, then the global
value in the \fBpfp_dfl_plm\fR field of \fBpfmlib_input_param_t\fR is used.
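.PP
The privilege level rule above can be sketched as follows. The helper
\fBnhm_lbr_effective_plm()\fR is hypothetical and only models the documented
behavior; it is not the library's code. The numeric values of the \fBPFM_PLM\fR
constants are assumptions made so that the fragment is self-contained; real
programs take them from \fBpfmlib.h\fR.
.nf
```c
/* Assumed values of the privilege level masks, for illustration only. */
#define PFM_PLM0 0x1u
#define PFM_PLM1 0x2u
#define PFM_PLM2 0x4u
#define PFM_PLM3 0x8u

/*
 * Hypothetical helper: compute the level mask the LBR would use.
 * Stores the mask in *plm and returns 0, or returns -1 when lbr_plm
 * asks for level 1 or level 2, which the LBR cannot distinguish.
 */
static int
nhm_lbr_effective_plm(unsigned int lbr_plm, unsigned int dfl_plm,
                      unsigned int *plm)
{
    if (lbr_plm & (PFM_PLM1 | PFM_PLM2))
        return -1;

    /* Zero means: fall back to the global default (pfp_dfl_plm). */
    *plm = lbr_plm ? lbr_plm : dfl_plm;
    return 0;
}
```
.fi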
.sp
By default, LBR captures all branches. It is possible to filter out branches by passing
a set of flags in \fBlbr_filter\fR. The flags are as follows:
.TP
.B PFMLIB_NHM_LBR_JCC
When set, LBR does not capture conditional branches. Default: off.
.TP
.B PFM_NHM_LBR_NEAR_REL_CALL
When set, LBR does not capture near relative calls. Default: off.
.TP
.B PFM_NHM_LBR_NEAR_IND_CALL
When set, LBR does not capture near indirect calls. Default: off.
.TP
.B PFM_NHM_LBR_NEAR_RET
When set, LBR does not capture near returns. Default: off.
.TP
.B PFM_NHM_LBR_NEAR_IND_JMP
When set, LBR does not capture near indirect jumps. Default: off.
.TP
.B PFM_NHM_LBR_NEAR_REL_JMP
When set, LBR does not capture near relative jumps. Default: off.
.TP
.B PFM_NHM_LBR_FAR_BRANCH
When set, LBR does not capture far branches. Default: off.
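.LP
Because each flag suppresses one branch type, capturing only some types means
setting the flags for all the others. The sketch below illustrates this. The
structure is a local copy of the one shown in the DESCRIPTION section, the flag
values are placeholders rather than the values from the library header,
\fBnhm_lbr_keep_only()\fR is a hypothetical helper, and setting \fBlbr_used\fR
to 1 is an assumption made by analogy with \fBpebs_used\fR.
.nf
```c
/* Local copy of the structure shown in DESCRIPTION. */
typedef struct {
    unsigned int lbr_used;
    unsigned int lbr_plm;
    unsigned int lbr_filter;
} pfmlib_nhm_lbr_t;

/* Placeholder bit values for illustration, NOT taken from the header. */
#define PFMLIB_NHM_LBR_JCC        0x01u
#define PFM_NHM_LBR_NEAR_REL_CALL 0x02u
#define PFM_NHM_LBR_NEAR_IND_CALL 0x04u
#define PFM_NHM_LBR_NEAR_RET      0x08u
#define PFM_NHM_LBR_NEAR_IND_JMP  0x10u
#define PFM_NHM_LBR_NEAR_REL_JMP  0x20u
#define PFM_NHM_LBR_FAR_BRANCH    0x40u

/*
 * Hypothetical helper: configure the LBR so that only the branch types
 * named in keep are captured. Every other type gets its filter flag set.
 */
static void
nhm_lbr_keep_only(pfmlib_nhm_lbr_t *lbr, unsigned int keep)
{
    const unsigned int all = PFMLIB_NHM_LBR_JCC
                           | PFM_NHM_LBR_NEAR_REL_CALL
                           | PFM_NHM_LBR_NEAR_IND_CALL
                           | PFM_NHM_LBR_NEAR_RET
                           | PFM_NHM_LBR_NEAR_IND_JMP
                           | PFM_NHM_LBR_NEAR_REL_JMP
                           | PFM_NHM_LBR_FAR_BRANCH;

    lbr->lbr_used = 1;             /* assumption: marks the LBR as in use */
    lbr->lbr_filter = all & ~keep; /* suppress everything not in keep */
}
```
.fi
For example, passing the two call flags as \fBkeep\fR leaves only near relative and
near indirect calls in the recorded branch history.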
.SH Support for uncore PMU
By nature, the uncore PMU does not distinguish privilege levels; it therefore
captures events at all privilege levels. To avoid any misinterpretation,
the library enforces that uncore events be measured with both \fBPFM_PLM0\fR
and \fBPFM_PLM3\fR set.
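.PP
An application can check this requirement before dispatching events. The
fragment below is a sketch of such a check: \fBnhm_uncore_plm_ok()\fR is a
hypothetical helper, and the numeric values of the two constants are
assumptions made so that the fragment is self-contained.
.nf
```c
/* Assumed values of the privilege level masks, for illustration only. */
#define PFM_PLM0 0x1u
#define PFM_PLM3 0x8u

/*
 * Hypothetical helper: returns 1 if plm is acceptable for an uncore
 * event, that is, if both level 0 and level 3 are requested.
 */
static int
nhm_uncore_plm_ok(unsigned int plm)
{
    const unsigned int required = PFM_PLM0 | PFM_PLM3;

    return (plm & required) == required;
}
```
.fi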
.sp
Tools and operating system kernel interfaces may impose further restrictions
on how the uncore PMU can be accessed.
.SH SEE ALSO
pfm_dispatch_events(3) and the set of examples shipped with the library
.SH AUTHOR
Stephane Eranian <eranian@gmail.com>
.PP