$Id: low-level-x86.txt,v 1.2 2004/07/11 17:12:28 mikpe Exp $

PERFCTRS X86 LOW-LEVEL API
==========================

See low-level-api.txt for the common low-level API.
This document only describes x86-specific behaviour.
For detailed hardware control register layouts, see
the manufacturers' documentation.

Contents
========
- Supported processors
- Contents of <asm-i386/perfctr.h>
- Processor-specific Notes
- Implementation Notes

Supported processors
====================
- Intel P5, P5MMX, P6, P4.
- AMD K7, K8. (P6 clones, with some changes)
- Cyrix 6x86MX, MII, and III. (good P5 clones)
- Centaur WinChip C6, 2, and 3. (bad P5 clones)
- VIA C3. (bad P6 clone)
- Any generic x86 with a TSC.

Contents of <asm-i386/perfctr.h>
================================

"struct perfctr_sum_ctrs"
-------------------------
struct perfctr_sum_ctrs {
	unsigned long long tsc;
	unsigned long long pmc[18];
};

The pmc[] array has room for 18 counters.

"struct perfctr_cpu_control"
----------------------------
struct perfctr_cpu_control {
	unsigned int tsc_on;
	unsigned int nractrs;		/* # of a-mode counters */
	unsigned int nrictrs;		/* # of i-mode counters */
	unsigned int pmc_map[18];
	unsigned int evntsel[18];	/* one per counter, even on P5 */
	struct {
		unsigned int escr[18];
		unsigned int pebs_enable;	/* for replay tagging */
		unsigned int pebs_matrix_vert;	/* for replay tagging */
	} p4;
	int ireset[18];			/* < 0, for i-mode counters */
	unsigned int _reserved1;
	unsigned int _reserved2;
	unsigned int _reserved3;
	unsigned int _reserved4;
};

The per-counter arrays have room for 18 elements.

ireset[] values must be negative, since overflow occurs on
the negative-to-non-negative transition.
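
As an illustration of the rule above: a counter loaded with -N
reaches zero, and so overflows, after N events. The helper below is
a minimal sketch of that arithmetic. Its name is hypothetical and
it is not part of the perfctr API.

```c
#include <limits.h>

/*
 * Hypothetical helper (not part of perfctr): compute an ireset[] value
 * that makes an i-mode counter overflow after `period` events.
 * Returns 0 if the period cannot be expressed as a negative int.
 */
static int sketch_ireset_for_period(unsigned int period)
{
	if (period == 0 || period > (unsigned int)INT_MAX)
		return 0;
	/* The counter counts upwards from -period and overflows at 0. */
	return -(int)period;
}
```

A caller would store the result in ireset[i] for the i-mode counter
in question, and treat a return value of 0 as "invalid period".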

The p4 sub-struct contains P4-specific control data:
- escr[]: the control data to write to the ESCR register
  associated with the counter
- pebs_enable: the control data to write to the PEBS_ENABLE MSR
- pebs_matrix_vert: the control data to write to the
  PEBS_MATRIX_VERT MSR

"struct perfctr_cpu_state"
--------------------------
struct perfctr_cpu_state {
	unsigned int cstatus;
	struct {	/* k1 is opaque in the user ABI */
		unsigned int id;
		int isuspend_cpu;
	} k1;
	/* The two tsc fields must be inlined. Placing them in a
	   sub-struct causes unwanted internal padding on x86-64. */
	unsigned int tsc_start;
	unsigned long long tsc_sum;
	struct {
		unsigned int map;
		unsigned int start;
		unsigned long long sum;
	} pmc[18];	/* the size is not part of the user ABI */
#ifdef __KERNEL__
	struct perfctr_cpu_control control;
	unsigned int p4_escr_map[18];
#endif
};

The k1 sub-struct is used by the low-level driver for
caching purposes. "id" identifies the control data, and
"isuspend_cpu" identifies the CPU on which the i-mode
counters were last suspended.

The pmc[] array has room for 18 elements.

p4_escr_map[] is computed from control by the low-level driver,
and provides the MSR number for the counter's associated ESCR.

User-space overflow signal handler items
----------------------------------------
#ifdef __KERNEL__
#define SI_PMC_OVF	(__SI_FAULT|'P')
#else
#define SI_PMC_OVF	('P')
#endif
#define si_pmc_ovf_mask	_sifields._pad[0]

Kernel-internal API
-------------------

In perfctr_cpu_update_control(), the is_global parameter controls
whether monitoring the other thread (T1) on HT P4s is permitted
or not. On other processors the parameter is ignored.

SMP kernels define CONFIG_PERFCTR_CPUS_FORBIDDEN_MASK and
"extern cpumask_t perfctr_cpus_forbidden_mask;".
On HT P4s, resource conflicts can occur because both threads
(T0 and T1) in a processor share the same perfctr registers.
To prevent conflicts, only thread 0 in each processor is allowed
to access the counters. perfctr_cpus_forbidden_mask contains the
smp_processor_id()s of each processor's thread 1, and it is the
responsibility of the high-level driver to ensure that it never
accesses the perfctr state from a forbidden thread.
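
The check a high-level driver has to make can be sketched as below.
This is only an illustration: the kernel's cpumask_t is replaced by
a plain unsigned long with one bit per CPU, and all names are
hypothetical.

```c
/* Simplified stand-in for the kernel's cpumask_t: one bit per CPU. */
typedef unsigned long sketch_cpumask_t;

/*
 * Return 1 if `cpu` is marked in the forbidden mask, 0 otherwise.
 * CPUs beyond the width of the stand-in mask are reported as allowed.
 */
static int sketch_cpu_is_forbidden(sketch_cpumask_t forbidden,
				   unsigned int cpu)
{
	if (cpu >= sizeof(forbidden) * 8)
		return 0;
	return (int)((forbidden >> cpu) & 1UL);
}
```

A high-level driver would perform such a test, using the current
CPU's number, before touching any perfctr state.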

Overflow interrupt handling requires local APIC support in the kernel.

Processor-specific Notes
========================

General
-------
pmc_map[] contains a counter number, as used by the RDPMC instruction.
It never contains an MSR number.

Counters are 32, 40, or 48 bits wide. The driver only ever
reads the low 32 bits. This avoids performance issues, and
errata on some processors.

Writing to counters or their control registers tends to be
very expensive. This is why a-mode counters only use read
operations on the counter registers. Caching of control
register contents is done to avoid writing them. "Suspend CPU"
is recorded for i-mode counters to avoid writing the counter
registers when the counters are resumed (their control
registers must be written at both suspend and resume, however).

Some processors are unable to stop the counters (Centaur/VIA),
and some are unable to reinitialise them to arbitrary values (P6).
Storing the counters' total counts in the hardware counters
would break as soon as context-switches occur. This is another
reason why the accumulate-differences method for maintaining the
counter values is used.
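
The accumulate-differences method can be sketched as follows. The
struct mirrors the "start" and "sum" fields of the pmc[] elements
in struct perfctr_cpu_state, but the function names are
hypothetical and the counter samples are passed in as plain
arguments instead of being read with RDPMC.

```c
#include <stdint.h>

struct sketch_pmc {
	uint32_t start;		/* low 32 bits sampled at resume */
	uint64_t sum;		/* accumulated 64-bit total */
};

/* At resume: remember where the hardware counter currently stands. */
static void sketch_pmc_resume(struct sketch_pmc *p, uint32_t now)
{
	p->start = now;
}

/*
 * At suspend or sample time: add the number of events since `start`.
 * The subtraction is done modulo 2^32, so it stays correct when the
 * low 32 bits of the hardware counter wrap around.
 */
static void sketch_pmc_suspend(struct sketch_pmc *p, uint32_t now)
{
	p->sum += (uint32_t)(now - p->start);
	p->start = now;
}
```

Since only differences are used, the hardware counter never needs
to be written, and its absolute value is irrelevant.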

Intel P5
--------
The hardware stores both counters' control data in a single
control register, the CESR MSR. The evntsel values are
limited to 16 bits each, and are combined by the low-level
driver to form the value for the CESR. Apart from that,
the evntsel values are direct images of the CESR.

Bits 0xFE00 in an evntsel value are reserved.
At least one evntsel CPL bit (0x00C0) must be set.

For Cyrix' P5 clones, evntsel bits 0xFA00 are reserved.

For Centaur's P5 clones, evntsel bits 0xFF00 are reserved.
These clones have no CPL bits to set. The TSC is broken and cannot
be used.
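
The P5 rules can be sketched as below. The reserved-bit masks and
the CPL mask are taken from the text above. The function and macro
names are hypothetical, and placing counter 0 in the low half of
the CESR and counter 1 in the high half is an assumption based on
the hardware register layout, not a copy of the driver's code.

```c
#include <stdint.h>

#define SKETCH_P5_RESERVED	0xFE00u	/* Intel P5, P5MMX */
#define SKETCH_CYRIX_RESERVED	0xFA00u	/* Cyrix P5 clones */
#define SKETCH_CENTAUR_RESERVED	0xFF00u	/* Centaur P5 clones */
#define SKETCH_P5_CPL_BITS	0x00C0u

/*
 * Validate one 16-bit evntsel value. `reserved` is the per-vendor
 * reserved mask, `need_cpl` is 0 for Centaur, which has no CPL bits.
 * Returns 0 if the value is acceptable, -1 otherwise.
 */
static int sketch_p5_check_evntsel(uint32_t evntsel, uint32_t reserved,
				   int need_cpl)
{
	if (evntsel & ~0xFFFFu)		/* limited to 16 bits */
		return -1;
	if (evntsel & reserved)
		return -1;
	if (need_cpl && !(evntsel & SKETCH_P5_CPL_BITS))
		return -1;
	return 0;
}

/* Combine the evntsel values for hardware counters 0 and 1. */
static uint32_t sketch_p5_make_cesr(uint32_t evntsel0, uint32_t evntsel1)
{
	return (evntsel1 << 16) | evntsel0;
}
```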

Intel P6
--------
The evntsel values are mapped directly onto the counters'
EVNTSEL control registers.

The global enable bit (22) in EVNTSEL0 must be set. That bit is
reserved in EVNTSEL1.

Bits 21 and 19 (0x00280000) in each evntsel are reserved.

For an i-mode counter, bit 20 (0x00100000) of its evntsel must be
set. For a-mode counters, that bit must not be set.
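
The P6 constraints above can be expressed as a small validation
routine. This is a sketch of the stated rules only. The macro and
function names are hypothetical, and the driver's own checks may
differ in detail.

```c
#include <stdint.h>

#define SKETCH_P6_ENABLE	0x00400000u	/* bit 22 */
#define SKETCH_P6_INT		0x00100000u	/* bit 20 */
#define SKETCH_P6_RESERVED	0x00280000u	/* bits 21 and 19 */

/*
 * `pmc` is the hardware counter number (0 or 1) from pmc_map[].
 * Returns 0 if `evntsel` is acceptable, -1 otherwise.
 */
static int sketch_p6_check_evntsel(uint32_t evntsel, unsigned int pmc,
				   int is_imode)
{
	if (evntsel & SKETCH_P6_RESERVED)
		return -1;
	if (pmc == 0) {
		/* the global enable bit lives in EVNTSEL0 */
		if (!(evntsel & SKETCH_P6_ENABLE))
			return -1;
	} else if (evntsel & SKETCH_P6_ENABLE) {
		/* reserved in EVNTSEL1 */
		return -1;
	}
	if (is_imode) {
		if (!(evntsel & SKETCH_P6_INT))
			return -1;
	} else if (evntsel & SKETCH_P6_INT) {
		return -1;
	}
	return 0;
}
```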

Hardware quirk: Counters are 40 bits wide, but writing to a
counter only writes the low 32 bits: remaining bits are
sign-extended from bit 31.
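
The effect of this quirk can be modelled in a few lines. The
function below is a hypothetical model of the hardware behaviour
described above, not driver code. It returns the 40-bit counter
contents that result from writing a given 32-bit value.

```c
#include <stdint.h>

/* Model of a P6 counter write: bits 32..39 become copies of bit 31. */
static uint64_t sketch_p6_counter_after_write(uint32_t low32)
{
	uint64_t value = low32;

	if (low32 & 0x80000000u)
		value |= 0xFF00000000ULL;
	return value;
}
```

A negative ireset[] value such as -1 therefore ends up as the
40-bit value 0xFFFFFFFFFF, one event away from overflow, while a
large positive 40-bit count cannot be restored at all.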

AMD K7/K8
---------
Similar to Intel P6. The main difference is that each evntsel has
its own enable bit, which must be set.

VIA C3
------
Superficially similar to Intel P6, but only PERFCTR1/EVNTSEL1
are programmable. pmc_map[0] must be 1, if nractrs == 1.

Bits 0xFFFFFE00 in the evntsel are reserved. There are no auxiliary
control bits to set.

Generic
-------
Only permits TSC sampling, with tsc_on == 1 and nractrs == nrictrs == 0
in the control data.

Intel P4
--------
For each counter, its evntsel[] value is mapped onto its CCCR
control register, and its p4.escr[] value is mapped onto its
associated ESCR control register.

The ESCR register number is computed from the hardware counter
number (from pmc_map[]) and the ESCR SELECT field in the CCCR,
and is cached in p4_escr_map[].

pmc_map[] contains the value to pass to RDPMC when reading the
counter. It is strongly recommended to set bit 31 (fast rdpmc).
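
As a small illustration, a P4 pmc_map[] entry with the fast rdpmc
bit set could be built as follows. The helper names are
hypothetical.

```c
#include <stdint.h>

#define SKETCH_P4_FAST_RDPMC	0x80000000u	/* bit 31 */

/* Build a pmc_map[] value for hardware counter `counter` (0..17). */
static uint32_t sketch_p4_pmc_map(unsigned int counter)
{
	return counter | SKETCH_P4_FAST_RDPMC;
}

/* Recover the hardware counter number from a pmc_map[] value. */
static unsigned int sketch_p4_counter_number(uint32_t pmc_map)
{
	return pmc_map & ~SKETCH_P4_FAST_RDPMC;
}
```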

In each evntsel/CCCR value:
- the OVF, OVF_PMI_T1 and hardware-reserved bits (0xB80007FF)
  are reserved and must not be set
- bit 11 (EXTENDED_CASCADE) is only permitted on P4 models >= 2,
  and for counters 12 and 15-17
- bits 16 and 17 (ACTIVE_THREAD) must both be set on non-HT processors
- at least one of bits 12 (ENABLE), 30 (CASCADE), or 11 (EXTENDED_CASCADE)
  must be set
- bit 26 (OVF_PMI_T0) must be clear for a-mode counters, and set
  for i-mode counters; if bit 25 (FORCE_OVF) also is set, then
  the corresponding ireset[] value must be exactly -1
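
The CCCR rules listed above can be collected into one validation
routine. The sketch below encodes only those rules. The struct,
macro, and function names are hypothetical, and the CPU model and
HT status are passed in explicitly instead of being detected.

```c
#include <stdint.h>

#define SKETCH_CCCR_RESERVED		0xB80007FFu
#define SKETCH_CCCR_EXTENDED_CASCADE	(1u << 11)
#define SKETCH_CCCR_ENABLE		(1u << 12)
#define SKETCH_CCCR_ACTIVE_THREAD	(3u << 16)
#define SKETCH_CCCR_FORCE_OVF		(1u << 25)
#define SKETCH_CCCR_OVF_PMI_T0		(1u << 26)
#define SKETCH_CCCR_CASCADE		(1u << 30)

struct sketch_p4_cpu {
	unsigned int model;	/* P4 model number */
	int is_ht;		/* non-zero on HT processors */
};

/* Returns 0 if `cccr` is acceptable for `counter`, -1 otherwise. */
static int sketch_p4_check_cccr(const struct sketch_p4_cpu *cpu,
				unsigned int counter, uint32_t cccr,
				int is_imode, int ireset)
{
	if (cccr & SKETCH_CCCR_RESERVED)
		return -1;
	if (cccr & SKETCH_CCCR_EXTENDED_CASCADE) {
		if (cpu->model < 2)
			return -1;
		if (counter != 12 && (counter < 15 || counter > 17))
			return -1;
	}
	if (!cpu->is_ht &&
	    (cccr & SKETCH_CCCR_ACTIVE_THREAD) != SKETCH_CCCR_ACTIVE_THREAD)
		return -1;
	if (!(cccr & (SKETCH_CCCR_ENABLE | SKETCH_CCCR_CASCADE |
		      SKETCH_CCCR_EXTENDED_CASCADE)))
		return -1;
	if (is_imode) {
		if (!(cccr & SKETCH_CCCR_OVF_PMI_T0))
			return -1;
		if ((cccr & SKETCH_CCCR_FORCE_OVF) && ireset != -1)
			return -1;
	} else if (cccr & SKETCH_CCCR_OVF_PMI_T0) {
		return -1;
	}
	return 0;
}
```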

In each p4.escr[] value:
- bit 31 is reserved and must not be set
- the CPL_T1 field (bits 0 and 1) must be zero except on HT processors
  when global-mode counters are used
- IQ_ESCR0 and IQ_ESCR1 can only be used on P4 models <= 2

PEBS is not supported, but the replay tagging bits in PEBS_ENABLE
and PEBS_MATRIX_VERT may be used.

If p4.pebs_enable is zero, then p4.pebs_matrix_vert must also be zero.

If p4.pebs_enable is non-zero:
- only bits 24, 10, 9, 2, 1, and 0 may be set; note that in contrast
  to Intel's documentation, bit 25 (ENABLE_PEBS_MY_THR) is not needed
  and must not be set
- bit 24 (UOP_TAG) must be set
- at least one of bits 10, 9, 2, 1, or 0 must be set
- in p4.pebs_matrix_vert, all bits except 1 and 0 must be clear,
  and at least one of bits 1 and 0 must be set
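
The replay tagging rules above translate into the following check.
This is a sketch of the stated constraints only. The masks are
derived from the bit numbers in the list, and all names are
hypothetical.

```c
#include <stdint.h>

#define SKETCH_PEBS_ENABLE_ALLOWED	0x01000607u /* bits 24,10,9,2,1,0 */
#define SKETCH_PEBS_UOP_TAG		0x01000000u /* bit 24 */
#define SKETCH_PEBS_TAG_BITS		0x00000607u /* bits 10,9,2,1,0 */
#define SKETCH_MATRIX_VERT_ALLOWED	0x00000003u /* bits 1 and 0 */

/* Returns 0 if the pair of values is acceptable, -1 otherwise. */
static int sketch_p4_check_pebs(uint32_t pebs_enable,
				uint32_t pebs_matrix_vert)
{
	if (pebs_enable == 0)
		return pebs_matrix_vert == 0 ? 0 : -1;
	if (pebs_enable & ~SKETCH_PEBS_ENABLE_ALLOWED)
		return -1;
	if (!(pebs_enable & SKETCH_PEBS_UOP_TAG))
		return -1;
	if (!(pebs_enable & SKETCH_PEBS_TAG_BITS))
		return -1;
	if (pebs_matrix_vert & ~SKETCH_MATRIX_VERT_ALLOWED)
		return -1;
	if (!(pebs_matrix_vert & SKETCH_MATRIX_VERT_ALLOWED))
		return -1;
	return 0;
}
```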

Implementation Notes
====================

Caching
-------
Each 'struct perfctr_cpu_state' contains two cache-related fields:
- 'id': a unique identifier for the control data contents
- 'isuspend_cpu': the identity of the CPU on which a state containing
  interrupt-mode counters was last suspended

To this the driver adds a per-CPU cache, recording:
- the 'id' of the control data currently in that CPU
- the current contents of each control register

When perfctr_cpu_update_control() has validated the new control data,
it also updates the id field.

The driver's internal 'write_control' function, called from the
perfctr_cpu_resume() API function, first checks if the state's id
matches that of the CPU's cache, and if so, returns. Otherwise
it checks each control register in the state and updates those
that do not match the cache. Finally, it writes the state's id
to the cache. Tests on various x86 processor types have shown that
MSR writes are very expensive: the purpose of these cache checks
is to avoid MSR writes whenever possible.
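
The caching logic of 'write_control' can be sketched as below. This
is a simplified user-space model, not the driver's code: MSR writes
are replaced by a counter, only a small array of control registers
is modelled, and all names are hypothetical. It also assumes that
id 0 is never assigned to valid control data.

```c
#include <stdint.h>

#define SKETCH_NREGS 4

struct sketch_state {
	unsigned int id;		/* identifies the control data */
	uint32_t evntsel[SKETCH_NREGS];
};

struct sketch_cpu_cache {
	unsigned int id;		/* id of the data in this CPU */
	uint32_t evntsel[SKETCH_NREGS];	/* current register contents */
};

static unsigned int sketch_msr_writes;	/* counts simulated MSR writes */

static void sketch_wrmsr(unsigned int reg, uint32_t value)
{
	(void)reg;
	(void)value;
	++sketch_msr_writes;
}

static void sketch_write_control(const struct sketch_state *state,
				 struct sketch_cpu_cache *cache)
{
	unsigned int i;

	if (cache->id == state->id)
		return;		/* registers already hold this state */
	for (i = 0; i < SKETCH_NREGS; ++i) {
		if (cache->evntsel[i] != state->evntsel[i]) {
			cache->evntsel[i] = state->evntsel[i];
			sketch_wrmsr(i, state->evntsel[i]);
		}
	}
	cache->id = state->id;
}
```

Resuming the same state twice in a row costs no register writes the
second time, and switching between states that share most of their
control data only rewrites the registers that differ.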

Unlike accumulation-mode counters, interrupt-mode counters must be
physically stopped when suspended, primarily to avoid overflow
interrupts in contexts not expecting them, and secondarily to avoid
increments to the counters themselves (see below).

When suspending interrupt-mode counters, the driver:
- records the CPU identity in the state's 'isuspend_cpu' field
- stops each interrupt-mode counter by disabling its control register
- lets the cache and state id values remain the same

Later, when resuming interrupt-mode counters, the driver:
- if the state and cache id values match:
  * the cache id is cleared, to force a reload of the control
    registers stopped at suspend (see below)
  * if the state's "suspend" CPU identity matches the current CPU,
    the counter registers are still valid, and the procedure returns
- if the procedure did not return above, it then loops over each
  interrupt-mode counter:
  * the counter's control register is physically disabled, unless
    the cache indicates that it already is disabled; this is necessary
    to prevent premature events and overflow interrupts if the CPU's
    registers previously belonged to some other state
  * then the counter register itself is restored
After this interrupt-mode specific resume code is complete, the
driver continues by calling 'write_control' as described above.
The state and cache ids will not match, forcing write_control to
reload the disabled interrupt-mode control registers.
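
The suspend and resume steps for interrupt-mode counters can be
modelled as follows. This is a simplified sketch under several
assumptions: every counter is treated as an i-mode counter, a
control value of 0 stands for "disabled", id 0 is never a valid
state id, and register writes are only counted. All names are
hypothetical.

```c
#include <stdint.h>

#define SK_NCTRS 2

struct sk_state {
	unsigned int id;		/* identifies the control data */
	int isuspend_cpu;		/* CPU of the last suspend */
	uint32_t evntsel[SK_NCTRS];
	uint32_t start[SK_NCTRS];	/* counter values to restore */
};

struct sk_cache {
	unsigned int id;
	uint32_t evntsel[SK_NCTRS];
};

static unsigned int sk_control_writes;	/* simulated control writes */
static unsigned int sk_counter_writes;	/* simulated counter writes */

static void sk_isuspend(struct sk_state *s, struct sk_cache *c, int cpu)
{
	unsigned int i;

	s->isuspend_cpu = cpu;
	for (i = 0; i < SK_NCTRS; ++i) {
		c->evntsel[i] = 0;	/* stop the counter */
		++sk_control_writes;
	}
	/* the cache and state ids are left unchanged */
}

static void sk_iresume(const struct sk_state *s, struct sk_cache *c,
		       int cpu)
{
	unsigned int i;

	if (c->id == s->id) {
		c->id = 0;	/* force write_control to reload */
		if (s->isuspend_cpu == cpu)
			return;	/* counter registers still valid */
	}
	for (i = 0; i < SK_NCTRS; ++i) {
		if (c->evntsel[i] != 0) {
			c->evntsel[i] = 0;	/* disable first */
			++sk_control_writes;
		}
		++sk_counter_writes;	/* restore from s->start[i] */
	}
}
```

In this model, resuming on the CPU where the state was suspended,
with nothing else having used the counters in between, writes no
counter registers at all.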

Call-site Backpatching
----------------------
The x86 family of processors is quite diverse in how their
performance counters work and are accessed. There are three
main designs (P5, P6, and P4) with several variations.
To handle this the processor type detection and initialisation
code sets up a number of function pointers to point to the
correct procedures for the actual CPU type.
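
The function-pointer setup can be sketched as below. This only
illustrates the dispatch pattern. The enum, the function names, and
the choice of "number of counters" as the dispatched operation are
hypothetical, with the counts taken from the descriptions in this
document (two counters on P5 and P6, eighteen on P4, none on a
generic TSC-only processor).

```c
enum sk_cpu_type { SK_CPU_GENERIC, SK_CPU_P5, SK_CPU_P6, SK_CPU_P4 };

static unsigned int sk_generic_nr_counters(void) { return 0; }
static unsigned int sk_p5_nr_counters(void) { return 2; }
static unsigned int sk_p6_nr_counters(void) { return 2; }
static unsigned int sk_p4_nr_counters(void) { return 18; }

/* Set once at initialisation, then called on every operation. */
static unsigned int (*sk_nr_counters)(void) = sk_generic_nr_counters;

static void sk_init(enum sk_cpu_type type)
{
	switch (type) {
	case SK_CPU_P5:
		sk_nr_counters = sk_p5_nr_counters;
		break;
	case SK_CPU_P6:
		sk_nr_counters = sk_p6_nr_counters;
		break;
	case SK_CPU_P4:
		sk_nr_counters = sk_p4_nr_counters;
		break;
	default:
		sk_nr_counters = sk_generic_nr_counters;
		break;
	}
}
```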

Calls via function pointers are more expensive than direct calls,
so the driver actually performs direct calls to wrappers that
backpatch the original call sites to instead call the actual
CPU-specific functions in the future.

Unsynchronised code backpatching in SMP systems doesn't work
on Intel P6 processors due to an erratum, so the driver performs
a "finalise backpatching" step after the CPU-specific function
pointers have been set up. This step invokes the API procedures
on a temporary state object, set up to force every backpatchable
call site to be invoked and adjusted.

Several low-level API procedures are called in the context-switch
path by the per-process perfctrs kernel extension, which motivates
the efforts to reduce runtime overheads as much as possible.

Overflow Interrupts
-------------------
The x86 hardware enables overflow interrupts via the local
APIC's LVTPC entry, which is only present in P6/K7/K8/P4.

The low-level driver supports overflow interrupts as follows:
- It reserves a local APIC vector, 0xee, as LOCAL_PERFCTR_VECTOR.
- It adds a local APIC exception handler to entry.S, which
  invokes the driver's smp_perfctr_interrupt() procedure.
- It adds code to i8259.c to bind the LOCAL_PERFCTR_VECTOR
  interrupt gate to the exception handler in entry.S.
- During processor type detection, it records whether the
  processor supports the local APIC, and sets up function pointers
  for the suspend and resume operations on interrupt-mode counters.
- When the low-level driver is activated, it enables overflow
  interrupts by writing LOCAL_PERFCTR_VECTOR to each CPU's APIC_LVTPC.
- Overflow interrupts now end up in smp_perfctr_interrupt(), which
  ACKs the interrupt and invokes the interrupt handler installed
  by the high-level service/driver.
- When the low-level driver is deactivated, it disables overflow
  interrupts by masking APIC_LVTPC in each CPU. It then releases
  the local APIC back to the NMI watchdog.

At compile-time, the low-level driver indicates overflow interrupt
support by enabling CONFIG_PERFCTR_INTERRUPT_SUPPORT. If the feature
is also available at runtime, it sets the PERFCTR_FEATURE_PCINT flag
in the perfctr_info object.