Blob Blame History Raw
$Id: low-level-api.txt,v 1.1 2004/07/02 18:57:05 mikpe Exp $

PERFCTR LOW-LEVEL DRIVERS API
=============================

This document describes the common low-level API.
See low-level-$ARCH.txt for architecture-specific documentation.

General Model
=============
The model is that of a processor with:
- A non-programmable clock-like counter, the "TSC".
  The TSC frequency is assumed to be constant, but it is not
  assumed to be identical to the core frequency.
  The TSC may be absent.
- A set of programmable counters, the "perfctrs" or "pmcs".
  Control data may be per-counter, global, or both.
  The counters are not assumed to be interchangeable.

  A normal counter that simply counts events is referred to
  as an "accumulation-mode" or "a-mode" counter. Its total
  count is computed by adding the counts for the individual
  periods during which the counter is active. Two per-counter
  state variables are used for this: "sum", which is the
  total count up to but not including the current period,
  and "start", which records the value of the hardware counter
  at the start of the current period. At the end of a period,
  the hardware counter's value is read again, and the increment
  relative the start value is added to the sum. This strategy
  is used because it avoids a number of hardware problems.

  A counter that has been programmed to generate an interrupt
  on overflow is referred to as an "interrupt-mode" or "i-mode"
  counter. I-mode counters are initialised to specific values,
  and after overflowing are reset to their (re)start values.
  The total event count is available just as for a-mode counters.

  The set of counters may be empty, in which case only the
  TSC (which must be present) can be sampled.

Contents of <asm-$ARCH/perfctr.h>
=================================

"struct perfctr_sum_ctrs"
-------------------------
struct perfctr_sum_ctrs {
	unsigned long long tsc;
	unsigned long long pmc[..];	/* one per counter */
};

Architecture-specific container for counter values.
Used in the kernel/user API, but not by the low-level drivers.

"struct perfctr_cpu_control"
----------------------------
This struct includes at least the following fields:

	unsigned int tsc_on;
	unsigned int nractrs;		/* # of a-mode counters */
	unsigned int nrictrs;		/* # of i-mode counters */
	unsigned int pmc_map[..];	/* one per counter: virt-to-phys mapping */
	unsigned int evntsel[..];	/* one per counter: hw control data */
	int ireset[..];			/* one per counter: i-mode (re)start value */

Architecture-specific container for control data.
Used both in the kernel/user API and by the low-level drivers
(embedded in "struct perfctr_cpu_state").

"tsc_on" is non-zero if the TSC should be sampled.

"nractrs" is the number of a-mode counters, corresponding to
elements 0..nractrs-1 in the per-counter arrays.

"nrictrs" is the number of i-mode counters, corresponding to
elements nractrs..nractrs+nrictrs-1 in the per-counter arrays.

"nractrs+nrictrs" is the total number of counters to program
and sample. A-mode and i-mode counters are separated in order
to allow quick enumeration of either set, which is needed in
some low-level driver operations.

"pmc_map[]" maps each counter to its corresponding hardware counter
identification. No two counters may map to the same hardware counter.
This mapping is present because the hardware may have asymmetric
counters or other addressing quirks, which means that a counter's index
may not suffice to address its hardware counter.

"evntsel[]" contains the per-counter control data. Architecture-specific
global control data, if any, is placed in architecture-specific fields.

"ireset[]" contains the (re)start values for the i-mode counters.
Only indices nractrs..nractrs+nrictrs-1 are used.

"struct perfctr_cpu_state"
--------------------------
This struct includes at least the following fields:

	unsigned int cstatus;
	unsigned int tsc_start;
	unsigned long long tsc_sum;
	struct {
		unsigned int map;
		unsigned int start;
		unsigned long long sum;
	} pmc[..];	/* one per counter; the size is not part of the user ABI */
#ifdef __KERNEL__
	struct perfctr_cpu_control control;
#endif

This type records the state and control data for a collection
of counters. It is used by many low-level operations, and may
be exported to user-space via mmap().

"cstatus" is a re-encoding of control.tsc_on/nractrs/nrictrs,
used because it reduces overheads in key low-level operations.
Operations on cstatus values include:
- unsigned int perfctr_mk_cstatus(unsigned int tsc_on, unsigned int nractrs, unsigned int nrictrs);
  Construct a cstatus value.
- unsigned int perfctr_cstatus_enabled(unsigned int cstatus);
  Check if any part (tsc_on, nractrs, nrictrs) of the cstatus is non-zero.
- int perfctr_cstatus_has_tsc(unsigned int cstatus);
  Check if the tsc_on part of the cstatus is non-zero.
- unsigned int perfctr_cstatus_nrctrs(unsigned int cstatus);
  Retrieve nractrs+nrictrs from the cstatus.
- unsigned int perfctr_cstatus_has_ictrs(unsigned int cstatus);
  Check if the nrictrs part of cstatus is non-zero.

"tsc_start" and "tsc_sum" record the state of the TSC.

"pmc[]" contains the per-counter state, in the "start" and "sum"
fields. The "map" field contains the corresponding hardware counter
identification, from the counter's entry in "control.pmc_map[]";
it is copied into pmc[] to reduce overheads in key low-level operations.

"control" contains the control data which determines the
behaviour of the counters.

User-space overflow signal handler items
----------------------------------------
After a counter has overflowed, a user-space signal handler may
be invoked with a "struct siginfo" identifying the source of the
signal and the set of overflown counters.

#define SI_PMC_OVF	..

Value to be stored in "si.si_code".

#define si_pmc_ovf_mask	..

Field in which to store a bit-mask of the overflown counters.

Kernel-internal API
-------------------

/* Driver init/exit.
   perfctr_cpu_init() performs hardware detection and may fail. */
extern int perfctr_cpu_init(void);
extern void perfctr_cpu_exit(void);

/* CPU type name. Set if perfctr_cpu_init() was successful. */
extern char *perfctr_cpu_name;

/* Hardware reservation. A high-level driver must reserve the
   hardware before it may use it, and release it afterwards.
   "service" is a unique string identifying the high-level driver.
   perfctr_cpu_reserve() returns NULL on success; if another
   high-level driver has reserved the hardware, then that
   driver's "service" string is returned. */
extern const char *perfctr_cpu_reserve(const char *service);
extern void perfctr_cpu_release(const char *service);

/* PRE: state has no running interrupt-mode counters.
   Check that the new control data is valid.
   Update the low-level driver's private control data.
   is_global should be zero for per-process counters and non-zero
   for global-mode counters.
   Returns a negative error code if the control data is invalid. */
extern int perfctr_cpu_update_control(struct perfctr_cpu_state *state, int is_global);

/* Stop i-mode counters. Update sums and start values.
   Read a-mode counters. Subtract from start and accumulate into sums.
   Must be called with preemption disabled. */
extern void perfctr_cpu_suspend(struct perfctr_cpu_state *state);

/* Reset i-mode counters to their start values.
   Write control registers.
   Read a-mode counters and update their start values.
   Must be called with preemption disabled. */
extern void perfctr_cpu_resume(struct perfctr_cpu_state *state);

/* Perform an efficient combined suspend/resume operation.
   Must be called with preemption disabled. */
extern void perfctr_cpu_sample(struct perfctr_cpu_state *state);

/* The type of a perfctr overflow interrupt handler.
   It will be called in IRQ context, with preemption disabled. */
typedef void (*perfctr_ihandler_t)(unsigned long pc);

/* Install a perfctr overflow interrupt handler.
   Should be called after perfctr_cpu_reserve() but before
   any counter state has been activated. */
extern void perfctr_cpu_set_ihandler(perfctr_ihandler_t);

/* PRE: The state has been suspended and sampled by perfctr_cpu_suspend().
   Should be called from the high-level driver's perfctr_ihandler_t,
   and preemption must not have been enabled.
   Identify which counters have overflown, reset their start values
   from ireset[], and perform any necessary hardware cleanup.
   Returns a bit-mask of the overflown counters. */
extern unsigned int perfctr_cpu_identify_overflow(struct perfctr_cpu_state*);

/* Call perfctr_cpu_ireload() just before perfctr_cpu_resume() to
   bypass internal caching and force a reload of the i-mode pmcs.
   This ensures that perfctr_cpu_identify_overflow()'s state changes
   are propagated to the hardware. */
extern void perfctr_cpu_ireload(struct perfctr_cpu_state*);