$Id: virtual.txt,v 1.3 2004/08/09 09:42:22 mikpe Exp $

VIRTUAL PER-PROCESS PERFORMANCE COUNTERS
========================================
This document describes the virtualised per-process performance
counters kernel extension. See "General Model" in low-level-api.txt
for the model of the processor's performance counters.

Contents
========
- Summary
- Design & Implementation Notes
  * State
  * Thread Management Hooks
  * Synchronisation Rules
  * The Pseudo File System
- API For User-Space
  * Opening/Creating the State
  * Updating the Control
  * Unlinking the State
  * Reading the State
  * Resuming After Handling Overflow Signal
  * Reading the Counter Values
- Limitations / TODO List

Summary
=======
The virtualised per-process performance counters facility
(virtual perfctrs) is a kernel extension which extends the
thread state to record perfctr settings and values, and augments
the context-switch code to save perfctr values at suspends and
restore them at resumes. This "virtualises" the performance
counters in much the same way as the kernel already virtualises
general-purpose and floating-point registers.

Virtual perfctrs also adds an API allowing non-privileged
user-space processes to set up and access their perfctrs.

As this facility is primarily intended to support developers
of user-space code, both virtualisation and allowing access
from non-privileged code are essential features.

Design & Implementation Notes
=============================

State
-----
The state of a thread's perfctrs is packaged up in an object of
type 'struct vperfctr'. It consists of CPU-dependent state, a
sampling timer, and some auxiliary administrative data. This is
an independent object, with its own lifetime and access rules.

The state object is attached to the thread via a pointer in its
thread_struct. While attached, the object records the identity
of its owner thread: this is used for user-space API accesses
from threads other than the owner.

The state is separate from the thread_struct for several reasons:
- It's potentially large, hence it's allocated only when needed.
- It can outlive its owner thread. The state can be opened as
  a pseudo file: as long as that file is live, so is the object.
- It can be mapped, via mmap() on the pseudo file's descriptor.
  To facilitate this, a full page is allocated and reserved.

Thread Management Hooks
-----------------------
Virtual perfctrs hooks into several thread management events:

- exit_thread(): Calls perfctr_exit_thread() to stop the counters
  and mark the vperfctr object as dead.

- copy_thread(): Calls perfctr_copy_thread() to initialise
  the child's vperfctr pointer. The child gets a new vperfctr
  object containing the same control data as its parent.
  Kernel-generated threads do not inherit any vperfctr state.

- release_task(): Calls perfctr_release_task() to detach the
  vperfctr object from the thread. If the child and its parent
  still have the same perfctr control settings, then the child's
  final counts are propagated back into its parent.

- switch_to():
  * Calls perfctr_suspend_thread() on the previous thread, to
    suspend its counters.
  * Calls perfctr_resume_thread() on the next thread, to resume
    its counters. Also resets the sampling timer (see below).

- update_process_times(): Calls perfctr_sample_thread(), which
  decrements the sampling timer and samples the counters if the
  timer reaches zero.

  Sampling is normally only done at switch_to(), but if too much
  time passes before the next switch_to(), a hardware counter may
  increment by more than its range (usually 2^32). If this occurs,
  the difference from its start value will be incorrect, causing
  its updated sum to also be incorrect. The sampling timer is used
  to prevent this problem, which has been observed on SMP machines,
  and on high clock frequency UP machines.

- set_cpus_allowed(): Calls perfctr_set_cpus_allowed() to detect
  attempts to migrate the thread to a "forbidden" CPU, in which
  case a flag in the vperfctr object is set. perfctr_resume_thread()
  checks this flag, and if set, marks the counters as stopped and
  sends a SIGILL to the thread.

  The notion of forbidden CPUs is a workaround for a design flaw
  in hyper-threaded Pentium 4s and Xeons. See low-level-x86.txt
  for details.
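
The wrap-around hazard described under update_process_times() can be
illustrated with a small user-space sketch. It assumes a 32-bit
hardware counter and a 64-bit accumulated sum. 'struct ctr' and
sample_counter() are hypothetical names chosen for this illustration;
they are not taken from the perfctr sources.

```c
#include <stdint.h>

/* Hypothetical per-counter state: a 64-bit running sum and the
   32-bit hardware value observed at the previous sample. */
struct ctr {
	uint64_t sum;
	uint32_t start;
};

/* Fold the increments since the previous sample into the sum.
   The subtraction is performed modulo 2^32, so the result is only
   correct if the hardware counter advanced by less than 2^32
   since 'start' was recorded. */
void sample_counter(struct ctr *c, uint32_t now)
{
	c->sum += (uint32_t)(now - c->start);
	c->start = now;
}
```

If the hardware counter advances by 2^32 + 5 between two samples, the
sketch adds only 5 to the sum. Sampling from the timer tick as well
as from switch_to() is what keeps each individual delta in range.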

To reduce overheads, these hooks are implemented as inline functions
that check if the thread is using perfctrs before calling the code
that implements the behaviour. The hooks also reduce to no-ops if
CONFIG_PERFCTR_VIRTUAL is disabled.
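
The inline-hook pattern can be sketched as follows. This is a
stand-alone illustration, not the kernel code: 'struct thread_sketch',
vperfctr_suspend_slow() and the call counter are hypothetical, and
'struct vperfctr' is reduced to a dummy. Only the hook name and
CONFIG_PERFCTR_VIRTUAL come from the text above.

```c
#include <stddef.h>

#define CONFIG_PERFCTR_VIRTUAL 1      /* remove to compile the hook away */

struct vperfctr { int dummy; };       /* contents irrelevant for this sketch */

/* Hypothetical stand-in for thread_struct; only the pointer matters. */
struct thread_sketch {
	struct vperfctr *perfctr;     /* NULL if the thread has no perfctr state */
};

/* Out-of-line slow path. It only counts calls, so that the effect
   of the inline check is observable. */
static int slow_path_calls;
static void vperfctr_suspend_slow(struct vperfctr *perfctr)
{
	(void)perfctr;
	slow_path_calls++;
}

#ifdef CONFIG_PERFCTR_VIRTUAL
/* Cheap inline test; threads without perfctrs pay one pointer check. */
static inline void perfctr_suspend_thread(struct thread_sketch *t)
{
	if (t->perfctr != NULL)
		vperfctr_suspend_slow(t->perfctr);
}
#else
/* Facility disabled: the hook is an empty inline function. */
static inline void perfctr_suspend_thread(struct thread_sketch *t)
{
	(void)t;
}
#endif
```

Calling perfctr_suspend_thread() on a thread whose 'perfctr' pointer
is NULL leaves slow_path_calls unchanged.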

Synchronisation Rules
---------------------
There are five types of accesses to a thread's perfctr state:

1. Thread management events (see above) done by the thread itself.
   Suspend, resume, and sample are lock-less.

2. API operations done by the thread itself.
   These are lock-less, except when an individual operation
   has specific synchronisation needs. For instance, preemption
   is often disabled to prevent accesses due to context switches.

3. API operations done by a different thread ("monitor thread").
   The owner thread must be suspended for the duration of the operation.
   This is ensured by requiring that the monitor thread is ptrace()ing
   the owner thread, and that the owner thread is in TASK_STOPPED state.

4. set_cpus_allowed().
   The kernel does not lock the target during set_cpus_allowed(),
   so it can execute concurrently with the owner thread or with
   some monitor thread. In particular, the state may be deallocated.

   To solve this problem, both perfctr_set_cpus_allowed() and the
   operations that can change the owner thread's perfctr pointer
   (creat, unlink, exit) perform a task_lock() on the owner thread
   before accessing the perfctr pointer.

5. release_task().
   Reaping a child may or may not be done by the parent of that child.
   When done by the parent, no lock is taken. Otherwise, a task_lock()
   on the parent is done before accessing its thread's perfctr pointer.

The Pseudo File System
----------------------
The perfctr state is accessed from user-space via a file descriptor.

The main reason for this is to enable mmap() on the file descriptor,
which gives read-only access to the state.

The file descriptor is a handle to the perfctr state object. This
allows a very simple implementation of the user-space 'perfex'
program, which runs another program with given perfctr settings
and reports their final values. Without this handle, monitoring
applications like perfex would have to be implemented like debuggers
in order to catch the target thread's exit and retrieve the counter
values before the exit completes and the state disappears.

The file for a perfctr state object belongs to the vperfctrs pseudo
file system. Files in this file system support only a few operations:
- mmap()
- release() decrements the perfctr object's reference count and
  deallocates the object when no references remain
- the listing of a thread's open file descriptors identifies
  perfctr state file descriptors as belonging to "vperfctrfs"
The implementation is based on the code for pipefs.

In previous versions of the perfctr package, the file descriptors
for perfctr state objects also supported the API's ioctl() method.

API For User-Space
==================

Opening/Creating the State
--------------------------
int fd = sys_vperfctr_open(int tid, int creat);

'tid' must be the id of a thread, or 0 which is interpreted as an
alias for the current thread.

This operation returns an open file descriptor which is a handle
on the thread's perfctr state object.

If 'creat' is non-zero and the object did not exist, then it is
created and attached to the thread. The newly created state object
is inactive, with all control fields disabled and all counters
having the value zero. If 'creat' is non-zero and the object
already existed, then an EEXIST error is signalled.

If 'tid' does not denote the current thread, then it must denote a
thread that is stopped and under ptrace control by the current thread.

Notes:
- The access rule in the non-self case is the same as for the
  ptrace() system call. It ensures that no other thread, including
  the target thread itself, can access or change the target thread's
  perfctr state during the operation.
- An open file descriptor for a perfctr state object counts as a
  reference to that object; even if detached from its thread the
  object will not be deallocated until the last reference is gone.
- The file descriptor can be passed to mmap(), for low-overhead
  counter sampling. See "Reading the Counter Values" for details.
- The file descriptor can be passed to another thread. Accesses
  from threads other than the owner are permitted as long as they
  possess the file descriptor and use ptrace() for synchronisation.

Updating the Control
--------------------
int err = sys_vperfctr_control(int fd, const struct vperfctr_control *control);

'fd' must be the return value from a call to sys_vperfctr_open().
The perfctr object must still be attached to its owner thread.

This operation stops and samples any currently running counters in
the thread, and then updates the control settings. If the resulting
state has any enabled counters, then the counters are restarted.

Before restarting, the counter sums are reset to zero. However,
if a counter's bit is set in the control object's 'preserve'
bitmask field, then that counter's sum is not reset. The TSC's
sum is only reset if the TSC is disabled in the new state.

If any of the programmable counters are enabled, then the thread's
CPU affinity mask is adjusted to exclude the set of forbidden CPUs.

If the control data activates any interrupt-mode counters, then
a signal (specified by the 'si_signo' control field) will be sent
to the owner thread after an overflow interrupt. The documentation
for sys_vperfctr_iresume() describes this mechanism.

If 'fd' does not denote the current thread, then it must denote a
thread that is stopped and under ptrace control by the current thread.
The perfctr state object denoted by 'fd' must still be attached
to its owner thread.

Notes:
- It is strongly recommended to memset() the vperfctr_control object
  to all-bits-zero before setting the fields of interest.
- Stopping the counters is done by invoking the control operation
  with a control object that activates neither the TSC nor any PMCs.
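
The reset rules for the counter sums can be modelled in user-space
as shown below. This is a sketch under stated assumptions: the two
structs are simplified, hypothetical stand-ins (the real
'struct vperfctr_control' is CPU-dependent), NR_PMCS is an arbitrary
value, and bit i of 'preserve' is assumed to correspond to counter i.

```c
#include <stdint.h>
#include <string.h>

#define NR_PMCS 4   /* hypothetical number of programmable counters */

/* Hypothetical, simplified control object. */
struct control_sketch {
	int tsc_on;          /* non-zero: TSC enabled in the new state */
	uint32_t preserve;   /* bit i set: keep the sum of counter i */
};

/* Hypothetical, simplified set of accumulated sums. */
struct sums_sketch {
	uint64_t tsc;
	uint64_t pmc[NR_PMCS];
};

/* Start from all-bits-zero, as the notes above recommend. */
void init_control(struct control_sketch *control)
{
	memset(control, 0, sizeof *control);
}

/* Apply the reset rules: a PMC sum is cleared unless its 'preserve'
   bit is set; the TSC sum is cleared only if the TSC is disabled. */
void reset_sums(struct sums_sketch *sums, const struct control_sketch *control)
{
	unsigned int i;

	for (i = 0; i < NR_PMCS; i++)
		if (!(control->preserve & (UINT32_C(1) << i)))
			sums->pmc[i] = 0;
	if (!control->tsc_on)
		sums->tsc = 0;
}
```

With 'tsc_on' set and only bit 2 of 'preserve' set, the sketch keeps
the TSC sum and the sum of counter 2, and clears the other sums.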

Unlinking the State
-------------------
int err = sys_vperfctr_unlink(int fd);

'fd' must be the return value from a call to sys_vperfctr_open().

This operation stops and samples the thread's counters, and then
detaches the perfctr state object from the thread. If the object
had already been detached, then no action is performed.

If 'fd' does not denote the current thread, then it must denote a
thread that is stopped and under ptrace control by the current thread.

Reading the State
-----------------
int err = sys_vperfctr_read(int fd, struct perfctr_sum_ctrs *sum,
			    struct vperfctr_control *control,
			    struct perfctr_sum_ctrs *children);

'fd' must be the return value from a call to sys_vperfctr_open().

This operation copies data from the perfctr state object to
user-space. If 'sum' is non-NULL, then the counter sums are
written to it. If 'control' is non-NULL, then the control data
is written to it. If 'children' is non-NULL, then the sums of
exited children's counters are written to it.

If the perfctr state object is attached to the current thread,
then the counters are sampled and updated first.

If 'fd' does not denote the current thread, then it must denote a
thread that is stopped and under ptrace control by the current thread.

Notes:
- An alternative and faster way to retrieve the counter sums is described
  below. This system call can be used if the hardware does not permit
  user-space reads of the counters.

Resuming After Handling Overflow Signal
---------------------------------------
int err = sys_vperfctr_iresume(int fd);

'fd' must be the return value from a call to sys_vperfctr_open().
The perfctr object must still be attached to its owner thread.

When an interrupt-mode counter has overflowed, the counters
are sampled and suspended (TSC remains active). Then a signal,
as specified by the 'si_signo' control field, is sent to the
owner thread: the associated 'struct siginfo' has 'si_code'
equal to 'SI_PMC_OVF', and 'si_pmc_ovf_mask' equal to the set
of overflowed counters.

The counters are suspended to avoid generating new performance
counter events during the execution of the signal handler, but
the previous settings are saved. Calling sys_vperfctr_iresume()
restores the previous settings and resumes the counters. Doing
this is optional.

If 'fd' does not denote the current thread, then it must denote a
thread that is stopped and under ptrace control by the current thread.
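
A signal handler typically needs to turn the overflow mask into a
list of counter indices. The helper below is a sketch of that step
only. It assumes that bit i of 'si_pmc_ovf_mask' stands for counter i
and that the mask fits in an unsigned int; overflowed_counters() is
a hypothetical name, not part of the perfctr API.

```c
#include <stddef.h>

/* Store the indices of the set bits of 'ovf_mask' in 'idx', lowest
   bit first, writing at most 'max' entries. Returns the number of
   entries written. */
size_t overflowed_counters(unsigned int ovf_mask, unsigned int *idx, size_t max)
{
	size_t n = 0;
	unsigned int i;

	for (i = 0; ovf_mask != 0 && n < max; i++, ovf_mask >>= 1)
		if (ovf_mask & 1u)
			idx[n++] = i;
	return n;
}
```

A handler would pass the 'si_pmc_ovf_mask' value from its siginfo
argument to this helper, process each reported counter, and then
optionally call sys_vperfctr_iresume() to restart the counters.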

Reading the Counter Values
--------------------------
The value of a counter is computed from three components:

	value = sum + (now - start);

Two of these (sum and start) reside in the kernel's state object,
and the third (now) is the contents of the hardware counter.
To perform this computation in user-space requires access to
the state object. This is achieved by passing the file descriptor
from sys_vperfctr_open() to mmap():

	volatile const struct vperfctr_state *kstate;
	kstate = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_SHARED, fd, 0);

Reading the three components is a non-atomic operation. If the
thread is scheduled during the operation, the three values will
not be consistent and the wrong result will be computed.
To detect this situation, user-space should check the kernel
state's TSC start value before and after the operation, and
retry the operation in case of a mismatch.

The algorithm for retrieving the value of counter 'i' is:

	tsc0 = kstate->cpu_state.tsc_start;
	for(;;) {
		/* read the hardware counter, then the kernel's state */
		rdpmcl(kstate->cpu_state.pmc[i].map, now);
		start = kstate->cpu_state.pmc[i].start;
		sum = kstate->cpu_state.pmc[i].sum;
		/* an unchanged TSC start means no context switch occurred */
		tsc1 = kstate->cpu_state.tsc_start;
		if (likely(tsc1 == tsc0))
			break;
		tsc0 = tsc1;	/* state changed under us: retry */
	}
	return sum + (now - start);

The algorithm for retrieving the value of the TSC is similar,
as is the algorithm for retrieving the values of all counters.

Notes:
- Since the state's TSC time-stamps are used, the algorithm requires
  that user-space enables TSC sampling.
- The algorithm requires that the hardware allows user-space reads
  of the counter registers. If this property isn't statically known
  for the architecture, user-space should retrieve the kernel's
  'struct perfctr_info' object and check that the PERFCTR_FEATURE_RDPMC
  flag is set.
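
The retry logic can be exercised without perfctr hardware by
simulating one context switch in the middle of a read. Everything in
the sketch below is a stand-in: 'struct kstate_sketch' is a reduced,
hypothetical version of the mapped state for a single counter,
fake_rdpmc() replaces the hardware read, and the numeric values are
arbitrary. The real mapping is read-only and volatile; the simulation
writes to the state itself to play the kernel's part.

```c
#include <stdint.h>

/* Hypothetical, reduced stand-in for the mapped state of one counter. */
struct kstate_sketch {
	uint64_t tsc_start;
	uint64_t start;
	uint64_t sum;
};

static uint64_t hw_counter = 1500;   /* simulated hardware counter */
static int switch_pending = 1;       /* simulate exactly one context switch */

/* Simulated hardware read. Right after the first read the thread is
   "switched out and back in": the delta is folded into 'sum', and new
   'start' and 'tsc_start' values are recorded. */
static uint64_t fake_rdpmc(struct kstate_sketch *k)
{
	uint64_t now = hw_counter;

	if (switch_pending) {
		switch_pending = 0;
		k->sum += now - k->start;   /* suspend: accumulate the delta */
		hw_counter = 7000;          /* counter contents at resume */
		k->start = hw_counter;
		k->tsc_start += 12345;      /* resume records a new TSC start */
	}
	hw_counter += 10;                   /* the thread keeps running */
	return now;
}

/* The retry loop from the text, applied to the simulated state. */
uint64_t read_counter(struct kstate_sketch *k)
{
	uint64_t tsc0, tsc1, now, start, sum;

	tsc0 = k->tsc_start;
	for (;;) {
		now = fake_rdpmc(k);
		start = k->start;
		sum = k->sum;
		tsc1 = k->tsc_start;
		if (tsc1 == tsc0)
			break;
		tsc0 = tsc1;
	}
	return sum + (now - start);
}
```

Starting from tsc_start = 1, start = 1000 and sum = 0, the first pass
pairs a stale 'now' with the new 'start', and the TSC check rejects
it. The second pass sees a consistent snapshot, so read_counter()
returns 510: the 500 counts accumulated before the switch plus the
10 counted since the resume.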

Limitations / TODO List
=======================
- Buffering of overflow samples is not implemented. So far, not a
  single user has requested it.