README
$Id: README,v 1.48 2007/10/06 13:02:07 mikpe Exp $

	   Linux Performance-Monitoring Counters Driver
		Mikael Pettersson <mikpe@it.uu.se>
========================================================================


Overview
--------
This package adds support to the Linux kernel (2.4.16 or newer)
for using the Performance-Monitoring Counters (PMCs) found in
many modern processors. Supported processors are:
- All Intel Pentium processors, i.e., Pentium, Pentium MMX,
  Pentium Pro, Pentium II, Pentium III, Pentium M and Pentium 4,
  including Celeron and Xeon versions.
- The AMD K7 and K8 processor families.
- Cyrix 6x86MX, MII, and III.
- VIA C3 (Cyrix III).
- Centaur WinChip C6/2/3.
- PowerPC 604, 7xx, and 74xx processors.

PMCs are "event counters" capable of recording any of a large
number of performance-related events during execution.
These events typically include instructions executed, cache
misses, TLB misses, stalls, and other events specific to
the microarchitecture of the processor being used.

PMCs are primarily used to identify low-level performance problems,
and to validate code changes intended to improve performance.

Limited support is available for generic x86 processors with
a Time-Stamp Counter but no PMCs, such as the AMD K6 family.
For these processors, only TSC-based cycle-count measurements
are possible. However, all high-level facilities implemented
by the driver are still available.


Features
--------
Each Linux process has its own set of "virtual" PMCs. That is,
to a process the PMCs appear to be private and unrelated to the
activities of other processes in the system. The virtual PMCs
have 64-bit precision, even though current processors only
implement 32-, 40-, or 48-bit PMCs. Each process also has a virtual
Time-Stamp Counter (TSC). On most machines, the virtual PMCs can
be sampled entirely in user-space without incurring the overhead
of a system call.
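
The following C fragment is a minimal sketch of how a 64-bit sum can
be maintained on top of a narrower hardware counter. It is not taken
from the driver; the names counter_mask() and accumulate() are
hypothetical, and the sketch assumes the counter wraps at most once
between two consecutive samples.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the perfctr driver or library.
 * Returns a mask covering the low 'width' bits of a counter. */
static uint64_t counter_mask(unsigned width)
{
    assert(width >= 1 && width <= 64);
    return width == 64 ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
}

/* Add the events counted between two raw samples to a 64-bit sum.
 * The subtraction is done modulo 2^width, so a single wrap-around
 * of the hardware counter between the samples is handled. */
static uint64_t accumulate(uint64_t sum, uint64_t prev_raw,
                           uint64_t now_raw, unsigned width)
{
    uint64_t delta = (now_raw - prev_raw) & counter_mask(width);
    return sum + delta;
}
```

For example, with a 32-bit counter that moved from 0xFFFFFFF0 to
0x10, accumulate(0, 0xFFFFFFF0, 0x10, 32) yields 0x20.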

A process accesses its virtual PMCs by opening /dev/perfctr
and issuing system calls on the resulting file descriptor. The
included user-space library provides a higher-level interface.
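
As a rough illustration of the first step, the sketch below opens the
device file and reports a failure. The function name open_perfctr()
is hypothetical, and the read-only access mode is an assumption of
this sketch. The driver-specific requests issued on the descriptor
are not shown; see the user-space library for those.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open the perfctr device file.
 * O_RDONLY is an assumption, not something this README specifies.
 * Returns a file descriptor, or -1 after printing a diagnostic. */
static int open_perfctr(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        fprintf(stderr, "%s: %s\n", path, strerror(errno));
    return fd;
}
```

A caller would typically pass "/dev/perfctr" and close() the
descriptor when it is no longer needed.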

The driver also supports global-mode or system-wide PMCs.
In this mode, each PMC on each processor can be controlled
and read. The PMCs and TSC on active processors are sampled
periodically and the accumulated sums have 64-bit precision.
Global-mode PMCs are accessed via the /dev/perfctr device file;
the user-space library provides a higher-level interface.
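
One way to see why periodic sampling matters is to estimate how long
a hardware counter can run before it wraps. The sketch below is
illustrative arithmetic only; seconds_until_wrap() is a hypothetical
name and the event rate is whatever the caller assumes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: time in seconds until a 'width'-bit counter
 * wraps when it is incremented 'events_per_second' times a second. */
static double seconds_until_wrap(unsigned width, double events_per_second)
{
    assert(width >= 1 && width <= 63);
    assert(events_per_second > 0.0);
    return (double)(UINT64_C(1) << width) / events_per_second;
}
```

For instance, a 32-bit counter incremented once per cycle at an
assumed 1 GHz wraps after about 4.3 seconds, so samples taken less
often than that could miss a wrap.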

The user-space library is accompanied by several example programs
that illustrate how the driver and the library can be used.

Support for performance-counter overflow interrupts is provided
for Intel P4 and P6, and AMD K7 and K8 processors.


Limitations
-----------
- As of perfctr-2.6, kernels older than 2.4.16 are no longer supported.
  You can use the previous stable series, perfctr-2.4, if you
  must use an older kernel, but this has several limitations:
  * Older kernels do not support AMD64 (x86-64).
  * The performance counters in hyper-threaded P4s/Xeons cannot
    be used with kernels older than 2.4.15. You'd have to disable
    hyper-threading or SMP, or restrict yourself to TSC sampling.
  * No profiling using counter overflow interrupts, except in 2.4.10
    and newer kernels, and some early 2.4-ac/redhat kernels.
  * Application code compiled for perfctr-2.4 is not compatible
    with perfctr-2.6, and vice versa.
  * The perfctr-2.4 series does not support 2.6 kernels.
  Some of these limitations may be fixable. Contact the author if
  you are willing to fund development in this direction.
- The performance counter interrupt facility requires SMP or
  uniprocessor APIC support. In the latter case, the BIOS must be
  reasonably non-buggy. Unfortunately, this is often not the case.
- Neither the kernel driver nor the sample user-space library
  attempts to hide any processor-specific details from the user.
- This package makes it possible to compute aggregate event and
  cycle counts for sections of code. Since many x86-type processors
  use out-of-order execution, it is impossible to attribute exact
  event or cycle counts to individual instructions.
  See the "Continuous Profiling" and "ProfileMe" papers at Compaq's
  DCPI web site for more information on this issue. (The URL is
  listed in the OTHERS file.)
- Centaur WinChip C6/2/3 support requires that the TSC is disabled.
  See linux/drivers/perfctr/x86.c for further information.
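
To illustrate the aggregate counts for sections of code mentioned
above, the sketch below subtracts a sample taken before a section
from one taken after it and derives an events-per-cycle ratio. The
struct layout, NR_PMCS, and the function names are hypothetical and
do not reflect the driver's or the library's actual data structures.

```c
#include <stdint.h>

#define NR_PMCS 2   /* hypothetical number of counters in use */

/* Hypothetical snapshot of the 64-bit virtual counters. */
struct sample {
    uint64_t tsc;            /* cycle count */
    uint64_t pmc[NR_PMCS];   /* event counts */
};

/* Aggregate counts for the code executed between two samples. */
static struct sample sample_delta(const struct sample *start,
                                  const struct sample *stop)
{
    struct sample d;
    d.tsc = stop->tsc - start->tsc;
    for (int i = 0; i < NR_PMCS; i++)
        d.pmc[i] = stop->pmc[i] - start->pmc[i];
    return d;
}

/* Events of counter i per elapsed cycle; 0.0 if no cycles elapsed. */
static double events_per_cycle(const struct sample *d, int i)
{
    return d->tsc ? (double)d->pmc[i] / (double)d->tsc : 0.0;
}
```

The result describes the section as a whole; as noted above, it says
nothing exact about any individual instruction inside it.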


Availability
------------
This and future versions of this package can be downloaded from
<http://user.it.uu.se/~mikpe/linux/perfctr/>.

The perfctr-devel mailing list is an open forum for driver update
announcements and general discussions about the perfctr driver
and its usage. To subscribe to perfctr-devel, visit
<http://lists.sourceforge.net/lists/listinfo/perfctr-devel>.


Licensing
---------
Copyright (C) 1999-2007  Mikael Pettersson <mikpe@it.uu.se>

This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA