$Id: CHANGES,v 1.142.2.72 2010/11/07 19:48:14 mikpe Exp $

CHANGES
=======

[High-level changes in reverse chronological order. Detailed driver changes are in linux/drivers/perfctr/RELEASE-NOTES.]

Version 2.6.42, 2010-11-07
- Classify Westmere processors as Westmere, not Nehalem.
- Update update-kernel to install arch-specific header files in arch/$arch/include/asm/ for newer kernels. Also make update-kernel --test trace file installation steps.
- Update usr.src/Makefile to use the LD make variable, if set, rather than plain 'ld'. Useful for cross-compilation and builds with non-default target options.
- Updated kernel support: 2.6.18-194.17.4.el5 (RHEL5).

Version 2.6.41, 2010-06-08
- Extend examples/perfex/ to allow users to set up values for the OFFCORE_RSP MSRs on Nehalem-based processors.
- Add driver support for OFFCORE_RSP MSRs on Nehalem-based processors.
- Recognise more Nehalem-based processors (models 30, 37).
- Renamed the PERFCTR_X86_INTEL_COREI7 symbolic CPU/PMU type constant to PERFCTR_X86_INTEL_NHLM, since it denotes the entire Nehalem family, not just the (original) Core i7. The old constant is also defined for now, to not break source code compatibility.
- Correct Core i7 event setup in examples/global/ to actually work. It was rejected by the driver due to a copy-paste error.
- Rewrite the missing event list message in `perfex -l/-L' to indicate that this is a user-space library omission, not an issue with the driver or the hardware.
- Updated kernel support: 2.6.18-194.3.1.el5, 2.6.18-194.el5, 2.6.18-164.15.1.el5 (RHEL5).

Version 2.6.40, 2010-01-30
- Preliminary support for Intel Xeon 7500 (Nehalem-based) processors.
- Preliminary support for Intel i7-980X Gulftown processors.
- Added support for AMD Family 11h processors (compatible with Family 10h).
- Updated kernel support: 2.6.32, 2.6.31, 2.6.18-164.11.1.el5 and 2.6.18-164.el5 (RHEL5), 2.6.9-89.0.19.EL (RHEL4).

Version 2.6.39, 2009-06-11
- Updated driver's AMD multicore detection code to actually work.
  Tested on Istanbul and Shanghai Opterons.
- Updated driver to allow per-thread counting of Northbridge events on multicore AMD processors. Since the NB is shared between cores, threads counting NB events will be forced via their CPU affinity mask to core0 of the available processors. On early K8 RevE processors NB events remain limited to global-mode counting, due to an erratum.
- Updated kernel support: 2.6.30, 2.6.29, 2.6.18-128.1.10.el5 (RHEL5), 2.6.18-92.1.26.el5 (RHEL5), 2.6.9-89.EL (RHEL4), 2.6.9-78.0.22.EL (RHEL4).

Version 2.6.38, 2009-01-23
- Added rvperfctr_iresume() procedure to the user-space library. This allows a monitor process to resume overflow counters in a target process after the target has received an overflow signal.
- Updated kernel support: 2.6.29-rc2, 2.6.28, 2.6.18-128.el5 (RHEL5.3), 2.6.9-78.0.13.EL (RHEL4).
- Removed support for 2.4 kernels. 2.4 kernels have effectively been unsupported since early 2007; this change makes it official by removing all 2.4-specific files and #ifdefs.

Version 2.6.37, 2008-11-30
- Preliminary support for Intel Core i7 (Nehalem) processors (family 6 model 26). They are currently treated like Core 2s, but with four general-purpose counters, different events, and a new symbolic CPU type. The AnyThread evntsel flag and "uncore" event monitoring are not yet supported.
- Updated x86 driver to recognise Xeon 7400 (family 6 model 29) as a member of the Core 2 family.
- Extended x86 driver's CPU initialisation on Intel Core 2 and newer CPUs to work around vtune leaving the performance monitor unit in a "very" disabled state. Thanks to Mark Krentel for reporting the problem and for facilitating tests that allowed the cause of the problem to be identified.
- Updated kernel support: 2.6.28-rc6, 2.6.18-120.el5 (RHEL5.3 beta), 2.6.18-92.1.18.el5 (RHEL5), 2.6.9-78.0.8.EL (RHEL4).
Version 2.6.36, 2008-10-19
- Fixed a driver error which caused Intel Family 6 Model 23 processors to crash kernels with general protection faults if the fixed-function counters ran in interrupt-on-overflow mode. Older Intel Family 6 Model 15 processors tolerate the error, which is why it was not detected before. Thanks to Mark Krentel for reporting the problem and for facilitating tests that allowed the cause of the problem to be identified.
- Updated kernel support: 2.6.27, 2.6.26, 2.6.18-92.1.13.el5 (RHEL5), 2.6.9-78.0.5.EL (RHEL4).

Version 2.6.35, 2008-06-30
- Preliminary support for Intel Atom processors added. These processors are very poorly documented, but they are known to be family 6 model 28, and to support Intel's "architectural performance monitor". Current models seem to have two general-purpose counters and one fixed-function counter, and to support the seven architectural events. Thanks to Steve Blackburn for running tests on his Atom.
- Updated x86 driver to recognise the Celeron model 16h as a member of the Core2 family.
- Corrected an error in the x86 driver's control validation procedure. The error was introduced in perfctr-2.6.29 when support for the Core2's fixed-function counters was added. The error made the driver accept some invalid controls (on Core2 processors only), which could result in kernel hangs due to exceptions from invalid register accesses. Thanks to Anton Ertl for reporting the initial problem.
- Updated README to add Atom and AMD Family 10h to the list of supported processors.
- Updated kernel support: 2.6.26-rc8, 2.6.18-92.1.6.el5 (RHEL5), 2.6.18-92.1.1.el5 (RHEL5), 2.6.9-67.0.20.EL (RHEL4).

Version 2.6.35-pre1, 2008-06-23
- Added optional close-on-exec feature for per-process perfctrs. To enable it, set control.flags |= VPERFCTR_CONTROL_CLOEXEC in a struct vperfctr_control object before passing it to vperfctr_control().
  If the flag is set when a thread executes an execve() system call, then its perfctr state is detached from the thread as if a call to vperfctr_unlink() had occurred. If the flag is clear, then the state survives execve() just as it always did before.
- The vperfctr_open() library function now sets close-on-exec on the file descriptor embedded in the returned vperfctr handle.
- Removed library support for the ancient /proc//perfctr kernel interface, which hasn't worked since perfctr-2.6.0.

Version 2.6.34, 2008-05-29
- Updated kernel support: 2.6.26-rc4, 2.6.18-92.el5 (RHEL5U2), 2.6.18-53.1.21.el5 (RHEL5), 2.6.16.42-0.12 (SuSE).
- Corrected the kernel driver's version number: perfctr-2.6.33 forgot to increment it.

Version 2.6.33, 2008-05-18
- x86: The support for Intel Family 6 Model 23 processors added in perfctr-2.6.32 was incomplete, causing overflow interrupts to not work properly on those processors. This has been fixed. (Thanks to Mark Krentel for reporting the issue and testing patches.)
- Updated kernel support: 2.6.26-rc2, 2.6.18-53.1.19.el5 (RHEL5), 2.6.9-67.0.15.EL (RHEL4).

Version 2.6.32, 2008-04-20
- Library: add experimental vperfctr_open_mode(mode) procedure. The plain vperfctr_open() always opens the perfctr state in O_CREAT|O_EXCL mode, which means that it will fail if the invoking thread already has a perfctr state. The mode parameter to vperfctr_open_mode() can be used to avoid this behaviour: with mode == 0 no state will be created and a handle to the thread's existing state (if any) is returned; to select the current behaviour, pass mode == VPERFCTR_OPEN_CREAT_EXCL to vperfctr_open_mode(). For example:

      the_state_is_shared = 0;
      vperfctr = vperfctr_open_mode(VPERFCTR_OPEN_CREAT_EXCL);
      if (vperfctr == NULL && errno == EEXIST) {
          // error out due to the resource conflict, or:
          vperfctr = vperfctr_open_mode(0);
          the_state_is_shared = 1;
      }
      ...
  The purpose of this API extension is to hopefully allow PAPI to handle some use cases that currently cause it to error out.
- Fix 'make install' to select the correct file to install as . Fixes a regression caused by the i386/x86_64 arch unification in perfctr-2.6.30. ppc32 also needed fixing.
- x86: Recognize Intel Family 6 Model 23 as Core2.
- Updated kernel support: 2.6.25, 2.6.18-53.1.14.el5 (RHEL5), 2.6.9-67.0.7.EL (RHEL4).

Version 2.6.31, 2008-01-26
- x86: Barcelona (AMD Family 10h) updates:
  * Correct CPU type constant to read FAM10H with trailing H. The old spelling also remains, for now.
  * Barcelona event selectors are 64-bit, not 32-bit as in K8. Add evntsel_high[] array to struct perfctr_cpu_control to allow passing the high 32 bits of evntsels to the driver. This array overlaps the p4 control sub-struct. (Uses a GCC anonymous union, to avoid source-incompatible changes.) Currently only some Northbridge events need the high bits.
  * Update driver to accept and check high evntsel bits on Barcelona, and to maintain all 64 evntsel bits in PMU context switches.
  * Update examples/perfex/ to indicate how to also set up the high 32 evntsel bits for Barcelona (run perfex -h).
- Minor coding style (mostly obsolete whitespace style) fixes.
- Updated kernel support: 2.6.24, 2.6.18-53.1.6.el5 (RHEL5), 2.6.9-67.0.1.EL (RHEL4).

Version 2.6.30, 2007-10-28
- Kernel 2.6.24-rc1 replaced the previously separate i386 and x86_64 source code directories with new shared x86 directories. Updated the linux/include/ hierarchy and the update-kernel script to handle new and old source layouts.
- Several driver updates to handle kernel 2.6.24-rc1 changes on both x86 and ppc32.
- Fixed a problem which could break RHEL5 kernel builds in some configurations.
- Updated kernel support: 2.6.24-rc1, 2.6.23, 2.6.18-8.1.14.el5 (RHEL5), and 2.6.9-55.0.6.EL (RHEL4).

Version 2.6.29, 2007-10-07
- Added support for the fixed-function counters in Intel Core 2 processors.
  To user-space they look like ordinary P6 counters, except that their PMC numbers are 0x40000000..0x40000002, and their evntsels only need the Enable, INT, and CPL fields set.
- Preliminary support for AMD Family 10h processors. Currently only events that do not need the high 32 bits of the event select control registers are expected to work.
- Fixed driver compilation warnings caused by perfctr needing its own definitions of macros/constants that may or may not be defined in the specific kernel version used.
- Updated kernel support: 2.6.23-rc9, 2.6.5-7.276 (SuSE).

Version 2.6.28, 2007-07-18
- Fixed path to udev rules file (/etc/udev.d/ -> /etc/udev/).
- Updated to handle changes in the 2.6.22 kernel on ppc32.
- Updated to handle changes in the 2.6.22 kernel on x86.
- Updated kernel support: 2.6.22, 2.6.21, 2.6.18-8.1.8.el5 (RHEL5), 2.6.9-55.0.2.EL (RHEL4), and 2.6.9-55.EL (RHEL4).

Version 2.6.27, 2007-04-09
- Updated INSTALL with instructions for making /dev/perfctr creation and perfctr module autoloading work with udev. These instructions are known to work on Fedora Core 4. Also updated the rpm package accordingly.
- Updated for the RHEL5 2.6.18-8.1.1.el5 kernel. This kernel removed a ptrace-related function that is used for perfctr's remote-control API. For now, remote-control is disabled in the RHEL5 kernel, but everything else should work.
- Updated kernel support: 2.6.21-rc6, 2.6.18.2-34 (SuSE), 2.6.9-42.0.10.EL (RHEL4).

Version 2.6.26, 2007-02-11
- My old @csd.uu.se email address no longer works. Updated documentation and kernel messages to show my @it.uu.se address.
- Added driver support for ARM/XScale processors. Overflow interrupts are not yet supported, in part due to conflicts with Intel's ixp400_eth driver. Plain event counting works.
- Updated kernel support: 2.6.20, 2.6.19, 2.6.9-42.0.8.EL (RHEL4), 2.4.34, 2.4.21-47.0.1.EL (RHEL3), 2.6.16.21 (SLES10).
Version 2.6.25, 2006-10-15
- The Intel Core 2 processors are substantially different from the old Core processors. Core 2 processors are now mapped to a new cpu_type PERFCTR_X86_INTEL_CORE2, and must be programmed so that every EVNTSEL used has its Enable flag set. This is consistent with Intel's documentation and observations made by others on later steppings of the Core 2. Early steppings may be more P6-like (master Enable in EVNTSEL0), but as long as a control setup includes EVNTSEL0 it should work on any stepping.
- Major x86 driver updates for changes in kernel 2.6.19-rc1.
- Updated kernel support: 2.6.19-rc2, 2.6.18, 2.6.9-42.0.3.EL (RHEL4), and 2.4.34-pre4.
- Fixed perfex -l/-L to handle unavailability of event set data gracefully and not signal an error in those cases.

Version 2.6.24, 2006-09-17
- Fixed a driver linkage failure in 64-bit x86 kernels when CONFIG_PERFCTR_INIT_TESTS was enabled, caused by an omission in the perfctr-2.6.23 changes to support Intel Core 2 CPUs.
- Updated kernel support: 2.6.18-rc7, 2.6.9-42.0.2.EL (RHEL4), and 2.4.34-pre2.

Version 2.6.23, 2006-08-20
- Intel Core 2 fixes: detect Core2 processors (Model 15) and allow them to be used in 64-bit builds.
- Updated kernel support: 2.6.18-rc4, 2.6.17, 2.6.9-34.0.2.EL (RHEL4), 2.4.33, 2.4.34-pre1, and 2.4.21-47.EL (RHEL3).
- The update-kernel script is now able to automatically identify SuSE Linux kernel versions. A SuSE kernel MUST be configured (".config" exists) for the identification to work.

Version 2.6.22, 2006-06-02
- Preliminary support for Intel Core (family 6 model 14) processors.
- A serious error in the x86 driver's code to identify hyper-threads was fixed. The driver logic is correct, but it used a kernel function which does not provide the required behaviour in 64-bit kernels or older 32-bit kernels. As a result, bogus data could be fed to the hyper-thread detection code, leading to various failures.
- A change in the 2.6.16 32-bit x86 kernel caused a compilation error when CONFIG_PERFCTR_INIT_TESTS was enabled. Fixed this.
- Updated kernel support: 2.6.9-34.0.1 (RHEL4), 2.6.17-rc5, and 2.4.33-pre3.

Version 2.6.21, 2006-04-03
- Updated kernel support: 2.4.21-40.EL (RHEL3), 2.6.16, and 2.6.17-rc1.
- Updates for internal changes in kernels 2.6.16 and 2.6.17-rc1.
- Corrected a mistake in perfctr-2.6.20 which caused compilation errors with RHEL3 kernels.

Version 2.6.20, 2006-03-12
- Updated kernel support: 2.6.9-34 (RHEL4), 2.6.16-rc6, and 2.4.33-pre2.

Version 2.6.19, 2006-01-22
- Updated ppc32 driver for ppc32/ppc64 kernel merging changes in the 2.6.16-rc1 kernel. The driver now dynamically claims and releases the hardware, allowing it to coexist with other PMU drivers such as oprofile.
- Updated kernel support: 2.6.16-rc1, 2.6.9-22.0.2 (RHEL4).

Version 2.6.18, 2006-01-03
- Added perfctr_get_info() library API procedure, which allows users to acquire information about the system without needing a handle to an open perfctr (per-process or global) state.
- Rearranged structure marshalling descriptor declarations to increase code sharing for all supported architectures.
- Updated kernel support: 2.6.15, 2.6.14, 2.6.9-22.0.1 (RHEL4), 2.6.5-7.201 (SuSE), 2.4.33-pre1, 2.4.32.

Version 2.6.17, 2005-10-02
- The x86 kernel driver has been updated to work correctly on dual-core P4 processors. Previous versions would fail during CPU detection (on HT DC P4s) or would erroneously restrict access for one of the cores (non-HT DC P4s).
- Updated kernel support: 2.4.21-37.EL (RHEL3), 2.4.32-rc1, and 2.6.14-rc3.

Version 2.6.16, 2005-09-04
- The ppc32 driver will now compile in kernels that lack Open Firmware support; this is needed for some embedded systems.
- Updated kernel support: 2.6.9-11.EL (RHEL4), 2.4.21-32.0.1.EL (RHEL3), 2.6.12, 2.6.13, 2.4.31, and 2.4.32-pre3.
Version 2.6.15, 2005-05-06
- Preliminary code in the x86/x86-64 low-level driver to detect multicore AMD K8 processors, and to prevent resource conflicts and an erratum related to northbridge events. On multicore K8s, northbridge events are only allowed when using the global-mode counters API.
- Updated kernel support: 2.6.9-5.0.5.EL (RHEL4), 2.4.21-27.0.4.EL (RHEL3), 2.4.31-pre1, and 2.6.12-rc3.

Version 2.6.14, 2005-04-09
- Changed vperfctr_open() so that if the thread already has an attached perfctr state, then the call fails with EEXIST. This allows self-monitoring code to detect if it is under the control of an external monitoring agent, before it changes the counters' control setup.
- Reverted the workaround in perfctr-2.6.13 for the problem that gcc-4.0 snapshots broke the x86/x86-64 low-level driver, as recent gcc-4.0 prereleases seem to work correctly.
- Updated kernel support: 2.4.30, 2.6.12-rc2, and 2.6.9-5.0.3 (RHEL4).

Version 2.6.13, 2005-02-13
- Changed the global-mode counters to allow user-space to disable the in-kernel sampling timer and to move the sampling points into the read system calls. This may improve sampling precision in some scenarios.
- gcc-4.0 snapshots broke the x86/x86-64 low-level driver. Changed the driver to prevent those problems.
- Updated kernel support: 2.6.11-rc4, 2.4.30-pre1, and 2.4.21-27.0.2 (RHEL3).

Version 2.6.12, 2004-12-19
- PPC32 driver updated to be more robust in its detection of timebase and core clock frequencies. Some information sources can give wrong values for those frequencies, so the driver now tries other, more reliable methods first.
- On x86/x86-64, perfctr_event_codes.h now includes P4 events.
- On x86-64, libraries will now be installed in PREFIX/lib64/, as per current standards, unless overridden by LIBDIR.
- Perfex had a bug in which it interpreted all numbers as hex, even those without "0x" prefixes. Perfex now emits warnings for ambiguous numbers.
  To silence the warnings, (a) prefix hex numbers with "0x" (preferred); (b) use the "-d" option to enable decimal numbers, which requires "0x" prefixes on hex numbers; or (c) use the "-x" option to force all numbers to be interpreted as hex (deprecated). The "-d" option should be the default, but unfortunately that would break user-level scripts that assumed that "0x"-less numbers are still hex.
- Changes in examples/signal/ to handle glibc-2.3.3 on PPC32.

Version 2.6.11, 2004-11-14
- Workarounds for a hardware quirk on x86 and x86-64, where interrupts can be delivered some time after the counters have been stopped. Due to scheduling, an interrupt could be taken in the context of an unrelated process, which would prematurely terminate interrupt reporting for the original process.
- Fixed a bug in the x86 and x86-64 kernels where the context-switch path suspended the previous process' performance counters too late. This could allow an overflow interrupt to be taken in the context of an unrelated process, with effects similar to the hardware quirk described above.
- PPC32 updates: Enable overflow interrupts on all G4 processors starting with the 7410 Rev 1.3, and all IBM G3 processors starting with the 750FX DD2.3. Add support for MPC7447A and MPC7448.
- Removed patches for obsolete kernels.

Version 2.6.10.3, 2004-10-24
- Driver modifications to handle two significant changes in the 2.6.10-rc1 kernel.
- PPC32: added MPC7447A and MPC7448 support.
- Cleanups to bring the driver closer to the development version.

Version 2.6.10.2, 2004-10-19
- Updated kernel support: 2.6.9, 2.4.28-pre4.
- Corrected the PPC32 driver's handling of MMCR0 changes due to use of the FCECE or TRIGGER control flags.
- Fixed a synchronisation error in the interface between the per-process counters driver and the low-level drivers. The error triggered warnings in DEBUG_SPINLOCK_SLEEP-enabled kernels.
Version 2.6.10.1, 2004-09-18
- Fixed a problem causing an incomplete "wrapper" file to be installed as /usr/include/asm/perfctr.h on x86_64 systems.

Version 2.6.10, 2004-09-14
- Eliminated a potential kernel crash on P4 model 3 Prescott processors, due to the driver initialising two control registers that have been removed from P4M3. The P4M3 Nocona does not appear to have been affected by this error.
- Fixed the install procedure so that it no longer fails to install the shared library's symbolic links when updating an older installation.
- Updated kernel support: 2.4.21-20 (RHEL3), 2.4.28-pre3, 2.6.9-rc2.

Version 2.6.10-pre1, 2004-08-03
- x86-64 now uses the same driver and data structures as x86. Intel's 64-bit P4 Xeon should work in the x86-64 kernel. x86-64 application-level data structures have changed.
- Updated library and example applications to include P4 support on x86-64.
- Fixed update-kernel script to use 'head' in a POSIX-compliant way.
- Added kernel support for the Model 13 Pentium-M.
- Many code cleanups in the x86 driver.
- Event 0x76 is now officially CPU_CLK_UNHALTED on AMD64.
- Updated kernel support: 2.4.27-rc4.

Version 2.6.9, 2004-07-27
- Updated kernel support: 2.4.27-rc3, 2.6.7, 2.6.8-rc2, 2.4.21-15.0.3 (RHEL3), 2.6.5-7.95 (SuSE).
- x86: enforce -fno-unit-at-a-time with gcc-3.4, to prevent kernel crashes due to stack overflow in 2.6 kernels < 2.6.6.
- x86: do sync_core() before rdtsc() in the internal micro-benchmarking code, to avoid bogus data on K8 processors.
- x86: prevent stray WRMSR at driver init time, which could disable the NMI watchdog or Oprofile.
- x86: prevent inlining from breaking the code backpatching mechanism.
- x86: fix CONFIG_X86_LOCAL_APIC=n linkage error in init tests.
- PPC32: fix to allow 7400/7410 to specify MMCR2[THRESHMULT].
- PPC32: add support for generic processors using only the timebase register for high-resolution time measurements.

Version 2.6.8, 2004-05-29
- Added support for the IBM PowerPC 750GX processor.
- Updated kernel support: 2.4.27-pre3, 2.6.7-rc1, 2.6.7-rc1-mm1, 2.4.21-15.EL (RHEL3), 2.6.5-1.358 (FC2).
- Fixed an error in the 2.4.21-193 SuSE kernel patch file which broke compilation on x86-64.
- Perfctr and Oprofile can now coexist safely in newer kernels, thanks to changes in kernel 2.6.6. Backported support for those changes from perfctr-2.7.1.

Version 2.6.7, 2004-05-04
- Merged several x86_64-specific driver files with their x86 counterparts, reducing the amount of duplicated code.
- Added textual descriptions to the library's P6 event sets. From Bryan O'Sullivan.
- Changed examples/signal/signal to count retired instructions instead of retired micro-operations on AMD K7. Needed to avoid a loop with the same instruction overflowing indefinitely.
- Updated kernel support: 2.6.6-rc3, 2.4.27-pre1, 2.4.22-1.2188.nptl (FC1), 2.4.21-9.0.1 (RHEL3), 2.4.20-31.9 (RH9).

Version 2.6.6, 2004-02-21
- Pentium-M has an undocumented local APIC quirk which can stop perfctr interrupt delivery. Added a workaround to prevent this.
- Fixed a bug in x86-64's perfctr interrupt entry code in 2.4 kernels. Luckily, the bug turned out to be harmless (a bogus "rip" value was retrieved, but never used by the higher-level interrupt handler).
- Added support for Pentium 4 Model 3 processors, which have slight event set changes from earlier models.
- Updated kernel support: 2.6.3, 2.4.25, 2.4.22-1.2174.nptl (FC1), 2.4.20-30.9 (RH9), and 2.4.21-193 (SuSE). Removed support for some obsolete FC1 and RH update kernels.

Version 2.6.5, 2004-01-26
- Relaxed and corrected control checks on Pentium 4:
  * Allow ESCR.CPL_T1 to be non-zero when using global-mode counters on HT processors.
  * Don't require ESCR.CPL_T0 to be non-zero. CPL_T0==0b00 is safe and potentially useful (global counters on HT).
  * Require CCCR.ACTIVE_THREAD==0b11 on non-HT processors, as documented in the IA32 Volume 3 manual. Old non-HT P4s seem to work OK for all four values, but this is neither guaranteed nor useful.
- Per-process counters driver updated for the filp->f_mapping change in 2.6.2-rc kernels.
- Support 2.4.21-9.EL (RHEL3) and 2.4.22-1.2149.nptl (FC1) kernels.
- Library updates for PowerPC:
  * Added cpu_type constants for struct perfctr_info.
  * Decode PVR and define perfctr_info.cpu_type accordingly.
  * Added event set descriptions for 604/604e/750.

Version 2.6.4, 2004-01-12
- Added support for PowerPC 604/7xx/74xx processors.
  * Overflow interrupts are not yet supported due to a hardware erratum affecting many 7xx and early 74xx processors.
  * The user-space components support PowerPC, but CPU detection and event set descriptions are not yet implemented.
  * Supported in 2.6.1, and in 2.4.23 and newer 2.4 kernels.
- Updated kernel support: 2.6.1, 2.4.25-pre4, 2.4.22-1.2140.nptl (FC1 update), 2.4.21-4.0.2.EL (RHEL update), and 2.4.20-28.x (RH 7.x/8.0/9 update).

Version 2.6.3-pl1, 2004-01-01
- Updated kernel support: 2.6.1-rc1, 2.4.24-pre3, 2.4.22-1.2135.nptl (FC1 update), 2.4.21-6.EL (RHEL Taroon beta update), and 2.4.20-27.x (RH 7.x/8.0/9 update).
- Moved the x86 performance counter interrupt handler code from the driver source to the kernel, via the patch kit. Needed to cope with changes in RedHat's 2.4.21-6.EL kernel. This change only affects 2.4.21 and later 2.4 kernels.

Version 2.6.3, 2003-12-21
- Fixed a bug where a read of the global-mode counters could fail with EOVERFLOW due to an incorrect structure descriptor. The bug only existed in perfctr-2.6.2. (Thanks to Pavel Machek for reporting this problem.)
- AMD64 IA32 emulation code cleaned up for kernel 2.4.23.
- Updated kernel support: 2.6.0, 2.4.24-pre1, 2.4.23, 2.4.22-1.2129.nptl (FC1 update), 2.4.21-1.1931.2.393.ent (RHEL Taroon beta), and 2.4.20-24 (RH 7.x/8/9 update).
- User-space package rpm spec file fixes:
  * Don't remove /dev/perfctr on package uninstall.
  * Don't add alias to /etc/modules.conf if it's already there.

Version 2.6.2, 2003-11-23
- libperfctr.so is now installed with proper versioning.
- ABI control and info structures padded to accommodate some extensions without breaking application/library binary compatibility. ABI version incremented to '5'.
- Driver checks that only P4 models <= 2 use IQ_ESCR0/1.
- Added support for Fedora Core 1's 2.4.22-1.2115.nptl kernel.
- Driver compile fix for AMD64 in SMP 2.6 kernels.

Version 2.6.1, 2003-10-05
- Opening a process' virtual perfctrs is now done via /dev/perfctr instead of /proc//perfctr. This is needed due to the changed semantics for /proc/self and /proc// in kernel 2.6.0-test6. User-space is not affected, since the perfctr-2.6 API and user-space library were prepared for this access method change. User-space code monitoring other processes should use gettid() to identify tasks in 2.6 kernels, since getpid() returns the same ID for every thread in a process.
- Driver cleanups from obsoleting 2.4.15 and older kernels.
- Made examples/global/global.c more robust.
- Simplified usage with 2.6 kernels: it's no longer necessary to add an 'alias' declaration in /etc/modprobe.conf.
- Added support for AMD K8 Revision C processors.

Version 2.6.0, 2003-09-08
- The driver now kills a process' performance counters if the process migrates to a forbidden CPU. This ensures that unsafe changes to a process' CPU affinity mask don't break the driver, the hardware state, or other processes. (This is an issue on hyper-threaded P4s only.)
- A bug fix in perfctr-2.6.0-pre3 broke building the driver non-modular in otherwise modular 2.4 kernels. Corrected that problem.

Version 2.6.0-pre5, 2003-08-31
- Disabled driver debug code which could printk() in the kernel's context-switch path, as that is disallowed.
- 2.4.16 is now the oldest supported kernel.
- Compilation fixes for driver's ia32 emulation code on x86-64.

Version 2.6.0-pre4, 2003-08-19
- Kernel/user-space API switched to a new "sparse marshalling" mechanism, which supports x86 application code on x86-64, and API struct extensions without breaking binary compatibility.
- Prepared the library for the future non-/proc/pid/perfctr API.
- Fixed a bug in the per-process perfctr creation code. The remote-control interface was racy in preemptible kernels.
- Fixed a bug in the process exit code for preemptible kernels.
- Changes to handle 2.6 kernels with the cpumask_t patch (-mm, -osdl):
  * Driver converted to use cpumask_t API, with compatibility wrapper for cpumask_t-free kernels.
  * API change: removed the cpus and cpus_forbidden sets from the perfctr_info struct, and added a new data type and commands for retrieving these sets. (cpumask_t values cannot be exported as-is since their sizes depend on kernel configuration, and the type definition uses 'long', which breaks 32/64-bit binary compatibility.)
  * Updated library and example programs for the API change.
- Fixed a dependency bug in the library Makefile.
- Added support for VIA C3 Antaur/Nehemiah processors.

Version 2.6.0-pre3, 2003-08-03
- Replaced 'long' with 'int' in the API structures to eliminate unnecessary ABI incompatibilities between x86 and x86-64.
- Simplified global-mode perfctrs API: the write-control and read-state commands now operate on a single CPU instead of on a set of CPUs. Added a new start command to start the counters.
- Added thin library wrappers for per-process perfctr kernel calls. Cleaned up examples/perfex and the library itself.
- Removed the requirement that CCCR.ACTIVE_THREAD == 3 on P4.
- Extended cascading should now work on Pentium 4 Model 2 processors.
- Fixed a bug where the perfctr module's refcount could be zero with code still running in the module. This could race with rmmod in preemptive kernels, and in theory also in SMP kernels.

Version 2.6.0-pre2, 2003-07-13
- Per-process perfctrs API fixes: control data is retrieved using the new READ_CONTROL operation; mmap()ed state no longer exposes the control data; the SAMPLE operation is renamed to READ_SUM and now updates a given user-space buffer; and non-write operations are permitted on dead perfctrs.
  Retrieving control explicitly makes the user-visible mmap()ed state binary compatible between x86 and x86-64. The other changes simplify the user-space library and allow perfex to replace raw mmap() accesses with higher-level operations.
- Driver cleanups, including eliminating many #ifdefs and removing some unnecessary P4-specific driver procedures.
- Fixes for macro redefinition warnings in the 2.4.22-pre3 kernel.
- Perfctr library RPM spec file updates from Bryan O'Sullivan.

Version 2.6.0-pre1, 2003-07-02
- Rearranged the data structure holding the counter state to reduce the number of cache lines that need to be touched at key operations. The new representation is also binary compatible between x86 and x86-64, which matters since user-space mmap()s it.
- Added RPM spec file for the library. (From Bryan O'Sullivan.)
- Patch kit updated for kernels 2.4.22-pre2 and 2.5.73.

Version 2.5.5, 2003-06-15
- Updates for driver model changes in kernel 2.5.71.
- Minor updates to the library's event descriptions for Pentium 4.
- Now supports SuSE's 2.4.19.SuSE-206 kernel for SLES 8 users. Autodetection of SuSE kernel versions is not yet implemented: pass "--patch=2.4.19.SuSE-206" to perfctr's update-kernel script to ensure that the correct patch is applied.
- Patch kit updates for 2.4.21 final and 2.4.20-18 RH kernels.

Version 2.5.4, 2003-06-01
- Corrected the driver's handling of OVF_PMI+FORCE_OVF counters on Pentium 4. This configuration didn't work at all, and led to various BUG messages from the driver. These restrictions apply to OVF_PMI+FORCE_OVF counters:
  * The ireset value must be -1.
  * After the counter has interrupted once, it will continue to interrupt when the faulting instruction is restarted, causing it to never complete. This problem also occurs for non-FORCE_OVF interrupt-mode counters if the ireset value is of too small magnitude, like -1. This appears to be a P4 hardware quirk.
    Don't restart FORCE_OVF interrupt-mode counters, and don't use ireset values too small to allow instructions to complete.
- Updated library's K8 event descriptions to match current documentation. Corrected several omissions and errors.
- Patch kit updated for kernels 2.5.70 and 2.4.21-rc6.

Version 2.5.3.1, 2003-05-21
- Patch kit updated for recent RedHat 6.2/7.x/8.0/9 update kernels (2.2.24-{6.2.3,7.0.3} and 2.4.20-13.{7,8,9}).
- Fixed a driver compile warning which occurred when the driver is built as a module in 2.4 SMP kernels using module versions.
- x86-64 now uses 'long long' for 64-bit sums, like x86. This reduces x86 and x86-64 user-space source code incompatibility.

Version 2.5.3, 2003-05-16
- Added support for the Pentium M processor. It is mostly like a Pentium III with some more events, except that six old Pentium III / Pentium Pro events have been redefined.
- Added support for K8 in 64-bit mode (the x86_64 kernel arch). Updated driver, user-space library, and example programs. The shared library libperfctr.so is now compiled with -fPIC.
- K8 bug fix in examples/signal/signal.c: a missing INT flag caused the driver to reject the control setup.
- P4 event descriptions updated from recent documentation changes.

Version 2.5.2, 2003-04-13
- Updated power management code for the local APIC and NMI watchdog driver model changes in kernel 2.5.67.
- Timer-based sampling of per-process performance counters is now always enabled: previously it was only done on SMP. Needed to avoid counter inaccuracies on high core-clock CPUs.
- Fixes to user-space library implementation of remote-control virtual performance counters: open() failed due to a missing return; avoid potential buffer overflow error; fix the "read counters" procedure for the case where the remote process is sampling the time-stamp counter but no performance counters.
- Added support for RedHat 9's 2.4.20-8 and 2.4.20-9 kernels.

Version 2.5.1, 2003-03-23
- Fixed initialisation on hyper-threading capable P4s in SMP kernels older than 2.4.15 to not signal an error if hyper-threading is disabled: in this case the absence of working set_cpus_allowed() support is not a problem.
- Fixed two compilation errors in the set_cpus_allowed() emulation affecting old 2.4 kernels configured for SMP.
- INSTALL file updates.

Version 2.5.0, 2003-03-10
- Added a simple user-space library API for accessing other processes' virtual performance counters. This uses a new type and a new set of operations, since remote access has different requirements than accessing one's own counters. Following Mike Marty's suggestion, I left out the process control calls needed around these operations (ptrace() and wait()), so applications must handle that themselves.
- Added 'make install' support for the user-space components.
- Driver API cleanups. The 'eventsel_aux[]' array in 'struct perfctr_cpu_control' has been renamed to 'escr[]' and moved into the 'p4' sub-structure. (The change highlights the fact that this field was and is P4-only.) The 'version[]' string in 'struct perfctr_info' has been renamed to 'driver_version[]', since perfctr_info now also contains an 'abi_version' field.
  Some changes in the driver ABI: while not strictly necessary, they clean things up and make room for future changes. The ABI changed anyway from perfctr-2.4, so this shouldn't be a problem.
- Added a perfctr_cpu_control_print() procedure to the library, and updated the example programs to use it.
- Updated the perfex example program's help text to describe the syntax and meaning of event specifiers.
- Patch kit updates for 2.2.24/2.4.18-26(RedHat)/2.5.64 kernels.

Version 2.5.0-pre2, 2003-03-03
- Added a way for user-space to query the driver's ABI version, and updated the library to check it.
- Fixed to not include when perfctr hasn't been configured.
  This allows the patched kernel source to compile cleanly also on archs not supported by perfctr.
- Major patch kit overhaul. Updated configuration help texts. Removed unnecessary features and patches. Some cleanups. Added aliasing support to the 'update-kernel' script, which allows a patch to serve several kernels (when applicable).
- The perfctr configuration option was poorly placed. It is now at the end of the "Processor type and features" menu.
- Removed "notsc" kernel option support from the 2.2 kernel patches. Using the driver with an IDT WinChip (Centaur C6/2/3) CPU now requires a newer kernel with native "notsc" support.
- Driver fixes for changes in the 2.4.21-pre5 and 2.5.63 kernels.

Version 2.5.0-pre1, 2003-02-19
- Fixed the driver's API to support global-mode perfctrs on 2.5 SMP kernels and asymmetric hyper-threaded P4 multiprocessors. Updated examples/global/global.c for the new API.
- Minor library cleanups. Updated example programs accordingly.
- API cleanup: Removed the obsolete STOP command from the driver for virtual perfctrs. The library now uses CONTROL instead.
- Proper detection and support for AMD K8 processors. They are similar to the K7s, but the event sets are not identical.
- The library's event set descriptions have been redesigned and expanded to include unit mask descriptions and descriptions of Intel P4 and AMD K8 events. The etc/perfctr-events.tab text file has been removed, since event_codes.h is now generated from the library's data structures.

Version 2.4.5, 2003-02-09
- Corrected the unit mask definition for the K7 SYSTEM_REQUEST_TYPE event in etc/perfctr-events.tab: WC is 0x02, not 0x04.
- Fixed two compile warnings which could be triggered in 2.5 kernels.
- Patch kit updates for 2.4.21-pre4/2.4.18-24(RedHat)/2.5.59-osdl2 kernels.

Version 2.4.4, 2003-01-18
- Fixed a context-switch bug where an interrupt-mode counter could increment unexpectedly, and also miss the overflow interrupt.
- Fixed some ugly log messages that the new HT P4 support code added in perfctr-2.4.3 could generate at driver initialisation time.
- Added preliminary support for AMD K8 processors with the regular 32-bit x86 kernel. The K8 performance counters appear to be identical or very similar to the K7 performance counters.

Version 2.4.3, 2002-12-11
- Support for hyper-threaded Pentium 4s added. In an HT P4, the two logical processors share the performance counter state. HT P4s are therefore _asymmetric_ multi-processors, and the driver enforces CPU affinity masks on users of per-process performance counters to avoid resource conflicts. (Users are restricted to logical processor #0 in each physical CPU.)
  Limitations:
  * The kernel mechanism for updating a process' CPU affinity mask uses no or very weak locking, which makes certain race conditions possible that can break the driver's CPU affinity mask restrictions. For now, users should NOT use the sched_setaffinity() system call on processes using per-process performance counters.
  * Global-mode performance counters don't work on HT P4s due to limitations in the API. This will be fixed in perfctr-2.5.
  * 2.2 kernels don't have CPU affinity masks, and therefore can't support HT P4s.

Version 2.4.2, 2002-11-25
- Fixed a driver bug where it could fail to prevent simultaneous use of global-mode and per-process performance counters.
- Made the driver safe for preemptible 2.5 kernels.
- New patches for RedHat update kernels 2.2.22-6.2.2, 2.2.22-7.0.2, 2.4.18-18.7.x, and 2.4.18-18.8.0.

Version 2.4.1, 2002-10-12
- Support RedHat 8.0's 2.4.18-14 kernel. Building perfctr as a module caused a namespace clash in this kernel. The fix required a change to the driver's kernel-resident glue code.

Version 2.4.0, 2002-09-26
- Fixed an overly strict access control check which prevented opening another process' /proc//perfctr when the driver was built as a module.
- Updates for kernels 2.2.22, 2.4.18-10-redhat, 2.4.20-pre8, 2.5.36.
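
The event specifier syntax that perfex accepts (see the Version 2.3.7 entry below) can be illustrated with a short C sketch. This is not perfex's actual parser: struct event_spec, parse_u32() and parse_event_spec() are hypothetical names, and the Pentium 4 bit positions in the macros are assumed from Intel's P4 documentation rather than taken from this file. Under those assumptions, decoding the 2.3.7 example reproduces its explanation: CCCR enabled with ESCR select 4, ESCR event 2 counting at CPL>0, and counter 0xC with fast RDPMC.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Pentium 4 field positions (assumed from Intel's P4 documentation). */
#define P4_CCCR_ENABLE          (1u << 12)
#define P4_CCCR_ESCR_SELECT(c)  (((c) >> 13) & 0x7u)    /* bits 13-15 */
#define P4_ESCR_EVENT_SELECT(e) (((e) >> 25) & 0x3Fu)   /* bits 25-30 */
#define P4_ESCR_EVENT_MASK(e)   (((e) >> 9) & 0xFFFFu)  /* bits 9-24 */
#define P4_ESCR_T0_USR          (1u << 2)               /* count at CPL>0 */
/* "@pmc" encoding as explained in the 2.3.7 entry: 0x8000000C is
   IQ_COUNTER0 (0xC) with fast RDPMC enabled. */
#define PMC_FAST_RDPMC          (1u << 31)
#define PMC_INDEX(p)            ((p) & 0x7FFFFFFFu)

/* Hypothetical parsed form of one "evntsel/aux@pmc" specifier. */
struct event_spec {
    uint32_t evntsel;  /* mandatory; the CCCR value on a Pentium 4 */
    uint32_t aux;      /* optional "/aux"; the ESCR value on a Pentium 4 */
    uint32_t pmc;      /* optional "@pmc"; counter number plus flags */
    bool has_aux;
    bool has_pmc;
};

/* Parse a 32-bit number in decimal or "0x"-prefixed hexadecimal.
   Returns a pointer just past the number, or NULL on error. */
static const char *parse_u32(const char *s, uint32_t *out)
{
    int base = 10;
    char *end;
    unsigned long v;

    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        base = 16;
        s += 2;
    }
    /* Reject signs and whitespace, which strtoul() would otherwise accept. */
    if (base == 16 ? !isxdigit((unsigned char)*s) : !isdigit((unsigned char)*s))
        return NULL;
    errno = 0;
    v = strtoul(s, &end, base);
    if (errno == ERANGE || v > 0xFFFFFFFFul)
        return NULL;
    *out = (uint32_t)v;
    return end;
}

/* Parse "evntsel[/aux][@pmc]"; returns false on malformed input. */
static bool parse_event_spec(const char *s, struct event_spec *spec)
{
    assert(s && spec);
    memset(spec, 0, sizeof *spec);
    if (!(s = parse_u32(s, &spec->evntsel)))
        return false;
    if (*s == '/') {
        if (!(s = parse_u32(s + 1, &spec->aux)))
            return false;
        spec->has_aux = true;
    }
    if (*s == '@') {
        if (!(s = parse_u32(s + 1, &spec->pmc)))
            return false;
        spec->has_pmc = true;
    }
    return *s == '\0';  /* no trailing garbage allowed */
}
```

In the 2.3.7 example the ESCR event mask decodes to 1, i.e. only mask bit 0 is set, which corresponds to NBOGUSNTAG under the same assumption about Intel's mask numbering. The sketch accepts hexadecimal only with an explicit "0x" prefix, so a leading zero does not switch to octal the way strtoul() with base 0 would.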

Version 2.4.0-pre2, 2002-08-27
- vperfctr_control() now allows the user to specify that some PMC sums are not to be cleared when updating the control. There is a new bitmap field `preserve' in struct vperfctr_control: if bit i is set, then PMC(i)'s sum is not cleared. `preserve' is a simple `unsigned long' for now, since this type fits all currently known CPU types.
  This change breaks binary compatibility, but user-space code which clears the entire control record before filling in relevant fields will continue to work as before after a recompile.
  This feature removes a limitation which some people felt was a problem for some usage scenarios.

Version 2.4.0-pre1, 2002-08-12
- The kernel driver has an initial implementation of a new remote-control API for virtual per-process perfctrs. A monitor process may access a target process' perfctrs via open(), mmap(), and ioctl() on the target's /proc/pid/perfctr. For open() and ioctl(), the monitor must hold the target under ptrace ATTACH control.
  The user-space library and examples have not been updated for the new API.

Version 2.3.12, 2002-08-12
- Updated patch kit for the 2.4.19 final kernel.
- Spelling fix in INSTALL.
- Minor driver code size reduction on uniprocessor kernels.

Version 2.3.11, 2002-07-21
- Interrupt-mode performance counters now have accumulated sums. The library procedures vperfctr_read_pmc() and vperfctr_read_ctrs() can now retrieve the sums of interrupt-mode counters.
- Corrected the name of K7 event 0x42 to DATA_CACHE_REFILLS_FROM_L2.

Version 2.3.10, 2002-07-19
- Added a script, `update-kernel', to simplify the process of patching the kernel source code. See INSTALL for details.
- The counter and control registers are now cleared when the driver is idle. This should allow the counter hardware to power down when not used, especially on P4.
- Some Pentium MMX and Pentium Pro processors have an erratum which causes System Management Mode to shut down if user-space has been granted access to the RDPMC instruction. The driver now avoids granting RDPMC access on the affected processors. The user-space library makes this change transparent.
- New CPU type code for Model 2 Pentium 4s, due to a few but significant changes between Model 0 and 1 CPUs and Model 2 CPUs.
- The driver now supports Replay Tagging on the Pentium 4. The perfex program has been updated to allow users to specify values to store in PEBS_ENABLE and PEBS_MATRIX_VERT. For example, the following command could be used to count the number of L1 cache read misses on a Pentium 4:
      perfex -e 0x0003B000/0x12000204@0x8000000C --p4pe=0x01000001 --p4pmv=0x1 some_program
  Explanation: IQ_CCCR0 is bound to CRU_ESCR2, CRU_ESCR2 is set up for replay_event with non-bogus uops and CPL>0, and PEBS_ENABLE and PEBS_MATRIX_VERT are set up for the 1stL_cache_load_miss_retired metric. Note that bit 25 is NOT set in PEBS_ENABLE.

Version 2.3.9, 2002-06-27
- Pentium 4 bug fix: An error in older revisions of Intel's IA32 Volume 3 manual caused the driver to program the wrong control register in a few cases, affecting uses of the uop_type event. Revision -007 of Intel document #245472 corrects the error, and the driver has been updated accordingly.

Version 2.3.8.1, 2002-06-27
- Regenerated the patch file for RedHat's 2.4.18-5 kernel. The patch file in 2.3.8 only contained an error message from 'diff'.

Version 2.3.8, 2002-06-26
- Added counter overflow interrupt support for Intel P4.
- New kernel support: standard kernels 2.2.21 and 2.4.19-rc1, and RedHat kernels 2.2.19-7.0.16, 2.4.9-34, and 2.4.18-5.
- API changes: Removed unused and obsolete fields from the vperfctr state and control objects. Added fields to perfctr_cpu_control to enable future support for P4 replay tagging events. Incremented the vperfctr mmap() binary layout magic number.
- Changed the "make" rule in INSTALL to build "vmlinux" before "modules". This change is needed for RedHat kernels.
- Added build of a shared (.so) version of the user-space library.
- When changing a process' vperfctr control data, the TSC sum is now preserved if the next control state includes the TSC. It used to be preserved only if both the previous and next states included the TSC. The difference matters when a running TSC is stopped and then restarted by a STOP;CONTROL command sequence.
- Driver cleanups. Merged P6 and K7 driver procedures.

Version 2.3.7, 2002-04-14
- Added Pentium 4 support to examples/perfex/. The full syntax of an event specifier is now "evntsel/aux@pmc". All three components are 32-bit processor-specific numbers, written in decimal or hexadecimal notation.
  "evntsel" is the primary processor-specific event selection code to use for this event. This field is mandatory.
  "/aux" is used when additional event selection data is needed. For the Pentium 4, "evntsel" is put in the counter's CCCR register, and "aux" is put in the associated ESCR register. No other processor currently needs this field.
  "@pmc" describes which CPU counter number to assign this event to. When omitted, the events are assigned in the order listed, starting from 0. Either all or none of the event specifiers should use the "@pmc" notation. Explicit counter assignment via "@pmc" is required on Pentium 4 and VIA C3 processors.
  As an example, the following command could be used to count the number of retired instructions on a Pentium 4:
      perfex -e 0x00039000/0x04000204@0x8000000C some_program
  Explanation: Program IQ_CCCR0 with required flags, ESCR select 4 (== CRU_ESCR0), and Enable. Program CRU_ESCR0 with event 2 (instr_retired), NBOGUSNTAG, CPL>0. Map this event to IQ_COUNTER0 (0xC) with fast RDPMC enabled.
- The driver now permits cascading counters on the Pentium 4.
- Preliminary driver infrastructure to support ptrace(ATTACH) for a future remote-control interface to per-process counters.
- Driver and patch kit updated for the changes to the APIC interrupt entries in kernel 2.5.8-pre3.

Version 2.3.6, 2002-03-21
- Fixed a problem which caused "BUG! resuming non-suspended perfctr" warnings when running PAPI's test cases with a DEBUG-compiled perfctr driver. There was no actual error, only a mismatch between the debug code and the code for changing event selection data.
- Fixed a time-stamp counter accounting error when user-space resumed interrupt-mode perfctrs with the VPERFCTR_IRESUME ioctl.

Version 2.3.5, 2002-03-17
- Multiprocessor AMD K7 machines should work now. A bug in current 2.2/2.4/2.5 kernels prevented correct CPU identification on these machines, causing crashes. The driver now works around this bug.
- Added support for the VIA C3 Ezra-T processor.
- Added some support for interrupt-mode counters to the library. Cleaned up examples/signal/.
- Added links in OTHER to John Reiser's tsprof and Troy Baer's lperfex tools.

Version 2.3.4, 2002-01-23
- More detailed installation instructions in INSTALL.
- Experimental support for at-retirement counting on Pentium 4. Updated examples/global/ to count FLOPS on Pentium 4.
- Fixed uses of __FUNCTION__ to comply with changes in GCC 3.0.3.

Version 2.3.3, 2001-12-31
- Added support for the 2.4.16 and 2.4.17 kernels.
- SMP bug fixed: if a process using interrupt-mode counters migrates from CPU1 to CPU2 and then back to CPU1, it could incorrectly resume the stale state cached in CPU1.
- P6 bug fixed: when a process resumed, it could inadvertently activate a suspended interrupt-mode counter belonging to the previous process using the performance counters.
- Pentium 4 bug fixed: could fail to update the control registers on a context switch.
- Removed the "pmc_map[] must be the identity function" restriction from P6 and K7.
- Updated examples/global/global.c: added Pentium 4 support (preliminary, counting MIPS not FLOPS), corrected VIA C3 handling, and corrected 32-bit integer overflow problems affecting fast CPUs.
- Removed perfctr_evntsel_num_insns() from the library: the interface could not support the Pentium 4. examples/self/self.c now does the setup all by itself, with Pentium 4 support.

Version 2.3.2, 2001-11-19
- Corrected an error in the driver's mapping from counter number to control registers on the Pentium 4. Counter 17 didn't work, and attempts to use it could have disturbed other counters as well.
- Fixed a minor omission in the Pentium 4 initialisation code.

Version 2.3.1, 2001-11-06
- New patches for kernels 2.2.20, 2.4.9-13 (RedHat 7.2 update), 2.4.13-ac5, and 2.4.14. Minor cleanup in the P4 driver code.

Version 2.3, 2001-10-24
- Added support for multiple interrupt-mode virtual perfctrs with automatic restart. Updated the signal delivery interface to pass a bitmask describing which counters overflowed; the siginfo si_code is now fixed as SI_PMC_OVF (fault-class).
- Added EXPORT_NO_SYMBOLS to init.c, for compatibility with announced changes in modutils 2.5.
- Patch set updated for recent kernels.

Version 2.2, 2001-10-09
- Added preliminary Pentium 4 support to the driver, but only for the simple basic features. The example applications have not been updated, since I don't yet have a Pentium 4 for testing.

Version 2.1.4, 2001-09-30
- Added -l/-L (--list/--long-list) options to examples/perfex to have it list the current CPU's available events.
- Added 'set of events' descriptors for each supported CPU type to the library, and changed it to be a standard archive file.
- Performance counter interrupts now work in standard kernels, starting with kernel 2.4.10. Updated README.

Version 2.1.3, 2001-09-13
- Fixed a problem which prevented compiling the driver as a module in kernels older than 2.2.20pre10 if CONFIG_KMOD was disabled.
- Cleaned up command-line option processing in perfex. It now uses the GNU getopt library and accepts long option names.
- Fixed a typo in perfctr-events.tab (P6's INST_DECODED was misspelled as INST_DECODER), and updated/corrected several unit mask descriptions.
- Replaced most occurrences of "VIA Cyrix III / C3" with "VIA C3".

Version 2.1.2, 2001-09-05
- Added MODULE_LICENSE() tag, for compatibility with the tainted/non-tainted kernel stuff being put into 2.4.9-ac and modutils.
- The VIA C3 should be supported properly now, thanks to tests run by Dave Jones @ SuSE which clarified some aspects of the C3.
- Minor bug fix in the perfctr interrupt assembly code. (Inherited from the 2.4 kernel. Fixed in 2.4.9-ac4.)

Version 2.1.1, 2001-08-28
- Fixed a bug in the finalise backpatching code, which could cause a kernel hang in some configurations.
- Updated for kernel 2.4.9-ac3, which required changes to avoid conflicts in the %cr4 access methods.
- Preliminary code to detect Pentium 4 processors with Performance Monitoring features available.
- Minor %cr4-related cleanups.
- Minor documentation updates.
- Added a link in OTHER to Curtis Janssen's vprof tool.

Version 2.1, 2001-08-19
- Fixed a call backpatching bug, caused by an incompatibility between the 2.4 and 2.2 kernels' xchg() macros.
- Fixed a bug where an attempt to use /proc//perfctr on an unsupported processor would cause a (well-behaved) kernel oops.
- The WinChip configuration option has been removed, and WinChip users should instead pass "notsc" as a boot-time kernel parameter. This permitted a cleanup of the driver and the 2.4 kernel patches, at the expense of more code in the 2.2 kernel patches to implement "notsc" support.

Version 2.0.1, 2001-08-14
- The "redirect call" backpatching code in the low-level driver has been changed again. The change in 2.0-pre6 was insufficient, due to a nasty SMP-related erratum in all Intel P6 processors.
- Added support for 2.4.8/2.4.8-ac1 kernels.
- Removed an obsolete check from the WinChip support code.

Version 2.0, 2001-08-08
- Resurrected partial support for interrupt-mode virtual perfctrs. virtual.c permits a single i-mode perfctr, in addition to TSC and a number of a-mode perfctrs. BUG: The i-mode PMC must be last, which constrains CPUs like the P6 where we currently restrict the pmc_map[] to be the identity mapping. (Not a problem for K7 since it is symmetric, or P4 since it is expected to use a non-identity pmc_map[].)
- Bug fix in perfctr_cpu_update_control(): start by clearing cstatus. This prevents a failed attempt to update the control from leaving the object in a state with old cstatus != 0 but new control.

Version 2.0-pre7, 2001-08-07
- Updated user-space library:
  * Coding tweaks to attempt to make gcc (various versions) generate better code. (Not entirely successful. May have to resort to hand-written assembly code.)
  * New vperfctr_read_ctrs() sampling procedure.
  * New perfctr_print_info() helper procedure.
- Updated example applications:
  * Use the library's perfctr_print_info() for consistent output.
  * Counts are now printed in decimal, not hex.
  * 'perfex' now checks for data layout mismatch when the child process' virtual perfctr is mmap:ed into user space.
  * 'self' uses the new vperfctr_read_ctrs() sampling procedure.
  * 'signal' compiles again.
- Cleaned up the driver's debugging code.
- Internal driver rearrangements. The low-level driver (x86) now handles sampling/suspending/resuming counters. Merged counter state (sums and start values) and CPU control data into a single "CPU state" object. This simplifies the high-level drivers, and permits some optimisations in the low-level driver by avoiding the need to buffer tsc/pmc samples in memory before updating the accumulated sums (not yet implemented).
- Removed WinChip "fake TSC" support. The user-space library can now sample with slightly less overhead on sane processors.

Version 2.0-pre6, 2001-07-27
- Sampling bug fix for SMP.
  Normally processes are suspended and resumed many times per second, but on SMP machines it is possible for a process to run for a long time without being suspended. Since sampling is performed at the suspend and resume actions, a performance counter may wrap around more than once between sampling points. When this occurs, the accumulated counts will be highly variable and much lower than expected. A software timer is now used to ensure that sampling deadlines aren't missed on SMP machines.
- Bug fix in the x86 "redirect call" backpatching routine.
- Bug fix in the internal debugging code (CONFIG_PERFCTR_DEBUG).
- Minor performance tweak for the P5/P5MMX read counters procedures.
- To avoid undetected data layout mismatches, the user-space library now checks the data layout version field in a virtual perfctr when it is being mmap:ed into the user's address space.

Version 2.0-pre5, 2001-06-11
- Structure layout changes to reduce sampling overheads. The ABI changed slightly, but I hope this is the last such change for some time.
- Fixed two bugs related to the interaction of interrupt-mode perfctrs and the lazy EVNTSEL MSR update cache in the low-level driver. (Interrupt-mode support is still disabled in the high-level drivers, however.)
- Fixed a bug in examples/perfex where it forgot to initialise the pmc_map[] control field. This caused the driver to refuse attempts to use more than one counter. The current fix is for P6/K7 only; a general "fixup" procedure will be added to the user-space library later.
- Added a CONFIG_PERFCTR_DEBUG option to enable some internal consistency checking in the driver. This is a temporary measure intended to help debug two open problem reports.

Version 2.0-pre4, 2001-04-30
- Some module usage accounting changes which should make automatic module loading and unloading more robust in 2.2 kernels.
- Internal cleanups and a few minor bug fixes.
- Some API naming changes, and O_CREAT can now be used to control whether opening /proc/self/perfctr should create and attach a vperfctr or not.
- The user-space library has been updated for the new API. pmc_map[] is used to map from "virtual counter i" to an actual PMC index to be used by RDPMC, so the VIA Cyrix III / C3 is now able to sample in user-space even though it has no PMC(0). The layout of pmc_map[] is CPU-specific; see x86.c for details. Since TSC sampling is now specified explicitly, perfctr_cpu_nrctrs() has been changed to return the number of performance counters _excluding_ the TSC.
- The example programs have been updated for the new API, with the exception of signal.c which is still non-functional.
- The perfex.c example works better now that the API has a consistent one-evntsel-per-counter model even for Intel P5-like CPUs.
- The global.c example has been fixed to not cause a division by zero on WinChip CPUs lacking a working TSC.

Version 2.0-pre3, 2001-04-17
- Preliminary implementation of the new data structures and API is in place. The user-space components have not yet been updated. Interrupt-mode virtual perfctrs have been disabled pending completion of necessary CPU driver support.
- Now uses "VIA_C3" as the family name for both the VIA C3 and the slightly older VIA Cyrix III processors. "VIA_CYRIX_III" was just too clumsy and confusing. (It's not a Cyrix at all.)
- Fixed etc/perfctr-events.tab to make Cyrix' event codes agree with reality rather than with the Cyrix manuals. The manuals ignore the fact that the 7-bit event codes are stored in two distinct bit fields in the CESR.

Version 2.0-pre2, 2001-04-07
- Removed automatic inheritance of per-process virtual perfctrs across fork(). Unless wait4() is modified, it's difficult to communicate the final values back to the parent: the now abandoned code did this in a way which made it impossible to distinguish one child's final counts from another's.
  Inheritance can be implemented in user-space anyway, so the loss is not great. The interface between the driver and the rest of the kernel is now smaller and simpler than before.
- Dropped support for kernels older than 2.2.16.
- Preliminary support for the VIA C3 processor.

Version 2.0-pre1, 2001-03-25
- First round of API and coding changes/cleanups for version 2.0. The driver version in struct perfctr_info is now a string instead of the previous major/minor/micro version number mess.
- Internal cleanups and minor fixes.
- Fixed an include file problem which made some C compilers (not gcc) fail when compiling user-space applications using the driver.

Version 1.9, 2001-02-13
- Fixed compilation problems for 2.2 and SMP kernels.
- Corrected VIA Cyrix III support. The "VIA Cyrix III" product has apparently used two distinct CPUs. Initial CPUs were a Cyrix design (Joshua) while current CPUs apparently are a Centaur design (Samuel). Added support for "Samuel" CPUs.
- Two corrections in the K7 perfctr event list.
- Small tweaks to vperfctr interrupt handling.
- Added preliminary interrupt-mode support for AMD K7.

Version 1.8, 2001-01-23
- Added preliminary interrupt-mode support to virtual perfctrs. Currently for P6 only, and the local APIC must have been enabled. Tested on 2.4.0-ac10 with CONFIG_X86_UP_APIC=y.
  When an i-mode vperfctr interrupts on overflow, the counters are suspended and a user-specified signal is sent to the process. The user's signal handler can read the trap pc from the mmap:ed vperfctr, and should then issue an IRESUME ioctl to restart the counters.

Version 1.7, 2001-01-01
- Updated patches for kernels 2.2.18 and 2.4.0-prerelease.
- Removed the need to ./configure the library before building it.
- /dev/perfctr is now only used for global-mode perfctrs.
- Library API changes to reflect new /dev/perfctr semantics.
- Backported /proc/self/perfctr to kernels 2.2.13-2.2.17.
- /proc/self/perfctr is now mandatory for virtual perfctrs.
- Fixed a VIA Cyrix III CPU detection bug.
- Fixed a minor problem in the 2.4 patch to drivers/Makefile.
- Changed examples/global/global.c to count MFLOP/s instead of branches and branch prediction hits/misses.

Version 1.6, 2000-11-21
- Updated for kernels 2.4.0-test11 and 2.2.18pre22.
- Preliminary implementation of /proc/self/perfctr as a more direct way of accessing one's virtual perfctrs. (If this works out, the /dev/perfctr interface to vperfctrs will be phased out.) The driver can still be built as an autoloadable module. (For now, only supported in 2.2.18pre22 and 2.4.0-test11.)
- Some user-space library API changes to accommodate /proc/self/perfctr.
- The per-process virtual TSC is no longer restarted from zero when the perfctrs are reprogrammed, which allows it to be used as a high-res per-process clock (i.e. gethrvtime()).
- Rewrote the `command' example application to use perfctr inheritance instead of the recently removed "remote control" facility.
- WinChip documentation updates and corrections.

Version 1.5, 2000-09-03
- The virtual perfctr "remote control" facility has been removed, resulting in major simplifications in the driver. Since version 1.3 of the driver, the most common application of the remote control facility (to record events from unmodified applications) can be more easily implemented using the perfctr inheritance facility (perfctr control setup is inherited from parent to child processes, and a child's event counts are propagated back to its parent). Removing the remote control facility simplified resource management and eliminated a number of concurrency issues.
- Code cleanups. Dropped support for intermediate 2.3 and early 2.4 kernels. The code now supports kernels 2.2.xx and 2.4.0-test7 or later only (via a 2.4-on-2.2 simulation layer).
- A number of changes to the user-space library.
  The API is now thread-safe (the library has no internal state), and the naming scheme has been simplified due to the removal of the remote-control facility. The zero-syscall perfctr sampling code has been rewritten and should be faster and more robust. (It fixed a sampling problem one user had on a 4-way MP box.)

Version 1.4, 2000-08-11
- Updates to comply with changes in 2.4.0-test kernels, in particular concerning module owner and use count tracking, and the Virtual File System (VFS) infrastructure.
- A bug which prevented reclaiming VFS resources (dentries and inodes) allocated to virtual perfctrs has been fixed. This bug affected both 2.2.x and 2.4.0-test kernels.

Version 1.3, 2000-06-29
- Implemented inheritance for per-process virtual perfctrs. This means that a child's performance-monitoring counts are attributed to its parent, similarly to how time is handled. The parent must have active perfctrs before forking off the child, and neither parent nor child may have reprogrammed its perfctrs by the time the child exits; otherwise no propagation occurs. Threads created implicitly by the kernel via request_module() are protected from perfctr inheritance.
- Added an example program to illustrate inheritance.
- Fixed two small buglets in the driver.
- Preliminary changes to make the user-space library thread-safe.
- Updated driver for kernel 2.4.0-test2.
- The driver now exports the CPU clock frequency to user-space, to enable mapping of accumulated TSC counts to actual time.
- Clarified that this package is licensed under the GNU LGPL.

Version 1.2, 2000-05-24
- Added support for kernels 2.2.16pre4 and 2.3.99-pre9-5.
- Added support for generic x86 processors with a time-stamp counter but no performance-monitoring counters. By using the driver to virtualise the TSC, accurate cycle-count measurements are now possible on PMC-less processors like the AMD K6.
- Fixed a bug in the WinChip driver.
- Miscellaneous code cleanups.
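
The Version 1.3 entry above mentions exporting the CPU clock frequency so that accumulated TSC counts can be mapped to actual time. A minimal C sketch of that conversion follows; it assumes the frequency is known in kHz (i.e. cycles per millisecond), and both tsc_to_usec() and the kHz unit are illustrative assumptions, not the driver's or library's interface.

```c
#include <assert.h>
#include <stdint.h>

/* Convert an accumulated TSC count to microseconds, given the CPU clock
   frequency in kHz (cycles per millisecond). Computing q*1000 + r*1000/khz
   instead of tsc*1000/khz keeps the intermediate product small for very
   large counts. The result is truncated toward zero. */
static uint64_t tsc_to_usec(uint64_t tsc, uint32_t cpu_khz)
{
    uint64_t q, r;

    assert(cpu_khz != 0);
    q = tsc / cpu_khz;   /* whole milliseconds */
    r = tsc % cpu_khz;   /* leftover cycles, less than one millisecond */
    return q * 1000 + r * 1000 / cpu_khz;
}
```

For example, 3,600,000 cycles at 2,400,000 kHz (2.4 GHz) gives 1500 microseconds.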

Version 1.1, 2000-05-13
- Support for Linux kernels 2.2.14, 2.2.15 and 2.3.99-pre8.
- Changes to the driver and user-space library to reduce the number of getpid() calls. (Suggested by Ulrich Drepper.)
- Added support for the VIA Cyrix III processor.
- Performance improvements in the x86 driver interface.
- Some code cleanups.

Version 1.0, 2000-01-31
- Support for Linux kernels 2.3.41, 2.2.15pre5, and 2.2.14.
- Code cleanups in order to handle drivers for non-x86 processors.
- Changes to the x86 drivers to reduce cache footprint and sampling overhead. (Sample low 32 bits of counters, but maintain 64-bit sums.)

Version 0.11, 2000-01-30
- Support for Linux kernels 2.3.41 and 2.2.14.
- Minor code cleanups and fixes.
- The CR4.PCE flag is now globally enabled on x86, except for those processors which do not support it. This is done in part to reduce the overhead of virtualising the performance counters, but it is also necessary due to changes in kernel 2.3.40.

Version 0.10, 2000-01-23
- Support for Linux kernels 2.3.40 and 2.2.14.
- Global-mode performance counters are now implemented.
- Added hardware support for the WinChip 3 processor.
- More source code reorganisation.

Version 0.9, 2000-01-02
- Support for Linux kernels 2.3.35, 2.2.14pre18, and 2.2.13.
- The driver can now be built as a module.
- The driver now installs itself as the /dev/perfctr device instead of adding a system call.
- Significant source code reorganisation.

Version 0.8, 1999-11-14
- Support for Linux kernels 2.3.28 and 2.2.13.
- Major updates to reduce the overhead of maintaining virtual performance-monitoring counters:
  - The control registers are cached and updated lazily.
  - The counter registers are no longer written to.
  - Unused counters are no longer manipulated at all. (This matters especially for the AMD K7.)
  - Reduced the process scheduling overhead for processes not using performance-monitoring counters.
- Minor code cleanups, bug fixes, and documentation updates.
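
The Version 1.0 entry above samples only the low 32 bits of each counter while keeping 64-bit sums. A minimal C sketch of how such a sum can be maintained follows; struct ctr_state and ctr_accumulate() are hypothetical names, not the driver's code. The technique relies on unsigned modulo-2^32 subtraction, and it also shows the limitation behind the 2.0-pre6 SMP fix further up: if the counter wraps more than once between two samples, the lost multiples of 2^32 cannot be recovered, which is why sampling deadlines matter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-counter state: 64-bit accumulated sum plus the low
   32 bits of the hardware counter at the previous sampling point. */
struct ctr_state {
    uint64_t sum;
    uint32_t start;
};

/* Fold a new 32-bit sample into the 64-bit sum. The subtraction is done
   modulo 2^32, so any delta below 2^32 (including one wrap of the low
   32 bits) is counted correctly; larger deltas are silently undercounted. */
static void ctr_accumulate(struct ctr_state *c, uint32_t now)
{
    assert(c);
    c->sum += (uint32_t)(now - c->start);
    c->start = now;
}
```

For instance, a previous sample of 0xFFFFFFF0 followed by a new sample of 0x10 adds 0x20 to the sum, even though the raw 32-bit value went down.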

Version 0.7, 1999-10-25
- Support for Linux kernels 2.3.22 and 2.2.13.
- Improved performance. (Uses RDPMC instead of RDMSR when possible.)
- The AMD K7 Athlon should now work properly.
- User-space now uses mmap() to read the kernel's accumulated counter state.
- The driver is now invoked via a new sys_perfctr() system call, instead of abusing prctl().
- The kernel patch has been cleaned up. The "#ifdef CONFIG_PERFCTR" mess has been eliminated.

Version 0.6, 1999-09-08
- Version 0.6 with support for Linux kernels 2.3.17 and 2.2.12.
- Preliminary support for AMD Athlon added.

Version 0.5, 1999-08-29
- Support for Linux kernel 2.3.15.
- The user-space buffer is updated whenever state.status changes, even when a remote command triggers the change.
- Reworked and simplified the high-level code. All accesses now require an attached file in order to implement proper accounting and synchronisation. The only exception is UNLINK: a process may always UNLINK its own PMCs.
- Fixed a counting bug in sys_perfctr_read().
- Improved support for the Intel Pentium III.
- Another WinChip fix: fake TSC update at process resume.
- The code should now be safe for 'gcc -fstrict-aliasing'.

Version 0.4, 1999-07-31
- Support for Linux kernel 2.3.12.
- Implemented PERFCTR_ATTACH and PERFCTR_{READ,CONTROL,STOP,UNLINK} on attached perfctrs. An attached perfctr is represented as a file.
- Fixed an error in the WinChip-specific code.
- Perfctrs now survive exec().

Version 0.3, 1999-07-22
- Support for Linux kernel 2.3.11.
- Interface now via sys_prctl() instead of /dev/perfctr.
- Added NYI stubs for accessing other processes' perfctrs.
- Moved to dynamic allocation of a task's perfctr state.
- Minor code cleanups.

Version 0.2, 1999-06-07
- Support for Linux kernel 2.3.5.
- Added support for WinChip CPUs.
- Restart counters from zero, not their previous values.
  This corrected a problem for Intel P6 (WRMSR writes 32 bits to a PERFCTR MSR and then sign-extends to 40 bits), and also simplified the code.
- Added support for syncing the kernel's counter values to a user-provided buffer each time a process is resumed. This feature, and the fact that the driver enables RDPMC in processes using PMCs, allows user-level computation of a process' accumulated counter values without incurring the overhead of making a system call.

Version 0.1, 1999-05-30
- First public release for Linux kernel 2.3.3.
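
The P6 problem described under Version 0.2 above can be modelled in a few lines of C. The sketch below only simulates the described WRMSR behaviour (32 bits written, then sign-extended to the 40-bit counter width); p6_counter_after_wrmsr() and P6_CTR_MASK are hypothetical names, and no MSR is actually accessed. It shows why writing back a previous value with bit 31 set is harmful, while restarting from zero is not.

```c
#include <stdint.h>

#define P6_CTR_MASK  0xFFFFFFFFFFULL   /* 40-bit counter width */

/* Simulation of the 0.2 entry: WRMSR stores 32 bits into a P6 PERFCTR MSR
   and sign-extends bit 31 into bits 32-39 of the 40-bit counter. */
static uint64_t p6_counter_after_wrmsr(uint32_t written)
{
    uint64_t v = written;

    if (v & 0x80000000u)
        v |= P6_CTR_MASK & ~0xFFFFFFFFULL;   /* set bits 32-39 */
    return v;
}
```

For instance, resuming from a saved 32-bit value of 0x80000000 would leave 0xFF80000000 in the counter, a jump of almost 2^40 counts instead of a continuation, whereas writing 0 leaves 0.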