$Id: RELEASE-NOTES,v 1.234.2.166 2010/11/07 19:48:14 mikpe Exp $

RELEASE NOTES
=============

Version 2.6.42, 2010-11-07
- x86.c: identify Intel Family 6 Models 37 and 44 as Westmere not Nehalem.
- x86.c: do_init_tests() calls perfctr_x86_init_tests() which is __init,
  therefore mark do_init_tests() also as __init.

Version 2.6.41, 2010-06-08
- x86: Add support for OFFCORE_RSP_{0,1} on Nehalem/Westmere.
- x86: Recognise Intel family 6 models 30 and 37 as Nehalems.
  Update comments mapping product lines to model numbers.
- x86: Rename PERFCTR_X86_INTEL_COREI7 CPU/PMU type constant to
  PERFCTR_X86_INTEL_NHLM. Update driver to print "Nehalem" rather
  than "Core i7" when a CPU of this type is detected.

Version 2.6.40, 2010-01-30
- x86: add comment after #endif terminating big kernel >= 2.6.19 block
- x86: handle cpumask API change in kernel 2.6.32
- x86: recognize Intel Family 6 Model 2Eh processors (Nehalem Xeon 7500).
- x86: recognize Intel Family 6 Model 2Ch processors (i7-980X, Gulftown).
- x86: recognize AMD Family 11h processors, support them as 10h ones.

Version 2.6.39, 2009-06-11
- ppc and arm: updates to match perfctr_cpu_update_control() changes,
  add missing #include <asm/cputype.h> to arm.c
- global.c: coding style fixups
- x86.c: update AMD multicore detection to match the documentation
  and actually work on current processors, set up cpumask of all
  core0 CPUs, detect RevE processors, update p6_like_check_control()
  to allow per-thread sessions to use AMD NB events on post-RevE
  processors but limit them to core0 CPUs
- x86.c: replace is_global parameter to perfctr_cpu_update_control()
  with a cpumask_t pointer, make P4 update this cpumask instead of
  hard-coding the use of perfctr_cpus_forbidden_mask in virtual.c,
  add cpumask to struct vperfctr, update virtual.c to use the cpumask
  from perfctr_cpu_update_control() not perfctr_cpus_forbidden_mask
  to derive the task's new cpumask, update set_cpus_allowed() callback
  to validate new cpumask against the vperfctr's private one, update
  global.c to pass a NULL cpumask_t pointer
- virtual.c: rearrange sys_vperfctr_control() so that set_cpus_allowed()
  comes after perfctr_cpu_update_control(), record updater's ->tgid so
  races with concurrent updaters can be detected and handled
- virtual.c: make vperfctrfs_dentry_operations 'const' in 2.6.30 and
  later kernels.
- x86.c: silence MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL redefinition warning.

Version 2.6.38, 2009-01-23
- Remove 2.4 kernel support from <linux/perfctr.h> (cpumask_t workaround).
- Remove 2.4 kernel support from build system.
- Remove 2.4 kernel support from .c files.
- Remove 2.4 kernel support from .h files.
- Kernel 2.6.29-rc1 changed remap_pfn_range() to WARN_ON when applied
  to plain RAM. Update virtual.c to use vm_insert_page() instead.
- Kernel 2.6.29-rc1 moved a task's fsuid/fsgid field to the ->cred
  struct. Update virtual.c to use current_fsuid() and current_fsgid().
  Update compat.h to supply these macros for older kernels.

Version 2.6.37, 2008-11-30
- x86: Preliminary Intel Core i7 support, limited to handling it
  as a Core2-like processor with four PMCs. The AnyThread evntsel
  flag and off-core/uncore monitoring are not yet supported.
- x86: Recognise Intel Family 6 Model 29 (Xeon 7400) as Core 2.
- x86: Make core2_clear_counters() also clear the FREEZE_PERFMON_ON_PMI
  bit in DEBUGCTLMSR. vtune leaves this bit set, which breaks perfctr.
- x86: Make perfctr_clear_counters() initialise MSR_CORE_PERF_GLOBAL_CTRL
  during initialisation. This fixes compatibility issues with drivers
  that may leave this register cleared == all counters disabled. vtune
  is reported to do this. Add ->clear_counters() op to perfctr_pmu_msrs
  to handle this cleanly. Convert Via C3 to use this mechanism instead
  of being a special case.

Version 2.6.36, 2008-10-19
- x86: Limit the value written to a fixed-function counter's MSR
  to 40 bits. Extraneous high bits cause GP faults on Model 23
  Core2s, while earlier processors would just ignore them.
- Kernel 2.6.27-rc1 dropped the retry parameter to on_each_cpu()
  and smp_call_function(). Adjust accordingly. Add compatibility
  wrappers for older kernels.
- Kernel 2.6.27-rc1 removed find_task_by_pid(). Migrate to new
  find_task_by_vpid(). Add compatibility wrapper for older kernels.
- Starting with kernel 2.6.27-rc1 one should use alloc_intr_gate()
  not set_intr_gate() on x86 when binding a specific vector, as this
  also marks the vector as allocated. Adjust <asm-x86/perfctr.h>.
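
  Illustration of the 40-bit limit on fixed-function counter MSR
  writes described above. This is a minimal sketch, not the driver's
  code: the helper name and the FIXED_CTR_BITS macro are hypothetical,
  and only the 40-bit width is taken from the note.

```c
#include <stdint.h>

/* Counter width taken from the release note (40 bits). */
#define FIXED_CTR_BITS 40

/*
 * Hypothetical helper, not a function from the driver: keep only the
 * low 40 bits of a value before it would be written to a
 * fixed-function counter's MSR, so that no extraneous high bits
 * reach the wrmsr.
 */
static inline uint64_t fixed_ctr_msr_value(uint64_t value)
{
	return value & ((UINT64_C(1) << FIXED_CTR_BITS) - 1);
}
```

  For example, a sign-extended negative start value such as
  (uint64_t)-1000 keeps its low 40 bits and loses the 24 high ones.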

Version 2.6.35, 2008-06-30
- x86: Preliminary Intel Atom support:
  * add Atom CPU type, it differs from all previous models
  * Atom is poorly documented, so query cpuid leaf 0xA for
    its architectural PMU capabilities; initial Atoms appear
    to have APM V3, 2 40-bit general-purpose counters, 7
    architectural events, and 1 40-bit fixed-function counter
- x86: Replace the p6_is_core2 flag with separate variables
  indicating (a) having per-evntsel enable bits, and (b) the
  number of fixed-function counters available.
- x86: intel_p6_init(): recognise Celeron model 16h and
  treat it as a Core 2.
- x86: Clean up intel_p6_init(): replace complex if conditions
  with explicit switches on x86_model, explicitly enumerate
  accepted model numbers.
- x86: Correct p6_like_check_control() to reject regular
  pmcs >= 2 on Core2 before mapping fixed-function counters
  0x40000000+N to pmcs 2+N for duplicate counter checking.
  The failure to reject those invalid pmcs made it possible
  for users to cause the driver to perform invalid wrmsr and
  rdpmc accesses with kernel hangs as the result.
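
  The corrected check order in p6_like_check_control() can be
  pictured roughly as follows. This is a sketch under assumptions,
  not the driver's code: the function and macro names are
  hypothetical, and the counts (two regular pmcs, three fixed-function
  counters) are taken from these notes.

```c
#include <stdint.h>

#define CORE2_NR_PMCS     2u          /* regular pmcs 0 and 1 */
#define CORE2_NR_FIXED    3u          /* fixed-function counters N=0,1,2 */
#define CORE2_FIXED_BASE  0x40000000u /* user-visible fixed pmc base */

/*
 * Hypothetical helper: map a user-supplied pmc number to an internal
 * slot used for duplicate-counter checking, or return -1 if the
 * number is invalid. Regular pmcs are range-checked BEFORE the
 * fixed-function counters are mapped onto slots 2+N, so a regular
 * pmc >= 2 can never alias a fixed-function slot.
 */
static inline int core2_map_pmc(uint32_t pmc)
{
	if (!(pmc & CORE2_FIXED_BASE))
		return pmc < CORE2_NR_PMCS ? (int)pmc : -1;

	uint32_t n = pmc - CORE2_FIXED_BASE;
	if (n >= CORE2_NR_FIXED)
		return -1;
	return (int)(CORE2_NR_PMCS + n);
}
```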

Version 2.6.35-pre1, 2008-06-23
- Add optional close-on-exec feature for per-process perfctrs:
  * reassign _reserved1 as flags in vperfctr_control
  * add VPERFCTR_CONTROL_CLOEXEC flag
  * add perfctr_flush_thread() hook to exec() path
  * map perfctr_flush_thread() to __vperfctr_flush() via
    inline functions and virtual_stub.c
  * in __vperfctr_flush(), if CLOEXEC is set then unlink the state
  * bump API version to 5.2

Version 2.6.34, 2008-05-29
- Reorder kernel version and HAVE_EXPORT___put_task_struct
  tests in compat.h to handle the SuSE 2.6.16.42-0.12 kernel
  exporting __put_task_struct_cb().
- Fix warning about DONT_HAVE_i_blksize being undefined in
  the SuSE 2.6.16.42-0.12 kernel.

Version 2.6.33, 2008-05-18
- x86: Intel Family 6 Model 23 support missed that it needs
  to trigger LVTPC reinit. Fix that.

Version 2.6.32, 2008-04-20
- x86: Recognize Intel Family 6 Model 23 as Core2.
- x86: Update perfctr_sysclass definition for kernel 2.6.25.

Version 2.6.31, 2008-01-26
- x86: Correct Barcelona CPU type to read FAM10H not FAM10
  w/o the trailing H. In struct perfctr_cpu_control, place
  p4 struct in a union and alias p4.escr[] with evntsel_high[]:
  this allows passing high 32 evntsel bits for Barcelona.
  Update driver to also manage high 32 evntsel bits on Barcelona,
  on other processors those bits are forced to zero.

Version 2.6.30, 2007-10-28
- x86: Kernel 2.6.24-rc1 changed the calling convention for
  the cpu_data macro. Updated accordingly, and added compat
  code providing the new behaviour in older kernels.
- ppc32: Kernel 2.6.24-rc1 removed the get_property() compat
  macro. Use of_get_property() with kernels >= 2.6.22.
- The workaround for RHEL5 removing ptrace_check_attach()
  only works when perfctr is built as a module. Fix it to
  also work in the non-modular case.

Version 2.6.29, 2007-10-07
- Add new cpu_type for AMD Family 10h, to reduce confusion.
- Very preliminary support for AMD Family 10h processors.
  They will need a new cpu_type and support for 64-bit evntsels.
  For now pretend they are K8C processors.
- Silence compilation warnings on ppc32.
- Intel has finally documented how to read the Core 2's
  fixed-function performance counters in user-space:
  rdpmc 0x4000000N for N=0,1,2. Support them from user-space
  by pretending they have P6-like evntsels, and extract the
  useful controls (CPL+INT) into the fixed-function counters
  control register. Update the P6 driver methods to handle the
  fact that a fixed-function counter has no private evntsel MSR.
  Update x86_tests to measure the cost of reading these counters
  and writing their shared control register.
- More Intel CPU detection cleanup: separate detection needed
  by the driver from that done to supply cpu_type to user-space.
- Clean up Intel CPU detection by moving family 5, 6, and 15
  detection code to separate procedures.
- Use #undef to silence macro redefinition warnings on x86.
- Kernel 2.6.22 removed the rdtsc() macro from i386. Unbreak
  and clean up x86_tests.c by using our own rdtsc_low() macro.
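
  The extraction of the CPL and INT controls from a P6-like evntsel
  into the shared fixed-function control register might look like
  the sketch below. It is not the driver's code: the names are
  hypothetical, and the bit positions are assumptions based on the
  Intel SDM layout (evntsel USR=16, OS=17, INT=20; four control bits
  per fixed-function counter: OS enable, USR enable, AnyThread, PMI).

```c
#include <stdint.h>

/* P6-like evntsel bits (assumed from the Intel SDM layout). */
#define EVNTSEL_USR  (UINT32_C(1) << 16)
#define EVNTSEL_OS   (UINT32_C(1) << 17)
#define EVNTSEL_INT  (UINT32_C(1) << 20)

/* Per-counter 4-bit field in the fixed-function control register. */
#define FIXED_EN_OS   0x1u
#define FIXED_EN_USR  0x2u
#define FIXED_PMI     0x8u

/*
 * Hypothetical helper: build the control bits for fixed-function
 * counter n from a pretend evntsel. All other evntsel bits are
 * ignored, since a fixed-function counter has no event selection.
 */
static inline uint64_t fixed_ctrl_bits(uint32_t evntsel, unsigned int n)
{
	uint64_t field = 0;

	if (evntsel & EVNTSEL_OS)
		field |= FIXED_EN_OS;
	if (evntsel & EVNTSEL_USR)
		field |= FIXED_EN_USR;
	if (evntsel & EVNTSEL_INT)
		field |= FIXED_PMI;
	return field << (4 * n);
}
```

  The fields for all active fixed-function counters would be OR-ed
  together before the single write to the shared control register.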

Version 2.6.28, 2007-07-18
- x86: The kernel's perfctr/nmi management system changed again
  in 2.6.22-rc5. Call {disable,enable}_lapic_nmi_watchdog() not
  {stop,setup}_apic_nmi_watchdog() in kernels >= 2.6.22.
- ppc.c: kernel 2.6.22-rc1 removed find_type_devices(),
  use of_find_node_by_type() and of_node_put() instead
- compat.h: fix warnings in CONFIG_UTRACE check

Version 2.6.27, 2007-04-09
- Bumped copyright years on recently updated files.
- RHEL5 2.6.18-8.1.1.el5 added export of __put_task_struct.
  Patched <linux/config.h> to signal this.
- RHEL5 2.6.18-8.1.1.el5 removed i_blksize. Patched
  <linux/config.h> to signal this. Check this in virtual.c.
- RHEL5 2.6.18-8.1.1.el5 replaced ptrace with utrace, breaking
  the remote control API which needs ptrace_check_attach().
  For now, stub ptrace_check_attach() so that things build.
- The {reserve,release}_{perfctr,evntsel}_nmi() API changed
  in kernel 2.6.21-rc6, from being CPU-local to being global.
  Updated x86.c to handle this change.

Version 2.6.26, 2007-02-11
- Updates to show my @it.uu.se email address in some messages,
  as the old @csd.uu.se address now is /dev/null.
- Added driver support for ARM/XScale processors. Overflow
  interrupts are not yet supported, in part due to conflicts
  with Intel's ixp400_eth driver. Plain event counting works.
- Kernel 2.6.20-rc1 moved filp->f_dentry and filp->f_vfsmnt into
  the filp->f_path substructure. Added compat macros to handle this.
- Kernel 2.6.20-rc1 changed how pipefs handles its dentries.
  Adapted those changes to vperfctrfs.

Version 2.6.25, 2006-10-15
- x86.c: Intel Core 2 is substantially different from Intel Core.
  Add new cpu_type for Core 2, map family 6 model 15 to Core 2,
  and require Core 2 to set Enable in all EVNTSELs.
- x86.c: kernel 2.6.19-rc1 removed the {reserve,release}_lapic_nmi()
  API, and added a {reserve,release}_{perfctr,evntsel}_nmi() API.
  Reimplement {reserve,release}_lapic_nmi(). Add data to describe
  the set of perfctr, evntsel, and other MSRs used by a CPU type.
  Add procedures to reserve and release all of the MSRs. Replace
  the CPU-specific clear_counters() procedures with a generic one
  that uses the MSR description data object. Add EXPORT_SYMBOL of
  {setup,stop}_apic_nmi_watchdog() to x86_setup.c for 2.6.19+ kernels.
- virtual.c: kernel 2.6.18 dropped EXPORT_SYMBOL(tasklist_lock),
  so starting with 2.6.18 we must use rcu_read_{lock,unlock}()
  around find_task_by_pid().
- virtual.c: kernel 2.6.19-rc1 dropped the inode->i_blksize field.
- Only #include <linux/config.h> for kernels older than 2.6.19,
  since 2.6.19-rc1 marks it deprecated. The test is ugly: perhaps
  this should be handled in the Makefile instead.

Version 2.6.24, 2006-09-17
- x86_tests: fixed linkage error caused by p6_init_tests()
  not being compiled in 64-bit builds.

Version 2.6.23, 2006-08-20
- Testing done by the PAPI folks indicates that Intel Core 2 has
  a single master Enable bit in EVNTSEL0, just like previous P6s.
  Restore Intel Core to the classic P6 rule: EVNTSEL0 must be
  enabled, EVNTSEL1 must not be enabled.
- Intel Core updates: each EVNTSEL has its own Enable bit like
  AMD and P4, recognise Model 15 (Core2), Core2 is 64-bit so
  make P6 testing code available in both 32- and 64-bit builds.
- ppc32: correct PMC1SEL and PMC4SEL definitions.
- virtual: new vperfctrfs_get_sb() for kernel 2.6.18-rc1 and
  later: ->get_sb() and get_sb_pseudo() changed prototype.
- x86: #include <asm/nmi.h> to get lapic NMI declarations in
  kernel 2.6.18-rc1 and later. Do not do this if we're going
  to stub them because !CONFIG_X86_LOCAL_APIC.

Version 2.6.22, 2006-06-02
- Preliminary support for Intel Core (family 6 model 14) processors.
- x86: The code to extract max_cores_per_package from CPUID(4):EAX
  needs cpuid() to put zero in ecx, but it only does that in fairly
  new 32-bit kernels, not in 64-bit kernels or older 32-bit kernels.
  This badly broke the SMT_ID detection on a dual-processor dual-core
  hyper-threaded 64-bit Xeon machine. Fixed by using cpuid_count()
  instead. Added compatibility definition of it for kernels < 2.6.12.
- Fixed x86_tests.c compilation error in the i386 2.6.16 kernel
  by moving sync_core() definition from x86_tests.c to x86_compat.h
  and only defining it in i386 kernels older than 2.6.16.

Version 2.6.21, 2006-04-03
- Converted mutex-like semaphores to the new mutex type
  introduced in kernel 2.6.16. Added simulation of the
  new API in terms of semaphores to compat.h and compat24.h.
- put_task_struct() uses __put_task_struct() again starting
  with the 2.6.17-rc1 kernel. Updated compat.h, compat24.h,
  and virtual_stub.c accordingly.
- Corrected a botched cleanup of compat24.h in perfctr-2.6.20
  which broke support for RHEL3 2.4.21 kernels.

Version 2.6.20, 2006-03-12
- Starting with 2.6.16-rc1, put_task_struct() uses an RCU callback
  __put_task_struct_cb() instead of the old __put_task_struct().
  2.6.16-rc6 dropped the EXPORT_SYMBOL() of __put_task_struct_cb().
  Updated compat.h, compat24.h, and virtual_stub.c accordingly.

Version 2.6.19, 2006-01-22
- Updated ppc32 driver for kernel 2.6.16-rc1: dynamically
  claim the HW and register our interrupt handler via
  {reserve,release}_pmc_hardware(); simulate these primitives
  in older kernels; fully migrate patch kit from arch/ppc/
  to arch/powerpc/.

Version 2.6.18, 2006-01-03
- 2.6.5-7.201-suse added EXPORT_SYMBOL_GPL(__put_task_struct).
  Added feature #define to the kernel patch, and modified compat.h
  to disable our export of __put_task_struct in this case.
- Merged the structure descriptor declarations in marshal.c
  to avoid duplicating the parts that are identical across
  all supported platforms.

Version 2.6.17, 2005-10-02
- The dual-core P4s changed the layout rules for the initial
  APIC ID, which broke the x86 driver on DC P4s. Updated the
  HT thread ID detection code to match current IA32 SDM Vol3.
- Kernel 2.4.21-37.EL added EXPORT_SYMBOL_GPL(__put_task_struct).
  Added new feature #define to the kernel-specific patch for this
  case. compat24.h now disables our export of __put_task_struct
  when that feature #define is set.
- Kernel 2.6.14-rc1 changed the state parameter to ->suspend()
  methods to be of type 'pm_message_t'. Adjusted x86.c for this,
  to eliminate a compile-time type mismatch warning.

Version 2.6.16, 2005-09-04
- cpu_khz changed type in kernel 2.6.13. Adjusted x86_setup.c
  accordingly, to avoid a compile-time error.
- The ppc32 driver will now compile in kernels that lack Open
  Firmware support, which is needed for some embedded systems.

Version 2.6.15, 2005-05-06
- Added code to detect multicore K8s and prevent threads in the
  thread-centric API from using northbridge events. This avoids
  resource conflicts, and an erratum in Revision E chips.
- #undef MMCR0_PMXE in ppc_compat.h, to avoid macro redefinition
  complaints in 2.6 kernels.

Version 2.6.14, 2005-04-09
- x86: Reverted the workaround in perfctr-2.6.13 for the problem
  that gcc-4.0 snapshots appeared to ignore 'noinline' on static
  functions, as recent gcc-4.0 prereleases seem to work correctly.

Version 2.6.13, 2005-02-13
- global.c: Allow user-space to disable the in-kernel sampling
  timer by setting interval_usec == 0 in the START command.
  In this case sampling is done by the READ command.
- Modified x86 call backpatching code to avoid breaking with
  gcc-4.0 snapshots:
  * gcc-4.0 may clone control flows, resulting in more sites
    with backpatchable calls. finalise_backpatching() now sets
    things up to exercise all affected control flow paths.
  * gcc-4.0 appears to ignore 'noinline' on static functions
    that are only called from one place, at least on x86-64.
    This broke perfctr_cpu_{write_control,isuspend,iresume}().
    Things work again if they are made non-static.
- Only define our own version of get_sb_pseudo() in kernels older
  than 2.6.11, since 2.6.11-rc1 added EXPORT_SYMBOL(get_sb_pseudo).
- In 2.6.11-rc2 and newer kernels, bind ioctls to ->unlocked_ioctl
  and ->compat_ioctl, and don't use register_ioctl32_conversion().
- Remove unused inode parameter to gperfctr_ioctl().
- Define static spinlocks with DEFINE_SPINLOCK(), following new
  coding style in 2.6.11-rc1. Add compat macros for older kernels.

Version 2.6.12, 2004-12-19
- PPC32 driver updated to be more robust in its detection of
  timebase and core clock frequencies. Some information sources
  can give wrong values for those frequencies, so the driver
  now tries other more reliable methods first.

Version 2.6.11, 2004-11-14
- Compat stuff updated for tsk->sighand->siglock,
  recalc_sigpending(), and preempt_enable_no_resched().
- Silence compiler warning from compat.h:remap_pfn_range().
- PPC32 overflow interrupt support backported from perfctr-2.7.
- Backported inheritance handling calls from perfctr-2.7
  to kernel patch kit. They are currently stubs, but can be
  implemented later without having to update the patch kit.
- Overflow interrupt fixes backported from perfctr-2.7.7:
  * x86/x86-64: move perfctr_suspend_thread() call from
    __switch_to() to the start of switch_to()
  * x86/x86-64: mask interrupts at suspend and record if
    any overflows are pending; unmask interrupts at resume
  * virtual: handle pending overflows in resume path
  * ppc32: provide dummy pending overflow checking function

Version 2.6.10.3, 2004-10-24
- virtual.c, linux/perfctr.h: reformatted "if( x )" to "if (x)"
  and similarly for while and switch statements.
- PPC32: Add support for MPC7447A. Add support for MPC7448,
  except for decoding its PLL_CFG.
- Move x86 cpu_type definitions from <linux/perfctr.h>
  to <asm-i386/perfctr.h>.
- Make PERFCTR_INTERRUPT_SUPPORT a Kconfig-derived option.
  Ditto PERFCTR_CPUS_FORBIDDEN_MASK_NEEDED.
  Also implement this in Config.in for 2.4 kernels.
- Kernel 2.6.10-rc1 removed the export of put_filp().
  Reordered the allocations in vperfctr_get_filp() to
  avoid the need to use put_filp().
- remap_page_range() was replaced with remap_pfn_range() in
  kernel 2.6.10-rc1. Updated virtual.c accordingly, and added
  remap_pfn_range() emulations for older kernels.

Version 2.6.10.2, 2004-10-19
- virtual.c: replace nrctrs_lock with a mutex. Avoids illegal
  may-sleep-while-holding-lock, caused by mutex operations in
  perfctr_cpu_{reserve,release}().
  Backport from perfctr-2.7.6.
- PPC32: Correct MMCR0 handling for FCECE/TRIGGER. Read
  MMCR0 at suspend and then freeze the counters. Move
  this code from read_counters() to suspend(). At resume,
  reload MMCR0 to unfreeze the counters. Clean up the
  cstatus checks controlling this behaviour.
  Backport from perfctr-2.7.6.

Version 2.6.10, 2004-09-14
- Fixed p4_clear_counters() to not access IQ_ESCR{0,1}
  on P4 models >= 3.

Version 2.6.10-pre1, 2004-08-03
- Changed x86-64 to use the x86 include file and driver.
  Intel's 64-bit P4 should now work in the x86-64 kernel.
- Replaced PERFCTR_INTERRUPT_SUPPORT and NMI_LOCAL_APIC
  #if:s in x86 code by #ifdef:s on CONFIG_X86_LOCAL_APIC.
- Use macros to clean up x86 per-cpu cache accesses.
- Recognize model 13 Pentium-Ms.
- Changed isuspend_cpu on x86 to be like x86-64's: it
  now stores a CPU number instead of a cache pointer.
- x86: make perfctr_cpu_name more approximate.
- The x86 driver records a simplified CPU type for x86_tests,
  but this only occurs if PERFCTR_INIT_TESTS is configured.
  perfctr_info.cpu_type is now unused.
- Changed P4 driver to set up and check an explicit flag
  for EXTENDED_CASCADE availability. perfctr_info.cpu_type
  is now unused except for perfctr_x86_init_tests().
- x86: Reformatted "if( x )" to "if (x)" and similarly for while
  and switch statements. Deleted #if 0 blocks.

Version 2.6.9, 2004-07-27
- Fix ppc_check_control() to allow 7400/7410 processors to
  specify MMCR2[THRESHMULT].
- PPC32 cleanups: make get_cpu_cache() return pointer not lvalue,
  eliminate duplicated initialisation/cleanup code.
- Makefile: enforce -fno-unit-at-a-time with gcc-3.4 on x86,
  to prevent stack overflow in 2.6 kernels < 2.6.6.
- Do sync_core() before rdtsc() in x86_tests, to avoid bogus
  benchmarking data on K8. Add sync_core() implementation for
  the 32-bit kernel. Add sync_core() benchmark.
- Added __perfctr_mk_cstatus() to allow x86.c:finalise_backpatching()
  to create a cstatus with i-mode counters marked as present, but
  with zero actual counters. This prevents perfctr_cpu_isuspend()
  from clearing the control register for counter #0 at init-time,
  when the hardware doesn't belong to this driver. On AMD and P6
  this would accidentally disable the NMI watchdog.
- x86: Marked initial targets of backpatchable calls
  'noinline' to prevent gcc from inlining them, which
  completely breaks the backpatching mechanism.
- x86_tests: fix CONFIG_X86_LOCAL_APIC=n linkage error.
- 2.6.8-rc1 no longer makes cpu_online_map a #define on UP,
  breaking modules. Reintroduce the macro.
- 2.6.8-rc1 changed cpus_complement() calling convention.
  Replace cpus_complement();cpus_and() with cpus_andnot(),
  and provide cpus_andnot() compat macro.
- PPC32: support generic CPUs using only the TB.
- PPC32: query OF for CPU/TB frequencies, drop /proc/cpuinfo
  parsing code.
- PPC32: avoid CPU re-detection in tests code.
- PPC32: clean up and sync with current perfctr-2.7 code.
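
  The cpus_andnot() replacement noted above relies on "and-not"
  being the same operation as complementing one mask and then
  and-ing. A toy model using a 64-bit integer in place of cpumask_t
  is sketched below; the toy_* names are hypothetical and this is
  not the compat macro itself.

```c
#include <stdint.h>

/* Toy stand-in for cpumask_t: one bit per CPU, at most 64 CPUs. */
typedef uint64_t toy_cpumask_t;

/* Old two-step pattern: complement one mask, then and the result. */
static inline toy_cpumask_t toy_complement_then_and(toy_cpumask_t a,
						    toy_cpumask_t b)
{
	toy_cpumask_t not_b = ~b;

	return a & not_b;
}

/* Single-step equivalent: the CPUs in a that are not in b. */
static inline toy_cpumask_t toy_cpus_andnot(toy_cpumask_t a,
					    toy_cpumask_t b)
{
	return a & ~b;
}
```

  With a = 0xF (CPUs 0-3) and b = 0x5 (CPUs 0 and 2), both forms
  yield 0xA (CPUs 1 and 3).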

Version 2.6.8, 2004-05-29
- Added recognition of PowerPC 750GX.
- Changes for the {reserve,release}_lapic_nmi() API added in
  kernel 2.6.6 backported from perfctr-2.7.1:
  * Starting with kernel 2.6.6 we no longer need access to
    nmi_perfctr_msr, so removed EXPORT_SYMBOL() and <asm/apic.h>
    patches related to this variable (except for older kernels).
  * Updated x86.c to use the new API. Added simulation (without
    the non-conflict guarantees) for older kernels.
  * Moved hardware reservation to x86.c's "reserve" procedure.
    The init code now only does read-only hardware detection.
  * Added a mutex to the reserve/release procedures, eliminating
    a long-standing race possibility.
  * Changed x86.c to reserve and release the hardware around its
    call to perfctr_x86_init_tests().
  * Similarly updated x86_64.c for the new API.

Version 2.6.7, 2004-05-04
- Replaced x86_64_tests.{c,h} with x86_tests.{c,h}.
- sys_device_{,un}register() was renamed as sysdev_{,un}register()
  in 2.6.4-rc2. Updated x86.c and x86_64.c accordingly, and
  added a compatibility definition in compat.h.
- Removed unnecessary '#include "compat.h"' from x86_tests.c.
- Replaced x86_64_setup.c with x86_setup.c.
- Replaced x86_64_compat.h with x86_compat.h.
- Moved perfctr_interrupt entry point from x86_setup.c to patch kit,
  for kernels older than 2.4.21. Cleanup to facilitate future merge
  of x86_setup.c and x86_64_setup.c.

Version 2.6.6, 2004-02-21
- Fixed a bug in x86-64's perfctr interrupt entry code in 2.4 kernels,
  causing it to pass the wrong value for "struct pt_regs*". This
  was harmless since the retrieved "rip" was unused, but still wrong.
  Renamed do_perfctr_interrupt to smp_perfctr_interrupt to allow
  using the 2.4 kernel's standard BUILD_SMP_INTERRUPT macro.
- Unmask LVTPC after interrupt on Pentium-M. An oprofile user
  reports that P-M auto-masks LVTPC just like P4. Preliminary
  measurements indicate a 40 to 60 cycle cost for the apic write
  on P4s and P6s, so the unmask is not done unconditionally.
- Measure LVTPC write overhead in x86{,_64}_tests.c.
- Add Pentium 4 Model 3 detection.
- The 2.4.21-193 SuSE kernel does EXPORT_SYMBOL(mmu_cr4_features).
  Add compat24.h workaround for this.

Version 2.6.5, 2004-01-26
- Added perfctr_info.cpu_type constants to <asm-ppc/perfctr.h>.
- Init filp->f_mapping in virtual.c for 2.6.2-rc1+ kernels.
- Updated p4_check_control():
  * Allow ESCR.CPL_T1 to be non-zero when using global-mode
    counters on HT processors.
  * Don't require ESCR.CPL_T0 to be non-zero. CPL_T0==0b00
    is safe and potentially useful (global counters on HT).
  * Require CCCR.ACTIVE_THREAD==0b11 on non-HT processors, as
    documented in the IA32 Volume 3 manual. Old non-HT P4s
    seem to work Ok for all four values (see perfctr-2.6.0-pre3
    notes), but this is neither guaranteed nor useful.
- x86.c now detects & records P4 HT-ness also in UP kernels.
- Added 'is_global' parameter to perfctr_cpu_update_control().
  This flag is ignored on everything except P4 (sigh).

Version 2.6.4, 2004-01-12
- Added 'tsc_to_cpu_mult' field to struct perfctr_info, replacing
  '_reserved1'. This is needed on PowerPC to map time-base ticks
  to actual time. On x86/AMD64, tsc_to_cpu_mult == 1.
- Added support for PowerPC 604/7xx/74xx processors. Overflow
  interrupts are currently not allowed due to the PMI/DECR erratum.
- Replaced perfctr_cpus_mask() with cpus_addr(). Updated cpumask.h
  to define cpus_addr() for kernels older than 2.6.1.
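
  How user-space might apply tsc_to_cpu_mult is sketched below.
  This is an illustration under assumptions, not library code: the
  helper names are hypothetical, the multiplier is assumed to scale
  time-base ticks to core clock cycles, and cpu_khz is assumed to be
  the core clock frequency in kHz.

```c
#include <stdint.h>

/*
 * Hypothetical helper: scale time-base (or TSC) ticks to core clock
 * cycles. On x86/AMD64 the multiplier is 1, so ticks pass through.
 */
static inline uint64_t ticks_to_cpu_cycles(uint64_t ticks,
					   uint32_t tsc_to_cpu_mult)
{
	return ticks * tsc_to_cpu_mult;
}

/*
 * Hypothetical helper: convert core clock cycles to microseconds.
 * Returns 0 if the frequency is unknown (cpu_khz == 0). The
 * intermediate product can overflow for very large cycle counts.
 */
static inline uint64_t cycles_to_usec(uint64_t cycles, uint32_t cpu_khz)
{
	if (cpu_khz == 0)
		return 0;
	return cycles * 1000 / cpu_khz;
}
```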

Version 2.6.3-pl1, 2004-01-01
- Moved the x86 interrupt handler definition from x86_setup.c to
  the patch kit for 2.4.21 and later 2.4 kernels, like it already
  is done for 2.6 kernels. This change is needed due to extensive
  interrupt handler changes in RedHat's 2.4.21-6.EL kernel.
- Simplified <asm-i386/perfctr.h>: now that early 2.4 kernels no
  longer are supported, LOCAL_PERFCTR_VECTOR is known to be defined,
  so CONFIG_X86_LOCAL_APIC implies PERFCTR_INTERRUPT_SUPPORT.

Version 2.6.3, 2003-12-21
- Removed gperfctr_cpu_state_only_cpu_sdesc's total_sizeof
  optimisation. The ABI change in 2.6.2 broke it, leading to
  the new fields not being cleared and later causing EOVERFLOW.
- The perfctr_ioctl32_handler() workaround is now only applied
  to kernels older than 2.4.23, since 2.4.23 added the "NULL
  handler == sys_ioctl" logic.

Version 2.6.2, 2003-11-23
- Added 16 bytes (four fields) of reserved data to perfctr_info,
  perfctr_cpu_control, vperfctr_control, gperfctr_cpu_control,
  and gperfctr_cpu_state. Renumbered marshalling tags for
  generic structures. Bumped ABI versions.
- Only allow use of IQ_ESCR{0,1} on P4 models <= 2. These ESCRs
  were removed from later models, according to a recent Intel
  documentation update (252046-006).
- Fixes for Fedora Core 1's 2.4.22-1.2115.nptl kernel:
  * Work around their incomplete and broken cpumask_t backport.
  * Avoid name conflict due to their on_each_cpu() backport.
  * Handle their preempt_disable()/enable() macros.
- Added new perfctr_cpu_is_forbidden() macro to fix a
  compilation error affecting AMD64 in SMP 2.6 kernels.
  SMP cpu_isset() requires that mask is an lvalue, but
  for AMD64 the mask is a constant.

Version 2.6.1, 2003-10-05
- Kernel 2.6.0-test6 changed /proc/self and the /proc/<pid>/
  namespace to refer to "processes" (groups of CLONE tasks)
  instead of actual kernel tasks. This forced the planned
  transition of the vperfctr API from /proc/<pid>/perfctr
  to /dev/perfctr to occur immediately. Changes:
  * Moved /dev/perfctr implementation from global.c to init.c.
  * Implemented VPERFCTR_{CREAT,OPEN}, vperfctr_attach(), and
    the vperfctrfs pseudo-fs needed to support the magic files.
    The fs code was ported from perfctr-1.6/3.1, but updated
    for 2.6 and fixed to permit module unloading in 2.4.
  * Fixed VPERFCTR_OPEN to accept tsk->thread.perfctr == NULL.
    (Needed for info querying commands.)
  * Removed /proc/<pid>/perfctr code. Simplified vperfctr_stub code.
  * Updated vperfctr_attach() to mimic the old /proc vperfctr_open().
    This fixes some synchronisation issues.
- Cleanups:
  * Removed #if checks and code for kernels older than 2.4.16.
  * Eliminated compat macros that are identical in 2.6 and 2.4.
  * Moved ptrace_check_attach EXPORT_SYMBOL from x86{,_64}_setup.c
    to virtual_stub.c.
  * get_task_by_proc_pid_inode() is now trivial. Eliminated it.
  * p4_ht_finalise() is now trivial. Eliminated it.
- Added MODULE_ALIAS() declaration, eliminating the need for
  an alias in /etc/modprobe.conf with 2.6 kernels. Added
  MODULE_ALIAS() compatibility #define in compat24.h.
- Added detection of AMD K8 Revision C processors.
- Updated K8C detection for Revision C Athlon64s.

Version 2.6.0, 2003-09-08
- Handle set_cpus_allowed() when PERFCTR_CPUS_FORBIDDEN_MASK_NEEDED:
  * Add bad_cpus_allowed flag to struct vperfctr.
  * Check bad_cpus_allowed in __vperfctr_resume: if resuming
    with PMCs on forbidden CPU, kill counters and SIGILL current.
  * __vperfctr_set_cpus_allowed() callback: set bad_cpus_allowed
    and print warning if mask allows forbidden CPUs.
  * Use task_lock/unlock instead of preempt_disable/enable to
    synchronise task_struct accesses.
  * Ensure sampling_timer and bad_cpus_allowed share cache line.
  * #include <linux/compiler.h> explicitly for 2.4.18 and older
    kernels; newer kernels include it from <linux/kernel.h>.
  * Hook in virtual_stub.c.
  * Hook and cpumask_t typedef in <linux/perfctr.h>.
- Simplify #if test for set_cpus_allowed() emulation code.
  Also don't define it if CONFIG_PERFCTR_VIRTUAL isn't set.
- cpumask.h only typedefs cpumask_t if <linux/perfctr.h> hasn't.
- Don't hide #include <linux/kernel.h> in compat24.h.
- Fixed compat24.h to test for MODULE not CONFIG_MODULES at the
  __module_get/module_put macros.

Version 2.6.0-pre5, 2003-08-31
- printk() is not allowed in switch_to(). Disabled debug code
  which could violate that rule. Changed virtual_stub.c to BUG()
  instead of printk() if the driver is invoked when not loaded.
- Renamed vperfctr_exit2() to vperfctr_unlink() for clarity.
- gcc-3.3.1 issued several "dereferencing type-punned pointer will
  break strict-aliasing rules" warnings for marshal.c. Used explicit
  unions to fix the warnings and clean up the code.
- Removed compat22.h.
- cpumask_t was included in standard 2.6.0-test4; replace #ifndef
  test in cpumask.h with normal kernel version test.
- x86-64 fix: sys_ioctl() isn't exported to modules, so call
  filp->f_op->ioctl() instead in perfctr_ioctl32_handler().
- x86-64 fix: init.c must include <asm/ioctl32.h> not <linux/ioctl32.h>
  for compatibility with 2.4 kernels.

Version 2.6.0-pre4, 2003-08-19
- Fix x86-64 register_ioctl32_conversion() usage for 2.4 kernels:
  * Supply dummy handler since a NULL handler oopses the kernel.
  * Test CONFIG_IA32_EMULATION since CONFIG_COMPAT is post-2.4.
- Fixed and merged the new API struct marshalling code:
  * New files marshal.c and marshal.h contain the marshalling code
    and high-level helper functions (source shared with the library).
  * User-space structs are struct perfctr_struct_buf and accessed using
    perfctr_copy_{from,to}_user() with ptr to appropriate descriptor.
    The cpumask stuff isn't changed.
  * All ioctls registered as trivially 32-bit compatible on x86-64.
  * Changed perfctr_info cpu_type/cpu_features from short to int:
    this avoids the need for UINT16 marshalling support, and cpumask_t
    caused perfctr_info to change binary representation anyway.
- Declared VPERFCTR_{CREAT,OPEN} ioctls, but left them unimplemented.
- Fixed vperfctr_open() preemption bug. The O_CREAT check+install
  code could be preempted, leading to remote-control races.
- Fixed perfctr_exit_thread() preemption bug. It detached the vperfctr
  before calling __vperfctr_exit(). If current was preempted before
  __vperfctr_exit() called vperfctr_suspend(), perfctr_suspend_thread()
  would fail to suspend the counters. The suspend+detach is now done
  atomically within __vperfctr_exit().
- Changes to handle 2.6 kernels with the cpumask_t patch (-mm, -osdl):
  * Convert perfctr_cpus_forbidden_mask accesses to cpumask_t API.
    Based in part on a patch for the -osdl kernel by Stephen Hemminger.
  * Remove cpus and cpus_forbidden from struct perfctr_info,
    since their sizes depend on the kernel configuration.
  * Add struct perfctr_cpu_mask to export cpumask_t objects
    sanely (i.e., using ints not longs) to user-space.
  * Add CPUS and CPUS_FORBIDDEN commands to retrieve these sets.
  * Add cpumask.h to emulate cpumask_t API in cpumask_t-free kernels.
  * Move perfctr_cpus_forbidden_mask declaration/#define from
    <asm/perfctr.h> to cpumask.h -- necessary since <asm/perfctr.h>
    doesn't have access to the driver's compatibility definitions.
- Cleaned up perfctr_cpu_ireload().
- Removed struct field offset check from init.c.
- 2.4.22-rc1 does EXPORT_SYMBOL(mmu_cr4_features). Added
  new compat #define to handle this.
- Rename x86.c's rdmsrl() to rdmsr_low() to work around msr.h
  changes in 2.6.0-test3. Also rename rdpmcl() to rdpmc_low().
- Replaced __attribute__((__aligned__(SMP_CACHE_BYTES))) usage
  with the official ____cacheline_aligned macro.
- Detect cpuid 0x69x VIA C3s (Antaur/Nehemiah).
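
The "ints not longs" export described for struct perfctr_cpu_mask can be
sketched as below. The helper name, its parameters and the macro names are
assumptions for illustration; the driver's actual layout and conversion
code may differ. The point is that unsigned int has the same width on
x86-32 and x86-64, while unsigned long does not.

```c
#include <assert.h>
#include <limits.h>

#define LONG_BITS ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define INT_BITS  ((unsigned int)(sizeof(unsigned int) * CHAR_BIT))

/* Hypothetical helper: copy an nbits-wide bitmap stored in unsigned
 * longs into unsigned int words, bit by bit, so the result does not
 * depend on the width of long. Returns the number of words written. */
static unsigned int export_mask(const unsigned long *src, unsigned int nbits,
                                unsigned int *dst, unsigned int dst_words)
{
    unsigned int nrwords = (nbits + INT_BITS - 1) / INT_BITS;
    unsigned int i;

    assert(nrwords <= dst_words);
    for (i = 0; i < nrwords; i++)
        dst[i] = 0;
    for (i = 0; i < nbits; i++)
        if (src[i / LONG_BITS] & (1UL << (i % LONG_BITS)))
            dst[i / INT_BITS] |= 1U << (i % INT_BITS);
    return nrwords;
}
```

A caller would first ask for the word count, allocate that many ints, and
then fetch the mask. This matches the idea of retrieving variable-sized
CPU sets with a separate command.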

Version 2.6.0-pre3, 2003-08-03
- Changed perfctr_info.cpus and cpus_forbidden to be int instead of
  long, to make x86-32 and x86-64 compatible. This is a temporary
  solution, as there are patches for >32 CPUs on x86-32. The real
  solution is to make these sets variable-sized, and have user-space
  retrieve them with a new command.
- Simplified GPERFCTR_CONTROL to update a single CPU instead of
  a set of CPUs. Moved cstatus clearing to release_hardware().
- Moved gperfctr start to new GPERFCTR_START command.
- Simplified GPERFCTR_READ to access a single CPU instead of a
  set of CPUs.
- Removed the requirement that CCCR.ACTIVE_THREAD == 3 on P4.
  HT processors define behaviour for all four possible values,
  and non-HT processors behave sanely for all four values.
- Moved struct perfctr_low_ctrs definition from <asm/perfctr.h> to
  the corresponding low-level driver, since it's only used there.
- Changed perfctr_info.cpu_khz and vperfctr_control.preserve to be
  int instead of long. This corrects x86-64 and makes it compatible
  with x86-32.
- Updated x86.c to permit extended cascading on P4M2.
- Fixed a bug where the perfctr module's refcount could be zero with
  code still running in the module (pending returns to exit_thread()).
  This could race with rmmod in preemptive kernels, and in theory
  also in SMP kernels.
  * module owner field added to vperfctr_stub
  * _vperfctr_exit() in the modular case is now a function in
    vperfctr_stub.c, which brackets the vperfctr_stub.exit() call
    with __module_get() and module_put() on vperfctr_stub.owner
  * updated 2.4 and 2.2 compat definitions of __module_get() and
    module_put() to work for modules != THIS_MODULE
- Replaced uses of (void)try_module_get() with __module_get() as the
  latter is more appropriate for 2.6 kernels. Updated compat stuff.

Version 2.6.0-pre2, 2003-07-13
- vperfctr API fixes:
  * The new VPERFCTR_READ_CONTROL command retrieves a vperfctr's
    control data.
  * Renamed VPERFCTR_SAMPLE to VPERFCTR_READ_SUM, and made it
    write the sums to a perfctr_sum_ctrs user-space buffer.
  * Non-write commands are now always permitted on unlinked perfctrs.
  The first change was needed since the control data no longer is
  accessible via the mmap()ed state. The other changes clean up and
  simplify perfex and the library's slow-path read_ctrs() operation.
- sys_vperfctr_ functions now mark the tsk parameter as "const" if
  they don't need write access to it. Typically they only need to
  compare it with current to detect self-access cases.
- perfctr_cpu_state no longer makes the perfctr_cpu_control part
  accessible to user-space (via mmap() of vperfctrs).
- Simplified {set,is}_isuspend_cpu() in x86_64.c by having callers
  pass the CPU number instead of the cache pointer (which was only
  used to derive the CPU number).
- Eliminated NMI_LOCAL_APIC #ifs from x86-64 code since x86-64
  always defines it.
- x86.c cleanups: the non-PERFCTR_INTERRUPT_SUPPORT case now uses
  dummy stub functions, eliminated six #ifdefs.
- x86_64_setup.c needs <asm/fixmap.h>.
- Protected cpu_has_mmx and cpu_has_ht #defines in x86_compat.h
  with #ifndef since 2.4.22-pre3 added those #defines.
- Eliminated PERFCTR_INTERRUPT_SUPPORT #ifs from x86-64 code
  since x86-64 always defines CONFIG_X86_LOCAL_APIC.
- Removed the P4-specific versions of isuspend() and iresume().
  P4 now uses p6_like_{isuspend,iresume}(), just like P6/K7/K8.
- Long overdue cleanup in x86.c/x86_64.c: renamed per_cpu_cache
  pointer variables from 'cpu' to 'cache'.
- Added inline functions in virtual.c for registering the overflow
  handler and for clearing iresume_cstatus. Cleaned out several
  #if PERFCTR_INTERRUPT_SUPPORT occurrences from the main code.
  (Partial backport from the abandoned perfctr-3.1 branch.)
- Inlined now useless 'struct vperfctr_state' in 'struct vperfctr'.

Version 2.6.0-pre1, 2003-07-02
- Rearranged 'struct perfctr_cpu_state' to reduce the number of
  cache lines needed to be touched by key operations (suspend,
  resume, sample). Switched from struct-of-arrays to array-of-struct
  for perfctr counts, and copied pmc_map into the PMC data array.
  The old representation touched at least 3 cache lines at key
  operations, the new one only needs one cache line in most cases.
  The user-space mmap() view of the new representation is binary
  compatible between x86 and x86-64.
- Changed 'isuspend_cpu' in perfctr_cpu_state on x86-64 to be a
  32-bit CPU number, to maintain binary compatibility with x86.
- Removed the union of p5_cesr and id; use id throughout.
- Removed _filler and si_signo from 'struct vperfctr_state', making
  the user-space view of it identical to 'struct perfctr_cpu_state'.
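
The cache-line argument for the new 'struct perfctr_cpu_state' layout can
be checked with a small model. The struct names, field names, counter
count and the 64-byte line size are all assumptions for illustration, not
the driver's definitions, and the structs are assumed to start on a cache
line boundary.

```c
#include <assert.h>
#include <stddef.h>

#define NCTRS 18        /* hypothetical number of counters */
#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* Old style: one array per attribute (struct-of-arrays). */
struct state_soa {
    unsigned int map[NCTRS];
    unsigned int start[NCTRS];
    unsigned long long sum[NCTRS];
};

/* New style: all attributes of one counter are adjacent. */
struct pmc_slot {
    unsigned int map;
    unsigned int start;
    unsigned long long sum;
};
struct state_aos {
    struct pmc_slot pmc[NCTRS];
};

/* Set one bit per cache line covered by bytes [off, off + len). */
static unsigned long mark_lines(unsigned long lines, size_t off, size_t len)
{
    size_t line;

    assert(len > 0);
    for (line = off / CACHE_LINE; line <= (off + len - 1) / CACHE_LINE; line++)
        lines |= 1UL << line;
    return lines;
}

static unsigned int count_bits(unsigned long x)
{
    unsigned int n = 0;

    for (; x; x &= x - 1)
        n++;
    return n;
}

/* Cache lines touched when handling the first n counters. */
static unsigned int lines_soa(unsigned int n)
{
    unsigned long l = 0;

    l = mark_lines(l, offsetof(struct state_soa, map), n * sizeof(unsigned int));
    l = mark_lines(l, offsetof(struct state_soa, start), n * sizeof(unsigned int));
    l = mark_lines(l, offsetof(struct state_soa, sum), n * sizeof(unsigned long long));
    return count_bits(l);
}

static unsigned int lines_aos(unsigned int n)
{
    return count_bits(mark_lines(0, offsetof(struct state_aos, pmc),
                                 n * sizeof(struct pmc_slot)));
}
```

With these assumed sizes, two active counters touch three lines in the
old layout and one line in the new layout.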

Version 2.5.5, 2003-06-15
- Updated x86 driver for 2.5.71 local APIC driver model changes.
- Updated x86-64 driver for 2.5.71 NMI watchdog enable/disable API.
- x86-64 is broken in 2.5.71 since x86-64 was updated to the driver
  model for local APIC and NMI watchdog, at the same time as x86 moved
  to a newer version of the "system device" driver model. Updated
  the x86-64 driver for the new model, which is expected to be in
  x86-64 by 2.5.72 (patch exists for 2.5.71).

Version 2.5.4, 2003-06-01
- The generic-x86-with-TSC driver now uses rdpmc_read_counters
  and p6_write_control instead of its own procedures.
- K8 docs are now available. Updated comment in x86.c accordingly.
- P4 OVF_PMI+FORCE_OVF counters didn't work at all, resulting in
  BUG messages from the driver since identify_overflow failed to
  detect which counters had overflowed, and vperfctr_ihandler
  left the vperfctr in an inconsistent state. This works now.
  However, hardware quirks make this configuration only useful
  for one-shot counters, since resuming generates a new interrupt
  and the faulting instruction again doesn't complete. The same
  problem can occur with regular OVF_PMI counters if ireset is
  a small-magnitude value, like -5.
  This is a user-space problem; the driver survives.
- On P4, OVF_PMI+FORCE_OVF counters must have an ireset value of -1.
  This allows the regular overflow check to also handle FORCE_OVF
  counters. Not having this restriction would lead to MAJOR
  complications in the driver's "detect overflow counters" code.
  There is no loss of functionality since the ireset value doesn't
  affect the counter's PMI rate for FORCE_OVF counters.
- Moved P4 APIC_LVTPC reinit from p4_isuspend() to identify_overflow().
  Reduces context-switch overheads when i-mode counters are active.
- Corrected vperfctr_suspend()'s precondition.
- Corrected comment in <asm/perfctr.h> to state that ireset[]
  values must be negative rather than non-positive.
- Made 'perfctr_cpu_name' __initdata, like its predecessor.

Version 2.5.3.1, 2003-05-21
- Replaced 'char *perfctr_cpu_name[]' by 'char *perfctr_cpu_name'.
  This is needed for x86-64 and other non-x86 architectures.
- Changed <asm-x86_64/perfctr.h> to use 'long long' for 64-bit sums.
  This doesn't change the ABI, but improves user-space source code
  compatibility with 32-bit x86.
- Removed the !defined(set_cpus_allowed) check added to compat24.h
  in 2.5.3. It's wrong for SMP builds with modules and MODVERSIONS,
  since the set_cpus_allowed() emulation function becomes a #define
  from include/linux/modules/x86_setup.ver. Instead add the already
  used HAVE_SET_CPUS_ALLOWED #define to include/linux/config.h in
  the kernel patch, but make it conditional on CONFIG_X86_64.

Version 2.5.3, 2003-05-16
- Added detection code for Pentium M. MISC_ENABLE_PERF_AVAIL is
  now checked on both P4 and Pentium M.
- Added x86_64 driver code. Both x86_64.c and asm-x86_64/perfctr.h
  are basically simplified versions of corresponding x86 files,
  with P5 and P4 support removed, 2.2 kernel support removed, and
  'long long' for sums replaced by 'long'. The last change is
  painful for user-space and may be reverted.
- compat24.h: don't define set_cpus_allowed() if already #defined,
  workaround for RawHide's 2.4.20-9.2 x86_64 kernel.
- Removed list of supported CPUs from Kconfig. That information
  belongs elsewhere (and it's a pain to maintain for 2.2/2.4).

Version 2.5.2, 2003-04-13
- Minor cleanup: use PROC_I() unconditionally in virtual.c,
  implement trivial compat macro in compat24.h.
- Updated power management code for the local APIC and NMI
  watchdog driver model changes in kernel 2.5.67.
  The suspend/resume procedures are still no-ops, however.
  This revealed a bug in the lapic_nmi_watchdog resume code:
  it resumes the lapic_nmi_watchdog even when it was disabled
  before suspend. Perfctr's 2.5.67 kernel patch includes a fix.
- perfctr_sample_thread() is now used also on UP. Anton Ertl's
  2.26GHz UP P4 managed to execute a process for more than 2^32
  cycles before suspending it, causing TSC inaccuracies.
- RH9's 2.4.20-8 kernel changed cpu_online(), put_task_struct() and
  remap_page_range() to be more like in 2.5 kernels, and moved the
  declaration of ptrace_check_attach() from mm.h to ptrace.h, also
  like in 2.5 kernels, requiring fixes to compat24.h and x86_setup.c.
- Added note in x86.c about the new Pentium M processor.
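
The perfctr_sample_thread() item can be illustrated with a small model.
It assumes, as that entry implies, that only the low 32 bits of the TSC
are tracked between samples. At 2.26GHz, 2^32 cycles pass in about 1.9
seconds. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator: a 64-bit sum fed by 32-bit samples. */
struct tsc_acc {
    uint64_t sum;
    uint32_t start;   /* low 32 bits of the TSC at the last sample */
};

/* Unsigned subtraction wraps modulo 2^32, so the delta is exact only
 * while fewer than 2^32 cycles pass between two samples. */
static uint32_t tsc_delta32(uint32_t start, uint32_t now)
{
    return (uint32_t)(now - start);
}

/* Fold the cycles since the last sample into the 64-bit sum. */
static void tsc_sample(struct tsc_acc *acc, uint64_t real_tsc)
{
    uint32_t now = (uint32_t)real_tsc;

    assert(acc);
    acc->sum += tsc_delta32(acc->start, now);
    acc->start = now;
}
```

Sampling once after 1.5 * 2^32 cycles loses a full 2^32 cycles in this
model, while sampling twice within that interval keeps the sum exact.
That is why periodic sampling helps even on UP.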

Version 2.5.1, 2003-03-23
- Fix P4 HT initialisation. I've seen several boot logs from
  people running MP P4 Xeons with HT disabled: this produces
  an ugly "restricting access for CPUs 0x0" message, and would
  cause P4 HT init to unnecessarily return error in older kernels
  lacking set_cpus_allowed(). Now only print the message or
  signal error if non-zero siblings actually are found.
- The set_cpus_allowed() emulation doesn't compile in 2.4
  kernels older than 2.4.15 due to the p->cpus_running field.
  Updated version checks to skip it in 2.4.x when x<15.
- Fix set_cpus_allowed() emulation compile error on BUG_ON()
  in 2.4 kernels older than 2.4.19.
- Added Nehemiah note/reminder in x86.c:centaur_init().

Version 2.5.0, 2003-03-10
- Reverted the 2.5.0-pre2 change that replaced the PERFCTR_INFO
  ioctl by read(): it made the API look too weird.
  Added a PERFCTR_ABI ioctl which only retrieves 'abi_version'.
- Cleaned up struct perfctr_info: renamed abi_magic to abi_version,
  and version to driver_version. Renamed PERFCTR_*_MAGIC too.
- Cleaned up struct perfctr_cpu_control: moved evntsel_aux[]
  into the p4 sub-struct and renamed it as escr[]. Only P4 needs
  it anyway, and the new name clarifies its purpose.
- Renumbered the vperfctr ioctls to the 8-15 range (8-11 are used)
  and reserved 0-7 (0-1 are used) for generic ioctls.
- Added 'use_nmi' field to struct gperfctr_control, reserved for
  future use if/when support for i-mode gperfctrs is implemented.
- Replaced some preempt/smp_call_function combinations with 2.5.64's
  new on_each_cpu() construct. Added compatibility definitions to
  compat24.h and compat22.h.

Version 2.5.0-pre2, 2003-03-03
- Added ABI version to perfctr_info. Replaced PERFCTR_INFO ioctl
  by read() on the fd, since that allows reading the ABI version
  even in the case of a version mismatch. Removed binary layout
  magic number from vperfctr_state. Rearranged perfctr_info to
  make the 'long' fields 8-byte aligned.
- Added #ifdef CONFIG_KPERFCTR to <linux/perfctr.h> to ensure
  that <asm/perfctr.h> isn't included unless CONFIG_KPERFCTR=y.
  This allows the patched kernel source to compile cleanly also
  in archs not yet supported by perfctr.
- Removed PERFCTR_PROC_PID_MODE #define and replaced it with
  /*notype*/S_IRUSR in the patch files.
- Added perfctr_vector_init() to <asm-i386/perfctr.h>. Cleaned
  up arch/i386/kernel/i8259.c patch.
- Removed apic_lvtpc_irqs[] array. Removed irq.c patch.
- Updated CONFIG_PERFCTR_INIT_TESTS help text to match reality.
- Kernel 2.4.21-pre5 added set_cpus_allowed(), which required
  fixing compat24.h and x86_setup.c.
- Fixed init.c for kernel 2.5.63 removing EXPORT_NO_SYMBOLS.
- Cleaned up compat.h by moving 2.2/2.4 stuff to separate files.

Version 2.5.0-pre1, 2003-02-19
- Repair global perfctr API: the target CPUs are now explicit
  in the calls to write control and read state. Global perfctrs
  now work on 2.5 SMP kernels (which no longer have smp_num_cpus
  or cpu_logical_map()), and HT P4s (asymmetric MPs).
- struct perfctr_info has new bitmask fields for the set of CPUs
  (cpu_online_map) and forbidden CPUs; dropped the nrcpus field.
- add cpu_online() compat macro to compat.h
- VPERFCTR_STOP is subsumed by VPERFCTR_CONTROL. Removed it.
- Detect K8 as K8 not K7. They are not identical.
- Makefile cleanup: moved 2.4/2.2 kernel stuff to Makefile24.
- Makefile fix: removed export-objs for 2.5 kernels.
- Kconfig fix: don't mention obsolete .o module suffix.

Version 2.4.5, 2003-02-09
- Fixed two minor compile warnings in x86_tests.c for 2.5 kernels.

Version 2.4.4, 2003-01-18
- Fixed a bug in iresume() where an interrupt-mode counter could
  increment unexpectedly, and also miss the overflow interrupt.
  The following setup would cause the problem:
      P1 has EVNTSELn in non-interrupt mode, counting some high-
  frequency event (e.g. INST_RETIRED) in kernel-mode. P2 has
  EVNTSELn in interrupt-mode, counting some low-frequency event
  (e.g. MMX_ASSIST) in user-mode. P1 suspends. Since EVNTSELn is
  in non-interrupt mode, it is not disabled. P2 resumes. First
  iresume() finds that the CPU cache ID is not P2's, so it reloads
  PERFCTRn with P2's restart value. Then write_control() reloads
  EVNTSELn with P2's EVNTSEL. At this point, P2's PERFCTRn has been
  counting with P1's EVNTSELn since iresume(), so it will no longer
  equal P2's restart value. And if PERFCTRn overflowed, the overflow
  will go undetected since P1's EVNTSELn was in non-interrupt mode.
      To avoid this problem, iresume() now ensures that a counter's
  control register is disabled before reloading the counter.
- Fixed some ugly log messages from the new HT P4 init code:
  * forbidden_mask would be printed as "0X<mask>" (capital X)
  * finalise_backpatching() could trigger a BUG! printk from
    p4_write_control() if the CPU the init code runs on was
    in the forbidden set. At init-time this is not an error.
    Avoided this by temporarily resetting the forbidden_mask.
- Added preliminary support for AMD K8 processors with the
  regular 32-bit x86 kernel. The K8 performance counters appear
  to be identical or very similar to the K7 performance counters.
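
The iresume() ordering problem described above can be reproduced with a
toy model of one counter. Everything here is a simplification with
hypothetical names: real EVNTSEL and PERFCTR registers are MSRs, and the
enable bit value is made up for the model.

```c
#include <assert.h>

#define EVNTSEL_ENABLE 0x1u   /* hypothetical enable bit for this model */

/* Toy model of one counter: a control register and a count register. */
struct pmc_model {
    unsigned int evntsel;
    long long perfctr;
};

/* Events arriving while the control register is enabled bump the count. */
static void events(struct pmc_model *pmc, int n)
{
    assert(n >= 0);
    if (pmc->evntsel & EVNTSEL_ENABLE)
        pmc->perfctr += n;
}

/* Old order: reload the counter while the previous EVNTSEL is still live. */
static void iresume_old(struct pmc_model *pmc, long long restart)
{
    pmc->perfctr = restart;
}

/* New order: disable the control register first, then reload the counter. */
static void iresume_new(struct pmc_model *pmc, long long restart)
{
    pmc->evntsel = 0;
    pmc->perfctr = restart;
}

/* write_control(): install the resuming process's EVNTSEL. */
static void write_control(struct pmc_model *pmc, unsigned int evntsel)
{
    pmc->evntsel = evntsel;
}
```

If events arrive between the reload and write_control(), the old order
lets the counter drift away from the restart value, while the new order
keeps it intact.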

Version 2.4.3, 2002-12-11
- Added x86.c:perfctr_cpus_forbidden_mask. This bitmask describes
  the set of CPUs that must not access the perfctrs. On HT P4 MPs,
  only logical CPU #0 in each package is allowed access -- this
  avoids the resource conflict that would occur if both logical
  processors were to access the perfctrs. In other cases (UP or
  non-HT-P4 MPs) the mask is zero.
- vperfctr_control() now calls set_cpus_allowed() to ensure that
  the task stays away from CPUs in perfctr_cpus_forbidden_mask.
  This is racy with sys_sched_setaffinity(), and possibly some
  of the kernel's internal set_cpus_allowed() calls, but the race
  is unlikely to occur in current 2.4 kernels.
- Cleaned up the parameter passing protocol between vperfctr_ioctl()
  and the individual vperfctr "system call" procedures.
- Added safety check in global.c to disallow global-mode perfctrs
  on asymmetric MPs until the API has been fixed.
- Added set_cpus_allowed() implementation for 2.4 kernels, except
  those that already have it as indicated by HAVE_SET_CPUS_ALLOWED:
  this symbol is added to <linux/config.h> by the kernel patch.
- 2.2 kernels can't enforce CPU affinity masks, so x86.c warns if
  a HT P4 MP runs a 2.2 kernel, and falls back to generic x86 mode.
  Added dummy set_cpus_allowed() macro for 2.2 kernels.
- x86_compat.h now implements cpuid_ebx() and cpu_has_ht for old kernels.
- Makefile cleanup: Rules.make is obsolete in 2.5.
- Compile fixes in x86.c and virtual_stub.c: <linux/fs.h> needs to
  be included explicitly for the 2.5.50 kernel.

Version 2.4.2, 2002-11-25
- Fixed virtual.c:inc_nrctrs() to handle the -EBUSY case correctly.
  If the HW was busy (e.g. global running), then the first attempt
  to open a vperfctr would fail but further attempts would succeed.
  Updated error propagation to distinguish -EBUSY from -ENOMEM.
- Updated global.c for preempt-safety.
- Made the driver safe for preemptible kernels. This required a lot
  of analysis, but resulted in relatively few actual code changes.
  (Backport from the perfctr-3.1 branch.)
- Ported to 2.5.48: Replaced MOD_INC_USE_COUNT by try_module_get()
  and MOD_DEC_USE_COUNT by module_put(). Updated compat.h.
- Ported to 2.5.45: added Kconfig, removed Config.help.

Version 2.4.1, 2002-10-12
- RedHat 8.0's 2.4.18-14 kernel does EXPORT_SYMBOL(cpu_khz) while
  the vanilla 2.4.18 does not. This clashes with x86_setup.c's
  EXPORT_SYMBOL(cpu_khz). I've found no easy way to distinguish
  between these kernels at C preprocessing time, so I changed
  x86_setup.c to define a trivial perfctr_cpu_khz() function and
  EXPORT_SYMBOL that one instead.

Version 2.4.0, 2002-09-26
- Config.help updated to state that Pentium 4 is supported.
- 2.5.32 moved ptrace_check_attach() declaration to <linux/ptrace.h>.
- Removed redundant /proc/<pid>/perfctr access control check
  from vperfctr_stub_open(). Since 2.4.0-pre1 this check didn't
  match the real one, which prevented remote opens when the
  driver was built as a module.

Version 2.4.0-pre2, 2002-08-27
- vperfctr_control() now allows the user to specify that some PMC
  sums are not to be cleared when updating the control.
  There is a new bitmap field `preserve' in struct vperfctr_control:
  if bit i is set then PMC(i)'s sum is not cleared.
  `preserve' is a simple `unsigned long' for now, since this type
  fits all currently known CPU types.
  This change breaks binary compatibility, but user-space code which
  clears the entire control record before filling in relevant fields
  will continue to work as before after a recompile.
  This feature removes a limitation which some people felt was a
  problem for some usage scenarios.
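
The `preserve' semantics can be sketched as follows. The function name
and the sums array are hypothetical; only the rule "bit i set means
PMC(i)'s sum is not cleared" comes from the entry above.

```c
#include <assert.h>
#include <limits.h>

/* Clear every sum whose bit is NOT set in the preserve bitmap. */
static void clear_sums(unsigned long long sum[], unsigned int nctrs,
                       unsigned long preserve)
{
    unsigned int i;

    assert(nctrs <= sizeof(preserve) * CHAR_BIT);
    for (i = 0; i < nctrs; i++)
        if (!(preserve & (1UL << i)))
            sum[i] = 0;
}
```

A control record that was zeroed before being filled in has preserve == 0,
so all sums are cleared, which matches the old behaviour.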

Version 2.4.0-pre1, 2002-08-12
- Initial implementation of a new remote-control API for virtual
  per-process perfctrs. A monitor process may access a target
  process' perfctrs via /proc/pid/perfctr and operations on that
  file, if the monitor holds the target under ptrace ATTACH control.
  Updated virtual.c to allow remote access.
  Updated x86.c:perfctr_cpu_ireload() to work also in the remote
  control case on SMP machines.

Version 2.3.12, 2002-08-12
- Trivial comment fixes in compat.h and x86_compat.h.
- Removed __vperfctr_sample(), vperfctr_stub.sample, and bug_sample()
  from UP builds, since they are needed only on SMP.

Version 2.3.11, 2002-07-21
- Accumulated sums are now maintained for interrupt-mode perfctrs.
  User-space can use the standard syscall-less algorithm for computing
  these counters' current sums, should that be needed.
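
One possible shape of a syscall-less read is sketched below. It assumes
a mmap()ed view with a start value, an accumulated sum, and a field that
changes whenever the kernel resumes the task. The struct, its field names
and the read_hw callback are hypothetical. The callback stands in for
RDPMC so that the sketch runs anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space view of one counter in the mmap()ed state. */
struct ctr_view {
    volatile uint32_t tsc_start;  /* changes when the task is resumed */
    volatile uint32_t start;      /* hardware count at the last resume */
    volatile uint64_t sum;        /* total accumulated up to last suspend */
};

/* Current sum = accumulated sum + (hardware count now - count at resume).
 * The loop retries if the task was rescheduled in the middle of the read. */
static uint64_t read_sum(const struct ctr_view *v, uint32_t (*read_hw)(void))
{
    uint32_t tsc_start, start, now;
    uint64_t sum;

    assert(v && read_hw);
    do {
        tsc_start = v->tsc_start;
        now = read_hw();
        start = v->start;
        sum = v->sum;
    } while (tsc_start != v->tsc_start);
    return sum + (uint32_t)(now - start);
}

/* Stand-in for the hardware counter read. */
static uint32_t fake_hw_value;
static uint32_t fake_read_hw(void)
{
    return fake_hw_value;
}
```

The 32-bit subtraction stays correct across a wrap of the hardware value,
provided fewer than 2^32 events occurred since the last resume.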

Version 2.3.10, 2002-07-19
- Added PERFCTR_X86_INTEL_P4M2 CPU type for Model 2 P4s, since
  they have ESCR Event Mask changes in a few events.
- The driver now supports replay tagging events on P4, using the
  pebs_enable and pebs_matrix_vert control fields added in 2.3.8.
- Some Pentium MMX and Pentium Pro processors have an erratum
  (Pentium erratum #74, Pentium Pro erratum 26) which causes SMM
  to shut down if CR4.PCE is set. intel_init() now clears the
  RDPMC feature on the affected steppings, to avoid the problem.
- perfctr_cpu_release() now clears the hardware registers and
  invalidates the per-cpu cache. This should allow the counter
  hardware to power down when not used, especially on P4.
- Callers of update_control() have no active i-mode counters.
  Documented this as a precondition, and changed update_control()
  to not call isuspend(). update_control() no longer needs hardware
  access, which should ease a port to CONFIG_PREEMPT=y.

Version 2.3.9, 2002-06-27
- Updated p4_escr_addr() in x86.c to match the latest revision of
  Intel's IA32 Volume 3 manual, #245472-007. An error in previous
  revisions of this document caused the driver to program the wrong
  ESCR in some cases. (CCCRs 12/13/16 with ESCR_SELECT(2) were mapped
  to SSU_ESCR0 instead of RAT_ESCR0, affecting the uop_type event.)

Version 2.3.8, 2002-06-26
- Added counter overflow interrupt support for Intel P4.
- 2.5.23 dropped smp_num_cpus and cpu_logical_map(). Added
  temporary workarounds to x86.c and global.c to allow compilation
  and testing under 2.5. May have to change the API (esp. global's)
  to be based on the sparse cpu_online_map instead.
- RedHat's 2.4.9-34 defines cpu_relax(). Updated compat.h.
- Added pebs_enable and pebs_matrix_vert fields (currently unused)
  to perfctr_cpu_control to support replay tagging events on P4.
  Updated the perfctr_cpu_state binary layout magic number.
- Silenced redefinition warnings for MSR_P6_PERFCTR0 and cpu_has_mmx.
- Updated Makefile for the 2.5.19 kernel's Makefile changes.
- Merged the P6 and K7 isuspend/iresume/write_control driver code.
- Added a VC3 specific clear_counters() procedure.
- Removed pointless code from perfctr_cpu_identify_overflow().
- Removed _vperfctr_get/set_thread() wrappers and thread->perfctr
  clobber checks from the DEBUG code. Removed unused "ibuf" and
  obsolete si_code fields from vperfctr state and control objects.
  Updated the vperfctr state magic number.
- Fixed the CONFIG_PREEMPT anti-dependency check in Config.in.
- vperfctr_control() now preserves the TSC sum on STOP;CONTROL
  transitions. The failure to do this caused problems for the
  PAPI P4 support being developed.

Version 2.3.7, 2002-04-14
- Kernel 2.5.8-pre3 changed the way APIC/SMP interrupt entries
  are defined. Defining these with asm() in C is no longer
  practical, so the kernel patch for 2.5.8-pre3 now defines
  the perfctr interrupt entry in arch/i386/kernel/entry.S.
- Permit use of cascading counters on P4: in the slave counter
  one sets the CASCADE flag instead of the ENABLE flag.
- Added P4 hyperthreading bit field definitions.
- Preliminary infrastructure to support a new remote-control
  interface via ptrace(). Updates to compat.h, virtual.c,
  virtual_stub.c, and x86_setup.c. ptrace_check_attach()
  emulation for older kernels is in x86_setup.c since
  virtual_stub.c isn't compiled if the driver isn't a module.

Version 2.3.6, 2002-03-21
- Rewrote sys_vperfctr_control() to do a proper suspend before
  updating the control, and to skip trying to preserve the TSC
  start value around the resume. This cleaned up the code and
  eliminated the bogus "BUG! resuming non-suspended perfctr"
  warnings that control calls to active perfctrs caused.
- Rewrote sys_vperfctr_iresume() to not preserve the TSC start
  value around the resume. Since we had just done a suspend(),
  this would cause double-accounting of the TSC.

Version 2.3.5, 2002-03-17
- Added detection of the VIA C3 Ezra-T processor.
- CPU detection now uses current_cpu_data instead of boot_cpu_data,
  to avoid the boot_cpu_data.x86_vendor bug which is present in
  all current 2.2/2.4/2.5 kernels. The bug caused the x86_vendor
  field to be cleared on SMP machines, which in turn tricked the
  driver to identify MP AMD K7 machines as MP Intel P6, with
  disastrous results when the wrong MSRs were programmed.
- Updated compat.h for /proc/<pid>/ inode change in 2.5.4.
- Added a check to prevent building on preemptible 2.4/2.5 kernels,
  since the driver isn't yet safe for those.
- Put perfctr's configuration help text in Config.help in this
  directory: kernel 2.5.3-pre5 changed from having a common
  Configure.help file to having local Config.help files.

Version 2.3.4, 2002-01-23
- Updated virtual.c for remap_page_range() change in 2.5.3-pre1.
  Added emulation for older kernels to compat.h.
- Permit use of tagging on P4 for at-retirement counting. This may
  not yet work as expected, since up-stream (tag producing) counters
  aren't disabled at context switches: a process may therefore see
  more tagged uops than expected.
- Fixed uses of __FUNCTION__ to comply with changes in GCC 3.0.3.

Version 2.3.3, 2001-12-31
- Minor x86.c cleanup: reordered function definitions so that
  write_control comes after isuspend/iresume: this makes it easier
  to follow the runtime control flow.
- Fixed isuspend()/iresume()'s broken cache checking protocol. The
  old protocol didn't handle process migration across CPUs in SMP
  machines correctly, as illustrated by the following scenario:
      P1 runs on CPU1 and suspends. P1 and CPU1 now have the same
  cache id (->k1.id). P1 is resumed and suspended on CPU2: the state
  in CPU1 is now stale. Then P1 is resumed on CPU1, and no other
  process has been using CPU1's performance counters since P1's last
  suspend on CPU1. The old protocol would see matching cache ids and
  that P1's i-mode EVNTSELs are stopped, so it would accept the cache
  and resume P1 with CPU1's stale PERFCTRS values.
      In the new protocol isuspend() records the active CPU in the
  state object, and iresume() checks if both the CPU and the control
  id match. The new protocol is also simpler since iresume() no longer
  checks if the i-mode EVNTSELs are cleared or not.
- P6 nasty i-mode to a-mode context switch bug fixed: p6_isuspend()
  used to simply clear EVNTSEL0's Enable flag in order to stop all
  i-mode counters. Unfortunately, that was insufficient as shown by
  the following case (which actually happened).
      P1 has EVNTSEL0 in a-mode and EVNTSEL1 in i-mode. P1 suspends:
  PERFCTR1 is stopped but EVNTSEL1 is still in i-mode. P2 has EVNTSEL0
  in a-mode and no EVNTSEL1. P2 resumes and updates EVNTSEL0. This
  activates not only P2's PERFCTR0 but also the dormant PERFCTR1. If
  PERFCTR1 overflows, then P2 will receive an unexpected interrupt. If
  PERFCTR1 doesn't overflow, but P2 suspends and P1 resumes, then P1
  will find that PERFCTR1 has a larger than expected value.
      p6_isuspend() and p6_iresume() were changed to ignore the global
  Enable flag and to disable/enable each i-mode EVNTSEL individually,
  just like how it's done on the K7.
- x86.c cleanups: P5MMX, MII, C6, VC3, P6, K7, and P4 now all
  use the same rdpmc_read_counters() method. VIA C3 now uses
  p6_write_control() instead of its own method.
- Removed "pmc_map[] must be identity" restriction from P6 and K7.
  The API uses the virtual counter index to distinguish a-mode
  and i-mode counters, but P6 events aren't entirely symmetric:
  this led to some strange cases with the old pmc_map[] rule.
      P6 and K7 isuspend() now need access to the control, so
  update_control() and its callers had to be changed to allow it
  to isuspend() _before_ the new control is installed.
- P4 write_control fixes: changed the ESCR cache to be indexed by
  MSR offset from 0x3A0, and changed P4 write_control to index the
  CCCR/ESCR cache with physical instead of virtual indices. Added
  call to debug_evntsel_cache(), after updating it for pmc_map[].
- Added P4 and Generic support to x86_tests.c, and some cleanups.
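
The isuspend()/iresume() cache checking fix above can be modelled in a
few lines. This is a simplified sketch with hypothetical names: the old
protocol also inspected the i-mode EVNTSELs, which is left out here, and
the real state lives in per-CPU and per-process driver structures.

```c
#include <assert.h>

#define NCPUS 2

/* Per-CPU cache: id of the control whose values sit in the hardware. */
struct cpu_cache {
    unsigned int id;
};

/* Per-process state (hypothetical fields). */
struct proc_state {
    unsigned int id;     /* control id */
    int isuspend_cpu;    /* CPU of the last isuspend(), -1 if none */
};

static struct cpu_cache cache[NCPUS];

/* Suspend on 'cpu': the CPU cache and the process now share an id,
 * and the process remembers where it was suspended. */
static void isuspend(struct proc_state *p, int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    cache[cpu].id = p->id;
    p->isuspend_cpu = cpu;
}

/* Old check: trust the hardware state whenever the ids match. */
static int cache_ok_old(const struct proc_state *p, int cpu)
{
    return cache[cpu].id == p->id;
}

/* New check: the ids must match AND the last suspend was on this CPU. */
static int cache_ok_new(const struct proc_state *p, int cpu)
{
    return cache[cpu].id == p->id && p->isuspend_cpu == cpu;
}
```

Replaying the migration scenario (suspend on one CPU, then on another,
then resume on the first) shows the old check accepting stale state and
the new check rejecting it.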

Version 2.3.2, 2001-11-19
- P4 fix: the mapping from CCCR 17 to its associated ESCRs was
  wrong due to an off-by-one error in x86.c:p4_escr_addr().
- P4 fix: also clear the PEBS MSRs when initialising the driver.
- Minor cleanup in x86.c: replaced the "clear MSRs" loops with
  calls to a helper procedure.

Version 2.3.1, 2001-11-06
- Microscopic P4 cleanups. Testing on my new P4 box has confirmed
  that the PMAVAIL flag in MSR_IA32_MISC_ENABLE is read-only.

Version 2.3, 2001-10-24
- Added support for multiple interrupt-mode virtual perfctrs
  with automatic restart. Added an identify_overflow() method
  to x86.c to identify and reset the overflowed counters.
  Added checks to ensure that the user-specified restart values
  for interrupt-mode counters are negative.
  Updated virtual.c's signal delivery interface to pass a
  bitmask describing which counters overflowed; the siginfo
  si_code is now fixed as SI_PMC_OVF (fault-class).
- Fixed some typos in x86.c. Added a note about the C3 Ezra.
- Added EXPORT_NO_SYMBOLS to init.c, for compatibility with
  announced changes in modutils 2.5.

Version 2.2, 2001-10-09
- Added preliminary support for the Pentium 4. Only basic stuff
  for now: no cascading counters, overflow interrupts, tagged
  micro-ops, or use of DS/PEBS. The code compiles but hasn't been
  tested on an actual Pentium 4.

Version 2.1.4, 2001-09-30
- No driver-level changes.

Version 2.1.3, 2001-09-13
- Fixed a compilation problem where virtual_stub couldn't be compiled
  in modular kernels older than 2.2.20pre10 if KMOD was disabled, due
  to an incompatible stub definition of request_module().
- Replaced most occurrences of "VIA Cyrix III / C3" with "VIA C3".

Version 2.1.2, 2001-09-05
- Added MODULE_LICENSE() tag, for compatibility with the tainted/
  non-tainted kernel stuff being put into 2.4.9-ac and modutils.
- VIA C3 support is not "preliminary" any more. Testing has revealed
  that the reserved bits in the C3's EVNTSEL1 have no function and
  need not be preserved. The driver now fills these bits with zeroes.
  (Thanks to Dave Jones @ SuSE for running these tests.)
- Minor bug fix in the perfctr interrupt assembly code.
  (Inherited from the 2.4 kernel. Fixed in 2.4.9-ac4.)

Version 2.1.1, 2001-08-28
- Preliminary recognition of Pentium 4 processors, including
  checking the IA32_MISC_ENABLE MSR.
- Moved %cr4 access functions from <asm-i386/perfctr.h> to
  x86_compat.h, to work around changes in 2.4.9-ac3.
- More %cr4 cleanups possible since the removal of dodgy_tsc()
  in Version 2.1: moved {set,clear}_in_cr4_local() into x86.c,
  and eliminated the set_in_cr4() compat macro.
- Fixed a bug in x86.c:finalise_backpatching(): the fake cstatus
  mustn't include i-mode counters unless we have PCINT support.
  Failure to check this caused fatal init-time oopses in some
  configs (CONFIG_X86_UP_APIC set but no local APIC in the CPU).
- Minor comment updates in x86.c due to AMD #22007 Revision J.
- Removed '%' before 'cr4' in printouts from x86_tests.c, to
  avoid the '%' being mutated by log-reading user-space code.

Version 2.1, 2001-08-19
- Fixed a call backpatching bug, caused by an incompatibility
  between the 2.4 and 2.2 kernels' xchg() macros. The 2.2 version
  lacks a "volatile" causing gcc to remove the entire statement
  if xchg() is used for side-effect only. Reverted to a plain
  assignment, which is safe since the 2.0.1 backpatching changes.
- Fixed a bug where an attempt to use /proc/<pid>/perfctr on an
  unsupported processor would cause a (well-behaved) kernel oops,
  due to calling a NULL function pointer in x86.c. vperfctr_open()
  now returns -ENODEV if virtual.c hasn't been initialised.
- Removed the WinChip configuration option, the dodgy_tsc() callback,
  and the clr_cap_tsc() x86_compat macro. WinChip users should configure
  for generic 586 or less and use the kernel's "notsc" boot parameter.
  This cleans up the driver and the 2.4 kernel patches, at the expense
  of more code in the 2.2 kernel patches to implement "notsc" support.
- Minor cleanup: moved version number definition from init.c to
  a separate file, version.h.

Version 2.0.1, 2001-08-14
- The unsynchronised backpatching in x86.c didn't work on SMP,
  due to Pentium III erratum E49, and similar errata for other
  P6 processors. (The change in 2.0-pre6 was insufficient.)
  x86.c now finalises the backpatching at driver init time,
  by "priming" the relevant code paths. To make this feasible,
  the isuspend() and iresume() methods are now merged into
  the other high-level methods; virtual.c became a bit cleaner.
- Removed obsolete "WinChip pmc_map[] must be identity" check.

Version 2.0, 2001-08-08
- Resurrected partial support for interrupt-mode virtual perfctrs.
  virtual.c permits a single i-mode perfctr, in addition to TSC
  and a number of a-mode perfctrs. BUG: The i-mode PMC must be last,
  which constrains CPUs like the P6 where we currently restrict
  the pmc_map[] to be the identity mapping. (Not a problem for
  K7 since it is symmetric, or P4 since it is expected to use a
  non-identity pmc_map[].)
  New perfctr_cpu_ireload() procedure to force reload of i-mode
  PMCs from their start values before resuming. Currently, this
  just invalidates the CPU cache, which forces the following
  iresume() and resume() to do the right thing.
  perfctr_cpu_update_control() now calls setup_imode_start_values()
  to "prime" i-mode PMCs from the control.ireset[] array.
- Bug fix in perfctr_cpu_update_control(): start by clearing cstatus.
  Prevents a failed attempt to update the control from leaving the
  object in a state with old cstatus != 0 but new control.

Version 2.0-pre7, 2001-08-07
- Cleaned up the driver's debugging code (virtual, x86).
- Internal driver rearrangements. The low-level driver (x86) now handles
  sampling/suspending/resuming counters. Merged counter state (sums and
  start values) and CPU control data to a single "CPU state" object.
  This simplifies the high-level drivers, and permits some optimisations
  in the low-level driver by avoiding the need to buffer tsc/pmc samples
  in memory before updating the accumulated sums (not yet implemented).
- Removed the read_counters, write_control, disable_rdpmc, and enable_rdpmc
  methods from <asm/perfctr.h>, since they have been obsoleted by the
  new suspend/resume/sample methods.
- Rearranged the 'cstatus' encoding slightly by putting 'nractrs' in
  the low 7 bits; this was done because 'nractrs' is retrieved more
  often than 'nrctrs'.
- Removed the obsolete 'status' field from vperfctr_state. Exported
  'cstatus' and its access methods to user-space. (Remove the
  control.tsc_on/nractrs/nrictrs fields entirely?)
- Removed WinChip "fake TSC" support. The user-space library can now
  sample with slightly less overhead on sane processors.
- WinChip and VIA C3 now use p5mmx_read_counters() instead of their
  own versions.
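
  A minimal sketch of how a packed 'cstatus' word could be laid out,
  with 'nractrs' in the low 7 bits as described above. The positions
  of the other fields and all function names are assumptions made for
  illustration; they are not the driver's actual definitions.

```c
#include <stdint.h>

/* Hypothetical layout: bit 31 = tsc_on, bits 8..14 = nrctrs (a-mode
 * plus i-mode counters), bits 0..6 = nractrs. Only "nractrs in the
 * low 7 bits" comes from the release notes; the rest is assumed. */
static inline uint32_t mk_cstatus(unsigned tsc_on, unsigned nractrs,
                                  unsigned nrictrs)
{
    return ((uint32_t)(tsc_on & 1) << 31)
         | ((uint32_t)((nractrs + nrictrs) & 0x7f) << 8)
         | (uint32_t)(nractrs & 0x7f);
}

/* The most frequently used field needs only a mask, no shift. */
static inline unsigned cstatus_nractrs(uint32_t cstatus)
{
    return cstatus & 0x7f;
}

static inline unsigned cstatus_nrctrs(uint32_t cstatus)
{
    return (cstatus >> 8) & 0x7f;
}

static inline unsigned cstatus_has_tsc(uint32_t cstatus)
{
    return cstatus >> 31;
}

/* A single compare against zero tells whether anything is enabled. */
static inline int cstatus_enabled(uint32_t cstatus)
{
    return cstatus != 0;
}
```

  With this layout, testing whether any counter is active is one
  comparison, which is the kind of saving a single field offers over
  three separate tsc_on/nractrs/nrictrs fields.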

Version 2.0-pre6, 2001-07-27
- New patches for kernels 2.4.6, 2.4.7, and 2.4.7-ac1.
- Sampling bug fix for SMP. Normally processes are suspended and
  resumed many times per second, but on SMP machines it is possible
  for a process to run for a long time without being suspended.
  Since sampling is performed at the suspend and resume actions,
  a performance counter may wrap around more than once between
  sampling points. When this occurs, the accumulated counts will
  be highly variable and much lower than expected.
  A software timer is now used to ensure that sampling deadlines
  aren't missed on SMP machines. (The timer is run by the same code
  which runs the ITIMER_VIRTUAL interval timer.)
- Bug fix in the x86 "redirect call" backpatching routine. To be
  SMP safe, a bus-locked write to the code must be used.
- Bug fix in the internal debugging code (CONFIG_PERFCTR_DEBUG).
  The "shadow" data structure used to detect if a process' perfctr
  pointer has been clobbered could cause lockups with SMP kernels.
  Rewrote the code to be simpler and more robust.
- Minor performance tweak for the P5/P5MMX read counters procedures,
  to work around the P5's cache which doesn't allocate a cache line
  on a write miss.
- To avoid undetected data layout mismatches, the user-space library
  now checks the data layout version field in a virtual perfctr when
  it is being mmap:ed into the user's address space.
- A few minor cleanups.

Version 2.0-pre5, 2001-06-11
- Internally use a single 'cstatus' field instead of the three
  tsc_on/nractrs/nrictrs fields. Should reduce overhead slightly.
- Reorder the fields in cpu_control so that 'cstatus' and other
  frequently used fields get small offsets -- avoids some disp32
  addressing modes in timing-critical code.
- Fixed a bug in p6_iresume where it forgot to invalidate the
  EVNTSEL cache, causing p6_write_control to fail to reload the
  MSRs. (K7 had a similar bug.) Since i-mode support is disabled
  at the moment, no-one was actually bitten by this.
- Fixed another iresume/write_control cache invalidation bug where a
  switch to an "uninitialised" CPU would fail to initialise the MSRs.
- Added a CONFIG_PERFCTR_DEBUG option to enable internal consistency
  checks. Currently, this checks that a task's vperfctr pointer
  isn't clobbered behind our backs, that resume and suspend for
  a vperfctr are performed on the same CPU, and that the EVNTSEL
  cache is semi-consistent when reloading is optimised away.
  ("semi" because it only checks that the cache agrees with the
  user's control data, and not that the cache agrees with the MSRs.)
- Minor cleanups.

Version 2.0-pre4, 2001-04-30
- Cleanups in x86.c. #defines introduced for magic constants.
  More sharing of procedures between different CPU drivers.
  Fixed a bug where k7_iresume() could cause k7_write_control()
  to fail to reload the correct EVNTSELs.
  The WinChip C6/2/3 driver now "fakes" an incrementing TSC.
- General cleanups: s/__inline__/inline/ following Linux kernel
  coding standards, and renamed the low-level control objects to
  cpu_control to distinguish them from {v,g}perfctr_control objects.
- O_CREAT is now interpreted when /proc/self/perfctr is opened:
  if the vperfctr does not exist, then it is created; if the
  vperfctr does exist, then EEXIST is returned (unfortunately
  O_EXCL doesn't work, since it's intercepted by the VFS layer).
  "perfex -i" uses this to avoid having to create a vperfctr when
  only an INFO command is to be issued.
  libperfctr.c:vperfctr_open() uses this to decide whether to
  UNLINK the newly opened vperfctr in case of errors or not.
- Cleaned up virtual.c's 2.4/2.2 VFS interface code a little,
  and eliminated the OWNER_THIS_MODULE compat macro.
- Added MOD_{INC,DEC}_USE_COUNTs to virtual.c's file_operations
  open and release procedures for 2.2 kernels. This should
  simulate 2.4's fops_get/put at ->open() and ->release().

Version 2.0-pre3, 2001-04-17
- Interrupt-mode virtual perfctrs are temporarily disabled since
  x86.c doesn't yet detect which PMC overflowed. The old API
  could be made to work, but it was broken anyway.
- Integrated the new P4-ready data structures and APIs.
  The driver compiles but the user-space stuff hasn't been
  updated yet, so there may be some remaining bugs.

  I have not yet committed to all details of this API. Some
  things, like accumulating counters in virtual.c and global.c,
  are uglier now, and going from a single "status == nrctrs"
  field to three separate fields (tsc_on, nrctrs, nrictrs)
  cannot be good for performance.

  In the new API the control information is split in separate
  arrays depending on their use, i.e. a struct-of-arrays layout
  instead of an array-of-struct layout. The advantage of the
  struct-of-arrays layout is that it should cause fewer cache
  lines to be touched at the performance-critical operations.
  The disadvantage is that the layout changes whenever the
  number of array elements has to be increased -- as is the
  case for the future Pentium 4 support (18 counters).
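
  To illustrate the cache-line argument above, the sketch below
  compares the two layouts. The field set (evntsel, pmc_map, ireset),
  the struct names, and the 32-byte line size are assumptions chosen
  for the example, not the driver's real data structures.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CTRS    18   /* counter count mentioned for the Pentium 4 */
#define LINE_BYTES 32   /* assumed cache line size */

/* Array-of-structs: each counter's control fields sit together. */
struct ctr_aos { uint32_t evntsel; uint32_t pmc_map; int32_t ireset; };
struct control_aos { struct ctr_aos ctr[NR_CTRS]; };

/* Struct-of-arrays: one array per kind of control field. */
struct control_soa {
    uint32_t evntsel[NR_CTRS];
    uint32_t pmc_map[NR_CTRS];
    int32_t  ireset[NR_CTRS];
};

/* Count the distinct cache lines covered by n 4-byte fields that
 * start at byte offset 'first' and lie 'stride' bytes apart, assuming
 * the enclosing object is cache-line aligned. Offsets ascend, so a
 * new line is touched only when the line index changes. */
static unsigned lines_touched(size_t first, size_t stride, unsigned n)
{
    unsigned count = 0;
    size_t prev_line = (size_t)-1;

    for (unsigned i = 0; i < n; i++) {
        size_t line = (first + i * stride) / LINE_BYTES;
        if (line != prev_line) {
            count++;
            prev_line = line;
        }
    }
    return count;
}

/* Lines touched when reading the first n evntsel values. */
static unsigned evntsel_lines_aos(unsigned n)
{
    return lines_touched(offsetof(struct ctr_aos, evntsel),
                         sizeof(struct ctr_aos), n);
}

static unsigned evntsel_lines_soa(unsigned n)
{
    return lines_touched(offsetof(struct control_soa, evntsel),
                         sizeof(uint32_t), n);
}
```

  Under these assumptions, walking all 18 evntsel values touches 7
  lines in the array-of-structs layout and 3 in the struct-of-arrays
  layout. The cost is that changing NR_CTRS moves every later array.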

Version 2.0-pre2, 2001-04-07
- Removed automatic inheritance of per-process virtual perfctrs
  across fork(). Unless wait4() is modified, it's difficult to
  communicate the final values back to the parent: the now
  abandoned code did this in a way which made it impossible
  to distinguish one child's final counts from another's.
  Inheritance can be implemented in user-space anyway, so the
  loss is not great. The interface between the driver and the rest
  of the kernel is now smaller and simpler than before.
- Simulating cpu_khz by a macro in very old kernels broke since
  there's also a struct field with that name :-( Instead of
  putting the ugly workaround back in, I decided to drop support
  for kernels older than 2.2.16.
- Preliminary support for the VIA C3 processor -- the C3 is
  apparently a faster version of the VIA Cyrix III.
- Added rdtsc cost deduction to the init tests code, and changed
  it to output per-instruction costs as well.
- More cleanups, making 2.2 compatibility crud less visible.

Version 2.0-pre1, 2001-03-25
- First round of API and coding changes/cleanups for version 2.0:
  made perfctr_info.version a string, moved some perfctr_info inits
  to x86.c and eliminated some redundant variables, removed dead VFS
  code from virtual.c, removed obsolete K7 tests from x86_tests.c,
  removed mmu_cr4_features wrappers from x86_compat.h, minor cleanup
  in virtual_stub.c.
- Fixed an include file problem which made some C compilers (not gcc)
  fail when compiling user-space applications using the driver.
- Added missing EXPORT_SYMBOL declarations needed by the UP-APIC PM
  code when the driver is built as a module.
- Preliminary changes in x86.c to deal with UP-APIC power management
  issues in 2.4-ac kernels. The PM callback is only a stub for now.

Version 1.9, 2001-02-13
- Fixed compilation problems for 2.2 and SMP kernels.
- Found updated documentation on "VIA Cyrix III". Apparently, there
  are two distinct chips: the older Joshua (a Cyrix design) and the
  newer Samuel (a Centaur design). Our current code supported Joshua,
  but mistook Samuel for Joshua. Corrected the identification of Samuel
  and added explicit support for it. Samuel's EVNTSEL1 is not well-
  documented, so there are some new Samuel-specific tests in x86_tests.c.
- Added preliminary interrupt-mode support for AMD K7.
- Small tweaks to virtual.c's interrupt handling.

Version 1.8, 2001-01-23
- Added preliminary interrupt-mode support to virtual perfctrs.
  Currently for P6 only, and the local APIC must have been enabled.
  Tested on 2.4.0-ac10 with CONFIG_X86_UP_APIC=y.
  When an i-mode vperfctr interrupts on overflow, the counters are
  suspended and a user-specified signal is sent to the process. The
  user's signal handler can read the trap pc from the mmap:ed vperfctr,
  and should then issue an IRESUME ioctl to restart the counters.
  The next version will support buffering and automatic restart.
- Some cleanups in the x86.c init and exit code. Removed the implicit
  smp_call_function() calls from x86_compat.h.

Version 1.7, 2001-01-01
- Updated Makefile for 2.4.0-test13-pre3 Rules.make changes.
- Removed PERFCTR_ATTACH ioctl from /dev/perfctr, making the
  vperfctrs only accessible via /proc/self/perfctr. Removed
  the "attach" code from virtual.c, and temporarily commented
  out the "vperfctr fs" code. Moved /dev/perfctr initialisation
  and implementation from init.c to global.c.
- Eliminated CONFIG_VPERFCTR_PROC, making /proc/pid/perfctr
  mandatory if CONFIG_PERFCTR_VIRTUAL is set.
- Some 2.2/2.4 compatibility cleanups.
- VIA Cyrix III detection bug fix. Contrary to VIA's documentation,
  the Cyrix III vendor field is Centaur, not Cyrix.

Version 1.6, 2000-11-21
- Preliminary implementation of /proc/pid/perfctr. Seems to work,
  but virtual.c and virtual_stub.c are again filled with
  #if LINUX_VERSION_CODE crap which will need to be cleaned up.
  The INFO ioctl is now implemented by vperfctrs too, to avoid the
  need for opening /dev/perfctr.
- virtual.c now puts the perfctr pointer in filp->private_data
  instead of inode->u.generic_ip. The main reason for this change
  is that proc-fs places a dentry pointer in inode->u.generic_ip.
- sys_vperfctr_control() no longer resets the virtual TSC
  if it already is active. The virtual TSC therefore runs
  continuously from its first activation until the process
  stops or unlinks its vperfctrs.
- Updates for 2.4.0-test11pre6. Use 2.4-style cpu_has_XXX
  feature testing macros. Updated x86_compat.h to implement
  missing cpu_has_mmx and cpu_has_msr, and compatibility
  macros for 2.2. Changed vperfctr_fs_read_super() to use
  new_inode(sb) instead of get_empty_inode() + some init code.
- Updates for 2.4.0-test9. Fixed x86_compat.h for cpu_khz change.
  Since drivers/Makefile was converted to the new list style,
  it became more difficult to handle CONFIG_PERFCTR=m. Changed
  Config.in to set CONFIG_KPERFCTR=y when CONFIG_PERFCTR != n,
  resulting in a much cleaner kernel patch for 2.4.0-test9.
- Removed d_alloc_root wrapper since 2.2 doesn't need it any more.
- When building for 2.2.18pre, use some of its 2.4 compatibility
  features (module_init, module_exit and DECLARE_MUTEX).
- Updates for 2.4.0-test8: repaired kernel patch for new parameter
  in do_fork, and fixed CLONE_PERFCTR conflict with CLONE_THREAD.

Version 1.5, 2000-09-03
- Dropped support for intermediate 2.3 and early 2.4.0-test kernels.
  The code now supports kernels 2.2.xx and 2.4.0-test7 or later only.
  Cleanups in compat.h and virtual.c.
- Rewrote the Makefile to use object file lists instead of conditionals.
  This gets slightly hairy since kernel extensions are needed even
  when the driver proper is built as a module.
- Removed the definition of CONFIG_PERFCTR_X86 from Config.in.
  Use the 2.4 standard CONFIG_X86 instead. The 2.2.xx kernel
  patches now define CONFIG_X86 in arch/i386/config.in.
- Cleaned up the vperfctr inheritance filter. Instead of setting
  a disable flag (CLONE_KTHREAD) when kernel-internal threads are
  created, I now set CLONE_PERFCTR in sys_fork and sys_vfork.
- /dev/perfctr no longer accepts the SAMPLE and UNLINK ioctls.
  All operations pertaining to a process' virtual perfctrs must
  be applied to the fd returned from the ATTACH ioctl.
- Removed the remote-control features from the virtual perfctrs.
  Significant simplifications in virtual.c. Removed some now
  unused stuff from compat.h and virtual_stub.c.

Version 1.4, 2000-08-11
- Fixed a memory leak bug in virtual.c. An extraneous dget() in
  get_vperfctr_filp() prevented reclaiming the dentry and inode
  allocated for a vperfctr file.
- Major changes to the VFS interface in virtual.c. Starting with
  2.4.0-test6, inode->i_sb == NULL no longer works. Added code to
  register a "vperfctr" fs and define a superblock and a mount point.
  Completely rewrote the dentry init code. Most of the new code is
  adapted from fs/pipe.c, with simplifications and macros to continue
  supporting 2.2.x kernels. `ls -l /proc/*/fd/' now prints recognizable
  names for vperfctr files.
- Cleaned up virtual.c slightly. Removed "#if 1" tests around the
  vperfctr inheritance code. Rewrote vperfctr_alloc and vperfctr_free
  to use the virt_to_page and {Set,Clear}PageReserved macros;
  also updated compat.h to provide these for older kernels.
- Updated for 2.4.0-test3: a dummy `open' file operation is no longer
  required by drivers/char/misc.c.
- Updated for `owner' field in file_operations added in 2.4.0-test2.
  Removed MOD_{INC,DEC}_USE_COUNT from init.c (except when compiling
  for 2.2.x) and virtual.c. Added MOD_{INC,DEC}_USE_COUNT to the
  reserve/release functions in x86.c -- needed because the driver
  may be active even if no open file refers to it. Using can_unload
  in the module struct instead is possible but not as tidy.

Version 1.3, 2000-06-29
- Implemented inheritance for virtual perfctrs: fork() copies the
  evntsel data to the child, exit() stops the child's counters but
  does not detach the vperfctr object, and wait() adds the child's
  counters to the parent's `children' counters.
  Added a CLONE_KTHREAD flag to prevent inheritance to threads
  created implicitly by request_module() and kernel_thread().
- Fixed a half-broken printk() in x86_tests.c.
- Added checks to virtual.c to prevent the remote-control interface
  from trying to activate dead vperfctrs.
- Updated vperfctr_attach() for changes in 2.3.99-pre7 and 2.4.0-test2.
- Fixed a problem introduced in 1.2 which caused linker errors if
  CONFIG_PERFCTR=m and CONFIG_PERFCTR_INIT_TESTS=y.
- Export CPU kHz via a new field in PERFCTR_INFO ioctl, to enable
  user-space to map accumulated TSC counts to actual time.
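
  A minimal user-space sketch of the TSC-to-time mapping that the
  exported CPU kHz value enables. The function name is hypothetical;
  how the kHz value is obtained from PERFCTR_INFO is not shown.

```c
#include <stdint.h>

/* Convert an accumulated TSC count to microseconds, given the CPU
 * clock in kHz. cycles / (kHz * 1000) is seconds, so
 * cycles * 1000 / kHz is microseconds. The multiplication overflows
 * for counts above roughly 1.8e16; split it if that matters. */
static uint64_t tsc_to_usecs(uint64_t tsc, uint32_t cpu_khz)
{
    if (cpu_khz == 0)
        return 0;   /* clock unknown: no meaningful conversion */
    return tsc * 1000u / cpu_khz;
}
```

  For example, 500000 cycles on a 500 MHz CPU (cpu_khz == 500000)
  correspond to 1000 microseconds.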

Version 1.2, 2000-05-24
- Added support for generic x86 processors with a time-stamp counter
  but no performance-monitoring counters. By using the driver to
  virtualise the TSC, accurate cycle-count measurements are now
  possible on PMC-less processors like the AMD K6.
- Removed some of the special-casing of the x86 time-stamp counter.
  It's now "just another counter", except that no evntsel is
  needed to enable it.
- WinChip bug fix: the "fake TSC" code would increment an
  uninitialised counter.
- Reorganised the x86 driver. Moved the optional init-time testing
  code to a separate source file.
- Miscellaneous code cleanups and naming convention changes.

Version 1.1, 2000-05-13
- vperfctr_attach() now accepts pid 0 as an alias for the current
  process. This reduces the number of getpid() calls needed in
  the user-space library. (Suggested by Ulrich Drepper.)
- Added support for the VIA Cyrix III processor.
- Tuned the x86 driver interface. Replaced function pointers
  with stubs which rewrite callers to invoke the correct callees.
- Added ARRAY_SIZE definition to compat.h for 2.2.x builds.
- Updated for 2.3.48 inode changes.
- Moved code closer to 2.3.x coding standards. Removed init_module
  and cleanup_module, added __exit, module_init, and module_exit,
  and extended "compat.h" accordingly. Cleaned up <linux/perfctr.h>
  and <asm-i386/perfctr.h> a little.

Version 1.0, 2000-01-31
- Prepared the driver to cope with non-x86 architectures:
  - Moved generic parts of <asm-i386/perfctr.h> to <linux/perfctr.h>.
  - Merged driver's private "x86.h" into <asm-i386/perfctr.h>.
  - Config.in now defines CONFIG_PERFCTR_${ARCH}, and Makefile uses
    it to select appropriate arch-dependent object files.
- The driver now reads the low 32 bits of the counters,
  instead of 40 or 48 bits zero-extended to 64 bits.
  Sums are still 64 bits. This was done to reduce the number
  of cache lines needed for certain data structures, to
  simplify and improve the performance of the sampling
  procedures, and to change 64+(64-64) arithmetic to 64+(32-32)
  for the benefit of gcc on x86. This change doesn't reduce
  precision, as long as no event occurs more than 2^32 times
  between two sampling points.
- PERFCTR_GLOBAL_READ now forces all CPUs to be sampled, if the
  sampling timer isn't running.
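
  The 64+(32-32) arithmetic described above can be sketched as
  follows. The function name is hypothetical; the point is only that
  unsigned 32-bit subtraction absorbs a single counter wrap-around.

```c
#include <stdint.h>

/* Add the events seen since 'start' to a 64-bit sum, using only the
 * low 32 bits of the hardware counter. Unsigned subtraction is
 * modulo 2^32, so one wrap-around between the two samples needs no
 * special case. If 2^32 or more events occur between the samples,
 * the excess multiples of 2^32 are silently lost. */
static uint64_t accumulate(uint64_t sum, uint32_t start, uint32_t now)
{
    return sum + (uint32_t)(now - start);
}
```

  A counter that goes from 0xFFFFFFF0 to 0x00000010 contributes 0x20
  events, as expected. A counter that advances by 2^32 + 5 events
  between samples contributes only 5, which is why sampling points
  must not be too far apart.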

Version 0.11, 2000-01-30
- Added a missing EXPORT_SYMBOL which prevented the driver
  from being built as a module in SMP kernels.
- Support for the CPU sampling instructions (i.e. RDPMC and
  RDTSC on x86) is now announced explicitly by PERFCTR_INFO.
- The x86 hardware driver now keeps CR4.PCE globally enabled.
  There are two reasons for this. First, the cost of toggling
  this flag at process suspend/resume is high. Second, changes
  in kernel 2.3.40 imply that any processor's %cr4 may be updated
  asynchronously from the global variable mmu_cr4_features.

Version 0.10, 2000-01-23
- Added support for global-mode perfctrs (global.c).
- There is now a config option controlling whether to
  perform init-time hardware tests or not.
- Added a hardware reserve/release mechanism so that multiple
  high-level services don't simultaneously use the hardware.
- The driver is now officially device <char,major 10,minor 182>.
- Tuned the 64-bit tsc/msr/pmc read operations in x86.c.
- Support for virtual perfctrs can now be enabled or disabled
  via CONFIG_PERFCTR_VIRTUAL.
- Added support for the WinChip 3 processor.
- Split the code into several files: x86.c (x86 drivers),
  virtual.c (virtualised perfctrs), setup.c (boot-time actions),
  init.c (driver top-level and init code).

Version 0.9, 2000-01-02
- The driver can now be built as a module.
- Dropped sys_perfctr() system call and went back to using a
  /dev/perfctr character device. Generic operations are now
  ioctl commands on /dev/perfctr, and control operations on
  virtual perfctrs are ioctl commands on their file descriptors.
  Initially this change was done because new system calls in 2.3.x
  made maintenance and binary compatibility with 2.2.x hard, but
  the new API is actually cleaner than the previous system call.
- Moved this code from arch/i386/kernel/ to drivers/perfctr/.

Version 0.8, 1999-11-14
- Made the process management callback functions inline to
  reduce scheduling overhead for processes not using perfctrs.
- Changed the 'status' field to contain the number of active
  counters. Changed read_counters, write_control, and accumulate
  to use this information to avoid unnecessary work.
- Fixed a bug in k7_check_control() which caused it to
  require all four counters to be enabled.
- Fixed sys_perfctr() to return -ENODEV instead of -ENOSYS
  if the processor doesn't support perfctrs.
- Some code cleanups.
- Evntsel MSRs are updated lazily, and counters are not written to.

  The following table lists the costs (in cycles) of various
  instructions which access the counter or evntsel registers.
  The table was derived from data collected by init-time tests
  run by previous versions of this driver.

  Processor		P5	P5MMX	PII	PIII	K7
  Clock freq. (MHz)	133	233	266	450	500

  RDPMC			n/a	14	31	36	13
  RDMSR (counter)	29	28	81	80	52
  WRMSR (counter)	35	37	97	115	80
  WRMSR (evntsel)	33	37	88	105	232

  Several things are apparent from this table:

  1. It's much cheaper to use RDPMC than RDMSR to read the counters.
  2. It's much more expensive to reset a counter than to read it.
  3. It's expensive to write to an evntsel register.

  As of version 0.8, this driver uses the following strategies:
  * The evntsel registers are updated lazily. A per_cpu_control[]
    array caches the contents of each CPU's evntsel registers,
    and only when a process requires a different setup are the
    evntsel registers written to. In most cases, this eliminates the
    need to reprogram the evntsel registers when switching processes.
    The older drivers would write to the evntsel registers both at
    process suspend and resume.
  * The counter registers are read both at process resume and suspend,
    and the difference is added to the process' accumulated counters.
    The older drivers would reset the counters at resume, read them
    at suspend, and add the values read to the accumulated counters.
  * Only those registers enabled by the user's control information
    are manipulated, instead of blindly manipulating all of them.
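
  The strategies above can be sketched as a small model. The arrays
  standing in for the MSRs, the WRMSR counter, and all names below
  are hypothetical; the real driver operates on hardware registers
  and keeps one cache per CPU.

```c
#include <stdint.h>

#define NR_PMCS 2   /* assumed counter count, as on P6 */

/* Stand-ins for the hardware, plus a count of the expensive
 * evntsel writes performed. */
static uint32_t hw_evntsel[NR_PMCS];
static uint32_t hw_pmc[NR_PMCS];
static unsigned nr_wrmsr;

/* Cache of what the evntsel registers currently hold. */
static uint32_t evntsel_cache[NR_PMCS];

struct task_ctrs {
    unsigned nractrs;            /* counters enabled by the user */
    uint32_t evntsel[NR_PMCS];   /* user's control data */
    uint32_t start[NR_PMCS];     /* counter values at resume */
    uint64_t sum[NR_PMCS];       /* accumulated counts */
};

/* Resume: write only the evntsels that differ from the cache, then
 * record the current counter values. Counters are never written. */
static void ctrs_resume(struct task_ctrs *t)
{
    for (unsigned i = 0; i < t->nractrs; i++) {
        if (evntsel_cache[i] != t->evntsel[i]) {
            hw_evntsel[i] = t->evntsel[i];   /* models WRMSR */
            evntsel_cache[i] = t->evntsel[i];
            nr_wrmsr++;
        }
        t->start[i] = hw_pmc[i];             /* models RDPMC */
    }
}

/* Suspend: read the counters again and add the difference. */
static void ctrs_suspend(struct task_ctrs *t)
{
    for (unsigned i = 0; i < t->nractrs; i++)
        t->sum[i] += (uint32_t)(hw_pmc[i] - t->start[i]);
}
```

  In this model, a task that is resumed repeatedly with unchanged
  control data pays for the evntsel writes only the first time, and
  only the first 'nractrs' registers are ever touched.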

Version 0.7, 1999-10-25
- The init-time checks in version 0.6 of this driver showed that
  RDMSR is a lot slower than RDPMC for reading the PMCs. The driver
  now uses RDPMC instead of RDMSR whenever possible.
- Added an mmap() operation to perfctr files. This allows any client
  to read the accumulated counter state without making a system call.
  The old "sync to user-provided buffer" method has been removed,
  as it entailed additional copy operations and only worked for the
  "active" process. The PERFCTR_READ operation has been replaced
  by a simpler PERFCTR_SAMPLE operation, for the benefit of pre-MMX
  Intel P5 processors which cannot sample counters in user-mode.
  This rewrite actually simplified the code.
- The AMD K7 should now be supported correctly. The init-time checks
  in version 0.6 of this driver revealed that each K7 counter has
  its own ENable bit. (Thanks to Nathan Slingerland for running the
  test and reporting the results to me.)
- Plugged a potential memory leak in perfctr_attach_task().
- No longer piggyback on prctl(); sys_perfctr() is a real system call.
- Some code cleanups.

Version 0.6, 1999-09-08
- Temporarily added some init-time code that checks the
  costs of RDPMC/RDMSR/WRMSR operations applied to perfctr MSRs,
  the semantics of the ENable bit on the Athlon, and gets
  the boot-time value of the WinChip CESR register.
  This code can be turned off by #defining INIT_DEBUG to 0.
- Preliminary support for the AMD K7 Athlon processor.
- The code will now build in both 2.3.x and 2.2.x kernels.

Version 0.5, 1999-08-29
- The user-space buffer is updated whenever state.status changes,
  even when a remote command triggers the change.
- Reworked and simplified the high-level code. All accesses
  now require an attached file in order to implement proper
  accounting and synchronisation. The only exception is UNLINK:
  a process may always UNLINK its own PMCs.
- Fixed counting bug in sys_perfctr_read().
- Improved support for the Intel Pentium III.
- Another WinChip fix: fake TSC update at process resume.
- The code should now be safe for 'gcc -fstrict-aliasing'.

Version 0.4, 1999-07-31
- Implemented PERFCTR_ATTACH and PERFCTR_{READ,CONTROL,STOP,UNLINK}
  on attached perfctrs. An attached perfctr is represented as a file.
- Fixed an error in the WinChip-specific code.
- Perfctrs now survive exec().

Version 0.3, 1999-07-22
- Interface now via sys_prctl() instead of /dev/perfctr.
- Added NYI stubs for accessing other processes' perfctrs.
- Moved to dynamic allocation of a task's perfctr state.
- Minor code cleanups.

Version 0.2, 1999-06-07
- Added support for WinChip CPUs.
- Restart counters from zero, not their previous values. This
  corrected a problem for Intel P6 (WRMSR writes 32 bits to a PERFCTR
  MSR and then sign-extends to 40 bits), and also simplified the code.
- Added support for syncing the kernel's counter values to a user-
  provided buffer each time a process is resumed. This feature, and
  the fact that the driver enables RDPMC in processes using PMCs,
  allows user-level computation of a process' accumulated counter
  values without incurring the overhead of making a system call.
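
  The P6 restart problem mentioned above can be modelled as follows.
  The function name is hypothetical; it only mimics the documented
  effect of WRMSR on a 40-bit P6 PERFCTR MSR.

```c
#include <stdint.h>

/* Model of WRMSR to a P6 PERFCTR MSR: only the low 32 bits of the
 * operand are used, and bit 31 is sign-extended into bits 32..39 of
 * the 40-bit counter. */
static uint64_t p6_wrmsr_pmc(uint64_t value)
{
    uint64_t low = value & 0xFFFFFFFFULL;

    if (low & 0x80000000ULL)
        low |= 0xFF00000000ULL;   /* copy bit 31 into bits 32..39 */
    return low;
}
```

  Writing back a previously saved 40-bit value such as 0x0123456789
  leaves 0x0023456789 in the counter, so the old value cannot be
  restored in general. Zero survives the write unchanged, which is
  why restarting from zero avoids the problem.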

Version 0.1, 1999-05-30
- First public release.