
	* src/components/net/linux-net.c: Repairing more coverity warnings.


	* src/windows-common.c: Missed an instance of CPUs yesterday.

	* src/: papi_internal.c, threads.c: This change fixes two race
	  conditions that are probably the cause of the pthrtough
	  double-free error.

	  When freeing a thread, we remove and free all eventsets belonging
	  to that thread.  This could race with the thread itself removing
	  the eventset, causing some ESI fields to be freed twice.

	  The problem was found by using the Valgrind 3.8 Helgrind tool:

	    valgrind --tool=helgrind --free-is-write=yes ctests/pthrtough

	  In order for Helgrind to work, I had to temporarily modify PAPI
	  to use POSIX pthread mutexes for locking.  Is there any reason we
	  don't use these all the time?
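
	  A minimal sketch of the general pattern, not PAPI's actual code:
	  when two threads may both try to free the same per-eventset data,
	  take a lock, free, and clear the pointer so the second caller finds
	  NULL.  The struct eventset and eventset_free_data() names are
	  hypothetical, and a POSIX pthread mutex is assumed, as in the
	  Helgrind experiment above.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for per-thread eventset state. */
struct eventset {
    pthread_mutex_t lock;   /* guards `counters` */
    int *counters;          /* heap data that must be freed exactly once */
};

/*
 * Free the eventset's heap data.  Both the owning thread and the code
 * tearing the thread down may call this; the lock serializes them, and
 * clearing the pointer makes the second call a no-op instead of a
 * double free.
 */
void eventset_free_data(struct eventset *es)
{
    pthread_mutex_lock(&es->lock);
    free(es->counters);     /* free(NULL) does nothing */
    es->counters = NULL;
    pthread_mutex_unlock(&es->lock);
}
```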


	* src/utils/: avail.c, component.c, event_chooser.c,
	  native_avail.c: Fix one more case of "CPU's" in the print header.

	  Also remove the extraneous "The following correspond to fields
	  in the PAPI_event_info_t structure." message.

	* src/: testlib/papi_test.h, testlib/test_utils.c,
	  ctests/all_native_events.c, ctests/calibrate.c,
	  ctests/code2name.c, ctests/hwinfo.c: Fix one more case of "CPU's"
	  in the print header code.

	  Also remove the extraneous "The following correspond to fields
	  in the PAPI_event_info_t structure." message.

	* src/ take infiniband out of
	  the buildbot test.

	* src/: x86_cache_info.c, components/coretemp/linux-coretemp.c,
	  components/lustre/linux-lustre.c, components/net/linux-net.c,
	  utils/event_chooser.c: Fix coverity errors reported by Will

	* src/: aix.c, any-proc-null.c, linux-common.c, papi.c, papi.h,
	  papivi.h, solaris-niagara2.c, solaris-ultra.c,
	  ctests/clockres_pthreads.c: Address Red Hat bug 785975.  The
	  plural of CPU appears to be CPUs.

	* src/ Patch to clean up dependencies, allowing for
	  parallel makes.  Patch due to Will Cohen from Red Hat.


	* src/ Add infiniband and mx
	  component to buildbot component tests.

	* src/components/net/tests/: net_values_by_code.c,
	  net_values_by_name.c: Apply patch suggested by Will Cohen to
	  check for system return values.

	* src/components/lmsensors/linux-lmsensors.h: Added missing string


	* man/... : update man pages one more time for 4.2.1

	* release_procedure.txt: Make sure generated html has papi group


	* src/multiplex.c: Fix the @file matching multiple files warning.

	* src/components/README: Cleanup doxygen errors.

	* doc/Doxyfile-html: Typo introduced by the last commit.

	* doc/Doxyfile-html: Exclude linux-bgp.c from doxygen.

	* doc/Doxyfile-html: Make sure the component README file gets
	  included in doxygen.

	* src/components/coretemp_freebsd/coretemp_freebsd.c: Cleanup
	  doxygen warnings in freebsd coretemp component.

	* src/papi.h: Cleanup some doxygen warnings related to the

	* src/components/example/example.c: fix doxygen warning in the
	  example component

	* doc/Doxyfile-html: Remove some cruft from doxygen config file.

	  This addresses the warning about dot not found at /sw/bin/dot .

	* src/components/: infiniband/linux-infiniband.c,
	  infiniband/linux-infiniband.h, cuda/linux-cuda.c,
	  cuda/linux-cuda.h: Cleaned up some doxygen issues

	* src/components/lmsensors/linux-lmsensors.c: Removed long
	  forgotten debug outputs

	* src/papi_libpfm4_events.c: Fix minor doxygen typos.

	* src/components/vmware/vmware.c: Add params for doxygen

	* man/... : update man pages


	* doc/Doxyfile-man1: Fix a typo in a doxygen config file.


	* release_procedure.txt, doc/Doxyfile, doc/Doxyfile-everything,
	  doc/Doxyfile-html, doc/Doxyfile.utils, doc/Doxyfile-man1,
	  doc/Doxyfile-man3, doc/Makefile, doc/doxygen_procedure.txt:
	  Rework the doxygen configuration files.

	* RELEASENOTES.txt: Update for the impending release.

	* ChangeLogP421.txt, RELEASENOTES.txt: Updates for the impending


	* src/: papi.c, papi.h: Minor tweaks for doxygen errors


	* src/components/lmsensors/: Rules.lmsensors, Fixed
	  configure error message and rules link error for shared object
	  linking. Thanks Will Cohen.

	* src/components/appio/Rules.appio: Correct pathing

	* src/ctests/api.c: A minor fix to check for PAPI_ENOEVNT
	  when testing PAPI_flops.  Previously, if PAPI_FP_OPS did not exist
	  on the processor (as on many processors), this test failed.


	* src/ctests/multiattach.c: Increase acceptance criteria for

	* src/, src/configure, src/, src/papi.h,
	  doc/Doxyfile, doc/Doxyfile-everything, doc/Doxyfile.utils,
	  papi.spec: Update version number to 4.2.1 in preparation for

	* src/ctests/prof_utils.c: Correct a warning on 32-bit builds about
	  casting caddr_t to (long long).

	  Specifically:
	    prof_utils.c:234: warning: cast from pointer to integer of different size
	    prof_utils.c:248: warning: cast from pointer to integer of different size
	    prof_utils.c:262: warning: cast from pointer to integer of different size

	  We first cast to unsigned long and then on to long long.  (This
	  may be overkill, but it's for a printf format string.)
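
	  A small sketch of the two-step cast described above.  The helper
	  names are hypothetical; char * stands in for caddr_t, which Linux
	  defines as a char * typedef.

```c
#include <stdio.h>

/*
 * Casting a pointer straight to long long warns on 32-bit builds
 * ("cast from pointer to integer of different size").  unsigned long is
 * pointer-sized on Linux (both ILP32 and LP64), so cast to it first; the
 * widening to long long is then an ordinary integer conversion.
 */
static long long addr_to_ll(const char *addr)   /* char * ~ caddr_t */
{
    return (long long) (unsigned long) addr;
}

/* Hypothetical caller printing an address with a %llx format. */
void print_addr(const char *addr)
{
    printf("address: 0x%llx\n", (unsigned long long) addr_to_ll(addr));
}
```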


	* release_procedure.txt: Add the correct path for doxygen on ICL

	* src/papi_events.csv: Modify Intel Sandybridge PAPI_FP_OPS and
	  PAPI_FP_INS events to not count x87 fp instructions.

	  The problem is that the current predefines were made by adding 5
	  events.  With the NMI watchdog stealing an event and/or
	  hyperthreading reducing the number of available counters by half,
	  we just couldn't fit.

	  This now raises the potential for people using x87-compiled
	  floating point on Sandybridge and getting 0 FP_OPS.  This is only
	  likely if running a 32-bit kernel and *not* compiling your code
	  with -msse.

	  A long-term solution might be trying to find a better set of FP
	  predefines for sandybridge.

	* src/components/: lustre/linux-lustre.c, mx/linux-mx.c: Some
	  really minor cleanups to the lustre and mx components.


	* src/components/example/: example.c, tests/example_basic.c: Update
	  example component

	  Cleans up code, adds some more documentation, adds counter write


	* src/papi_user_events.c: Minor cleanups for user events.

	* src/libpfm4/: README, include/perfmon/pfmlib.h, lib/Makefile,
	  lib/pfmlib_amd64.c, lib/pfmlib_common.c, lib/pfmlib_priv.h: Fix
	  "conflicts" in git import of libpfm4.

	* src/libpfm4/lib/: pfmlib_amd64_fam11h.c,
	  events/amd64_events_fam11h.h: Initial revision


	* src/papi_fwrappers.c: Escape the include directives in the

	  (Cleans up doxygen )

	* src/components/README: Adding vmware to component README

	* src/components/vmware/:,
	  PAPI-VMwareComponentDocument.pdf, Rules.vmware,
	  VMwareComponentDocument.txt, configure,, vmware.c,
	  vmware.h: merge vmware branch to head

	* src/perf_events.c: Set fast_counter_read back to 0 on x86/x86_64
	  perf_events, as currently rdpmc counter access is not supported.

	  There are patches floating around that enable this (although
	  performance is still a long way from perfctr), but they will likely
	  not be merged for a while, and the perf_events substrate will
	  require a lot of extra code to support it once it does make it
	  into a shipping kernel.

	* src/ Remove acpi from the
	  buildbot configure script.


	* src/components/mx/:,, configure,, linux-mx.c, linux-mx.h, tests/Makefile,
	  tests/mx_basic.c, tests/mx_elapsed.c, utils/fake_mx_counters.c,
	  utils/sample_output: Re-write of the MX component

	  + Add tests
	  + Modernize code
	  + Remove the need to run ./configure in the mx directory
	  + Add fake mx_counters program that lets you test the component
	    on a machine without Myrinet installed

	* src/components/: README, acpi/Rules.acpi,
	  acpi/linux-acpi-memory.c, acpi/linux-acpi.c, acpi/linux-acpi.h:
	  Remove the ACPI component.

	  It was one of the oldest components and needed a lot of cleanup
	  work, and it turns out that the main useful event it provided
	  (temperature) isn't available on modern machines/kernels
	  (coretemp should be used instead).


	* src/perf_events.c: Restored Phil's changes that I inadvertently
	  clobbered with my last commit :(

	* src/perf_events.c: Remove a warning about an uninitialized

	* src/utils/: component.c, event_info.c, native_avail.c: Update the
	  Doxygen comments on these utilities to have the command line
	  options listed in a list like the other utils.

	* src/perf_events.c: More improvements to the read path for
	  multiplexed counters. Now the case for bad kernel behavior is
	  built in, and is not required with a #define.

	  Basically, there are situations when either enabled or running is
	  zero, but not both.  This could result in a divide by 0 in the
	  worst case, as was observed by Tushar Mohan in papiex.  You could
	  trigger it by doing a read immediately after a start with
	  perf_events and using a FORMAT_SCALE argument.

	  Assuming multiplexing, the logic now goes:

	  1) if (running == enabled)   return raw counter
	  2) if (running && enabled)   scale counter by ratio
	  3) else                      warn in debug mode, return raw counter

	  Apparently we need a test case that does a read immediately after
	  a start.  That's a hole.
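
	  The three cases above can be sketched as a standalone helper.  This
	  is an illustration only: scale_mpx_count() is a hypothetical name,
	  inputs are assumed non-negative, and the overflow fallback is an
	  addition of this sketch, not something described in this entry.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Scale a multiplexed raw count by time_enabled / time_running,
 * following the three cases listed above.  Inputs are assumed to be
 * non-negative, as the kernel reports them.
 */
int64_t scale_mpx_count(int64_t raw, int64_t enabled, int64_t running)
{
    /* Case 1: the event was on a counter the whole time. */
    if (running == enabled)
        return raw;

    /* Case 2: both times are valid, so extrapolate. */
    if (running != 0 && enabled != 0) {
        /* Multiply first for precision when that cannot overflow... */
        if (raw <= INT64_MAX / enabled)
            return (raw * enabled) / running;
        /* ...otherwise divide first, accepting some truncation. */
        return (raw / running) * enabled;
    }

    /* Case 3: exactly one time is zero, which is a kernel quirk. */
    fprintf(stderr, "mpx: enabled=%lld running=%lld, returning raw count\n",
            (long long) enabled, (long long) running);
    return raw;
}
```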

	  Tested on brutus, core2 2.6.36

	  Here's the original report.
	  -------------------
	  Model string and code : Intel(R) Pentium(R) M processor 1600MHz (9)
	  Linux thinkpad 2.6.38-02063808-generic #201106040910 SMP Sat Jun 4
	  10:51:30 UTC 2011 i686 GNU/Linux
	  PAPI Version:

	  I think I ran into a bug similar to what we ran with MIPS.

	  With the latest PAPI (from CVS), on an x86 (32-bit machine), when
	  using papiex with multiplex with anything more than two events, I
	  get a floating point exception in PAPI during the PAPI_read call.
	  On enabling debugging in the substrate, I think the problem is
	  the same (namely a division by zero, because some event had a
	  zero time of running):

	  libpapiex debug: 24625,0x0,papiex_thread_init_routine Starting counters with PAPI_start
	  ioctl(enable): ctx: 0x96a4bc8, fd: 3
	  ioctl(enable): ctx: 0x96a4bc8, fd: 5
	  libpapiex debug: 24625,0x0,papiex_thread_init_routine Calling PAPI_lock before critical section
	  libpapiex debug: 24625,0x0,papiex_thread_init_routine Released PAPI lock
	  libpapiex debug: 24625,0x0,papiex_start START POINT 0 LABEL
	  libpapiex debug: 24625,0x0,papiex_start Reading counters (PAPI_read) to get initial counts
	  SUBSTRATE:perf_events.c:_papi_pe_read:1147:24625 read: fd:  3, tid: 0, cpu: -1, ret: 56
	  SUBSTRATE:perf_events.c:_papi_pe_read:1148:24625 read: 2 1341021 1341021
	  SUBSTRATE:perf_events.c:_papi_pe_read:1181:24625 (papi_pe_buffer[3] 33405 * tot_time_enabled 1341021) / tot_time_running 1341021
	  (papi_pe_buffer[5] 44552 * tot_time_enabled 1341021) / tot_time_running 1341021
	  SUBSTRATE:perf_events.c:_papi_pe_read:1147:24625 read: fd:  5, tid: 0, cpu: -1, ret: 40
	  SUBSTRATE:perf_events.c:_papi_pe_read:1148:24625 read: 1 214777 0
	  (papi_pe_buffer[3] 0 * tot_time_enabled 214777) / tot_time_running 0

	  The above debug log is for three events: PAPI_TOT_CYC,
	  PAPI_TOT_INS and PAPI_L1_DCM. Multiplexing works with two events.
	  Adding the third (any event), gives this error. Basically, the
	  floating point exception kills the program, and PAPI_read never

	  I think I know why papiex always hits this bug: It's because
	  right after starting the counters with PAPI_start, papiex does a
	  PAPI_read to store the initial values of the counters in a tmp
	  variable. These are then subtracted from the final counter
	  values. Should we put a deliberate delay? Of course, the real bug
	  should be fixed in PAPI.  ----

	* src/utils/event_info.c: Major re-write of the papi_xml_event_info
	  program.
	  + Remove event code numbers, as they are not stable run-to-run
	  + Add some Doxygen comments
	  + Remove some wrong assumptions that could cause potential buffer
	    overflows
	  + Improve usage information


	* src/components/lustre/: Rules.lustre, linux-lustre.c,
	  tests/Makefile, tests/lustre_basic.c: Finish the re-write of the
	  lustre component.

	  It would be nice if someone with access to a machine with a
	  lustre filesystem could test this for us.

	* src/: papi_internal.c, components/lustre/linux-lustre.c: Update
	  the component initialization code so that it can handle a PAPI
	  ERROR return gracefully.  Previously there was no way to indicate
	  initialization failure besides just setting num_native_events to


	* src/components/lustre/: linux-lustre.c, linux-lustre.h: First
	  pass at cleaning up the lustre component.

	  It should now properly report no events when no lustre
	  filesystems are available.


	* src/papi_events.csv: Add AMD fam12h support to the events file.
	  Right now it is just an alias to the similar fam10h event list;
	  this can be split out if necessary once we find a tester with the

	* src/libpfm4/: README, docs/man3/pfm_get_event_next.3,
	  docs/man3/pfm_get_pmu_info.3, include/perfmon/perf_event.h,
	  include/perfmon/pfmlib.h, lib/Makefile, lib/pfmlib_amd64.c,
	  lib/pfmlib_amd64_priv.h, lib/pfmlib_common.c,
	  lib/pfmlib_perf_event.c, lib/pfmlib_priv.h,
	  lib/events/intel_coreduo_events.h, lib/events/perf_events.h,
	  perf_examples/Makefile, perf_examples/perf_util.c,
	  perf_examples/perf_util.h, perf_examples/self.c,
	  perf_examples/task_smpl.c, perf_examples/x86/bts_smpl.c: Fix
	  "merge" conflicts with libpfm4 merge.

	* src/libpfm4/lib/: pfmlib_amd64_fam12h.c,
	  events/amd64_events_fam12h.h: Initial revision

	* src/papi_libpfm4_events.c: Properly use the pfm_get_event_next()
	  iterator to find the next event.

	  Without this, on AMD Fam10h some events are missed.

	  Some events are still missed due to a libpfm4 bug; this will be
	  fixed once I update the libpfm4 tree included with PAPI.

	  Note, enumeration fixes like this often break things, so please
	  test if possible.

	* src/papi_events.csv: Update the coreduo (not core2) events.  Most
	  notably, the FP events were wrong.

	  This, along with a forthcoming libpfm4 update, makes all the
	  ctests pass on an old Yonah coreduo laptop I have.


	* src/ctests/api.c: Make the api test actually test PAPI_flops() as
	  it claims to do, rather than PAPI_flips().

	  Patch thanks to: Emilio De Camargo Francesquini

	* src/papi_hl.c: Fix some copy-and-paste documentation remnants in
	  the papi_hl.c file, mostly where it said FLIPS where it meant


	* src/utils/native_avail.c: Update papi_native_avail to *not* print
	  the event codes, as these are not guaranteed to be stable from
	  run to run.

	  Also fix up the formatting and print some component info too.

	  Please try and let me know if you don't like the new output.

	* src/: configure, Respect a FORCED option in


	* src/Rules.pfm4_pe: Remove perfmon.h from MISCHDRS.


	* src/: Rules.perfctr, Rules.perfctr-pfm, Rules.pfm, Rules.pfm4_pe,
	  Rules.pfm_pe, linux-lock.h, mb.h: Merry Christmas ARM users.

	  This patch fixes the SMP ARM issues reported by Harald Servat.
	  Also, adds proper header dependency checking in the Rules files.
	  When you add headers, please add them to the dependency lines so
	  everything gets rebuilt properly.

	  The new SMP lock implementation is very pedantic; that is, it is
	  not the fastest, but it does use atomics and avoid kernel
	  Passed on our 2 core ARM v7. All pthreads tests now pass, except
	  the ones that also fail in the single processor case usually due
	  to a missing event.
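
	  For readers unfamiliar with atomic-based locking, here is a minimal
	  spinlock sketch using C11 atomics.  It is not the linux-lock.h ARM
	  implementation, and the sketch_* names are hypothetical.  Like the
	  locks described above, it favors correctness over speed and never
	  enters the kernel: a waiting thread simply spins.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock: one atomic flag, no system calls. */
typedef struct {
    atomic_flag held;
} sketch_spinlock;

#define SKETCH_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once to take the lock; true on success. */
bool sketch_trylock(sketch_spinlock *l)
{
    /* test_and_set returns the previous value: false means we got it. */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spin in user space until the lock is ours. */
void sketch_lock(sketch_spinlock *l)
{
    while (!sketch_trylock(l))
        ;   /* busy-wait */
}

/* Release; pairs with the acquire in sketch_trylock(). */
void sketch_unlock(sketch_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```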


	  mucci@panda:~/papi.head/src$ uname -a
	  Linux panda 3.0.0 #2 SMP Fri Jul 29 16:23:54 EDT 2011 armv7l GNU/Linux

	  mucci@panda:~/papi.head/src$ hostname
	  panda

	  mucci@panda:~/papi.head/src$ cat /proc/cpuinfo
	  Processor: ARMv7 Processor rev 2 (v7l)
	  processor: 0
	  BogoMIPS: 2007.19

	  processor: 1
	  BogoMIPS: 1965.18

	  Features: swp half thumb fastmult vfp edsp thumbee neon vfpv3
	  CPU implementer: 0x41
	  CPU architecture: 7
	  CPU variant: 0x1
	  CPU part: 0xc09
	  CPU revision: 2

	  Hardware: OMAP4 Panda board
	  Revision: 0020
	  Serial:

	  mucci@panda:~/papi.head/src$ ./ctests/locks_pthreads
	  Creating 2 threads
	  10000 iterations took 13489 us.
	  Running 44480 iterations
	  Expected: 88960 Received: 88960
	  locks_pthreads.c

	  mucci@panda:~/papi.head/src$ ./ctests/pthrtough
	  Creating 2 threads for 1000 iterations each of: register create_eventset destroy_eventset unregister
	  pthrtough.c

	  mucci@panda:~/papi.head/src$ ./ctests/pthrtough2
	  Creating 2000 threads for 1 iterations each of: register create_eventset destroy_eventset unregister
	  Failed to create thread: 238
	  Continuing test with 237 threads.
	  pthrtough2.c

	  mucci@panda:~/papi.head/src$ ./ctests/thrspecific
	  Thread 0x40ae1470 started, specific data is at 0xbea9c6d4
	  Thread 0x40021000 started, specific data is at 0xbea9c6c4
	  Thread 0x4244d470 started, specific data is at 0xbea9c6c8
	  Thread 0x4138d470 started, specific data is at 0xbea9c6d0
	  Thread 0x41c4d470 started, specific data is at 0xbea9c6cc
	  Entry 0, Thread 0x41c4d470, Data Pointer 0xbea9c6cc, Value 4000000
	  Entry 1, Thread 0x40021000, Data Pointer 0xbea9c6c4, Value 500000
	  Entry 2, Thread 0x40ae1470, Data Pointer 0xbea9c6d4, Value 1000000
	  Entry 3, Thread 0x4244d470, Data Pointer 0xbea9c6c8, Value 8000000
	  Entry 4, Thread 0x4138d470, Data Pointer 0xbea9c6d0, Value 2000000
	  thrspecific.c                   PASSED

	  mucci@panda:~/papi.head/src$ ./ctests/krentel_pthreads
	  program_time = 6, threshold = 20000000, num_threads = 3

	  launched timer in thread 0
	  launched timer in thread 1
	  launched timer in thread 3
	  launched timer in thread 2
	  [1] time = 1, count = 7, iter = 5, rate = 1400.0/Kiter
	  [2] time = 1, count = 7, iter = 5, rate = 1400.0/Kiter
	  [0] time = 1, count = 7, iter = 5, rate = 1400.0/Kiter
	  [3] time = 1, count = 7, iter = 5, rate = 1400.0/Kiter
	  [1] time = 2, count = 25, iter = 16, rate = 1562.5/Kiter
	  [0] time = 2, count = 25, iter = 16, rate = 1562.5/Kiter
	  [3] time = 2, count = 25, iter = 16, rate = 1562.5/Kiter
	  [2] time = 2, count = 25, iter = 16, rate = 1562.5/Kiter
	  [1] time = 3, count = 25, iter = 16, rate = 1562.5/Kiter
	  [2] time = 3, count = 25, iter = 16, rate = 1562.5/Kiter
	  [0] time = 3, count = 25, iter = 16, rate = 1562.5/Kiter
	  [3] time = 3, count = 25, iter = 16, rate = 1562.5/Kiter
	  [1] time = 4, count = 25, iter = 16, rate = 1562.5/Kiter
	  [0] time = 4, count = 25, iter = 16, rate = 1562.5/Kiter
	  [3] time = 4, count = 25, iter = 16, rate = 1562.5/Kiter
	  [2] time = 4, count = 25, iter = 16, rate = 1562.5/Kiter
	  [3] time = 5, count = 25, iter = 16, rate = 1562.5/Kiter
	  [0] time = 5, count = 25, iter = 16, rate = 1562.5/Kiter
	  [2] time = 5, count = 25, iter = 16, rate = 1562.5/Kiter
	  [1] time = 5, count = 26, iter = 17, rate = 1529.4/Kiter
	  [2] time = 6, count = 25, iter = 16, rate = 1562.5/Kiter
	  [0] time = 6, count = 27, iter = 17, rate = 1588.2/Kiter
	  done
	  krentel_pthreads.c              PASSED


	* src/papi_libpfm_presets.c: Change PAPI_PERFMON_EVENT_FILE
	  environment variable name to PAPI_CSV_EVENT_FILE since it's not
	  just for perfmon anymore.

	* src/: configure, Open mouth, insert foot; fix
	  perfctr configure by not testing a library we have not built yet.


	* src/: configure, Missed one more place where we
	  tested perfctr != "no"

	* src/: configure, Fix a typo in the perfctr section;
	  it was causing a machine to default to perfctr when it had no
	  performance interface.  ( a centos vm image with a 2.6.18 kernel

	  Also checks that we actually have perfctr if we specify


	* src/components/cuda/:, Rules.cuda, configure,, linux-cuda.c, linux-cuda.h: Added auto-detection of
	  CUDA version to PAPI CUDA Component.  The reason is that the
	  interface changed between CUDA/CUPTI 4.0 and 4.1.  PAPI now
	  supports both CUDA versions without any exposure to the users.
	  The configure step is unchanged, and no additional knowledge of
	  which CUDA version is installed is required.


	* src/components/appio/: CHANGES, README, Rules.appio, appio.c,
	  appio.h, tests/Makefile, tests/appio_list_events.c,
	  tests/appio_values_by_code.c, tests/appio_values_by_name.c: [no
	  log message]


	* src/linux-timer.c: Fix compilation warning if you specify

	* src/linux-timer.c: Fix the build on Linux systems using mmtimer

	* src/linux-common.c: Update the linux MHz detection code to use
	  bogoMIPS when there is no MHz field available in /proc/cpuinfo.

	  This gives roughly correct MHz on ARM, and the MIPS workaround
	  should also still work.


	* src/components/net/linux-net.c: Fix compile errors in a debug
	  message.  (pathname didn't exist but we are working on


	* src/components/net/: linux-net.c, tests/net_values_by_code.c,
	  tests/net_values_by_name.c: Change the ping command in the net
	  tests to not use &> to redirect to NULL.

	  This works when /bin/sh is bash, but on systems where /bin/sh is a
	  strict POSIX shell (such as dash), "&>" is parsed as "&" followed
	  by ">", so ping runs in the background instead and the test
	  finishes before ping can generate any packets.
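
	  A sketch of running a quiet command through system() with POSIX
	  redirection instead of "&>", and checking the result.  run_quietly()
	  is a hypothetical helper, not code from the net tests.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Run `cmd` via system() with stdout and stderr discarded using
 * ">/dev/null 2>&1", which any POSIX sh understands (unlike "&>").
 * Returns the command's exit status, or -1 if the command string was
 * too long, the shell could not be started, or a signal killed it.
 */
int run_quietly(const char *cmd)
{
    char buf[512];
    int n = snprintf(buf, sizeof buf, "%s >/dev/null 2>&1", cmd);
    if (n < 0 || (size_t) n >= sizeof buf)
        return -1;              /* command too long for our buffer */

    int status = system(buf);   /* runs /bin/sh -c in the foreground */
    if (status == -1 || !WIFEXITED(status))
        return -1;              /* fork/exec failure or killed by signal */
    return WEXITSTATUS(status);
}
```

	  For example, run_quietly("ping -c 5 localhost") blocks until ping
	  finishes and returns 0 if it exited successfully.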

	* src/components/net/linux-net.c: Fix a slight bug in the net
	  component, where a memset() had the wrong arguments.  This made
	  for weird results in the case where we start/stop quickly enough
	  that we return the initial data.

	* src/components/net/: CHANGES,, README,,
	  configure,, linux-net.c, linux-net.h,
	  tests/Makefile, tests/net_list_events.c,
	  tests/net_values_by_code.c, tests/net_values_by_name.c: Replace
	  net component with updated version written by    Jose Pedro

	  * Dynamically detects the network interfaces
	    (i.e. the ones listed in /proc/net/dev)

	  * No longer needs to fork/exec the external ifconfig command and
	    parse its output.  It now reads the Linux kernel network
	    statistics directly from /proc/net/dev.

	  * Each network interface now has 16 events instead of 13
	    (all counters in /proc/net/dev).

	  * Adds support for PAPI_event_name_to_code()

	  * Adds a couple of small tests/examples
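
	  A sketch of parsing one /proc/net/dev interface line into its 16
	  counters (8 receive, 8 transmit).  parse_net_dev_line() is a
	  hypothetical name, not the component's actual parser, and the
	  sketch assumes interface names contain no ':'.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define NET_DEV_NFIELDS 16   /* 8 receive + 8 transmit counters */

/*
 * Parse a line such as "  eth0: 1234 5 0 0 0 0 0 0 5678 6 0 0 0 0 0 0"
 * into the interface name and its 16 counters.  Returns 0 on success,
 * or -1 for header lines (which contain no ':') and malformed input.
 */
int parse_net_dev_line(const char *line, char *ifname, size_t ifname_len,
                       unsigned long long vals[NET_DEV_NFIELDS])
{
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return -1;

    /* Skip leading blanks to find the start of the interface name. */
    const char *start = line;
    while (start < colon && isspace((unsigned char) *start))
        start++;
    size_t len = (size_t) (colon - start);
    if (len == 0 || len >= ifname_len)
        return -1;
    memcpy(ifname, start, len);
    ifname[len] = '\0';

    /* The counters follow the colon, separated by whitespace. */
    const char *p = colon + 1;
    for (int i = 0; i < NET_DEV_NFIELDS; i++) {
        char *end;
        vals[i] = strtoull(p, &end, 10);
        if (end == p)
            return -1;           /* fewer than 16 numbers on the line */
        p = end;
    }
    return 0;
}
```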


	* doc/Doxyfile-everything: Fix the exclude libpfm/perfctr config.


	* src/perf_events.c: Only scale when running != enabled.

	  Now verified on ig, brutus and the malta

	* src/perf_events.c: Further tune-ups for mpx'ing.

	  The previous commit broke systems with valid return values from
	  perf_events for running & enabled.  My attempt at scaling in long
	  long arithmetic caused an overflow, which led to a negative number
	  when passed up the chain.

	  Also consolidated types; the best way to avoid this stuff is to
	  start as the type you are ending as.

	  Now we use better integer scaling, guaranteed to be within +-0.5%
	  of the actual scaled value of enabled / running.

	  New results on brutus: multiplex1

	  case1: Does PAPI_multiplex_init() not break regular operation?
	  PAPI_FP_INS
	  case1:  2739865106    600002876

	  case2: Does setmpx/add work?
	  Added PAPI_TOT_CYC
	  Added PAPI_FP_INS
	  case2:  PAPI_TOT_CYC  PAPI_FP_INS
	  case2:  2739678237

	  case3: Does add/setmpx work?
	  Added PAPI_TOT_CYC
	  Added PAPI_FP_INS
	  case3:  PAPI_TOT_CYC  PAPI_FP_INS
	  case3:  2739847832

	  case4: Does add/setmpx/add work?
	  Added PAPI_TOT_CYC
	  Added
	  2737832980    600013404

	  case5: Does setmpx/add/add/start/read work?
	  Added PAPI_TOT_CYC
	  Added PAPI_FP_INS
	  read @start counter[0]: 7106
	  read @stop  counter[0]: 2740387017
	  difference  counter[0]: 2740379911
	  read @start counter[1]: 0
	  read @stop  counter[1]: 600017169
	  difference  counter[1]: 600017169
	  multiplex1.c


	* src/components/cuda/linux-cuda.c: For the CUDA Component,
	  PAPI_read() now accumulates event values. This has to be
	  explicitly done in PAPI because CUPTI automatically resets all
	  counter values to 0 after a read. (PAPI_start()/stop() continues
	  to reset the values to 0)
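
	  A sketch of the accumulate-on-read idea, assuming a backend whose
	  counters reset to zero after every read.  struct accum_ctl and the
	  accum_* functions are hypothetical, not the CUDA component's code.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_EVENTS 8

/* Hypothetical per-eventset state. */
struct accum_ctl {
    size_t   num_events;
    uint64_t totals[MAX_EVENTS];   /* running totals since start */
};

/* start/stop semantics: reset the totals to 0. */
void accum_reset(struct accum_ctl *ctl)
{
    for (size_t i = 0; i < ctl->num_events; i++)
        ctl->totals[i] = 0;
}

/*
 * read semantics: `fresh` holds what the hardware reported since the
 * previous read (it was reset to 0 then), so add it to the totals and
 * hand the totals back to the caller.
 */
void accum_read(struct accum_ctl *ctl, const uint64_t *fresh, uint64_t *out)
{
    for (size_t i = 0; i < ctl->num_events; i++) {
        ctl->totals[i] += fresh[i];
        out[i] = ctl->totals[i];
    }
}
```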

	* src/perf_events.c: Last of the multiplex fixes to perf events.
	  The root of all evil was this:

	  counts[i] = ( uint64_t ) ( ( double ) buffer[count_idx] *
	                             ( double ) buffer[get_total_time_enabled_idx(  )] /
	                             ( double ) buffer[get_total_time_running_idx(  )] );

	  In addition to improper casting to uints... (papi returns int64s),
	  using floating point arith is a no-no.  Plus this resulted in divide by

	  SUBSTRATE:perf_events.c:_papi_pe_read:1155:12218 read: fd:  3, tid: 0, cpu: -1, buffer[0-2]: 0x6cba, 0x0, 0x0, ret: 24
	  SUBSTRATE:perf_events.c:_papi_pe_read:1155:12218 read: fd:  4, tid: 0, cpu: -1, buffer[0-2]: 0x23, 0x0, 0x0, ret: 24
	  SUBSTRATE:perf_events.c:_papi_pe_read:1155:12218 read: fd:  3, tid: 0, cpu: -1, buffer[0-2]: 0x6de72b5d, 0x8ae0fa80, 0x8ae0fa80, ret: 24
	  SUBSTRATE:perf_events.c:_papi_pe_read:1155:12218 read: fd:  4, tid: 0, cpu: -1, buffer[0-2]: 0x4c4b46b, 0x8ae0fa80, 0x8ae0fa80, ret: 24

	  So kernel is good, but errors in multiplexed scaling.

	  case5: Does setmpx/add/add/start/read work?
	  Added PAPI_TOT_CYC
	  Added PAPI_FP_INS
	  read @start counter[0]: 9223372034707292159
	  read @stop  counter[0]: 1843791732
	  difference  counter[0]: -9223372032863500427
	  multiplex1.c    FAILED
	  Line # 389

	  With fix:

	  SUBSTRATE:perf_events.c:_papi_pe_read:1151:12821 read: fd:  3, tid: 0, cpu: -1, buffer[0-2]: 0x6782, 0x0, 0x0, ret: 24
	  SUBSTRATE:perf_events.c:_papi_pe_read:1151:12821 read: fd:  4, tid: 0, cpu: -1, buffer[0-2]: 0x0, 0x0, 0x0, ret: 24
	  SUBSTRATE:perf_events.c:_papi_pe_read:1151:12821 read: fd:  3, tid: 0, cpu: -1, buffer[0-2]: 0x6de725dc, 0x8ae0fa80, 0x8ae0fa80, ret: 24
	  SUBSTRATE:perf_events.c:_papi_pe_read:1151:12821 read: fd:  4, tid: 0, cpu: -1, buffer[0-2]: 0x4c4b400, 0x8ae0fa80, 0x8ae0fa80, ret: 24
	  read @start counter[0]: 26498
	  read @stop  counter[0]: 1843865052
	  difference  counter[0]: 1843838554
	  read @start counter[1]: 0
	  read @stop  counter[1]: 80000000
	  difference  counter[1]: 80000000
	  Called with count == 0
	  shutdown  multiplex1.c			   PASSED

	  New code is vastly simpler and smaller and checks for bad kernel

	      int64_t tot_time_running =
	          papi_pe_buffer[get_total_time_running_idx(  )];
	      int64_t tot_time_enabled =
	          papi_pe_buffer[get_total_time_enabled_idx(  )];
	  #ifdef BRAINDEAD_MULTIPLEXING
	      if (tot_time_enabled == 0)
	          tot_time_enabled = 1;
	      if (tot_time_running == 0)
	          tot_time_running = 1;
	  #else
	      /* If we are convinced this platform's kernel is fully
	         operational, then this stuff will never happen.  If it does,
	         then BRAINDEAD_MULTIPLEXING needs to be enabled. */
	      if ((tot_time_running == 0) && (papi_pe_buffer[count_idx])) {
	          PAPIERROR("This platform has a kernel bug in multiplexing, count is %lld (not 0), but time running is 0.\n",
	                    papi_pe_buffer[count_idx]);
	          return PAPI_EBUG;
	      }
	      if ((tot_time_enabled == 0) && (papi_pe_buffer[count_idx])) {
	          PAPIERROR("This platform has a kernel bug in multiplexing, count is %lld (not 0), but time enabled is 0.\n",
	                    papi_pe_buffer[count_idx]);
	          return PAPI_EBUG;
	      }
	  #endif
	      pe_ctl->counts[i] =
	          (papi_pe_buffer[count_idx] * tot_time_enabled) /

	  Also, renamed all instances of 'buffer' to papi_pe_buffer because
	  buffer is a global variable on MIPS/Linux/libc.  Yikes!

	    (gdb) whatis buffer
	    type = struct utmp *

	* src/ctests/multiplex1.c: Made sure that PAPI_TOT_CYC is the first
	  event added to multiplexing event set.

	  This will demonstrate the bug in perf_event multiplexing
	  arithmetic in case5 on MIPS and other perf_event subsystems that
	  likely have some breakage in the kernel's handling of
	  multiplexing.  The common bug is that the perf_event subsystem
	  does not fill in the second and third elements of the 24-byte
	  read that gets returned from the kernel.  These values are
	  time_enabled and time_running.  MIPS as of 3.0.3 only fills these
	  in after a HZ tick has happened.  Workarounds are pretty simple in
	  the low-level layer...

	  A buggy output looks like this (3.0.3 MIPS/Linux Big Endian)

	  -bash-4.1$ ./ctests/multiplex1
	  case1: Does PAPI_multiplex_init() not break regular operation?
	  Added PAPI_TOT_CYC
	  Added PAPI_FP_INS
	  case1:  PAPI_TOT_CYC  PAPI_FP_INS
	  case1:  1843775252

	  case2: Does setmpx/add work?
	  Added PAPI_TOT_CYC
	  Added PAPI_FP_INS
	  case2:  PAPI_TOT_CYC  PAPI_FP_INS
	  case2:  1843773254

	  case3: Does add/setmpx work?
	  Added PAPI_TOT_CYC
	  Added PAPI_FP_INS
	  case3:  PAPI_TOT_CYC  PAPI_FP_INS
	  case3:  1843772919

	  case4: Does add/setmpx/add work?
	  Added PAPI_TOT_CYC
	  Added
	  1843773959    80000037

	  case5: Does setmpx/add/add/start/read work?
	  Added PAPI_TOT_CYC
	  Added PAPI_FP_INS
	  read @start counter[0]: 9223372034707292159
	  read @stop  counter[0]: 1843784577
	  difference  counter[0]: -9223372032863507582
	  multiplex1.c    FAILED
	  Line # 389
	  Error: Difference in start and stop resulted in negative value!
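
	  A sketch of decoding the 24-byte read described in this entry and
	  detecting the missing times.  The three-u64 layout is what
	  perf_events returns for a single (non-group) event opened with
	  PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING;
	  pe_times_missing() is a hypothetical helper, not PAPI code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout of one non-group read() with the two TOTAL_TIME flags set. */
struct pe_read_fmt {
    uint64_t value;          /* the count */
    uint64_t time_enabled;   /* ns the event was enabled */
    uint64_t time_running;   /* ns it was actually on a counter */
};

/*
 * Return 1 if the buffer holds a nonzero count whose enabled or running
 * time is still zero (the kernel behaviour described above), 0 if the
 * times look usable, and -1 if the read was shorter than 24 bytes.
 */
int pe_times_missing(const void *buf, size_t nread)
{
    struct pe_read_fmt r;

    if (nread < sizeof r)
        return -1;
    memcpy(&r, buf, sizeof r);   /* no alignment assumptions on buf */
    return r.value != 0 && (r.time_enabled == 0 || r.time_running == 0);
}
```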


	* src/components/cuda/: linux-cuda.c, linux-cuda.h: Updated CUDA
	  component for CUPTI 4.1 (RC1). Note, SetCudaDevice() should now
	  work with the latest CUDA 4.1 version.


	* src/components/coretemp/linux-coretemp.c: Update coretemp to
	  better handle sparse numbering of the inputs.

	* doc/Doxyfile-everything: Exclude the libpfm* and perfctr-*
	  directories from consideration when generating Doxygen docs.

	* src/: papi.h, components/acpi/linux-acpi.h,
	  components/mx/linux-mx.h, components/net/linux-net.h: Place a
	  space in < your name here > to cleanup doxygen warnings.

	* src/perf_events.c: Only perf event systems that have FAST counter
	  reads and FAST hw timer access are x86...

	* src/linux-common.c: MIPS clock and Linux fixup code

	* src/components/example/example.c: A little more documentation on
	  which of the component vector function pointers are relevant.

	* src/papi_vector.c: Tested the dummy get_{real,virt}_{cyc,usec}
	  functions on zeus, they appear to work.

	* src/components/example/tests/example_multiple_components.c:
	  Another fix to properly skip the multiple component case if CPU
	  component not available.

	* src/components/example/tests/example_multiple_components.c: Skip
	  the test if no CPU component enabled, rather than fail.


	* src/components/example/example.c: Free example_native_table with
	  papi_free, since glibc didn't like it when we just called free()
	  (we allocate it with papi_calloc).

	* man/...: Version number bump. (since the pages are
	  quantifiably different from those released in 4.2.0 )

	* doc/: Doxyfile, Doxyfile-everything, Doxyfile.utils: Bump version
	  number in the doxygen config files.

	* src/components/example/example.c:
	  _papi_example_shutdown_substrate does not have any arguments.

	* src/components/net/linux-net.c: Include ctype.h for isspace().

	* release_procedure.txt: release_procedure now reflects the correct
	  version of doxygen to use.

	* src/ Do not always
	  configure without CPU counters; allow this to be passed in.
	  This lets us use one script for both types of builds we test.

	  src/ Create a script for
	  buildbot to configure with several components.

	  Buildbot runs all command-line arguments through a sanitization
	  step before passing them to sh.  Thus --with-configure="a b c"
	  becomes '--with-configure="a b c"', which is bad. has been instructed to remove this file.

	* man/...: Rebuild the manpages with doxygen 1.7.4 to
	  remove the 's at the end of sentences.

	  The html output looks clean.


	* src/: multiplex.c, papi.c: Fix some gcc-4.6 compile warnings
	  complaining that retval was being set but not used.

	* src/papi.c: Add some extra comments to the PAPI_num_cmp_hwctrs()
	  code that describe its limitations a bit better.


	* src/: ctests/overflow_allcounters.c, testlib/test_utils.c: Add
	  lots of debugging to make results of overflow_allcounters test a
	  bit more clear.

	* src/components/coretemp/tests/coretemp_pretty.c: coretemp_pretty
	  wasn't printing the description for fan inputs.

	  The result on an Apple MacBook Pro (running Linux) now looks like
	  this:

	  Trying all coretemp events
	  Found coretemp component at cid 2
	  hwmon0.temp1_input    value: 33.50 degrees C, applesmc module, label TB0T
	  hwmon0.temp2_input    value: 33.50 degrees C, applesmc module, label TB1T
	  hwmon0.temp3_input    value: 32.00 degrees C, applesmc module, label TB2T
	  hwmon0.temp4_input    value: 0.00 degrees C, applesmc module, label TB3T
	  hwmon0.temp5_input    value: 62.25 degrees C, applesmc module, label TC0D
	  hwmon0.temp6_input    value: 54.25 degrees C, applesmc module, label TC0F
	  hwmon0.temp7_input    value: 57.25 degrees C, applesmc module, label TC0P
	  hwmon0.temp8_input    value: 69.00 degrees C, applesmc module, label TG0D
	  hwmon0.temp9_input    value: 58.00 degrees C, applesmc module, label TG0F
	  hwmon0.temp10_input   value: 51.25 degrees C, applesmc module, label TG0H
	  hwmon0.temp11_input   value: 58.25 degrees C, applesmc module, label TG0P
	  hwmon0.temp12_input   value: 60.75 degrees C, applesmc module, label TG0T
	  hwmon0.temp13_input   value: 62.25 degrees C, applesmc module, label TN0D
	  hwmon0.temp14_input   value: 59.25 degrees C, applesmc module, label TN0P
	  hwmon0.temp15_input   value: 49.00 degrees C, applesmc module, label TTF0
	  hwmon0.temp16_input   value: 54.00 degrees C, applesmc module, label Th2H
	  hwmon0.temp17_input   value: 58.75 degrees C, applesmc module, label Tm0P
	  hwmon0.temp18_input   value: 31.50 degrees C, applesmc module, label Ts0P
	  hwmon0.temp19_input   value: 44.25 degrees C, applesmc module, label Ts0S
	  hwmon0.fan1_input     value: 1999 RPM, applesmc module, label Left side
	  hwmon0.fan2_input     value: 2003 RPM, applesmc module, label Right side
	  coretemp_pretty.c               PASSED

	* src/components/coretemp/: linux-coretemp.c, linux-coretemp.h,
	  tests/coretemp_pretty.c: Make the coretemp code a bit pickier
	  about which events it supports.  Add descriptions to the events.
	  Also add support for Voltage (in*) events.

	  On an amd14h machine I have access to, coretemp_pretty now prints:

	    Trying all coretemp events
	    Found coretemp component at cid 2
	    hwmon0.in1_input value: 1.31 V, it8721 module, label ?
	    hwmon0.in2_input value: 2.22 V, it8721 module, label ?
	    hwmon0.in3_input value: 3.34 V, it8721 module, label +3.3V
	    hwmon0.in4_input value: 1.02 V, it8721 module, label ?
	    hwmon0.in5_input value: 1.52 V, it8721 module, label ?
	    hwmon0.in6_input value: 1.13 V, it8721 module, label ?
	    hwmon0.in7_input value: 3.26 V, it8721 module, label 3VSB
	    hwmon0.in8_input value: 3.17 V, it8721 module, label Vbat
	    hwmon0.temp1_input value: 28.00 degrees C, it8721 module, label ?
	    hwmon0.temp2_input value: -128.00 degrees C, it8721 module, label ?
	    hwmon0.temp3_input value: -128.00 degrees C, it8721 module, label ?
	    hwmon0.fan1_input value: 0 RPM
	    hwmon0.fan2_input value: 1320 RPM
	    hwmon1.temp1_input value: 33.00 degrees C, jc42 module, label ?
	    hwmon2.temp1_input value: 31.75 degrees C, jc42 module, label ?
	    hwmon3.temp1_input value: 53.00 degrees C, radeon module, label ?
	    hwmon4.temp1_input value: 53.12 degrees C, k10temp module, label ?
	    coretemp_pretty.c    PASSED
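	  The pickier event selection and the units in the output above can
	  be sketched as follows.  This illustrates the idea under stated
	  assumptions and is not the linux-coretemp.c code.  classify() and
	  format_reading() are hypothetical helpers.  The unit conversions
	  follow the Linux kernel hwmon sysfs interface: temperatures are in
	  millidegrees Celsius, voltages in millivolts, and fans in RPM.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Kinds of hwmon inputs a "picky" enumerator might accept (hypothetical). */
typedef enum { HW_NONE, HW_TEMP, HW_VOLT, HW_FAN } hw_kind;

/* Accept only names of the form <prefix><digits>_input, such as
 * "temp3_input", "in0_input" or "fan2_input".  Labels, alarms and
 * limits are rejected. */
hw_kind classify(const char *name)
{
    static const struct { const char *prefix; hw_kind kind; } tbl[] = {
        { "temp", HW_TEMP }, { "in", HW_VOLT }, { "fan", HW_FAN },
    };
    for (size_t i = 0; i < sizeof tbl / sizeof tbl[0]; i++) {
        size_t n = strlen(tbl[i].prefix);
        if (strncmp(name, tbl[i].prefix, n) != 0)
            continue;
        const char *p = name + n;
        if (!isdigit((unsigned char)*p))
            return HW_NONE;              /* e.g. "intrusion0_alarm" */
        while (isdigit((unsigned char)*p))
            p++;
        return strcmp(p, "_input") == 0 ? tbl[i].kind : HW_NONE;
    }
    return HW_NONE;
}

/* Format a raw sysfs reading in the style of the pretty test:
 * millidegrees C and millivolts are scaled by 1000, and RPM is printed
 * as-is.  Returns the snprintf result, or -1 for unknown kinds. */
int format_reading(hw_kind kind, long raw, char *buf, size_t len)
{
    switch (kind) {
    case HW_TEMP: return snprintf(buf, len, "%.2f degrees C", raw / 1000.0);
    case HW_VOLT: return snprintf(buf, len, "%.2f V", raw / 1000.0);
    case HW_FAN:  return snprintf(buf, len, "%ld RPM", raw);
    default:      return -1;
    }
}
```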

	* src/components/coretemp/: linux-coretemp.c,
	  tests/coretemp_pretty.c: A cut-and-paste error slipped into that
	  last commit.  Fixes a build issue.

	* src/components/coretemp/: linux-coretemp.c, tests/Makefile,
	  tests/coretemp_pretty.c: Clean up coretemp with same cleanups
	  done in example component.

	  Add a new test, "coretemp_pretty", that prints coretemp results in
	  a more user-friendly way.

	* man/...: Rebuild the man pages with a newer version of
	  doxygen (older versions of doxygen had a nasty bug in man
	  output).

	  Also reworked the utilities documentation to remove pages for the
	  files.  Thanks to Jose Pedro Oliveira for pointing this out.

	* src/components/example/tests/: Makefile,
	  example_multiple_components.c: Add a test that makes sure you can
	  have active EventSets on multiple components at the same time.

	* release_procedure.txt: Change PATH specification to include tcsh
	  syntax; other minor syntax corrections.

	* src/components/example/example.c: More cleanups and documentation
	  for the example component.


	* src/components/example/example.c: Some more major overhaul of the
	  example component.  A lot more documentation, plus make it behave
	  a lot more like a real component would.

	* doc/Doxyfile.utils: Turn off undocumented warnings for the utils
	  doxygen run.

	* src/utils/: avail.c, command_line.c, cost.c, event_chooser.c,
	  multiplex_cost.c: Add spaces to the comments so doxygen doesn't
	  think <event> is an xml tag.


	* src/utils/: avail.c, clockres.c, command_line.c, component.c,
	  cost.c, decode.c, error_codes.c, event_chooser.c, mem_info.c,
	  multiplex_cost.c, native_avail.c: Remove the @file directive from
	  the doxygen comment blocks for the utilities.  This cleans up the
	  generated man pages (we no longer build *.c.1).

	* src/components/example/: example.c, tests/example_basic.c:
	  Clarify in the example component that ->reset only gets called if
	  an eventset is currently running.

	  Extend the example_basic test to test PAPI_reset().
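	  Below is a rough sketch of the dispatch rule the clarified comment
	  describes.  It uses hypothetical types (event_set,
	  component_vector) rather than PAPI's real internal structures.
	  The framework forwards a reset to the component only while the
	  EventSet is running.  For a stopped EventSet, the sketch assumes
	  for illustration that clearing the saved values is enough.

```c
#include <string.h>

#define NUM_CTRS 4

/* Hypothetical, simplified EventSet state. */
typedef struct {
    int running;                 /* nonzero between start and stop */
    long long hw[NUM_CTRS];      /* pretend hardware counters */
    long long saved[NUM_CTRS];   /* values captured at stop time */
} event_set;

/* Hypothetical component vector with a single reset hook. */
typedef struct {
    int (*reset)(event_set *es);
} component_vector;

/* Example component hook: zero the (pretend) hardware counters. */
static int example_reset(event_set *es)
{
    memset(es->hw, 0, sizeof es->hw);
    return 0;
}

/* Framework-level reset: only a running EventSet reaches the
 * component's ->reset hook; a stopped one has its saved values cleared
 * without touching the component. */
int framework_reset(const component_vector *v, event_set *es)
{
    if (es->running)
        return v->reset(es);
    memset(es->saved, 0, sizeof es->saved);
    return 0;
}
```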

	* release_procedure.txt: Fix a make target typo.

	* release_procedure.txt: We now have a good version of doxygen
	  installed on most icl run machines
	  (/mnt/scratch/sw/doxygen-)

	* doc/doxygen_procedure.txt: [no log message]

	* release_procedure.txt: Update release_procedure to describe how
	  to update the website documentation link.


	* RELEASENOTES.txt: Correct the RELEASENOTES for some things I
	  missed when reviewing it.

	  It's Offcore events that we don't support on

	  Also the power6 libpfm4 bug that was listed as an outstanding bug
	  was fixed a long time ago.

	* src/components/coretemp/linux-coretemp.c: Have coretemp set the
	  num_native_events field.

	* src/components/example/tests/example_basic.c: Update example test
	  to print num_native_events, to help debug issues with other
	  components not updating the value.

	* src/components/coretemp/: linux-coretemp.c, linux-coretemp.h: Fix
	  typo enent -> event.  Also remove residual LMSENSOR mentions from
	  the coretemp header.

	* src/papi_libpfm4_events.c: Fix two memory leak locations.

	  The attached patch reduces the number of lost memory blocks
	  reported by valgrind from 234 to 39.  It frees the memory
	  allocated by the 4 strdups and the calloc functions in

	  Patch by: José Pedro Oliveira
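	  Leaks like the strdup() and calloc() ones described above are
	  usually fixed by giving each allocating path a matching release
	  function.  Below is a minimal sketch using a hypothetical
	  event_rec structure.  It is not the actual papi_libpfm4_events.c
	  data.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() */
#include <stdlib.h>
#include <string.h>

/* Hypothetical record whose fields are filled with strdup()/calloc(). */
typedef struct {
    char *name;
    char *desc;
    char *units;
    char *equiv;
    int  *codes;     /* calloc'd array */
    size_t ncodes;
} event_rec;

/* Release every heap field.  This is safe on partially built records
 * because free(NULL) is a no-op. */
void event_rec_free(event_rec *r)
{
    if (r == NULL)
        return;
    free(r->name);
    free(r->desc);
    free(r->units);
    free(r->equiv);
    free(r->codes);
    memset(r, 0, sizeof *r);  /* a second call becomes harmless */
}

/* Fill a record with four strdup'd strings and one calloc'd array.
 * On any failure, everything already allocated is freed. */
int event_rec_init(event_rec *r, const char *name, const char *desc,
                   const char *units, const char *equiv, size_t ncodes)
{
    memset(r, 0, sizeof *r);
    r->name  = strdup(name);
    r->desc  = strdup(desc);
    r->units = strdup(units);
    r->equiv = strdup(equiv);
    r->codes = calloc(ncodes ? ncodes : 1, sizeof *r->codes);
    r->ncodes = ncodes;
    if (!r->name || !r->desc || !r->units || !r->equiv || !r->codes) {
        event_rec_free(r);
        return -1;
    }
    return 0;
}
```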

	* src/components/cuda/tests/Makefile: The change to pass the PAPI
	  CC/CFLAGS to the component tests broke the nvidia test as it
	  wants CC to be nvcc.  So update that Makefile to use nvcc.


	* src/components/example/tests/example_basic.c: Improve the
	  example_basic component test to be much more comprehensive.

	* src/components/example/: example.c, tests/HelloWorld.c,
	  tests/Makefile, tests/example_basic.c: Clean up the example test.
	  Fix various mistakes in the comments as well as add better error

	  Also rename the "HelloWorld" test to "example_basic"

	* src/components/coretemp/tests/Makefile: The coretemp_test target
	  was example_test due to cut-and-paste error.

	  Patch from Jose Pedro Oliveira

	* src/ Add a component_tests dependency so that the
	  component_tests are made during a make -j build

	* src/ Make sure the component test makefiles get
	  passed the CC and CFLAGS definitions.

	* src/components/coretemp/: linux-coretemp.c, tests/Makefile,
	  tests/coretemp_basic.c: Fix up the coretemp component some more.
	  Make sure the enumerate function returns PAPI_ENOEVNT if no
	  events are available.

	  Update the Makefile so it has proper dependencies.

	  Update the test so it prints the first event available.  (The
	  latter based on a patch from Jose Pedro Oliveira)
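	  Below is a hedged sketch of an enumerate routine that reports "no
	  events" instead of inventing one.  The function name, the
	  ENUM_FIRST/ENUM_NEXT modifiers and the SKETCH_* return codes are
	  hypothetical stand-ins.  Real component code would use the
	  constants from papi.h, such as PAPI_ENOEVNT.

```c
#include <stddef.h>

/* Placeholder return codes for this sketch; use papi.h values in
 * real code. */
#define SKETCH_OK       0
#define SKETCH_ENOEVNT (-7)

#define ENUM_FIRST 0   /* hypothetical modifiers */
#define ENUM_NEXT  1

/* Hypothetical enumerate: event codes are indices into a table of
 * num_events entries discovered at init time. */
int enum_events(size_t num_events, unsigned *code, int modifier)
{
    if (modifier == ENUM_FIRST) {
        if (num_events == 0)
            return SKETCH_ENOEVNT;   /* nothing to enumerate */
        *code = 0;
        return SKETCH_OK;
    }
    if (modifier == ENUM_NEXT) {
        if ((size_t)*code + 1 >= num_events)
            return SKETCH_ENOEVNT;   /* ran off the end */
        (*code)++;
        return SKETCH_OK;
    }
    return SKETCH_ENOEVNT;           /* unknown modifier */
}
```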

	* src/: solaris-ultra.c, ctests/all_native_events.c: The
	  solaris-ultra substrate was still broken.  This is because recent
	  changes to component bind time explicitly used the ->set_domain()
	  call, and this vector was not set up in solaris_ultra.

	  Also made the all_native_events test report the returned error
	  value to aid in debugging problems like this in the future.
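	  The failure mode here is calling through a function-pointer entry
	  that the substrate never filled in.  The fix recorded above sets
	  the vector up in solaris-ultra.c.  The sketch below only shows a
	  defensive bind-time check that surfaces such an error instead of
	  crashing, in the same spirit as reporting the returned value in
	  all_native_events.  substrate_vector, bind_domain and SK_ECMP are
	  hypothetical names, and the error code is a placeholder.

```c
#include <stdio.h>

#define SK_OK    0
#define SK_ECMP (-4)   /* placeholder "component error" code */

/* Hypothetical component vector: a substrate that forgets to fill in
 * set_domain leaves the pointer NULL. */
typedef struct {
    const char *name;
    int (*set_domain)(int domain);
} substrate_vector;

/* A substrate that does implement the hook. */
static int ok_set_domain(int domain)
{
    (void)domain;
    return SK_OK;
}

/* Bind-time helper: refuse to call through a missing vector entry and
 * return an error the caller can print. */
int bind_domain(const substrate_vector *v, int domain)
{
    if (v->set_domain == NULL) {
        fprintf(stderr, "%s: set_domain not implemented\n", v->name);
        return SK_ECMP;
    }
    return v->set_domain(domain);
}
```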