Blob Blame History Raw

	* src/Rules.pfm_pe: The --with-bitmode parameter was not being
	  passed along to libpfm3,  so it was not possible to build
	  perf_event PAPI on non-default bitmodes.  This change passes
	  along the $(BITFLAGS) value to the libpfm3 make invocation.

	* src/: papi_pfm_events.c, papi_pfm_events.h, perf_events.c: The
	  perf_events code was using __u64 instead of uint64_t and this was
	  causing a warning when compiling for 64-bit Power.

	* src/libpfm-3.y/lib/amd64_events_fam15h.h: Added Robert Richter's
	  patch with a few new events for AMD Family 15h.


	* INSTALL.txt: Load the 'gcc' module not 'gnu' module for Cray.

	* INSTALL.txt: Update the install instructions for Cray XT and XE

	* src/ctests/: multiattach.c, multiattach2.c: Make the multiattach
	  and multiattach2 failures into warnings.

	  I have a proposed fix that makes the failures go away, but it has
	  not been tested much and also causes some new fcntl() error
	  messages under perfctr.

	  So temporarily make the tests only warn for the release and I'll
	  work on a proper fix for after.  The behavior in these tests has
	  been broken for a long time so it is not a recent regression.

	* src/papi_memory.c: Band-aid for the leak debugging statement in
	  papi_memory.c on NO_VARARG_MACRO systems.  (aix currently)


	* src/ctests/multiattach.c: Had the division backwards on the

	* src/ctests/multiattach.c: Update the multiattach test to fail if
	  the results aren't in the proper ratio.  This was failing on
	  perf_event kernels but since the results weren't checked   it was
	  never reported as an error.

	* delete before release

	* ChangeLogP413.txt: First cut change log for the 4.1.3 release.
	  Nothing's frozen yet...

	* Perl script to generate change logs. Keeping it with
	  the project makes life easier.

	* INSTALL.txt: Change INSTALL to reflect that we support power7.

	* src/, src/configure, src/, src/papi.h,
	  doc/Doxyfile, doc/Doxyfile-everything, papi.spec: Modfy version
	  number for pending release:


	* src/: papi_internal.c, papi_internal.h, sys_perf_event_open.c,
	  ctests/attach2.c: Cleanup the _papi_hwi_cleanup_eventset()
	  function in papi_internal.c

	  This function was re-using existing functionality to remove one
	  event at a  time before cleaning out the eventset.  This is not
	  strictly necessary  and was breaking on perf_event eventsets that
	  were attached to  finished processes, as a call to
	  update_control_state() would close/reopen  the perf_event fd,
	  failing when the finished process went away after the close.

	  The new code removes all events from the eventset in one go
	  before  calling update_control_state.

	  The change here also updates code comments as necessary, as some
	  of the  code in papi_internal.c can be a bit obscure.

	  It also updates some of the comments in ctests/attach2.c to give
	  better  debugging info.


	* src/threads.c: Uncomment the actual signal passing functionality
	  in _papi_hwi_broadcast_signal

	* src/papi_debug.h: Include files added to papi_debug.h

	* src/components/README: Added detailed instructions on how to
	  build PAPI with the CUDA component


	* src/threads.c: Move an escape test to the outer loop in

	  This cleans up an infinite loop where before we would only break
	  out of the component look, not the thread list walking loop.

	* src/: papi.c, papi_internal.c, papi_internal.h, papi_protos.h:
	  Clean up papi_internal.c so that functions not used outside are
	  marked static.

	* src/: papi_pfm_events.c, papi_preset.c, pmapi-ppc64_events.c:
	  papi: Fix some memory leaks

	  Signed-off-by: Robert Richter <>

	* src/perf_events.c: papi: Make functions and variables static in

	  All this functions and variables are not used outside
	  perf_events.c.  Making them static.

	  Signed-off-by: Robert Richter <>

	* src/papi_pfm_events.c: papi: Fix crash in error handler for

	  Signed-off-by: Robert Richter <>

	* src/utils/native_avail.c: papi: Fix error check in native_avail.c

	  Signed-off-by: Robert Richter <>


	* src/libpfm-3.y/: include/perfmon/pfmlib_amd64.h,
	  lib/pfmlib_amd64.c: AMD architectural PMU could not be detected
	  for family 15h as there was a strict check for AMD family 10h.
	  Enabling it now for all families from 10h.

	  Signed-off-by: Robert Richter

	* src/libpfm-3.y/lib/amd64_events_fam15h.h: There is no kernel
	  support for AMD family 15h northbridge events, disabling them in
	  libpfm3 to not report them as available native events.

	  Patch from Robert Richter

	* src/: configure,, linux-common.c: Add some extra
	  debug messages for better tracking of the   --with-assumed-kernel
	  configure option.


	* src/: configure,, linux-common.c: Add a new
	  configure option:   --with-assumed-kernel=<ver> This allows you
	  to specify a kernel revision to (instead of being   autodetected
	  with uname) for perf_event workaround purposes.  With   this you
	  can force PAPI to not use workarounds on kernels with
	  backported versions of perf_event features.


	* src/:, configure,, papi_debug.h,
	  papi_internal.h, sys_perf_event_open.c: Add debugging to
	  sys_perf_event_open.c to show exactly what   values are being
	  passed to the perf_event_open syscall.


	* src/:, ctests/attach2.c, ctests/attach3.c: Fix for
	  finding attach_target with execlp to search the path.


	* src/: Rules.pfm, configure,, linux-ia64-pfm.h,
	  linux-ia64.c, linux-ia64.h, perfmon-ia64-pfm.h, perfmon-ia64.c,
	  perfmon-ia64.h, perfmon.h: Rename the linux-ia64-* files to be
	  called perfmon-ia64-*

	  This is a more descriptive name, and makes it more obvious what
	  the files are for.

	* src/libpfm-3.y/: include/perfmon/pfmlib_amd64.h,
	  lib/pfmlib_amd64.c, lib/pfmlib_amd64_priv.h: Patch to have
	  libpfm3 use 6 counters on Interlagos.

	  Patch provided by Robert Richter

	* src/linux-memory.c: Fix the POWER cache detection routines to
	  work properly on POWER7.

	  Patch provided by Corey Ashford

	* src/: configure, Have configure check for ifort if
	  gfortran, etc, not found.

	  Patch by Gary Mohr

	* src/ctests/johnmay2.c: Update the validation message on the
	  ctests/johnmay2.c test to be	 less confusing.  Also add some
	  comments to the source code.

	  Problem reported by Steve Kaufmann.


	* src/ctests/: multiattach2, multiattach2.c: Remove the
	  accidentally added ctests/multiattach2 and add instead the proper

	* src/ components_config.h is cleaned out with make
	  clobber, not make clean this should fix the build bot issues.

	* src/ctests/: Makefile, attach3.c, multiattach.c, multiattach2,
	  zero_attach.c: Minor typos in comments. Discovered another bug in
	  attach code demonstrated by multiattach2. You cannot have an
	  eventset running that is self counting as well as one that is
	  attached. PAPI thinks that both are running and throws an error.

	* src/perf_events.c: We must update the control state after
	  attaching for perf_events, zero_attach now passes

	* src/ctests/: Makefile, attach2.c, attach3.c, do_loops.c: This
	  commit adds testing of attaching to fork/exec'd executables.
	  zero_attach and multiattach just test forks. This also modifies
	  do_loops.c to be able to generate a test driver when
	  -DDUMMY_DRIVER is defined so we can use it to generate flops as a
	  sub process.

	  Attach2 and attach3 have one important difference.

	  Attach3 does a 'assign component' before attaching and then
	  adding events.  Attach2 does not assign a component and thus
	  should inherit the default component.

	  The current bug in PAPI is that: * The default component is not
	  assigned until you add an event.  * However, attaching an
	  eventset without events is perfectly valid, but we get an error.

	  Possible solution is that the default component should be
	  assigned at create time.


	* src/ctests/multiattach.c: Make sure the two processes compute
	  different numbers of flops to test attach


	* src/power7_events.h: Turns out Maynard Johnson answered my
	  questions about the native_name enum back in December.  ( this is
	  a correct version of the events file )

	  As I found out, the AIX substrates do not use the native_name
	  enum.  But a hypothetical perfctr build would.


	* src/ Clear out the components_config.h file on make

	* src/: aix.c, power7_events.h: Initial support for power7 aix, the
	  events file is a copy of power6_events.h with the number of
	  groups changed. The native_name enum is unchanged, but unused?


	* src/ Commited wrong

	* src/: configure, Clean up setting bitmode flags for
	  non-gcc (xlc in this case) compilers.

	* src/papi_events.csv: Change the Nehalem PAPI_FP_OPS event from

	  The new event gives the same results as the previous one, with
	  the added benefit of also counting 32-bit compiled x87 fp ops

	  More detailed analysis can be found here:


	* src/utils/multiplex_cost.c: Turns out that getopt_long isn't as
	  standard as I had hoped.

	  Convert multiplex_cost to use only getopt.  -s disables software
	  multiplexing -k disables kernel multiplexing


	* src/: configure, utils/Makefile, utils/multiplex_cost.c, Multiplex_cost utility.

	* src/utils/: Makefile, cost.c, cost_utils.c, cost_utils.h: Split
	  off the statistics functions from cost.


	* src/: run_tests_exclude_cuda.txt, Exclude some
	  fork/thread tests from fulltest that won't run with CUDA (reason:
	  cannot invoke same GPU from different threads)


	* src/utils/cost.c: Add a test for DERIVED_[ADD | SUB ] events to


	* src/components/cuda/linux-cuda.c: all_native_events ctest failed
	  when CUDA Component is used. Reason: removing cuda events from
	  the eventset is currently not supported. According to the NVIDIA
	  folks this is a bug in cuda 4.0rc and will be fixed in rc2. Note
	  also, several fork and thread tests fail since it's illegal to
	  invoke the same GPU device from different processes / threads. We
	  need a mechanism that allows us to run tests for the CPU
	  component only.


	* src/utils/cost.c: Add a test case to cost util, look for a
	  derived-postfix event and if found, give timing information for
	  read calls to it.

	  This is just a first run at the test, Core2 and AMD have
	  candidate events and the test runs, but that is the extent of my
	  testing so far.


	* src/components/: README, cuda/, cuda/Rules.cuda,
	  cuda/configure, cuda/, cuda/linux-cuda.c,
	  cuda/linux-cuda.h: Added CUDA component, a hardware performance
	  counter measurement technology for the NVIDIA CUDA platform which
	  provides access to the hardware counters inside the GPU. PAPI
	  CUDA is based on CUPTI support - shipped with CUDA 4.0rc - in the
	  NVIDIA driver library. In any environment where the CUPTI-enabled
	  driver is installed, the PAPI CUDA component can provide detailed
	  performance counter information regarding the execution of GPU

	* src/components/: coretemp/linux-coretemp.c,
	  lustre/linux-lustre.c: Add some missing includes to components.

	  Thanks to Will Cohen for reminding us warnings matter. :)

	* src/: configure,, perf_events.c: The SYNC_READ
	  workaround in perf_events.c was being handled at compile time,
	  rather than at run-time like all of our other workarounds.

	  Change it to be like our other kernel-version related


	* src/ctests/multiplex1_pthreads.c: Between 4.0.0 and 4.1.0 a
	  pthread_exit() call was added to ctest/multiplex1_pthreads.c that
	  caused the test to exit partway through the test and without
	  doing a proper PASS/FAIL result.

	  This changeset backs out that change, though the original change
	  was marked as a memory leak fix so a different fix may be needed.

	  Reported by Steve Kaufmann

	* src/linux-timer.c: Add missing header needed by
	  --with-virtualtimer=times build.

	  Reported by Steve Kaufmann


	* src/: papi_pfm_events.c, perf_events.c: Fix broken Linux/PPC
	  build caused by my pfm_events code movement changes.


	* src/: papi_pfm_events.c, papi_pfm_events.h, perfctr-x86.h: My
	  changes yesterday broke the perfctr build.  This should fix it.


	* src/ctests/inherit.c: Make the inherit test respect TESTS_QUIET
	  so that it does not print extra output during a run

	* src/ctests/overflow.c: Fix missing newline in the overflow

	  Reported by Gary Mohr

	* src/: papi_pfm_events.c, papi_pfm_events.h, perf_events.c: Move
	  the libpfm3 specific functions from perf_events.c into

	* src/perf_events.c: Separate the libpfm3-specific code from
	  _papi_pe_init_substrate() and _papi_pe_update_control_state()
	  into their own functions.  This will allow eventual code sharing
	  and also make the libpfm4 merge easier.

	* src/perf_events.c: Some minor cleanups I found after reviewing
	  the inherit merge.	+ Add missing "static inline" to the new
	  kernel-version codes	  + Remove duplicated test for Pentium 4
	  + Fix a warning only seen if --with-debug is enabled

	* src/: papi.c, papi.h, papi_internal.h, perf_events.c,
	  perf_events.h, ctests/Makefile, ctests/inherit.c,
	  ctests/test_utils.c: Merging Gary Mohr's re-implementation of
	  inherit into the code base.  Thanks, Gary!


	* src/: any-null.h, freebsd.h, linux-bgp.h, linux-common.c,
	  linux-common.h, linux-context.h, linux-ia64.c, linux-ia64.h,
	  linux-lock.h, linux-memory.c, linux-ppc64.h, linux-timer.c,
	  papi_internal.h, papi_pfm_events.c, perf_events.c, perf_events.h,
	  perfctr-x86.h, perfctr.c, perfmon.h, solaris-niagara2.h,
	  solaris-ultra.h, solaris.h, x86_cache_info.c: Move some more
	  duplicated OS common code (in this case the locking code and the
	  context accessing code) out of the various substrate include
	  files and into a common location.


	* src/perf_events.c: Separate out the kernel-version dependent
	  checks and group them together near the beginning of the code.
	  This not only allows us to easily see which routines are
	  kernel-version dependent, but it makes it easier to disable the
	  checks one-by-one when debugging kernel-version related issues
	  like those found with the inherit patches.


	* src/papi_internal.c: Extend _papi_hwi_cleanup_eventset to free
	  memory and better cleanup after us.


	* src/papi.c: PAPI_assign_eventset_component changed; refuses to
	  reassign components.


	* src/: papi_events.csv, libpfm-3.y/include/perfmon/pfmlib.h,
	  libpfm-3.y/lib/pfmlib_amd64_priv.h: Add support for AMD Family
	  15h processors.  Also adds suport for Family 10h RevE

	  Patches provided by Robert Richter

	* src/utils/native_avail.c: Modify papi_native_avail to properly
	  handle event names with libpfm4-style  "::" separators in them.


	* src/ make install-doxyman will build/install the
	  doxygen version of the manpages.

	  Note that these pages are very rough right now, much work is
	  needed to get them to be a drop in replacement for the current
	  man pages. (mostly formatting related/use related issues, eg man
	  PAPI_start will not work yet; the content is there.)

	* doc/Makefile: Add install target for doxygen generated man pages.


	* src/: perfctr-x86.c, perfctr.c: perfctr-2.6.42 introduced
	  PERFCTR_X86_INTEL_WSTMR PAPI added support for
	  PERFCTR_X86_INTEL_WSMR notice the missing T

	  Fix PAPI to use the proper define.  This should fix Westmere
	  support on perfctr kernels.


	* src/: papi_protos.h, papi_vector.c, papi_vector.h,
	  papi_vector_redefine.h: Added function pointer destroy_eventset
	  to the PAPI vector table. Needed for the CUDA Component to
	  disable CUDA eventGroups, to destroy floating CUDA context, and
	  to free perfmon hardware on the GPU. (Note: the CUDA Component
	  cannot be released yet since we are still under NDA with NVIDIA.
	  Stay tuned.)


	* src/x86_cache_info.c: The cpuid leaf2 code was printing a message
	  to stderr if leaf4 was needed (only happens on Westmere
	  currently).  Change this to be a MEMDBG() debug message instead.


	* src/: papi_events.csv, perfctr-x86.c: perfctr-x86 was reporting
	  "Core i7" instead of "Nehalem".  i7 can mean Westmere or Sandy
	  Bridge too, so change the code to properly report Nehalem.


	* src/ctests/all_native_events.c:
	  Fix this ctest. It failed when the package was built with several
	  components because the eventset was reused and failed to add
	  events that were not from the first component.

	  In order to fix it, I recreate & destroy the eventset when the
	  current event does not belong to the previous component.


	* src/: configure,, linux-timer.c, perfmon.c: Fix Cray
	  CLE build.

	* src/: configure, Putting -Wall in cflags now
	  requires CC = gcc

	* src/: aix.c, freebsd.c, linux-bgp.c, linux-common.c,
	  linux-memory.c, linux-memory.h, papi.c, papi_protos.h,
	  papi_vector.c, papi_vector.h, solaris-niagara2.c,
	  solaris-ultra.c, windows-common.c, windows-memory.c: Change the
	  paramaters passed to update_shlib_info() to match better with
	  those passed to get_system_info().  This only affects the
	  substrates, outside users of PAPI will not notice this change.


	* src/: configure, Make sure that aix gets -g.

	* src/: configure, Give everyone else -g when
	  configuring with debug.

	  To wit, we pass gcc -g3 but neglected platforms where CC!=gcc.

	* src/aix.c: First run at supporting power7.  NOTE: this code is
	  only good for getting event listings eg papi_native_avail,

	  passing PM_GET_GROUPS causes our code to segfault later on, a
	  buffer overflow I'm still tracking down.

	* src/perfctr-x86.c: Accidentally converted a function to _perfctr_
	  that should have stayed _linux_.

	* src/: perfctr-x86.c, perfctr.c: Rename the various perfctr
	  functions to be _perfctr_ rather than _linux_.  This way _linux_
	  is reserved for the common functions used by all.

	* src/: linux-common.c, linux-memory.c, linux-timer.c,
	  perf_events.c, windows-common.c, windows-memory.c,
	  windows-timer.c: Split the WIN32 specific code out from the new
	  linux common code.

	  In most cases very little code was shared (it tended to be a big
	  #ifdef block) and it is confusing to have windows-specific code
	  in files named linux-*


	* src/linux-timer.c: Fix a compile error that only shows up on PPC.

	* src/linux-timer.c: Fix compile warning if mmtimer is enabled.

	* src/perfctr-x86.c: Missing comma in the perfctr code.

	* src/:, aix.c, configure,,
	  hwinfo_linux.c, linux-bgp.c, linux-common.c, linux-common.h,
	  linux-ia64.c, linux-timer.c, linux-timer.h, papi_vector.h,
	  perf_events.c, perfctr-x86.c, perfctr.c, perfmon.c,
	  solaris-niagara2.c, solaris-ultra.c: One last batch of
	  consolidation changes.

	  This one moves get_system_info and get_cpu_info into
	  linux-common.c, plus moves some other routines from perf_events.c
	  there that are shared by the future libpfm4 version.

	  Some non-linux substrates are touched here; these are just short
	  fixes to   make sure the get_system_info() function pointed to by
	  the papi_vector   has the same format on all substrates.

	* src/:, configure,, linux-memory.c,
	  linux-memory.h, perf_events.c, perfctr-x86.c, perfctr.c,
	  perfmon.c: Move the various Linux update_shlib_info() functions
	  into a common place.

	* src/:, linux-timer.c, linux-timer.h, perf_events.c,
	  perfctr-x86.c, perfctr.c, perfmon.c: Move the various
	  timer-related functions to linux-timer.c This gets rid of the
	  duplicated code spread throughout the substrates.


	*, release_procedure.txt: Updated the
	  release docs with what I learned when making the release.

	* src/: configure,, freebsd-memory.c,
	  linux-ia64-memory.c, linux-memory.c, linux-memory.h,
	  linux-mx-memory.c, linux-ppc64-memory.c, perf_events.c,
	  perfctr-x86.c, perfmon-memory.c, perfmon.c: Currently there are
	  at least 3 identical copies of the linux memory detection code
	  spread throughout the PAPI source code.

	  This change puts them all in linux-memory.c, and then has all the
	  individual substrates use the common code.