2012-02-13
* src/components/net/linux-net.c: Repairing more coverity warnings.
2012-02-11
* src/windows-common.c: Fix one instance of "CPU's" (now "CPUs") that was missed yesterday.
* src/: papi_internal.c, threads.c: This change fixes two race
conditions that are probably the cause of the pthrtough
double-free error.
When freeing a thread, we remove and free all eventsets belonging
to that thread. This could race with the thread itself removing
the eventset, causing some ESI fields to be freed twice.
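The double free comes from two paths that both believe they own the same eventset. A minimal sketch of one way to make the release idempotent, assuming C11 atomics: whoever atomically swaps the pointer out of its slot is the only one allowed to free it. This is a swapped-in technique for illustration, not the fix that went into papi_internal.c or threads.c; `demo_esi`, `demo_slot` and `demo_release` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for PAPI's per-eventset state (ESI). */
typedef struct {
    long long *values;
} demo_esi;

/* A slot that two code paths may race to clear: the thread removing its
 * own eventset, and the cleanup that frees all eventsets of a thread. */
typedef struct {
    _Atomic(demo_esi *) esi;
} demo_slot;

/* Atomically detach the eventset from the slot. Only the caller that gets
 * back a non-NULL pointer frees it, so the fields are freed exactly once
 * no matter how the two paths interleave. Returns 1 if this call freed it. */
static int demo_release(demo_slot *slot)
{
    demo_esi *esi = atomic_exchange(&slot->esi, NULL);
    if (esi == NULL)
        return 0; /* the other path already released it */
    free(esi->values);
    free(esi);
    return 1;
}
```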
The problem was found using the Valgrind 3.8 Helgrind tool:
valgrind --tool=helgrind --free-is-write=yes ctests/pthrtough
In order for Helgrind to work, I had to temporarily modify PAPI
to use POSIX pthread mutexes for locking. Is there any reason we
don't use these all the time?
2012-02-10
* src/utils/: avail.c, component.c, event_chooser.c,
native_avail.c: Fix one more case of "CPU's" in the print header
code.
Also remove the extraneous "The following correspond to fields
in the PAPI_event_info_t structure." message.
* src/: testlib/papi_test.h, testlib/test_utils.c,
ctests/all_native_events.c, ctests/calibrate.c,
ctests/code2name.c, ctests/hwinfo.c: Fix one more case of "CPU's"
in the print header code.
Also remove the extraneous "The following correspond to fields
in the PAPI_event_info_t structure." message.
* src/buildbot_configure_with_components.sh: take infiniband out of
the buildbot test.
* src/: x86_cache_info.c, components/coretemp/linux-coretemp.c,
components/lmsensors/linux-lmsensors.c,
components/lustre/linux-lustre.c, components/net/linux-net.c,
utils/event_chooser.c: Fix coverity errors reported by Will
Cohen.
* src/: aix.c, any-proc-null.c, linux-common.c, papi.c, papi.h,
papivi.h, solaris-niagara2.c, solaris-ultra.c,
ctests/clockres_pthreads.c: Address Red Hat bug 785975. The
plural of CPU appears to be CPUs.
* src/Makefile.inc: Clean up dependencies to allow parallel
makes. Patch due to Will Cohen from Red Hat.
2012-02-09
* src/buildbot_configure_with_components.sh: Add infiniband and mx
components to buildbot component tests.
* src/components/net/tests/: net_values_by_code.c,
net_values_by_name.c: Apply patch suggested by Will Cohen to
check for system return values.
* src/components/lmsensors/linux-lmsensors.h: Added missing string
header
2012-02-08
* man/... : update man pages one more time for 4.2.1
release
* release_procedure.txt: Make sure generated html has papi group
id.
2012-02-07
* src/multiplex.c: Fix the @file matching multiple files warning.
* src/components/README: Cleanup doxygen errors.
* doc/Doxyfile-html: Fix a typo introduced by the last commit.
* doc/Doxyfile-html: Exclude linux-bgp.c from doxygen.
* doc/Doxyfile-html: Make sure the component README file gets
included in doxygen.
* src/components/coretemp_freebsd/coretemp_freebsd.c: Cleanup
doxygen warnings in freebsd coretemp component.
* src/papi.h: Cleanup some doxygen warnings related to the
groupings.
* src/components/example/example.c: fix doxygen warning in the
example component
* doc/Doxyfile-html: Remove some cruft from doxygen config file.
This addresses the warning about dot not found at /sw/bin/dot .
* src/components/: infiniband/linux-infiniband.c,
infiniband/linux-infiniband.h, cuda/linux-cuda.c,
cuda/linux-cuda.h: Cleaned up some doxygen issues
* src/components/lmsensors/linux-lmsensors.c: Removed
long-forgotten debug output.
* src/papi_libpfm4_events.c: Fix minor doxygen typos.
* src/components/vmware/vmware.c: Add params for doxygen
* man/... : update man pages
2012-02-06
* doc/Doxyfile-man1: Fix a typo in a doxygen config file.
2012-02-03
* release_procedure.txt, doc/Doxyfile, doc/Doxyfile-everything,
doc/Doxyfile-html, doc/Doxyfile.utils, doc/Doxyfile-man1,
doc/Doxyfile-man3, doc/Makefile, doc/doxygen_procedure.txt:
Rework the doxygen configuration files.
* RELEASENOTES.txt: Update for the impending release.
* ChangeLogP421.txt, RELEASENOTES.txt: Updates for the impending
release.
2012-02-02
* src/: papi.c, papi.h: Minor tweaks for doxygen errors
2012-02-01
* src/components/lmsensors/: Rules.lmsensors, configure.in: Fixed
configure error message and rules link error for shared object
linking. Thanks Will Cohen.
* src/components/appio/Rules.appio: Correct pathing
* src/ctests/api.c: A minor fix to check for PAPI_ENOEVNT
when testing PAPI_flops. Without it, this test fails on processors
where PAPI_FP_OPS does not exist (and there are many of them).
2012-01-31
* src/ctests/multiattach.c: Increase acceptance criteria for
cycles.
* src/Makefile.in, src/configure, src/configure.in, src/papi.h,
doc/Doxyfile, doc/Doxyfile-everything, doc/Doxyfile.utils,
papi.spec: Update version number to 4.2.1 in preparation for
release.
* src/ctests/prof_utils.c: Correct a warning on 32-bit builds about
casting caddr_t to (long long).
Specifically:
prof_utils.c:234: warning: cast from pointer to integer of different size
prof_utils.c:248: warning: cast from pointer to integer of different size
prof_utils.c:262: warning: cast from pointer to integer of different size
We first cast to unsigned long and then on to long long. (This
may be overkill, but it's for a printf format string.)
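A small sketch of the two-step cast described above, assuming a Linux target where unsigned long is pointer-sized (true for both ILP32 and LP64). `addr_to_ll` and `print_addr` are hypothetical helpers, not the prof_utils.c code.

```c
#include <stdio.h>

/* On a 32-bit build, (long long) ptr warns "cast from pointer to integer
 * of different size". Casting to unsigned long first (pointer-sized on
 * Linux ILP32 and LP64) and then widening to long long avoids it. */
static long long addr_to_ll(const void *addr)
{
    return (long long) (unsigned long) addr;
}

/* Print an address; caddr_t is typically a char * on Linux. */
static void print_addr(const char *label, const void *addr)
{
    printf("%s: 0x%llx\n", label, (unsigned long long) addr_to_ll(addr));
}
```

Where C99 is available, casting through uintptr_t from <stdint.h> expresses the same intent more portably.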
2012-01-30
* release_procedure.txt: Add the correct path for doxygen on ICL
machines.
* src/papi_events.csv: Modify Intel Sandybridge PAPI_FP_OPS and
PAPI_FP_INS events to not count x87 fp instructions.
The problem is that the current predefines were made by adding 5
events. With the NMI watchdog stealing a counter and/or
hyperthreading reducing the number of available counters by half,
the events simply could not fit.
This now raises the potential for people using x87-compiled
floating point on Sandybridge and getting 0 FP_OPS. This is only
likely if running a 32-bit kernel and *not* compiling your code
with -msse.
A long-term solution might be trying to find a better set of FP
predefines for sandybridge.
* src/components/: lustre/linux-lustre.c, mx/linux-mx.c: Some
really minor cleanups to the lustre and mx components.
2012-01-28
* src/components/example/: example.c, tests/example_basic.c: Update
example component
Cleans up code, adds some more documentation, adds counter write
support.
2012-01-27
* src/papi_user_events.c: Minor cleanups for user events.
* src/libpfm4/: README, include/perfmon/pfmlib.h, lib/Makefile,
lib/pfmlib_amd64.c, lib/pfmlib_common.c, lib/pfmlib_priv.h: Fix
"conflicts" in git import of libpfm4.
* src/libpfm4/lib/: pfmlib_amd64_fam11h.c,
events/amd64_events_fam11h.h: Initial revision
2012-01-26
* src/papi_fwrappers.c: Escape the include directives in the
documentation.
(Doxygen cleanup.)
* src/components/README: Adding vmware to component README
* src/components/vmware/: Makefile.vmware.in,
PAPI-VMwareComponentDocument.pdf, Rules.vmware,
VMwareComponentDocument.txt, configure, configure.in, vmware.c,
vmware.h: merge vmware branch to head
* src/perf_events.c: Set fast_counter_read back to 0 on x86/x86_64
perf_events, as currently rdpmc counter access is not supported.
There are patches floating around that enable this (although
performance is still a long way from perfctr) but they will not
likely be merged for a while now, and the perf_events substrate
will require a lot of extra code to support it once it does make
it into a shipping kernel.
* src/buildbot_configure_with_components.sh: Remove acpi from the
buildbot configure script.
2012-01-25
* src/components/mx/: Makefile.mx.in, Rules.mx, configure,
configure.in, linux-mx.c, linux-mx.h, tests/Makefile,
tests/mx_basic.c, tests/mx_elapsed.c, utils/fake_mx_counters.c,
utils/sample_output: Re-write of the MX component:
+ Add tests
+ Modernize code
+ Remove the need to run ./configure in the mx directory
+ Add a fake mx_counters program that lets you test the component
  on a machine without Myrinet installed
* src/components/: README, acpi/Rules.acpi,
acpi/linux-acpi-memory.c, acpi/linux-acpi.c, acpi/linux-acpi.h:
Remove the ACPI component.
It was one of the oldest components and needed a lot of cleanup
work, and it turns out that the main useful event it provided
(temperature) isn't available on modern machines/kernels
(coretemp should be used instead).
2012-01-23
* src/perf_events.c: Restored Phil's changes that I inadvertently
clobbered with my last commit :(
* src/perf_events.c: Remove a warning about an uninitialized
variable.
* src/utils/: component.c, event_info.c, native_avail.c: Update the
Doxygen comments on these utilities to have the command line
options listed in a list like the other utils.
* src/perf_events.c: More improvements to the read path for
multiplexed counters. The handling of bad kernel behavior is now
built in and no longer requires a #define.
Basically, there are situations when either enabled or running is
zero but not both. In the worst case this could result in a divide
by 0, as was observed by Tushar Mohan in papiex. You could
trigger it by doing a read immediately after a start with
perf_events and using a FORMAT_SCALE argument.
Now the logic goes as follows, assuming multiplexing:
1) if (running == enabled), return the raw counter
2) else if (running && enabled), scale the counter by enabled / running
3) else warn in debug mode and return the raw counter
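The three cases above can be sketched as a single C function. This is an illustration, not the perf_events.c code: `mpx_scale` is a hypothetical name, its arguments stand in for the count, time_enabled and time_running values, and the debug warning is assumed to be gated by a DEBUG macro.

```c
#include <stdint.h>
#include <stdio.h>

/* Return the (possibly scaled) value of one multiplexed counter.
 * raw, enabled and running stand for the count, time_enabled and
 * time_running values reported by perf_events for that event. */
static int64_t mpx_scale(int64_t raw, int64_t enabled, int64_t running)
{
    /* 1) The event was on the hardware the whole time: no scaling. */
    if (running == enabled)
        return raw;

    /* 2) Both times are nonzero: scale by enabled / running in integer
     *    arithmetic. Note raw * enabled can overflow for very large values. */
    if (running && enabled)
        return (raw * enabled) / running;

    /* 3) Exactly one time is zero: scaling would divide by zero or give 0,
     *    so fall back to the raw count (warning only in debug builds). */
#ifdef DEBUG
    fprintf(stderr, "mpx_scale: enabled=%lld running=%lld, using raw count\n",
            (long long) enabled, (long long) running);
#endif
    return raw;
}
```

The zero-time inputs in the original report (enabled 214777, running 0) fall into case 3 and no longer divide by zero.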
Apparently we need a test case that does a read immediately after
a start. That's a hole.
Tested on brutus, core2 2.6.36
Here's the original report.
-------------------
Model string and code : Intel(R) Pentium(R) M processor 1600MHz (9)
Linux thinkpad 2.6.38-02063808-generic #201106040910 SMP Sat Jun 4 10:51:30 UTC 2011 i686 GNU/Linux
PAPI Version: 4.2.0.0
I think I ran into a bug similar to the one we ran into with MIPS.
With the latest PAPI (from CVS), on an x86 (32-bit machine), when
using papiex with multiplex with anything more than two events, I
get a floating point exception in PAPI during the PAPI_read call.
On enabling debugging in the substrate, I think the problem is
the same (namely a division by zero, because some event had a
zero time of running):
libpapiex debug: 24625,0x0,papiex_thread_init_routine Starting counters with PAPI_start
SUBSTRATE:perf_events.c:pe_enable_counters:953:24625 ioctl(enable): ctx: 0x96a4bc8, fd: 3
SUBSTRATE:perf_events.c:pe_enable_counters:953:24625 ioctl(enable): ctx: 0x96a4bc8, fd: 5
libpapiex debug: 24625,0x0,papiex_thread_init_routine Calling PAPI_lock before critical section
libpapiex debug: 24625,0x0,papiex_thread_init_routine Released PAPI lock
libpapiex debug: 24625,0x0,papiex_start START POINT 0 LABEL
libpapiex debug: 24625,0x0,papiex_start Reading counters (PAPI_read) to get initial counts
SUBSTRATE:perf_events.c:_papi_pe_read:1147:24625 read: fd: 3, tid: 0, cpu: -1, ret: 56
SUBSTRATE:perf_events.c:_papi_pe_read:1148:24625 read: 2 1341021 1341021
SUBSTRATE:perf_events.c:_papi_pe_read:1181:24625 (papi_pe_buffer[3] 33405 * tot_time_enabled 1341021) / tot_time_running 1341021
SUBSTRATE:perf_events.c:_papi_pe_read:1181:24625 (papi_pe_buffer[5] 44552 * tot_time_enabled 1341021) / tot_time_running 1341021
SUBSTRATE:perf_events.c:_papi_pe_read:1147:24625 read: fd: 5, tid: 0, cpu: -1, ret: 40
SUBSTRATE:perf_events.c:_papi_pe_read:1148:24625 read: 1 214777 0
SUBSTRATE:perf_events.c:_papi_pe_read:1181:24625 (papi_pe_buffer[3] 0 * tot_time_enabled 214777) / tot_time_running 0
The above debug log is for three events: PAPI_TOT_CYC,
PAPI_TOT_INS and PAPI_L1_DCM. Multiplexing works with two events.
Adding the third (any event), gives this error. Basically, the
floating point exception kills the program, and PAPI_read never
returns.
I think I know why papiex always hits this bug: It's because
right after starting the counters with PAPI_start, papiex does a
PAPI_read to store the initial values of the counters in a tmp
variable. These are then subtracted from the final counter
values. Should we put a deliberate delay? Of course, the real bug
should be fixed in PAPI. ----
* src/utils/event_info.c: Major re-write of the papi_xml_event_info
program.
+ Remove event code numbers, as they are not stable run-to-run
+ Add some Doxygen comments
+ Remove some wrong assumptions that could cause potential buffer
  overflows
+ Improve usage information
2012-01-20
* src/components/lustre/: Rules.lustre, linux-lustre.c,
linux-lustre.h,
fake_proc/fs/lustre/llite/hpcdata-ffff81022a732800/read_ahead_stats,
fake_proc/fs/lustre/llite/hpcdata-ffff81022a732800/stats,
tests/Makefile, tests/lustre_basic.c: Finish the re-write of the
lustre component.
It would be nice if someone with access to a machine with a
lustre filesystem could test this for us.
* src/: papi_internal.c, components/lustre/linux-lustre.c: Update
the component initialization code so that it can handle a PAPI
ERROR return gracefully. Previously there was no way to indicate
initialization failure besides just setting num_native_events to
0.
2012-01-19
* src/components/lustre/: linux-lustre.c, linux-lustre.h: First
pass at cleaning up the lustre component.
It should now properly report no events when no lustre
filesystems are available.
2012-01-11
* src/papi_events.csv: Add AMD fam12h support to the events file.
Right now it is just an alias to the similar fam10h event list;
this can be split out if necessary once we find a tester with the
hardware.
* src/libpfm4/: README, docs/man3/pfm_get_event_next.3,
docs/man3/pfm_get_pmu_info.3, include/perfmon/perf_event.h,
include/perfmon/pfmlib.h, lib/Makefile, lib/pfmlib_amd64.c,
lib/pfmlib_amd64_priv.h, lib/pfmlib_common.c,
lib/pfmlib_perf_event.c, lib/pfmlib_priv.h,
lib/events/intel_coreduo_events.h, lib/events/perf_events.h,
perf_examples/Makefile, perf_examples/perf_util.c,
perf_examples/perf_util.h, perf_examples/self.c,
perf_examples/task_smpl.c, perf_examples/x86/bts_smpl.c: Fix
"merge" conflicts with libpfm4 merge.
* src/libpfm4/lib/: pfmlib_amd64_fam12h.c,
events/amd64_events_fam12h.h: Initial revision
* src/papi_libpfm4_events.c: Properly use the pfm_get_event_next()
iterator to find next event.
Without this, on AMD Fam10h some events are missed.
Some events are still missed due to a libpfm4 bug; this will be
fixed once I update the libpfm4 tree included with PAPI.
Note, enumeration fixes like this often break things, so please
test if possible.
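Why a dense index loop misses events, as a toy model: if valid event indices are not contiguous, looping from 0 to the event count stops short. The table and `demo_event_next()` below are hypothetical; the iterator only mimics the shape of libpfm4's pfm_get_event_next() (next valid index, or -1 at the end) and is not its implementation.

```c
#include <stddef.h>

/* Hypothetical event table with holes: valid indices are 0, 2 and 5. */
static const char *demo_events[] = {
    "CYCLES", NULL, "INSTRUCTIONS", NULL, NULL, "L1D_MISS"
};
#define DEMO_SLOTS ((int) (sizeof demo_events / sizeof demo_events[0]))
#define DEMO_NUM_EVENTS 3 /* number of valid events */

/* Return the next valid index after idx, or -1 when there are no more. */
static int demo_event_next(int idx)
{
    for (int i = idx + 1; i < DEMO_SLOTS; i++)
        if (demo_events[i] != NULL)
            return i;
    return -1;
}

/* Wrong: assumes the valid indices are exactly 0..DEMO_NUM_EVENTS-1. */
static int count_by_index(void)
{
    int found = 0;
    for (int i = 0; i < DEMO_NUM_EVENTS; i++)
        if (demo_events[i] != NULL)
            found++;
    return found; /* visits 0, 1, 2 only, so index 5 is missed */
}

/* Right: follow the iterator from the first valid index. */
static int count_by_iterator(void)
{
    int found = 0;
    for (int i = demo_event_next(-1); i != -1; i = demo_event_next(i))
        found++;
    return found;
}
```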
* src/papi_events.csv: Update the coreduo (not core2) events. Most
notably the FP events were wrong.
This, along with a forthcoming libpfm4 update, makes all the
ctests pass on an old Yonah coreduo laptop I have.
2012-01-05
* src/ctests/api.c: Make the api test actually test PAPI_flops() as
it claims to do, rather than PAPI_flips().
Patch thanks to: Emilio De Camargo Francesquini
* src/papi_hl.c: Fix some copy-and-paste documentation remnants in
the papi_hl.c file, mostly where it said FLIPS where it meant
FLOPS.
2012-01-04
* src/utils/native_avail.c: Update papi_native_avail to *not* print
the event codes, as these are not guaranteed to be stable from
run to run.
Also fix up the formatting and print some component info too.
Please try it and let me know if you don't like the new output.
* src/: configure, configure.in: Respect a FORCED option in
configure.
2011-12-22
* src/Rules.pfm4_pe: Remove perfmon.h from MISCHDRS.
2011-12-20
* src/: Rules.perfctr, Rules.perfctr-pfm, Rules.pfm, Rules.pfm4_pe,
Rules.pfm_pe, linux-lock.h, mb.h: Merry Christmas ARM users.
This patch fixes the SMP ARM issues reported by Harald Servat.
Also, adds proper header dependency checking in the Rules files.
People, when you add headers, please add them to the
dependency lines so everything gets rebuilt properly.
The new implementation of SMP locks is very pedantic; it is
not the fastest, but it does use atomics and avoids kernel
intervention.
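For readers unfamiliar with the approach, a minimal sketch of a lock that uses atomics and never enters the kernel, assuming C11 <stdatomic.h>: waiters spin on an atomic test-and-set flag. This is not the code in linux-lock.h or mb.h, which targets specific architectures; the `demo_*` names are hypothetical.

```c
#include <stdatomic.h>

/* Minimal test-and-set spinlock built on C11 atomics. Waiting threads spin
 * in user space instead of sleeping in the kernel. */
typedef struct {
    atomic_flag held;
} demo_spinlock;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns 1 if the lock was acquired. Acquire ordering keeps the
 * critical section's loads and stores from moving above the lock. */
static int demo_trylock(demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void demo_lock(demo_spinlock *l)
{
    while (!demo_trylock(l))
        ; /* busy-wait until the holder releases */
}

/* Release ordering publishes the critical section's stores before the
 * lock becomes available to the next thread. */
static void demo_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Spinning is cheap when critical sections are short, which is the usual case for PAPI's internal locks; long waits would burn CPU instead of sleeping.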
Passed on our 2 core ARM v7. All pthreads tests now pass, except
the ones that also fail in the single processor case usually due
to a missing event.
Samples:
mucci@panda:~/papi.head/src$ uname -a
Linux panda 3.0.0 #2 SMP Fri Jul 29 16:23:54 EDT 2011 armv7l GNU/Linux
mucci@panda:~/papi.head/src$ hostname
panda
mucci@panda:~/papi.head/src$ cat /proc/cpuinfo
Processor: ARMv7 Processor rev 2 (v7l)
processor: 0
BogoMIPS: 2007.19
processor: 1
BogoMIPS: 1965.18
Features: swp half thumb fastmult vfp edsp thumbee neon vfpv3
CPU implementer: 0x41
CPU architecture: 7
CPU variant: 0x1
CPU part: 0xc09
CPU revision: 2
Hardware: OMAP4 Panda board
Revision: 0020
Serial: 0000000000000000
mucci@panda:~/papi.head/src$ ./ctests/locks_pthreads
Creating 2 threads
10000 iterations took 13489 us.
Running 44480 iterations
Expected: 88960 Received: 88960
locks_pthreads.c PASSED
mucci@panda:~/papi.head/src$ ./ctests/pthrtough
Creating 2 threads for 1000 iterations each of:
  register create_eventset destroy_eventset unregister
pthrtough.c PASSED
mucci@panda:~/papi.head/src$ ./ctests/pthrtough2
Creating 2000 threads for 1 iterations each of:
  register create_eventset destroy_eventset unregister
Failed to create thread: 238
Continuing test with 237 threads.
pthrtough2.c PASSED
mucci@panda:~/papi.head/src$ ./ctests/thrspecific
Thread 0x40ae1470 started, specific data is at 0xbea9c6d4
Thread 0x40021000 started, specific data is at 0xbea9c6c4
Thread 0x4244d470 started, specific data is at 0xbea9c6c8
Thread 0x4138d470 started, specific data is at 0xbea9c6d0
Thread 0x41c4d470 started, specific data is at 0xbea9c6cc
Entry 0, Thread 0x41c4d470, Data Pointer 0xbea9c6cc, Value 4000000
Entry 1, Thread 0x40021000, Data Pointer 0xbea9c6c4, Value 500000
Entry 2, Thread 0x40ae1470, Data Pointer 0xbea9c6d4, Value 1000000
Entry 3, Thread 0x4244d470, Data Pointer 0xbea9c6c8, Value 8000000
Entry 4, Thread 0x4138d470, Data Pointer 0xbea9c6d0, Value 2000000
thrspecific.c PASSED
mucci@panda:~/papi.head/src$ ./ctests/krentel_pthreads
program_time = 6, threshold = 20000000, num_threads = 3
launched timer in thread 0
launched timer in thread 1
launched timer in thread 3
launched timer in thread 2
[1] time = 1, count = 7, iter = 5, rate = 1400.0/Kiter
[2] time = 1, count = 7, iter = 5, rate = 1400.0/Kiter
[0] time = 1, count = 7, iter = 5, rate = 1400.0/Kiter
[3] time = 1, count = 7, iter = 5, rate = 1400.0/Kiter
[1] time = 2, count = 25, iter = 16, rate = 1562.5/Kiter
[0] time = 2, count = 25, iter = 16, rate = 1562.5/Kiter
[3] time = 2, count = 25, iter = 16, rate = 1562.5/Kiter
[2] time = 2, count = 25, iter = 16, rate = 1562.5/Kiter
[1] time = 3, count = 25, iter = 16, rate = 1562.5/Kiter
[2] time = 3, count = 25, iter = 16, rate = 1562.5/Kiter
[0] time = 3, count = 25, iter = 16, rate = 1562.5/Kiter
[3] time = 3, count = 25, iter = 16, rate = 1562.5/Kiter
[1] time = 4, count = 25, iter = 16, rate = 1562.5/Kiter
[0] time = 4, count = 25, iter = 16, rate = 1562.5/Kiter
[3] time = 4, count = 25, iter = 16, rate = 1562.5/Kiter
[2] time = 4, count = 25, iter = 16, rate = 1562.5/Kiter
[3] time = 5, count = 25, iter = 16, rate = 1562.5/Kiter
[0] time = 5, count = 25, iter = 16, rate = 1562.5/Kiter
[2] time = 5, count = 25, iter = 16, rate = 1562.5/Kiter
[1] time = 5, count = 26, iter = 17, rate = 1529.4/Kiter
[2] time = 6, count = 25, iter = 16, rate = 1562.5/Kiter
[0] time = 6, count = 27, iter = 17, rate = 1588.2/Kiter
done
krentel_pthreads.c PASSED
2011-12-15
* src/papi_libpfm_presets.c: Change PAPI_PERFMON_EVENT_FILE
environment variable name to PAPI_CSV_EVENT_FILE since it's not
just for perfmon anymore.
* src/: configure, configure.in: Open mouth, insert foot; fix
perfctr configure by not testing a library we have not built yet.
2011-12-14
* src/: configure, configure.in: Missed one more place where we
tested perfctr != "no"
* src/: configure, configure.in: Fix a typo in the perfctr section;
it was causing a machine to default to perfctr when it had no
performance interface (a CentOS VM image with a 2.6.18 kernel).
Also checks that we actually have perfctr if we specify
--with-perfctr.
2011-12-08
* src/components/cuda/: Makefile.cuda.in, Rules.cuda, configure,
configure.in, linux-cuda.c, linux-cuda.h: Added auto-detection of
CUDA version to the PAPI CUDA Component, since the interface
changed between CUDA/CUPTI 4.0 and 4.1. PAPI now supports both
CUDA versions without any exposure to the users. Configure step
is unchanged and no additional knowledge of which CUDA version is
installed is required.
2011-12-03
* src/components/appio/: CHANGES, README, Rules.appio, appio.c,
appio.h, tests/Makefile, tests/appio_list_events.c,
tests/appio_values_by_code.c, tests/appio_values_by_name.c: [no
log message]
2011-11-25
* src/linux-timer.c: Fix compilation warning if you specify
--with-walltime=gettimeofday
* src/linux-timer.c: Fix the build on Linux systems using mmtimer
* src/linux-common.c: Update the linux MHz detection code to use
bogoMIPS when there is no MHz field available in /proc/cpuinfo.
This gives roughly correct MHz on ARM, and the MIPS workaround
should also still work.
2011-11-23
* src/components/net/linux-net.c: Fix compile errors in a debug
message: it referenced pathname, which didn't exist; the code
works on NET_PROC_FILE.
2011-11-22
* src/components/net/: linux-net.c, tests/net_values_by_code.c,
tests/net_values_by_name.c: Change the ping command in the net
tests to not use &> to redirect to NULL.
This works in csh (and in bash), but a POSIX /bin/sh such as dash
parses "&>" as "&" followed by ">", so ping runs in the background
and the test finishes before ping can generate any packets.
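A sketch of the portable form, assuming the command is run through system(), which hands the string to /bin/sh: redirect with `> /dev/null 2>&1` instead of `&>`. `run_quiet` is a hypothetical helper, not the code in the net tests.

```c
#include <stdio.h>
#include <stdlib.h>

/* Run cmd via system(), discarding its stdout and stderr. The POSIX
 * redirection "> /dev/null 2>&1" works in any /bin/sh, whereas
 * "cmd &> /dev/null" is read by a POSIX sh as "cmd &" plus "> /dev/null",
 * which backgrounds cmd and returns immediately.
 * Returns system()'s status, or -1 if the command does not fit. */
static int run_quiet(const char *cmd)
{
    char line[512];
    int n = snprintf(line, sizeof line, "%s > /dev/null 2>&1", cmd);
    if (n < 0 || (size_t) n >= sizeof line)
        return -1;
    return system(line);
}
```

For example, `run_quiet("ping -c 5 localhost")` would block until ping finishes, so its packets are generated before the counters are read; the exact ping command the tests use is not shown here.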
* src/components/net/linux-net.c: Fix a slight bug in the net
component, where a memset() had the wrong arguments. This gave
weird results when we start/stop quickly enough that we return
the initial data.
* src/components/net/: CHANGES, Makefile.net.in, README, Rules.net,
configure, configure.in, linux-net.c, linux-net.h,
tests/Makefile, tests/net_list_events.c,
tests/net_values_by_code.c, tests/net_values_by_name.c: Replace
net component with updated version written by Jose Pedro
Oliveira
* Dynamically detects the network interfaces
(i.e. the ones listed in /proc/net/dev)
* No longer needs to fork/exec the external ifconfig command and
parse its output. It now reads the Linux kernel network
statistics directly from /proc/net/dev.
* Each network interface now has 16 events instead of 13
(all counters in /proc/net/dev).
* Adds support for PAPI_event_name_to_code()
* Adds a couple of small tests/examples
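A minimal sketch of reading the statistics from /proc/net/dev, assuming the usual layout: two header lines, then one `name: ...` line per interface with 8 receive and 8 transmit counters, which matches the 16 events per interface listed above. `parse_net_dev_line` is a hypothetical helper and not the component's parser; it takes a single line so it can be fed from fgets() on /proc/net/dev.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define NET_NCOUNTERS 16 /* 8 receive + 8 transmit columns */

/* Parse one interface line of /proc/net/dev, such as
 *   "  eth0: 1000 10 0 0 0 0 0 0 2000 20 0 0 0 0 0 0"
 * into the interface name and its 16 counters. Returns 0 on success and
 * -1 for header lines or malformed input. */
static int parse_net_dev_line(const char *line, char *name, size_t name_len,
                              unsigned long long counters[NET_NCOUNTERS])
{
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return -1; /* the two header lines contain no ':' */

    const char *start = line;
    while (isspace((unsigned char) *start))
        start++;
    size_t len = (size_t) (colon - start);
    if (len == 0 || len >= name_len)
        return -1;
    memcpy(name, start, len);
    name[len] = '\0';

    const char *p = colon + 1;
    for (int i = 0; i < NET_NCOUNTERS; i++) {
        int used = 0;
        if (sscanf(p, "%llu%n", &counters[i], &used) != 1)
            return -1;
        p += used;
    }
    return 0;
}
```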
2011-11-16
* doc/Doxyfile-everything: Fix the exclude libpfm/perfctr config.
2011-11-10
* src/perf_events.c: Only scale when running != enabled.
Now verified on ig, brutus and the malta
* src/perf_events.c: Further tuneups for mpx'ing.
Previous commit broke systems with valid return values from
perf_events for running & enabled. My attempt at scaling in long
long world caused an overflow which led to a negative number when
passed up the chain.
Also consolidated types... best way to avoid this stuff is to
start as the type you are ending as.
Now we use some better integer scaling...guaranteed within +-0.5%
of the actual scaled value of enabled / running.
New results on brutus: multiplex1
case1: Does PAPI_multiplex_init() not break regular operation?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
case1: PAPI_TOT_CYC PAPI_FP_INS
case1: 2739865106 600002876
case2: Does setmpx/add work?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
case2: PAPI_TOT_CYC PAPI_FP_INS
case2: 2739678237 600002258
case3: Does add/setmpx work?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
case3: PAPI_TOT_CYC PAPI_FP_INS
case3: 2739847832 600002298
case4: Does add/setmpx/add work?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
case4: PAPI_TOT_CYC PAPI_FP_INS
case4: 2737832980 600013404
case5: Does setmpx/add/add/start/read work?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
read @start counter[0]: 7106
read @stop counter[0]: 2740387017
difference counter[0]: 2740379911
read @start counter[1]: 0
read @stop counter[1]: 600017169
difference counter[1]: 600017169
multiplex1.c PASSED
2011-11-09
* src/components/cuda/linux-cuda.c: For the CUDA Component,
PAPI_read() now accumulates event values. This has to be
explicitly done in PAPI because CUPTI automatically resets all
counter values to 0 after a read. (PAPI_start()/PAPI_stop() still
reset the values to 0.)
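Because the counters restart from 0 after every read, the component has to keep running totals itself. A sketch of that bookkeeping, with hypothetical `demo_accum*` names and no CUPTI calls; `since_last` stands for whatever the hardware layer returned for the latest read.

```c
#include <stdint.h>

#define DEMO_MAX_EVENTS 8

/* Hypothetical per-eventset state: totals the component keeps itself,
 * because the hardware counters restart from 0 after every read. */
typedef struct {
    int num_events;
    int64_t totals[DEMO_MAX_EVENTS];
} demo_accum;

/* Start (or stop) resets the running totals to 0. */
static void demo_accum_reset(demo_accum *s)
{
    for (int i = 0; i < s->num_events; i++)
        s->totals[i] = 0;
}

/* A read adds the values counted since the previous read to the totals
 * and reports the totals, so successive reads never go backwards. */
static void demo_accum_read(demo_accum *s, const int64_t *since_last,
                            int64_t *out)
{
    for (int i = 0; i < s->num_events; i++) {
        s->totals[i] += since_last[i];
        out[i] = s->totals[i];
    }
}
```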
* src/perf_events.c: Last of the multiplex fixes to perf events.
The root of all evil was this:
    counts[i] = ( uint64_t )
        ( ( double ) buffer[count_idx] * ( double )
          buffer[get_total_time_enabled_idx( )] /
          ( double ) buffer[get_total_time_running_idx( )] );
In addition to the improper cast to uints (PAPI returns int64s),
using floating-point arithmetic here is a no-no. Plus this resulted
in divides by zero...
Before:
SUBSTRATE:perf_events.c:_papi_pe_read:1155:12218 read: fd: 3, tid: 0, cpu: -1, buffer[0-2]: 0x6cba, 0x0, 0x0, ret: 24
SUBSTRATE:perf_events.c:_papi_pe_read:1155:12218 read: fd: 4, tid: 0, cpu: -1, buffer[0-2]: 0x23, 0x0, 0x0, ret: 24
SUBSTRATE:perf_events.c:_papi_pe_read:1155:12218 read: fd: 3, tid: 0, cpu: -1, buffer[0-2]: 0x6de72b5d, 0x8ae0fa80, 0x8ae0fa80, ret: 24
SUBSTRATE:perf_events.c:_papi_pe_read:1155:12218 read: fd: 4, tid: 0, cpu: -1, buffer[0-2]: 0x4c4b46b, 0x8ae0fa80, 0x8ae0fa80, ret: 24
So the kernel is fine, but there are errors in the multiplexed scaling.
case5: Does setmpx/add/add/start/read work?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
read @start counter[0]: 9223372034707292159
read @stop counter[0]: 1843791732
difference counter[0]: -9223372032863500427
multiplex1.c FAILED Line # 389
With fix:
SUBSTRATE:perf_events.c:_papi_pe_read:1151:12821 read: fd: 3, tid: 0, cpu: -1, buffer[0-2]: 0x6782, 0x0, 0x0, ret: 24
SUBSTRATE:perf_events.c:_papi_pe_read:1151:12821 read: fd: 4, tid: 0, cpu: -1, buffer[0-2]: 0x0, 0x0, 0x0, ret: 24
SUBSTRATE:perf_events.c:_papi_pe_read:1151:12821 read: fd: 3, tid: 0, cpu: -1, buffer[0-2]: 0x6de725dc, 0x8ae0fa80, 0x8ae0fa80, ret: 24
SUBSTRATE:perf_events.c:_papi_pe_read:1151:12821 read: fd: 4, tid: 0, cpu: -1, buffer[0-2]: 0x4c4b400, 0x8ae0fa80, 0x8ae0fa80, ret: 24
read @start counter[0]: 26498
read @stop counter[0]: 1843865052
difference counter[0]: 1843838554
read @start counter[1]: 0
read @stop counter[1]: 80000000
difference counter[1]: 80000000
SUBSTRATE:perf_events.c:_papi_pe_update_control_state:1288:12821 Called with count == 0
SUBSTRATE:papi_libpfm4_events.c:_papi_libpfm_shutdown:1178:12821 shutdown
multiplex1.c PASSED
New code is vastly simpler and smaller, and checks for bad kernel
behavior:
    int64_t tot_time_running =
        papi_pe_buffer[get_total_time_running_idx( )];
    int64_t tot_time_enabled =
        papi_pe_buffer[get_total_time_enabled_idx( )];
    #ifdef BRAINDEAD_MULTIPLEXING
    if (tot_time_enabled == 0)
        tot_time_enabled = 1;
    if (tot_time_running == 0)
        tot_time_running = 1;
    #else
    /* If we are convinced this platform's kernel is fully operational,
       then this stuff will never happen. If it does, then
       BRAINDEAD_MULTIPLEXING needs to be enabled. */
    if ((tot_time_running == 0) && (papi_pe_buffer[count_idx])) {
        PAPIERROR("This platform has a kernel bug in multiplexing, count is %lld (not 0), but time running is 0.\n",
                  papi_pe_buffer[count_idx]);
        return PAPI_EBUG;
    }
    if ((tot_time_enabled == 0) && (papi_pe_buffer[count_idx])) {
        PAPIERROR("This platform has a kernel bug in multiplexing, count is %lld (not 0), but time enabled is 0.\n",
                  papi_pe_buffer[count_idx]);
        return PAPI_EBUG;
    }
    #endif
    pe_ctl->counts[i] =
        (papi_pe_buffer[count_idx] * tot_time_enabled) / tot_time_running;
Also, renamed all instances of 'buffer' to papi_pe_buffer because
buffer is a global variable on MIPS/Linux/libc. Yikes!
    (gdb) whatis buffer
    type = struct utmp *
* src/ctests/multiplex1.c: Made sure that PAPI_TOT_CYC is the first
event added to multiplexing event set.
This will demonstrate the bug in perf_event multiplexing
arithmetic in case5 on MIPS and other perf_event subsystems that
likely have some breakage in the kernel's handling of
multiplexing. The common bug is that the perf_event subsystem
does not fill in the second and third elements of the 24-byte
read returned from the kernel. These values are
time_enabled and time_running. MIPS as of 3.0.3 only fills them
in after a HZ tick has happened. Workarounds are pretty simple in
the low-level layer...
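For reference, the 24-byte read is three 64-bit fields when an event is opened on its own with PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING. A sketch of decoding it and detecting the missing-time case, such as the `buffer[0-2]: 0x6cba, 0x0, 0x0` reads in the log above; the `demo_pe_*` names are hypothetical and this is not the perf_events.c code.

```c
#include <stdint.h>
#include <string.h>

/* The record read(2) returns for a single (non-group) perf_event fd
 * opened with read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
 * PERF_FORMAT_TOTAL_TIME_RUNNING: the count, then the two times. */
struct demo_pe_read {
    uint64_t value;
    uint64_t time_enabled;
    uint64_t time_running;
};
_Static_assert(sizeof(struct demo_pe_read) == 24, "expected a 24-byte read");

/* Copy the raw bytes into the struct; returns -1 on a short read. */
static int demo_pe_decode(const void *buf, long nread, struct demo_pe_read *out)
{
    if (nread < (long) sizeof *out)
        return -1;
    memcpy(out, buf, sizeof *out);
    return 0;
}

/* Detect the kernel behavior described above: a nonzero count whose
 * time_enabled/time_running fields have not been filled in yet. */
static int demo_pe_times_missing(const struct demo_pe_read *r)
{
    return r->value != 0 && (r->time_enabled == 0 || r->time_running == 0);
}
```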
A buggy output looks like this (3.0.3 MIPS/Linux Big Endian):
-bash-4.1$ ./ctests/multiplex1
case1: Does PAPI_multiplex_init() not break regular operation?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
case1: PAPI_TOT_CYC PAPI_FP_INS
case1: 1843775252 80000000
case2: Does setmpx/add work?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
case2: PAPI_TOT_CYC PAPI_FP_INS
case2: 1843773254 80000037
case3: Does add/setmpx work?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
case3: PAPI_TOT_CYC PAPI_FP_INS
case3: 1843772919 80000037
case4: Does add/setmpx/add work?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
case4: PAPI_TOT_CYC PAPI_FP_INS
case4: 1843773959 80000037
case5: Does setmpx/add/add/start/read work?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
read @start counter[0]: 9223372034707292159
read @stop counter[0]: 1843784577
difference counter[0]: -9223372032863507582
multiplex1.c FAILED
Line # 389
Error: Difference in start and stop resulted in negative value!
2011-11-08
* src/components/cuda/: linux-cuda.c, linux-cuda.h: Updated CUDA
component for CUPTI 4.1 (RC1). Note, SetCudaDevice() should now
work with the latest CUDA 4.1 version.
2011-11-07
* src/components/coretemp/linux-coretemp.c: Update coretemp to
better handle sparse numbering of the inputs.
* doc/Doxyfile-everything: Exclude the libpfm* and perfctr-*
directories from consideration when generating Doxygen docs.
* src/: papi.h, components/acpi/linux-acpi.h,
components/coretemp_freebsd/coretemp_freebsd.c,
components/cuda/linux-cuda.h,
components/infiniband/linux-infiniband.h,
components/mx/linux-mx.h, components/net/linux-net.h: Place a
space in < your name here > to cleanup doxygen warnings.
* src/perf_events.c: Only perf event systems that have FAST counter
reads and FAST hw timer access are x86...
* src/linux-common.c: MIPS clock and Linux fixup code
* src/components/example/example.c: A little more documentation on
which of the component vector function pointers are relevant.
* src/papi_vector.c: Tested the dummy get_{real,virt}_{cyc,usec}
functions on zeus, they appear to work.
* src/components/example/tests/example_multiple_components.c:
Another fix to properly skip the multiple component case if CPU
component not available.
* src/components/example/tests/example_multiple_components.c: Skip
the test if no CPU component enabled, rather than fail.
2011-11-04
* src/components/example/example.c: Free example_native_table with
papi_free, glibc didn't like it if we just called free. (we
allocate it with papi_calloc)
* man/...: Version number bump. (since the pages are
quantifiably different from those released in 4.2.0)
* doc/: Doxyfile, Doxyfile-everything, Doxyfile.utils: Bump version
number in the doxygen config files.
* src/components/example/example.c:
_papi_example_shutdown_substrate does not have any arguments.
* src/components/net/linux-net.c: Include ctype.h for isspace().
* release_procedure.txt: release_procedure now reflects the correct
version of doxygen to use.
* src/buildbot_configure_with_components.sh: Do not always
configure without CPU counters; allow this to be passed in.
Allows us to use one script for both types of builds we test.
* delete_before_release.sh,
src/buildbot_configure_with_components.sh: Create a script for
buildbot to configure with several components.
Buildbot runs all commandline arguments through a sanitization
before passing them to sh. Thus --with-configure="a b c" =>
'--with-configure="a b c"' which is bad.
delete_before_release.sh has been instructed to remove this file.
* man/...: Rebuild the manpages with doxygen 1.7.4 to
remove the 's at the end of sentences.
The html output looks clean.
2011-11-03
* src/: multiplex.c, papi.c: Fix some gcc-4.6 compile warnings
complaining that retval was being set but not used.
* src/papi.c: Add some extra comments to the PAPI_num_cmp_hwctrs()
code that describe its limitations a bit better.
2011-11-02
* src/: ctests/overflow_allcounters.c, testlib/test_utils.c: Add
lots of debugging to make results of overflow_allcounters test a
bit more clear.
* src/components/coretemp/tests/coretemp_pretty.c: coretemp_pretty
wasn't printing the description for fan inputs.
The result on an Apple MacBook Pro (running Linux) now looks like
this:
Trying all coretemp events
Found coretemp component at cid 2
hwmon0.temp1_input value: 33.50 degrees C, applesmc module, label TB0T
hwmon0.temp2_input value: 33.50 degrees C, applesmc module, label TB1T
hwmon0.temp3_input value: 32.00 degrees C, applesmc module, label TB2T
hwmon0.temp4_input value: 0.00 degrees C, applesmc module, label TB3T
hwmon0.temp5_input value: 62.25 degrees C, applesmc module, label TC0D
hwmon0.temp6_input value: 54.25 degrees C, applesmc module, label TC0F
hwmon0.temp7_input value: 57.25 degrees C, applesmc module, label TC0P
hwmon0.temp8_input value: 69.00 degrees C, applesmc module, label TG0D
hwmon0.temp9_input value: 58.00 degrees C, applesmc module, label TG0F
hwmon0.temp10_input value: 51.25 degrees C, applesmc module, label TG0H
hwmon0.temp11_input value: 58.25 degrees C, applesmc module, label TG0P
hwmon0.temp12_input value: 60.75 degrees C, applesmc module, label TG0T
hwmon0.temp13_input value: 62.25 degrees C, applesmc module, label TN0D
hwmon0.temp14_input value: 59.25 degrees C, applesmc module, label TN0P
hwmon0.temp15_input value: 49.00 degrees C, applesmc module, label TTF0
hwmon0.temp16_input value: 54.00 degrees C, applesmc module, label Th2H
hwmon0.temp17_input value: 58.75 degrees C, applesmc module, label Tm0P
hwmon0.temp18_input value: 31.50 degrees C, applesmc module, label Ts0P
hwmon0.temp19_input value: 44.25 degrees C, applesmc module, label Ts0S
hwmon0.fan1_input value: 1999 RPM, applesmc module, label Left side
hwmon0.fan2_input value: 2003 RPM, applesmc module, label Right side
coretemp_pretty.c PASSED
* src/components/coretemp/: linux-coretemp.c, linux-coretemp.h,
tests/coretemp_pretty.c: Make the coretemp code a bit pickier
about which events it supports. Add descriptions to the events.
Also add support for voltage (in*) events.
On an amd14h machine I have access to, coretemp_pretty now
prints:
Trying all coretemp events
Found coretemp component at cid 2
hwmon0.in1_input value: 1.31 V, it8721 module, label ?
hwmon0.in2_input value: 2.22 V, it8721 module, label ?
hwmon0.in3_input value: 3.34 V, it8721 module, label +3.3V
hwmon0.in4_input value: 1.02 V, it8721 module, label ?
hwmon0.in5_input value: 1.52 V, it8721 module, label ?
hwmon0.in6_input value: 1.13 V, it8721 module, label ?
hwmon0.in7_input value: 3.26 V, it8721 module, label 3VSB
hwmon0.in8_input value: 3.17 V, it8721 module, label Vbat
hwmon0.temp1_input value: 28.00 degrees C, it8721 module, label ?
hwmon0.temp2_input value: -128.00 degrees C, it8721 module, label ?
hwmon0.temp3_input value: -128.00 degrees C, it8721 module, label ?
hwmon0.fan1_input value: 0 RPM
hwmon0.fan2_input value: 1320 RPM
hwmon1.temp1_input value: 33.00 degrees C, jc42 module, label ?
hwmon2.temp1_input value: 31.75 degrees C, jc42 module, label ?
hwmon3.temp1_input value: 53.00 degrees C, radeon module, label ?
hwmon4.temp1_input value: 53.12 degrees C, k10temp module, label ?
coretemp_pretty.c PASSED
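The coretemp_pretty output above pairs each raw reading with a unit. The helper below is a hypothetical sketch of that conversion (format_hwmon is not a PAPI function), assuming the component hands back the raw Linux hwmon sysfs value. Per the kernel's hwmon sysfs ABI, temp*_input is reported in millidegrees Celsius, in*_input in millivolts, and fan*_input in RPM.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of PAPI: turn a raw hwmon sysfs
 * reading into a line in the style of coretemp_pretty's output.
 * Assumes the event name looks like "hwmonN.<sensor>_input" and
 * that the value is the unscaled sysfs number:
 *   temp*_input -> millidegrees Celsius
 *   in*_input   -> millivolts
 *   fan*_input  -> RPM */
void format_hwmon(const char *event, long long raw, char *buf, size_t len)
{
    /* Look at the sensor part after "hwmonN." to pick the unit. */
    const char *name = strrchr(event, '.');
    name = name ? name + 1 : event;

    if (strncmp(name, "temp", 4) == 0)
        snprintf(buf, len, "%s value: %.2f degrees C", event, raw / 1000.0);
    else if (strncmp(name, "in", 2) == 0)
        snprintf(buf, len, "%s value: %.2f V", event, raw / 1000.0);
    else if (strncmp(name, "fan", 3) == 0)
        snprintf(buf, len, "%s value: %lld RPM", event, raw);
    else /* unknown sensor type: print the raw value unscaled */
        snprintf(buf, len, "%s value: %lld", event, raw);
}
```

For example, a raw temp1_input of 33500 formats as "hwmon0.temp1_input value: 33.50 degrees C", and a raw in3_input of 3340 as "hwmon0.in3_input value: 3.34 V".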
* src/components/coretemp/: linux-coretemp.c,
tests/coretemp_pretty.c: A cut-and-paste error slipped into that
last commit. This fixes the resulting build issue.
* src/components/coretemp/: linux-coretemp.c, tests/Makefile,
tests/coretemp_pretty.c: Apply to coretemp the same cleanups
done in the example component.
Add a new test, "coretemp_pretty", that prints coretemp results in
a more user-friendly way.
* man/...: Rebuild the man pages with a newer version of
doxygen (older versions of doxygen had a nasty bug in their man
output).
Also reworked the utilities documentation to remove pages for the
files. Thanks to Jose Pedro Oliveira for pointing this out.
* src/components/example/tests/: Makefile,
example_multiple_components.c: Add a test that makes sure you can
have active EventSets on multiple components at the same time.
* release_procedure.txt: Change PATH specification to include tcsh
syntax; other minor syntax corrections.
* src/components/example/example.c: More cleanups and documentation
for the example component.
2011-11-01
* src/components/example/example.c: Another major overhaul of the
example component: a lot more documentation, plus make it behave
much more like a real component would.
* doc/Doxyfile.utils: Turn off undocumented warnings for the utils
doxygen run.
* src/utils/: avail.c, command_line.c, cost.c, event_chooser.c,
multiplex_cost.c: Add spaces to the comments so doxygen doesn't
think <event> is an xml tag.
2011-10-31
* src/utils/: avail.c, clockres.c, command_line.c, component.c,
cost.c, decode.c, error_codes.c, event_chooser.c, mem_info.c,
multiplex_cost.c, native_avail.c: Remove the @file directive from
the doxygen comment blocks for the utilities. This cleans up the
generated man pages (we no longer build *.c.1).
* src/components/example/: example.c, tests/example_basic.c:
Clarify in the example component that ->reset only gets called if
an eventset is currently running.
Extend the example_basic test to test PAPI_reset().
* release_procedure.txt: Fix a maketarget typo.
* release_procedure.txt: We now have a good version of doxygen
installed on most icl run machines
(/mnt/scratch/sw/doxygen-1.7.5.1).
* doc/doxygen_procedure.txt: [no log message]
* release_procedure.txt: Update release_procedure to explain how to
update the website documentation link.
2011-10-28
* RELEASENOTES.txt: Correct the RELEASENOTES for some things I
missed when reviewing it.
It's Offcore events that we don't support on
Nehalem/Westmere/Sandybridge.
Also the power6 libpfm4 bug that was listed as an outstanding bug
was fixed a long time ago.
* src/components/coretemp/linux-coretemp.c: Have coretemp set the
num_native_events field.
* src/components/example/tests/example_basic.c: Update example test
to print num_native_events, to help debug issues with other
components not updating the value.
* src/components/coretemp/: linux-coretemp.c, linux-coretemp.h: Fix
typo enent -> event. Also remove residual LMSENSOR mentions from
the coretemp header.
* src/papi_libpfm4_events.c: Fix two memory leak locations.
The attached patch reduces the number of lost memory blocks
reported by valgrind from 234 to 39. It frees the memory
allocated by the 4 strdups and the calloc functions in
papi_libpfm4_events.c:allocate_native_event().
Patch by: José Pedro Oliveira
* src/components/cuda/tests/Makefile: The change to pass the PAPI
CC/CFLAGS to the component tests broke the nvidia test as it
wants CC to be nvcc. So update that Makefile to use nvcc
instead.
2011-10-27
* src/components/example/tests/example_basic.c: Improve the
example_basic component test to be much more comprehensive.
* src/components/example/: example.c, tests/HelloWorld.c,
tests/Makefile, tests/example_basic.c: Clean up the example test.
Fix various mistakes in the comments as well as add better error
checking.
Also rename the "HelloWorld" test to "example_basic".
* src/components/coretemp/tests/Makefile: The coretemp_test target
was example_test due to cut-and-paste error.
Patch from Jose Pedro Oliveira
* src/Makefile.inc: Add a component_tests dependency so that the
component tests are built during a make -j build.
* src/Makefile.inc: Make sure the component test makefiles get
passed the CC and CFLAGS definitions.
* src/components/coretemp/: linux-coretemp.c, tests/Makefile,
tests/coretemp_basic.c: Fix up the coretemp component some more.
Make sure the enumerate function returns PAPI_ENOEVNT if no
events are available.
Update the Makefile so it has proper dependencies.
Update the test so it prints the first event available. (The
latter based on a patch from Jose Pedro Oliveira)
* src/: solaris-ultra.c, ctests/all_native_events.c: The
solaris-ultra substrate was still broken. This is because recent
changes to component bind time explicitly used the ->set_domain()
call, and this vector was not set up in solaris_ultra.
Also made the all_native_events test report the returned error
value to aid in debugging problems like this in the future.