Capturing Intel(R) Processor Trace (Intel PT) {#capture}
=============================================

<!---
 ! Copyright (c) 2015-2017, Intel Corporation
 !
 ! Redistribution and use in source and binary forms, with or without
 ! modification, are permitted provided that the following conditions are met:
 !
 !  * Redistributions of source code must retain the above copyright notice,
 !    this list of conditions and the following disclaimer.
 !  * Redistributions in binary form must reproduce the above copyright notice,
 !    this list of conditions and the following disclaimer in the documentation
 !    and/or other materials provided with the distribution.
 !  * Neither the name of Intel Corporation nor the names of its contributors
 !    may be used to endorse or promote products derived from this software
 !    without specific prior written permission.
 !
 ! THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 ! AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 ! IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 ! ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 ! LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 ! CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 ! SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 ! INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 ! CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 ! ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 ! POSSIBILITY OF SUCH DAMAGE.
 !-->

This chapter describes how to capture Intel PT for processing with libipt. For
illustration, we use the sample tools ptdump and ptxed.

## Capturing Intel PT on Linux

Starting with version 4.1, the Linux kernel supports Intel PT via the perf_event
kernel interface. Starting with version 4.3, the perf user-space tool supports
Intel PT as well.

### Capturing Intel PT via Linux perf_event

We start by setting up a perf_event_attr object for capturing Intel PT. The
structure is declared in `/usr/include/linux/perf_event.h`.

The Intel PT PMU type is dynamic. Its value can be read from
`/sys/bus/event_source/devices/intel_pt/type`.

~~~{.c}
struct perf_event_attr attr;

memset(&attr, 0, sizeof(attr));
attr.size = sizeof(attr);
attr.type = <read type>();  /* value read from the sysfs file above */

attr.exclude_kernel = 1;    /* trace user space only */
...
~~~
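
The `<read type>()` placeholder above stands for reading the dynamic PMU type from sysfs. A minimal sketch of such a helper follows. The name `read_pmu_type` is hypothetical, and the sketch assumes that the file holds a single decimal number. Its result could be assigned to `attr.type` after checking for `-1`.

```c
#include <stdio.h>

/* Read a decimal PMU type from a sysfs-style file, for example
 * /sys/bus/event_source/devices/intel_pt/type.
 *
 * Hypothetical helper: returns the type on success, -1 if the file cannot
 * be opened or does not start with a non-negative decimal number.
 */
int read_pmu_type(const char *path)
{
    FILE *file;
    int type, fields;

    file = fopen(path, "r");
    if (!file)
        return -1;

    fields = fscanf(file, "%d", &type);
    fclose(file);

    return (fields == 1 && type >= 0) ? type : -1;
}
```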

Once all desired fields have been set, we can open a perf_event counter for
Intel PT. See `man 2 perf_event_open` for details. In our example, we
configure it for tracing a single thread.

The system call returns a file descriptor on success, `-1` otherwise.

~~~{.c}
int fd;

fd = syscall(SYS_perf_event_open, &attr, <pid>, -1, -1, 0);
~~~
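
The call above can be wrapped in a small function that names its arguments. This is only a sketch; `open_thread_event` is a hypothetical name and the argument choices simply mirror the call shown above (one thread, any CPU, no event group, no flags).

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a perf_event file descriptor for a single thread.
 *
 * tid:      the thread to trace (0 means the calling thread)
 * cpu:      -1, follow the thread on whatever CPU it runs
 * group_fd: -1, the event is not part of an event group
 * flags:    0
 *
 * Returns a file descriptor on success, -1 with errno set otherwise.
 */
int open_thread_event(struct perf_event_attr *attr, pid_t tid)
{
    return (int) syscall(SYS_perf_event_open, attr, tid, -1, -1, 0UL);
}
```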

The Intel PT trace is captured in the AUX area, which was introduced in
kernel 4.1. The DATA area contains sideband information such as image changes
that are necessary for decoding the trace.

In theory, both areas can be configured as circular buffers or as linear buffers
by mapping them read-only or read-write, respectively. When configured as a
circular buffer, new data will overwrite older data. When configured as a linear
buffer, the user is expected to continuously read out the data and update the
buffer's tail pointer. New data that does not fit into the buffer will be
dropped.
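
To illustrate the circular case: once old data has been overwritten, the oldest bytes start at the current write position, so a reader that wants the contents in chronological order has to copy the buffer in two pieces. The following is a minimal sketch of that copy. `copy_circular`, `pos`, and `wrapped` are hypothetical names, and how the write position and the wrap state are obtained from perf_event is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the contents of a circular buffer into @out, oldest byte first.
 *
 * @buf, @size: the circular buffer
 * @pos:        offset at which the next byte would be written
 * @wrapped:    non-zero if the writer has already overwritten old data
 *
 * @out must provide room for @size bytes.
 * Returns the number of bytes copied.
 */
size_t copy_circular(uint8_t *out, const uint8_t *buf, size_t size,
                     size_t pos, int wrapped)
{
    size_t tail;

    if (!size)
        return 0;

    pos %= size;
    if (!wrapped) {
        /* Nothing was overwritten: the valid data is [0, pos). */
        memcpy(out, buf, pos);
        return pos;
    }

    /* The oldest data starts at the write position. */
    tail = size - pos;
    memcpy(out, buf + pos, tail);
    memcpy(out + tail, buf, pos);
    return size;
}
```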

When using the AUX area, its size and offset have to be filled into the
`perf_event_mmap_page`, which is mapped together with the DATA area. This
requires the DATA area to be mapped read-write and hence configured as a linear
buffer. In our example, we configure the AUX area as a circular buffer.

Note that the size of both the AUX and the DATA area has to be a power of two
pages. The DATA area needs one additional page to contain the
`perf_event_mmap_page`.

~~~{.c}
struct perf_event_mmap_page *header;
void *base, *data, *aux;

/* DATA area: one header page plus 2^n data pages, mapped read-write. */
base = mmap(NULL, (1 + (1UL << n)) * PAGE_SIZE, PROT_READ | PROT_WRITE,
            MAP_SHARED, fd, 0);
if (base == MAP_FAILED)
    return <handle data mmap error>();

header = base;
data = (char *) base + header->data_offset;

/* AUX area: 2^m pages, placed right behind the DATA area. */
header->aux_offset = header->data_offset + header->data_size;
header->aux_size   = (1UL << m) * PAGE_SIZE;

/* Mapping the AUX area read-only configures it as a circular buffer. */
aux = mmap(NULL, header->aux_size, PROT_READ, MAP_SHARED, fd,
           header->aux_offset);
if (aux == MAP_FAILED)
    return <handle aux mmap error>();
~~~
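
The size rule can be made explicit with a few helpers. This is a sketch under the assumptions stated in the text; the function names are hypothetical, `n` and `m` are the exponents chosen by the caller, and the page size would typically come from `sysconf(_SC_PAGESIZE)`.

```c
#include <stddef.h>

/* Length of the DATA mapping: one header page plus 2^n data pages. */
size_t data_map_len(unsigned int n, size_t page_size)
{
    return (1 + ((size_t) 1 << n)) * page_size;
}

/* Length of the AUX mapping: 2^m pages. */
size_t aux_map_len(unsigned int m, size_t page_size)
{
    return ((size_t) 1 << m) * page_size;
}

/* Check that @len is a power-of-two number of pages. */
int is_pow2_pages(size_t len, size_t page_size)
{
    size_t pages;

    if (!page_size || (len % page_size))
        return 0;

    pages = len / page_size;
    return pages && !(pages & (pages - 1));
}
```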

### Capturing Intel PT via the perf user-space tool

Starting with kernel 4.3, the perf user-space tool can be used to capture Intel
PT with the `intel_pt` event. See tools/perf/Documentation in the Linux kernel
tree for further information. In this text, we describe how to use the captured
trace with the ptdump and ptxed sample tools.

We start by capturing some Intel PT trace using the `intel_pt` event.

~~~{.sh}
$ perf record -e intel_pt//u --per-thread -- grep -r foo /usr/include
[ perf record: Woken up 26 times to write data ]
[ perf record: Captured and wrote 51.969 MB perf.data ]
~~~

This generates a `perf.data` file that contains the Intel PT trace, the sideband
information, and some metadata. To process the trace with libipt, we need to
extract the Intel PT trace into one file per thread or cpu.

Looking at the raw trace dump of `perf script -D`, we notice
`PERF_RECORD_AUXTRACE` records. The raw Intel PT trace is contained directly
after such records. We can extract it with the `dd` command. The arguments to
`dd` can be computed from the record's fields. This can be done automatically,
for example with an AWK script.

~~~{.awk}
/PERF_RECORD_AUXTRACE / {
  offset = strtonum($1)             # file offset of the record
  hsize  = strtonum(substr($2, 2))  # size of the record itself
  size   = strtonum($5)             # size of the trace data that follows
  idx    = strtonum($11)            # index used to name the output file

  ofile = sprintf("perf.data-aux-idx%d.bin", idx)
  begin = offset + hsize

  cmd = sprintf("dd if=perf.data of=%s conv=notrunc oflag=append ibs=1 \
    skip=%d count=%d status=none", ofile, begin, size)

  system(cmd)
}
~~~

The libipt tree contains such a script in `script/perf-read-aux.bash`.
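
For readers who prefer to do the extraction in C rather than with `dd`, the copy step amounts to appending `size` bytes, starting at byte offset `begin` of `perf.data`, to the per-index output file. The following is a minimal sketch of that step only. `append_range` is a hypothetical helper, not part of libipt. Unlike `dd`, it reports an error if fewer than `size` bytes are available, and it uses a `long` offset, so very large files may need `fseeko` instead.

```c
#include <stdio.h>

/* Append @size bytes starting at byte offset @begin of @ipath to @opath.
 *
 * Returns 0 on success, -1 on error.
 */
int append_range(const char *ipath, const char *opath, long begin, size_t size)
{
    unsigned char buffer[4096];
    FILE *in, *out;
    int status = -1;

    in = fopen(ipath, "rb");
    if (!in)
        return -1;

    /* "ab" creates the file if needed and never truncates it. */
    out = fopen(opath, "ab");
    if (!out) {
        fclose(in);
        return -1;
    }

    if (fseek(in, begin, SEEK_SET) == 0) {
        while (size) {
            size_t chunk = size < sizeof(buffer) ? size : sizeof(buffer);

            if (fread(buffer, 1, chunk, in) != chunk)
                break;
            if (fwrite(buffer, 1, chunk, out) != chunk)
                break;
            size -= chunk;
        }
        if (!size)
            status = 0;
    }

    fclose(in);
    if (fclose(out) != 0)
        status = -1;

    return status;
}
```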

In addition to the Intel PT trace, we need the traced memory image. When
tracing a single process where the memory image does not change during tracing,
we can construct the memory image by examining `PERF_RECORD_MMAP` and
`PERF_RECORD_MMAP2` records. This can again be done automatically, for example
with an AWK script.

~~~{.awk}
function handle_mmap(file, vaddr) {
  if (match(file, /\[.*\]/) != 0) {
    # ignore 'virtual' file names like [kallsyms]
  }
  else if (match(file, /\.ko$/) != 0) {
    # ignore kernel objects
    #
    # use /proc/kcore
  }
  else {
    printf(" --elf %s:0x%x", file, vaddr)
  }
}

/PERF_RECORD_MMAP / {
  vaddr = strtonum(substr($5, 2))
  file = $9

  handle_mmap(file, vaddr)
}

/PERF_RECORD_MMAP2 / {
  vaddr = strtonum(substr($5, 2))
  file = $12

  handle_mmap(file, vaddr)
}
~~~

The above script generates options for the `ptxed` sample tool. The libipt tree
contains such a script in `script/perf-read-image.bash`.

Let's put it all together.

~~~{.sh}
$ perf record -e intel_pt//u --per-thread -- grep -r foo /usr/include
[ perf record: Woken up 26 times to write data ]
[ perf record: Captured and wrote 51.969 MB perf.data ]
$ script/perf-read-aux.bash
$ script/perf-read-image.bash | xargs ptxed --cpu 6/61 --pt perf.data-aux-idx0.bin
~~~

### Sideband support

The above example does not consider sideband information. It therefore only
works for not-too-complicated single-threaded applications. For tracing
multi-threaded applications or for system-wide tracing (including ring-0),
sideband information is required for decoding the trace.

Sideband information can be defined as any information necessary for decoding
Intel PT that is not contained in the trace stream itself. We already supply:

  * the binary files whose execution was traced and the virtual address at which
    each file was loaded
  * the family/model/stepping of the processor on which the trace was recorded
  * some information regarding timing

What's missing is information about changes to the traced memory image while the
trace is being recorded:

  * memory map/unmap information
  * context switch information

On Linux, this information can be found in the form of PERF_EVENT records in the
DATA buffer or in the perf.data file, respectively.

Collection and interpretation of this information is currently left completely
to the user.

### Capturing Intel PT via Simple-PT

The Simple-PT project on GitHub supports capturing Intel PT on Linux with an
alternative kernel driver. The spt decoder supports sideband information.

See the project's page at https://github.com/andikleen/simple-pt for more
information including examples.