Capturing Intel(R) Processor Trace (Intel PT) {#capture}
=============================================

<!---
 ! Copyright (c) 2015-2017, Intel Corporation
 !
 ! Redistribution and use in source and binary forms, with or without
 ! modification, are permitted provided that the following conditions are met:
 !
 !  * Redistributions of source code must retain the above copyright notice,
 !    this list of conditions and the following disclaimer.
 !  * Redistributions in binary form must reproduce the above copyright notice,
 !    this list of conditions and the following disclaimer in the documentation
 !    and/or other materials provided with the distribution.
 !  * Neither the name of Intel Corporation nor the names of its contributors
 !    may be used to endorse or promote products derived from this software
 !    without specific prior written permission.
 !
 ! THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 ! AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 ! IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 ! ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 ! LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 ! CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 ! SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 ! INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 ! CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 ! ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 ! POSSIBILITY OF SUCH DAMAGE.
 !-->

This chapter describes how to capture Intel PT for processing with libipt.  For
illustration, we use the sample tools ptdump and ptxed.


## Capturing Intel PT on Linux

Starting with version 4.1, the Linux kernel supports Intel PT via the perf_event
kernel interface.  Starting with version 4.3, the perf user-space tool supports
Intel PT as well.


### Capturing Intel PT via Linux perf_event

We start by setting up a `perf_event_attr` object for capturing Intel PT.  The
structure is declared in `/usr/include/linux/perf_event.h`.

The Intel PT PMU type is dynamic.  Its value can be read from
`/sys/bus/event_source/devices/intel_pt/type`.

~~~{.c}
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = <read type>();

    attr.exclude_kernel = 1;
    ...
~~~
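
The `<read type>()` placeholder can be filled in with a small helper that parses the sysfs file.  A minimal sketch, assuming the file holds a single decimal integer; `read_pmu_type` is a hypothetical name, not a libipt or kernel function:

```c
#include <stdio.h>

/* Hypothetical helper: read a dynamic PMU type, stored as a decimal integer,
 * from a sysfs file such as /sys/bus/event_source/devices/intel_pt/type.
 * Returns the type on success and -1 on error. */
int read_pmu_type(const char *path)
{
    FILE *file;
    int type;

    file = fopen(path, "r");
    if (!file)
        return -1;

    if (fscanf(file, "%d", &type) != 1)
        type = -1;

    fclose(file);
    return type;
}
```

Since `attr.type` is unsigned, check the result before assigning it: call `read_pmu_type("/sys/bus/event_source/devices/intel_pt/type")` and bail out if it is negative, for example when the file does not exist because the kernel or processor lacks Intel PT support.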


Once all desired fields have been set, we can open a perf_event counter for
Intel PT.  See `man 2 perf_event_open` for details.  In our example, we
configure it for tracing a single thread.

The system call returns a file descriptor on success and `-1` with `errno` set
otherwise.

~~~{.c}
    int fd;

    /* trace <pid> on any CPU (cpu = -1), without group leader or flags */
    fd = syscall(SYS_perf_event_open, &attr, <pid>, -1, -1, 0);
~~~
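
glibc does not provide a wrapper for `perf_event_open`, which is why the call goes through `syscall()`.  A small sketch that wraps it with typed arguments; `sys_perf_event_open` and `open_thread_trace` are hypothetical helpers, not glibc or libipt functions:

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Typed wrapper around the raw system call.
 * Returns a file descriptor on success, -1 with errno set otherwise. */
int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                        int group_fd, unsigned long flags)
{
    return (int) syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: trace a single thread on whichever CPU it runs
 * (cpu = -1), as a stand-alone event (group_fd = -1) with no flags. */
int open_thread_trace(struct perf_event_attr *attr, pid_t tid)
{
    return sys_perf_event_open(attr, tid, -1, -1, 0);
}
```

On failure, consult `errno` together with `man 2 perf_event_open` for the meaning of each error code.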


The Intel PT trace is captured in the AUX area, which was introduced with
kernel 4.1.  The DATA area contains sideband information, such as image
changes, that is necessary for decoding the trace.

In theory, both areas can be configured as circular buffers or as linear buffers
by mapping them read-only or read-write, respectively.  When configured as a
circular buffer, new data overwrite older data.  When configured as a linear
buffer, the user is expected to continuously read out the data and update the
buffer's tail pointer.  New data that do not fit into the buffer are dropped.

When using the AUX area, its size and offset have to be filled into the
`perf_event_mmap_page`, which is mapped together with the DATA area.  This
requires the DATA area to be mapped read-write and hence configured as a linear
buffer.  In our example, we configure the AUX area as a circular buffer.

Note that the sizes of both the AUX and the DATA area have to be a
power-of-two number of pages.  The DATA area needs one additional page to
contain the `perf_event_mmap_page`.
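
A minimal sketch of the size arithmetic, assuming 2^n pages for the DATA area and 2^m pages for the AUX area, matching the exponents `n` and `m` of the `mmap()` example in this section.  The helper names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* True if x is a non-zero power of two. */
bool is_pow2(size_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Length of the DATA mapping: one perf_event_mmap_page plus 2^n data pages. */
size_t data_mmap_len(unsigned int n, size_t page_size)
{
    return ((size_t) 1 + ((size_t) 1 << n)) * page_size;
}

/* Length of the AUX mapping: 2^m pages. */
size_t aux_mmap_len(unsigned int m, size_t page_size)
{
    return ((size_t) 1 << m) * page_size;
}
```

For example, with 4 KiB pages, `data_mmap_len(3, 4096)` is 36864 bytes (nine pages: one header page plus eight data pages) and `aux_mmap_len(10, 4096)` is 4194304 bytes (4 MiB).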

~~~{.c}
    struct perf_event_mmap_page *header;
    void *base, *data, *aux;
    size_t page_size = (size_t) sysconf(_SC_PAGESIZE);

    /* DATA area: one header page plus 2^n data pages, mapped read-write. */
    base = mmap(NULL, (1 + (1ul << n)) * page_size, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
        return <handle data mmap error>();

    header = base;
    data = (char *) base + header->data_offset;

    /* AUX area: 2^m pages, placed directly after the DATA area. */
    header->aux_offset = header->data_offset + header->data_size;
    header->aux_size   = (1ul << m) * page_size;

    /* Mapping the AUX area read-only configures it as a circular buffer. */
    aux = mmap(NULL, header->aux_size, PROT_READ, MAP_SHARED, fd,
               header->aux_offset);
    if (aux == MAP_FAILED)
        return <handle aux mmap error>();
~~~
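
Because the DATA area is a linear buffer, the reader has to copy out new records and advance `data_tail`.  A sketch under these assumptions: `data_head` and `data_tail` are free-running byte offsets that do not wrap (as `man 2 perf_event_open` describes for `data_head`) and are reduced modulo `data_size`, a power of two; the head is read with acquire and the tail written with release semantics using the GCC/Clang `__atomic` builtins.  `ring_copy` and `drain_data_area` are hypothetical names:

```c
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the bytes in [tail, head) out of a ring of ring_size bytes (a power of
 * two), handling wrap-around. Copies at most out_size bytes and returns the
 * number of bytes copied. */
size_t ring_copy(const uint8_t *ring, uint64_t ring_size, uint64_t tail,
                 uint64_t head, uint8_t *out, size_t out_size)
{
    uint64_t avail = head - tail;
    size_t copied = 0;

    if (avail > out_size)
        avail = out_size;

    while (copied < avail) {
        uint64_t pos = (tail + copied) & (ring_size - 1);
        uint64_t chunk = ring_size - pos;   /* bytes until the ring wraps */

        if (chunk > avail - copied)
            chunk = avail - copied;

        memcpy(out + copied, ring + pos, chunk);
        copied += chunk;
    }
    return copied;
}

/* Hypothetical helper: copy pending records out of the DATA area and hand the
 * consumed space back to the kernel by advancing data_tail. */
size_t drain_data_area(struct perf_event_mmap_page *header, const uint8_t *data,
                       uint8_t *out, size_t out_size)
{
    /* Acquire: record contents are visible once the head has been read. */
    uint64_t head = __atomic_load_n(&header->data_head, __ATOMIC_ACQUIRE);
    uint64_t tail = header->data_tail;
    size_t n = ring_copy(data, header->data_size, tail, head, out, out_size);

    /* Release: finish reading before the kernel may overwrite the space. */
    __atomic_store_n(&header->data_tail, tail + n, __ATOMIC_RELEASE);
    return n;
}
```

Passing an output buffer of at least `data_size` bytes copies all pending records whole; with a smaller buffer, a record may be split across two calls.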


### Capturing Intel PT via the perf user-space tool

Starting with kernel 4.3, the perf user-space tool can be used to capture Intel
PT with the `intel_pt` event.  See `tools/perf/Documentation` in the Linux
kernel tree for further information.  Here, we describe how to use the captured
trace with the ptdump and ptxed sample tools.

We start by capturing some Intel PT trace using the `intel_pt` event.

~~~{.sh}
    $ perf record -e intel_pt//u --per-thread -- grep -r foo /usr/include
    [ perf record: Woken up 26 times to write data ]
    [ perf record: Captured and wrote 51.969 MB perf.data ]
~~~


This generates a `perf.data` file that contains the Intel PT trace, the sideband
information, and some metadata.  To process the trace with libipt, we need to
extract the Intel PT trace into one file per thread or CPU.

Looking at the raw trace dump of `perf script -D`, we notice
`PERF_RECORD_AUXTRACE` records.  The raw Intel PT trace immediately follows each
such record.  We can extract it with the `dd` command, whose arguments can be
computed from the record's fields.  This can be done automatically, for example
with a GNU AWK script (`strtonum` is a gawk extension).

~~~{.awk}
  /PERF_RECORD_AUXTRACE / {
    offset = strtonum($1)
    hsize  = strtonum(substr($2, 2))
    size   = strtonum($5)
    idx    = strtonum($11)

    ofile = sprintf("perf.data-aux-idx%d.bin", idx)
    begin = offset + hsize

    cmd = sprintf("dd if=perf.data of=%s conv=notrunc oflag=append ibs=1 \
                  skip=%d count=%d status=none", ofile, begin, size)

    system(cmd)
  }
~~~
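
The `dd` command in the script appends `size` bytes, starting at byte `offset + hsize` of `perf.data`, to the per-index output file.  A minimal C sketch of that copy step, for readers who prefer not to spawn `dd` per record; `append_range` is a hypothetical helper, and unlike `dd` it reports a short input file as an error:

```c
#include <stdio.h>

/* Append `size` bytes of in_path, starting at byte `begin`, to out_path.
 * Same effect as dd skip=begin count=size oflag=append conv=notrunc.
 * Returns 0 on success and -1 on error or short input. */
int append_range(const char *in_path, const char *out_path, long begin,
                 long size)
{
    char buf[4096];
    FILE *in, *out;
    int status = 0;

    in = fopen(in_path, "rb");
    if (!in)
        return -1;

    out = fopen(out_path, "ab");
    if (!out) {
        fclose(in);
        return -1;
    }

    if (fseek(in, begin, SEEK_SET) != 0)
        status = -1;

    while (status == 0 && size > 0) {
        size_t want = size < (long) sizeof(buf) ? (size_t) size : sizeof(buf);
        size_t got = fread(buf, 1, want, in);

        if (got == 0 || fwrite(buf, 1, got, out) != got)
            status = -1;   /* input ended early or write failed */
        size -= (long) got;
    }

    if (fclose(out) != 0)
        status = -1;
    fclose(in);
    return status;
}
```

With the AWK field names, a record would be handled as `append_range("perf.data", ofile, offset + hsize, size)`.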

The libipt tree contains such a script in `script/perf-read-aux.bash`.

In addition to the Intel PT trace, we need the traced memory image.  When
tracing a single process where the memory image does not change during tracing,
we can construct the memory image by examining `PERF_RECORD_MMAP` and
`PERF_RECORD_MMAP2` records.  This can again be done automatically, for example
with an AWK script.

~~~{.awk}
  function handle_mmap(file, vaddr) {
    if (match(file, /\[.*\]/) != 0) {
      # ignore 'virtual' file names like [kallsyms]
    }
    else if (match(file, /\.ko$/) != 0) {
      # ignore kernel objects
      #
      # use /proc/kcore
    }
    else {
      printf(" --elf %s:0x%x", file, vaddr)
    }
  }

  /PERF_RECORD_MMAP / {
    vaddr = strtonum(substr($5, 2))
    file = $9

    handle_mmap(file, vaddr)
  }

  /PERF_RECORD_MMAP2 / {
    vaddr = strtonum(substr($5, 2))
    file = $12

    handle_mmap(file, vaddr)
  }
~~~
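
The filtering in `handle_mmap` can be restated in C, for example when building the `ptxed` command line from a program instead of a shell pipeline.  A sketch that mirrors the AWK rules one-to-one; `elf_option` is a hypothetical name:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical C counterpart of handle_mmap() from the AWK script.
 * Writes " --elf <file>:0x<vaddr>" into buf and returns the snprintf() result,
 * or leaves buf empty and returns 0 if the mapping is skipped. */
int elf_option(char *buf, size_t len, const char *file,
               unsigned long long vaddr)
{
    const char *lbracket = strchr(file, '[');
    size_t flen = strlen(file);

    if (len > 0)
        buf[0] = '\0';

    /* Ignore 'virtual' file names like [kallsyms]: a '[' followed by a ']'. */
    if (lbracket && strchr(lbracket, ']'))
        return 0;

    /* Ignore kernel objects (file names ending in ".ko"). */
    if (flen >= 3 && strcmp(file + flen - 3, ".ko") == 0)
        return 0;

    return snprintf(buf, len, " --elf %s:0x%llx", file, vaddr);
}
```

Concatenating the non-empty results for all mmap records yields the same option string that the AWK script prints.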

The above script generates options for the `ptxed` sample tool.  The libipt tree
contains such a script in `script/perf-read-image.bash`.

Let's put it all together.

~~~{.sh}
    $ perf record -e intel_pt//u --per-thread -- grep -r foo /usr/include
    [ perf record: Woken up 26 times to write data ]
    [ perf record: Captured and wrote 51.969 MB perf.data ]
    $ script/perf-read-aux.bash
    $ script/perf-read-image.bash | xargs ptxed --cpu 6/61 --pt perf.data-aux-idx0.bin
~~~
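
The `--cpu 6/61` argument gives the family and model (here family 6, model 61) of the processor on which the trace was recorded.  A sketch of how these values can be derived from CPUID leaf 1 following the Intel SDM's display family/model rules; `decode_fms` and `cpu_option` are hypothetical helpers, and `cpu_option` relies on the `__get_cpuid` builtin from GCC/Clang's `<cpuid.h>`, so it is only compiled on x86:

```c
#include <stdint.h>
#include <stdio.h>

/* Decode family/model/stepping from CPUID leaf 1 EAX, following the Intel SDM:
 * the extended model applies to families 6 and 15, the extended family only
 * to family 15. */
void decode_fms(uint32_t eax, unsigned *family, unsigned *model,
                unsigned *stepping)
{
    unsigned base_family = (eax >> 8) & 0xf;
    unsigned base_model = (eax >> 4) & 0xf;

    *stepping = eax & 0xf;
    *family = base_family;
    *model = base_model;

    if (base_family == 0xf)
        *family += (eax >> 20) & 0xff;
    if (base_family == 0x6 || base_family == 0xf)
        *model += ((eax >> 16) & 0xf) << 4;
}

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>

/* Hypothetical helper: format the current processor as "family/model", the
 * form used with ptxed's --cpu option. Returns the snprintf() result, or -1 if
 * CPUID leaf 1 is not available. */
int cpu_option(char *buf, size_t len)
{
    unsigned eax, ebx, ecx, edx, family, model, stepping;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;

    decode_fms(eax, &family, &model, &stepping);
    return snprintf(buf, len, "%u/%u", family, model);
}
#endif
```

For example, an EAX value of `0x306D4` decodes to family 6, model 61, stepping 4.  Run such a helper on the machine that recorded the trace, which need not be the one that decodes it.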


### Sideband support

The above example does not consider sideband information.  It therefore only
works for not-too-complicated single-threaded applications.  For tracing
multi-threaded applications or for system-wide tracing (including ring-3),
sideband information is required for decoding the trace.

Sideband information can be defined as any information necessary for decoding
Intel PT that is not contained in the trace stream itself.  We already supply:

  * the binary files whose execution was traced and the virtual address at which
    each file was loaded
  * the family/model/stepping of the processor on which the trace was recorded
  * some information regarding timing


What's missing is information about changes to the traced memory image while the
trace is being recorded:

  * memory map/unmap information
  * context switch information


On Linux, this information can be found in the form of `PERF_RECORD_*` records
in the DATA area or in the `perf.data` file, respectively.

Collection and interpretation of this information is currently left completely
to the user.
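
As a starting point for collecting this information, a sketch that walks a linearized copy of the records, for example bytes copied out of the DATA area, and counts those of a given type.  It relies only on `struct perf_event_header` and the `PERF_RECORD_*` constants from `<linux/perf_event.h>`; `count_records` is a hypothetical name, and which records appear at all depends on the `perf_event_attr` bits the event was opened with (such as `mmap`, `mmap2`, or `context_switch`):

```c
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count the records of the given type in a linear buffer of perf_event
 * records. Each record starts with a struct perf_event_header whose size
 * field covers the whole record. Stops at a truncated or malformed record. */
size_t count_records(const uint8_t *buf, size_t len, uint32_t type)
{
    size_t off = 0, count = 0;

    while (len - off >= sizeof(struct perf_event_header)) {
        struct perf_event_header hdr;

        memcpy(&hdr, buf + off, sizeof(hdr));   /* avoid unaligned access */
        if (hdr.size < sizeof(hdr) || hdr.size > len - off)
            break;                              /* malformed or truncated */

        if (hdr.type == type)
            count++;
        off += hdr.size;
    }
    return count;
}
```

A real consumer would switch on `hdr.type` and decode the record body according to the layout documented for that record type in `man 2 perf_event_open`.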


### Capturing Intel PT via Simple-PT

The Simple-PT project on GitHub supports capturing Intel PT on Linux with an
alternative kernel driver.  The spt decoder supports sideband information.

See the project's page at https://github.com/andikleen/simple-pt for more
information, including examples.