Decoding Intel(R) Processor Trace Using libipt {#libipt}
========================================================

<!---
 ! Copyright (c) 2013-2017, Intel Corporation
 !
 ! Redistribution and use in source and binary forms, with or without
 ! modification, are permitted provided that the following conditions are met:
 !
 !  * Redistributions of source code must retain the above copyright notice,
 !    this list of conditions and the following disclaimer.
 !  * Redistributions in binary form must reproduce the above copyright notice,
 !    this list of conditions and the following disclaimer in the documentation
 !    and/or other materials provided with the distribution.
 !  * Neither the name of Intel Corporation nor the names of its contributors
 !    may be used to endorse or promote products derived from this software
 !    without specific prior written permission.
 !
 ! THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 ! AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 ! IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 ! ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 ! LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 ! CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 ! SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 ! INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 ! CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 ! ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 ! POSSIBILITY OF SUCH DAMAGE.
 !-->

This chapter describes how to use libipt for various tasks around Intel
Processor Trace (Intel PT).  For code examples, refer to the sample tools that
are contained in the source tree:

  * *ptdump*    A packet dumper example.
  * *ptxed*     A control-flow reconstruction example.
  * *pttc*      A packet encoder example.


For detailed information about Intel PT, please refer to chapter 36 of the Intel
Software Developer's Manual at http://www.intel.com/sdm.


## Introduction

The libipt decoder library provides multiple layers of abstraction ranging from
packet encoding and decoding to full execution flow reconstruction.  The layers
are organized as follows:

  * *packets*           This layer deals with raw Intel PT packets.

  * *events*            This layer deals with packet combinations that
                        encode higher-level events.

  * *instruction flow*  This layer deals with the execution flow on the
                        instruction level.

  * *block*             This layer also deals with the execution flow on the
                        instruction level but reports it in blocks of
                        sequential instructions.

                        It is faster than the instruction flow decoder but
                        requires a small amount of post-processing.


Each layer provides its own encoder or decoder struct plus a set of functions
for allocating and freeing encoder or decoder objects and for synchronizing
decoders onto the Intel PT packet stream.  Function names are prefixed with
`pt_<lyr>_` where `<lyr>` is an abbreviation of the layer name.  The following
abbreviations are used:

  * *enc*     Packet encoding (packet layer).
  * *pkt*     Packet decoding (packet layer).
  * *qry*     Event (or query) layer.
  * *insn*    Instruction flow layer.
  * *blk*     Block layer.


Here is some generic example code for working with decoders:

~~~{.c}
    struct pt_<layer>_decoder *decoder;
    struct pt_config config;
    int errcode;

    memset(&config, 0, sizeof(config));
    config.size = sizeof(config);
    config.begin = <pt buffer begin>;
    config.end = <pt buffer end>;
    config.cpu = <cpu identifier>;
    config...

    /* Allocation reports failure as NULL; there is no error code. */
    decoder = pt_<lyr>_alloc_decoder(&config);
    if (!decoder)
        <handle allocation error>();

    errcode = pt_<lyr>_sync_<where>(decoder);
    if (errcode < 0)
        <handle error>(errcode);

    <use decoder>(decoder);

    pt_<lyr>_free_decoder(decoder);
~~~

First, configure the decoder.  At a minimum, the size of the config struct and
the `begin` and `end` of the buffer containing the Intel PT data need to be set.
The configuration options are discussed in detail later in this chapter.  In
the case of packet encoding, `begin` and `end` are the bounds of the
pre-allocated buffer into which Intel PT packets shall be written.

Next, allocate a decoder object for the layer you are interested in.  A return
value of NULL indicates an error.  There is no further information available on
the exact error condition.  Most of the time, however, the error is the result
of an incomplete or inconsistent configuration.

Before the decoder can be used, it needs to be synchronized onto the Intel PT
packet stream specified in the configuration.  The only exception to this is the
packet encoder, which is implicitly synchronized onto the beginning of the Intel
PT buffer.

Depending on the type of decoder, one or more synchronization options are
available.

  * `pt_<lyr>_sync_forward()`     Synchronize onto the next PSB in forward
                                  direction (or the first PSB if not yet
                                  synchronized).

  * `pt_<lyr>_sync_backward()`    Synchronize onto the next PSB in backward
                                  direction (or the last PSB if not yet
                                  synchronized).

  * `pt_<lyr>_sync_set()`         Set the synchronization position to a
                                  user-defined location in the Intel PT packet
                                  stream.
                                  There is no check whether the specified
                                  location makes sense or is valid.


After synchronizing, the decoder can be used.  While decoding, the decoder
stores the location of the last PSB it encountered during normal decode.
Subsequent calls to `pt_<lyr>_sync_forward()` will start searching from that
location.  This is useful for re-synchronizing onto the Intel PT packet stream
in case of errors.  An example of a typical decode loop is given below:

~~~{.c}
    for (;;) {
        int errcode;

        errcode = <use decoder>(decoder);
        if (errcode >= 0)
            continue;

        if (errcode == -pte_eos)
            return;

        <report error>(errcode);

        do {
            errcode = pt_<lyr>_sync_forward(decoder);

            if (errcode == -pte_eos)
                return;
        } while (errcode < 0);
    }
~~~

You can get the current decoder position as offset into the Intel PT buffer via:

    pt_<lyr>_get_offset()


You can get the position of the last synchronization point as offset into the
Intel PT buffer via:

    pt_<lyr>_get_sync_offset()


Each layer will be discussed in detail below.  In the remainder of this section,
general functionality will be considered.


### Version

You can query the library version using:

  * `pt_library_version()`


This function returns a version structure that can be used for compatibility
checks or simply for reporting the version of the decoder library.


### Errors

The library uses a single error enum for all layers.

  * `enum pt_error_code`   An enumeration of encode and decode errors.


Errors are typically represented as negative `pt_error_code` enumeration
constants and returned as an int.  The library provides two functions for
dealing with errors:

  * `pt_errcode()`     Translate an int return value into a `pt_error_code`
                       enumeration constant.

  * `pt_errstr()`      Return a human-readable error string.


Not all errors may occur on every layer.  Every API function specifies the
errors it may return.


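The convention described above can be illustrated without the library.  The
following is a minimal sketch, not the libipt implementation: the `my_`
enumeration, its constants, and both helper functions are hypothetical
stand-ins that only mirror the documented behavior of returning errors as
negative enumeration constants in an int.

~~~{.c}
/* Hypothetical error enumeration; these are not the constants from
 * intel-pt.h.
 */
enum my_error_code {
    my_ok,          /* No error. */
    my_eos,         /* End of stream. */
    my_bad_opc      /* Unknown opcode. */
};

/* Translate an int return value into an enumeration constant.
 *
 * Non-negative values carry a result (for example a byte count) and
 * therefore map to my_ok.
 */
static enum my_error_code my_errcode(int status)
{
    return (status >= 0) ? my_ok : (enum my_error_code) -status;
}

/* Return a human-readable string for an enumeration constant. */
static const char *my_errstr(enum my_error_code code)
{
    switch (code) {
    case my_ok:
        return "no error";

    case my_eos:
        return "end of stream";

    case my_bad_opc:
        return "unknown opcode";
    }

    return "unknown error";
}
~~~

A caller would typically check `status < 0` first and only then translate the
value for reporting, for example via `my_errstr(my_errcode(status))`.
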
### Configuration

Every encoder or decoder allocation function requires a configuration argument.
Some of its fields have already been discussed in the example above.  Refer to
the `intel-pt.h` header for detailed and up-to-date documentation of each field.

At a minimum, the `size` field needs to be set to `sizeof(struct pt_config)` and
`begin` and `end` need to be set to the Intel PT buffer to use.

The size is used for detecting library version mismatches and to provide
backwards compatibility.  Without the proper `size`, decoder allocation will
fail.

Although not strictly required, it is recommended to also set the `cpu` field to
the processor on which Intel PT has been collected (for decoders) or for which
Intel PT shall be generated (for encoders).  This allows implementing
processor-specific behavior such as erratum workarounds.


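To see why a `size` field helps with compatibility, consider the following
sketch.  It shows one common way such a field can be used and is not
necessarily how libipt handles it; `struct my_config_v1`, `struct my_config`,
and `my_config_from_user()` are hypothetical names.  The library-side struct
has gained a `flags` field compared to the version an older user was compiled
against.

~~~{.c}
#include <stddef.h>
#include <string.h>

/* The config struct an older user was compiled against (hypothetical). */
struct my_config_v1 {
    size_t size;
    const unsigned char *begin;
    const unsigned char *end;
};

/* The config struct the library uses today (hypothetical). */
struct my_config {
    size_t size;
    const unsigned char *begin;
    const unsigned char *end;
    unsigned int flags;     /* Added in a later version. */
};

/* Copy a user-provided config of possibly different size.
 *
 * Returns zero on success, a negative value otherwise.
 */
static int my_config_from_user(struct my_config *config, const void *uconfig)
{
    size_t size;

    if (!config || !uconfig)
        return -1;

    /* The size field is the first member in every version. */
    memcpy(&size, uconfig, sizeof(size));

    /* Reject configs that cannot hold the mandatory fields. */
    if (size < offsetof(struct my_config, flags))
        return -1;

    /* Ignore fields this library version does not know about. */
    if (size > sizeof(*config))
        size = sizeof(*config);

    /* Fields missing in an older user struct default to zero. */
    memset(config, 0, sizeof(*config));
    memcpy(config, uconfig, size);
    config->size = size;

    return 0;
}
~~~

This is also the reason for zeroing the config with `memset()` before filling
it in: fields the user does not set then carry a well-defined default.
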
## The Packet Layer

This layer deals with Intel PT packet encoding and decoding.  It can further be
split into three sub-layers: opcodes, encoding, and decoding.


### Opcodes

The opcodes layer provides enumerations for all the bits necessary for Intel PT
encoding and decoding.  The enumeration constants can be used without linking to
the decoder library.  There is no encoder or decoder struct associated with this
layer.  See the `intel-pt.h` header file for details.


### Packet Encoding

The packet encoding layer provides support for encoding Intel PT
packet-by-packet.  Start by configuring and allocating a `pt_encoder` as
shown below:

~~~{.c}
    struct pt_encoder *encoder;
    struct pt_config config;
    int errcode;

    memset(&config, 0, sizeof(config));
    config.size = sizeof(config);
    config.begin = <pt buffer begin>;
    config.end = <pt buffer end>;
    config.cpu = <cpu identifier>;

    /* Allocation reports failure as NULL; there is no error code. */
    encoder = pt_alloc_encoder(&config);
    if (!encoder)
        <handle allocation error>();
~~~

For packet encoding, only the mandatory config fields need to be filled in.

The allocated encoder object will be implicitly synchronized onto the beginning
of the Intel PT buffer.  You may change the encoder's position at any time by
calling `pt_enc_sync_set()` with the desired buffer offset.

Next, fill in a `pt_packet` object with details about the packet to be encoded.
You do not need to fill in the `size` field.  The needed size is computed by the
encoder.  There is no consistency check with the size specified in the packet
object.  The following example fills in a TIP packet:

~~~{.c}
    struct pt_encoder *encoder = ...;
    struct pt_packet packet;
    int errcode;

    /* Start from a well-defined state for all fields that are not set. */
    memset(&packet, 0, sizeof(packet));
    packet.type = ppt_tip;
    packet.payload.ip.ipc = pt_ipc_update_16;
    packet.payload.ip.ip = <ip>;
~~~

For IP packets, for example FUP or TIP.PGE, there is no need to mask out bits in
the `ip` field that will not be encoded in the packet due to the specified IP
compression in the `ipc` field.  The encoder will ignore them.

There are no consistency checks whether the specified IP compression in the
`ipc` field is allowed in the current context or whether decode will result in
the full IP specified in the `ip` field.

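The second point is easier to see with a small model of IP compression.  The
sketch below is an illustration, not library code: `my_apply_ip_update()` is a
hypothetical helper that assumes an update-style compression in which only the
low `bits` bits of the IP are carried in the packet and the remaining bits are
taken from the previously decoded IP.  Sign-extending compression variants are
not modelled.

~~~{.c}
#include <stdint.h>

/* Reconstruct an IP from the last decoded IP and a compressed payload.
 *
 * Only the low @bits bits of @payload are used; any other bits in
 * @payload are ignored, just like the encoder ignores them.
 */
static uint64_t my_apply_ip_update(uint64_t last_ip, uint64_t payload,
                                   unsigned int bits)
{
    uint64_t mask;

    if (bits >= 64)
        return payload;

    mask = (UINT64_C(1) << bits) - 1;

    return (last_ip & ~mask) | (payload & mask);
}
~~~

With a 16-bit update, a decoder whose last IP was `0x7fff12345678` and that
reads the payload `0xabcd` ends up at `0x7fff1234abcd`.  If the encoded `ip`
differed from the last IP in any of the upper bits, the decoded IP will not
match it, and the encoder will not warn about this.
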
Once the packet object has been filled, it can be handed over to the encoder as
shown here:

~~~{.c}
    errcode = pt_enc_next(encoder, &packet);
    if (errcode < 0)
        <handle error>(errcode);
~~~

The encoder will encode the packet, write it into the Intel PT buffer, and
advance its position to the next byte after the packet.  On a successful encode,
it will return the number of bytes that have been written.  In case of errors,
nothing will be written and the encoder returns a negative error code.


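The all-or-nothing behavior described above can be sketched with a toy encoder
over a plain byte buffer.  This is an illustration under assumptions, not the
libipt encoder: `struct my_encoder`, `my_enc_write()`, and the error values
`-1` and `-2` are hypothetical.

~~~{.c}
#include <stdint.h>
#include <string.h>

/* A toy encoder: a buffer and a write position (hypothetical). */
struct my_encoder {
    uint8_t *begin;     /* The first byte of the buffer. */
    uint8_t *end;       /* One past the last byte of the buffer. */
    uint8_t *pos;       /* The current write position. */
};

/* Append @size already encoded bytes to the buffer.
 *
 * Returns the number of bytes written on success.
 * Returns a negative value and writes nothing otherwise.
 */
static int my_enc_write(struct my_encoder *enc, const uint8_t *bytes,
                        int size)
{
    if (!enc || !bytes || size < 0)
        return -1;      /* Hypothetical "invalid argument". */

    /* Check the available space first to avoid partial writes. */
    if ((enc->end - enc->pos) < size)
        return -2;      /* Hypothetical "end of buffer". */

    memcpy(enc->pos, bytes, (size_t) size);
    enc->pos += size;

    return size;
}
~~~

Because the space check precedes the copy, a failed call leaves both the
buffer and the position untouched, so the caller can retry with a different
packet or stop encoding.
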
### Packet Decoding

The packet decoding layer provides support for decoding Intel PT
packet-by-packet.  Start by configuring and allocating a `pt_packet_decoder` as
shown here:

~~~{.c}
    struct pt_packet_decoder *decoder;
    struct pt_config config;
    int errcode;

    memset(&config, 0, sizeof(config));
    config.size = sizeof(config);
    config.begin = <pt buffer begin>;
    config.end = <pt buffer end>;
    config.cpu = <cpu identifier>;
    config.decode.callback = <decode function>;
    config.decode.context = <decode context>;

    /* Allocation reports failure as NULL; there is no error code. */
    decoder = pt_pkt_alloc_decoder(&config);
    if (!decoder)
        <handle allocation error>();
~~~

For packet decoding, an optional decode callback function may be specified in
addition to the mandatory config fields.  If specified, the callback function
will be called for packets the decoder does not know about.  If there is no
decode callback specified, the decoder will return `-pte_bad_opc`.  In addition
to the callback function pointer, an optional pointer to user-defined context
information can be specified.  This context will be passed to the decode
callback function.

Before the decoder can be used, it needs to be synchronized onto the Intel PT
packet stream.  Packet decoders offer three synchronization functions.  To
iterate over synchronization points in the Intel PT packet stream in forward or
backward direction, use one of the following two functions respectively:

    pt_pkt_sync_forward()
    pt_pkt_sync_backward()


To manually synchronize the decoder at a particular offset into the Intel PT
packet stream, use the following function:

    pt_pkt_sync_set()


There are no checks to ensure that the specified offset is at the beginning of a
packet.  The example below shows synchronization to the first synchronization
point:

~~~{.c}
    struct pt_packet_decoder *decoder;
    int errcode;

    errcode = pt_pkt_sync_forward(decoder);
    if (errcode < 0)
        <handle error>(errcode);
~~~

The decoder will remember the last synchronization packet it decoded.
Subsequent calls to `pt_pkt_sync_forward()` and `pt_pkt_sync_backward()` will
use this as their starting point.

You can get the current decoder position as offset into the Intel PT buffer via:

    pt_pkt_get_offset()


You can get the position of the last synchronization point as offset into the
Intel PT buffer via:

    pt_pkt_get_sync_offset()


Once the decoder is synchronized, you can iterate over packets by repeated calls
to `pt_pkt_next()` as shown in the following example:

~~~{.c}
    struct pt_packet_decoder *decoder;
    int errcode;

    for (;;) {
        struct pt_packet packet;

        errcode = pt_pkt_next(decoder, &packet, sizeof(packet));
        if (errcode < 0)
            break;

        <process packet>(&packet);
    }
~~~


## The Event Layer

The event layer deals with packet combinations that encode higher-level events.
It is used for reconstructing execution flow for users who need finer-grained
control not available via the instruction flow layer or for users who want to
integrate execution flow reconstruction with other functionality more tightly
than would be possible otherwise.

This section describes how to use the query decoder for reconstructing execution
flow.  See the instruction flow decoder as an example.  Start by configuring and
allocating a `pt_query_decoder` as shown below:

~~~{.c}
    struct pt_query_decoder *decoder;
    struct pt_config config;
    int errcode;

    memset(&config, 0, sizeof(config));
    config.size = sizeof(config);
    config.begin = <pt buffer begin>;
    config.end = <pt buffer end>;
    config.cpu = <cpu identifier>;
    config.decode.callback = <decode function>;
    config.decode.context = <decode context>;

    /* Allocation reports failure as NULL; there is no error code. */
    decoder = pt_qry_alloc_decoder(&config);
    if (!decoder)
        <handle allocation error>();
~~~

An optional packet decode callback function may be specified in addition to the
mandatory config fields.  If specified, the callback function will be called for
packets the decoder does not know about.  The query decoder will ignore the
unknown packet except for its size in order to skip it.  If there is no decode
callback specified, the decoder will abort with `-pte_bad_opc`.  In addition to
the callback function pointer, an optional pointer to user-defined context
information can be specified.  This context will be passed to the decode
callback function.

Before the decoder can be used, it needs to be synchronized onto the Intel PT
packet stream.  To iterate over synchronization points in the Intel PT packet
stream in forward or backward direction, the query decoders offer the following
two synchronization functions respectively:

    pt_qry_sync_forward()
    pt_qry_sync_backward()


To manually synchronize the decoder at a synchronization point (i.e. PSB packet)
in the Intel PT packet stream, use the following function:

    pt_qry_sync_set()


After successfully synchronizing, the query decoder will start reading the PSB+
header to initialize its internal state.  If tracing is enabled at this
synchronization point, the IP of the instruction at which decoding should start
is returned.  If tracing is disabled at this synchronization point, this is
indicated in the returned status bits (see below).  In this example,
synchronization to the first synchronization point is shown:

~~~{.c}
    struct pt_query_decoder *decoder;
    uint64_t ip;
    int status;

    status = pt_qry_sync_forward(decoder, &ip);
    if (status < 0)
        <handle error>(status);
~~~

In addition to a query decoder, you will need an instruction decoder for
decoding and classifying instructions.


#### In A Nutshell

After synchronizing, you begin decoding instructions starting at the returned
IP.  As long as you can determine the next instruction in execution order, you
continue on your own.  Only when the next instruction cannot be determined by
examining the current instruction do you ask the query decoder for guidance:

  * If the current instruction is a conditional branch, the
    `pt_qry_cond_branch()` function will tell whether it was taken.

  * If the current instruction is an indirect branch, the
    `pt_qry_indirect_branch()` function will provide the IP of its destination.


~~~{.c}
    struct pt_query_decoder *decoder;
    uint64_t ip;

    for (;;) {
        struct <instruction> insn;

        insn = <decode instruction>(ip);

        ip += <instruction size>(insn);

        if (<is cond branch>(insn)) {
            int status, taken;

            status = pt_qry_cond_branch(decoder, &taken);
            if (status < 0)
                <handle error>(status);

            if (taken)
                ip += <branch displacement>(insn);
        } else if (<is indirect branch>(insn)) {
            int status;

            status = pt_qry_indirect_branch(decoder, &ip);
            if (status < 0)
                <handle error>(status);
        }
    }
~~~


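To make the loop above concrete, the following self-contained sketch replays
it on a toy instruction set.  Everything in it is hypothetical and chosen for
illustration: instructions are one slot wide, the IP is simply an index into an
instruction array, and `struct my_trace` with its two query functions stands in
for the query decoder by handing out pre-recorded branch decisions in order.

~~~{.c}
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instruction classes of a toy instruction set. */
enum my_iclass {
    my_ic_other,
    my_ic_cond_branch,
    my_ic_indirect_branch,
    my_ic_halt
};

/* A toy instruction; every instruction has size one. */
struct my_insn {
    enum my_iclass iclass;
    int32_t displacement;       /* Relative to the next instruction. */
};

/* A stand-in for the query decoder: pre-recorded decisions. */
struct my_trace {
    const int *taken;           /* Taken/not-taken, in order. */
    size_t ntaken;
    const uint64_t *targets;    /* Indirect destinations, in order. */
    size_t ntargets;
};

static int my_qry_cond_branch(struct my_trace *trace, int *taken)
{
    if (!trace->ntaken)
        return -1;              /* Hypothetical "end of stream". */

    *taken = *trace->taken++;
    trace->ntaken -= 1;
    return 0;
}

static int my_qry_indirect_branch(struct my_trace *trace, uint64_t *ip)
{
    if (!trace->ntargets)
        return -1;              /* Hypothetical "end of stream". */

    *ip = *trace->targets++;
    trace->ntargets -= 1;
    return 0;
}

/* Store the IPs of executed instructions in @out; return their number. */
static size_t my_reconstruct(const struct my_insn *code, size_t ncode,
                             struct my_trace *trace, uint64_t ip,
                             uint64_t *out, size_t nout)
{
    size_t count = 0;

    while (ip < ncode && count < nout) {
        const struct my_insn *insn = &code[ip];

        out[count++] = ip;
        if (insn->iclass == my_ic_halt)
            break;

        /* By default, continue at the next instruction. */
        ip += 1;

        if (insn->iclass == my_ic_cond_branch) {
            int taken;

            if (my_qry_cond_branch(trace, &taken) < 0)
                break;

            if (taken)
                ip += (uint64_t) (int64_t) insn->displacement;
        } else if (insn->iclass == my_ic_indirect_branch) {
            if (my_qry_indirect_branch(trace, &ip) < 0)
                break;
        }
    }

    return count;
}
~~~

Note that the trace contains no entry for instructions whose successor is
obvious.  Only the conditional and the indirect branch consume a recorded
decision, which is what keeps the trace small.
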
Certain aspects, such as asynchronous events or synchronizing at a location
where tracing is disabled, have been ignored so far.  Let us consider them now.


#### Queries

The query decoder provides four query functions:

  * `pt_qry_cond_branch()`      Query whether the next conditional branch was
                                taken.

  * `pt_qry_indirect_branch()`  Query for the destination IP of the next
                                indirect branch.

  * `pt_qry_event()`            Query for the next event.

  * `pt_qry_time()`             Query for the current time.


Each function returns either a positive vector of status bits or a negative
error code.  For details on status bits and error conditions, please refer to
the `pt_status_flag` and `pt_error_code` enumerations in the `intel-pt.h`
header.

The `pts_ip_suppressed` status bit is used to indicate that no IP is available
at functions that are supposed to return an IP.  Examples are the indirect
branch query function and both synchronization functions.

The `pts_event_pending` status bit is used to indicate that there is an event
pending.  You should query for this event before continuing execution flow
reconstruction.

The `pts_eos` status bit is used to indicate the end of the trace.  Any
subsequent query will return `-pte_eos`.


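The way a caller might act on such a return value can be sketched as follows.
The flag names and values in this example are hypothetical stand-ins for the
`pt_status_flag` enumeration, and `my_next_action()` is an illustrative helper
rather than part of the library.  The order of the checks reflects the advice
above to handle pending events before continuing.

~~~{.c}
/* Hypothetical status bits; see intel-pt.h for the real enumeration. */
enum my_status_flag {
    my_sts_event_pending    = 1 << 0,
    my_sts_ip_suppressed    = 1 << 1,
    my_sts_eos              = 1 << 2
};

/* What the caller should do next (hypothetical). */
enum my_action {
    my_act_error,           /* Report the error and re-synchronize. */
    my_act_query_event,     /* Query for the pending event first. */
    my_act_stop,            /* The end of the trace has been reached. */
    my_act_continue         /* Continue reconstructing the flow. */
};

/* Decide on the next step based on a query's return value. */
static enum my_action my_next_action(int status)
{
    /* Negative values are error codes, not status bits. */
    if (status < 0)
        return my_act_error;

    /* Events need to be processed before anything else. */
    if (status & my_sts_event_pending)
        return my_act_query_event;

    if (status & my_sts_eos)
        return my_act_stop;

    return my_act_continue;
}
~~~

The IP-suppressed bit does not select an action of its own in this sketch.  It
only tells the caller that the IP argument of the preceding call was not
written and must not be used.
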
#### Events

Events are signaled ahead of time.  When you query for pending events as soon as
they are indicated, you will be aware of asynchronous events before you reach
the instruction associated with the event.

For example, if tracing is disabled at the synchronization point, the IP will be
suppressed.  In this case, it is very likely that a tracing-enabled event is
signaled.  You will also get events for initializing the decoder state after
synchronizing onto the Intel PT packet stream, for example paging or execution
mode events.

See the `enum pt_event_type` and `struct pt_event` in the `intel-pt.h` header
for details on possible events.  This document does not give an example of event
processing.  Refer to the implementation of the instruction flow decoder in
pt_insn.c for details.


|
Packit |
b1f7ae |
#### Timing
|
|
Packit |
b1f7ae |
|
|
Packit |
b1f7ae |
To be able to signal events, the decoder reads ahead until it arrives at a query
|
|
Packit |
b1f7ae |
relevant packet. Errors encountered during that time will be postponed until
|
|
Packit |
b1f7ae |
the respective query call. This reading ahead affects timing. The decoder will
|
|
Packit |
b1f7ae |
always be a few packets ahead. When querying for the current time, the query
|
|
Packit |
b1f7ae |
will return the time at the decoder's current packet. This corresponds to the
|
|
Packit |
b1f7ae |
time at our next query.
|
|
Packit |
b1f7ae |
|
|
#### Return Compression

If Intel PT has been configured to compress returns, a successfully compressed
return is represented as a conditional branch instead of an indirect branch.
For a RET instruction, you first query for a conditional branch.  If the query
succeeds, it should indicate that the branch was taken.  In that case, the
return has been compressed.  A not taken branch indicates an error.  If the
query fails, the return has not been compressed and you query for an indirect
branch.
|
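The decision described above can be sketched as a small helper.  This is a
minimal sketch and not part of libipt: `enum ret_action` and `ret_decide()` are
hypothetical names, and the two arguments stand for the outcome of a conditional
branch query, that is, its return status and the taken flag it reported.

```c
#include <assert.h>

/* What to do for a RET instruction (hypothetical, for illustration only). */
enum ret_action {
    ret_use_stack,       /* Compressed: take the target from your return stack. */
    ret_query_indirect,  /* Not compressed: query for an indirect branch. */
    ret_bad_trace        /* Branch not taken: trace and image do not match. */
};

/* Decide how to proceed based on the conditional branch query.
 *
 * @status is the value returned by the query, negative if it failed.
 * @taken is the taken flag reported by a successful query.
 */
enum ret_action ret_decide(int status, int taken)
{
    /* The query failed, so the return has not been compressed. */
    if (status < 0)
        return ret_query_indirect;

    /* A compressed return is always reported as a taken branch. */
    return taken ? ret_use_stack : ret_bad_trace;
}
```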
There is no guarantee that returns will be compressed.  Even though return
compression has been enabled, returns may still be represented as indirect
branches.

To reconstruct the execution flow for compressed returns, you would maintain a
stack of return addresses.  For each call instruction, push the IP of the
instruction following the call onto the stack.  For compressed returns, pop the
topmost IP from the stack.  See pt_retstack.h and pt_retstack.c for a sample
implementation.
|
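A return-address stack along these lines might look as follows.  This is a
minimal sketch and not the implementation in pt_retstack.c.  `struct retstack`,
its functions, and the depth of 64 entries are assumptions made for
illustration.  When the stack is full, the sketch overwrites the oldest entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RETSTACK_DEPTH 64u  /* Assumed capacity, chosen for this sketch. */

struct retstack {
    uint64_t stack[RETSTACK_DEPTH];
    size_t top;    /* Index of the next free slot, wraps around. */
    size_t count;  /* Number of valid entries. */
};

void retstack_init(struct retstack *rs)
{
    assert(rs);
    rs->top = 0;
    rs->count = 0;
}

/* On a call: remember the IP of the instruction following the call. */
void retstack_push(struct retstack *rs, uint64_t ret_ip)
{
    assert(rs);
    rs->stack[rs->top] = ret_ip;
    rs->top = (rs->top + 1u) % RETSTACK_DEPTH;
    if (rs->count < RETSTACK_DEPTH)
        rs->count += 1u;
}

/* On a compressed return: fetch the most recently pushed IP.
 *
 * Returns zero on success, -1 if the stack is empty.
 */
int retstack_pop(struct retstack *rs, uint64_t *ip)
{
    assert(rs && ip);
    if (!rs->count)
        return -1;

    rs->top = (rs->top + RETSTACK_DEPTH - 1u) % RETSTACK_DEPTH;
    rs->count -= 1u;
    *ip = rs->stack[rs->top];
    return 0;
}
```

You would call `retstack_push()` with the IP following each call instruction and
`retstack_pop()` whenever a return turns out to be compressed.  An empty stack
at that point suggests that the trace and the image do not match.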
|
## The Instruction Flow Layer

The instruction flow layer provides a simple API for iterating over instructions
in execution order.  Start by configuring and allocating a `pt_insn_decoder` as
shown below:

~~~{.c}
    struct pt_insn_decoder *decoder;
    struct pt_config config;

    memset(&config, 0, sizeof(config));
    config.size = sizeof(config);
    config.begin = <pt buffer begin>;
    config.end = <pt buffer end>;
    config.cpu = <cpu identifier>;
    config.decode.callback = <decode function>;
    config.decode.context = <decode context>;

    /* Allocation returns NULL on failure; there is no error code. */
    decoder = pt_insn_alloc_decoder(&config);
    if (!decoder)
        <handle error>();
~~~

An optional packet decode callback function may be specified in addition to the
mandatory config fields.  If specified, the callback function will be called for
packets the decoder does not know about.  The decoder ignores the content of the
unknown packet and uses only its size in order to skip it.  If there is no
decode callback specified, the decoder will abort with `-pte_bad_opc`.  In
addition to the callback function pointer, an optional pointer to user-defined
context information can be specified.  This context will be passed to the decode
callback function.
|
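A decode callback might look like the sketch below.  It assumes the callback
signature declared for `config.decode.callback` in intel-pt.h and a made-up
packet layout of one opcode byte, one length byte, and the payload.
`struct skip_context` and `skip_unknown_packet()` are hypothetical names.  The
callback returns the packet size so the decoder can skip the packet; the `-1`
is a placeholder for a negative `pt_error_code`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only pointers to these libipt types are used in this sketch, so forward
 * declarations are sufficient here.  Real code would include intel-pt.h.
 */
struct pt_packet_unknown;
struct pt_config;

/* Hypothetical user context passed via config.decode.context. */
struct skip_context {
    const uint8_t *end;  /* End of the trace buffer, for bounds checks. */
    uint64_t skipped;    /* Number of unknown packets skipped so far. */
};

/* Assumed packet layout (made up): opcode, payload length, payload bytes. */
int skip_unknown_packet(struct pt_packet_unknown *unknown,
                        const struct pt_config *config,
                        const uint8_t *pos, void *context)
{
    struct skip_context *ctx = context;
    size_t left, size;

    (void) unknown;  /* This sketch does not fill in the unknown packet. */
    (void) config;

    if (!ctx || !pos || ctx->end <= pos)
        return -1;  /* Placeholder for a negative pt_error_code. */

    left = (size_t) (ctx->end - pos);
    if (left < 2u)
        return -1;

    /* Header (two bytes) plus the payload size given in the second byte. */
    size = 2u + pos[1];
    if (left < size)
        return -1;

    ctx->skipped += 1u;
    return (int) size;
}
```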
Providing an image is optional.  If no image is set, the decoder will use an
empty default image that can be populated later on and that is implicitly
destroyed when the decoder is freed.  See below for more information on this.


#### The Traced Image

In addition to the Intel PT configuration, the instruction flow decoder needs to
know the memory image for which Intel PT has been recorded.  This memory image
is represented by a `pt_image` object.  If decoding fails due to an IP lying
outside of the traced memory image, `pt_insn_next()` will return `-pte_nomap`.

Use `pt_image_alloc()` to allocate and `pt_image_free()` to free an image.
Images may not be shared.  Every decoder must use a different image.  Allocate
an image yourself if you want to prepare it in advance or if you want to switch
between images.

Every decoder provides an empty default image that is used if no other image is
set.  The default image is implicitly destroyed when the decoder is freed.  It
can be obtained by calling `pt_insn_get_image()`.  Use the default image if you
only use one decoder and one image.

An image is a collection of contiguous, non-overlapping memory regions called
`sections`.  Starting with an empty image, it may be populated with repeated
calls to `pt_image_add_file()` or `pt_image_add_cached()`, one for each section,
or with a call to `pt_image_copy()` to add all sections from another image.  If
a newly added section overlaps with an existing section, the existing section
will be truncated or split to make room for the new section.
|
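The effect of adding an overlapping section can be illustrated on plain address
ranges.  The following sketch is not libipt code.  `struct range` and
`range_carve()` are hypothetical and only model which parts of an existing
section remain visible after a new section has been added on top of it.

```c
#include <assert.h>
#include <stdint.h>

/* A half-open address range [begin, end), standing in for a section. */
struct range {
    uint64_t begin;
    uint64_t end;
};

/* Compute what is left of @old after @added has been placed on top of it.
 *
 * Writes up to two leftover ranges into @out and returns their number:
 * 0 if @old is covered completely, 1 if it is truncated or untouched,
 * 2 if it is split into a part before and a part after @added.
 */
int range_carve(struct range old, struct range added, struct range out[2])
{
    int n = 0;

    assert(out);

    /* No overlap: the old range survives unchanged. */
    if (added.end <= old.begin || old.end <= added.begin) {
        out[n++] = old;
        return n;
    }

    /* Part of the old range lies before the added range. */
    if (old.begin < added.begin) {
        out[n].begin = old.begin;
        out[n].end = added.begin;
        n += 1;
    }

    /* Part of the old range lies after the added range. */
    if (added.end < old.end) {
        out[n].begin = added.end;
        out[n].end = old.end;
        n += 1;
    }

    return n;
}
```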
In some cases, the memory image may change during execution.  You can use
the `pt_image_remove_by_filename()` function to remove previously added sections
by their file name and `pt_image_remove_by_asid()` to remove all sections for an
address space.

In addition to adding sections, you can register a callback function for reading
memory using `pt_image_set_callback()`.  The `context` parameter you pass
together with the callback function pointer will be passed to your callback
function every time it is called.  There can only be one callback at any time.
Adding a new callback will remove any previously added callback.  To remove the
callback function, pass `NULL` to `pt_image_set_callback()`.

The callback and file sections may be combined.  The callback function is used
whenever the memory cannot be found in any of the image's sections.
|
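A read-memory callback could be sketched as follows, assuming the
`read_memory_callback_t` signature from intel-pt.h and a single address space.
`struct mem_snapshot` and `read_snapshot()` are hypothetical names.  The sketch
serves reads from a memory snapshot and returns the number of bytes read, with
`-1` standing in for a negative `pt_error_code`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Only a pointer to this libipt type is used, so a forward declaration is
 * sufficient for this sketch.  Real code would include intel-pt.h.
 */
struct pt_asid;

/* Hypothetical context: a copy of memory starting at virtual address @base. */
struct mem_snapshot {
    const uint8_t *bytes;
    uint64_t base;
    uint64_t size;
};

/* Read up to @size bytes at @ip from the snapshot given in @context. */
int read_snapshot(uint8_t *buffer, size_t size, const struct pt_asid *asid,
                  uint64_t ip, void *context)
{
    const struct mem_snapshot *snap = context;
    uint64_t offset, avail;

    (void) asid;  /* This sketch assumes a single address space. */

    if (!buffer || !snap || ip < snap->base)
        return -1;  /* Placeholder for a negative pt_error_code. */

    offset = ip - snap->base;
    if (snap->size <= offset)
        return -1;

    /* Shorten the read if it would run past the end of the snapshot. */
    avail = snap->size - offset;
    if (avail < size)
        size = (size_t) avail;

    memcpy(buffer, snap->bytes + offset, size);
    return (int) size;
}
```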
If more than one process is traced, the memory image may change when the process
context is switched.  To simplify handling this case, an address-space
identifier may be passed to each of the above functions to define separate
images for different processes at the same time.  The decoder will select the
correct image based on context switch information in the Intel PT trace.  If
you want to manage this on your own, you can use `pt_insn_set_image()` to
replace the image a decoder uses.


#### The Traced Image Section Cache

When using multiple decoders that work on related memory images, it is desirable
to share image sections between decoders.  The underlying file sections will be
mapped only once per image section cache.

Use `pt_iscache_alloc()` to allocate and `pt_iscache_free()` to free an image
section cache.  Freeing the cache does not destroy sections added to the cache.
They remain valid until they are no longer used.

Use `pt_iscache_add_file()` to add a file section to an image section cache.
The function returns an image section identifier (ISID) that uniquely identifies
the section in this cache.  Use `pt_image_add_cached()` to add a file section
from an image section cache to an image.

Multiple image section caches may be used at the same time, but it is
recommended not to mix sections from different image section caches in one
image.

A traced image section cache can also be used for reading an instruction's
memory via its IP and ISID as provided in `struct pt_insn`.
|
|
#### Synchronizing

Before the decoder can be used, it needs to be synchronized onto the Intel PT
packet stream.  To iterate over synchronization points in the Intel PT packet
stream in forward or backward direction, the instruction flow decoder offers
the following two synchronization functions, respectively:

    pt_insn_sync_forward()
    pt_insn_sync_backward()


To manually synchronize the decoder at a synchronization point (i.e., a PSB
packet) in the Intel PT packet stream, use the following function:

    pt_insn_sync_set()


The example below shows synchronization to the first synchronization point:

~~~{.c}
    struct pt_insn_decoder *decoder;
    int errcode;

    errcode = pt_insn_sync_forward(decoder);
    if (errcode < 0)
        <handle error>(errcode);
~~~

The decoder will remember the last synchronization packet it decoded.
Subsequent calls to `pt_insn_sync_forward()` and `pt_insn_sync_backward()` will
use this as their starting point.

You can get the current decoder position as an offset into the Intel PT buffer
via:

    pt_insn_get_offset()


You can get the position of the last synchronization point as an offset into the
Intel PT buffer via:

    pt_insn_get_sync_offset()
|
|
#### Iterating

Once the decoder is synchronized, you can iterate over instructions in execution
flow order by repeated calls to `pt_insn_next()` as shown in the following
example:

~~~{.c}
    struct pt_insn_decoder *decoder;
    int errcode;

    for (;;) {
        struct pt_insn insn;

        errcode = pt_insn_next(decoder, &insn, sizeof(insn));

        if (insn.iclass != ptic_error)
            <process instruction>(&insn);

        if (errcode < 0)
            break;
    }
~~~

For each instruction, you get its IP, its size in bytes, the raw memory, an
identifier for the image section that contained it, the current execution mode,
and the speculation state, that is, whether the instruction has been executed
speculatively.  In addition, you get a coarse classification that can be used
for further processing without the need for a full instruction decode.

If a traced image section cache is used, the image section identifier can be
used to trace an instruction back to the binary file that contained it.  This
allows mapping the instruction back to source code using the debug information
contained in or reachable via the binary file.

You also get some information about events that occurred either before or after
executing the instruction, such as tracing being enabled or disabled.  For
detailed information about instructions, see `enum pt_insn_class` and
`struct pt_insn` in the intel-pt.h header file.

Beware that `pt_insn_next()` may indicate errors that occur after the returned
instruction.  The returned instruction is valid if its `iclass` field is set,
that is, if it is not `ptic_error`.
|
|
## The Block Layer

The block layer provides a simple API for iterating over blocks of sequential
instructions in execution order.  The instructions in a block are sequential in
the sense that no trace is required for reconstructing the instructions.  The IP
of the first instruction is given in `struct pt_block` and the IP of other
instructions in the block can be determined by decoding and examining the
previous instruction.

Start by configuring and allocating a `pt_block_decoder` as shown below:

~~~{.c}
    struct pt_block_decoder *decoder;
    struct pt_config config;

    memset(&config, 0, sizeof(config));
    config.size = sizeof(config);
    config.begin = <pt buffer begin>;
    config.end = <pt buffer end>;
    config.cpu = <cpu identifier>;
    config.decode.callback = <decode function>;
    config.decode.context = <decode context>;

    decoder = pt_blk_alloc_decoder(&config);
~~~

An optional packet decode callback function may be specified in addition to the
mandatory config fields.  If specified, the callback function will be called for
packets the decoder does not know about.  The decoder ignores the content of the
unknown packet and uses only its size in order to skip it.  If there is no
decode callback specified, the decoder will abort with `-pte_bad_opc`.  In
addition to the callback function pointer, an optional pointer to user-defined
context information can be specified.  This context will be passed to the decode
callback function.
|
|
#### Synchronizing

Before the decoder can be used, it needs to be synchronized onto the Intel PT
packet stream.  To iterate over synchronization points in the Intel PT packet
stream in forward or backward direction, the block decoder offers the following
two synchronization functions, respectively:

    pt_blk_sync_forward()
    pt_blk_sync_backward()


To manually synchronize the decoder at a synchronization point (i.e., a PSB
packet) in the Intel PT packet stream, use the following function:

    pt_blk_sync_set()


The example below shows synchronization to the first synchronization point:

~~~{.c}
    struct pt_block_decoder *decoder;
    int errcode;

    errcode = pt_blk_sync_forward(decoder);
    if (errcode < 0)
        <handle error>(errcode);
~~~

The decoder will remember the last synchronization packet it decoded.
Subsequent calls to `pt_blk_sync_forward()` and `pt_blk_sync_backward()` will
use this as their starting point.

You can get the current decoder position as an offset into the Intel PT buffer
via:

    pt_blk_get_offset()


You can get the position of the last synchronization point as an offset into the
Intel PT buffer via:

    pt_blk_get_sync_offset()
|
|
#### Iterating

Once the decoder is synchronized, it can be used to iterate over blocks of
instructions in execution flow order by repeated calls to `pt_blk_next()` as
shown in the following example:

~~~{.c}
    struct pt_block_decoder *decoder;
    int errcode;

    for (;;) {
        struct pt_block block;

        errcode = pt_blk_next(decoder, &block, sizeof(block));

        if (block.ninsn > 0)
            <process block>(&block);

        if (errcode < 0)
            break;
    }
~~~

A block contains enough information to reconstruct the instructions.  See
`struct pt_block` in `intel-pt.h` for details.  Note that errors returned by
`pt_blk_next()` apply after the last instruction in the provided block.

It is recommended to use a traced image section cache so the image section
identifier contained in a block can be used for reading the memory containing
the instructions in the block.  This also allows mapping the instructions back
to source code using the debug information contained in or reachable via the
binary file.

In some cases, the last instruction in a block may cross image section
boundaries.  This can happen when a code segment is split into more than one
image section.  The block is marked truncated in this case and provides the raw
bytes of the last instruction.

The following example shows how instructions can be reconstructed from a block:

~~~{.c}
    struct pt_image_section_cache *iscache;
    struct pt_block *block;
    uint16_t ninsn;
    uint64_t ip;
    int errcode;

    ip = block->ip;
    for (ninsn = 0; ninsn < block->ninsn; ++ninsn) {
        uint8_t raw[pt_max_insn_size];
        <struct insn> insn;
        int size;

        /* A truncated block provides the raw bytes of its last instruction. */
        if (block->truncated && ((ninsn + 1) == block->ninsn)) {
            memcpy(raw, block->raw, block->size);
            size = block->size;
        } else {
            size = pt_iscache_read(iscache, raw, sizeof(raw), block->isid, ip);
            if (size < 0)
                break;
        }

        errcode = <decode instruction>(&insn, raw, size, block->mode);
        if (errcode < 0)
            break;

        <process instruction>(&insn);

        ip = <determine next ip>(&insn);
    }
~~~
|
|
## Parallel Decode

Intel PT splits naturally into self-contained PSB segments that can be decoded
independently.  Use the packet or query decoder to search for PSBs using
repeated calls to `pt_pkt_sync_forward()` and `pt_pkt_get_sync_offset()` (or
`pt_qry_sync_forward()` and `pt_qry_get_sync_offset()`).  The following example
shows this using the query decoder, which will already give the IP needed in
the next step.

~~~{.c}
    struct pt_query_decoder *decoder;
    uint64_t offset, ip;
    int status, errcode;

    for (;;) {
        status = pt_qry_sync_forward(decoder, &ip);
        if (status < 0)
            break;

        errcode = pt_qry_get_sync_offset(decoder, &offset);
        if (errcode < 0)
            <handle error>(errcode);

        <split trace>(offset, ip, status);
    }
~~~

The individual trace segments can then be decoded using the query, instruction
flow, or block decoder as shown in the previous examples.
|
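What `<split trace>` does is up to you.  One possible sketch, using the
hypothetical names `struct trace_segment` and `split_trace()`, records each
synchronization offset and IP and lets every segment end where the next one
begins.  It assumes the offsets have been collected in ascending order and it
ignores the status returned by the synchronization call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One independently decodable piece of the trace (hypothetical type). */
struct trace_segment {
    uint64_t begin;     /* Offset of the segment's PSB in the trace buffer. */
    uint64_t end;       /* Offset one past the segment's last byte. */
    uint64_t start_ip;  /* IP reported when synchronizing onto the PSB. */
};

/* Turn @nsync synchronization points, sorted by offset, into segments.
 *
 * Each segment ends at the next synchronization offset; the last segment
 * ends at @trace_size.  Returns the number of segments written to @segs,
 * which is at most @max.
 */
size_t split_trace(struct trace_segment *segs, size_t max,
                   const uint64_t *offsets, const uint64_t *ips,
                   size_t nsync, uint64_t trace_size)
{
    size_t i, n = 0;

    assert(segs && offsets && ips);

    for (i = 0; i < nsync && n < max; ++i) {
        segs[n].begin = offsets[i];
        segs[n].end = (i + 1 < nsync) ? offsets[i + 1] : trace_size;
        segs[n].start_ip = ips[i];
        n += 1;
    }

    return n;
}
```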
When stitching decoded trace segments together, a sequence of linear (in the
sense that it can be decoded without Intel PT) code has to be filled in.  Use
the `pts_eos` status indication to stop decoding early enough.  Then proceed
until the IP at the start of the succeeding trace segment is reached.  When
using the instruction flow decoder, `pt_insn_next()` may be used for that as
shown in the following example:

~~~{.c}
    struct pt_insn_decoder *decoder;
    struct pt_insn insn;
    int status;

    for (;;) {
        status = pt_insn_next(decoder, &insn, sizeof(insn));
        if (status < 0)
            <handle error>(status);

        if (status & pts_eos)
            break;

        <process instruction>(&insn);
    }

    while (insn.ip != <next segment's start IP>) {
        <process instruction>(&insn);

        status = pt_insn_next(decoder, &insn, sizeof(insn));
        if (status < 0)
            <handle error>(status);
    }
~~~
|
|
## Threading

The decoder library API is not thread-safe.  Different threads may allocate and
use different decoder objects at the same time.  Different decoders must not use
the same image object.  Use `pt_image_copy()` to give each decoder its own copy
of a shared master image.