# Actions and States

Parsing, i.e. matching an input with a grammar rule, by itself only indicates whether (a portion of) the input is valid according to the grammar.
In order to do something useful with the input, it is usually necessary to attach user-defined *actions* to one or more rules.
An action is *applied* whenever the rule to which it is attached succeeds.
Applying an action means that its static `apply()` or `apply0()`-method is called.
The first argument to an `apply()` method is always an object that represents the portion of the input consumed by the successful match of the rule.
An action's `apply()` or `apply0()`-method can return either `void` or `bool`.

## Contents

* [Actions](#actions)
  * [Apply0](#apply0)
  * [Apply](#apply)
* [States](#states)
* [Action Specialisation](#action-specialisation)
* [Changing Actions](#changing-actions)
* [Changing States](#changing-states)
  * [No Switching](#no-switching)
  * [Intrusive Switching](#intrusive-switching)
  * [External Switching](#external-switching)
* [Legacy Actions](#legacy-actions)

## Actions

Actions are implemented as static `apply()` or `apply0()`-methods of specialisations of custom class templates (which is not quite as difficult as it sounds).
First the default- or base-case of the action class template has to be defined:

```c++
template< typename Rule >
struct my_actions
   : tao::pegtl::nothing< Rule > {};
```

Inheriting from `tao::pegtl::nothing< Rule >` indicates to the PEGTL that no action is attached to `Rule`, i.e. that no `apply()` or `apply0()`-method should be called for successful matches of `Rule`.

To attach an action to `Rule`, this class template has to be specialised for `Rule` with two important properties.

1. The specialisation *must not* inherit from `tao::pegtl::nothing< Rule >`.

2. An *appropriate* static `apply()` or `apply0()`-method has to be implemented.

The PEGTL will auto-detect whether an action, i.e. a specialisation of an action class template, contains an appropriate `apply()` or `apply0()` function, and whether it returns `void` or `bool`.
It will fail to compile when both `apply()` and `apply0()` are found.

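How this kind of auto-detection can work in principle is sketched below in plain C++11.
This is an illustration of the general technique (SFINAE on a `decltype` expression), not the PEGTL's actual implementation: the traits `has_apply0`, `has_apply` and `apply_returns_bool`, the helper `make_void`, the stand-in types `toy_input` and `toy_state`, and the three example action classes are all hypothetical names introduced here.

```c++
#include <type_traits>
#include <utility>

struct toy_input {};  // Hypothetical stand-in for the matched input.
struct toy_state {};  // Hypothetical stand-in for a user-defined state.

// Maps any list of valid types to void (what std::void_t does in C++17).
template< typename... >
struct make_void
{
   using type = void;
};

// Primary templates: assume that the method is absent.
template< typename Action, typename = void >
struct has_apply0 : std::false_type {};

template< typename Action, typename = void >
struct has_apply : std::false_type {};

// Selected only when Action::apply0( toy_state& ) is a valid expression.
template< typename Action >
struct has_apply0< Action, typename make_void< decltype( Action::apply0( std::declval< toy_state& >() ) ) >::type >
   : std::true_type {};

// Selected only when Action::apply( const toy_input&, toy_state& ) is valid.
template< typename Action >
struct has_apply< Action, typename make_void< decltype( Action::apply( std::declval< const toy_input& >(), std::declval< toy_state& >() ) ) >::type >
   : std::true_type {};

// Distinguishes bool-returning from void-returning apply() methods.
template< typename Action >
struct apply_returns_bool
   : std::is_same< decltype( Action::apply( std::declval< const toy_input& >(), std::declval< toy_state& >() ) ), bool > {};

struct no_action {};

struct void_apply0_action
{
   static void apply0( toy_state& ) {}
};

struct bool_apply_action
{
   static bool apply( const toy_input&, toy_state& ) { return true; }
};

static_assert( !has_apply< no_action >::value && !has_apply0< no_action >::value, "nothing to call" );
static_assert( has_apply0< void_apply0_action >::value, "apply0() detected" );
static_assert( has_apply< bool_apply_action >::value, "apply() detected" );
static_assert( apply_returns_bool< bool_apply_action >::value, "bool return type detected" );
```
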
### Apply0

In cases where the matched part of the input is not required, an action method named `apply0()` is implemented.
This allows for some optimisations compared to the `apply()` method which receives the matched input as first argument.

```c++
template<>
struct my_actions< tao::pegtl::plus< tao::pegtl::alpha > >
{
   static void apply0( /* all the states */ )
   {
      // Called whenever a call to tao::pegtl::plus< tao::pegtl::alpha >
      // in the grammar succeeds.
   }

   // OR ALTERNATIVELY

   static bool apply0( /* all the states */ )
   {
      // Called whenever a call to tao::pegtl::plus< tao::pegtl::alpha >
      // in the grammar succeeds.
      return true;  // Or false, see below.
   }
};
```

When the return type is `bool`, the action can determine whether matching the rule to which it was attached, and which already returned with success, should be retro-actively considered a (local) failure.
For the overall parsing run, there is no difference between a rule or an attached action returning `false` (but of course the action is not called when the rule already returned `false`).
When an action returns `false`, the PEGTL takes care of rewinding the input to where it was when the rule to which the action was attached started its (successful) match (which is unlike rules' `match()` methods that have to take care of rewinding themselves).

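The rewinding behaviour can be pictured with a small, library-independent toy model.
It is only a sketch of the idea and does not reflect PEGTL internals: `toy_input`, `match_digits()`, `accept_short()` and `match_with_action()` are hypothetical names, and the "action" here simply rejects matches longer than three digits.

```c++
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical miniature input: the data plus the current read position.
struct toy_input
{
   std::string data;
   std::size_t pos;
};

// Toy rule: one or more ASCII digits; consumes nothing on failure.
inline bool match_digits( toy_input& in )
{
   std::size_t p = in.pos;
   while( ( p < in.data.size() ) && std::isdigit( static_cast< unsigned char >( in.data[ p ] ) ) ) {
      ++p;
   }
   if( p == in.pos ) {
      return false;
   }
   in.pos = p;
   return true;
}

// Toy bool-action: retro-actively rejects matches longer than three digits.
inline bool accept_short( const std::string& matched )
{
   return matched.size() <= 3;
}

// Toy driver: applies the action after a successful match and rewinds
// the input on behalf of the action when the action returns false.
inline bool match_with_action( toy_input& in )
{
   const std::size_t start = in.pos;  // Where the rule started its match.
   if( !match_digits( in ) ) {
      return false;
   }
   if( !accept_short( in.data.substr( start, in.pos - start ) ) ) {
      in.pos = start;  // The action itself never touches the position.
      return false;
   }
   return true;
}
```

With the input `"12ab"` the driver succeeds and leaves the position at 2, while with `"12345"` the rule matches, the action returns `false`, and the position is back at 0 afterwards.
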
Note that actions returning `bool` are an advanced use case that should be used with caution.
They prevent some internal optimisations, in particular when used with `apply0()`.
They can also have surprising effects on the semantics of a parsing run, for example `at< rule >` can succeed for the same input for which `rule` fails when there is a `bool`-action attached to `rule` that returns `false` (remembering that actions are disabled within an `at<>` combinator).

### Apply

When the action method is called `apply()`, it receives a const-reference to an instance of an input class as first argument.

```c++
template<>
struct my_actions< tao::pegtl::plus< tao::pegtl::digit > >
{
   template< typename Input >
   static void apply( const Input& in, /* all the states */ )
   {
      // Called whenever a call to tao::pegtl::plus< tao::pegtl::digit >
      // in the grammar succeeds. The argument named 'in' represents the
      // matched part of the input.
   }

   // OR ALTERNATIVELY

   template< typename Input >
   static bool apply( const Input& in, /* all the states */ )
   {
      // Called whenever a call to tao::pegtl::plus< tao::pegtl::digit >
      // in the grammar succeeds. The argument named 'in' represents the
      // matched part of the input.
      return true;  // Or false, see description for apply0() above.
   }
};
```

The exact type of the input class passed to an action's `apply()`-method is not specified.
It is currently best practice to "template over" the type of the input as shown above.

Actions can then assume that the input provides (at least) the members shown in the following listing.

For illustrative purposes, we will assume that the input passed to `apply()` is of type `action_input`.
Its `Input` template parameter is set to the class of the input used at the point in the parsing run where the action is applied.
Any resemblance to real classes is not a coincidence.

```c++
template< typename Input >
class action_input
{
public:
   using input_t = Input;
   using iterator_t = typename Input::iterator_t;

   bool empty() const noexcept;
   std::size_t size() const noexcept;

   const char* begin() const noexcept;  // Non-owning pointer!
   const char* end() const noexcept;  // Non-owning pointer!

   std::string string() const;  // { return std::string( begin(), end() ); }

   char peek_char( const std::size_t offset = 0 ) const noexcept;  // { return begin()[ offset ]; }
   unsigned char peek_byte( const std::size_t offset = 0 ) const noexcept;  // As above with cast.

   pegtl::position position() const noexcept;  // Not efficient with LAZY inputs.

   const Input& input() const noexcept;  // The input from the parsing run.

   const iterator_t& iterator() const noexcept;
};
```

Note that the `action_input` does **not** own the data it points to; the data belongs to the original input used in the parsing run. Therefore **the validity of the pointed-to data might not extend (much) beyond the call to the `apply()`-method**!

When the original input has tracking mode `IMMEDIATE`, the `iterator_t` returned by `action_input::iterator()` will contain the `byte`, `line` and `byte_in_line` counters corresponding to the beginning of the matched input represented by the `action_input`.

When the original input has tracking mode `LAZY`, then `action_input::position()` is not efficient because it calculates the line number etc. by scanning the complete original input from the beginning.

Actions often need to store and/or reference portions of the input for after the parsing run, for example when an abstract syntax tree is generated.
Some of the syntax tree nodes will contain portions of the input, for example for a variable name in a script language that needs to be stored in the syntax tree just as it occurs in the input data.

The **default safe choice** is to copy the matched portions of the input data that are passed to an action by storing a deep copy of the data as `std::string`, as obtained by the input class' `string()` method, in the data structures built while parsing.

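The difference between keeping the non-owning pointers and keeping a deep copy can be shown without the library.
In this minimal sketch `toy_view` is a hypothetical stand-in that mimics only the `begin()`, `end()` and `string()` members listed above, and `name_node` and `make_name_node()` are likewise invented for illustration.

```c++
#include <cassert>
#include <string>

// Hypothetical stand-in for the matched input: two non-owning pointers.
struct toy_view
{
   const char* b;
   const char* e;

   const char* begin() const noexcept { return b; }
   const char* end() const noexcept { return e; }

   // Deep copy of the viewed bytes, independent of the original buffer.
   std::string string() const { return std::string( b, e ); }
};

// Hypothetical syntax tree node that owns its text.
struct name_node
{
   std::string name;
};

// What an action would do: copy now, because the view may dangle later.
inline name_node make_name_node( const toy_view& in )
{
   return name_node{ in.string() };
}
```

A node built this way stays valid even after the buffer that the view pointed into has been overwritten or destroyed, which is exactly the property that a stored `const char*` pair would lack.
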
## States

In most applications, the actions also need some kind of data or user-defined (parser/action) *state* to operate on.
Since the `apply()` and `apply0()`-methods are `static`, they do not have an instance of the class of which they are a member function available for this purpose.
Therefore the *state(s)* are an arbitrary collection of objects that are

* passed by the user as additional arguments to the [`parse()`-function](Inputs-and-Parsing.md#parse-function) that starts a parsing run, and then

* passed by the PEGTL as additional arguments to all actions' `apply()` or `apply0()`-method.

In other words, the additional arguments to the `apply()` and `apply0()`-method can be chosen freely, however **all** actions **must** accept the same argument list since they are **all** called with the same arguments.

For example, in a practical grammar the example from above might use a second argument to store the parsed sequence of digits somewhere.

```c++
template<> struct my_actions< tao::pegtl::plus< tao::pegtl::digit > >
{
   template< typename Input >
   static void apply( const Input& in,
                      std::vector< std::string >& digit_strings )
   {
      digit_strings.push_back( in.string() );
   }
};
```

If we then assume that our grammar `my_grammar` contains the rule `tao::pegtl::plus< tao::pegtl::digit >` somewhere, we can use

```c++
const std::string parsed_data = ...;
std::vector< std::string > digit_strings;

tao::pegtl::memory_input<> in( parsed_data, "data-source-name" );
tao::pegtl::parse< my_grammar, my_actions >( in, digit_strings );
```

to collect all `digit_strings` that were detected by the grammar, i.e. the vector will contain one string for every time that the `tao::pegtl::plus< tao::pegtl::digit >` rule was matched against the input.

Since the `parse()`-functions are variadic function templates, an arbitrary sequence of state arguments can be used.

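The forwarding of an arbitrary sequence of states can be imitated in a few lines of plain C++.
The following is a toy model rather than the PEGTL's `parse()`: `toy_parse()` and `toy_digit_action` are hypothetical, the "grammar" is hard-coded as "runs of ASCII digits", and the matched text is passed as a `std::string` instead of an input object.

```c++
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical action with the same shape as the digit action above:
// a static apply() that receives the matched text, then all the states.
struct toy_digit_action
{
   static void apply( const std::string& matched,
                      std::vector< std::string >& digit_strings,
                      std::size_t& total_length )
   {
      digit_strings.push_back( matched );
      total_length += matched.size();
   }
};

// Toy stand-in for a parsing run: finds runs of digits and forwards
// every state, unchanged and by reference, to the action for each run.
template< typename Action, typename... States >
void toy_parse( const std::string& data, States&&... st )
{
   std::string run;
   for( const char c : data ) {
      if( ( c >= '0' ) && ( c <= '9' ) ) {
         run += c;
      }
      else if( !run.empty() ) {
         Action::apply( run, st... );
         run.clear();
      }
   }
   if( !run.empty() ) {
      Action::apply( run, st... );  // Run that ends at the end of the data.
   }
}
```

Calling `toy_parse< toy_digit_action >( "12 abc 345", digit_strings, total_length )` with an empty vector and a zero counter leaves the strings `"12"` and `"345"` in the vector and 5 in the counter; adding a third state only requires extending the signature of `apply()`.
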
## Action Specialisation

The rule class for which the action class template is specialised *must* exactly match how the rule is defined and referenced in the grammar.
For example given the rule

```c++
struct foo : tao::pegtl::plus< tao::pegtl::one< '*' > > {};
```

an action class template can be specialised for `foo` or for `tao::pegtl::one< '*' >`, but *not* for `tao::pegtl::plus< tao::pegtl::one< '*' > >` because that is not the rule class name whose `match()`-method is called.

(The method is called on class `foo`, which happens to inherit `match()` from `tao::pegtl::plus< tao::pegtl::one< '*' > >`, however base classes are not taken into consideration by the C++ language when choosing a specialisation.)

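That base classes play no role in choosing a specialisation is a property of the C++ language and can be checked at compile time without the library.
In this sketch `plus_star`, `foo` and `toy_action` are hypothetical stand-ins for the base rule, the named rule and the action class template, respectively.

```c++
// Hypothetical stand-ins for rule classes; these are not PEGTL rules.
struct plus_star {};
struct foo : plus_star {};

// Primary template: plays the role of "no action attached".
template< typename Rule >
struct toy_action
{
   static constexpr bool attached = false;
};

// Specialisation for the base class only.
template<>
struct toy_action< plus_star >
{
   static constexpr bool attached = true;
};

// An exact match selects the specialisation ...
static_assert( toy_action< plus_star >::attached, "specialisation used" );

// ... but the derived class falls back to the primary template.
static_assert( !toy_action< foo >::attached, "base class is ignored" );
```
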
While it is possible to specialise for `tao::pegtl::one< '*' >` in the above rule, any such specialisation would also match any other occurrence in the grammar. It is therefore best practice to *always* specialise for explicitly named top-level rules.

To then use these actions in a parsing run, simply pass them as an additional template parameter to one of the parser functions defined in `<tao/pegtl/parse.hpp>`.

```c++
tao::pegtl::parse< my_grammar, my_actions >( ... );
```

## Changing Actions

Within a grammar, the action class template can be changed, enabled or disabled using the `action<>`, `enable<>` and `disable<>` rules.

The following two lines effectively do the same thing, namely parse with `my_grammar` as top-level parsing rule without invoking actions (unless actions are enabled again somewhere within the grammar).

```c++
tao::pegtl::parse< my_grammar >( ... );
tao::pegtl::parse< tao::pegtl::disable< my_grammar >, my_actions >( ... );
```

Similarly the following two lines both start parsing `my_grammar` with `my_actions` (again with the caveat that something might change somewhere in the grammar).

```c++
tao::pegtl::parse< my_grammar, my_actions >( ... );
tao::pegtl::parse< tao::pegtl::action< my_actions, my_grammar > >( ... );
```

In other words, `enable<>` and `disable<>` behave just like `seq<>` but enable or disable the calling of actions. `action<>` changes the active action class template, which must be supplied as first template parameter to `action<>`.

Note that `action<>` does *not* implicitly enable actions when they were previously explicitly disabled.

User-defined parsing rules can use `action<>`, `enable<>` and `disable<>` just like any other combinator rules, for example to disable actions in LISP-style comments:

```c++
struct comment
   : tao::pegtl::seq< tao::pegtl::one< '#' >, tao::pegtl::disable< cons_list > > {};
```

This also allows using the same rules multiple times with different actions within the grammar.

## Changing States

Implementing a parser with the PEGTL consists of two main parts.

1. The actual grammar that drives the parser.
2. The states and actions that "do something".

For the second part, there are three distinct styles of how to manage the states and actions in non-trivial parsers.

The **main issue** addressed by the switching styles is the **growing complexity** encountered when a single state argument to a parsing run must perform multiple different tasks, including the management of nested data structures.

The way that this issue is addressed is by providing another tool for performing divide-and-conquer: A large state class with multiple tasks can be divided into

- multiple smaller state classes that each take care of a single issue,
- one or more [control classes](Control-and-Debug.md) that switch between the states,
- using the C++ stack for nested structures (rather than manually managing a stack).

The different styles can also be freely mixed within the same parser.

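The last point of the list above, letting the C++ stack hold the state for each nesting level, is the same idea that a hand-written recursive-descent parser uses.
The sketch below shows only that idea and involves no PEGTL types or control classes: `toy_list` and `parse_list()` are hypothetical, and the "grammar" is just bracketed lists of single digits such as `[1[23]4]`.

```c++
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical nested result: digits plus nested lists, in two buckets.
struct toy_list
{
   std::vector< int > values;
   std::vector< toy_list > children;
};

// Parses '[' ( digit | list )* ']' starting at pos. Each call owns a
// fresh local state and hands it to its parent only on success.
inline bool parse_list( const std::string& s, std::size_t& pos, toy_list& parent )
{
   if( ( pos >= s.size() ) || ( s[ pos ] != '[' ) ) {
      return false;
   }
   ++pos;
   toy_list current;  // State for this nesting level, on the C++ stack.
   while( ( pos < s.size() ) && ( s[ pos ] != ']' ) ) {
      if( ( s[ pos ] >= '0' ) && ( s[ pos ] <= '9' ) ) {
         current.values.push_back( s[ pos ] - '0' );
         ++pos;
      }
      else if( !parse_list( s, pos, current ) ) {
         return false;  // Nested failure; 'current' is simply discarded.
      }
   }
   if( pos >= s.size() ) {
      return false;  // Missing closing bracket.
   }
   ++pos;
   parent.children.push_back( std::move( current ) );
   return true;
}
```

No explicit stack of partially built lists is needed: a nested list lives in the local variable `current` of its own call and becomes visible to the enclosing level only once it is complete.
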
### No Switching

The "no switching style" consists of having one (or more) state-arguments that are passed to a parsing run and that are the arguments to all actions' `apply0()`- and `apply()`-methods.

For an example of how to build a generic JSON data structure with the "no switching style" see `src/example/pegtl/json_build_two.cpp`.

### Intrusive Switching

The `state<>` and `action<>` [meta combinators](Rule-Reference.md#meta-rules) can be used to hard-code state and action switches in the grammar.

In some cases a state object is required for the grammar itself, and in these cases embedding the state-switch into the grammar is recommended.

### External Switching

"External switching" is when the states and/or actions are switched from outside of the grammar by providing a specialised control class.

For an example of how to build a generic JSON data structure with the "external switching style" see `src/example/pegtl/json_build_one.cpp`.

The actual switching control classes are defined in `<tao/pegtl/contrib/changes.hpp>` and can be used as a template for custom switching.

## Legacy Actions

See the [section on legacy-style action rules](Rule-Reference.md#action-rules).

Copyright (c) 2014-2018 Dr. Colin Hirsch and Daniel Frey