# Actions and States

Parsing, i.e. matching an input with a grammar rule, by itself only indicates whether (a portion of) the input is valid according to the grammar.
In order to do something useful with the input, it is usually necessary to attach user-defined *actions* to one or more rules.
An action is *applied* whenever the rule to which it is attached succeeds.
Applying an action means that its static `apply()` or `apply0()`-method is called.
The first argument to an `apply()` method is always an object that represents the portion of the input consumed by the successful match of the rule.
An action's `apply()` or `apply0()`-method can return either `void` or `bool`.

## Contents

* [Actions](#actions)
  * [Apply0](#apply0)
  * [Apply](#apply)
* [States](#states)
* [Action Specialisation](#action-specialisation)
* [Changing Actions](#changing-actions)
* [Changing States](#changing-states)
  * [No Switching](#no-switching)
  * [Intrusive Switching](#intrusive-switching)
  * [External Switching](#external-switching)
* [Legacy Actions](#legacy-actions)

## Actions

Actions are implemented as static `apply()` or `apply0()`-methods of specialisations of custom class templates (which is not quite as difficult as it sounds).
First the default- or base-case of the action class template has to be defined:

```c++
template< typename Rule >
struct my_actions
   : tao::pegtl::nothing< Rule > {};
```

Inheriting from `tao::pegtl::nothing< Rule >` indicates to the PEGTL that no action is attached to `Rule`, i.e. that no `apply()` or `apply0()`-method should be called for successful matches of `Rule`.

To attach an action to `Rule`, this class template has to be specialised for `Rule` with two important properties.

1. The specialisation *must not* inherit from `tao::pegtl::nothing< Rule >`.

2. An *appropriate* static `apply()` or `apply0()`-method has to be implemented.

The PEGTL will auto-detect whether an action, i.e. a specialisation of an action class template, contains an appropriate `apply()` or `apply0()` function, and whether it returns `void` or `bool`.
It will fail to compile when both `apply()` and `apply0()` are found.
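
The kind of compile-time detection described above can be pictured with a small, self-contained sketch. It is plain C++11 and is **not** the PEGTL's actual implementation; the names `has_apply0`, `call_apply0`, `invoke_apply0` and the three toy action classes are invented for illustration. It shows how the presence of a static `apply0()` and its return type can be determined, and how a `void` result can be treated like `true`.

```c++
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical trait: true when Action::apply0( States... ) is a valid call.
template< typename Action, typename... States >
struct has_apply0
{
   template< typename A >
   static auto test( int ) -> decltype( A::apply0( std::declval< States >()... ), std::true_type() );

   template< typename >
   static std::false_type test( ... );

   static constexpr bool value = decltype( test< Action >( 0 ) )::value;
};

// Tag dispatch on the return type: a void action counts as success.
template< typename Action, typename... States >
bool call_apply0( std::true_type, States&... st )  // apply0() returns bool
{
   return Action::apply0( st... );
}

template< typename Action, typename... States >
bool call_apply0( std::false_type, States&... st )  // apply0() returns void
{
   Action::apply0( st... );
   return true;
}

template< typename Action, typename... States >
bool invoke_apply0( States&... st )
{
   static_assert( has_apply0< Action, States&... >::value, "Action has no matching apply0()" );
   using result_t = decltype( Action::apply0( st... ) );
   return call_apply0< Action >( std::is_same< result_t, bool >(), st... );
}

// Toy actions with a single 'int&' state.
struct count_matches { static void apply0( int& n ) { ++n; } };
struct reject_negative { static bool apply0( int& n ) { return n >= 0; } };
struct no_action {};

inline void detection_example()
{
   static_assert( has_apply0< count_matches, int& >::value, "void apply0() is detected" );
   static_assert( !has_apply0< no_action, int& >::value, "no apply0() present" );
   int n = 0;
   assert( invoke_apply0< count_matches >( n ) && n == 1 );  // void is mapped to true
   n = -5;
   assert( !invoke_apply0< reject_negative >( n ) );  // bool result is passed through
}
```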

### Apply0

In cases where the matched part of the input is not required, an action method named `apply0()` is implemented.
This allows for some optimisations compared to the `apply()` method which receives the matched input as first argument.

```c++
template<>
struct my_actions< tao::pegtl::plus< tao::pegtl::alpha > >
{
   static void apply0( /* all the states */ )
   {
      // Called whenever a call to tao::pegtl::plus< tao::pegtl::alpha >
      // in the grammar succeeds.
   }

   // OR ALTERNATIVELY

   static bool apply0( /* all the states */ )
   {
      // Called whenever a call to tao::pegtl::plus< tao::pegtl::alpha >
      // in the grammar succeeds.
      return true;  // or false, see below
   }
};
```

When the return type is `bool`, the action can determine whether matching the rule to which it was attached, and which already returned with success, should be retro-actively considered a (local) failure.
For the overall parsing run, there is no difference between a rule or an attached action returning `false` (but of course the action is not called when the rule already returned `false`).
When an action returns `false`, the PEGTL takes care of rewinding the input to where it was when the rule to which the action was attached started its (successful) match (which is unlike rules' `match()` methods that have to take care of rewinding themselves).

Note that actions returning `bool` are an advanced use case that should be used with caution.
They prevent some internal optimisations, in particular when used with `apply0()`.
They can also have surprising effects on the semantics of a parsing run, for example `at< rule >` can succeed for the same input for which `rule` fails when there is a `bool`-action attached to `rule` that returns `false` (remembering that actions are disabled within an `at<>` combinator).
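
The rewinding behaviour can be modelled without the PEGTL in a few lines. In this hypothetical sketch, `match_digits()` stands in for a `plus< digit >` rule, `at_most_three` for a `bool`-action, and `match_with_action()` for the library code that applies the action; none of these names are part of the PEGTL. The last check in `rewind_example()` mirrors the `at<>` remark above: the same input matches when no action is applied.

```c++
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Toy stand-in for plus< digit >: consumes one or more digits starting at 'pos'.
inline bool match_digits( const std::string& data, std::size_t& pos )
{
   const std::size_t start = pos;
   while( pos < data.size() && std::isdigit( static_cast< unsigned char >( data[ pos ] ) ) ) {
      ++pos;
   }
   return pos > start;  // Nothing is consumed on failure.
}

// Toy bool-action: vetoes matches that are longer than three digits.
struct at_most_three
{
   static bool apply( const std::string& matched )
   {
      return matched.size() <= 3;
   }
};

// Toy driver: the caller, not the action, rewinds when the action returns false.
template< typename Action >
bool match_with_action( const std::string& data, std::size_t& pos )
{
   const std::size_t start = pos;
   if( !match_digits( data, pos ) ) {
      return false;  // The rule failed, so the action is never called.
   }
   if( !Action::apply( data.substr( start, pos - start ) ) ) {
      pos = start;  // The successful match is turned into a local failure.
      return false;
   }
   return true;
}

inline void rewind_example()
{
   std::size_t pos = 0;
   assert( match_with_action< at_most_three >( "123abc", pos ) && pos == 3 );
   pos = 0;
   assert( !match_with_action< at_most_three >( "12345", pos ) && pos == 0 );  // vetoed and rewound
   pos = 0;
   assert( match_digits( "12345", pos ) && pos == 5 );  // same input, no action applied
}
```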

### Apply

When the action method is called `apply()`, it receives a const-reference to an instance of an input class as first argument.

```c++
template<>
struct my_actions< tao::pegtl::plus< tao::pegtl::digit > >
{
   template< typename Input >
   static void apply( const Input& in, /* all the states */ )
   {
      // Called whenever a call to tao::pegtl::plus< tao::pegtl::digit >
      // in the grammar succeeds. The argument named 'in' represents the
      // matched part of the input.
   }

   // OR ALTERNATIVELY

   template< typename Input >
   static bool apply( const Input& in, /* all the states */ )
   {
      // Called whenever a call to tao::pegtl::plus< tao::pegtl::digit >
      // in the grammar succeeds. The argument named 'in' represents the
      // matched part of the input.
      return true;  // or false, see description for apply0() above
   }
};
```

The exact type of the input class passed to an action's `apply()`-method is not specified.
It is currently best practice to "template over" the type of the input as shown above.

The `Input` template parameter is set to the class of the input used at the point in the parsing run where the action is applied.
Actions can assume that this input provides (at least) the members listed below.

For illustrative purposes, we will assume that the input passed to `apply()` is of type `action_input`.
Any resemblance to real classes is not a coincidence.

```c++
template< typename Input >
class action_input
{
public:
   using input_t = Input;
   using iterator_t = typename Input::iterator_t;

   bool empty() const noexcept;
   std::size_t size() const noexcept;

   const char* begin() const noexcept;  // Non-owning pointer!
   const char* end() const noexcept;  // Non-owning pointer!

   std::string string() const;  // { return std::string( begin(), end() ); }

   char peek_char( const std::size_t offset = 0 ) const noexcept;   // { return begin()[ offset ]; }
   unsigned char peek_byte( const std::size_t offset = 0 ) const noexcept;  // As above with cast.

   pegtl::position position() const noexcept;  // Not efficient with LAZY inputs.

   const Input& input() const noexcept;  // The input from the parsing run.

   const iterator_t& iterator() const noexcept;
};
```

Note that the `action_input` does **not** own the data it points to; the data belongs to the original input used in the parsing run. Therefore **the validity of the pointed-to data might not extend (much) beyond the call to the `apply()`-method**!

When the original input has tracking mode `IMMEDIATE`, the `iterator_t` returned by `action_input::iterator()` will contain the `byte`, `line` and `byte_in_line` counters corresponding to the beginning of the matched input represented by the `action_input`.

When the original input has tracking mode `LAZY`, then `action_input::position()` is not efficient because it calculates the line number etc. by scanning the complete original input from the beginning.

Actions often need to store and/or reference portions of the input for after the parsing run, for example when an abstract syntax tree is generated.
Some of the syntax tree nodes will contain portions of the input, for example for a variable name in a script language that needs to be stored in the syntax tree just as it occurs in the input data.

The **default safe choice** is to copy the matched portions of the input data that are passed to an action by storing a deep copy of the data as `std::string`, as obtained by the input class' `string()` method, in the data structures built while parsing.
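
A minimal sketch of why the deep copy is the safe choice, written without the PEGTL: `toy_action_input` is a hypothetical stand-in that only mimics the non-owning `begin()`, `end()` and `string()` members listed above, and `name_action` is an invented action. The copies stored by the action stay valid after the buffer they were taken from has been overwritten and destroyed, which would not be true for stored pointers.

```c++
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the matched-input object; it does not own the data.
struct toy_action_input
{
   const char* first;
   const char* last;

   const char* begin() const noexcept { return first; }
   const char* end() const noexcept { return last; }
   std::string string() const { return std::string( first, last ); }
};

struct name_action
{
   // Stores a deep copy instead of keeping begin()/end() around.
   static void apply( const toy_action_input& in, std::vector< std::string >& names )
   {
      names.push_back( in.string() );
   }
};

inline std::vector< std::string > collect_names_then_drop_buffer()
{
   std::vector< std::string > names;
   {
      std::string buffer = "alpha beta";
      const toy_action_input in{ buffer.data(), buffer.data() + 5 };  // views "alpha"
      name_action::apply( in, names );
      buffer.assign( buffer.size(), '?' );  // The viewed bytes are overwritten here...
   }                                        // ...and the buffer is destroyed here.
   assert( names.size() == 1 );
   return names;  // The stored copy is unaffected.
}
```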

## States

In most applications, the actions also need some kind of data or user-defined (parser/action) *state* to operate on.
Since the `apply()` and `apply0()`-methods are `static`, they do not have an instance of the class of which they are a member function available for this purpose.
Therefore the *state(s)* are an arbitrary collection of objects that are

* passed by the user as additional arguments to the [`parse()`-function](Inputs-and-Parsing.md#parse-function) that starts a parsing run, and then

* passed by the PEGTL as additional arguments to all actions' `apply()` or `apply0()`-method.

In other words, the additional arguments to the `apply()` and `apply0()`-method can be chosen freely, however **all** actions **must** accept the same argument list since they are **all** called with the same arguments.

For example, in a practical grammar the example from above might use a second argument to store the parsed sequence of digits somewhere.

```c++
template<> struct my_actions< tao::pegtl::plus< tao::pegtl::digit > >
{
   template< typename Input >
   static void apply( const Input& in,
                      std::vector< std::string >& digit_strings )
   {
      digit_strings.push_back( in.string() );
   }
};
```

If we then assume that our grammar `my_grammar` contains the rule `tao::pegtl::plus< tao::pegtl::digit >` somewhere, we can use

```c++
const std::string parsed_data = ...;
std::vector< std::string > digit_strings;

tao::pegtl::memory_input<> in( parsed_data, "data-source-name" );
tao::pegtl::parse< my_grammar, my_actions >( in, digit_strings );
```

to collect all `digit_strings` that were detected by the grammar, i.e. the vector will contain one string for every time that the `tao::pegtl::plus< tao::pegtl::digit >` rule was matched against the input.

Since the `parse()`-functions are variadic function templates, an arbitrary sequence of state arguments can be used.
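
The following sketch, which does not use the PEGTL, illustrates the two points made in this section: states are forwarded variadically, and every action has to accept the complete list. The names `digits`, `word`, `toy_actions` and `fire()` are invented; `fire()` merely plays the role of the library code that applies an action with whatever states the caller supplied.

```c++
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-ins for two rule types.
struct digits {};
struct word {};

template< typename Rule >
struct toy_actions;

template<>
struct toy_actions< digits >
{
   static void apply( const std::string& matched,
                      std::vector< std::string >& numbers,
                      std::size_t& total_bytes )
   {
      numbers.push_back( matched );
      total_bytes += matched.size();
   }
};

template<>
struct toy_actions< word >
{
   // Must accept the same states, even though 'numbers' is not used here.
   static void apply( const std::string& matched,
                      std::vector< std::string >& /* numbers */,
                      std::size_t& total_bytes )
   {
      total_bytes += matched.size();
   }
};

// Stand-in for the library side: forwards all states to the selected action.
template< typename Rule, typename... States >
void fire( const std::string& matched, States&... st )
{
   toy_actions< Rule >::apply( matched, st... );
}

inline void states_example()
{
   std::vector< std::string > numbers;
   std::size_t total_bytes = 0;
   fire< digits >( "42", numbers, total_bytes );
   fire< word >( "abc", numbers, total_bytes );
   assert( numbers.size() == 1 && numbers[ 0 ] == "42" );
   assert( total_bytes == 5 );
}
```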

## Action Specialisation

The rule class for which the action class template is specialised *must* exactly match how the rule is defined and referenced in the grammar.
For example given the rule

```c++
struct foo : tao::pegtl::plus< tao::pegtl::one< '*' > > {};
```

an action class template can be specialised for `foo` or for `tao::pegtl::one< '*' >`, but *not* for `tao::pegtl::plus< tao::pegtl::one< '*' > >` because that is not the rule class name whose `match()`-method is called.

(The method is called on class `foo`, which happens to inherit `match()` from `tao::pegtl::plus< tao::pegtl::one< '*' > >`, however base classes are not taken into consideration by the C++ language when choosing a specialisation.)
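
The language rule in question can be checked in isolation. In this sketch `star` and `plus<>` are empty stand-ins for the PEGTL rules of the same shape, and `wrong_actions` and `right_actions` are invented action class templates; only the type relationships matter.

```c++
// Empty stand-ins; only the inheritance relationship is relevant here.
struct star {};  // plays the role of one< '*' >

template< typename Rule >
struct plus {};  // plays the role of plus< Rule >

struct foo : plus< star > {};  // same shape as the rule 'foo' above

// Specialised for the base class of foo: never selected for foo itself.
template< typename Rule >
struct wrong_actions
{
   static constexpr bool attached = false;
};

template<>
struct wrong_actions< plus< star > >
{
   static constexpr bool attached = true;
};

// Specialised for foo: selected whenever the action is looked up for foo.
template< typename Rule >
struct right_actions
{
   static constexpr bool attached = false;
};

template<>
struct right_actions< foo >
{
   static constexpr bool attached = true;
};

static_assert( !wrong_actions< foo >::attached, "base classes are ignored when choosing a specialisation" );
static_assert( right_actions< foo >::attached, "the exact rule type selects the specialisation" );
```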

While it is possible to specialise for `tao::pegtl::one< '*' >` in the above rule, any such specialisation would also match any other occurrence in the grammar. It is therefore best practice to *always* specialise for explicitly named top-level rules.

To then use these actions in a parsing run, simply pass them as an additional template parameter to one of the parser functions defined in `<tao/pegtl/parse.hpp>`.

```c++
tao::pegtl::parse< my_grammar, my_actions >( ... );
```

## Changing Actions

Within a grammar, the action class template can be changed, enabled or disabled using the `action<>`, `enable<>` and `disable<>` rules.

The following two lines effectively do the same thing, namely parse with `my_grammar` as top-level parsing rule without invoking actions (unless actions are enabled again somewhere within the grammar).

```c++
tao::pegtl::parse< my_grammar >( ... );
tao::pegtl::parse< tao::pegtl::disable< my_grammar >, my_actions >( ... );
```

Similarly the following two lines both start parsing `my_grammar` with `my_actions` (again with the caveat that something might change somewhere in the grammar).

```c++
tao::pegtl::parse< my_grammar, my_actions >( ... );
tao::pegtl::parse< tao::pegtl::action< my_actions, my_grammar > >( ... );
```

In other words, `enable<>` and `disable<>` behave just like `seq<>` but enable or disable the calling of actions. `action<>` changes the active action class template, which must be supplied as first template parameter to `action<>`.

Note that `action<>` does *not* implicitly enable actions when they were previously explicitly disabled.

User-defined parsing rules can use `action<>`, `enable<>` and `disable<>` just like any other combinator rules, for example to disable actions in LISP-style comments:

```c++
struct comment
   : tao::pegtl::seq< tao::pegtl::one< '#' >, tao::pegtl::disable< cons_list > > {};
```

This also allows using the same rules multiple times with different actions within the grammar.

## Changing States

Implementing a parser with the PEGTL consists of two main parts.

1. The actual grammar that drives the parser.
2. The states and actions that "do something".

For the second part, there are three distinct styles of how to manage the states and actions in non-trivial parsers.

The **main issue** addressed by the switching styles is the **growing complexity** encountered when a single state argument to a parsing run must perform multiple different tasks, including the management of nested data structures.

The way that this issue is addressed is by providing another tool for performing divide-and-conquer: A large state class with multiple tasks can be divided into

- multiple smaller state classes that each take care of a single issue,
- one or more [control classes](Control-and-Debug.md) that switch between the states,
- using the C++ stack for nested structures (rather than manually managing a stack).

The different styles can also be freely mixed within the same parser.
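
The idea of using the C++ stack for nested structures can be illustrated independently of the PEGTL. The following hand-written recursive sketch is only an analogy: `node`, `parse_list()` and `nested_state_example()` are invented names, and the hand-over on success merely mimics what a state switch achieves. Each nesting level works on its own local `node` and passes it to its parent only once its closing bracket has been matched, so a failed level leaves the parent untouched.

```c++
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One node per bracketed list; words are stored as atoms.
struct node
{
   std::vector< std::string > atoms;
   std::vector< node > children;
};

// Parses "( item* )" where an item is a lowercase word or a nested list.
inline bool parse_list( const std::string& s, std::size_t& pos, node& parent )
{
   const std::size_t start = pos;
   if( pos >= s.size() || s[ pos ] != '(' ) {
      return false;
   }
   ++pos;
   node local;  // The state for this nesting level lives on the C++ stack.
   for( ;; ) {
      while( pos < s.size() && s[ pos ] == ' ' ) {
         ++pos;
      }
      if( pos < s.size() && s[ pos ] == ')' ) {
         ++pos;
         parent.children.push_back( std::move( local ) );  // Hand over on success only.
         return true;
      }
      const std::size_t word_start = pos;
      while( pos < s.size() && std::islower( static_cast< unsigned char >( s[ pos ] ) ) ) {
         ++pos;
      }
      if( pos > word_start ) {
         local.atoms.push_back( s.substr( word_start, pos - word_start ) );
         continue;
      }
      if( !parse_list( s, pos, local ) ) {
         pos = start;  // Rewind; 'local' is discarded when it goes out of scope.
         return false;
      }
   }
}

inline void nested_state_example()
{
   node root;
   std::size_t pos = 0;
   assert( parse_list( "(a (b c) d)", pos, root ) && pos == 11 );
   assert( root.children.size() == 1 );
   assert( root.children[ 0 ].atoms.size() == 2 );              // "a" and "d"
   assert( root.children[ 0 ].children[ 0 ].atoms.size() == 2 );  // "b" and "c"

   node untouched;
   pos = 0;
   assert( !parse_list( "(a (b", pos, untouched ) && pos == 0 );
   assert( untouched.children.empty() );  // Partial results never reach the parent.
}
```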

### No Switching

The "no switching style" consists of having one (or more) state-arguments that are passed to a parsing run and that are the arguments to all actions' `apply0()`- and `apply()`-methods.

For an example of how to build a generic JSON data structure with the "no switching style" see `src/example/pegtl/json_build_two.cpp`.

### Intrusive Switching

The `state<>` and `action<>` [meta combinators](Rule-Reference.md#meta-rules) can be used to hard-code state and action switches in the grammar.

In some cases a state object is required for the grammar itself, and in these cases embedding the state-switch into the grammar is recommended.

### External Switching

"External switching" is when the states and/or actions are switched from outside of the grammar by providing a specialised control class.

For an example of how to build a generic JSON data structure with the "external switching style" see `src/example/pegtl/json_build_one.cpp`.

The actual switching control classes are defined in `<tao/pegtl/contrib/changes.hpp>` and can be used as a template for custom switching.

## Legacy Actions

See the [section on legacy-style action rules](Rule-Reference.md#action-rules).

Copyright (c) 2014-2018 Dr. Colin Hirsch and Daniel Frey