Tree - source-git/PEGTL - CentOS Git server

source-git / PEGTL

Blame doc/Actions-and-States.md

Packit	d0b620	`# Actions and States`
Packit	d0b620
Packit	d0b620	`Parsing, i.e. matching an input with a grammar rule, by itself only indicates whether (a portion of) the input is valid according to the grammar.`
Packit	d0b620	`In order to do something useful with the input, it is usually necessary to attach user-defined actions to one or more rules.`
Packit	d0b620	`An action is applied whenever the rule to which it is attached succeeds.`
Packit	d0b620	Applying an action means that its static `apply()` or `apply0()`-method is called.
Packit	d0b620	The first argument to an `apply()` method is always an object that represents the portion of the input consumed by the successful match of the rule.
Packit	d0b620	An action's `apply()` or `apply0()`-method can return either `void` or `bool`.
Packit	d0b620
Packit	d0b620	`## Contents`
Packit	d0b620
Packit	d0b620	`* [Actions](#actions)`
Packit	d0b620	`* [Apply0](#apply0)`
Packit	d0b620	`* [Apply](#apply)`
Packit	d0b620	`* [States](#states)`
Packit	d0b620	`* [Action Specialisation](#action-specialisation)`
Packit	d0b620	`* [Changing Actions](#changing-actions)`
Packit	d0b620	`* [Changing States](#changing-states)`
Packit	d0b620	`* [No Switching](#no-switching)`
Packit	d0b620	`* [Intrusive Switching](#intrusive-switching)`
Packit	d0b620	`* [External Switching](#external-switching)`
Packit	d0b620	`* [Legacy Actions](#legacy-actions)`
Packit	d0b620
Packit	d0b620	`## Actions`
Packit	d0b620
Packit	d0b620	Actions are implemented as static `apply()` or `apply0()`-methods of specialisations of custom class templates (which is not quite as difficult as it sounds).
Packit	d0b620	`First the default- or base-case of the action class template has to be defined:`
Packit	d0b620
Packit	d0b620	```c++
Packit	d0b620	`template< typename Rule >`
Packit	d0b620	`struct my_actions`
Packit	d0b620	`: tao::pegtl::nothing< Rule > {};`
Packit	d0b620	```
Packit	d0b620
Packit	d0b620	Inheriting from `tao::pegtl::nothing< Rule >` indicates to the PEGTL that no action is attached to `Rule`, i.e. that no `apply()` or `apply0()`-method should be called for successful matches of `Rule`.
Packit	d0b620
Packit	d0b620	To attach an action to `Rule`, this class template has to be specialised for `Rule` with two important properties.
Packit	d0b620
Packit	d0b620	1. The specialisation must not inherit from `tao::pegtl::nothing< Rule >`.
Packit	d0b620
Packit	d0b620	2. An appropriate static `apply()` or `apply0()`-method has to be implemented.
Packit	d0b620
Packit	d0b620	The PEGTL will auto-detect whether an action, i.e. a specialisation of an action class template, contains an appropriate `apply()` or `apply0()` function, and whether it returns `void` or `bool`.
Packit	d0b620	It will fail to compile when both `apply()` and `apply0()` are found.
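The following is a minimal sketch of how this kind of auto-detection can be done in plain C++11. It is not the PEGTL's internal implementation; the names `toy::has_apply0`, `toy::apply0_as_bool` and the three sample action classes are hypothetical and exist only for illustration.

```c++
#include <cassert>
#include <type_traits>
#include <utility>

namespace toy
{
   // Primary template: assume A has no apply0() callable with the given states.
   template< typename A, typename Enable, typename... States >
   struct has_apply0 : std::false_type {};

   // Selected instead when the expression A::apply0( states... ) is well-formed.
   template< typename A, typename... States >
   struct has_apply0< A, decltype( void( A::apply0( std::declval< States >()... ) ) ), States... >
      : std::true_type {};

   // Overload for a void-returning apply0(): call it and report success.
   template< typename A, typename... States >
   bool call_apply0( std::true_type, States&... st )
   {
      A::apply0( st... );
      return true;
   }

   // Overload for a bool-returning apply0(): pass its verdict on.
   template< typename A, typename... States >
   bool call_apply0( std::false_type, States&... st )
   {
      return A::apply0( st... );
   }

   // Picks one of the overloads above depending on the detected return type.
   template< typename A, typename... States >
   bool apply0_as_bool( States&... st )
   {
      using result_t = decltype( A::apply0( st... ) );
      return call_apply0< A >( std::is_void< result_t >(), st... );
   }

   // Hypothetical sample actions, all using a single int& as state.
   struct count_matches { static void apply0( int& n ) { ++n; } };
   struct accept_small { static bool apply0( int& n ) { return n < 2; } };
   struct no_action {};

}  // namespace toy
```

With these definitions `toy::has_apply0< toy::count_matches, void, int& >::value` is `true`, the same trait for `toy::no_action` is `false`, and `toy::apply0_as_bool< toy::accept_small >( n )` returns whatever the action itself decides.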
Packit	d0b620
Packit	d0b620	`### Apply0`
Packit	d0b620
Packit	d0b620	In cases where the matched part of the input is not required, an action method named `apply0()` is implemented.
Packit	d0b620	This allows for some optimisations compared to the `apply()` method which receives the matched input as first argument.
Packit	d0b620
Packit	d0b620	```c++
Packit	d0b620	`template<>`
Packit	d0b620	`struct my_actions< tao::pegtl::plus< tao::pegtl::alpha > >`
Packit	d0b620	`{`
Packit	d0b620	`static void apply0( /* all the states */ )`
Packit	d0b620	`{`
Packit	d0b620	`// Called whenever a call to tao::pegtl::plus< tao::pegtl::alpha >`
Packit	d0b620	`// in the grammar succeeds.`
Packit	d0b620	`}`
Packit	d0b620
Packit	d0b620	`// OR ALTERNATIVELY`
Packit	d0b620
Packit	d0b620	`static bool apply0( /* all the states */ )`
Packit	d0b620	`{`
Packit	d0b620	`// Called whenever a call to tao::pegtl::plus< tao::pegtl::alpha >`
Packit	d0b620	`// in the grammar succeeds.`
Packit	d0b620	`return // see below`
Packit	d0b620	`}`
Packit	d0b620	`};`
Packit	d0b620	```
Packit	d0b620
Packit	d0b620	When the return type is `bool`, the action can determine whether matching the rule to which it was attached, and which already returned with success, should be retro-actively considered a (local) failure.
Packit	d0b620	For the overall parsing run, there is no difference between a rule or an attached action returning `false` (but of course the action is not called when the rule already returned `false`).
Packit	d0b620	When an action returns `false`, the PEGTL takes care of rewinding the input to where it was when the rule to which the action was attached started its (successful) match (which is unlike rules' `match()` methods that have to take care of rewinding themselves).
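The rewinding behaviour described above can be illustrated with a small stand-alone sketch. It does not use the PEGTL; `toy::input`, `toy::match_digits`, `toy::at_most_three` and `toy::match_with_action` are hypothetical stand-ins for an input, a rule, a `bool`-returning action and the framework code that connects them.

```c++
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

namespace toy
{
   // Stand-in for an input: the data plus the current parsing position.
   struct input
   {
      explicit input( std::string d ) : data( std::move( d ) ), pos( 0 ) {}
      std::string data;
      std::size_t pos;
   };

   // Stand-in rule: one or more digits. Consumes nothing when it fails.
   bool match_digits( input& in )
   {
      const std::size_t start = in.pos;
      while( in.pos < in.data.size() && std::isdigit( static_cast< unsigned char >( in.data[ in.pos ] ) ) ) {
         ++in.pos;
      }
      return in.pos != start;
   }

   // Stand-in bool action: vetoes matches longer than three characters.
   bool at_most_three( const std::string& matched )
   {
      return matched.size() <= 3;
   }

   // Stand-in for the framework: runs the rule, then the action, and
   // rewinds the input itself when the action returns false.
   bool match_with_action( input& in )
   {
      const std::size_t start = in.pos;  // where the rule started
      if( !match_digits( in ) ) {
         return false;  // rule failed, the action is not called
      }
      if( !at_most_three( in.data.substr( start, in.pos - start ) ) ) {
         in.pos = start;  // the action did not have to rewind anything
         return false;
      }
      return true;
   }

}  // namespace toy
```

For the input `"123abc"` the combined match succeeds and leaves the position at 3; for `"12345"` the rule succeeds, the action vetoes, and the position is back at 0 afterwards.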
Packit	d0b620
Packit	d0b620	Note that actions returning `bool` are an advanced use case that should be used with caution.
Packit	d0b620	They prevent some internal optimisations, in particular when used with `apply0()`.
Packit	d0b620	They can also have weird effects on the semantics of a parsing run, for example `at< rule >` can succeed for the same input for which `rule` fails when there is a `bool`-action attached to `rule` that returns `false` (remembering that actions are disabled within an `at<>` combinator).
Packit	d0b620
Packit	d0b620	`### Apply`
Packit	d0b620
Packit	d0b620	When the action method is called `apply()`, it receives a const-reference to an instance of an input class as first argument.
Packit	d0b620
Packit	d0b620	```c++
Packit	d0b620	`template<>`
Packit	d0b620	`struct my_actions< tao::pegtl::plus< tao::pegtl::digit > >`
Packit	d0b620	`{`
Packit	d0b620	`template< typename Input >`
Packit	d0b620	`static void apply( const Input& in, /* all the states */ )`
Packit	d0b620	`{`
Packit	d0b620	`// Called whenever a call to tao::pegtl::plus< tao::pegtl::digit >`
Packit	d0b620	`// in the grammar succeeds. The argument named 'in' represents the`
Packit	d0b620	`// matched part of the input.`
Packit	d0b620	`}`
Packit	d0b620
Packit	d0b620	`// OR ALTERNATIVELY`
Packit	d0b620
Packit	d0b620	`template< typename Input >`
Packit	d0b620	`static bool apply( const Input& in, /* all the states */ )`
Packit	d0b620	`{`
Packit	d0b620	`// Called whenever a call to tao::pegtl::plus< tao::pegtl::digit >`
Packit	d0b620	`// in the grammar succeeds. The argument named 'in' represents the`
Packit	d0b620	`// matched part of the input.`
Packit	d0b620	`return // see description for apply0() above`
Packit	d0b620	`}`
Packit	d0b620	`};`
Packit	d0b620	```
Packit	d0b620
Packit	d0b620	The exact type of the input class passed to an action's `apply()`-method is not specified.
Packit	d0b620	`It is currently best practice to "template over" the type of the input as shown above.`
Packit	d0b620
Packit	d0b620	`Actions can then assume that the input provides (at least) the following members.`
Packit	d0b620	The `Input` template parameter is set to the class of the input used at the point in the parsing run where the action is applied.
Packit	d0b620
Packit	d0b620	For illustrative purposes, we will assume that the input passed to `apply()` is of type `action_input`.
Packit	d0b620	`Any resemblance to real classes is not a coincidence.`
Packit	d0b620
Packit	d0b620	```c++
Packit	d0b620	`template< typename Input >`
Packit	d0b620	`class action_input`
Packit	d0b620	`{`
Packit	d0b620	`public:`
Packit	d0b620	`using input_t = Input;`
Packit	d0b620	`using iterator_t = typename Input::iterator_t;`
Packit	d0b620
Packit	d0b620	`bool empty() const noexcept;`
Packit	d0b620	`std::size_t size() const noexcept;`
Packit	d0b620
Packit	d0b620	`const char* begin() const noexcept; // Non-owning pointer!`
Packit	d0b620	`const char* end() const noexcept; // Non-owning pointer!`
Packit	d0b620
Packit	d0b620	`std::string string() const; // { return std::string( begin(), end() ); }`
Packit	d0b620
Packit	d0b620	`char peek_char( const std::size_t offset = 0 ) const noexcept; // { return begin()[ offset ]; }`
Packit	d0b620	`unsigned char peek_byte( const std::size_t offset = 0 ) const noexcept; // As above with cast.`
Packit	d0b620
Packit	d0b620	`pegtl::position position() const noexcept; // Not efficient with LAZY inputs.`
Packit	d0b620
Packit	d0b620	`const Input& input() const noexcept; // The input from the parsing run.`
Packit	d0b620
Packit	d0b620	`const iterator_t& iterator() const noexcept;`
Packit	d0b620	`};`
Packit	d0b620	```
Packit	d0b620
Packit	d0b620	Note that the `action_input` does not own the data it points to; the data belongs to the original input used in the parsing run. Therefore the validity of the pointed-to data might not extend (much) beyond the call to the `apply()`-method!
Packit	d0b620
Packit	d0b620	When the original input has tracking mode `IMMEDIATE`, the `iterator_t` returned by `action_input::iterator()` will contain the `byte`, `line` and `byte_in_line` counters corresponding to the beginning of the matched input represented by the `action_input`.
Packit	d0b620
Packit	d0b620	When the original input has tracking mode `LAZY`, then `action_input::position()` is not efficient because it calculates the line number etc. by scanning the complete original input from the beginning.
Packit	d0b620
Packit	d0b620	`Actions often need to store and/or reference portions of the input for after the parsing run, for example when an abstract syntax tree is generated.`
Packit	d0b620	`Some of the syntax tree nodes will contain portions of the input, for example for a variable name in a script language that needs to be stored in the syntax tree just as it occurs in the input data.`
Packit	d0b620
Packit	d0b620	The default safe choice is to copy the matched portions of the input data that are passed to an action by storing a deep copy of the data as `std::string`, as obtained by the input class' `string()` method, in the data structures built while parsing.
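As a rough illustration of the copying approach, the sketch below stores a `std::string` copy in a node that outlives the buffer it was taken from. `toy::action_input` only mimics the `begin()`, `end()`, `size()` and `string()` members listed above and is not the real PEGTL class; `toy::ast_node`, `toy::store_name` and `toy::parse_and_discard_buffer` are likewise hypothetical.

```c++
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace toy
{
   // Stand-in with the same non-owning semantics as the interface above.
   struct action_input
   {
      const char* b;
      const char* e;

      const char* begin() const noexcept { return b; }
      const char* end() const noexcept { return e; }
      std::size_t size() const noexcept { return std::size_t( e - b ); }
      std::string string() const { return std::string( b, e ); }
   };

   // Syntax tree node that owns its text.
   struct ast_node
   {
      std::string name;
   };

   // Shaped like an action's apply(): copies the matched text into the state.
   template< typename Input >
   void store_name( const Input& in, std::vector< ast_node >& nodes )
   {
      nodes.push_back( ast_node{ in.string() } );  // deep copy, safe to keep
   }

   std::vector< ast_node > parse_and_discard_buffer()
   {
      std::vector< ast_node > nodes;
      {
         const std::string buffer = "counter = 1";
         const action_input in{ buffer.data(), buffer.data() + 7 };  // "counter"
         store_name( in, nodes );
      }  // buffer is destroyed here, pointers into it would now dangle
      return nodes;
   }

}  // namespace toy
```

Had `ast_node` stored `in.begin()` and `in.end()` instead of a `std::string`, the node would refer to memory that no longer exists once `buffer` goes out of scope.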
Packit	d0b620
Packit	d0b620	`## States`
Packit	d0b620
Packit	d0b620	`In most applications, the actions also need some kind of data or user-defined (parser/action) state to operate on.`
Packit	d0b620	Since the `apply()` and `apply0()`-methods are `static`, they do not have an instance of the class of which they are a member function available for this purpose.
Packit	d0b620	`Therefore the state(s) are an arbitrary collection of objects that are`
Packit	d0b620
Packit	d0b620	* passed by the user as additional arguments to the [`parse()`-function](Inputs-and-Parsing.md#parse-function) that starts a parsing run, and then
Packit	d0b620
Packit	d0b620	* passed by the PEGTL as additional arguments to all actions' `apply()` or `apply0()`-method.
Packit	d0b620
Packit	d0b620	In other words, the additional arguments to the `apply()` and `apply0()`-methods can be chosen freely; however, all actions must accept the same argument list since they are all called with the same arguments.
Packit	d0b620
Packit	d0b620	`For example, in a practical grammar the example from above might use a second argument to store the parsed sequence of digits somewhere.`
Packit	d0b620
Packit	d0b620	```c++
Packit	d0b620	`template<> struct my_actions< tao::pegtl::plus< tao::pegtl::digit > >`
Packit	d0b620	`{`
Packit	d0b620	`template< typename Input >`
Packit	d0b620	`static void apply( const Input& in,`
Packit	d0b620	`std::vector< std::string >& digit_strings )`
Packit	d0b620	`{`
Packit	d0b620	`digit_strings.push_back( in.string() );`
Packit	d0b620	`}`
Packit	d0b620	`};`
Packit	d0b620	```
Packit	d0b620
Packit	d0b620	If we then assume that our grammar `my_grammar` contains the rule `tao::pegtl::plus< tao::pegtl::digit >` somewhere, we can use
Packit	d0b620
Packit	d0b620	```c++
Packit	d0b620	`const std::string parsed_data = ...;`
Packit	d0b620	`std::vector< std::string > digit_strings;`
Packit	d0b620
Packit	d0b620	`tao::pegtl::memory_input<> in( parsed_data, "data-source-name" );`
Packit	d0b620	`tao::pegtl::parse< my_grammar, my_actions >( in, digit_strings );`
Packit	d0b620	```
Packit	d0b620
Packit	d0b620	to collect all `digit_strings` that were detected by the grammar, i.e. the vector will contain one string for every time that the `tao::pegtl::plus< tao::pegtl::digit >` rule was matched against the input.
Packit	d0b620
Packit	d0b620	Since the `parse()`-functions are variadic function templates, an arbitrary sequence of state arguments can be used.
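The forwarding of an arbitrary list of states can be sketched without the PEGTL as follows. `toy::driver` is a hypothetical stand-in for a parsing run that simply calls every listed action once; the point is only that each action receives the same argument list, whether or not it uses every state.

```c++
#include <cassert>
#include <string>
#include <vector>

namespace toy
{
   // Both actions accept the full state list, even though each uses only part of it.
   struct log_action
   {
      static void apply0( std::vector< std::string >& log, int& /*count*/ )
      {
         log.push_back( "matched" );
      }
   };

   struct count_action
   {
      static void apply0( std::vector< std::string >& /*log*/, int& count )
      {
         ++count;
      }
   };

   // Stand-in for a parsing run: forwards the same states to every action.
   template< typename... Actions >
   struct driver
   {
      template< typename... States >
      static void run( States&... st )
      {
         // Expands to one apply0( st... ) call per action, in order.
         const int dummy[] = { 0, ( Actions::apply0( st... ), 0 )... };
         (void)dummy;
      }
   };

}  // namespace toy
```

Calling `toy::driver< toy::log_action, toy::count_action >::run( log, count )` with a `std::vector< std::string > log` and an `int count` appends one entry to `log` and increments `count` once.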
Packit	d0b620
Packit	d0b620	`## Action Specialisation`
Packit	d0b620
Packit	d0b620	`The rule class for which the action class template is specialised must exactly match how the rule is defined and referenced in the grammar.`
Packit	d0b620	`For example given the rule`
Packit	d0b620
Packit	d0b620	```c++
Packit	d0b620	`struct foo : tao::pegtl::plus< tao::pegtl::one< '*' > > {};`
Packit	d0b620	```
Packit	d0b620
Packit	d0b620	an action class template can be specialised for `foo` or for `tao::pegtl::one< '*' >`, but *not* for `tao::pegtl::plus< tao::pegtl::one< '*' > >` because that is not the rule class name whose `match()`-method is called.
Packit	d0b620
Packit	d0b620	(The method is called on class `foo`, which happens to inherit `match()` from `tao::pegtl::plus< tao::pegtl::one< '*' > >`, however base classes are not taken into consideration by the C++ language when choosing a specialisation.)
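The language rule mentioned in the parenthesis can be checked in isolation. In the sketch below `toy::one` and `toy::plus` are empty stand-ins for the PEGTL rules of the same name, and `toy::my_actions` records via `std::true_type` or `std::false_type` whether a specialisation was selected.

```c++
#include <cassert>
#include <type_traits>

namespace toy
{
   // Empty stand-ins for the rule class templates.
   template< char C >
   struct one {};

   template< typename Rule >
   struct plus {};

   struct foo : plus< one< '*' > > {};

   // Default case: no action attached.
   template< typename Rule >
   struct my_actions : std::false_type {};

   // Specialised for the base class of foo, not for foo itself.
   template<>
   struct my_actions< plus< one< '*' > > > : std::true_type {};

}  // namespace toy
```

Here `toy::my_actions< toy::plus< toy::one< '*' > > >::value` is `true`, while `toy::my_actions< toy::foo >::value` is `false`: the specialisation for the base class is not chosen for the derived class `foo`.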
Packit	d0b620
Packit	d0b620	While it is possible to specialise for `tao::pegtl::one< '*' >` in the above rule, any such specialisation would also match any other occurrence in the grammar. It is therefore best practice to *always* specialise for explicitly named top-level rules.
Packit	d0b620
Packit	d0b620	To then use these actions in a parsing run, simply pass them as an additional template parameter to one of the parser functions defined in `<tao/pegtl/parse.hpp>`.
Packit	d0b620
Packit	d0b620	```c++
Packit	d0b620	`tao::pegtl::parse< my_grammar, my_actions >( ... );`
Packit	d0b620	```
Packit	d0b620
Packit	d0b620	`## Changing Actions`
Packit	d0b620
Packit	d0b620	Within a grammar, the action class template can be changed, enabled or disabled using the `action<>`, `enable<>` and `disable<>` rules.
Packit	d0b620
Packit	d0b620	The following two lines effectively do the same thing, namely parse with `my_grammar` as top-level parsing rule without invoking actions (unless actions are enabled again somewhere within the grammar).
Packit	d0b620
Packit	d0b620	```c++
Packit	d0b620	`tao::pegtl::parse< my_grammar >( ... );`
Packit	d0b620	`tao::pegtl::parse< tao::pegtl::disable< my_grammar >, my_actions >( ... );`
Packit	d0b620	```
Packit	d0b620
Packit	d0b620	Similarly the following two lines both start parsing `my_grammar` with `my_actions` (again with the caveat that something might change somewhere in the grammar).
Packit	d0b620
Packit	d0b620	```c++
Packit	d0b620	`tao::pegtl::parse< my_grammar, my_actions >( ... );`
Packit	d0b620	`tao::pegtl::parse< tao::pegtl::action< my_actions, my_grammar > >( ... );`
Packit	d0b620	```
Packit	d0b620
Packit	d0b620	In other words, `enable<>` and `disable<>` behave just like `seq<>` but enable or disable the calling of actions. `action<>` changes the active action class template, which must be supplied as first template parameter to `action<>`.
Packit	d0b620
Packit	d0b620	Note that `action<>` does not implicitly enable actions when they were previously explicitly disabled.
Packit	d0b620
Packit	d0b620	User-defined parsing rules can use `action<>`, `enable<>` and `disable<>` just like any other combinator rules, for example to disable actions in LISP-style comments:
Packit	d0b620
Packit	d0b620	```c++
Packit	d0b620	`struct comment`
Packit	d0b620	`: tao::pegtl::seq< tao::pegtl::one< '#' >, tao::pegtl::disable< cons_list > > {};`
Packit	d0b620	```
Packit	d0b620
Packit	d0b620	`This also allows using the same rules multiple times with different actions within the grammar.`
Packit	d0b620
Packit	d0b620	`## Changing States`
Packit	d0b620
Packit	d0b620	`Implementing a parser with the PEGTL consists of two main parts.`
Packit	d0b620
Packit	d0b620	`1. The actual grammar that drives the parser.`
Packit	d0b620	`2. The states and actions that "do something".`
Packit	d0b620
Packit	d0b620	`For the second part, there are three distinct styles of how to manage the states and actions in non-trivial parsers.`
Packit	d0b620
Packit	d0b620	`The main issue addressed by the switching styles is the growing complexity encountered when a single state argument to a parsing run must perform multiple different tasks, including the management of nested data structures.`
Packit	d0b620
Packit	d0b620	`The way that this issue is addressed is by providing another tool for performing divide-and-conquer: A large state class with multiple tasks can be divided into`
Packit	d0b620
Packit	d0b620	`- multiple smaller state classes that each take care of a single issue,`
Packit	d0b620	`- one or more [control classes](Control-and-Debug.md) that switch between the states,`
Packit	d0b620	`- using the C++ stack for nested structures (rather than manually managing a stack).`
Packit	d0b620
Packit	d0b620	`The different styles can also be freely mixed within the same parser.`
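The idea of letting the C++ stack hold the state for each nesting level can be sketched with a tiny hand-written parser. This is only an illustration of the principle and does not use the PEGTL or its control classes; `toy::value`, `toy::parse_value` and `toy::parse_list` are hypothetical, and the toy grammar accepts single digits and bracketed, comma-separated lists such as `[1,[2,3]]`.

```c++
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace toy
{
   // A value is either a single digit or a list of values.
   struct value
   {
      bool is_list;
      int digit;
      std::vector< value > items;
   };

   bool parse_value( const std::string& s, std::size_t& pos, std::vector< value >& out );

   // Matches '[' ']' or '[' value ( ',' value )* ']'.
   bool parse_list( const std::string& s, std::size_t& pos, std::vector< value >& out )
   {
      if( pos >= s.size() || s[ pos ] != '[' ) {
         return false;
      }
      ++pos;
      std::vector< value > local;  // fresh state for this nesting level, on the C++ stack
      if( pos < s.size() && s[ pos ] != ']' ) {
         for( ;; ) {
            if( !parse_value( s, pos, local ) ) {
               return false;
            }
            if( pos < s.size() && s[ pos ] == ',' ) {
               ++pos;
               continue;
            }
            break;
         }
      }
      if( pos >= s.size() || s[ pos ] != ']' ) {
         return false;
      }
      ++pos;
      // On success the local state is handed over to the enclosing state.
      out.push_back( value{ true, 0, std::move( local ) } );
      return true;
   }

   bool parse_value( const std::string& s, std::size_t& pos, std::vector< value >& out )
   {
      if( pos < s.size() && s[ pos ] >= '0' && s[ pos ] <= '9' ) {
         out.push_back( value{ false, s[ pos ] - '0', {} } );
         ++pos;
         return true;
      }
      return parse_list( s, pos, out );
   }

}  // namespace toy
```

No explicit stack of partially built lists is needed: each recursive call to `toy::parse_list` owns its `local` vector, and the nesting of the data mirrors the nesting of the function calls.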
Packit	d0b620
Packit	d0b620	`### No Switching`
Packit	d0b620
Packit	d0b620	The "no switching style" consists of having one (or more) state-arguments that are passed to a parsing run and that are the arguments to all actions' `apply0()`- and `apply()`-methods.
Packit	d0b620
Packit	d0b620	For an example of how to build a generic JSON data structure with the "no switching style" see `src/example/pegtl/json_build_two.cpp`.
Packit	d0b620
Packit	d0b620	`### Intrusive Switching`
Packit	d0b620
Packit	d0b620	The `state<>` and `action<>` [meta combinators](Rule-Reference.md#meta-rules) can be used to hard-code state and action switches in the grammar.
Packit	d0b620
Packit	d0b620	`In some cases a state object is required for the grammar itself, and in these cases embedding the state-switch into the grammar is recommended.`
Packit	d0b620
Packit	d0b620	`### External Switching`
Packit	d0b620
Packit	d0b620	`"External switching" is when the states and/or actions are switched from outside of the grammar by providing a specialised control class.`
Packit	d0b620
Packit	d0b620	For an example of how to build a generic JSON data structure with the "external switching style" see `src/example/pegtl/json_build_one.cpp`.
Packit	d0b620
Packit	d0b620	The actual switching control classes are defined in `<tao/pegtl/contrib/changes.hpp>` and can be used as a template for custom switching.
Packit	d0b620
Packit	d0b620	`## Legacy Actions`
Packit	d0b620
Packit	d0b620	`See the [section on legacy-style action rules](Rule-Reference.md#action-rules).`
Packit	d0b620
Packit	d0b620	`Copyright (c) 2014-2018 Dr. Colin Hirsch and Daniel Frey`
