.TH PCRE2CALLOUT 3 "26 April 2018" "PCRE2 10.32"
.SH NAME
PCRE2 - Perl-compatible regular expressions (revised API)
.SH SYNOPSIS
.rs
.sp
.B #include <pcre2.h>
.PP
.SM
.nf
.B int (*pcre2_callout)(pcre2_callout_block *, void *);
.sp
.B int pcre2_callout_enumerate(const pcre2_code *\fIcode\fP,
.B "  int (*\fIcallback\fP)(pcre2_callout_enumerate_block *, void *),"
.B "  void *\fIuser_data\fP);"
.fi
.
.SH DESCRIPTION
.rs
.sp
PCRE2 provides a feature called "callout", which is a means of temporarily
passing control to the caller of PCRE2 in the middle of pattern matching. The
caller of PCRE2 provides an external function by putting its entry point in
a match context (see \fBpcre2_set_callout()\fP in the
.\" HREF
\fBpcre2api\fP
.\"
documentation).
.P
Within a regular expression, (?C<arg>) indicates a point at which the external
function is to be called. Different callout points can be identified by putting
a number less than 256 after the letter C. The default value is zero.
Alternatively, the argument may be a delimited string. The starting delimiter
must be one of ` ' " ^ % # $ { and the ending delimiter is the same as the
start, except for {, where the ending delimiter is }. If the ending delimiter
is needed within the string, it must be doubled. For example, this pattern has
two callout points:
.sp
  (?C1)abc(?C"some ""arbitrary"" text")def
.sp
If the PCRE2_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE2
automatically inserts callouts, all with number 255, before each item in the
pattern except for immediately before or after an explicit callout. For
example, if PCRE2_AUTO_CALLOUT is used with the pattern
.sp
  A(?C3)B
.sp
it is processed as if it were
.sp
  (?C255)A(?C3)B(?C255)
.sp
Here is a more complicated example:
.sp
  A(\ed{2}|--)
.sp
With PCRE2_AUTO_CALLOUT, this pattern is processed as if it were
.sp
  (?C255)A(?C255)((?C255)\ed{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
.sp
Notice that there is a callout before and after each parenthesis and
alternation bar. If the pattern contains a conditional group whose condition is
an assertion, an automatic callout is inserted immediately before the
condition. Such a callout may also be inserted explicitly, for example:
.sp
  (?(?C9)(?=a)ab|de)  (?(?C%text%)(?!=d)ab|de)
.sp
This applies only to assertion conditions (because they are themselves
independent groups).
.P
Callouts can be useful for tracking the progress of pattern matching. The
.\" HREF
\fBpcre2test\fP
.\"
program has a pattern qualifier (/auto_callout) that sets automatic callouts.
When any callouts are present, the output from \fBpcre2test\fP indicates how
the pattern is being matched. This is useful information when you are trying to
optimize the performance of a particular pattern.
.
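.P
The delimiter rule for string callout arguments given earlier in this section
can be illustrated with a small stand-alone routine. This is only a sketch of
the rule as documented, not PCRE2's own parser; the names
\fBending_delimiter()\fP and \fBdecode_callout_string()\fP are hypothetical,
and the apostrophe is written as 0x27 on the assumption of an ASCII-compatible
character set.
.sp
.nf
```c
#include <assert.h>
#include <stddef.h>

/* Return the ending delimiter that pairs with a starting delimiter, or 0
   if the character is not one of the permitted starting delimiters.
   0x27 is the apostrophe (ASCII assumed). */
static char ending_delimiter(char start)
{
    switch (start) {
    case '`': case 0x27: case '"': case '^': case '%': case '#': case '$':
        return start;
    case '{':
        return '}';
    default:
        return 0;
    }
}

/* arg points at the starting delimiter, that is, at the text that follows
   "(?C" in the pattern. The string is copied to out with each doubled
   ending delimiter collapsed to one, and a terminating zero is added.
   Returns the string length, or -1 if the delimiter is not permitted, the
   string is unterminated, or out is too small. */
static long decode_callout_string(const char *arg, char *out, size_t outsize)
{
    char close;
    size_t n = 0;
    const char *p;

    assert(arg != NULL && out != NULL);
    close = ending_delimiter(arg[0]);
    if (close == 0) return -1;

    for (p = arg + 1; *p != 0; p++) {
        if (*p == close) {
            if (p[1] != close) {        /* single delimiter: end of string */
                if (n >= outsize) return -1;
                out[n] = 0;
                return (long)n;
            }
            p++;                        /* doubled delimiter: keep one copy */
        }
        if (n + 1 >= outsize) return -1;
        out[n++] = *p;
    }
    return -1;                          /* no ending delimiter was found */
}
```
.fi
.P
Applied to the text after (?C in the second callout of the example pattern
above, the routine yields the 21-character string some "arbitrary" text.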
.
.SH "MISSING CALLOUTS"
.rs
.sp
You should be aware that, because of optimizations in the way PCRE2 compiles
and matches patterns, callouts sometimes do not happen exactly as you might
expect.
.
.
.SS "Auto-possessification"
.rs
.sp
At compile time, PCRE2 "auto-possessifies" repeated items when it knows that
what follows cannot be part of the repeat. For example, a+[bc] is compiled as
if it were a++[bc]. The \fBpcre2test\fP output when this pattern is compiled
with PCRE2_ANCHORED and PCRE2_AUTO_CALLOUT and then applied to the string
"aaaa" is:
.sp
  --->aaaa
   +0 ^        a+
   +2 ^   ^    [bc]
  No match
.sp
This indicates that when matching [bc] fails, there is no backtracking into a+
(because it is being treated as a++) and therefore the callouts that would be
taken for the backtracks do not occur. You can disable the auto-possessify
feature by passing PCRE2_NO_AUTO_POSSESS to \fBpcre2_compile()\fP, or starting
the pattern with (*NO_AUTO_POSSESS). In this case, the output changes to this:
.sp
  --->aaaa
   +0 ^        a+
   +2 ^   ^    [bc]
   +2 ^  ^     [bc]
   +2 ^ ^      [bc]
   +2 ^^       [bc]
  No match
.sp
This time, when matching [bc] fails, the matcher backtracks into a+ and tries
again, repeatedly, until a+ itself fails.
.
.
.SS "Automatic .* anchoring"
.rs
.sp
By default, an optimization is applied when .* is the first significant item in
a pattern. If PCRE2_DOTALL is set, so that the dot can match any character, the
pattern is automatically anchored. If PCRE2_DOTALL is not set, a match can
start only after an internal newline or at the beginning of the subject, and
\fBpcre2_compile()\fP remembers this. If a pattern has more than one top-level
branch, automatic anchoring occurs if all branches are anchorable.
.P
This optimization is disabled, however, if .* is in an atomic group or if there
is a backreference to the capturing group in which it appears. It is also
disabled if the pattern contains (*PRUNE) or (*SKIP). However, the presence of
callouts does not affect it.
.P
For example, if the pattern .*\ed is compiled with PCRE2_AUTO_CALLOUT and
applied to the string "aa", the \fBpcre2test\fP output is:
.sp
  --->aa
   +0 ^      .*
   +2 ^ ^    \ed
   +2 ^^     \ed
   +2 ^      \ed
  No match
.sp
This shows that all match attempts start at the beginning of the subject. In
other words, the pattern is anchored. You can disable this optimization by
passing PCRE2_NO_DOTSTAR_ANCHOR to \fBpcre2_compile()\fP, or starting the
pattern with (*NO_DOTSTAR_ANCHOR). In this case, the output changes to:
.sp
  --->aa
   +0 ^      .*
   +2 ^ ^    \ed
   +2 ^^     \ed
   +2 ^      \ed
   +0  ^     .*
   +2  ^^    \ed
   +2  ^     \ed
  No match
.sp
This shows more match attempts, starting at the second subject character.
Another optimization, described in the next section, means that there is no
subsequent attempt to match with an empty subject.
.
.
.SS "Other optimizations"
.rs
.sp
Other optimizations that provide fast "no match" results also affect callouts.
For example, if the pattern is
.sp
  ab(?C4)cd
.sp
PCRE2 knows that any matching string must contain the letter "d". If the
subject string is "abyz", the lack of "d" means that matching doesn't ever
start, and the callout is never reached. However, with "abyd", though the
result is still no match, the callout is obeyed.
.P
For most patterns PCRE2 also knows the minimum length of a matching string, and
will immediately give a "no match" return without actually running a match if
the subject is not long enough, or, for unanchored patterns, if it has been
scanned far enough.
.P
You can disable these optimizations by passing the PCRE2_NO_START_OPTIMIZE
option to \fBpcre2_compile()\fP, or by starting the pattern with
(*NO_START_OPT). This slows down the matching process, but does ensure that
callouts such as the example above are obeyed.
.
.
.\" HTML 
.SH "THE CALLOUT INTERFACE"
.rs
.sp
During matching, when PCRE2 reaches a callout point, if an external function is
provided in the match context, it is called. This applies to normal, DFA, and
JIT matching. The first argument to the callout function is a pointer
to a \fBpcre2_callout\fP block. The second argument is the void * callout data
that was supplied when the callout was set up by calling
\fBpcre2_set_callout()\fP (see the
.\" HREF
\fBpcre2api\fP
.\"
documentation). The callout block structure contains the following fields, not
necessarily in this order:
.sp
  uint32_t      \fIversion\fP;
  uint32_t      \fIcallout_number\fP;
  uint32_t      \fIcapture_top\fP;
  uint32_t      \fIcapture_last\fP;
  uint32_t      \fIcallout_flags\fP;
  PCRE2_SIZE   *\fIoffset_vector\fP;
  PCRE2_SPTR    \fImark\fP;
  PCRE2_SPTR    \fIsubject\fP;
  PCRE2_SIZE    \fIsubject_length\fP;
  PCRE2_SIZE    \fIstart_match\fP;
  PCRE2_SIZE    \fIcurrent_position\fP;
  PCRE2_SIZE    \fIpattern_position\fP;
  PCRE2_SIZE    \fInext_item_length\fP;
  PCRE2_SIZE    \fIcallout_string_offset\fP;
  PCRE2_SIZE    \fIcallout_string_length\fP;
  PCRE2_SPTR    \fIcallout_string\fP;
.sp
The \fIversion\fP field contains the version number of the block format. The
current version is 2; the three callout string fields were added for version 1,
and the \fIcallout_flags\fP field for version 2. If you are writing an
application that might use an earlier release of PCRE2, you should check the
version number before accessing any of these fields. The version number will
increase in future if more fields are added, but the intention is never to
remove any of the existing fields.
.
.
.SS "Fields for numerical callouts"
.rs
.sp
For a numerical callout, \fIcallout_string\fP is NULL, and \fIcallout_number\fP
contains the number of the callout, in the range 0-255. This is the number
that follows (?C for callouts that are part of the pattern; it is 255 for
automatically generated callouts.
.
.
.SS "Fields for string callouts"
.rs
.sp
For callouts with string arguments, \fIcallout_number\fP is always zero, and
\fIcallout_string\fP points to the string that is contained within the compiled
pattern. Its length is given by \fIcallout_string_length\fP. Duplicated ending
delimiters that were present in the original pattern string have been turned
into single characters, but there is no other processing of the callout string
argument. An additional code unit containing binary zero is present after the
string, but is not included in the length. The delimiter that was used to start
the string is also stored within the pattern, immediately before the string
itself. You can access this delimiter as \fIcallout_string\fP[-1] if you need
it.
.P
The \fIcallout_string_offset\fP field is the code unit offset to the start of
the callout argument string within the original pattern string. This is
provided for the benefit of applications such as script languages that might
need to report errors in the callout string within the pattern.
.
.
.SS "Fields for all callouts"
.rs
.sp
The remaining fields in the callout block are the same for both kinds of
callout.
.P
The \fIoffset_vector\fP field is a pointer to a vector of capturing offsets
(the "ovector"). You may read the elements in this vector, but you must not
change any of them.
.P
For calls to \fBpcre2_match()\fP, the \fIoffset_vector\fP field is not (since
release 10.30) a pointer to the actual ovector that was passed to the matching
function in the match data block. Instead it points to an internal ovector of a
size large enough to hold all possible captured substrings in the pattern. Note
that whenever a recursion or subroutine call within a pattern completes, the
capturing state is reset to what it was before.
.P
The \fIcapture_last\fP field contains the number of the most recently captured
substring, and the \fIcapture_top\fP field contains one more than the number of
the highest numbered captured substring so far. If no substrings have yet been
captured, the value of \fIcapture_last\fP is 0 and the value of
\fIcapture_top\fP is 1. The values of these fields do not always differ by one;
for example, when the callout in the pattern ((a)(b))(?C2) is taken,
\fIcapture_last\fP is 1 but \fIcapture_top\fP is 4.
.P
The contents of ovector[2] to ovector[<capture_top>*2-1] can be inspected in
order to extract substrings that have been matched so far, in the same way as
extracting substrings after a match has completed. The values in ovector[0] and
ovector[1] are always PCRE2_UNSET because the match is by definition not
complete. Substrings that have not been captured but whose numbers are less
than \fIcapture_top\fP also have both of their ovector slots set to
PCRE2_UNSET.
.P
For DFA matching, the \fIoffset_vector\fP field points to the ovector that was
passed to the matching function in the match data block for callouts at the top
level, but to an internal ovector during the processing of pattern recursions,
lookarounds, and atomic groups. However, these ovectors hold no useful
information because \fBpcre2_dfa_match()\fP does not support substring
capturing. The value of \fIcapture_top\fP is always 1 and the value of
\fIcapture_last\fP is always 0 for DFA matching.
.P
The \fIsubject\fP and \fIsubject_length\fP fields contain copies of the values
that were passed to the matching function.
.P
The \fIstart_match\fP field normally contains the offset within the subject at
which the current match attempt started. However, if the escape sequence \eK
has been encountered, this value is changed to reflect the modified starting
point. If the pattern is not anchored, the callout function may be called
several times from the same point in the pattern for different starting points
in the subject.
.P
The \fIcurrent_position\fP field contains the offset within the subject of the
current match pointer.
.P
The \fIpattern_position\fP field contains the offset in the pattern string to
the next item to be matched.
.P
The \fInext_item_length\fP field contains the length of the next item to be
processed in the pattern string. When the callout is at the end of the pattern,
the length is zero. When the callout precedes an opening parenthesis, the
length includes meta characters that follow the parenthesis. For example, in a
callout before an assertion such as (?=ab) the length is 3. For an
alternation bar or a closing parenthesis, the length is one, unless a closing
parenthesis is followed by a quantifier, in which case its length is included.
(This changed in release 10.23. In earlier releases, before an opening
parenthesis the length was that of the entire subpattern, and before an
alternation bar or a closing parenthesis the length was zero.)
.P
The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to
help in distinguishing between different automatic callouts, which all have the
same callout number. However, they are set for all callouts, and are used by
\fBpcre2test\fP to show the next item to be matched when displaying callout
information.
.P
In callouts from \fBpcre2_match()\fP the \fImark\fP field contains a pointer to
the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or
(*THEN) item in the match, or NULL if no such items have been passed. Instances
of (*PRUNE) or (*THEN) without a name do not obliterate a previous (*MARK). In
callouts from the DFA matching function this field always contains NULL.
.P
The \fIcallout_flags\fP field is always zero in callouts from
\fBpcre2_dfa_match()\fP or when JIT is being used. When \fBpcre2_match()\fP
without JIT is used, the following bits may be set:
.sp
  PCRE2_CALLOUT_STARTMATCH
.sp
This is set for the first callout after the start of matching for each new
starting position in the subject.
.sp
  PCRE2_CALLOUT_BACKTRACK
.sp
This is set if there has been a matching backtrack since the previous callout,
or since the start of matching if this is the first callout from a
\fBpcre2_match()\fP run.
.P
Both bits are set when a backtrack has caused a "bumpalong" to a new starting
position in the subject. Output from \fBpcre2test\fP does not indicate the
presence of these bits unless the \fBcallout_extra\fP modifier is set.
.P
The information in the \fIcallout_flags\fP field is provided so that
applications can track and tell their users how matching with backtracking is
done. This can be useful when trying to optimize patterns, or just to
understand how PCRE2 works. There is no support in \fBpcre2_dfa_match()\fP
because there is no backtracking in DFA matching, and there is no support in
JIT because JIT is all about maximizing matching performance. In both these
cases the \fIcallout_flags\fP field is always zero.
.
.
.SH "RETURN VALUES FROM CALLOUTS"
.rs
.sp
The external callout function returns an integer to PCRE2. If the value is
zero, matching proceeds as normal. If the value is greater than zero, matching
fails at the current point, but the testing of other matching possibilities
goes ahead, just as if a lookahead assertion had failed. If the value is less
than zero, the match is abandoned, and the matching function returns the
negative value.
.P
Negative values should normally be chosen from the set of PCRE2_ERROR_xxx
values. In particular, PCRE2_ERROR_NOMATCH forces a standard "no match"
failure. The error number PCRE2_ERROR_CALLOUT is reserved for use by callout
functions; it will never be used by PCRE2 itself.
.
.
.SH "CALLOUT ENUMERATION"
.rs
.sp
.nf
.B int pcre2_callout_enumerate(const pcre2_code *\fIcode\fP,
.B "  int (*\fIcallback\fP)(pcre2_callout_enumerate_block *, void *),"
.B "  void *\fIuser_data\fP);"
.fi
.sp
A script language that supports the use of string arguments in callouts might
like to scan all the callouts in a pattern before running the match. This can
be done by calling \fBpcre2_callout_enumerate()\fP. The first argument is a
pointer to a compiled pattern, the second points to a callback function, and
the third is arbitrary user data. The callback function is called for every
callout in the pattern in the order in which they appear. Its first argument is
a pointer to a callout enumeration block, and its second argument is the
\fIuser_data\fP value that was passed to \fBpcre2_callout_enumerate()\fP. The
data block contains the following fields:
.sp
  \fIversion\fP                Block version number
  \fIpattern_position\fP       Offset to next item in pattern
  \fInext_item_length\fP       Length of next item in pattern
  \fIcallout_number\fP         Number for numbered callouts
  \fIcallout_string_offset\fP  Offset to string within pattern
  \fIcallout_string_length\fP  Length of callout string
  \fIcallout_string\fP         Points to callout string or is NULL
.sp
The version number is currently 0. It will increase if new fields are ever
added to the block. The remaining fields are the same as their namesakes in the
\fBpcre2_callout\fP block that is used for callouts during matching, as
described
.\" HTML 
.\" 
above.
.\"
.P
Note that the value of \fIpattern_position\fP is unique for each callout.
However, if a callout occurs inside a group that is quantified with a non-zero
minimum or a fixed maximum, the group is replicated inside the compiled
pattern. For example, a pattern such as /(a){2}/ is compiled as if it were
/(a)(a)/. This means that the callout will be enumerated more than once, but
with the same value for \fIpattern_position\fP in each case.
.P
The callback function should normally return zero. If it returns a non-zero
value, scanning the pattern stops, and that value is returned from
\fBpcre2_callout_enumerate()\fP.
.
.
.SH AUTHOR
.rs
.sp
.nf
Philip Hazel
University Computing Service
Cambridge, England.
.fi
.
.
.SH REVISION
.rs
.sp
.nf
Last updated: 26 April 2018
Copyright (c) 1997-2018 University of Cambridge.
.fi