Liblognorm internals
====================

Parse-tree
----------

A parse-tree is generated each time the normalization process is set up.

You could also call it an optimized rulebase. Each message is run through
this tree, which consists of parsers and fields, and is compared against it.
A message either fits into a branch or it does not. If it fits, it can be
normalized. If it does not fit any branch in the tree, a sample that fits
it has to be created.
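
To illustrate the fit / no-fit decision, here is a minimal C sketch. It is
not liblognorm code: the ``struct node`` type and the ``tree_add`` and
``tree_fits`` functions are hypothetical, and the tree holds only literal
text (one child per byte, no parsers), so a message fits only if it spells
out a complete sample.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdio.h>
   #include <stdlib.h>

   /* Hypothetical, heavily simplified node: one child per byte value and no
    * parsers, so only literal samples can be stored. */
   struct node {
       struct node *child[256];
       bool terminal; /* true if a sample ends exactly at this node */
   };

   static struct node *node_new(void)
   {
       struct node *n = calloc(1, sizeof *n);
       if (n == NULL) {
           perror("calloc");
           exit(EXIT_FAILURE);
       }
       return n;
   }

   /* Add a literal sample, reusing any prefix that is already in the tree. */
   static void tree_add(struct node *root, const char *sample)
   {
       struct node *n = root;
       for (const unsigned char *p = (const unsigned char *)sample; *p != '\0'; p++) {
           if (n->child[*p] == NULL)
               n->child[*p] = node_new();
           n = n->child[*p];
       }
       n->terminal = true;
   }

   /* A message fits if it can be walked to its end and stops on a node
    * where some sample ends. */
   static bool tree_fits(const struct node *root, const char *msg)
   {
       const struct node *n = root;
       for (const unsigned char *p = (const unsigned char *)msg; *p != '\0'; p++) {
           n = n->child[*p];
           if (n == NULL)
               return false; /* fell off the tree: no branch fits */
       }
       return n->terminal;
   }

   static void tree_free(struct node *n)
   {
       if (n == NULL)
           return;
       for (size_t i = 0; i < 256; i++)
           tree_free(n->child[i]);
       free(n);
   }

   int main(void)
   {
       struct node *root = node_new();
       tree_add(root, "session opened");
       tree_add(root, "session closed");

       const char *msgs[] = { "session opened", "session expired" };
       for (size_t i = 0; i < sizeof msgs / sizeof msgs[0]; i++)
           printf("%s -> %s\n", msgs[i],
                  tree_fits(root, msgs[i]) ? "normalizable" : "needs a new sample");

       assert(tree_fits(root, "session closed"));
       assert(!tree_fits(root, "session")); /* a prefix alone is not a sample */
       tree_free(root);
       return 0;
   }

A message that leaves the tree early, or that ends on a node where no
sample ends, is reported as needing a new sample.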
 
The tree is built from branches. Each branch consists of three things:
nodes, paths, and parsers.

A node is typically a literal part of a message that is followed either by
a parser or by several different literals, in which case one of the paths
must be selected. A parser is always followed by a node. Parsers act like
variables and are thus the core structure of a sample: they fill the
property fields that are ultimately needed to normalize the message.
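
The following sketch adds parsers to the picture. All types and names
(``struct edge``, ``walk``, the toy ``parse_word`` parser) are hypothetical
and chosen for illustration only; they are not liblognorm's data
structures. The hand-built tree stores two assumed samples that share the
prefix ``user <name> logged ``: the node after ``logged `` offers two
literal paths, and the parser edge fills a ``name`` field before handing
over to the node that follows it.

.. code-block:: c

   #include <assert.h>
   #include <stdio.h>
   #include <string.h>

   /* Hypothetical node layout for illustration only; not liblognorm's own structs. */
   enum edge_kind { EDGE_LITERAL, EDGE_PARSER };

   struct node;

   /* A path leaving a node: either a literal to compare or a parser to run.
    * Either way, it leads to another node. */
   struct edge {
       enum edge_kind kind;
       const char *text;               /* literal text, or field name for a parser */
       size_t (*parse)(const char *s); /* bytes consumed; 0 means no match */
       const struct node *next;
   };

   struct node {
       const struct edge *edges;       /* alternative paths out of this node */
       size_t n_edges;
       int terminal;                   /* a sample ends here */
   };

   /* Toy parser (assumed behaviour): a run of non-space characters. */
   static size_t parse_word(const char *s)
   {
       size_t n = 0;
       while (s[n] != '\0' && s[n] != ' ')
           n++;
       return n;
   }

   /* Hand-built tree for two assumed samples:
    *   "user " <name:word> " logged in"
    *   "user " <name:word> " logged out"                            */
   static const struct node end_node = { NULL, 0, 1 };
   static const struct edge last_edges[] = {
       { EDGE_LITERAL, "in",  NULL, &end_node },
       { EDGE_LITERAL, "out", NULL, &end_node },
   };
   static const struct node last = { last_edges, 2, 0 };
   static const struct edge after_name_edges[] = {
       { EDGE_LITERAL, " logged ", NULL, &last },
   };
   static const struct node after_name = { after_name_edges, 1, 0 };
   static const struct edge after_user_edges[] = {
       { EDGE_PARSER, "name", parse_word, &after_name }, /* parser -> node */
   };
   static const struct node after_user = { after_user_edges, 1, 0 };
   static const struct edge root_edges[] = {
       { EDGE_LITERAL, "user ", NULL, &after_user },
   };
   static const struct node root = { root_edges, 1, 0 };

   #define MAX_FIELDS 8
   struct field { const char *name; const char *value; int len; };
   struct result { struct field f[MAX_FIELDS]; size_t n; };

   /* Depth-first walk; a parser edge records the field it filled, and the
    * record is dropped again if the rest of the branch does not fit. */
   static int walk(const struct node *n, const char *msg, struct result *r)
   {
       if (*msg == '\0')
           return n->terminal;
       for (size_t i = 0; i < n->n_edges; i++) {
           const struct edge *e = &n->edges[i];
           size_t saved = r->n;
           size_t len;
           if (e->kind == EDGE_LITERAL) {
               len = strlen(e->text);
               if (strncmp(msg, e->text, len) != 0)
                   continue;
           } else {
               len = e->parse(msg);
               if (len == 0 || r->n == MAX_FIELDS)
                   continue;
               r->f[r->n++] = (struct field){ e->text, msg, (int)len };
           }
           if (walk(e->next, msg + len, r))
               return 1;
           r->n = saved; /* backtrack */
       }
       return 0;
   }

   int main(void)
   {
       const char *msgs[] = { "user alice logged in", "user carol logged out" };
       for (size_t i = 0; i < sizeof msgs / sizeof msgs[0]; i++) {
           struct result r = { .n = 0 };
           if (!walk(&root, msgs[i], &r)) {
               printf("%s: no sample fits\n", msgs[i]);
               continue;
           }
           printf("%s: normalized\n", msgs[i]);
           for (size_t k = 0; k < r.n; k++)
               printf("  %s = %.*s\n", r.f[k].name, r.f[k].len, r.f[k].value);
       }

       /* A message that runs off the tree leaves no partial fields behind. */
       struct result none = { .n = 0 };
       assert(!walk(&root, "user bob logged off", &none) && none.n == 0);
       return 0;
   }

Backtracking is used here only because it keeps the sketch short; it
shows which field a parser fills, not how liblognorm itself walks the
tree.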

A few notes on the optimization of a parse-tree.

A parse-tree is always optimized, regardless of whether samples of a
similar kind are next to each other in the rulebase. Even if the samples
are in a completely random order, the result should always be the same
parse-tree. Therefore, there is no need to optimize the tree by hand. The
tree reuses prefixes of messages that are already present in it; a new
node is added only where a difference occurs.
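
Order independence follows from the way prefixes are shared. In this
hypothetical byte-wise trie (literals only, no parsers, not liblognorm
code), ``tree_add`` follows the prefix already in the tree and creates
nodes only where the new sample starts to differ, so inserting the same
samples in forward and in reverse order yields structurally identical
trees.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdio.h>
   #include <stdlib.h>

   /* Hypothetical literal-only tree (a byte-wise trie); parsers are left out. */
   struct node {
       struct node *child[256];
       bool terminal;
   };

   static struct node *node_new(void)
   {
       struct node *n = calloc(1, sizeof *n);
       if (n == NULL) {
           perror("calloc");
           exit(EXIT_FAILURE);
       }
       return n;
   }

   /* Follow the prefix already in the tree and create nodes only where the
    * sample starts to differ. Returns how many nodes were created. */
   static size_t tree_add(struct node *root, const char *sample)
   {
       size_t created = 0;
       struct node *n = root;
       for (const unsigned char *p = (const unsigned char *)sample; *p != '\0'; p++) {
           if (n->child[*p] == NULL) {
               n->child[*p] = node_new();
               created++;
           }
           n = n->child[*p];
       }
       n->terminal = true;
       return created;
   }

   /* Trees are equal if they have the same shape and the same end markers. */
   static bool tree_equal(const struct node *a, const struct node *b)
   {
       if (a == NULL || b == NULL)
           return a == b;
       if (a->terminal != b->terminal)
           return false;
       for (size_t i = 0; i < 256; i++)
           if (!tree_equal(a->child[i], b->child[i]))
               return false;
       return true;
   }

   static void tree_free(struct node *n)
   {
       if (n == NULL)
           return;
       for (size_t i = 0; i < 256; i++)
           tree_free(n->child[i]);
       free(n);
   }

   static const char *const samples[] = {
       "session opened for user",
       "session closed for user",
       "connection refused",
   };
   #define N_SAMPLES (sizeof samples / sizeof samples[0])

   int main(void)
   {
       struct node *fwd = node_new();
       struct node *rev = node_new();
       size_t fwd_nodes = 0, rev_nodes = 0;

       for (size_t i = 0; i < N_SAMPLES; i++)
           fwd_nodes += tree_add(fwd, samples[i]);
       for (size_t i = N_SAMPLES; i-- > 0;)
           rev_nodes += tree_add(rev, samples[i]);

       /* Insertion order does not change the resulting tree. */
       assert(tree_equal(fwd, rev));
       assert(fwd_nodes == rev_nodes);
       printf("nodes created: %zu (forward), %zu (reverse)\n", fwd_nodes, rev_nodes);

       tree_free(fwd);
       tree_free(rev);
       return 0;
   }

With these three sample strings, the 64 input characters need only 56
nodes in either order, because the shared prefix ``session `` is stored
once.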

One case where rule order can be significant is when a message can match
two or more different rules. This can happen when the rules differ in
their parsers. If in doubt, use the :doc:`lognormalizer <lognormalizer>`
tool to debug.
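
The sketch below leaves the tree out and isolates only the ambiguity. Two
hypothetical rules share the literal ``port `` and differ only in their
parser (the toy ``parse_number`` and ``parse_word`` functions are made up
for this example), so ``port 80`` matches both. The sketch assumes a
first-match-wins policy; this is an illustration, not a statement of how
liblognorm resolves such conflicts. Under that assumption the chosen rule
depends on the order of the rules.

.. code-block:: c

   #include <assert.h>
   #include <ctype.h>
   #include <stdio.h>
   #include <string.h>

   /* Toy parsers (hypothetical): return bytes consumed, 0 means no match. */
   static size_t parse_number(const char *s)
   {
       size_t n = 0;
       while (isdigit((unsigned char)s[n]))
           n++;
       return n;
   }

   static size_t parse_word(const char *s)
   {
       size_t n = 0;
       while (s[n] != '\0' && s[n] != ' ')
           n++;
       return n;
   }

   /* A rule here is a literal prefix followed by one parser that must
    * consume the rest of the message. */
   struct rule {
       const char *name;
       const char *prefix;
       size_t (*parse)(const char *s);
   };

   /* Two rules that differ only in their parser, loaded in both orders. */
   static const struct rule order_a[] = {
       { "port-as-number", "port ", parse_number },
       { "port-as-word",   "port ", parse_word },
   };
   static const struct rule order_b[] = {
       { "port-as-word",   "port ", parse_word },
       { "port-as-number", "port ", parse_number },
   };

   /* Assumed policy for this sketch: the first rule that matches wins. */
   static const struct rule *first_match(const struct rule *rules, size_t n,
                                         const char *msg)
   {
       for (size_t i = 0; i < n; i++) {
           size_t plen = strlen(rules[i].prefix);
           if (strncmp(msg, rules[i].prefix, plen) != 0)
               continue;
           const char *rest = msg + plen;
           size_t len = rules[i].parse(rest);
           if (len > 0 && rest[len] == '\0')
               return &rules[i];
       }
       return NULL;
   }

   int main(void)
   {
       const char *msg = "port 80"; /* fits both rules */
       const struct rule *a = first_match(order_a, 2, msg);
       const struct rule *b = first_match(order_b, 2, msg);
       assert(a != NULL && b != NULL);
       printf("order A: %s\norder B: %s\n", a->name, b->name);

       /* "port http" fits only the word rule, so order does not matter here. */
       const struct rule *c = first_match(order_a, 2, "port http");
       const struct rule *d = first_match(order_b, 2, "port http");
       assert(c != NULL && d != NULL && strcmp(c->name, d->name) == 0);
       return 0;
   }

In this sketch ``port 80`` is reported as ``port-as-number`` with one
order and as ``port-as-word`` with the other, while ``port http`` gives
the same answer either way.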