Lognormalizer
=============

Lognormalizer is a sample tool that is often used to test and debug
rulebases before they are put into real use. Nevertheless, it can also be
used in production as a simple command-line interface to liblognorm.

This tool reads log lines from its standard input and prints the results
to its standard output. Use shell redirection if you want to read from or
write to files.

An example of the command::

    $ lognormalizer -r messages.sampdb -e json

Command line options
--------------------

::

    -V

Output version information, including information about the installed
version of liblognorm and its optional features. This can therefore also
be used to check the currently installed library version.

::

    -r <FILENAME>

Specifies the name of the file containing the rulebase.

::

    -v

Increases the verbosity level. This option can be given several times.
If given three times, internal data structures are dumped (this makes
sense to developers only).

::

    -p

Print only successfully parsed messages.

::

    -P

Print only messages that were **not** successfully parsed.

::

    -L

Add line number information to events that were not successfully parsed.
This is meant as a troubleshooting aid when working with unparsable
events, as the information can be used to go directly to the line in
question in the source data file. The line number is contained in a
field named ``lognormalizer.line_nbr``.
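
The line number can be combined with standard tools to look up the
original input. The following is a minimal sketch, not part of liblognorm:
``show_source_line`` is a hypothetical shell function that takes one JSON
event produced with ``-L`` plus the input file, and prints the matching
source line. It assumes the field appears as
``"lognormalizer.line_nbr": N`` in the JSON text (a quoted value is also
accepted) and that line numbers count from 1, as ``sed`` does::

    # Hypothetical helper, not shipped with liblognorm.
    # $1: one JSON event emitted by lognormalizer -L
    # $2: the file that was fed to lognormalizer on stdin
    show_source_line() {
        event=$1
        source_file=$2
        # Extract the number stored in "lognormalizer.line_nbr"; the
        # optional double quote also accepts a value emitted as a string.
        nbr=$(printf '%s\n' "$event" |
            sed -n 's/.*"lognormalizer\.line_nbr": *"\{0,1\}\([0-9][0-9]*\).*/\1/p')
        # No line number found: signal failure to the caller.
        [ -n "$nbr" ] || return 1
        # Print exactly that line of the source file.
        sed -n "${nbr}p" "$source_file"
    }

Assuming one JSON event per output line, it can be driven from a pipeline
such as ``lognormalizer -r messages.rb -P -L <messages.log | while read -r ev; do show_source_line "$ev" messages.log; done``.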

::

    -t <TAG>

Print only those messages which have this tag.

::

    -T

Include the ``event.tags`` attribute when output is in JSON format. This
attribute contains the list of tags of the matched rule.

::

    -E <DATA>

Encoder-specific data. For CSV, it is the list of fields to be output,
separated by commas or spaces. It is currently unused for other formats.

::

    -d <FILENAME>

Generate a DOT file describing the parse tree. It can be used to plot the
parse graph with GraphViz.

::

    -H

At the end of the run, print a summary line with the number of messages
processed, parsed, and unparsed to stdout.

::

    -U

At the end of the run, print a summary line with the number of unparsed
messages to stdout. Note that this line is only printed if there was at
least one unparsable message.

::

    -o <OPTION>

Special options. The following ones can be set:

   * **allowRegex** Permits the use of regular expressions inside the v1
     engine. This is deprecated and should not be used for new deployments.

   * **addExecPath** Includes metadata in the event describing how parsing
     of it was attempted. This can be useful in troubleshooting
     normalization problems.

   * **addOriginalMsg** Always add the "original-msg" data item. By
     default, this is only done when a message could not be parsed.

   * **addRule** Add a mockup of the rule that was processed. Note that
     it is *not* an exact copy of the rule, but a rule that correctly
     describes the parsed message. Most importantly, prefixes are
     added and custom data types are expanded (and are no longer visible
     as such). This option is primarily meant for postprocessing, e.g.
     as input to an anonymizer.

   * **addRuleLocation** For rules that successfully parsed a message, add
     the location of the rule inside the rulebase. Both the file name and
     the line number are given. If two rules evaluate to the same end
     node, only a single rule location is given. However, in practice this
     is extremely unlikely, so for practical purposes the information can
     be considered reliable.

::

    -s <FILENAME>

At the end of the run, print internal parse DAG statistics and exit. This
option is meant for developers and researchers who want to get insight
into the quality of the algorithm and/or how efficiently the rulebase can
be processed. It is **NOT** intended for end users. This option is
performance-intensive.

::

    -S <FILENAME>

Even more detailed statistics than -s. Requires a build configured with
``--enable-advanced-statistics``, which causes a considerable performance
loss.

::

    -x <FILENAME>

Print statistics as a DOT file. In order to keep the graph readable,
information is only emitted for called nodes.

::

    -e <json|xml|csv|raw|cee-syslog>

Output format. By default, output is in JSON format. With this option,
you can change it to a different one.

Supported Output Formats
........................

The JSON, XML, and CSV formats should be self-explanatory.

The cee-syslog format emits messages according to the MITRE CEE
specification. Note that the cee-syslog format is primarily supported
for backward compatibility. It does **not** support nested data items
and as such cannot be used when the rulebase makes use of this feature
(which we assume is the common case nowadays). We strongly recommend
not using it for new deployments. Support may be removed in later
releases.

The raw format outputs an exact copy of the input message, without any
normalization visible. The prime use case of "raw" is to extract either
all messages that could be normalized or all messages that could not.
To do so, specify the -p or -P option, respectively. It also works in
combination with the -t option to extract a subset based on tagging. In
any case, the core use is to prepare a subset of the original file for
further processing.
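
The workflow described above can be sketched as a small shell function.
This is a minimal sketch, not part of liblognorm: ``split_by_parse_result``
is a hypothetical helper that uses only the ``-r``, ``-e raw``, ``-p`` and
``-P`` options documented here, and the ``.parsed`` and ``.unparsed`` file
name suffixes are arbitrary choices for illustration::

    # Hypothetical helper, not shipped with liblognorm.
    # $1: rulebase file, $2: log file to split
    split_by_parse_result() {
        rulebase=$1
        input=$2
        # Exact copies of the messages that matched a rule.
        lognormalizer -r "$rulebase" -e raw -p <"$input" >"$input.parsed"
        # Exact copies of the messages that did not match any rule.
        lognormalizer -r "$rulebase" -e raw -P <"$input" >"$input.unparsed"
    }

For example, ``split_by_parse_result messages.rb messages.log`` would leave
``messages.log.parsed`` and ``messages.log.unparsed`` next to the input file.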

Examples
--------

These examples were created using the sample rulebase from the source
package.

CEE syslog output::

	$ lognormalizer -r rulebases/sample.rulebase -e cee-syslog
	Weight: 42kg
	[cee@115 event.tags="tag2" unit="kg" N="42" fat="free"]
	Snow White and the Seven Dwarfs
	[cee@115 event.tags="tale" company="the Seven Dwarfs"]
	2012-10-11 src=127.0.0.1 dst=88.111.222.19
	[cee@115 dst="88.111.222.19" src="127.0.0.1" date="2012-10-11"]

JSON output, flat tags enabled::

	$ lognormalizer -r rulebases/sample.rulebase -e json -T
	%%
	{ "event.tags": [ "tag3", "percent" ], "percent": "100", "part": "wha", "whole": "whale" }
	Weight: 42kg
	{ "unit": "kg", "N": "42", "event.tags": [ "tag2" ], "fat": "free" }

CSV output with fixed field list::

	$ lognormalizer -r rulebases/sample.rulebase -e csv -E'N unit'
	Weight: 42kg
	"42","kg"
	Weight: 115lbs
	"115","lbs"
	Anything not matching the rule
	,

Creating a graph of the rulebase
--------------------------------

To get a better overview of a rulebase, you can create a graph that shows
the chain of normalization (the parse tree).

First, you have to install an additional package called graphviz. Graphviz
is a tool that creates such a graph with the help of a control file
(created from the rulebase). `Here <http://www.graphviz.org/>`_ you will
find more information about graphviz.

To install it, you can use your package manager. For example, on Red Hat
systems, use the yum command::

    $ sudo yum install graphviz

The next step is to create the control file for graphviz. To do so, we
use the lognormalizer command with the options -d "preferred filename for
the control file" and -r "rulebase"::

    $ lognormalizer -d control.dot -r messages.rb

Please note that there is no need for an input or output file.
If you have a look at the control file now, you will see that the content
is a little bit confusing, but it includes all the information, like the
nodes, fields, and parsers, that graphviz needs to create the graph. Of
course you can edit that file, but please note that it is a lot of work.

Now we can create the graph by typing::

    $ dot control.dot -Tpng >graph.png

That is: ``dot``, followed by the name of the control file, the ``-T``
option with the desired file format, and a redirection to the output file.
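
The two steps above can be combined into a single command. The following
is a minimal sketch, assuming both ``lognormalizer`` and GraphViz's
``dot`` are on the ``PATH``; ``render_rulebase_graph`` is a hypothetical
shell function, not part of liblognorm, and it writes the control file to
a temporary file that is removed afterwards::

    # Hypothetical helper, not shipped with liblognorm.
    # $1: rulebase file, $2: PNG file to create
    render_rulebase_graph() {
        rulebase=$1
        png=$2
        dotfile=$(mktemp) || return 1
        # Step 1: write the DOT control file. No log input is needed,
        # so stdin is taken from /dev/null.
        if lognormalizer -d "$dotfile" -r "$rulebase" </dev/null; then
            # Step 2: let GraphViz render the control file as PNG.
            dot "$dotfile" -Tpng >"$png"
            status=$?
        else
            # Keep the exit status of the failed lognormalizer call.
            status=$?
        fi
        rm -f "$dotfile"
        return "$status"
    }

For example, ``render_rulebase_graph messages.rb graph.png`` should yield
the same ``graph.png`` as running the two commands by hand.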

That is just one example of using graphviz; of course you can do many
other great things with it. But I think this "simple" graph can be very
helpful when working with the normalizer.

Below you see a sample of such a graph, but please note that it is not a
particularly pretty one. Such a graph can grow very fast as you extend
your rulebase.

.. figure:: graph.png
   :width: 90 %
   :alt: graph sample