<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
                   "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<refentry id='bogofilter.1'>
<refmeta>
<refentrytitle>bogofilter</refentrytitle>
<manvolnum>1</manvolnum>
<refmiscinfo class="manual">Bogofilter Reference Manual</refmiscinfo>
<refmiscinfo class="source">Bogofilter</refmiscinfo>
</refmeta>
<refnamediv id='name'>
<refname>bogofilter</refname>
<refpurpose>fast Bayesian spam filter</refpurpose>
</refnamediv>
<refsynopsisdiv id='synopsis'>

<cmdsynopsis>
  <command>bogofilter</command>
  <group>
    <arg choice='plain'>help options</arg>
    <arg choice='plain'>classification options</arg>
    <arg choice='plain'>registration options</arg>
    <arg choice='plain'>parameter options</arg>
    <arg choice='plain'>info options</arg>
  </group>
  <arg choice='opt'>general options</arg>
  <arg choice='opt'>config file options</arg>
</cmdsynopsis>

  <para>where</para>
  <para><option>help options</option> are:</para>
<cmdsynopsis>
    <arg choice='opt'>-h</arg>
    <arg choice='opt'>--help</arg>
    <arg choice='opt'>-V</arg>
    <arg choice='opt'>-Q</arg>
</cmdsynopsis>

  <para><option>classification options</option> are:</para>

<cmdsynopsis>
    <arg choice='opt'>-p</arg>
    <arg choice='opt'>-e</arg>
    <arg choice='opt'>-t</arg>
    <arg choice='opt'>-T</arg>
    <arg choice='opt'>-u</arg>
    <arg choice='opt'>-H</arg>
    <arg choice='opt'>-M</arg>
    <arg choice='opt'>-b</arg>
    <arg choice='opt'>-B <replaceable>object ...</replaceable></arg>
    <arg choice='opt'>-R</arg>
    <arg choice='opt'>general options</arg>
    <arg choice='opt'>parameter options</arg>
    <arg choice='opt'>config file options</arg>
    </cmdsynopsis>

  <para><option>registration options</option> are:</para>
    <cmdsynopsis>
    <group>
      <arg choice='plain'>-s</arg>
      <arg choice='plain'>-n</arg>
    </group>
    <group>
      <arg choice='plain'>-S</arg>
      <arg choice='plain'>-N</arg>
    </group>
    <arg choice='opt'>general options</arg>
    </cmdsynopsis>

  <para><option>general options</option> are:</para>
    <cmdsynopsis>
    <arg choice='opt'>-c <replaceable>filename</replaceable></arg>
    <arg choice='opt'>-C</arg>
    <arg choice='opt'>-d <replaceable>dir</replaceable></arg>
    <arg choice='opt'>-k <replaceable>cachesize</replaceable></arg>
    <arg choice='opt'>-l</arg>
    <arg choice='opt'>-L <replaceable>tag</replaceable></arg>
    <arg choice='opt'>-I <replaceable>filename</replaceable></arg>
    <arg choice='opt'>-O <replaceable>filename</replaceable></arg>
    </cmdsynopsis>

  <para><option>parameter options</option> are:</para>
    <cmdsynopsis>
    <arg choice='opt'>-E <replaceable>value<optional>,value</optional></replaceable></arg>
    <arg choice='opt'>-m <replaceable>value<optional>,value</optional><optional>,value</optional></replaceable></arg>
    <arg choice='opt'>-o <replaceable>value<optional>,value</optional></replaceable></arg>
    </cmdsynopsis>

  <para><option>info options</option> are:</para>
    <cmdsynopsis>
    <arg choice='opt'>-v</arg>
    <arg choice='opt'>-y <replaceable>date</replaceable></arg>
    <arg choice='opt'>-D</arg>
    <arg choice='opt'>-x <replaceable>flags</replaceable></arg>
    </cmdsynopsis>

  <para><option>config file options</option> are:</para>
    <cmdsynopsis>
    <arg choice='opt'>--<replaceable>option=value</replaceable></arg>
    </cmdsynopsis>
    <para>Note:  Use <command>bogofilter --help</command> to display
    the complete list of options.</para>

</refsynopsisdiv>

<refsect1 id='description'><title>DESCRIPTION</title>
<para><application>Bogofilter</application> is a Bayesian spam filter.
In its normal mode of operation, it takes an email message or other
text on standard input, does a statistical check against lists of
"good" and "bad" words, and returns a status code indicating whether
or not the message is spam.  <application>Bogofilter</application> is
designed around a fast algorithm, uses Berkeley DB for fast startup
and lookups, is coded directly in C, and is tuned for speed, so it can
be used in production by sites that process a lot of mail.</para>
</refsect1>

<refsect1 id='theory'><title>THEORY OF OPERATION</title>

<para><application>Bogofilter</application> treats its input as a bag
of tokens.  Each token is checked against a wordlist, which maintains
counts of the numbers of times it has occurred in non-spam and spam
mails.  These numbers are used to compute an estimate of the
probability that a message in which the token occurs is spam.  The
per-token estimates are then combined to indicate whether the message
is spam or ham.</para>

<para>While this method sounds crude compared to the more usual
pattern-matching approach, it turns out to be extremely effective.
Paul Graham's paper <ulink url="http://www.paulgraham.com/spam.html">
A Plan For Spam</ulink> is recommended reading.</para>

<para>This program substantially improves on Paul's proposal by doing
smarter lexical analysis.  <application>Bogofilter</application> does
proper MIME decoding and reasonable HTML parsing.  Special kinds of
tokens like hostnames and IP addresses are retained as recognition
features rather than broken up.  Various kinds of MTA cruft such as
dates and message-IDs are ignored so as not to bloat the wordlist.
Tokens found in various header fields are marked appropriately.</para>

<para>Another improvement is that this program offers Gary Robinson's
suggested modifications to the calculations (see the parameters robx
and robs below).  These modifications are described in Robinson's
paper <ulink
url="http://radio-weblogs.com/0101454/stories/2002/09/16/spamDetection.html">Spam
Detection</ulink>.</para>

<para>Since then, Robinson (see his Linux Journal article <ulink
url="http://www.linuxjournal.com/article/6467">A Statistical
Approach to the Spam Problem</ulink>) and others have realized that
the calculation can be further optimized using Fisher's method.
<ulink url="http://www.garyrobinson.net/2004/04/improved%5fchi.html">Another
improvement</ulink> compensates for token redundancy by applying separate
effective size factors (ESF) to spam and nonspam probability calculations.
</para>

<para>In short, this is how it works: The estimates for the spam
probabilities of the individual tokens are combined using the "inverse
chi-square function".  Its value indicates how badly the null
hypothesis that the message is just a random collection of independent
words with probabilities given by our previous estimates fails.  This
function is very sensitive to small probabilities (hammish words), but
not to high probabilities (spammish words); so the value only
indicates strong hammish signs in a message. Now using inverse
probabilities for the tokens, the same computation is done again,
giving an indicator that a message looks strongly spammish.  Finally,
those two indicators are subtracted (and scaled into a 0-1-interval).
This combined indicator (bogosity) is close to 0 if the signs for a
hammish message are stronger than for a spammish message and close to
1 if the situation is the other way round.  If signs for both are
equally strong, the value will be near 0.5.  Since those messages don't
give a clear indication, there is a tristate mode in
<application>bogofilter</application> to mark those messages as
unsure, while the clear messages are marked as spam or ham,
respectively.  In two-state mode, every message is marked as either
spam or ham.</para>
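
<para>The combination described above can be illustrated with a short
Python sketch.  It is not taken from bogofilter's source: the function
names <function>chi2q</function> and <function>bogosity</function> are
hypothetical, ESF weighting is left out (equivalent to an ESF of 1.0),
and every token probability is assumed to lie strictly between 0 and
1.</para>
<programlisting><![CDATA[
```python
import math


def chi2q(x, dof):
    """Probability that a chi-square variable with an even number of
    degrees of freedom (dof) takes a value of at least x."""
    m = x / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def bogosity(token_probs):
    """Combine per-token spam probabilities into one score in [0, 1]."""
    n = len(token_probs)
    # Small probabilities (hammish tokens) make this statistic large.
    ham_stat = -2.0 * sum(math.log(p) for p in token_probs)
    # The same computation with inverse probabilities (spammish tokens).
    spam_stat = -2.0 * sum(math.log(1.0 - p) for p in token_probs)
    # Each indicator is near 1 when the null hypothesis fails badly.
    ham_indicator = 1.0 - chi2q(ham_stat, 2 * n)
    spam_indicator = 1.0 - chi2q(spam_stat, 2 * n)
    # Subtract the two indicators and scale into the 0-1 interval.
    return (1.0 + spam_indicator - ham_indicator) / 2.0
```
]]></programlisting>
<para>Under these assumptions, uniformly low token probabilities give a
value near 0, uniformly high ones a value near 1, and tokens that all
sit at 0.5 give 0.5.</para>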

<para>Various parameters influence these calculations; the most
important are:</para>

<para>robx: the score given to a token which has not been seen before.
robx is the probability that the token is spammish.</para>

<para>robs: a weight on robx which moves the probability of a rarely seen
token towards robx.</para>

<para>min-dev: a minimum distance from .5 for tokens to use in the
calculation.  Only tokens farther away from 0.5 than this value are
used.</para>

<para>spam-cutoff: messages with scores greater than or equal to this
value will be marked as spam.</para>

<para>ham-cutoff: If zero or equal to spam-cutoff, all messages with
values strictly below spam-cutoff are marked as ham, all others as spam
(two-state).  Otherwise, messages with values less than or equal to
ham-cutoff are marked as ham, messages with values strictly between
ham-cutoff and spam-cutoff are marked as unsure, and the rest as spam
(tristate).</para>

<para>sp-esf: the effective size factor (ESF) for spam.</para>

<para>ns-esf: the ESF for nonspam.  These ESF values default to 1.0,
which is the same as not using ESF in the calculation.  Values suitable
to a user's email population can be determined with the aid of the
<application>bogotune</application> program.</para>
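
<para>As a rough illustration of how robx, robs, min-dev and the two
cutoffs interact, the following Python sketch uses Robinson's formula
f(w) = (s*x + n*p(w)) / (s + n) from the papers cited above.  The
function names are hypothetical, the way p(w) is derived from the counts
is an assumption rather than a description of bogofilter's code, and
both message counts are assumed to be positive.</para>
<programlisting><![CDATA[
```python
def token_score(spam_count, good_count, spam_msgs, good_msgs, robx, robs):
    """Robinson's f(w): pull the raw estimate towards robx with weight robs."""
    n = spam_count + good_count
    if n == 0:
        return robx  # a token never seen before scores exactly robx
    spam_freq = spam_count / spam_msgs
    good_freq = good_count / good_msgs
    p = spam_freq / (spam_freq + good_freq)  # raw estimate (assumed form)
    return (robs * robx + n * p) / (robs + n)


def uses_token(score, min_dev):
    """Only tokens farther away from 0.5 than min_dev are used."""
    return abs(score - 0.5) > min_dev


def classify(score, spam_cutoff, ham_cutoff):
    """Apply the two-state or tristate rules described above."""
    if score >= spam_cutoff:
        return "spam"
    if ham_cutoff == 0 or ham_cutoff == spam_cutoff:
        return "ham"  # two-state: everything below spam_cutoff is ham
    return "ham" if score <= ham_cutoff else "unsure"
```
]]></programlisting>
<para>For example, with a spam-cutoff of 0.9 and a ham-cutoff of 0.4
(illustrative values, not asserted defaults), a score of 0.5 is
classified as unsure, while a ham-cutoff of 0 turns the same score into
ham.</para>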
</refsect1>

<refsect1 id='options'><title>OPTIONS</title>

<para>HELP OPTIONS</para>

<para>The <option>-h</option> option prints the help message and exits.</para>

<para>The <option>-V</option> option prints the version number and
exits.</para>

<para>The <option>-Q</option> (query) option prints
<application>bogofilter</application>'s configuration, i.e. registration
parameters, parsing options, <application>bogofilter</application>
directory, etc.</para>

<para>CLASSIFICATION OPTIONS</para>

<para>The <option>-p</option> (passthrough) option outputs the message
with an X-Bogosity line at the end of the message header.  This
requires keeping the entire message in memory when it's read from
stdin (or from a pipe or socket).  If the message is read from a file
that can be rewound, <application>bogofilter</application> will read it
a second time.</para>

<para>The <option>-e</option> (embed) option tells
<application>bogofilter</application> to exit with code 0 if the
message can be classified, i.e. if there is not an error.  Normally
<application>bogofilter</application> uses different codes for spam, ham,
and unsure classifications, but this simplifies using
<application>bogofilter</application> with
<application>procmail</application> or
<application>maildrop</application>.</para>

<para>The <option>-t</option> (terse) option tells
<application>bogofilter</application> to print an abbreviated
spamicity message containing 1 letter and the score.  Spam is
indicated with "Y", ham by "N", and unsure by "U".  Note: the
formatting can be customized using the config file.</para>

<para>The <option>-T</option> provides an invariant terse mode for
scripts to use.  <application>bogofilter</application> will print an
abbreviated spamicity message containing 1 letter and the score.  Spam
is indicated with "S", ham by "H", and unsure by "U".</para>

<para>The <option>-TT</option> provides an invariant terse mode for
scripts to use.  <application>Bogofilter</application> prints only the
score and displays it to 16 significant digits.</para>
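
<para>A script consuming <option>-T</option> output might parse it along
the following lines.  This Python sketch assumes that the letter and the
score appear on one line separated by whitespace; the exact layout is
not specified here, and the names <literal>TERSE_LABELS</literal> and
<function>parse_terse</function> are hypothetical.</para>
<programlisting><![CDATA[
```python
# Letters used by the invariant terse mode (-T).
TERSE_LABELS = {"S": "spam", "H": "ham", "U": "unsure"}


def parse_terse(line):
    """Split an assumed '<letter> <score>' line into (label, score)."""
    letter, score = line.split()
    return TERSE_LABELS[letter], float(score)
```
]]></programlisting>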

<para>The <option>-u</option> option tells
<application>bogofilter</application> to register the message's text
after classifying it as spam or non-spam.  A spam message will be
registered on the spamlist and a non-spam message on the goodlist.  If
the classification is "unsure", the message will not be registered.
Effectively this option runs <application>bogofilter</application>
with the <option>-s</option> or <option>-n</option> flag, as
appropriate.  Caution is urged in the use of this capability, as any
classification errors <application>bogofilter</application> may make
will be preserved and will accumulate until manually corrected with
the <option>-Sn</option> and <option>-Ns</option> option
combinations.  Note this option causes the database to be opened for
write access, which can entail massive slowdowns through
lock contention and synchronous I/O operations.</para>

<para>The <option>-H</option> option tells
<application>bogofilter</application> to not tag tokens from the
header.  This option is for testing; you should not use it in normal
operation.</para>

<para>The <option>-M</option> option tells
<application>bogofilter</application> to process its input as an mbox
formatted file.  If the <option>-v</option> or <option>-t</option>
option is also given, a spamicity line will be printed for each
message.</para>

<para>The <option>-b</option> (streaming bulk mode) option tells
<application>bogofilter</application> to classify multiple objects
whose names are read from stdin.  If the <option>-v</option> or
<option>-t</option> option is also given,
<application>bogofilter</application> will print a line giving file
name and classification information for each file.  This is an alternative to 
<option>-B</option> which lists objects on the command line.</para>

<para>An object in this context is a maildir (autodetected) or, if it
is not a maildir, a single mail.  If <option>-M</option> is given, a
non-maildir object is processed as mbox instead.  (The Content-Length:
header is not taken into account currently.)</para>

<para>When reading mbox format, <application>bogofilter</application>
relies on the empty line after a mail.  If needed,
<command>formail -es</command> will ensure this is the case.</para>

<para>The <option>-B <replaceable>object ...</replaceable></option>
(bulk mode) option tells <application>bogofilter</application> to
classify multiple objects named on the command line.  The objects may
be filenames (for single messages), mailboxes (files with multiple
messages), or directories (of maildir and MH format).  If the
<option>-v</option> or <option>-t</option> option is also given,
<application>bogofilter</application> will print a line giving file
name and classification information for each file.  This is an alternative to 
<option>-b</option> which lists objects on stdin.</para>

<para>The <option>-R</option> option tells
<application>bogofilter</application> to output an R data frame in
text form on the standard output.  See the section on integration with
R, below, for further detail.</para>

<para>REGISTRATION OPTIONS</para>

<para>The <option>-s</option> option tells
<application>bogofilter</application> to register the text presented
as spam.  The database is created if absent.</para>

<para>The <option>-n</option> option tells
<application>bogofilter</application> to register the text presented
as non-spam.</para>

<para><application>Bogofilter</application> doesn't detect if a message
is registered twice.  If you do this by accident, the token counts will be off by 1
from what you really want and the corresponding spam scores will be slightly
off.  Given a large number of tokens and messages in the wordlist, this
doesn't matter.  The problem <emphasis>can</emphasis> be corrected by using
the <option>-S</option> option or the <option>-N</option> option.</para>

<para>The <option>-S</option> option tells <application>bogofilter</application>
to undo a prior registration of the same message as spam.  If a message was
incorrectly entered as spam by <option>-s</option> or <option>-u</option>
and you want to remove it and enter it as non-spam, use <option>-Sn</option>.
If <option>-S</option> is used for a message that wasn't registered as spam,
the counts will still be decremented.</para>

<para>The <option>-N</option> option tells <application>bogofilter</application>
to undo a prior registration of the same message as non-spam.  If a message was
incorrectly entered as non-spam by <option>-n</option> or <option>-u</option>
and you want to remove it and enter it as spam, then use <option>-Ns</option>.
If <option>-N</option> is used for a message that wasn't registered as non-spam,
the counts will still be decremented.</para>

<para>GENERAL OPTIONS</para>

<para>The <option>-c <replaceable>filename</replaceable></option>
option tells <application>bogofilter</application> to read the config
file named.</para>

<para>The <option>-C</option> option prevents
<application>bogofilter</application> from reading configuration
files.</para>

<para>The <option>-d <replaceable>dir</replaceable></option> option
allows you to set the directory for the database.  See the ENVIRONMENT
section for other directory setting options.</para>

<para>The <option>-k <replaceable>cachesize</replaceable></option> option
sets the cache size for the Berkeley DB subsystem, in units of 1 MiB (1,048,576
bytes).  Properly sizing the cache improves
<application>bogofilter</application>'s performance.  The recommended
size is one third of the size of the database file.  You can run the
<application>bogotune</application> script (in the tuning directory) to
determine the recommended size.</para>
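
<para>The one-third rule can be worked out by hand.  The Python sketch
below is only an arithmetic illustration: the function name is
hypothetical, and rounding up to a whole MiB with a minimum of 1 is an
assumption, not documented behavior.</para>
<programlisting><![CDATA[
```python
MIB = 1048576  # bytes in 1 MiB, the unit used by -k


def recommended_cache_mib(db_size_bytes):
    """One third of the database size, rounded up to whole MiB."""
    third_in_mib = -(-db_size_bytes // (3 * MIB))  # ceiling division
    return max(1, third_in_mib)
```
]]></programlisting>
<para>For a 30 MiB database file this yields 10, which would correspond
to <option>-k 10</option>.</para>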

<para>The <option>-l</option> option writes an informational line to
the system log each time <application>bogofilter</application> is run.
The information logged depends on how
<application>bogofilter</application> is run.</para>

<para>The <option>-L <replaceable>tag</replaceable></option> option
configures a tag which can be included in the information being logged
by the <option>-l</option> option, but it requires a custom format
that includes the %l string for now.  This option implies
<option>-l</option>.</para>

<para>The <option>-I <replaceable>filename</replaceable></option>
option tells <application>bogofilter</application> to read its input
from the specified file, rather than from
<option>stdin</option>.</para>

<para>The <option>-O <replaceable>filename</replaceable></option> option
tells <application>bogofilter</application> where to write its output in
passthrough mode.  Note that this only works when -p is explicitly given.</para>

<para>PARAMETER OPTIONS</para>

<para>The <option>-E
      <replaceable>value<optional>,value</optional></replaceable></option>
      option allows setting the sp-esf value and the ns-esf value.
      With two values, both sp-esf and ns-esf are set.  If only one
      value is given, parameters are set as described in the note
      below.</para>

<para>The <option>-m
      <replaceable>value<optional>,value</optional><optional>,value</optional></replaceable></option>
      option allows setting the min-dev value and, optionally, the
      robs and robx values.  With three values, min-dev, robs, and
      robx are all set.  If fewer values are given, parameters are set
      as described in the note below.</para>

<para>The <option>-o
      <replaceable>value<optional>,value</optional></replaceable></option>
      option allows setting the spam-cutoff and ham-cutoff values.  With
      two values, both spam-cutoff and ham-cutoff are set.  If only
      one value is given, parameters are set as described in the note
      below.</para>

<para>Note: All of these options allow fewer values to be provided.
     Values can be skipped by using just the comma delimiter, in which
     case the corresponding parameter(s) won't be changed.  If only
     the first value is provided, then only the first parameter is
     set.  Trailing values can be skipped, in which case the
     corresponding parameters won't be changed.  Within the parameter
     list, spaces are not allowed after commas.</para>
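
<para>The skipping rules in the note can be modelled with a few lines of
Python.  This is a sketch of the rules as stated, not bogofilter's
parser; the function name and the sample values in the usage note are
hypothetical.</para>
<programlisting><![CDATA[
```python
def apply_values(arg, current):
    """Return a copy of current with the fields of arg applied.

    Empty fields and missing trailing fields leave the corresponding
    parameter unchanged.
    """
    updated = list(current)
    fields = arg.split(",")
    if len(fields) > len(updated):
        raise ValueError("too many values")
    for i, field in enumerate(fields):
        if field != field.strip():
            raise ValueError("spaces are not allowed in the list")
        if field:
            updated[i] = float(field)
    return updated
```
]]></programlisting>
<para>With current values of 0.3, 0.2 and 0.5, the argument
<literal>0.1,,0.6</literal> changes the first and third value only, and
<literal>0.1</literal> changes just the first.</para>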


<para>INFO OPTIONS</para>

<para>The <option>-v</option> option produces a report to standard
output on <application>bogofilter</application>'s analysis of the
input.  Each additional <option>v</option> will increase the verbosity of
the output, up to a maximum of 4.  With <option>-vv</option>, the report
lists the tokens with highest deviation from a mean of 0.5 association
with spam.</para>

<para>The <option>-y <replaceable>date</replaceable></option> option can
be used to override the current date when timestamping tokens.  A value
of zero (0) turns off timestamping.</para>

<para>The <option>-D</option> option redirects debug output to
stdout.</para>

<para>The <option>-x <replaceable>flags</replaceable></option> option
allows setting of debug flags for printing debug information.  See
header file debug.h for the list of usable flags.</para>

<para>CONFIG FILE OPTIONS</para>

<para>Using GNU longopt <option>--</option> syntax, a config file's
<option><replaceable>name=value</replaceable></option> statement
becomes a command line's
<option>--<replaceable>option=value</replaceable></option>.  Use
command <command>bogofilter --help</command> for a list of options and
see <filename>bogofilter.cf.example</filename> for more info on
them.  For example, to change the X-Bogosity header to "X-Spam-Header", use:</para>

<para>
<option><replaceable>--spam-header-name=X-Spam-Header</replaceable></option>
</para>  

</refsect1>

<refsect1 id='environment'><title>ENVIRONMENT</title>
<para><application>Bogofilter</application> uses a database directory,
which can be set in the config file.  If not set there,
<application>bogofilter</application> will use the value of
<envar>BOGOFILTER_DIR</envar>.  Both can be overridden by the
<option>-d <replaceable>dir</replaceable></option> option.  If none of
that is available, <application>bogofilter</application> will use directory
<filename>$HOME/.bogofilter</filename>.</para>
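
<para>The lookup order described above can be summarized in a short
Python sketch.  The function name and its parameters are hypothetical,
and the code mirrors this section's wording rather than bogofilter's
implementation.</para>
<programlisting><![CDATA[
```python
import os


def database_dir(cli_dir=None, config_dir=None, environ=None):
    """Pick the database directory in the documented order."""
    env = os.environ if environ is None else environ
    if cli_dir:                      # -d dir overrides everything else
        return cli_dir
    if config_dir:                   # value set in the config file
        return config_dir
    if env.get("BOGOFILTER_DIR"):    # environment variable
        return env["BOGOFILTER_DIR"]
    return env.get("HOME", "") + "/.bogofilter"  # final fallback
```
]]></programlisting>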
</refsect1>

<refsect1 id='configuration'><title>CONFIGURATION</title>
<para>The <application>bogofilter</application> command line allows
setting of many options that determine how
<application>bogofilter</application> operates.  File
<filename>@sysconfdir@/bogofilter.cf</filename> can be used to set
additional parameters that affect its operation.  File
<filename>@sysconfdir@/bogofilter.cf.example</filename> has samples of
all of the parameters.  Status and logging messages can be customized
for each site.</para>
</refsect1>

<refsect1 id='returns'><title>RETURN VALUES</title>
<para>0 for spam; 1 for non-spam; 2 for unsure; 3 for I/O or other errors.</para>
<para>If both <option>-p</option> and <option>-e</option> are used, the
    return values are: 0 for spam or non-spam; 3 for I/O or other
    errors.</para>

<para>Error 3 usually means that the wordlist file
<application>bogofilter</application> wants to read at startup
is missing or the hard disk has filled up in <option>-p</option> mode.</para>
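
<para>A calling program can branch on these return values.  The
following Python sketch assumes that <command>bogofilter</command> is on
the search path and that <option>-e</option> is not in use; the names
<literal>EXIT_LABELS</literal>, <function>label_for_exit</function> and
<function>classify_message</function> are hypothetical.</para>
<programlisting><![CDATA[
```python
import subprocess

# Return values documented above (without -e).
EXIT_LABELS = {0: "spam", 1: "ham", 2: "unsure"}


def label_for_exit(code):
    """Translate a bogofilter return value into a label."""
    if code in EXIT_LABELS:
        return EXIT_LABELS[code]
    raise RuntimeError(f"bogofilter failed with return value {code}")


def classify_message(raw_message, bogofilter="bogofilter"):
    """Feed one message (bytes) to bogofilter on standard input."""
    proc = subprocess.run([bogofilter], input=raw_message)
    return label_for_exit(proc.returncode)
```
]]></programlisting>
<para>Raising an error for return value 3 keeps a missing wordlist or a
full disk from being mistaken for a classification.</para>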
</refsect1>

<refsect1 id='integration'><title>INTEGRATION WITH OTHER TOOLS</title>

<para>Use with procmail</para>
<para>The following recipe (a) spam-bins anything that
<application>bogofilter</application> rates as spam, (b) registers the
words in messages rated as spam as such, and (c) registers the
words in messages rated as non-spam as such.  With
this in place, it will normally only be necessary for the user to
intervene (with <option>-Ns</option> or <option>-Sn</option>) when
<application>bogofilter</application> miscategorizes something.</para>
<programlisting>
<lineannotation>
# filter mail through bogofilter, tagging it as Ham, Spam, or Unsure,
# and updating the wordlist
</lineannotation>
:0fw
| bogofilter -u -e -p

<lineannotation>
# if bogofilter failed, return the mail to the queue;
# the MTA will retry to deliver it later
# 75 is the value for EX_TEMPFAIL in /usr/include/sysexits.h
</lineannotation>
:0e
{ EXITCODE=75 HOST }

<lineannotation>
# file the mail to <filename>spam-bogofilter</filename> if it's spam.
</lineannotation>
:0:
* ^X-Bogosity: Spam, tests=bogofilter
spam-bogofilter
<lineannotation>
# file the mail to <filename>unsure-bogofilter</filename> 
# if it's neither ham nor spam.
</lineannotation>
:0:
* ^X-Bogosity: Unsure, tests=bogofilter
unsure-bogofilter

# With this recipe, you can train bogofilter starting with an empty
# wordlist.  Be sure to check your unsure-folder regularly, take the
# messages out of it, classify them as ham (or spam), and use them to
# train bogofilter.

</programlisting>

<para>The following procmail rule will take mail on stdin and save
it to file <filename>spam</filename> if
<application>bogofilter</application> thinks it's spam:
<programlisting>:0HB:
* ? bogofilter
spam
</programlisting>

and this similar rule will also register the tokens in the mail
according to the <application>bogofilter</application> classification:

<programlisting>:0HB:
* ? bogofilter -u
spam
</programlisting>
</para>
<para>If <application>bogofilter</application> fails (returning 3) the
message will be treated as non-spam.</para>

<para>This one is for <application>maildrop</application>.  It
    automatically defers the mail and retries later when the
    <application>xfilter</application> command fails.  Use this in your
    <filename>~/.mailfilter</filename>:</para>
<programlisting>xfilter "bogofilter -u -e -p"
if (/^X-Bogosity: Spam, tests=bogofilter/)
{
  to "spam-bogofilter"
}
</programlisting>

<para>The following <filename>.muttrc</filename> lines will create
mutt macros for dispatching mail to
<application>bogofilter</application>.</para>
<programlisting>macro index d "&lt;enter-command&gt;unset wait_key\n\
&lt;pipe-entry&gt;bogofilter -n\n\
&lt;enter-command&gt;set wait_key\n\
&lt;delete-message&gt;" "delete message as non-spam"
macro index \ed "&lt;enter-command&gt;unset wait_key\n\
&lt;pipe-entry&gt;bogofilter -s\n\
&lt;enter-command&gt;set wait_key\n\
&lt;delete-message&gt;" "delete message as spam"
</programlisting>

<para>Integration with Mail Transport Agent (MTA)</para>
<procedure>
<step>
 <para><application>bogofilter</application> can also be integrated into an MTA
  to filter all incoming mail.  While the specific implementation is MTA dependent, the
  general steps are as follows:</para>
</step>

<step>
 <para>Install <application>bogofilter</application> on the mail server</para>
</step>
<step>
 <para>Prime the <application>bogofilter</application> databases with a spam
and non-spam corpus.  Since <application>bogofilter</application> will be
serving a larger community, it is important to prime it with a representative
set of messages.</para>
</step>
<step>
 <para>Set up the MTA to invoke <application>bogofilter</application> on each
   message.  While this is an MTA specific step, you'll probably need to use the
   <option>-p</option>, <option>-u</option>, and <option>-e</option> options.</para>
</step>
<step>
 <para>Set up a mechanism for users to register spam/non-spam messages,
 as well as to correct mis-classifications.
 The most generic solution is to set up alias email addresses to which
 users bounce messages.</para>
</step>
<step>
 <para>See the <filename>doc</filename> and <filename>contrib</filename> directories
   for more information.</para>
</step>
</procedure>

<para>Use of R to verify <application>bogofilter</application>'s
calculations</para>

<para>The -R option tells <application>bogofilter</application> to
generate an R data frame.  The data frame contains one row per token
analyzed.  Each such row contains the token, the sum of its database
"good" and "spam" counts, the "good" count divided by the number of
non-spam messages used to create the training database, the "spam"
count divided by the spam message count, Robinson's f(w) for the token,
the natural logs of (1 - f(w)) and f(w), and an indicator character (+
if the token's f(w) value exceeded the minimum deviation from 0.5, - if
it didn't).  There is one additional row at the end of the table that
contains a label in the token field, followed by the number of words
actually used (the ones with + indicators), Robinson's P, Q, S, s and x
values and the minimum deviation.</para>

<para>The R data frame can be saved to a file and later read
into an R session (see <ulink url="http://cran.r-project.org/">the R
project website</ulink> for information about the mathematics package
R).  Provided with the <application>bogofilter</application>
distribution is a simple R script (file bogo.R) that can be used to
verify <application>bogofilter</application>'s calculations.
Instructions for its use are included in the script in the form of
comments.</para>
</refsect1>

<refsect1 id='logmessages'><title>LOG MESSAGES</title>

<para><application>Bogofilter</application> writes messages to the system log
    when the <option>-l</option> option is used.  What is written depends on
    which other flags are used.</para>

<para>A classification run will generate (we are not showing the date
    and host part here):
    <screen>
bogofilter[1412]: X-Bogosity: Ham, spamicity=0.000227
bogofilter[1415]: X-Bogosity: Spam, spamicity=0.998918
</screen></para>
<para>Using <option>-u</option> to classify a message and update a wordlist
    will produce (on a single line):
    <screen>
bogofilter[1426]: X-Bogosity: Spam, spamicity=0.998918,
  register -s, 329 words, 1 messages
    </screen></para>
    <para>Registering words (<option>-l</option> and <option>-s</option>,
    <option>-n</option>, <option>-S</option>, or <option>-N</option>) will produce:
    <screen>
bogofilter[1440]: register-n, 255 words, 1 messages
</screen></para>

<para>A registration run (using <option>-s</option>, <option>-n</option>,
    <option>-N</option>, or <option>-S</option>) will generate messages like:
<screen>
bogofilter[17330]: register-n, 574 words, 3 messages
bogofilter[6244]: register-s, 1273 words, 4 messages
</screen></para>
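
<para>Log lines of the shapes shown above can be picked apart with
regular expressions.  The Python sketch below is written against the
sample lines in this section only; the pattern and function names are
hypothetical, and sites that customize the log format would need
different patterns.</para>
<programlisting><![CDATA[
```python
import re

CLASSIFY_RE = re.compile(
    r"bogofilter\[(?P<pid>\d+)\]: X-Bogosity: (?P<label>\w+), "
    r"spamicity=(?P<score>[0-9.]+)"
)
REGISTER_RE = re.compile(
    r"bogofilter\[(?P<pid>\d+)\]: register-(?P<flag>[snSN]), "
    r"(?P<words>\d+) words, (?P<messages>\d+) messages"
)


def parse_log_line(line):
    """Return a dict for a recognized line, or None otherwise."""
    m = CLASSIFY_RE.search(line)
    if m:
        return {"kind": "classification", "pid": int(m["pid"]),
                "label": m["label"], "score": float(m["score"])}
    m = REGISTER_RE.search(line)
    if m:
        return {"kind": "registration", "pid": int(m["pid"]),
                "flag": m["flag"], "words": int(m["words"]),
                "messages": int(m["messages"])}
    return None
```
]]></programlisting>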
</refsect1>

<refsect1 id='files'><title>FILES</title>
<variablelist>
<varlistentry>
<term><filename>@sysconfdir@/bogofilter.cf</filename></term>
<listitem><para>System configuration file.</para></listitem>
</varlistentry>
<varlistentry>
<term><filename>~/.bogofilter.cf</filename></term>
<listitem><para>User configuration file.</para></listitem>
</varlistentry>
<varlistentry>
<term><filename>~/.bogofilter/wordlist.db</filename></term>
<listitem><para>Combined list of good and spam tokens.</para></listitem>
</varlistentry>
</variablelist>
</refsect1>

<refsect1 id='author'><title>AUTHOR</title>
    <literallayout>
Eric S. Raymond <email>esr@thyrsus.com</email>.
David Relson <email>relson@osagesoftware.com</email>.
Matthias Andree <email>matthias.andree@gmx.de</email>.
Greg Louis <email>glouis@dynamicro.on.ca</email>.
</literallayout>

    <para>For updates, see the <ulink url="http://bogofilter.sourceforge.net/">
    bogofilter project page</ulink>.</para>
</refsect1>

<refsect1 id="also">
    <title>SEE ALSO</title>
    <para>bogolexer(1), bogotune(1), bogoupgrade(1), bogoutil(1)</para>
</refsect1>
</refentry>