'\" t
.\" Title: bogofilter
.\" Author: [see the "AUTHOR" section]
.\" Generator: DocBook XSL Stylesheets vsnapshot <http://docbook.sf.net/>
.\" Date: 05/19/2019
.\" Manual: Bogofilter Reference Manual
.\" Source: Bogofilter
.\" Language: English
.\"
.TH "BOGOFILTER" "1" "05/19/2019" "Bogofilter" "Bogofilter Reference Manual"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
bogofilter \- fast Bayesian spam filter
.SH "SYNOPSIS"
.HP \w'\fBbogofilter\fR\ 'u
\fBbogofilter\fR [help\ options | classification\ options | registration\ options | parameter\ options | info\ options] [general\ options] [config\ file\ options]
.PP
where
.PP
\fBhelp options\fR
are:
.HP \w'\ 'u
[\-h] [\-\-help] [\-V] [\-Q]
.PP
\fBclassification options\fR
are:
.HP \w'\ 'u
[\-p] [\-e] [\-t] [\-T] [\-u] [\-H] [\-M] [\-b] [\-B\ \fIobject\ \&.\&.\&.\fR] [\-R] [general\ options] [parameter\ options] [config\ file\ options]
.PP
\fBregistration options\fR
are:
.HP \w'\ 'u
[\-s | \-n] [\-S | \-N] [general\ options]
.PP
\fBgeneral options\fR
are:
.HP \w'\ 'u
[\-c\ \fIfilename\fR] [\-C] [\-d\ \fIdir\fR] [\-k\ \fIcachesize\fR] [\-l] [\-L\ \fItag\fR] [\-I\ \fIfilename\fR] [\-O\ \fIfilename\fR]
.PP
\fBparameter options\fR
are:
.HP \w'\ 'u
[\-E\ \fIvalue\fR\fI[,value]\fR] [\-m\ \fIvalue\fR\fI[,value]\fR\fI[,value]\fR] [\-o\ \fIvalue\fR\fI[,value]\fR]
.PP
\fBinfo options\fR
are:
.HP \w'\ 'u
[\-v] [\-y\ \fIdate\fR] [\-D] [\-x\ \fIflags\fR]
.PP
\fBconfig file options\fR
are:
.HP \w'\ 'u
[\-\-\fIoption=value\fR]
.PP
Note: Use
\fBbogofilter \-\-help\fR
to display the complete list of options\&.
.SH "DESCRIPTION"
.PP
Bogofilter
is a Bayesian spam filter\&. In its normal mode of operation, it takes an email message or other text on standard input, does a statistical check against lists of "good" and "bad" words, and returns a status code indicating whether or not the message is spam\&.
Bogofilter
uses a fast algorithm, relies on Berkeley DB for fast startup and lookups, is coded directly in C, and is tuned for speed, so it can be used in production by sites that process a lot of mail\&.
.SH "THEORY OF OPERATION"
.PP
Bogofilter
treats its input as a bag of tokens\&. Each token is checked against a wordlist, which maintains counts of the number of times it has occurred in non\-spam and spam mails\&. These numbers are used to compute an estimate of the probability that a message in which the token occurs is spam\&. Those estimates are combined to indicate whether the message is spam or ham\&.
.PP
While this method sounds crude compared to the more usual pattern\-matching approach, it turns out to be extremely effective\&. Paul Graham\*(Aqs paper
\m[blue]\fBA Plan For Spam\fR\m[]\&\s-2\u[1]\d\s+2
is recommended reading\&.
.PP
This program substantially improves on Paul\*(Aqs proposal by doing smarter lexical analysis\&.
Bogofilter
does proper MIME decoding and reasonable HTML parsing\&. Special kinds of tokens like hostnames and IP addresses are retained as recognition features rather than broken up\&. Various kinds of MTA cruft such as dates and message\-IDs are ignored so as not to bloat the wordlist\&. Tokens found in various header fields are marked appropriately\&.
.PP
Another improvement is that this program offers Gary Robinson\*(Aqs suggested modifications to the calculations (see the parameters robx and robs below)\&. These modifications are described in Robinson\*(Aqs paper
\m[blue]\fBSpam Detection\fR\m[]\&\s-2\u[2]\d\s+2\&.
.PP
Since then, Robinson (see his Linux Journal article
\m[blue]\fBA Statistical Approach to the Spam Problem\fR\m[]\&\s-2\u[3]\d\s+2) and others have realized that the calculation can be further optimized using Fisher\*(Aqs method\&.
\m[blue]\fBAnother improvement\fR\m[]\&\s-2\u[4]\d\s+2
compensates for token redundancy by applying separate effective size factors (ESF) to spam and nonspam probability calculations\&.
.PP
In short, this is how it works: The estimates for the spam probabilities of the individual tokens are combined using the "inverse chi\-square function"\&. Its value indicates how badly the null hypothesis that the message is just a random collection of independent words with probabilities given by our previous estimates fails\&. This function is very sensitive to small probabilities (hammish words), but not to high probabilities (spammish words); so the value only indicates strong hammish signs in a message\&. Now using inverse probabilities for the tokens, the same computation is done again, giving an indicator that a message looks strongly spammish\&. Finally, those two indicators are subtracted (and scaled into a 0\-1\-interval)\&. This combined indicator (bogosity) is close to 0 if the signs for a hammish message are stronger than for a spammish message and close to 1 if the situation is the other way round\&. If signs for both are equally strong, the value will be near 0\&.5\&. Since those messages don\*(Aqt give a clear indication, there is a tristate mode in
bogofilter
to mark those messages as unsure, while the clear messages are marked as spam or ham, respectively\&. In two\-state mode, every message is marked as either spam or ham\&.
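.PP
The combination step described above can be sketched in Python\&. This is a minimal illustration under stated assumptions, not bogofilter\*(Aqs C implementation: the function names chi2q and bogosity are hypothetical, the clamping constant is arbitrary, and ESF scaling is left out\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
import math


def chi2q(x2, dof):
    """Upper-tail probability of a chi-square value for an even number of
    degrees of freedom (the "inverse chi-square function")."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def bogosity(probs):
    """Combine per-token spam probabilities into one score in [0, 1]."""
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    # Clamp so that log() never sees 0 (the constant is an arbitrary choice).
    probs = [min(max(p, 1e-12), 1 - 1e-12) for p in probs]
    # Small when the message contains strongly hammish tokens.
    h = chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    # Same computation on the inverse probabilities:
    # small when the message contains strongly spammish tokens.
    s = chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # Subtract the two indicators and scale into the 0-1 interval.
    return (1.0 + h - s) / 2.0
```
.fi
.if n \{\
.RE
.\}
.PP
With these definitions, a list of token probabilities that are all 0\&.5 yields a score of 0\&.5, while lists dominated by values near 1 or near 0 push the score toward 1 or 0, respectively\&.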
.PP
Various parameters influence these calculations; the most important are:
.PP
robx: the score given to a token which has not been seen before\&. robx is the probability that the token is spammish\&.
.PP
robs: a weight on robx which moves the probability of a rarely seen token towards robx\&.
.PP
min\-dev: a minimum distance from 0\&.5 for tokens to use in the calculation\&. Only tokens farther away from 0\&.5 than this value are used\&.
.PP
spam\-cutoff: messages with scores greater than or equal to this value will be marked as spam\&.
.PP
ham\-cutoff: If zero or equal to spam\-cutoff, all messages with values strictly below spam\-cutoff are marked as ham, all others as spam (two\-state)\&. Otherwise, messages with values less than or equal to ham\-cutoff are marked as ham, messages with values strictly between ham\-cutoff and spam\-cutoff are marked as unsure, and the rest as spam (tristate)\&.
.PP
sp\-esf: the effective size factor (ESF) for spam\&.
.PP
ns\-esf: the ESF for nonspam\&. These ESF values default to 1\&.0, which is the same as not using ESF in the calculation\&. Values suitable to a user\*(Aqs email population can be determined with the aid of the
bogotune
program\&.
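.PP
How robx, robs, min\-dev and the two cutoffs interact can be sketched in Python\&. This is an illustration only: the function names are hypothetical, the smoothing assumes Robinson\*(Aqs formula f = (s*x + n*p) / (s + n), and the raw probability is simplified to a plain ratio of counts, which is not necessarily how bogofilter normalizes them\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
def smoothed_prob(spam_count, ham_count, robx, robs):
    """Pull a token's raw spam probability toward robx, weighted by robs."""
    n = spam_count + ham_count
    if n == 0:
        return robx  # token never seen before: its score is robx
    raw = spam_count / n  # simplification: plain ratio of counts
    return (robs * robx + n * raw) / (robs + n)


def passes_min_dev(prob, min_dev):
    """Only tokens farther away from 0.5 than min_dev are used."""
    return abs(prob - 0.5) > min_dev


def classify(score, spam_cutoff, ham_cutoff):
    """Apply the two-state / tristate cutoff rules described above."""
    if ham_cutoff == 0 or ham_cutoff == spam_cutoff:
        # two-state mode: everything below spam_cutoff is ham
        return "spam" if score >= spam_cutoff else "ham"
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"
```
.fi
.if n \{\
.RE
.\}
.PP
The more often a token has been seen, the less robx matters: with n much larger than robs, the smoothed value approaches the raw ratio\&.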
.SH "OPTIONS"
.PP
HELP OPTIONS
.PP
The
\fB\-h\fR
option prints the help message and exits\&.
.PP
The
\fB\-V\fR
option prints the version number and exits\&.
.PP
The
\fB\-Q\fR
(query) option prints
bogofilter\*(Aqs configuration, i\&.e\&. registration parameters, parsing options,
bogofilter
directory, etc\&.
.PP
CLASSIFICATION OPTIONS
.PP
The
\fB\-p\fR
(passthrough) option outputs the message with an X\-Bogosity line at the end of the message header\&. This requires keeping the entire message in memory when it\*(Aqs read from stdin (or from a pipe or socket)\&. If the message is read from a file that can be rewound,
bogofilter
will read it a second time\&.
.PP
The
\fB\-e\fR
(embed) option tells
bogofilter
to exit with code 0 if the message can be classified, i\&.e\&. if there is not an error\&. Normally
bogofilter
uses different codes for spam, ham, and unsure classifications, but this simplifies using
bogofilter
with
procmail
or
maildrop\&.
.PP
The
\fB\-t\fR
(terse) option tells
bogofilter
to print an abbreviated spamicity message containing one letter and the score\&. Spam is indicated by "Y", ham by "N", and unsure by "U"\&. Note: the formatting can be customized using the config file\&.
.PP
The
\fB\-T\fR
option provides an invariant terse mode for scripts to use\&.
bogofilter
will print an abbreviated spamicity message containing one letter and the score\&. Spam is indicated by "S", ham by "H", and unsure by "U"\&.
.PP
The
\fB\-TT\fR
option provides an invariant terse mode for scripts to use\&.
Bogofilter
prints only the score and displays it to 16 significant digits\&.
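.PP
A script consuming
\fB\-T\fR
output might parse each line as sketched below\&. The exact layout of the line is not specified above, so the sketch assumes the letter and the score are separated by whitespace; the names LABELS and parse_terse are hypothetical\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
# Letters used by the invariant terse mode (-T), as listed above.
LABELS = {"S": "spam", "H": "ham", "U": "unsure"}


def parse_terse(line):
    """Split one invariant terse line into (label, score).

    Assumption: the line looks like "<letter> <score>".
    """
    letter, score = line.split()
    return LABELS[letter], float(score)
```
.fi
.if n \{\
.RE
.\}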
.PP
The
\fB\-u\fR
option tells
bogofilter
to register the message\*(Aqs text after classifying it as spam or non\-spam\&. A spam message will be registered on the spamlist and a non\-spam message on the goodlist\&. If the classification is "unsure", the message will not be registered\&. Effectively this option runs
bogofilter
with the
\fB\-s\fR
or
\fB\-n\fR
flag, as appropriate\&. Caution is urged in the use of this capability, as any classification errors
bogofilter
may make will be preserved and will accumulate until manually corrected with the
\fB\-Sn\fR
and
\fB\-Ns\fR
option combinations\&. Note that this option causes the database to be opened for write access, which can entail massive slowdowns through lock contention and synchronous I/O operations\&.
.PP
The
\fB\-H\fR
option tells
bogofilter
to not tag tokens from the header\&. This option is for testing; you should not use it in normal operation\&.
.PP
The
\fB\-M\fR
option tells
bogofilter
to process its input as an mbox\-formatted file\&. If the
\fB\-v\fR
or
\fB\-t\fR
option is also given, a spamicity line will be printed for each message\&.
.PP
The
\fB\-b\fR
(streaming bulk mode) option tells
bogofilter
to classify multiple objects whose names are read from stdin\&. If the
\fB\-v\fR
or
\fB\-t\fR
option is also given,
bogofilter
will print a line giving file name and classification information for each file\&. This is an alternative to
\fB\-B\fR,
which lists objects on the command line\&.
.PP
An object in this context is a maildir (autodetected) or, if it\*(Aqs not a maildir, a single mail, unless
\fB\-M\fR
is given, in which case it\*(Aqs processed as mbox\&. (The Content\-Length: header is not taken into account currently\&.)
.PP
When reading mbox format,
bogofilter
relies on the empty line after a mail\&. If needed,
\fBformail \-es\fR
will ensure this is the case\&.
.PP
The
\fB\-B \fR\fB\fIobject \&.\&.\&.\fR\fR
(bulk mode) option tells
bogofilter
to classify multiple objects named on the command line\&. The objects may be filenames (for single messages), mailboxes (files with multiple messages), or directories (of maildir and MH format)\&. If the
\fB\-v\fR
or
\fB\-t\fR
option is also given,
bogofilter
will print a line giving file name and classification information for each file\&. This is an alternative to
\fB\-b\fR,
which lists objects on stdin\&.
.PP
The
\fB\-R\fR
option tells
bogofilter
to output an R data frame in text form on the standard output\&. See the section on integration with R, below, for further detail\&.
.PP
REGISTRATION OPTIONS
.PP
The
\fB\-s\fR
option tells
bogofilter
to register the text presented as spam\&. The database is created if absent\&.
.PP
The
\fB\-n\fR
option tells
bogofilter
to register the text presented as non\-spam\&.
.PP
Bogofilter
doesn\*(Aqt detect if a message is registered twice\&. If you do this by accident, the token counts will be off by 1 from what you really want and the corresponding spam scores will be slightly off\&. Given a large number of tokens and messages in the wordlist, this doesn\*(Aqt matter\&. The problem
\fIcan\fR
be corrected by using the
\fB\-S\fR
option or the
\fB\-N\fR
option\&.
.PP
The
\fB\-S\fR
option tells
bogofilter
to undo a prior registration of the same message as spam\&. If a message was incorrectly entered as spam by
\fB\-s\fR
or
\fB\-u\fR
and you want to remove it and enter it as non\-spam, use
\fB\-Sn\fR\&. If
\fB\-S\fR
is used for a message that wasn\*(Aqt registered as spam, the counts will still be decremented\&.
.PP
The
\fB\-N\fR
option tells
bogofilter
to undo a prior registration of the same message as non\-spam\&. If a message was incorrectly entered as non\-spam by
\fB\-n\fR
or
\fB\-u\fR
and you want to remove it and enter it as spam, use
\fB\-Ns\fR\&. If
\fB\-N\fR
is used for a message that wasn\*(Aqt registered as non\-spam, the counts will still be decremented\&.
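.PP
The bookkeeping behind the registration options can be pictured with a toy model in Python\&. This is not bogofilter\*(Aqs database code: the Wordlist class is hypothetical, it assumes each token is counted once per message, and it does not model what happens when a count would drop below zero\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
from collections import Counter


class Wordlist:
    """Toy in-memory stand-in for the wordlist: per-token spam/ham counts."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def register(self, tokens, as_spam):
        # Like -s (as_spam=True) or -n (as_spam=False).
        # Assumption: a token counts once per message.
        (self.spam if as_spam else self.ham).update(set(tokens))

    def unregister(self, tokens, as_spam):
        # Like -S or -N: decrements whether or not the message
        # was actually registered before.
        counts = self.spam if as_spam else self.ham
        for tok in set(tokens):
            counts[tok] -= 1

    def reclassify(self, tokens, to_spam):
        # Like -Ns (to_spam=True) or -Sn (to_spam=False).
        self.unregister(tokens, as_spam=not to_spam)
        self.register(tokens, as_spam=to_spam)
```
.fi
.if n \{\
.RE
.\}
.PP
Registering the same message twice leaves every one of its counts one too high in this model, and a single unregister call brings them back, which mirrors the correction described above\&.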
.PP
GENERAL OPTIONS
.PP
The
\fB\-c \fR\fB\fIfilename\fR\fR
option tells
bogofilter
to read the config file named\&.
.PP
The
\fB\-C\fR
option prevents
bogofilter
from reading configuration files\&.
.PP
The
\fB\-d \fR\fB\fIdir\fR\fR
option allows you to set the directory for the database\&. See the ENVIRONMENT section for other directory setting options\&.
.PP
The
\fB\-k \fR\fB\fIcachesize\fR\fR
option sets the cache size for the BerkeleyDB subsystem, in units of 1 MiB (1,048,576 bytes)\&. Properly sizing the cache improves
bogofilter\*(Aqs performance\&. The recommended size is one third of the size of the database file\&. You can run the
bogotune
script (in the tuning directory) to determine the recommended size\&.
.PP
The
\fB\-l\fR
option writes an informational line to the system log each time
bogofilter
is run\&. The information logged depends on how
bogofilter
is run\&.
.PP
The
\fB\-L \fR\fB\fItag\fR\fR
option configures a tag which can be included in the information being logged by the
\fB\-l\fR
option, but it requires a custom format that includes the %l string for now\&. This option implies
\fB\-l\fR\&.
.PP
The
\fB\-I \fR\fB\fIfilename\fR\fR
option tells
bogofilter
to read its input from the specified file, rather than from
\fBstdin\fR\&.
.PP
The
\fB\-O \fR\fB\fIfilename\fR\fR
option tells
bogofilter
where to write its output in passthrough mode\&. Note that this only works when \-p is explicitly given\&.
.PP
PARAMETER OPTIONS
.PP
The
\fB\-E \fR\fB\fIvalue\fR\fI[,value]\fR\fR
option allows setting the sp\-esf value and the ns\-esf value\&. With two values, both sp\-esf and ns\-esf are set\&. If only one value is given, parameters are set as described in the note below\&.
.PP
The
\fB\-m \fR\fB\fIvalue\fR\fI[,value]\fR\fI[,value]\fR\fR
option allows setting the min\-dev value and, optionally, the robs and robx values\&. With three values, min\-dev, robs, and robx are all set\&. If fewer values are given, parameters are set as described in the note below\&.
.PP
The
\fB\-o \fR\fB\fIvalue\fR\fI[,value]\fR\fR
option allows setting the spam\-cutoff and ham\-cutoff values\&. With two values, both spam\-cutoff and ham\-cutoff are set\&. If only one value is given, parameters are set as described in the note below\&.
.PP
Note: All of these options allow fewer values to be provided\&. Values can be skipped by using just the comma delimiter, in which case the corresponding parameter(s) won\*(Aqt be changed\&. If only the first value is provided, then only the first parameter is set\&. Trailing values can be skipped, in which case the corresponding parameters won\*(Aqt be changed\&. Within the parameter list, spaces are not allowed after commas\&.
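.PP
The skipping rules in the note above can be modelled in a few lines of Python\&. The function name apply_values is hypothetical and the numbers in any call are placeholders, not bogofilter\*(Aqs defaults; the sketch only illustrates which positions are changed\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
def apply_values(arg, current):
    """Return a copy of current updated from a "value[,value]" argument.

    Empty fields and missing trailing fields leave the corresponding
    parameter unchanged.
    """
    fields = arg.split(",")
    if len(fields) > len(current):
        raise ValueError("too many values")
    updated = list(current)
    for i, field in enumerate(fields):
        if field == "":
            continue  # skipped with a bare comma: keep the old value
        if field != field.strip():
            raise ValueError("spaces are not allowed in the list")
        updated[i] = float(field)
    return updated
```
.fi
.if n \{\
.RE
.\}
.PP
For a three\-parameter option, an argument such as "0\&.1,,0\&.6" changes the first and third parameters and leaves the second alone, while "0\&.2" changes only the first\&.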
.PP
INFO OPTIONS
.PP
The
\fB\-v\fR
option produces a report to standard output on
bogofilter\*(Aqs analysis of the input\&. Each additional
\fBv\fR
will increase the verbosity of the output, up to a maximum of 4\&. With
\fB\-vv\fR, the report lists the tokens with highest deviation from a mean of 0\&.5 association with spam\&.
.PP
Option
\fB\-y date\fR
can be used to override the current date when timestamping tokens\&. A value of zero (0) turns off timestamping\&.
.PP
The
\fB\-D\fR
option redirects debug output to stdout\&.
.PP
The
\fB\-x \fR\fB\fIflags\fR\fR
option allows setting of debug flags for printing debug information\&. See header file debug\&.h for the list of usable flags\&.
.PP
CONFIG FILE OPTIONS
.PP
Using GNU longopt
\fB\-\-\fR
syntax, a config file\*(Aqs
\fB\fIname=value\fR\fR
statement becomes a command line\*(Aqs
\fB\-\-\fR\fB\fIoption=value\fR\fR\&. Use the command
\fBbogofilter \-\-help\fR
for a list of options and see
bogofilter\&.cf\&.example
for more information on them\&. For example, to change the X\-Bogosity header to "X\-Spam\-Header", use:
.PP
\fB\fI\-\-spam\-header\-name=X\-Spam\-Header\fR\fR
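.PP
The correspondence between config file statements and long options can be illustrated with a small Python helper\&. The name to_long_options is hypothetical, and the sketch makes two assumptions that are not stated above: that "#" starts a comment in the config file, and that option names are passed through unchanged\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
def to_long_options(config_text):
    """Turn name=value config lines into --name=value arguments."""
    args = []
    for raw in config_text.splitlines():
        # Assumption: "#" starts a comment that runs to end of line.
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        # Assumption: the name is used as-is on the command line.
        args.append("--" + name + "=" + value)
    return args
```
.fi
.if n \{\
.RE
.\}
.PP
Given a line reading spam\-header\-name=X\-Spam\-Header, the helper returns the single argument shown in the example above\&.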
.SH "ENVIRONMENT"
.PP
Bogofilter
uses a database directory, which can be set in the config file\&. If not set there,
bogofilter
will use the value of
\fBBOGOFILTER_DIR\fR\&. Both can be overridden by the
\fB\-d \fR\fB\fIdir\fR\fR
option\&. If none of these is set,
bogofilter
will use the directory
$HOME/\&.bogofilter\&.
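.PP
The lookup order described above can be written out as a short Python function\&. The name database_dir and its parameters are hypothetical; the sketch only restates the precedence (command line, then config file, then environment, then the default) and does not read any real configuration\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
import os


def database_dir(cli_dir=None, config_dir=None, environ=None):
    """Pick the database directory using the precedence described above."""
    env = os.environ if environ is None else environ
    if cli_dir:                      # -d dir overrides everything
        return cli_dir
    if config_dir:                   # value set in the config file
        return config_dir
    if env.get("BOGOFILTER_DIR"):    # environment variable
        return env["BOGOFILTER_DIR"]
    # Fallback: $HOME/.bogofilter
    return os.path.join(env.get("HOME", ""), ".bogofilter")
```
.fi
.if n \{\
.RE
.\}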
.SH "CONFIGURATION"
.PP
The
bogofilter
command line allows setting of many options that determine how
bogofilter
operates\&. File
@sysconfdir@/bogofilter\&.cf
can be used to set additional parameters that affect its operation\&. File
@sysconfdir@/bogofilter\&.cf\&.example
has samples of all of the parameters\&. Status and logging messages can be customized for each site\&.
.SH "RETURN VALUES"
.PP
0 for spam; 1 for non\-spam; 2 for unsure; 3 for I/O or other errors\&.
.PP
If both
\fB\-p\fR
and
\fB\-e\fR
are used, the return values are: 0 for spam or non\-spam; 3 for I/O or other errors\&.
.PP
Error 3 usually means that the wordlist file
bogofilter
wants to read at startup is missing, or that the hard disk has filled up in
\fB\-p\fR
mode\&.
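.PP
A calling script might interpret these return values as sketched below in Python\&. The names EXIT_MEANINGS and describe_exit are hypothetical, and any code not listed above is simply reported as unknown\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
# Return values as listed above.
EXIT_MEANINGS = {0: "spam", 1: "non-spam", 2: "unsure", 3: "error"}


def describe_exit(code, embed=False):
    """Translate an exit status into a label.

    With embed=True (the -e option), 0 only means that the message
    could be classified; it no longer says how.
    """
    if embed and code == 0:
        return "classified"
    if embed and code != 3:
        return "unknown"
    return EXIT_MEANINGS.get(code, "unknown")
```
.fi
.if n \{\
.RE
.\}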
|
|
Packit Service |
8f0814 |
.SH "INTEGRATION WITH OTHER TOOLS"
|
|
Packit Service |
8f0814 |
.PP
|
|
Packit Service |
8f0814 |
Use with procmail
|
|
Packit Service |
8f0814 |
.PP
|
|
Packit Service |
8f0814 |
The following recipe (a) spam\-bins anything that
|
|
Packit Service |
8f0814 |
bogofilter
|
|
Packit Service |
8f0814 |
rates as spam, (b) registers the words in messages rated as spam as such, and (c) registers the words in messages rated as non\-spam as such\&. With this in place, it will normally only be necessary for the user to intervene (with
|
|
Packit Service |
8f0814 |
\fB\-Ns\fR
|
|
Packit Service |
8f0814 |
or
|
|
Packit Service |
8f0814 |
\fB\-Sn\fR) when
|
|
Packit Service |
8f0814 |
bogofilter
|
|
Packit Service |
8f0814 |
miscategorizes something\&.
.sp
.if n \{\
.RS 4
.\}
.nf

# filter mail through bogofilter, tagging it as Ham, Spam, or Unsure,
# and updating the wordlist

:0fw
| bogofilter \-u \-e \-p


# if bogofilter failed, return the mail to the queue;
# the MTA will retry delivery later
# 75 is the value for EX_TEMPFAIL in /usr/include/sysexits\&.h

:0e
{ EXITCODE=75 HOST }


# file the mail to spam\-bogofilter if it\*(Aqs spam\&.

:0:
* ^X\-Bogosity: Spam, tests=bogofilter
spam\-bogofilter

# file the mail to unsure\-bogofilter
# if it\*(Aqs neither ham nor spam\&.

:0:
* ^X\-Bogosity: Unsure, tests=bogofilter
unsure\-bogofilter

# With this recipe, you can train bogofilter starting with an empty
# wordlist\&. Be sure to check your unsure\-folder regularly, take the
# messages out of it, classify them as ham (or spam), and use them to
# train bogofilter\&.

.fi
.if n \{\
.RE
.\}
.PP
The following procmail rule will take mail on stdin and save it to file
spam
if
bogofilter
thinks it\*(Aqs spam:
.sp
.if n \{\
.RS 4
.\}
.nf
:0HB:
* ? bogofilter
spam
.fi
.if n \{\
.RE
.\}
.sp
and this similar rule will also register the tokens in the mail according to the
bogofilter
classification:
.sp
.if n \{\
.RS 4
.\}
.nf
:0HB:
* ? bogofilter \-u
spam
.fi
.if n \{\
.RE
.\}
.PP
If
bogofilter
fails (returning 3), the message will be treated as non\-spam\&.
.PP
This one is for
maildrop; it automatically defers the mail and retries later when the
xfilter
command fails\&. Use this in your
~/\&.mailfilter:
.sp
.if n \{\
.RS 4
.\}
.nf
xfilter "bogofilter \-u \-e \-p"
if (/^X\-Bogosity: Spam, tests=bogofilter/)
{
to "spam\-bogofilter"
}
.fi
.if n \{\
.RE
.\}
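.PP
The procmail and maildrop recipes above route mail by matching the X\-Bogosity header that
bogofilter
adds in
\fB\-p\fR
mode\&. As a sketch of the same test in plain shell, the classification word can be read from the header section of a filtered message on stdin\&. The function name
bogosity_of
is hypothetical, and the header layout is assumed to match the one shown in the recipes:
.sp
.if n \{\
.RS 4
.\}
.nf
```shell
# Print the classification word (for example Spam, Ham, or Unsure)
# found in the first X-Bogosity header of the message on stdin.
# Assumes the header layout shown in the recipes: "X-Bogosity: Spam, ...".
bogosity_of() {
    awk -F'[:, ]+' '
        /^$/ { exit }                       # blank line ends the headers
        /^X-Bogosity:/ { print $2; exit }
    '
}

# Possible usage (needs bogofilter and a message file):
#   bogofilter -p -e < message | bogosity_of
```
.fi
.if n \{\
.RE
.\}
.PP
Stopping at the first blank line keeps a quoted X\-Bogosity line in the message body from being mistaken for the real header\&.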
.PP
The following
\&.muttrc
lines will create mutt macros for dispatching mail to
bogofilter\&.
.sp
.if n \{\
.RS 4
.\}
.nf
macro index d "<enter\-command>unset wait_key\en\e
<pipe\-entry>bogofilter \-n\en\e
<enter\-command>set wait_key\en\e
<delete\-message>" "delete message as non\-spam"
macro index \eed "<enter\-command>unset wait_key\en\e
<pipe\-entry>bogofilter \-s\en\e
<enter\-command>set wait_key\en\e
<delete\-message>" "delete message as spam"
.fi
.if n \{\
.RE
.\}
.PP
Integration with Mail Transport Agent (MTA)
.PP
bogofilter
can also be integrated into an MTA to filter all incoming mail\&. While the specific implementation is MTA\-dependent, the general steps are as follows:
.sp
.RS 4
.ie n \{\
\h'-04' 1.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 1." 4.2
.\}
Install
bogofilter
on the mail server\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 2.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 2." 4.2
.\}
Prime the
bogofilter
databases with a spam and non\-spam corpus\&. Since
bogofilter
will be serving a larger community, it is important to prime it with a representative set of messages\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 3.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 3." 4.2
.\}
Set up the MTA to invoke
bogofilter
on each message\&. While this is an MTA\-specific step, you\*(Aqll probably need to use the
\fB\-p\fR,
\fB\-u\fR, and
\fB\-e\fR
options\&.
.RE
.sp
.RS 4
.ie n \{\
\h'-04' 4.\h'+01'\c
.\}
.el \{\
.sp -1
.IP " 4." 4.2
.\}
Set up a mechanism for users to register spam/non\-spam messages, as well as to correct misclassifications\&. The most generic solution is to set up alias email addresses to which users bounce messages\&.
.RE
.PP
See the
doc
and
contrib
directories for more information\&.
.PP
Use of R to verify
bogofilter\*(Aqs calculations
.PP
The
\fB\-R\fR
option tells
bogofilter
to generate an R data frame\&. The data frame contains one row per token analyzed\&. Each such row contains the token, the sum of its database "good" and "spam" counts, the "good" count divided by the number of non\-spam messages used to create the training database, the "spam" count divided by the spam message count, Robinson\*(Aqs f(w) for the token, the natural logs of (1 \- f(w)) and f(w), and an indicator character (+ if the token\*(Aqs f(w) value exceeded the minimum deviation from 0\&.5, \- if it didn\*(Aqt)\&. There is one additional row at the end of the table that contains a label in the token field, followed by the number of words actually used (the ones with + indicators), Robinson\*(Aqs P, Q, S, s and x values and the minimum deviation\&.
.PP
The R data frame can be saved to a file and later read into an R session (see
\m[blue]\fBthe R project website\fR\m[]\&\s-2\u[5]\d\s+2
for information about the mathematics package R)\&. Provided with the
bogofilter
distribution is a simple R script (file bogo\&.R) that can be used to verify
bogofilter\*(Aqs calculations\&. Instructions for its use are included in the script in the form of comments\&.
.SH "LOG MESSAGES"
.PP
Bogofilter
writes messages to the system log when the
\fB\-l\fR
option is used\&. What is written depends on which other flags are used\&.
.PP
A classification run will generate (we are not showing the date and host part here):
.sp
.if n \{\
.RS 4
.\}
.nf
bogofilter[1412]: X\-Bogosity: Ham, spamicity=0\&.000227
bogofilter[1415]: X\-Bogosity: Spam, spamicity=0\&.998918
.fi
.if n \{\
.RE
.\}
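.PP
Because each classification line carries the score as spamicity=value, the score can be pulled out of the log with standard tools\&. A minimal sketch, assuming log lines shaped like the samples above (the function name
spamicity_of
is hypothetical, and where the system log is stored varies by system):
.sp
.if n \{\
.RS 4
.\}
.nf
```shell
# Print the spamicity value from each log line on stdin that has one.
# Assumes the "spamicity=<number>" form shown in the sample log lines.
spamicity_of() {
    awk -F'spamicity=' 'NF > 1 { split($2, rest, ","); print rest[1] }'
}

# Possible usage (the log file name is an assumption):
#   grep bogofilter /var/log/maillog | spamicity_of
```
.fi
.if n \{\
.RE
.\}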
.PP
Using
\fB\-u\fR
to classify a message and update a wordlist will produce (on a single line):
.sp
.if n \{\
.RS 4
.\}
.nf
bogofilter[1426]: X\-Bogosity: Spam, spamicity=0\&.998918,
register \-s, 329 words, 1 messages

.fi
.if n \{\
.RE
.\}
.PP
Registering words (\fB\-l\fR
and
\fB\-s\fR,
\fB\-n\fR,
\fB\-S\fR, or
\fB\-N\fR) will produce:
.sp
.if n \{\
.RS 4
.\}
.nf
bogofilter[1440]: register\-n, 255 words, 1 messages
.fi
.if n \{\
.RE
.\}
.PP
A registration run (using
\fB\-s\fR,
\fB\-n\fR,
\fB\-N\fR, or
\fB\-S\fR) will generate messages like:
.sp
.if n \{\
.RS 4
.\}
.nf
bogofilter[17330]: register\-n, 574 words, 3 messages
bogofilter[6244]: register\-s, 1273 words, 4 messages
.fi
.if n \{\
.RE
.\}
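.PP
Registration lines like these can be totalled per registration type\&. The following is only a sketch: the function name
register_totals
is hypothetical, and it assumes every line has the "register\-X, W words, M messages" form of the samples above\&. It searches each line for that field, so a leading date and host part does not matter:
.sp
.if n \{\
.RS 4
.\}
.nf
```shell
# Sum the words and messages registered per type (n, s, N, S) from
# log lines on stdin. Assumes the "register-X, W words, M messages"
# form shown in the samples; other lines are ignored.
register_totals() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^register-[nsNS],$/) {
                kind = substr($i, 10, 1)      # the letter after "register-"
                words[kind] += $(i + 1)
                msgs[kind] += $(i + 3)
            }
    }
    END { for (k in words) print k, words[k], msgs[k] }' | sort
}
```
.fi
.if n \{\
.RE
.\}
.PP
Each output line holds the type letter, the total word count, and the total message count\&.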
.SH "FILES"
.PP
@sysconfdir@/bogofilter\&.cf
.RS 4
System configuration file\&.
.RE
.PP
~/\&.bogofilter\&.cf
.RS 4
User configuration file\&.
.RE
.PP
~/\&.bogofilter/wordlist\&.db
.RS 4
Combined list of good and spam tokens\&.
.RE
.SH "AUTHOR"
.sp
.if n \{\
.RS 4
.\}
.nf
Eric S\&. Raymond <esr@thyrsus\&.com>\&.
David Relson <relson@osagesoftware\&.com>\&.
Matthias Andree <matthias\&.andree@gmx\&.de>\&.
Greg Louis <glouis@dynamicro\&.on\&.ca>\&.
.fi
.if n \{\
.RE
.\}
.PP
For updates, see the
\m[blue]\fBbogofilter project page\fR\m[]\&\s-2\u[6]\d\s+2\&.
.SH "SEE ALSO"
.PP
bogolexer(1), bogotune(1), bogoupgrade(1), bogoutil(1)
.SH "NOTES"
.IP " 1." 4
A Plan For Spam
.RS 4
\%http://www.paulgraham.com/spam.html
.RE
.IP " 2." 4
Spam Detection
.RS 4
\%http://radio-weblogs.com/0101454/stories/2002/09/16/spamDetection.html
.RE
.IP " 3." 4
A Statistical Approach to the Spam Problem
.RS 4
\%http://www.linuxjournal.com/article/6467
.RE
.IP " 4." 4
Another improvement
.RS 4
\%http://www.garyrobinson.net/2004/04/improved%5fchi.html
.RE
.IP " 5." 4
the R project website
.RS 4
\%http://cran.r-project.org/
.RE
.IP " 6." 4
bogofilter project page
.RS 4
\%http://bogofilter.sourceforge.net/
.RE