GETTING STARTED

Summary:

    0. Terminology
    1. Installing Bogofilter
    2. Preparing for use
       a. Configuring bogofilter
       b. Training bogofilter
    3. Setting up the mail transfer and delivery agents
    4. Use with mail user agent
    5. Ongoing training
    6. Tuning bogofilter
    7. The bogoutil program
    8. Other useful commands
    9. Additional information

0. Terminology
--------------

spam           - unwanted email
ham            - wanted mail (also called non-spam)
false positive - a ham message that is wrongly scored as spam
false negative - a spam message that is wrongly scored as ham

1. Installing Bogofilter
------------------------

Bogofilter can be installed from source or from a binary package.
Releases are made available on SourceForge.net.

If you're a newbie, installing from a binary package is quickest
and easiest. If you're running an rpm-based distro like Fedora
or openSUSE, install bogofilter from an rpm. Similarly, if you're
running Debian, Mint, or Ubuntu, install from a deb package.

To install from source, download and untar the release, then build
and install it with the usual commands, i.e. "./configure", "make",
and "make install". To ensure that the newly built bogofilter runs
properly on your hardware and operating system, use "make check"
to run a series of tests.
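
The source build steps can be collected into one small function. This is a sketch, assuming a release tarball named like bogofilter-VER.tar.gz (VER is a placeholder, as elsewhere in this document) that unpacks into a directory of the same name; the function name build_bogofilter is hypothetical.

```shell
#!/bin/sh
# Sketch: unpack, build, test, and install a bogofilter release.
# Assumes bogofilter-VER.tar.gz unpacks into bogofilter-VER/ in the
# current directory (VER is a placeholder for the release number).
build_bogofilter() {
    tarball=$1
    srcdir=$(basename "$tarball" .tar.gz)

    tar xzf "$tarball" || return 1
    # Build in a subshell so the caller's working directory is unchanged.
    (
        cd "$srcdir" || exit 1
        ./configure &&
        make &&
        make check &&      # run the test suite before installing
        make install       # a system-wide install usually needs root
    )
}
```

For example: build_bogofilter bogofilter-VER.tar.gz. The && chain stops at the first failing step, so a failed "make check" prevents installation.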

For source rpms, use "rpmbuild -bb bogofilter.spec" (older versions
of rpm accepted "rpm -bb") and then "rpm -ivh" with the resulting
binary package (or comparable commands).

Binary packages include dynamically linked builds (which use shared
libraries), e.g. bogofilter-VER.x86_64.rpm.

See the INSTALL file for more info.

2. Preparing for use
--------------------

Once bogofilter has been installed, it needs to be configured and
trained, i.e. given messages that you classify as spam and ham.

2a. Configuring bogofilter
--------------------------

Bogofilter's default configuration is conservative, i.e. only
messages that score very high on the ham/spam scale are classified
as spam. This is done to minimize the number of false positives
(non-spam messages which are classified as spam).

If you need (or wish) to change bogofilter's configuration options,
put them in a file named "bogofilter.cf". Bogofilter first checks
for /etc/bogofilter.cf and then for ~/.bogofilter/bogofilter.cf.
The configuration options are described in the file
bogofilter.cf.example.
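
A minimal sketch of creating a per-user configuration file. It assumes the key=value syntax used in bogofilter.cf.example and sets the spam_cutoff and ham_cutoff options; the function name write_user_config and the cutoff values are illustrative only, not recommendations. Check bogofilter.cf.example for the authoritative option list and defaults.

```shell
# Sketch: create a per-user bogofilter.cf if none exists yet.
# The cutoff values are illustrative; see bogofilter.cf.example.
write_user_config() {
    dir=${1:-"$HOME/.bogofilter"}
    cf="$dir/bogofilter.cf"

    mkdir -p "$dir" || return 1
    if [ -e "$cf" ]; then
        # Never clobber an existing configuration.
        echo "$cf already exists; not overwriting" >&2
        return 1
    fi
    cat > "$cf" <<'EOF'
# Lower score limit for Spam.
spam_cutoff=0.99
# Upper score limit for Ham; scores between the two cutoffs
# are reported as Unsure.
ham_cutoff=0.45
EOF
}
```

Called with no argument it writes ~/.bogofilter/bogofilter.cf; passing a directory is handy for trying a configuration out before using it.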

2b. Training bogofilter
-----------------------

Bogofilter uses a database for storing its tokens and their ham
and spam counts. The file is commonly called "the wordlist" and
its standard location is ~/.bogofilter/wordlist.db.

The simple rule when training bogofilter is "more is better".

As distributed, bogofilter does not include a wordlist. You, the
user, need to tell bogofilter what you consider spam and what you
consider ham. This is bogofilter's training process: it involves
running bogofilter with the appropriate flags on messages you've
determined are ham and spam. As bogofilter can work with multiple
mail formats, e.g. mailboxes, maildirs, MH directories, etc., the
training commands will depend on your environment.

As the default wordlist directory is $HOME/.bogofilter, the
wordlist itself will be in $HOME/.bogofilter/wordlist.db. For
user john, this is /home/john/.bogofilter/wordlist.db.

Some useful options for training include:

    -s  - register message(s) as spam.
    -n  - register message(s) as non-spam.
    -M  - use mailbox mode, i.e. process multiple messages in an
          mbox formatted file.
    -B file1 file2 ...
        - set bulk mode, i.e. process multiple messages (files or
          directories) named on the command line.
    -v  - increase the verbosity level. With the -s and -n training
          options, this reports the number of messages read and
          words entered into wordlist.db.

These options are documented in the bogofilter man page.

Here are some sample commands:

    bogofilter -vn < ham.message.file
    bogofilter -vnM
    bogofilter -vnMB ham.maildir

    bogofilter -vs < spam.message.file
    bogofilter -vsM
    bogofilter -vsMB spam.maildir
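
The sample commands can be wrapped in a small script for the first training run. This is a sketch, assuming one ham source and one spam source, each either a single mbox file or a maildir; it uses only the -v, -n, -s, -M and -B options listed above (directories get the same "-vnMB"/"-vsMB" treatment as the samples), and the helper names train_source and train_initial are hypothetical.

```shell
# Sketch: initial training from one ham and one spam source.
# Each source may be an mbox file or a maildir directory.
train_source() {
    flag=$1    # -n for ham, -s for spam
    src=$2
    if [ -d "$src" ]; then
        # bulk mode: the directory is named on the command line
        bogofilter -v "$flag" -M -B "$src"
    elif [ -f "$src" ]; then
        # mailbox mode: the mbox file is read from standard input
        bogofilter -v "$flag" -M < "$src"
    else
        echo "no such file or directory: $src" >&2
        return 1
    fi
}

train_initial() {
    command -v bogofilter >/dev/null ||
        { echo "bogofilter not found in PATH" >&2; return 1; }
    train_source -n "$1" && train_source -s "$2"
}
```

For example: train_initial ham.maildir spam.mbox. Training stops if the ham step fails, so a typo in one path does not leave the wordlist trained on spam only.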

3. Setting up the mail transfer and delivery agents
---------------------------------------------------

Bogofilter works with many mail transfer agents (such as postfix,
sendmail, and qmail) and many mail delivery agents (for example
procmail and maildrop). Each of these has its own configuration
file and methods for invoking spam filters. Bogofilter's
documentation includes the files "integrating-with-postfix" and
"integrating-with-qmail". Read them for ideas on how to set up
bogofilter for your environment.

The most common setup uses bogofilter's "-p" (passthrough) option,
which adds an "X-Bogosity:" line at the end of the message's mail
header. Typical examples of this line are:

(for spam)
    X-Bogosity: Spam, tests=bogofilter, spamicity=1.000000, version=0.92.8
    X-Bogosity: Spam, tests=bogofilter, spamicity=0.999765, version=0.92.8

(for non-spam)
    X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=0.92.8
    X-Bogosity: Ham, tests=bogofilter, spamicity=0.000413, version=0.92.8
    X-Bogosity: Ham, tests=bogofilter, spamicity=0.373476, version=0.92.8

(for "unsures")
    X-Bogosity: Unsure, tests=bogofilter, spamicity=0.500332, version=0.92.8
    X-Bogosity: Unsure, tests=bogofilter, spamicity=0.463498, version=0.92.8
    X-Bogosity: Unsure, tests=bogofilter, spamicity=0.640426, version=0.92.8
    X-Bogosity: Unsure, tests=bogofilter, spamicity=0.824933, version=0.92.8
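
Because the X-Bogosity line has a fixed shape, a script can pull out the verdict and the score. A minimal sketch using sed, assuming the header is on a single line as in the examples above (folded headers are not handled); the function name bogosity is hypothetical.

```shell
# Sketch: print the classification and spamicity from a message's
# X-Bogosity header line, e.g. "Spam 0.999765".
# Only the header (up to the first empty line) is examined, so an
# X-Bogosity line quoted in the message body is ignored.
bogosity() {
    sed -n -e '/^$/q' \
        -e 's/^X-Bogosity: *\([A-Za-z]*\),.*spamicity=\([0-9.]*\).*/\1 \2/p' \
        "$1"
}
```

For example, bogosity message.file prints "Unsure 0.640426" for a message carrying the third "unsure" line above, and prints nothing if the message has no X-Bogosity header.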

Alternatively, procmail (or maildrop) rules can use bogofilter's
return codes to put spam in one mailbox and ham in another. The
return codes are 0 for spam, 1 for ham, 2 for unsure, and 3 for
I/O or other errors.
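
The return codes make it easy to sort mail from a script. A minimal sketch in sh, assuming bogofilter is run without -p (so the codes are 0 for spam, 1 for ham, 2 for unsure, 3 for errors); the function name folder_for and the folder names are hypothetical, and a real procmail or maildrop rule would express the same branching in that agent's own syntax.

```shell
# Sketch: classify one message by bogofilter's return code and
# print the name of the folder it should go to.
folder_for() {
    bogofilter < "$1" > /dev/null
    case $? in
        0) echo spam ;;      # classified as spam
        1) echo inbox ;;     # classified as ham
        2) echo unsure ;;    # neither clearly ham nor clearly spam
        *) echo "bogofilter failed on $1" >&2; return 1 ;;
    esac
}
```

Keeping "unsure" separate from "spam" matters for the ongoing training described below.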

4. Use with mail user agent
---------------------------

Bogofilter can be used with any mail user agent. MUAs with
filtering abilities can check the headers for "X-Bogosity: Spam"
and "X-Bogosity: Ham" and take the appropriate actions for spam and
ham.

Alternatively, if your MUA has sufficient scripting capabilities,
the MUA can run bogofilter and take the appropriate action.

As time goes by and bogofilter encounters messages that it cannot
classify with certainty, some messages will be classified as
"Unsure". As these messages are in the "gray" area, meaning "not
clearly ham and not clearly spam", it's useful to have your MUA
filter them into a separate folder (or mailbox) so you can use
them to train bogofilter.
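
If your MUA cannot filter on headers, the Unsure messages can also be found from the shell. A sketch, assuming mail is stored in a maildir (with the usual cur/ and new/ subdirectories) and that bogofilter ran with -p so each message carries an X-Bogosity line; list_unsure is a hypothetical name.

```shell
# Sketch: list the messages in a maildir whose X-Bogosity header
# says "Unsure", so they can be reviewed and sorted for training.
list_unsure() {
    for f in "$1"/cur/* "$1"/new/*; do
        [ -f "$f" ] || continue     # skip an unmatched glob pattern
        # look only at the header, which ends at the first empty line
        if [ -n "$(sed -n -e '/^$/q' -e '/^X-Bogosity: Unsure,/p' "$f")" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, list_unsure ~/Maildir prints one path per Unsure message.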

5. Ongoing training
-------------------

Bogofilter can only do a good job if it has accurate and
comprehensive information in its wordlist.

As time goes by and bogofilter classifies messages for you, it
will make mistakes because it does not have enough information
to correctly classify each and every message. It's important to
check message classifications!

"False negatives", i.e. spam classified as ham, are easy to spot
since they appear in your inbox. "False positives", i.e. ham
classified as spam, are important to find because they're messages
you want! All messages in these groups should be used to train
bogofilter.

Filtering "Unsure" messages into a separate folder (or mailbox),
and manually classifying and separating them into spam and ham,
gives a good set of messages for training (using bogofilter's "-s"
and "-n" flags).
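
Once messages have been sorted by hand, they can be registered with -s and -n. A sketch, assuming a hypothetical layout where each message is a separate file under spam/ and ham/ subdirectories; moving registered files aside is a design choice so the same message is never counted twice, and treating return code 3 as failure assumes the documented error code also applies when registering. The function name train_sorted is hypothetical.

```shell
# Sketch: register hand-sorted messages (one file per message):
#   $1/spam/*  messages you judged to be spam
#   $1/ham/*   messages you judged to be ham
# Registered files are moved to $1/trained/spam or $1/trained/ham.
train_sorted() {
    base=$1
    for kind in spam ham; do
        case $kind in
            spam) flag=-s ;;
            ham)  flag=-n ;;
        esac
        mkdir -p "$base/trained/$kind" || return 1
        for f in "$base/$kind"/*; do
            [ -f "$f" ] || continue         # skip an unmatched glob
            bogofilter "$flag" < "$f"
            if [ $? -eq 3 ]; then           # 3 = I/O or other error
                echo "training failed for $f" >&2
                continue                    # leave it for another try
            fi
            # Move the message aside so it is never registered twice.
            mv "$f" "$base/trained/$kind/" || return 1
        done
    done
    return 0
}
```

A message that fails to register stays in place, so rerunning train_sorted only retries the failures.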

Bogofilter's FAQ has two entries that provide additional info:

    How do I start my bogofilter training?
    What are "training on error" and "training to exhaustion"?

The FAQ is available online in English and French at:

    http://bogofilter.sourceforge.net/bogofilter-faq.html
    http://bogofilter.sourceforge.net/bogofilter-faq-fr.html

6. Tuning bogofilter
--------------------

Once you've used bogofilter for a while, you may wish to optimize
its classification parameters. The bogotune utility uses your
wordlist and additional ham and spam messages to check a large
variety of possible parameter values and find the ones that work
best for your environment. For more info, read the bogotune man
page and the file bogofilter-tuning.HOWTO.html.
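
A sketch of a bogotune run on messages that were not used for training. ASSUMPTION: bogotune takes the wordlist directory with -d and the ham and spam message files with -n and -s; check the bogotune man page for your version before relying on this. The function name run_bogotune is hypothetical.

```shell
# Sketch: run bogotune on held-out ham and spam mailboxes.
# ASSUMPTION: -d (wordlist directory), -n (ham files) and -s (spam
# files); verify against the bogotune man page for your version.
run_bogotune() {
    ham=$1
    spam=$2
    dir=${3:-"$HOME/.bogofilter"}
    for f in "$ham" "$spam"; do
        [ -r "$f" ] || { echo "cannot read $f" >&2; return 1; }
    done
    bogotune -d "$dir" -n "$ham" -s "$spam"
}
```

Using messages that were never registered keeps the tuning honest: messages already in the wordlist would score unrealistically well.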

7. The bogoutil program
-----------------------

Bogoutil is a program that can dump the wordlist (as a text file),
load the wordlist (from a text file), display information about
individual words, and more.

Here are some sample uses of it:

To display the wordlist contents:
    bogoutil -d ~/.bogofilter/wordlist.db

To display the ham and spam counts for a word (the special token
.MSG_COUNT holds the number of messages registered as ham and spam):
    bogoutil -w ~/.bogofilter .MSG_COUNT
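
Dumping to text is also a convenient way to back up the wordlist or move it to another machine. A sketch, assuming bogoutil's -l option loads a wordlist from a dump read on standard input (check the bogoutil man page); the function names backup_wordlist and restore_wordlist are hypothetical.

```shell
# Sketch: back up the wordlist as text and restore it later.
# -d (dump) is documented above; -l (load from standard input)
# is an assumption to verify against the bogoutil man page.
backup_wordlist() {
    db=$1
    out=$2
    # Dump to a temporary file first so a failed dump never
    # overwrites a good backup.
    if bogoutil -d "$db" > "$out.tmp"; then
        mv "$out.tmp" "$out"
    else
        rm -f "$out.tmp"
        return 1
    fi
}

restore_wordlist() {
    db=$1
    src=$2
    if [ -e "$db" ]; then
        echo "$db exists; move it aside before restoring" >&2
        return 1
    fi
    bogoutil -l "$db" < "$src"
}
```

For example: backup_wordlist ~/.bogofilter/wordlist.db wordlist.txt, and later restore_wordlist ~/.bogofilter/wordlist.db wordlist.txt after moving the old database aside.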

8. Other useful commands
------------------------

To test scoring of individual words:

    echo show these words | bogofilter -H -vvv
or:
    bogoutil -p ~/.bogofilter show these words

To see the tokens and their spamicity scores for a message:

    bogofilter -vvv < message

9. Additional information
-------------------------

Bogofilter's distribution includes a number of files containing
more information. You'll find them in /usr/share/doc (or a
comparable location). The following files are included:

FAQs:

    English - bogofilter-faq.html
    French  - bogofilter-faq-fr.html

General:

    INSTALL
    NEWS
    README
    RELEASE.NOTES

Man pages:

    bogofilter
    bogolexer
    bogoutil
    bogotune
    bogoupgrade
    (also distributed in html and xml formats)

HOWTOs:

    bogofilter-tuning.HOWTO.html
    integrating-with-postfix
    integrating-with-qmail

Operating System specific README files:

    README.freebsd
    README.hp-ux
    README.RISC-OS