                GETTING STARTED

Summary:

    0. Terminology
    1. Installing Bogofilter
    2. Preparing for use
       a. Configuring bogofilter
       b. Training bogofilter
    3. Setting up the mail transfer and delivery agents
    4. Use with mail user agent
    5. Ongoing training
    6. Tuning bogofilter
    7. The bogoutil program
    8. Other useful commands
    9. Additional information


0. Terminology
--------------

    spam           - unwanted email
    ham            - wanted mail (also called non-spam)
    false positive - a ham message that is wrongly scored as spam
    false negative - a spam message that is wrongly scored as ham


1. Installing Bogofilter
------------------------

    Bogofilter can be installed from source or from a binary package.
    Releases are made available on SourceForge.net.

    If you're a newbie, installing from a binary package is quickest
    and easiest.  If you're running an rpm-based distro like Fedora or
    OpenSUSE, install bogofilter from an rpm.  Similarly, if you're
    running Debian, Mint, or Ubuntu, install from a deb package.

    To install from source, download and untar the release, then build
    and install with the usual commands, i.e. "configure", "make", and
    "make install".  To ensure that the newly built bogofilter is
    running properly on your hardware and operating system, use
    "make check" to run a series of tests.

    For source rpms, use "rpm -bb bogofilter.spec" and
    "rpm -ivh bogofilter" (or comparable commands).

    Binary formats include builds for dynamically linked (shared)
    libraries, e.g. bogofilter-VER.x86_64.rpm.

    See the INSTALL file for more info.


2. Preparing for use
--------------------

    Once bogofilter has been installed, it needs to be configured and
    trained, i.e. given messages that you classify as spam and ham.


2a. Configuring bogofilter
--------------------------

    Bogofilter's default configuration is conservative, i.e. only
    messages that score very high on the ham/spam scale are classified
    as spam.  This is done to minimize the number of false positives
    (non-spam messages which are classified as spam).

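    To illustrate what "conservative" means here, the following
    minimal sketch labels a message as spam only when its score
    reaches a high cutoff.  It is not bogofilter's code: the function
    name classify, the 0.99 cutoff, and the sample scores are all
    assumptions made for illustration only.

```python
# Illustrative sketch only; not bogofilter's implementation.
# The cutoff value and the sample scores are assumptions.


def classify(score: float, cutoff: float = 0.99) -> str:
    """Return "spam" only if score reaches the cutoff, else "ham"."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    return "spam" if score >= cutoff else "ham"


if __name__ == "__main__":
    # A borderline score stays ham; only a very high score is spam.
    for score in (0.10, 0.90, 0.995):
        print(f"{score:.3f} -> {classify(score)}")
```

    With a high cutoff, a borderline score such as 0.90 is still
    treated as ham.  This accepts some false negatives in exchange
    for fewer false positives.
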
    If you need (or wish) to change bogofilter's configuration
    options, the file is named "bogofilter.cf".  Bogofilter first
    checks for /etc/bogofilter.cf and then for
    ~/.bogofilter/bogofilter.cf.  The configuration options are
    described in the file bogofilter.cf.example.


2b. Training bogofilter
-----------------------

    Bogofilter uses a database for storing its tokens and their ham
    and spam counts.  The file is commonly called "the wordlist" and
    its standard location is ~/.bogofilter/wordlist.db.

    The simple rule when training bogofilter is "more is better".

    As distributed, bogofilter does not include a wordlist.  You, the
    user, need to tell bogofilter what you consider spam and what you
    consider ham.  This is bogofilter's training process and involves
    running bogofilter with appropriate flags and with messages you've
    determined are ham and spam.

    As bogofilter can work with multiple mail formats, e.g. mailboxes,
    maildirs, MH directories, etc., the training commands will depend
    on your environment.

    Since the default wordlist directory is $HOME/.bogofilter, the
    wordlist itself will be in $HOME/.bogofilter/wordlist.db.  For
    user john, this is /home/john/.bogofilter/wordlist.db.

    Some useful options for training include:

        -s - register message(s) as spam.
        -n - register message(s) as non-spam.
        -M - use mailbox mode, i.e. classify multiple messages in an
             mbox-formatted file.
        -B file1, file2, ... - set bulk mode, i.e. process multiple
             messages (files or directories) named on the command
             line.
        -v - set the verbosity level.  With the -s and -n training
             options, this reports the number of messages read and
             words entered in wordlist.db.

    These options are documented in the bogofilter man page.

    Here are some sample commands:

        bogofilter -vn < ham.message.file
        bogofilter -vnM