WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
------------------------------------------------------------------------
POTENTIAL FOR DATA CORRUPTION DURING UPDATES

If you plan to upgrade your database library, if only as a side effect
of an operating system upgrade, DO HEED the relevant documentation, for
instance, the doc/README.db file. You may need to prepare the upgrade
with the old version of the software.

Otherwise, you may cause irrecoverable damage to your databases.

DO back up your databases before making the upgrade.
------------------------------------------------------------------------
WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING



This file documents changes in bogofilter since version 0.11. In
particular it describes: (1) Features, which are significant changes
(noteworthy and compatible) and (2) Incompatibilities, which are
changes that require action upon update.

Caution: If upgrading from an old version and skipping several
intervening versions of bogofilter, be smart and check all the
changes of the versions you skipped! In particular, read the sections
labeled "Incompat" and "Major".

NOTE: the NEWS document has greater detail on some of these changes.

------------------------------------------------------------------------
[Major 1.2.5] Kyoto Cabinet and LMDB support added.

Bogofilter, as of release 1.2.5, supports:
+ Kyoto Cabinet databases, courtesy of Denny Lin. The Kyoto Cabinet database
  is written and maintained by the same author as the Tokyo Cabinet database,
  who recommends using Kyoto Cabinet instead of Tokyo Cabinet.

+ LMDB databases (Lightning Memory-Mapped Database Manager), courteously
  implemented and contributed by Steffen Nurpmeso.

------------------------------------------------------------------------
[Major 1.1.6] Tokyo Cabinet support (B+-trees with transactions) added

Bogofilter, as of release 1.1.6, supports Tokyo Cabinet databases,
courtesy of Pierre Habouzit. Tokyo Cabinet is the successor to QDBM,
with support for larger files, and was also written by Mikio Hirabayashi.

For new installations, if you considered using QDBM, consider using
Tokyo Cabinet instead.

-----------------------------------------------------------------------
[Incompat 0.96.0] TDB removed

Support for the TDB database library has been removed.

------------------------------------------------------------------------
[Incompat 0.95.2] Applies to Berkeley DB Transactional ONLY:

This release gives up on locking the databases at page granularity and
locks whole environments, to overcome lock sizing requirements which are
a major issue in unattended setups.

This, however, means that a writer (token registration) will lock out
readers (message scoring) and readers will prevent new writers from
starting. This may be fixed in a future version.

------------------------------------------------------------------------
[Major 0.95.0] Unicode in UTF-8

This release supports Unicode (UTF-8). A new meta-token .ENCODING has
been added to the wordlist so that bogofilter can determine if it's
using Unicode or not. A value of 1 indicates raw storage and 2
indicates UTF-8 encoded tokens. Bogofilter checks for this meta-token
and converts incoming text to UTF-8 as appropriate.

Command line options "--unicode=yes" and "--unicode=no" can be used.

  - With bogofilter, they control encoding of newly created databases.
  - With bogoutil, --unicode=yes converts the wordlist to Unicode.
  - For bogolexer, they print parser results in new and old modes.

./configure options allow bogofilter customization:

  - "./configure --unicode=yes" will _always_ operate in Unicode mode
  - "./configure --unicode=no" will _never_ operate in Unicode mode

Wordlists can be converted from raw storage to Unicode using the
commands below.
NOTE: Replace iso-8859-1 with the dominant character set and encoding
of your input tokens!

    bogoutil -d wordlist.db > wordlist.raw.txt
    iconv -f iso-8859-1 -t UTF-8 < wordlist.raw.txt > wordlist.UTF-8.txt
    bogoutil -l wordlist.db.new < wordlist.UTF-8.txt

For a wordlist containing tokens from multiple languages, particularly
non-European languages, the conversion method described above may not
work well for you. Building a new wordlist (from scratch) will likely
work better as the new wordlist will be based solely on Unicode.
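
For readers who prefer to script the middle (iconv) step, here is a
minimal Python sketch of the same re-encoding. The function name
recode_dump and its parameters are hypothetical; the sketch only
re-encodes bytes and makes no assumption about the dump's line format.

```python
def recode_dump(raw: bytes, source_charset: str = "iso-8859-1") -> bytes:
    """Re-encode a dumped wordlist from source_charset to UTF-8.

    Mirrors `iconv -f <source_charset> -t UTF-8`. Decoding is strict, so a
    wrong source_charset may raise UnicodeDecodeError rather than silently
    producing garbage (iso-8859-1 itself accepts every byte value).
    """
    return raw.decode(source_charset).encode("utf-8")
```

As with iconv, the result is only as good as the choice of source
character set; a mixed-charset wordlist will still convert poorly.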

------------------------------------------------------------------------
[Incompat 0.94.12] Changed Options

Some options have been added or modified. If you use any of the
changed options, you will probably need to modify your scripts,
procmail recipes, etc. As an example, some bogoutil options which
used to allow either filenames or directory names are now restricted
to filenames. See the man pages and help messages if you have
questions.

------------------------------------------------------------------------
[Incompat 0.94.0] Transactions

The transactional mode now defaults to off because the lock table sizing
issue is unresolved.

Bogofilter and bogoutil now support choosing, at both build time and
run time, whether to operate with (or without) transaction support.
They can also auto-detect whether you've been using transactions or not.

Run-time Selection:

For bogofilter and bogoutil, transactions can be enabled or disabled
in 2 ways -- by command line options or config file options.

Command line option "--db-transaction=yes" enables transactions and
"--db-transaction=no" disables them.

Config file options "db_transaction=yes" and "db_transaction=no"
have the same effect.

Auto-detection:

If none of the above methods is used to enable/disable transactions,
bogofilter and bogoutil will query Berkeley DB to see if a transaction
environment already exists. If so, transactions will be enabled. If
not, they will be disabled.
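
The selection order above can be sketched in Python as follows. This is
an illustration, not bogofilter's code: the function and parameter names
are hypothetical, and letting the command line win over the config file
is an assumption, since these notes do not state which takes precedence.

```python
from typing import Optional


def transactions_enabled(cli: Optional[bool],
                         config: Optional[bool],
                         env_exists: bool) -> bool:
    """Decide whether to run transactionally.

    cli / config are True, False, or None (option not given).
    env_exists says whether a transaction environment was found.
    """
    if cli is not None:        # --db-transaction=yes/no (assumed to win)
        return cli
    if config is not None:     # db_transaction=yes/no
        return config
    return env_exists          # auto-detection: follow what is on disk
```

For example, transactions_enabled(None, None, True) follows the
auto-detection rule and returns True.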

Compile-time selection:

A default build includes the run-time and auto-detect capabilities.
If you wish to minimize program size, ./configure can be used to
create single mode versions of bogofilter and bogoutil, i.e. programs
that only run transactionally or non-transactionally. Use
"./configure --enable-transactions" to enable transactions and
"./configure --disable-transactions" to disable them. These programs
will be _slightly_ smaller than the default build.

-----------------------------------------------------------------------
[Incompat 0.93] Summary for the hasty

YOU MUST ADJUST YOUR SCRIPTS EVALUATING "X-Bogosity" HEADERS!

YOU MAY NEED TO ADJUST YOUR SCRIPTS THAT PARSE 'bogofilter -V'!

WHEN USING BERKELEY DB (DEFAULT), NFS NO LONGER WORKS AND
YOU M U S T READ doc/README.db AND POSSIBLY CONFIGURE THE DATABASE!

-----------------------------------------------------------------------
[Incompat 0.93] Defaults changed

Bogofilter's defaults have been changed. It now operates in tri-state
mode and will classify messages as Spam, Ham, or Unsure.

If you're checking messages for "X-Bogosity: Yes" or "X-Bogosity: No",
you _need_ to change your checks. Use "X-Bogosity: Spam" and
"X-Bogosity: Ham" instead of the old forms. Also, checking for
"X-Bogosity: Unsure" and putting those messages in a separate folder (or
mailbox) will give you an excellent set of messages for training, as
"Unsure" messages are messages that bogofilter has too little
information to classify (with certainty) as spam or ham.
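
A script that used to test for "Yes"/"No" might be adapted along these
lines. This is a minimal Python sketch: the folder names and the
function name are hypothetical, and it assumes only that the verdict is
the first word after "X-Bogosity:", possibly followed by other text.

```python
def folder_for(header_line: str) -> str:
    """Map an X-Bogosity header line to a (hypothetical) folder name."""
    name, _, value = header_line.partition(":")
    if name.strip().lower() != "x-bogosity":
        raise ValueError("not an X-Bogosity header")
    words = value.replace(",", " ").split()
    verdict = words[0] if words else ""
    folders = {"Spam": "spam", "Ham": "inbox", "Unsure": "unsure"}
    # Anything unrecognized, including the old "Yes"/"No" forms, is routed
    # to the review folder instead of being silently treated as ham.
    return folders.get(verdict, "unsure")
```

Messages collected in the "unsure" folder are the ones worth sorting by
hand and registering for training.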

-----------------------------------------------------------------------
[Incompat 0.93] Berkeley DB switched to Transactional Data Store

Bogofilter will now use the Berkeley DB Transactional Data Store when
compiled with Berkeley DB as the database engine (the default).

This means the Berkeley DB directory can no longer reside on a networked
or otherwise shared file system (such as NFS, AFS, Coda).

When using BerkeleyDB 4.1 - 4.3, it is recommended that you dump and
load the databases to add checksums, for enhanced reliability. See
section 2.2 in doc/README.db for details.

This means that bogofilter programs now exhibit the A C I D traits:
changes are atomic (all-or-nothing); the database is always consistent;
changes are always isolated from each other; and all changes that are
acknowledged are durable.

Bogofilter can support multiple writers at the same time, mixed freely
with simultaneous readers, and the database will not be corrupted by
application or system crashes, except when the disk drive gets damaged.

Note that this requires that the operating system and disk drive
maintain proper write order on the disk, and that both be honest about
synchronous I/O completion.

Note also that this causes bogofilter to write additional "log" files
to its ~/.bogofilter (or other) home directory. The log files need to
be archived or deleted periodically.

For detailed instructions, be sure to _read_ doc/README.db and check the
BerkeleyDB documentation.

As a backwards compatibility option, for instance when space and I/O
bandwidth are tight, it is possible to use the old non-transactional,
non-concurrent Berkeley DB Data Store, which can only register messages
when there are NO scoring processes at all and which may not be able
to recover from application or system crashes.

These benefits are not available when bogofilter is compiled to use the
TDB or QDBM databases.

-----------------------------------------------------------------------
[Incompat 0.93] Berkeley DB version strings changed

Bogofilter will now return the BerkeleyDB's actual DB_VERSION_STRING
in the output of 'bogofilter -V'. The OLD format was:

    Database: BerkeleyDB (4.3.21)

The NEW format is:

    Database: Sleepycat Software: Berkeley DB 4.3.21: (November 8, 2004)

You may need to adjust your scripts.
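
A script that must cope with both formats could extract the version
number roughly like this. The regular expressions are a sketch derived
only from the two example lines above; the function name is
hypothetical and other Berkeley DB builds may print variations that
these patterns do not cover.

```python
import re
from typing import Optional

# OLD: "Database: BerkeleyDB (4.3.21)"
_OLD = re.compile(r"^Database: BerkeleyDB \((\d+(?:\.\d+)*)\)\s*$")
# NEW: "Database: Sleepycat Software: Berkeley DB 4.3.21: (November 8, 2004)"
_NEW = re.compile(r"^Database: .*Berkeley DB (\d+(?:\.\d+)*):")


def berkeley_db_version(line: str) -> Optional[str]:
    """Return the dotted version from a 'Database:' line, or None."""
    stripped = line.strip()
    for pattern in (_OLD, _NEW):
        match = pattern.match(stripped)
        if match:
            return match.group(1)
    return None
```

Returning None for unrecognized lines lets the caller decide whether an
unknown database back end is an error.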

-----------------------------------------------------------------------
[Incompat 0.93] QDBM database format changed to B+ trees

The QDBM database format has been changed from hash tables to B+
trees, i.e. from the Depot API to the Villa API. This results in
significantly better performance. Unfortunately, the two formats are
incompatible, so upgrading to 0.93 requires running a special command
to convert the database once:

    bogoQDBMupgrade wordlist.qdbm wordlist.tmp wordlist.qdbm.old

If this command prints nothing, everything has gone well and it
has left your old database in wordlist.qdbm.old.

NOTE: bogoQDBMupgrade needs qdbm-1.7.23 or newer to build.

-----------------------------------------------------------------------
[Incompat 0.93] Bogotune option parsing changes

In bogotune 0.93.2 and newer, you must repeat the -n or -s option as a
prefix for each mailbox.

Example:  bogotune -n good1 good2 -s bad1 bad2 ...
will be:  bogotune -n good1 -n good2 -s bad1 -s bad2
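
Wrapper scripts that build a bogotune command line can be updated with
a small helper such as the following Python sketch. The function name
is hypothetical, and it handles only -n, -s and mailbox names; any
other bogotune options would need to be passed through separately.

```python
from typing import List, Optional


def expand_mailbox_options(args: List[str]) -> List[str]:
    """Rewrite '-n a b -s c' as '-n a -n b -s c' (the 0.93.2+ form)."""
    expanded: List[str] = []
    current: Optional[str] = None
    for arg in args:
        if arg in ("-n", "-s"):
            current = arg                # remember which list we are in
        elif current is None:
            raise ValueError("mailbox %r given before -n or -s" % arg)
        else:
            expanded += [current, arg]   # repeat the option per mailbox
    return expanded
```

The helper is idempotent: arguments already in the new form come back
unchanged.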

-----------------------------------------------------------------------
[Major 0.93.3] SQLite 3.0.8 (and newer) is now supported. It isn't
nearly as fast as Berkeley DB but uses only one permanent and one
transient file (hence less maintenance work) and is supposed to be
proof against application and system crashes.

-----------------------------------------------------------------------
[Incompat 0.92]

The formatting parameters have changed:
    '%A' is now the message's IP address.
    '%I' is now the Message-ID.
    '%Q' is now the Queue-ID.

-----------------------------------------------------------------------
[Incompat 0.17] Support for --enable-deprecated-code (see the 0.16
release notes) has been removed. If you've run 0.16.X without that
switch, nothing changes for you.

Support for Berkeley DB 3.0 was removed in bogofilter 0.17.3
as a side effect of adding Concurrent Database support.

-----------------------------------------------------------------------
[Incompat 0.16] A number of features have been deprecated. The
relevant code is bracketed by "#ifdef ENABLE_DEPRECATED_CODE" and
"#endif" statements. The default build will not include the
deprecated features. For those who still need these features,
configure option "--enable-deprecated-code" exists to allow them to be
turned on.

THIS MAY REQUIRE MAJOR CHANGES TO YOUR CONFIGURATION OR SCRIPTS!

The following list is supposed to be complete. Let us know if we've
omitted anything. We shall try to provide workarounds and migration
paths whenever possible.

1) Scoring algorithms
---------------------

Bogofilter will support only the Robinson-Fisher algorithm, commonly
called the "Fisher algorithm". The Graham algorithm and Robinson
geometric-mean algorithm, a.k.a. Robinson algorithm, have been
deprecated.

2) Wordlist support
-------------------

Bogofilter will now support only the combined wordlist, i.e.
wordlist.db, which contains both the ham and spam counts for each token.
The older, separate wordlists (spamlist.db and goodlist.db) are no
longer supported.

The bogoupgrade program can still be used to merge the separate
databases for you. Type "bogoupgrade -d /your/wordlist/directory/".

Ignore lists, i.e. ignorelist.db, are also being deprecated. The ignore
list feature has never been thoroughly tested and is not used (as far as
we know).

3) Command line switches
------------------------

Bogofilter will no longer support the switches listed in this section.
If used, bogofilter will print an error message and exit.

Scoring related switches:

    -g - select Graham algorithm
    -r - select Robinson Geometric-Mean algorithm
    -f - select Robinson-Fisher algorithm

    see section 1 above

    -2 - set binary classification mode
    -3 - set ternary classification mode

    Bogofilter will use binary mode if ham_cutoff is zero and will use
    ternary mode (Yes, No, Unsure) if ham_cutoff is non-zero and less
    than spam_cutoff.
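
The mode rule above can be sketched in Python as follows. The mode
selection follows the text; the classify() helper goes further and
assumes how a score is compared against the cutoffs (inclusive
comparisons), which these notes do not specify. All names are
hypothetical.

```python
def classification_mode(ham_cutoff: float, spam_cutoff: float) -> str:
    """Return 'binary' or 'ternary' from the two cutoffs."""
    if ham_cutoff == 0:
        return "binary"
    if ham_cutoff < spam_cutoff:
        return "ternary"
    # Not covered by the release notes; rejecting it is this sketch's choice.
    raise ValueError("ham_cutoff must be zero or less than spam_cutoff")


def classify(score: float, ham_cutoff: float, spam_cutoff: float) -> str:
    """Classify a score as 'Yes' (spam), 'No' (ham) or 'Unsure'.

    ASSUMPTION: scores at or above spam_cutoff are spam and, in ternary
    mode, scores at or below ham_cutoff are ham.
    """
    mode = classification_mode(ham_cutoff, spam_cutoff)
    if score >= spam_cutoff:
        return "Yes"
    if mode == "binary" or score <= ham_cutoff:
        return "No"
    return "Unsure"
```

With ham_cutoff set to zero the "Unsure" result can never occur, which
is what replaces the old -2 switch.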

Wordlist modes:

    -W  - use combined wordlist for spam and ham tokens
    -WW - use separate wordlists for spam and ham tokens

    Bogofilter will always operate in combined mode now.

Backwards compatible token generation switches:

    -Pi and -PI - ignore_case
    -Pt and -PT - tokenize_html_tags
    -Pc and -PC - strict_check
    -Pd and -PD - degen_enabled
    -Pf and -PF - first_match

    Note: Since last May, the default values for these switches
    have been:

        ignore_case         disabled
        tokenize_html_tags  enabled
        strict_check        disabled
        degen_enabled       disabled
        first_match         disabled

    There will be no change in the default values.

4) Configuration options
------------------------

The following configuration options (for the above switches) are
deprecated:

    algorithm

    wordlist
    wordlist_mode

    ignore_case
    tokenize_html_tags
    tokenize_html_script
    header_degen
    degen_enabled
    first_match

The following configuration options (which don't correspond to
switches) are deprecated:

    thresh_stats
    thresh_rtable

Note: Bogofilter will print a warning message if it sees any of
these options, but will run fine anyhow.

5) Miscellany
-------------

The user formatted SPAM_HEADER will no longer support format
specification "%a" (for algorithm) since bogofilter now has only one
algorithm.

-----------------------------------------------------------------------
[Incompat 0.15.9]

Bogofilter no longer allows disabling of algorithms, a feature which has
never been well supported.

-----------------------------------------------------------------------
[Incompat 0.15.4]

All header line tokens are now tagged as:

    Subject:      subj:
    To:           to:
    From:         from:
    Return-Path:  rtrn:
    Received:     rcvd:  ***new***
    any other:    head:  ***new***

Because existing wordlists don't have "head:???" tokens, the new tokens
won't be found in the wordlist and bogofilter's accuracy will go down.

To correct this you can do one of the following things:

1 - Use the new "-H" (for header-degen) option when scoring messages.
This option tells bogofilter to check the wordlist twice for each header
token - once for "head:xyz" and a second time for "xyz". The ham and
spam counts are added together to give a cumulative result.

Note that, with bogofilter 0.15.4 and later, during message
registration, "head:xyz" tokens are added to the wordlist (for the
header lines). The "-H" option is only applied during scoring.

The "-H" option is meant for temporary usage to cover the period while
bogofilter goes from having no "head:xyz" tokens in the wordlist to the
time when there are enough such tokens to score messages effectively.
After a few weeks, or perhaps months, of registering messages with the
new bogofilter, use of the "-H" option can end and bogofilter will use
the newly added "head:xyz" tokens.

2 - Retrain bogofilter with whatever ham and spam you have available.
This will create "head:xyz" tokens and allow the new, more effective
header tagging to be used to fullest advantage.
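
The double lookup performed under "-H" (option 1 above) can be pictured
with this Python sketch. It models the wordlist as a plain dictionary
of (spam, ham) counts; that representation and the function name are
assumptions made for illustration, not bogofilter's data structures.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int]  # (spam_count, ham_count)


def header_degen_counts(wordlist: Dict[str, Counts],
                        token: str,
                        prefix: str = "head:") -> Counts:
    """Look up both 'head:xyz' and 'xyz' and add their counts together."""
    tagged = wordlist.get(prefix + token, (0, 0))
    plain = wordlist.get(token, (0, 0))
    return (tagged[0] + plain[0], tagged[1] + plain[1])
```

While the wordlist has few tagged tokens the result is dominated by the
untagged counts; once retraining has added enough tagged tokens, the
second lookup (and hence "-H") is no longer needed.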

-----------------------------------------------------------------------
[Major 0.15]

The code for processing multiple messages has been rewritten. In
addition to understanding mbox format files, bogofilter now understands
maildirs and MH folders.

-----------------------------------------------------------------------
[Incompat 0.14]

The exit codes returned by bogofilter have been expanded. They are:

    Spam   = 0  -- unchanged
    Ham    = 1  -- unchanged
    Unsure = 2  -- *NEW*
    Error  = 3  -- *CHANGED*
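
Scripts that branch on the exit status can keep the table in one place,
as in this Python sketch. The names are hypothetical, and treating any
code not listed above as an error is this sketch's assumption.

```python
EXIT_CODE_MEANING = {0: "Spam", 1: "Ham", 2: "Unsure", 3: "Error"}


def verdict_from_exit_code(code: int) -> str:
    """Translate a bogofilter exit code into its meaning."""
    # Codes outside the documented table are reported as errors so that a
    # crash or signal is never mistaken for a classification.
    return EXIT_CODE_MEANING.get(code, "Error")
```

Note that a script which only tests for a zero versus non-zero status
will treat Unsure and Error the same as Ham.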

-----------------------------------------------------------------------
[Major 0.14] Bogofilter now supports TDB (Trivial Database).

Instead of separate wordlists for spam and ham tokens, bogofilter can
now use a single, combined wordlist that stores all tokens.
In the combined wordlist each token has two counts - for spam and
ham. The name of the new file is wordlist.db.

However, this change broke the early versions (up to and including
0.14.2) of bogofilter. You should use at least bogofilter 0.14.3.

Bogofilter will check in $BOGOFILTER_DIR and use the wordlist(s) that
are there. If wordlist.db is present, bogofilter will use the combined
mode. If wordlist.db is not present, but both spamlist.db and
goodlist.db are present, bogofilter will use the separate wordlist mode.
If no wordlists are present, bogofilter will create wordlist.db and use
it.
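
The selection rules above amount to the following decision, shown as a
Python sketch. The function name is hypothetical; the case where only
one of the two separate wordlists exists is not described in these
notes, so the sketch rejects it rather than guess.

```python
from typing import Iterable


def select_wordlist_mode(filenames: Iterable[str]) -> str:
    """Return 'combined' or 'separate' for the files found in the directory."""
    present = set(filenames)
    if "wordlist.db" in present:
        return "combined"
    separate = {"spamlist.db", "goodlist.db"}
    if separate <= present:
        return "separate"
    if not (present & separate):
        return "combined"   # no wordlists at all: wordlist.db gets created
    raise ValueError("only one separate wordlist present; behavior unspecified")
```

Because wordlist.db is checked first, a directory that still holds the
old spamlist.db and goodlist.db after an upgrade runs in combined mode.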

Command line switches '-W' and '-WW' can be used to tell bogofilter the
mode you want. Also config file options "wordlist_mode=combined" and
"wordlist_mode=separate" can be used.

Upgrading from an old bogofilter environment with its two wordlists
(spamlist.db and goodlist.db) to the new 0.14.x environment with its
single, combined wordlist.db involves 3 main steps - dumping the current
spamlist.db and goodlist.db files, formatting that output, and then
loading the data into a new file wordlist.db. The script "bogoupgrade" is
included with bogofilter and performs the task. Use command
"bogoupgrade -d /path/to/your/wordlists" to do the upgrade. After
running it, your BOGOFILTER_DIR will contain all 3 database files. When
started, bogofilter checks for wordlist.db and will use it.

-----------------------------------------------------------------------
|
|
Packit |
e8bc57 |
[Incompat 0.13]
|
|
Packit |
e8bc57 |
|
|
Packit |
e8bc57 |
Parsing has changed. As background, Paul Graham has done work to
|
|
Packit |
e8bc57 |
improve the results of his bayesian filter and has published them in
|
|
Packit |
e8bc57 |
"Better Bayesian Filtering" at http://www.paulgraham.com/better.html.
|
|
Packit |
e8bc57 |
He found the following definition of a token to be beneficial:
|
|
Packit |
e8bc57 |
|
|
Packit |
e8bc57 |
1. Case is preserved.
|
|
Packit |
e8bc57 |
|
|
Packit |
e8bc57 |
2. Exclamation points are constituent characters.
|
|
Packit |
e8bc57 |
|
|
Packit |
e8bc57 |
3. Periods and commas are constituents if they occur between two
|
|
Packit |
e8bc57 |
digits. This lets me get ip addresses and prices intact.
|
|
Packit |
e8bc57 |
|
|
Packit |
e8bc57 |
4. A price range like $20-25 yields two tokens, $20 and $25.
|
|
Packit |
e8bc57 |
|
|
Packit |
e8bc57 |
5. Tokens that occur within the To, From, Subject, and Return-Path
|
|
Packit |
e8bc57 |
lines, or within urls, get marked accordingly.
|
|
Packit |
e8bc57 |
|
|
Packit |
e8bc57 |
Bogofilter has always done #3 and has tagged for Subject lines for a
|
|
Packit |
e8bc57 |
while. Its parser now does all of these things. Several command line
|
|
Packit |
e8bc57 |
switches and config file options have been added to allow enabling or
|
|
Packit |
e8bc57 |
disabling them. Here are the new switches and options:
|
|
Packit |
e8bc57 |
|
|
Packit |
e8bc57 |
-Pi/-PI ignore_case default - disabled
|
|
Packit |
e8bc57 |
-Ph/-PH header_line_markup default - enabled
|
|
Packit |
e8bc57 |
-Pt/-PT tokenize_html_tags default - enabled
|
|
Packit |
e8bc57 |
|
|
Packit |
e8bc57 |
The options can be enabled using the lower case switch or disabled using
|
|
Packit |
e8bc57 |
the upper case switch.
|
|
Packit |
e8bc57 |
|
|
Packit |
e8bc57 |
When header_line_markup_is enabled, tokens in To:, From:, Subject:, and
|
|
Packit |
e8bc57 |
Return-Path: lines are prefixed by "to:", "from:", "subj:", and "rtrn:"
|
|
Packit |
e8bc57 |
respectively.
|
|
Packit |
e8bc57 |
|
|
Packit |
e8bc57 |
When tokenize_html_tags_is enabled, tokens in A, IMG, and FONT tags are
|
|
Packit |
e8bc57 |
scored while classifying the message.
|
|
Packit |
e8bc57 |
|
|
Packit |
e8bc57 |
NOTE: To take full advantage of these changes, additional training of
|
|
Packit |
e8bc57 |
bogofilter is necessary. Here's why:
|
|
Packit |
e8bc57 |
|
Now that bogofilter distinguishes upper and lower case, the wordlists
won't match as many words as before. For example, "From" and "from" both
used to match "from", but now "From" is a distinct token. As additional
training is done, words like these will be added to the wordlists and
bogofilter will have a larger number of distinct tokens to use when
classifying messages. This will improve its classification accuracy.
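
The effect on lookups can be pictured with a small Python sketch. It is
a toy model, not bogofilter code: the wordlist is a plain dictionary,
the count is made up, and the names lookup and train are hypothetical.

```python
# Illustrative sketch only: a wordlist is modelled as a dict of token counts.
wordlist = {"from": 12}  # made-up count, as if trained before the change


def lookup(token, ignore_case):
    """Return the stored count for a token, or 0 if it is unknown."""
    key = token.lower() if ignore_case else token
    return wordlist.get(key, 0)


def train(token):
    """Register one more occurrence of a token exactly as written."""
    wordlist[token] = wordlist.get(token, 0) + 1
```

With ignore_case set, lookup("From", True) finds the existing "from"
entry. Without it, lookup("From", False) finds nothing until
train("From") has added the new token.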
|
Similarly, with header_line_markup enabled, "Subject: great p0rn site"
is tokenized as "subj:great", "subj:p0rn", and "subj:site". At first
these tokens won't be recognized, so bogofilter won't use them to score
the message. After being trained, bogofilter will have these additional
tokens to aid in the classification process.
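
A minimal Python sketch of the prefixing described above follows. It is
only an illustration: bogofilter's real lexer is more elaborate, the
function name tokenize_header_line is hypothetical, and splitting on
whitespace is a simplifying assumption.

```python
# Illustrative sketch only; bogofilter's real lexer is more elaborate.
# The prefixes are the ones listed above for header_line_markup.
PREFIXES = {
    "to": "to:",
    "from": "from:",
    "subject": "subj:",
    "return-path": "rtrn:",
}


def tokenize_header_line(line, header_line_markup=True):
    """Split one header line into tokens, prefixing them when markup is on."""
    name, _, value = line.partition(":")
    words = value.split()  # simplification: whitespace-separated words
    prefix = ""
    if header_line_markup:
        # Other header lines are left unprefixed in this sketch.
        prefix = PREFIXES.get(name.strip().lower(), "")
    return [prefix + word for word in words]


print(tokenize_header_line("Subject: great p0rn site"))
# prints ['subj:great', 'subj:p0rn', 'subj:site']
```

With header_line_markup=False the same line yields the bare tokens
"great", "p0rn", and "site".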
|
-----------------------------------------------------------------------
[Major 0.12]
|
Directory bogofilter/tuning has been added and contains
scripts for running tuning experiments as described in the new
HOWTO. See file bogofilter/tuning/README for more information.
|
Bogofilter's man page and help message describe the many command line
switches. They have been divided into groups (help, classification,
registration, general, algorithm, parameter, and info) in both places.
|
Bogofilter 0.12.0 has three new command line switches for rapidly
scoring large numbers of messages. These "bulk mode" switches are
especially useful for the tuning process. The new switches are:
|
-M - allows scoring of all the messages in an mbox-formatted file. If
     used with "-v", an X-Bogosity line is printed as each message is
     scored. Using the "-t" (terse) option is recommended to reduce the
     amount of output.
|
-B - allows scoring of multiple message files, with each file
     containing a single message. With this option, bogofilter expects
     the file names to be at the end of the command line. If used with
     "-v", the file name is included in each printed line. Using "-t"
     is recommended.
|
-b - allows scoring of multiple message files, with each file
     containing a single message. With this option, bogofilter reads
     the file names from stdin. This option can be used with maildirs,
     as in "ls Maildir/* | bogofilter -b ...". If used with "-v", the
     file name is included in each printed line. Using "-t" is
     recommended.
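
For scripted use, the "-b" mode can be driven from Python roughly as
follows. This is a sketch under assumptions: bogofilter is installed
and on the PATH, file names are passed one per line, and the maildir
uses the usual cur/ and new/ subdirectories. The helper names are
hypothetical, and the output format is not interpreted here.

```python
import subprocess
from pathlib import Path


def maildir_message_files(maildir):
    """Return the message files in a maildir's cur/ and new/ subdirectories."""
    root = Path(maildir)
    files = []
    for sub in ("cur", "new"):
        folder = root / sub
        if folder.is_dir():
            files.extend(sorted(p for p in folder.iterdir() if p.is_file()))
    return files


def score_maildir(maildir):
    """Feed the file names to 'bogofilter -b -v -t' on stdin; return its output."""
    # Assumption: one file name per line on stdin.
    names = "".join(f"{path}\n" for path in maildir_message_files(maildir))
    result = subprocess.run(
        ["bogofilter", "-b", "-v", "-t"],
        input=names,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling score_maildir("Maildir") returns whatever bogofilter prints for
the listed files, which the caller can then parse or log.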
|
New script bogolex.sh converts an email to a special file format that
contains the information needed by bogofilter to score the email. Its
use speeds up the message scoring done by the tuning scripts. The
script is described in more detail in bogofilter/tuning/README.
|
-----------------------------------------------------------------------
[Incompat 0.11]
|
Command line flags:
|
The meaning of command line flags '-S' and '-N' was changed in version
0.11.0. Previously '-S' meant to unregister a message from the non-spam
wordlist and register it in the spam wordlist, and '-N' meant to
unregister from spam and register as non-spam.
|
Each of the flags now performs a single action.
|
    '-S' unregisters a message from the spam wordlist and
    '-N' unregisters a message from the non-spam wordlist.
|
To duplicate the old (compound) actions, it is necessary to use two
options - an unregister option ('-S' or '-N') and a register option
('-s' or '-n').
|
To duplicate the effect of the old '-S' option, use '-N -s'. To
duplicate the effect of the old '-N' option, use '-S -n'. The order of
the options doesn't matter and they can be concatenated, as in '-Sn' and
'-sN'.
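
Wrapper scripts that still pass the old flags could be updated with a
small translation step like the one sketched below. The function name
migrate_flags is hypothetical, and the sketch only handles '-S' and
'-N' given as separate arguments, not concatenated forms.

```python
# Old (pre-0.11) compound flags and their 0.11 equivalents, per the text above.
OLD_TO_NEW = {
    "-S": ["-N", "-s"],
    "-N": ["-S", "-n"],
}


def migrate_flags(old_args):
    """Rewrite a pre-0.11 argument list so it behaves the same under 0.11."""
    new_args = []
    for arg in old_args:
        # Arguments other than the two changed flags pass through untouched.
        new_args.extend(OLD_TO_NEW.get(arg, [arg]))
    return new_args


print(migrate_flags(["bogofilter", "-S"]))
# prints ['bogofilter', '-N', '-s']
```

The translation must be applied only once, to argument lists written
for the old meaning, because '-S' and '-N' are still valid flags with a
different effect.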
|
Config file processing
----------------------
|
The code to process config files now checks numeric values for validity.
It complains when it detects something wrong. In particular, double
precision values are no longer allowed to have a terminal 'f'. For
example, "spam_cutoff=0.95f" will generate a message.
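
The kind of check involved can be illustrated in Python. This is not
bogofilter's own code: the function name parse_double and the message
text are hypothetical, and Python's float() stands in for whatever
conversion bogofilter performs.

```python
def parse_double(name, text):
    """Parse a config value as a double, rejecting trailing junk such as 'f'."""
    try:
        # float() refuses strings with leftover characters, e.g. "0.95f".
        return float(text)
    except ValueError:
        raise ValueError(f"{name}: invalid numeric value {text!r}") from None
```

Here parse_double("spam_cutoff", "0.95") returns 0.95, while
parse_double("spam_cutoff", "0.95f") raises ValueError. To silence the
complaint in a real config file, remove the trailing 'f'.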
|
-----------------------------------------------------------------------
[Major 0.11]
|
New parameter query option:
|
Using options "-q -v" in a bogofilter command line will run the
query_config() function and will display bogofilter's various parameter
values. This can be very useful in finding the reason for an unexpected
message classification.
|
-----------------------------------------------------------------------
End of RELEASE.NOTES