WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING
------------------------------------------------------------------------
POTENTIAL FOR DATA CORRUPTION DURING UPDATES

If you plan to upgrade your database library, if only as a side effect
of an operating system upgrade, DO HEED the relevant documentation, for
instance, the doc/README.db file.  You may need to prepare the upgrade
with the old version of the software.

Otherwise, you may cause irrecoverable damage to your databases.

DO back up your databases before making the upgrade.
------------------------------------------------------------------------
WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING



This file documents changes in bogofilter since version 0.11.  In
particular it describes: (1) Features, which are significant changes
(noteworthy and compatible) and (2) Incompatibilities, which are
changes that require action upon update.

Caution: If upgrading from an old version and skipping several
intervening versions of bogofilter, be smart and check all the
changes of the versions you skipped!  In particular, read the sections
labeled "Incompat" and "Major".

NOTE: the NEWS document has greater detail on some of these changes.

------------------------------------------------------------------------
[Major 1.2.5] Kyoto Cabinet and LMDB support added.

Bogofilter, as of release 1.2.5, supports:
+ Kyoto Cabinet databases, courtesy of Denny Lin.  The Kyoto Cabinet database
  is written and maintained by the same author as the Tokyo Cabinet database,
  who recommends using Kyoto Cabinet instead of Tokyo Cabinet.

+ LMDB databases (Lightning Memory-Mapped Database Manager), courteously
  implemented and contributed by Steffen Nurpmeso.

------------------------------------------------------------------------
[Major 1.1.6] Tokyo Cabinet support (B+-trees with transactions) added

Bogofilter, as of release 1.1.6, supports Tokyo Cabinet databases,
courtesy of Pierre Habouzit.  Tokyo Cabinet is the successor to QDBM,
supports larger files, and is also written by Mikio Hirabayashi.

For new installations, if you considered using QDBM, consider using
Tokyo Cabinet instead.

-----------------------------------------------------------------------
[Incompat 0.96.0] TDB removed

Support for the TDB database library has been removed.

------------------------------------------------------------------------
[Incompat 0.95.2] Applies to Berkeley DB Transactional ONLY:

This release gives up on locking the databases at page granularity and
locks whole environments, to overcome lock sizing requirements which are
a major issue in unattended setups.

However, this means that a writer (token registration) will lock out
readers (message scoring) and readers will prevent new writers from
starting.  This may be fixed in a future version.

------------------------------------------------------------------------
[Major 0.95.0] Unicode in UTF-8

This release supports Unicode (UTF-8).  A new meta-token .ENCODING has
been added to the wordlist so that bogofilter can determine whether it
is using Unicode or not.  A value of 1 indicates raw storage and 2
indicates UTF-8 encoded tokens.  Bogofilter checks for this meta-token
and converts incoming text to UTF-8 as appropriate.

Command line options "--unicode=yes" and "--unicode=no" can be used.

 - With bogofilter, they control encoding of newly created databases.
 - With bogoutil, --unicode=yes converts the wordlist to Unicode.
 - For bogolexer, they print parser results in new and old modes.

./configure options allow bogofilter customization:

 - "./configure --unicode=yes" will _always_ operate in Unicode mode
 - "./configure --unicode=no"  will _never_ operate in Unicode mode

Wordlists can be converted from raw storage to Unicode using the
commands below.
NOTE: Replace iso-8859-1 with the character set and encoding that
dominates your input tokens!

    bogoutil -d wordlist.db > wordlist.raw.txt
    iconv -f iso-8859-1 -t UTF-8 < wordlist.raw.txt > wordlist.UTF-8.txt
    bogoutil -l wordlist.db.new < wordlist.UTF-8.txt

For a wordlist containing tokens from multiple languages, particularly
non-European languages, the conversion method described above may not
work well for you.  Building a new wordlist (from scratch) will likely
work better as the new wordlist will be based solely on Unicode.

------------------------------------------------------------------------
[Incompat 0.94.12] Changed Options

Some options have been added or modified.  If you use any of the
changed options, you will probably need to modify your scripts,
procmail recipes, etc.  As an example, some bogoutil options which
used to allow either filenames or directory names are now restricted
to filenames.  See the man pages and help messages if you have
questions.

------------------------------------------------------------------------
[Incompat 0.94.0] Transactions

The transactional mode now defaults to off because the lock table sizing
issue is unresolved.

Bogofilter and bogoutil now support both build-time and run-time
selection of whether to operate with (or without) transaction support.
They can also auto-detect whether you've been using transactions or not.

Run-time Selection:

For bogofilter and bogoutil, transactions can be enabled or disabled
in 2 ways -- by command line options or config file options.

Command line option "--db-transaction=yes" enables transactions and
"--db-transaction=no" disables them.

Config file options "db_transaction=yes" and "db_transaction=no"
have the same effect.

Auto-detection:

If neither of the above methods is used to enable/disable transactions,
bogofilter and bogoutil will query Berkeley DB to see if a transaction
environment already exists.  If so, transactions will be enabled.  If
not, they will be disabled.

Compile-time selection:

A default build includes the run-time and auto-detect capabilities.
If you wish to minimize program size, ./configure can be used to
create single mode versions of bogofilter and bogoutil, i.e. programs
that only run transactionally or non-transactionally.  Use
"./configure --enable-transactions" to enable transactions and
"./configure --disable-transactions" to disable them.  These programs
will be _slightly_ smaller than the default build.

-----------------------------------------------------------------------
[Incompat 0.93] Summary for the hasty

YOU MUST ADJUST YOUR SCRIPTS EVALUATING "X-Bogosity" HEADERS!

YOU MAY NEED TO ADJUST YOUR SCRIPTS THAT PARSE 'bogofilter -V'!

WHEN USING BERKELEY DB (DEFAULT), NFS NO LONGER WORKS AND
YOU   M U S T   READ doc/README.db AND POSSIBLY CONFIGURE THE DATABASE!

-----------------------------------------------------------------------
[Incompat 0.93] Defaults changed

Bogofilter's defaults have been changed.  It now operates in tri-state
mode and will classify messages as Spam, Ham, or Unsure.

If you're checking messages for "X-Bogosity: Yes" or "X-Bogosity: No",
you _need_ to change your checks.  Use "X-Bogosity: Spam" and
"X-Bogosity: Ham" instead of the old forms.  Also, checking for
"X-Bogosity: Unsure" and putting those messages in a separate folder (or
mailbox) will give you an excellent set of messages for training, as
"Unsure" messages are messages that bogofilter has too little
information to classify (with certainty) as spam or ham.
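
A minimal sketch of such a check, written as a shell function.  The
function name and the folder names are hypothetical; only the three
header values come from the notes above, and matching on the start of
the header line is an assumption about how your scripts see it.

```shell
# Hypothetical helper: choose a mail folder from an X-Bogosity header line.
# The folder names (spam, inbox, unsure) are placeholders for illustration.
folder_for_bogosity() {
    case "$1" in
        "X-Bogosity: Spam"*)   echo spam ;;
        "X-Bogosity: Ham"*)    echo inbox ;;
        "X-Bogosity: Unsure"*) echo unsure ;;    # keep these for training
        *)                     echo unknown ;;   # e.g. old-style Yes/No
    esac
}

folder_for_bogosity "X-Bogosity: Unsure"
```

Old-style "Yes" and "No" headers deliberately fall through to "unknown"
here, so that a script which still sees them fails visibly rather than
silently misfiling mail.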

-----------------------------------------------------------------------
[Incompat 0.93] Berkeley DB switched to Transactional Data Store

Bogofilter will now use the Berkeley DB Transactional Data Store when
compiled with Berkeley DB as the data base engine (the default).

This means the Berkeley DB directory can no longer reside on a networked
or otherwise shared file system (such as NFS, AFS, Coda).

When using BerkeleyDB 4.1 - 4.3, it is recommended that you dump and
load the data bases to add checksums, for enhanced reliability.  See
section 2.2 in doc/README.db for details.

Using the Transactional Data Store also means that bogofilter programs
now exhibit the ACID traits: changes are atomic (all-or-nothing); the
data base is always consistent; changes are always isolated from each
other; and all changes that are acknowledged are durable.

Bogofilter can support multiple writers at the same time, mixed freely
with simultaneous readers, and the data base will not be corrupted by
application or system crashes, except when the disk drive gets damaged.

Note that this requires that the operating system and disk drive
maintain proper write order on the disk, and that both be honest about
synchronous I/O completion.

Note also that this causes bogofilter to write additional "log" files
to its ~/.bogofilter (or other) home directory.  The log files need to
be archived or deleted periodically.

For detailed instructions, be sure to _read_ doc/README.db and check the
BerkeleyDB documentation.

As a backwards compatibility option, for instance when space and I/O
bandwidth are tight, it is possible to use the old non-transactional,
non-concurrent Berkeley DB Data Store, which can only register messages
when there are NO scoring processes at all and which may not be able to
recover from application or system crashes.

These benefits are not available when bogofilter is compiled to use the
TDB or QDBM data bases.

-----------------------------------------------------------------------
[Incompat 0.93] Berkeley DB version strings changed

Bogofilter will now return BerkeleyDB's actual DB_VERSION_STRING
in the output of 'bogofilter -V'.  The OLD format was:

    Database: BerkeleyDB (4.3.21)

The NEW format is:

    Database: Sleepycat Software: Berkeley DB 4.3.21: (November  8, 2004)

You may need to adjust your scripts.
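
One way to make a script tolerant of both formats is to pick out only
the version number.  The sketch below does that with sed.  The function
name bdb_version is hypothetical, and the pattern assumes the
"Database:" line looks like one of the two samples above.

```shell
# Hypothetical helper: read text on stdin and print the x.y.z version
# number from a "Database:" line in either the old or the new format.
bdb_version() {
    sed -n 's/^[[:space:]]*Database:.*[ (]\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*$/\1/p'
}

printf '%s\n' 'Database: BerkeleyDB (4.3.21)' | bdb_version
```

In a real script the input would be the output of 'bogofilter -V'
piped into the function instead of the sample line used here.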

-----------------------------------------------------------------------
[Incompat 0.93] QDBM database format changed to B+ trees

The QDBM database format has been changed from hash tables to B+
trees, i.e. from the Depot API to the Villa API.  This results in
significantly better performance, i.e. higher speed.  Unfortunately,
the two formats are incompatible, so upgrading to 0.93 requires running
a special command to convert the database once:

    bogoQDBMupgrade wordlist.qdbm wordlist.tmp wordlist.qdbm.old

If this command prints nothing, everything has gone well and it
has left your old data base in wordlist.qdbm.old.

NOTE: bogoQDBMupgrade needs qdbm-1.7.23 or newer to build.

-----------------------------------------------------------------------
[Incompat 0.93] Bogotune option parsing changes

In bogotune 0.93.2 and newer, you must repeat the -n or -s option as
a prefix for each mailbox.

Old form: bogotune -n good1 good2 -s bad1 bad2 ...
New form: bogotune -n good1 -n good2 -s bad1 -s bad2
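
If a script builds the bogotune command line from lists of mailboxes,
the repeated switches can be generated by a small helper.  This is only
a sketch: prefix_each is a hypothetical name, and mailbox names that
contain whitespace are not handled.

```shell
# Hypothetical helper: print the given switch (-n or -s) before every
# mailbox name, separated by single spaces.
prefix_each() {
    switch=$1
    shift
    out=
    for mailbox in "$@"; do
        out="$out $switch $mailbox"
    done
    # Drop the leading space added by the first loop iteration.
    printf '%s\n' "${out# }"
}

echo "bogotune $(prefix_each -n good1 good2) $(prefix_each -s bad1 bad2)"
```

The last line only prints the resulting command line; it does not run
bogotune.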

-----------------------------------------------------------------------
[Major 0.93.3] SQLite 3.0.8 (and newer) is now supported. It isn't
nearly as fast as Berkeley DB but uses only one permanent and one
transient file (hence less maintenance work) and is supposed to be
proof against application and system crashes.

-----------------------------------------------------------------------
[Incompat 0.92]

The formatting parameters have changed:
      '%A' is now the message's IP address.
      '%I' is now the Message-ID.
      '%Q' is now the Queue-ID.

-----------------------------------------------------------------------
[Incompat 0.17] Support for --enable-deprecated-code (see the 0.16
release notes) has been removed. If you've run 0.16.X without that
switch, nothing changes for you.

Support for Berkeley DB 3.0 was removed in bogofilter 0.17.3
as a side effect of adding Concurrent Database support.

-----------------------------------------------------------------------
[Incompat 0.16] A number of features have been deprecated.  The
relevant code is bracketed by "#ifdef ENABLE_DEPRECATED_CODE" and
"#endif" statements.  The default build will not include the
deprecated features.  For those who still need these features,
configure option "--enable-deprecated-code" exists to allow them to be
turned on.

THIS MAY REQUIRE MAJOR CHANGES TO YOUR CONFIGURATION OR SCRIPTS!

The following list is supposed to be complete.  Let us know if we've
omitted anything. We shall try to provide workarounds and migration
paths whenever possible.

1) Scoring algorithms
---------------------

Bogofilter will support only the Robinson-Fisher algorithm, commonly
called the "Fisher algorithm".  The Graham algorithm and Robinson
geometric-mean algorithm, a.k.a. Robinson algorithm, have been
deprecated.

2) Wordlist support
-------------------

Bogofilter will now support only the combined wordlist, i.e.
wordlist.db, which contains both the ham and spam counts for each token.
The older, separate wordlists (spamlist.db and goodlist.db) are no
longer supported.

The bogoupgrade program can still be used to merge the separate
databases for you.  Type "bogoupgrade -d /your/wordlist/directory/".

Ignore lists, i.e. ignorelist.db, are also being deprecated.  The ignore
list feature has never been thoroughly tested and is not used (as far as
we know).

3) Command line switches
------------------------

Bogofilter will no longer support the switches listed in this section.
If one of them is used, bogofilter will print an error message and exit.

  Scoring related switches:

    -g - select Graham algorithm
    -r - select Robinson Geometric-Mean algorithm
    -f - select Robinson-Fisher algorithm

    see section 1 above

    -2 - set binary classification mode
    -3 - set ternary classification mode

    Bogofilter will use binary mode if ham_cutoff is zero and will use
    ternary mode (Yes, No, Unsure) if ham_cutoff is non-zero and less
    than spam_cutoff.
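
The cutoff rule above can be pictured with a short shell sketch.  The
function name classification_mode is hypothetical, and the notes do not
say what happens when ham_cutoff is non-zero but not below spam_cutoff,
so that case is only reported as "unspecified" here.

```shell
# Hypothetical helper: report which mode the two cutoffs select.
# awk is used because the shell itself cannot compare fractions.
classification_mode() {
    awk -v ham="$1" -v spam="$2" 'BEGIN {
        if (ham + 0 == 0)
            print "binary"
        else if (ham + 0 < spam + 0)
            print "ternary"
        else
            print "unspecified"
    }'
}

classification_mode 0.45 0.99
```

The cutoff values in the example call are arbitrary and chosen only to
exercise the ternary branch.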

  Wordlist modes:

    -W   - use combined wordlist  for spam and ham tokens
    -WW  - use separate wordlists for spam and ham tokens

    Bogofilter will always operate in combined mode now.

  Backwards compatible token generation switches:

    -Pi and -PI - ignore_case
    -Pt and -PT - tokenize_html_tags
    -Pc and -PC - strict_check
    -Pd and -PD - degen_enabled
    -Pf and -PF - first_match

    Note: Since last May, the default values for these switches
    have been:

    ignore_case         disabled
    tokenize_html_tags  enabled
    strict_check        disabled
    degen_enabled       disabled
    first_match         disabled

    There will be no change in the default values.

4) Configuration options
------------------------

The following configuration options (for the above switches) are
deprecated:

    algorithm

    wordlist
    wordlist_mode

    ignore_case
    tokenize_html_tags
    tokenize_html_script
    header_degen
    degen_enabled
    first_match

The following configuration options (which don't correspond to
switches) are deprecated:

    thresh_stats
    thresh_rtable

Note:  Bogofilter will print a warning message if it sees any of
these options, but will run fine anyhow.

5) Miscellany
-------------

The user formatted SPAM_HEADER will no longer support format
specification "%a" (for algorithm) since bogofilter now has only one
algorithm.

-----------------------------------------------------------------------
[Incompat 0.15.9]

Bogofilter no longer allows disabling of algorithms, a feature which has
never been well supported.

-----------------------------------------------------------------------
[Incompat 0.15.4]

All header line tokens are now tagged as:

        Subject:      subj:
        To:           to:
        From:         from:
        Return-Path:  rtrn:
        Received:     rcvd:   ***new***
        any other:    head:   ***new***

Because existing wordlists don't have "head:???" tokens, the new tokens
won't be found in the wordlist and bogofilter's accuracy will go down.

To correct this you can do one of the following things:

1 - Use the new "-H" (for header-degen) option when scoring messages.
This option tells bogofilter to check the wordlist twice for each header
token - once for "head:xyz" and a second time for "xyz".  The ham and
spam counts are added together to give a cumulative result.

Note that, with bogofilter 0.15.4 and later, during message
registration, "head:xyz" tokens are added to the wordlist (for the
header lines).  The "-H" option is only applied during scoring.

The "-H" option is meant for temporary usage to cover the period while
bogofilter goes from having no "head:xyz" tokens in the wordlist to the
time when there are enough such tokens to score messages effectively.
After a few weeks, or perhaps months, of registering messages with the
new bogofilter, use of the "-H" option can end and bogofilter will use
the newly added "head:xyz" tokens.

2 - Retrain bogofilter with whatever ham and spam you have available.
This will create "head:xyz" tokens and allow the new, more effective
header tagging to be used to fullest advantage.
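
The double lookup that "-H" performs (choice 1 above) can be pictured
with the sketch below.  It is not bogofilter's code: degen_counts is a
hypothetical name, and the three-column input (token, spam count, ham
count) is an assumed layout used only for this illustration.

```shell
# Hypothetical sketch: sum the counts stored for "head:xyz" and for
# plain "xyz".  Input lines on stdin are assumed to have the form
#     token spam_count ham_count
degen_counts() {
    awk -v tok="$1" '
        $1 == tok || $1 == ("head:" tok) { spam += $2; ham += $3 }
        END { print spam + 0, ham + 0 }
    '
}

printf '%s\n' 'offer 10 1' 'head:offer 2 0' 'hello 3 30' | degen_counts offer
```

With the sample data the two matching rows are added together, giving
a spam count of 12 and a ham count of 1.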

-----------------------------------------------------------------------
[Major 0.15]

The code for processing multiple messages has been rewritten.  In
addition to understanding mbox format files, bogofilter now understands
maildirs and MH folders.

-----------------------------------------------------------------------
[Incompat 0.14]

The exit codes returned by bogofilter have been expanded.  They are:

        Spam   = 0 -- unchanged
        Ham    = 1 -- unchanged
        Unsure = 2 -- *NEW*
        Error  = 3 -- *CHANGED*
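
Scripts that branch on the exit status need a case for each of these
values.  A minimal sketch follows; the function name is hypothetical,
and treating every status other than 0, 1 and 2 as an error is an
assumption that goes slightly beyond the table above.

```shell
# Hypothetical helper: translate a bogofilter exit status into a label.
verdict_for_status() {
    case "$1" in
        0) echo Spam ;;
        1) echo Ham ;;
        2) echo Unsure ;;
        *) echo Error ;;    # 3 is the documented error status
    esac
}

# Typical use (not run here):  bogofilter < message; verdict_for_status $?
verdict_for_status 2
```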

-----------------------------------------------------------------------
[Major 0.14] Bogofilter now supports TDB (Trivial Database).

Instead of separate wordlists for spam and ham tokens, bogofilter can
now use a single, combined wordlist that stores all tokens.
In the combined wordlist each token has two counts - one for spam and
one for ham.  The name of the new file is wordlist.db.

However, this change broke the early versions (up to and including
0.14.2) of bogofilter.  You should use at least bogofilter 0.14.3.

Bogofilter will check in $BOGOFILTER_DIR and use the wordlist(s) that
are there.  If wordlist.db is present, bogofilter will use the combined
mode.  If wordlist.db is not present, but both spamlist.db and
goodlist.db are present, bogofilter will use the separate wordlist mode.
If no wordlists are present, bogofilter will create wordlist.db and use
it.
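
The lookup order just described can be written down as a short shell
sketch.  The function name and the printed labels are hypothetical, and
the sketch only inspects the directory; it creates nothing.

```shell
# Hypothetical sketch of the wordlist lookup order for a directory.
wordlist_mode_for_dir() {
    dir=$1
    if [ -e "$dir/wordlist.db" ]; then
        echo combined
    elif [ -e "$dir/spamlist.db" ] && [ -e "$dir/goodlist.db" ]; then
        echo separate
    else
        # The notes only cover the "no wordlists at all" case here.
        # Treating a directory that holds just one of the two old files
        # the same way is an assumption of this sketch.
        echo "combined (new wordlist.db)"
    fi
}

# Typical use (not run here):  wordlist_mode_for_dir "$BOGOFILTER_DIR"
```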

Command line switches '-W' and '-WW' can be used to tell bogofilter the
mode you want.  Also config file options "wordlist_mode=combined" and
"wordlist_mode=separate" can be used.

Upgrading from an old bogofilter environment with its two wordlists
(spamlist.db and goodlist.db) to the new 0.14.x environment with its
single, combined wordlist.db involves 3 main steps - dumping the current
spamlist.db and goodlist.db files, formatting that output, and then
loading the data into a new file wordlist.db.  The script "bogoupgrade" is
included with bogofilter and performs the task.  Use command
"bogoupgrade -d /path/to/your/wordlists" to do the upgrade.  After
running it, your BOGOFILTER_DIR will contain all 3 database files.  When
started, bogofilter checks for wordlist.db and will use it.

-----------------------------------------------------------------------
[Incompat 0.13]

Parsing has changed.  As background, Paul Graham has done work to
improve the results of his Bayesian filter and has published them in
"Better Bayesian Filtering" at http://www.paulgraham.com/better.html.
He found the following definition of a token to be beneficial:

       1. Case is preserved.

       2. Exclamation points are constituent characters.

       3. Periods and commas are constituents if they occur between two
          digits. This lets me get ip addresses and prices intact.

       4. A price range like $20-25 yields two tokens, $20 and $25.

       5. Tokens that occur within the To, From, Subject, and Return-Path
          lines, or within urls, get marked accordingly.

Bogofilter has always done #3 and has tagged tokens in Subject lines for
a while.  Its parser now does all of these things.  Several command line
switches and config file options have been added to allow enabling or
disabling them.  Here are the new switches and options:

       -Pi/-PI  ignore_case         default - disabled
       -Ph/-PH  header_line_markup  default - enabled
       -Pt/-PT  tokenize_html_tags  default - enabled

The options can be enabled using the lower case switch or disabled using
the upper case switch.

When header_line_markup is enabled, tokens in To:, From:, Subject:, and
Return-Path: lines are prefixed by "to:", "from:", "subj:", and "rtrn:"
respectively.

When tokenize_html_tags is enabled, tokens in A, IMG, and FONT tags are
scored while classifying the message.

NOTE: To take full advantage of these changes, additional training of
bogofilter is necessary.  Here's why:

With bogofilter's use of upper and lower case, the wordlists won't match
as many words as before.  For example, "From" and "from" both used to
match "from", but this is no longer the case.  As additional training is
done, words like these will be added to the wordlists and bogofilter
will have a larger number of distinct tokens to use when classifying
messages.  This will improve its classification accuracy.

Similarly, the use of header_line_markup will tokenize "Subject: great
p0rn site" as "subj:great", "subj:p0rn", and "subj:site".  At first
these tokens won't be recognized, so bogofilter won't use them to score
the message.  After being trained, bogofilter will have these additional
tokens to aid in the classification process.
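
The effect of header_line_markup on a single header line can be
imitated with a few lines of shell.  This is a rough sketch only:
tag_header_tokens is a hypothetical name, and splitting on blanks is a
simplification of what bogofilter's parser really does.

```shell
# Hypothetical sketch: prefix each word of a header line with the tag
# for that header, as listed in the notes above.
tag_header_tokens() {
    case "$1" in
        "To: "*)          prefix=to:   ;;
        "From: "*)        prefix=from: ;;
        "Subject: "*)     prefix=subj: ;;
        "Return-Path: "*) prefix=rtrn: ;;
        *)                prefix=      ;;
    esac
    # Drop the header name, put one word per line, skip empty lines,
    # then put the tag in front of every word.
    printf '%s\n' "${1#*: }" |
        tr -s ' \t' '\n\n' |
        sed -e '/^$/d' -e "s/^/$prefix/"
}

tag_header_tokens "Subject: great p0rn site"
```

The example call prints the three "subj:" tokens named in the paragraph
above, one per line.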

-----------------------------------------------------------------------
[Major 0.12]

Directory bogofilter/tuning has been added and contains
scripts for running tuning experiments as described in the new
HOWTO. See file bogofilter/tuning/README for more information.

Bogofilter's man page and help message describe the many command line
switches.  They have been divided into groups (help, classification,
registration, general, algorithm, parameter, and info) in both places.

Bogofilter 0.12.0 has three new command line switches for rapidly
scoring large numbers of messages.  These "bulk mode" switches are
especially useful for the tuning process.  The new switches are:

    -M - allows scoring of all the messages in an mbox formatted file.
    If used with "-v", an X-Bogosity line is printed as each message is
    scored.  Using the "-t" (terse) option is recommended to reduce the
    amount of output.

    -B - allows scoring of multiple message files, with each file
    containing a single message.  With this option, bogofilter expects the
    file names to be at the end of the command line.  If used with "-v",
    the file name is included in each printed line.  Using "-t" is
    recommended.

    -b - allows scoring of multiple message files, with each file
    containing a single message.  With this option, bogofilter reads the
    file names from stdin.  This option can be used with maildirs, as in
    "ls Maildir/* | bogofilter -b ..."  If used with "-v", the file name
    is included in each printed line.  Using "-t" is recommended.

New script bogolex.sh converts an email to a special file format that
contains the information needed by bogofilter to score the email.  Its
use speeds up the message scoring done by the tuning scripts.  The
script is described in more detail in bogofilter/tuning/README.

-----------------------------------------------------------------------
[Incompat 0.11]

Command line flags
------------------

The meaning of command line flags '-S' and '-N' was changed in version
0.11.0.  Previously '-S' meant to unregister a message from the
non-spam wordlist and register the message in the spam wordlist and
'-N' meant to unregister from spam and register as non-spam.

Each of the flags now performs a single action.

        '-S' unregisters a message from the spam wordlist and
        '-N' unregisters a message from the non-spam wordlist.

To duplicate the old (compound) actions, it is necessary to use two
options - an unregister option ('-S' or '-N') and a register option
('-s' or '-n').

To duplicate the effect of the old '-S' option, use '-N -s'.  To
duplicate the effect of the old '-N' option, use '-S -n'.  The order of
the options doesn't matter and they can be concatenated, as in '-Sn' and
'-sN'.
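
For scripts that still pass the old compound flags, the translation can
be kept in one place.  A minimal sketch follows; the function name is
hypothetical and the mapping is exactly the one given above.

```shell
# Hypothetical helper: print the 0.11.0-style options that reproduce an
# old (pre-0.11.0) compound flag.
new_flags_for_old() {
    case "$1" in
        -S) echo "-N -s" ;;
        -N) echo "-S -n" ;;
        *)  printf '%s\n' "$1" ;;   # other flags pass through unchanged
    esac
}

new_flags_for_old -S
```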

Config file processing
----------------------

The code to process config files now checks numeric values for validity.
It complains when it detects something wrong.  In particular, double
precision values are no longer allowed to have a terminal 'f'.  For
example "spam_cutoff=0.95f" will generate a message.

-----------------------------------------------------------------------
[Major 0.11]

New parameter query option:

Using options "-q -v" in a bogofilter command line will run the
query_config() function and will display bogofilter's various parameter
values.  This can be very useful in finding the reason for an unexpected
message classification.

-----------------------------------------------------------------------
End of RELEASE.NOTES