<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
    <title>Bogotune FAQ</title>
    <style type="text/css">
      h2 {
        margin-top: 1em;
        font-size: 125%;
      }
      h3 {
        margin-top: 1em;
        font-size: 110%;
      }
      p {
        margin-top : 0.5em;
        margin-bottom: 0.5em;
      }
      ul {
        margin-top: 1.5em;
        margin-bottom: 0.5em;
      }
      ul ul {
        margin-top: 0.25em;
        margin-bottom: 0;
      }
      li {
        margin-top: 0;
        margin-bottom: 1em;
      }
      li li {
        margin-bottom: 0.25em;
      }
      dt {
        margin-top: 0.5em;
        margin-bottom: 0;
      }
      hr {
        margin-top: 1em;
        margin-bottom: 1em;
      }
    </style>
  </head>

  <body>

Bogotune FAQ

Official Versions: In
    bogotune-faq
    Maintainer: David Relson <relson@osagesoftware.com>
    

This document answers frequently asked questions about bogotune.
    
  • Where did bogotune come from?
  • What's the message count format?
  • How does bogotune work?
  • How does bogotune ensure the messages it works with are numerous enough, and well enough classified, to deliver useful recommendations?
  • Can I tell bogotune to do its work even though it doesn't like the data?

    Where did bogotune come from?


    Greg Louis wrote the original Robinson geometric-mean and Robinson-Fisher algorithm code for bogofilter. To determine the optimal parameters for the Robinson-Fisher algorithm, he wrote bogotune. The initial implementation was written in the R programming language, followed by a Perl implementation. Both were slow because bogofilter had to be run for each message being scored. David Relson translated bogotune from Perl to C to make it faster.


    What's the message count format?


    Parsing a message takes bogofilter some time, and finding the spam and non-spam counts for each token takes additional time. Repeating these steps every time bogotune needed a score was slow. Parsing and look-up can instead be done once, with the results saved in a special format. Initially this was called the bogolex format, because the work was done by piping bogolexer output to bogoutil and formatting the result. Since each processed message begins with the .MSG_COUNT token, the format became known as the message count format. The convention is to use a .mc extension for these files.
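
    To illustrate why saving the counts pays off, here is a minimal Python sketch. It assumes each pre-parsed message is held in memory as a list of (token, spam count, ham count) triples; that layout, and the function names, are hypothetical and do not describe the actual .mc file syntax. The smoothing step uses Robinson's published f(w) formula with robs as the strength and robx as the assumed probability for unseen tokens; whether bogotune computes it exactly this way is an assumption.

```python
from typing import List, Tuple

# Hypothetical in-memory form of one pre-parsed message:
# (token, spam_count, ham_count) triples. Not the real .mc syntax.
Message = List[Tuple[str, int, int]]


def token_probability(spam_count: int, ham_count: int,
                      n_spam: int, n_ham: int,
                      robs: float, robx: float) -> float:
    """Robinson's f(w): pull the raw spam probability toward robx with strength robs."""
    n = spam_count + ham_count
    if n == 0:
        return robx  # token never seen: fall back to the assumed probability
    bad = spam_count / n_spam   # fraction of spam messages containing the token
    good = ham_count / n_ham    # fraction of ham messages containing the token
    p = bad / (bad + good)
    return (robs * robx + n * p) / (robs + n)


def token_probabilities(msg: Message, n_spam: int, n_ham: int,
                        robs: float, robx: float) -> List[float]:
    """Recompute every token probability from saved counts; no parsing or look-up."""
    return [token_probability(s, h, n_spam, n_ham, robs, robx) for _, s, h in msg]
```

    Because the counts are already attached to each token, trying a new robs or robx value is just another call to token_probabilities on the same data.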


    How does bogotune work?


    First, bogotune reads all the files into memory: the wordlist, the ham messages, and the spam messages. From the wordlist tokens it computes an initial robx value, which is used in the initial scan of the messages to ensure they're usable.


    Given the total number of messages in the test set, a target number of false positives is selected. This target is used to determine the spam cutoff value in each individual scan.
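
    One way to turn a false-positive target into a cutoff can be sketched in Python as follows. This is an illustration under stated assumptions, not bogotune's code: the function names are hypothetical, the target is taken as a given number (how it is derived from the test set size is not shown), and a message is treated as spam when its score is strictly above the cutoff.

```python
def spam_cutoff_for_target(ham_scores, target_fp):
    """Pick a cutoff so that at most target_fp ham scores lie strictly above it."""
    if not ham_scores:
        raise ValueError("need at least one ham score")
    ranked = sorted(ham_scores, reverse=True)   # highest-scoring ham first
    index = min(target_fp, len(ranked) - 1)     # stay inside the list
    return ranked[index]


def count_false_positives(ham_scores, cutoff):
    """Ham messages scoring above the cutoff would be misclassified as spam."""
    return sum(1 for score in ham_scores if score > cutoff)


def count_false_negatives(spam_scores, cutoff):
    """Spam messages scoring at or below the cutoff would slip through."""
    return sum(1 for score in spam_scores if score <= cutoff)
```

    With ham scores 0.1, 0.2, 0.3, 0.9, and 0.95 and a target of two false positives, the cutoff lands on 0.3, leaving exactly the two highest-scoring ham messages above it.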


    Then comes the coarse scan. For each of 225 combinations of values chosen to span the potentially useful ranges of robs, robx, and min_dev, all the ham messages are scored and the target value is used to find a spam_cutoff score. Then the spam messages are scored and the false negatives are counted. The scan finishes with a listing of the ten best sets of parameters and their scores (false negative and false positive counts and percentages).
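
    The shape of the coarse scan can be sketched as a plain grid search in Python. Everything specific here is an assumption made for illustration: the grid values are hypothetical and were chosen only so that 5 x 5 x 9 gives 225 combinations, score_fn stands in for whatever scoring routine is used, and results are ranked by false negatives first.

```python
import itertools

# Hypothetical grids (5 x 5 x 9 = 225); not bogotune's actual values.
ROBS_VALUES = [1.0, 0.3, 0.1, 0.03, 0.01]
ROBX_VALUES = [0.40, 0.45, 0.50, 0.55, 0.60]
MIN_DEV_VALUES = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45]


def coarse_scan(ham, spam, score_fn, target_fp, keep=10):
    """Score every parameter combination and return the `keep` best results."""
    if not ham or not spam:
        raise ValueError("need both ham and spam messages")
    results = []
    for robs, robx, min_dev in itertools.product(
            ROBS_VALUES, ROBX_VALUES, MIN_DEV_VALUES):
        # Score the ham and derive the cutoff from the false-positive target.
        ham_scores = sorted(
            (score_fn(m, robs, robx, min_dev) for m in ham), reverse=True)
        cutoff = ham_scores[min(target_fp, len(ham_scores) - 1)]
        fp = sum(1 for s in ham_scores if s > cutoff)
        # Score the spam and count what falls at or below the cutoff.
        fn = sum(1 for m in spam if score_fn(m, robs, robx, min_dev) <= cutoff)
        results.append({
            "robs": robs, "robx": robx, "min_dev": min_dev, "cutoff": cutoff,
            "fp": fp, "fp_pct": 100.0 * fp / len(ham),
            "fn": fn, "fn_pct": 100.0 * fn / len(spam),
        })
    # Fewest false negatives first; false positives break ties.
    results.sort(key=lambda r: (r["fn"], r["fp"]))
    return results[:keep]
```

    Passing the scoring routine in as score_fn keeps the sketch independent of any particular scoring algorithm.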


    From the results, the best non-outlying result is picked, and its parameters become the starting point for the fine scan.


    The fine scan, as the name suggests, scans the region (the range of values of robs, robx, and min_dev) surrounding the optimum found in the coarse scan, using smaller intervals to determine the optimum values more precisely.
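
    The idea of narrowing the intervals around a coarse optimum can be sketched in Python for a single parameter. The function name, the number of points, the shrink factor, and the linear spacing are all assumptions for illustration; bogotune may space its values differently.

```python
def fine_values(center, coarse_step, points=5, shrink=4.0):
    """Return `points` values centred on `center`, spaced at coarse_step / shrink."""
    step = coarse_step / shrink          # smaller interval than the coarse scan
    half = points // 2
    return [center + (i - half) * step for i in range(points)]
```

    For example, fine_values(0.5, 0.1) yields five values centred on 0.5 and spaced 0.025 apart, where the coarse scan stepped by 0.1. Applying the same function to robs, robx, and min_dev gives a small grid around the coarse optimum.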


    How does bogotune ensure the messages it works with are numerous enough, and well enough classified, to deliver useful recommendations?

    Bogotune checks certain minimum requirements as it starts up. It will complain (and halt) if there are fewer than 2,000 ham or 2,000 spam in the wordlist, or fewer than 500 ham or 500 spam in the set of test messages. It will warn, but not halt, if there's too little scoring variation in the ham messages or the spam messages, or if too many of the ham messages score as spam (or vice versa) on the initial pass. There are additional checks, but these examples convey the idea. For details, use the source :)
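
    The fatal checks described above can be sketched in Python as follows. Only the thresholds (2,000 and 500) come from the text; the function name, the argument names, and the message wording are hypothetical. The warning-only checks are left out because the text gives no thresholds for them.

```python
MIN_WORDLIST_MESSAGES = 2000   # per class, in the wordlist
MIN_TEST_MESSAGES = 500        # per class, in the test set


def check_inputs(wordlist_ham, wordlist_spam, test_ham, test_spam):
    """Return a list of fatal problems; an empty list means the run may proceed."""
    errors = []
    if wordlist_ham < MIN_WORDLIST_MESSAGES:
        errors.append("wordlist has too few ham messages: %d" % wordlist_ham)
    if wordlist_spam < MIN_WORDLIST_MESSAGES:
        errors.append("wordlist has too few spam messages: %d" % wordlist_spam)
    if test_ham < MIN_TEST_MESSAGES:
        errors.append("test set has too few ham messages: %d" % test_ham)
    if test_spam < MIN_TEST_MESSAGES:
        errors.append("test set has too few spam messages: %d" % test_spam)
    return errors
```

    A caller would print the returned messages and stop when the list is not empty.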


    Can I tell bogotune to do its work even though it doesn't like the data?

    No. At one time we had a -F option to force bogotune to run with unsuitable message data, but we realized that this could be misleading and had little chance of being helpful. Bogotune warns the operator if its conclusions are untrustworthy due to marginal input, and will not run if its input data are detectably inadequate.

    </body>
    </html>