Blob Blame History Raw
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<refentry id="bogoutil.1">
    <refmeta>
	<refentrytitle>bogoutil</refentrytitle>
	<manvolnum>1</manvolnum>
	<refmiscinfo class="manual">Bogofilter Reference Manual</refmiscinfo>
	<refmiscinfo class="source">Bogofilter</refmiscinfo>
    </refmeta>
    <refnamediv id="name">
	<refname>bogoutil</refname>
	<refpurpose>Dumps, loads, and maintains
	<application>bogofilter</application> database files</refpurpose>
    </refnamediv>
    <refsynopsisdiv id="synopsis">

	<cmdsynopsis>
	    <command>bogoutil</command>
	    <group choice="req">
		<arg choice="plain">-h</arg>
		<arg choice="plain">-V</arg>
	    </group>
	</cmdsynopsis>

	<cmdsynopsis>
	    <command>bogoutil</command>
	    <arg choice="opt">options</arg>
	    <group choice="req">
		<arg choice="plain">-d <replaceable>file</replaceable></arg>
		<arg choice="plain">-H <replaceable>file</replaceable></arg>
		<arg choice="plain">-l <replaceable>file</replaceable></arg>
		<arg choice="plain">-m <replaceable>file</replaceable></arg>
		<arg choice="plain">-w <replaceable>file</replaceable></arg>
		<arg choice="plain">-p <replaceable>file</replaceable></arg>
	    </group>
	</cmdsynopsis>

	<cmdsynopsis>
	    <command>bogoutil</command>
	    <group choice="req">
		<arg choice="plain">-r <replaceable>file</replaceable></arg>
		<arg choice="plain">-R <replaceable>file</replaceable></arg>
	    </group>
	</cmdsynopsis>

	<cmdsynopsis>
	    <command>bogoutil</command>
	    <group choice="req">
		<arg choice="plain">--db-print-leafpage-count <replaceable>file</replaceable></arg>
		<arg choice="plain">--db-print-pagesize <replaceable>file</replaceable></arg>
		<arg choice="plain">--db-verify <replaceable>file</replaceable></arg>
		<arg choice="plain">--db-checkpoint
		    <replaceable>directory</replaceable><arg
			choice="opt" rep="repeat">flag</arg></arg>
		<arg choice="plain">--db-list-logfiles <replaceable>directory</replaceable></arg>
		<arg choice="plain">--db-prune <replaceable>directory</replaceable></arg>
		<arg choice="plain">--db-recover <replaceable>directory</replaceable></arg>
		<arg choice="plain">--db-recover-harder <replaceable>directory</replaceable></arg>
		<arg choice="plain">--db-remove-environment <replaceable>directory</replaceable></arg>
	    </group>
	</cmdsynopsis>

	<para>where <option>options</option> is</para>
	<cmdsynopsis>
	    <command>bogoutil</command>
	    <arg choice="opt">-v</arg>
	    <arg choice="opt">-n</arg>
	    <arg choice="opt">-C</arg>
	    <arg choice="opt">-D</arg>
	    <arg choice="opt">-a <replaceable>age</replaceable></arg>
	    <arg choice="opt">-c <replaceable>count</replaceable></arg>
	    <arg choice="opt">-s <replaceable>min,max</replaceable></arg>
	    <arg choice="opt">-y <replaceable>date</replaceable></arg>
	    <arg choice="opt">-I <replaceable>file</replaceable></arg>
	    <arg choice="opt">-O <replaceable>file</replaceable></arg>
	    <arg choice="opt">-x <replaceable>flags</replaceable></arg>
	    <arg choice="opt">--config-file <replaceable>file</replaceable></arg>
	</cmdsynopsis>

    </refsynopsisdiv>
    <refsect1 id="description">
	<title>DESCRIPTION</title>
	<para><application>Bogoutil</application> is part of the
	    <application>bogofilter</application> Bayesian spam filter package.</para>
	<para>It is used to dump and load <application>bogofilter</application>'s
	    Berkeley DB databases to and from text files, perform database maintenance
	    functions, and to display the values for specific words.</para>
    </refsect1>
    <refsect1 id="options">
	<title>OPTIONS</title>
	<para>
	    The <option>-d <replaceable>file</replaceable></option> 
	    option tells <application>bogoutil</application> to print
	    the contents of the database file to <option>stdout</option>.
	</para>
	<para>
	    The <option>-H <replaceable>file</replaceable></option>
	    option tells <application>bogoutil</application> to print
	    a histogram of the database file to
	    <option>stdout</option>.  The output is similar to
	    <application>bogofilter -vv</application>. Finally,
	    hapaxes (tokens which were only seen once) and pure tokens
	    (tokens which were encountered only in ham or only in
	    spam) are counted.
	</para>
	<para>
	    The <option>-l <replaceable>file</replaceable></option>
	    option tells <application>bogoutil</application>
	    to load the data from <option>stdin</option> into the database file.
	    If the database file exists, <option>stdin</option> data is
	    merged into the database file, with counts added up.
	</para>
	<para>The <option>-m</option> option tells <application>bogoutil</application> 
	    to perform maintenance functions on the specified database, i.e. discard tokens 
	    that are older than desired, have counts that are too small, or sizes (lengths) 
	    that are too long or too short.
	</para>
	<para>
	    The <option>-w <replaceable>file</replaceable></option> 
	    option tells <application>bogoutil</application> to
	    display token information from the database file.  The option
	    takes an argument, which is either the name of the
	    wordlist (usually wordlist.db) or the name of the directory
	    containing it.  Tokens can be listed on the command line
	    or piped to <application>bogoutil</application>.  When
	    there are extra arguments on the command line,
	    <application>bogoutil</application> will use them as the
	    tokens to lookup.  If there are no extra arguments,
	    <application>bogoutil</application> will read tokens from
	    <option>stdin</option>.
	</para>
	<para>
	    The <option>-p <replaceable>file</replaceable></option> 
	    option tells <application>bogoutil</application> to
	    display the database information for one or more tokens.
	    The display includes a probability column with the
	    token's spam score (computed using
	    <application>bogofilter</application>'s default values).
	    Option <option>-p</option> takes the same arguments as
	    option <option>-w</option> .
	</para>
	<para>The <option>-r <replaceable>file</replaceable></option> option tells
	    <application>bogoutil</application> to recalculate the ROBX
	    value and print it as a six-digit fraction.
	</para>

	<para>The <option>-R <replaceable>file</replaceable></option>
	    option does the same as <option>-r</option>, but saves the
	    result in the training database without printing it.
	</para>

	<para>The <option>-I <replaceable>file</replaceable></option> option tells
	    <application>bogoutil</application> to read its input from
	    <replaceable>file</replaceable> rather than stdin.
	</para>
	<para>The <option>-O <replaceable>file</replaceable></option> option tells
	    <application>bogoutil</application> to write its output to
	    <replaceable>file</replaceable> rather than stdout.
	</para>

	<para>
	    The <option>-v</option> option produces verbose output on <option>stderr</option>.
	    This option is primarily useful for debugging.
	</para>
	<para>The <option>-C</option> inhibits reading configuration
	    files and lets <application>bogoutil</application> go with the defaults.</para>
	<para>The <option>--config-file
		<replaceable>file</replaceable></option> option tells
	    <application>bogoutil</application> to read <replaceable>file</replaceable>
	    instead of the standard configuration file.</para>
	<para>The <option>-D</option> redirects debug output to stdout (it
	    usually goes to stderr).</para>
	<para>The <option>-x <replaceable>flags</replaceable></option>
	    option sets debugging flags.</para>
	<para>
	    Option <option>-n</option> stands for "replace non-ascii characters".  
	    It will replace characters with the high bit (0x80) by question marks.  
	    This can be useful if a word list has lots of unreadable tokens, for
	    example from Asian spam.  The "bad" characters will be converted to
	    question marks and matching tokens will be combined when used with
	    <option>-m</option> or <option>-l</option>, but not with <option>-d</option>.
	</para>
	<para>
	    Option <option>-a age</option> indicates an acceptable token age, with older ones being discarded.  
	    The age can be a date (in form YYYYMMMDD) or a day count, i.e. discard tokens older than 
	    <option>age</option> days.
	</para>
	<para>
	    Option <option>-c value</option> indicates that tokens with counts less than or equal to <option>value</option> 
	    are to be discarded.
	</para>
	<para>
	    Option <option>-s min,max</option> is used to discard tokens based on their size, i.e. length.  
	    All tokens shorter than <option>min</option> or longer than <option>max</option> will be discarded.
	</para>
	<para>
	    Option <option>-y date</option> is specifies the date to
	give to tokens that don't have dates.  The format is YYYYMMDD.
	</para>
	<para>The <option>-h</option> option prints the help message and exits.</para>
	<para>The <option>-V</option> option prints the version number and exits.</para>
    </refsect1>

    <refsect1 id="environment_maintenance">
	<title>ENVIRONMENT MAINTENANCE</title>
	<para>The <option>--db-checkpoint <replaceable>dir</replaceable></option>
	    option causes <application>bogoutil</application> to flush the buffer
	    caches and checkpoint the database environment.</para>
	<para>The <option>--db-list-logfiles
		<replaceable>dir</replaceable></option>
	    option causes <application>bogoutil</application> to list the log
	    files in the environment.  Zero or more keywords can be added or
	    combined (separated by whitespace) to modify the behavior of this
	    mode. The default behavior is to list only inactive log
	    files with relative paths. You can add <option>all</option>
	    to list all log files (inactive and active). You can add
	    <option>absolute</option> to switch the listing to absolute
	    paths.
	</para>
	<para>The <option>--db-prune <replaceable>dir</replaceable></option>
	    option causes <application>bogoutil</application> to checkpoint
	    the database environment and remove inactive log files.</para>
	<para>The <option>--db-recover <replaceable>dir</replaceable></option>
	    option runs a regular database recovery
	    in the specified database directory. If that fails, it will retry
	    with a (usually slower) catastrophic database recovery. If
	    that fails, too, your database cannot be repaired and must
	    be rebuilt from scratch.
	    This is only supported when compiled with Berkeley DB
	    support with transactions enabled. Trying recovery with QDBM or SQLite3 support will
	    result in an error.</para>
	<para>The <option>--db-recover-harder <replaceable>dir</replaceable></option>
	    option runs a catastrophic data
	    base recovery in the specified database directory. If that fails,
	    your database cannot be repaired and must be rebuilt from
	    scratch.
	    This is only supported when compiled with Berkeley DB
	    support with transactions enabled. Trying recovery with QDBM or SQLite3 support will
	    result in an error.</para>
	<para>The <option>--db-remove-environment
		<replaceable>directory</replaceable></option> option has
	    no short option equivalent. It runs recovery in the given
	    directory and then removes the database environment. Use
	    this <emphasis>before</emphasis> upgrading to a new Berkeley
	    DB version if the new version to be installed requires a log
	    file format update.</para>
	<para>The <option>--db-print-leafpage-count
		<replaceable>file</replaceable></option> option prints
	    the number of leaf pages in the database file 
	    <replaceable>file</replaceable> as a decimal number, or
	    UNKNOWN if the database does not support querying this
	    figure.</para>
	<para>The <option>--db-print-pagesize
		<replaceable>file</replaceable></option> option prints
	    the size of a database page in
	    <replaceable>file</replaceable> as a decimal number, or
	    UNKNOWN for databases with variable page size or databases
	    that do not allow a query of the database page size.</para>
	<para>
	    The <option>--db-verify <replaceable>file</replaceable></option>
	    option requests that <application>bogofilter</application> verifies
	    the database file.  It prints only errors, unless in verbose mode.
	</para>
    </refsect1>

    <refsect1 id="dataformat">
	<title>DATA FORMAT</title>
	<para>
	    <application>Bogoutil</application> reads and writes text files where each nonblank
	    line consists of a word, any amount of horizontal whitespace, a numeric word count, 
	    more whitespace, and (optionally) a date in form YYYYMMDD.
	    Blank lines are skipped.
	</para>
    </refsect1>

    <refsect1 id="returns">
	<title>RETURN VALUES</title>
	<para>
	    0 for successful operation.
	    1 for most errors.
	    3 for I/O or other errors.
	    Error 3 usually means that something is seriously wrong with the database files. 
	</para>
    </refsect1>
    <refsect1 id="author">
    <title>AUTHOR</title>
    <para>Gyepi Sam <email>gyepi@praxis-sw.com</email>.</para>
    <para>Matthias Andree <email>matthias.andree@gmx.de</email>.</para>
    <para>David Relson <email>relson@osagesoftware.com</email>.</para>

  <para>
      For updates, see <ulink url="http://bogofilter.sourceforge.net/">
	  the bogofilter project page</ulink>.
  </para>
  </refsect1>

<refsect1 id="also">
    <title>SEE ALSO </title>
    <para>bogofilter(1), bogolexer(1), bogotune(1), bogoupgrade(1)</para>
</refsect1>
</refentry>