bogofilter TODO list

****

Solaris: /opt/csw - fix system iconv vs. csw iconv. This fails if for
instance libsqlite3 is taken from /opt/csw --with-libsqlite3-prefix,
because iconv isn't taken from there. This needs a *general* cleanup of
prefixes, because those don't apply to includes (which probably causes
these issues).

****

Documentation: Berkeley DB 4.7, check options (versions for existing
ones, new options), use a list of versions that require a log format
upgrade (or DB format or whatever) for simplicity

****

Database (Berkeley DB): Use auto-recover features of Berkeley DB 4.4+
and give up on our own recovery locking and crash detection.

****

Database (Berkeley DB): Can we use the bulk load feature to our
advantage?

****

If insufficient data is present and the default "undecided" bogosity is
added in -p mode, add also a comment stating that bogofilter needs more
training first

****

Add a "reservation lock" (fcntl style on separate file) so that a writer
can prevent new readers from starting, so that busy scoring systems
don't starve registration processes. (Figure out the details to avoid
deadlock.)

****

Drop/fix MAXTOKENLEN: where it is an allocation, it must die. Where it
is a character limit, count characters, not octets, to support UTF-8.

****

Database (Berkeley DB): Implement Concurrent Data store, quite similar
to Transactional.

****

MIME: Make sure that RFC-2047 decoder runs only once, not recursively.

****

MIME: Implement RFC-2046 section 5.2.2 (message/partial reassembly
rules, Take most headers from enclosing message except Content-*,
Subject, Message-ID, Encrypted, MIME-Version, which are taken from the
enclosed message).

****

Reimplement seeking passthrough mode that got dropped on 2003-08-23 with
the switch to bogoreader.*
http://article.gmane.org/gmane.mail.bogofilter.general/9035 and
followups.
(MID <20041222105734.GA30574@sela.f4n.org>, by "John" Subject "Size limit?"
on 2004-12-22)

The fseek() code to determine if the input is seekable got removed when
the reader moved out of main.c between 1.66 and 1.67 (CVS) and has never
been in bogoreader.c.

****

New Feature: Token aging. Support for struct data in the wordlists is
already present.

****

New Feature: Token merging, based on delta tokens
(Andras Salamon, andras@dns.net on bogofilter-dev, 2005-01-25)

****

Two deletes for kmail? This wouldn't be a patch for bogofilter itself,
but a change to give kmail delete-as-spam and delete-as-nonspam buttons.
Similarly for other MUAs.

****

New Feature: Make it a milter?

****

New Feature: Multiple list file support with weights and rules.
Wordlist verification.

Eric Seppanen:
> Allow use of a variable number of list files, each with their
> own weights and rules.
> Possible uses:
> - hand-maintained "whitelist" or "blacklist" files, with massive
>   weighting to override everything else.
> - allow users to use system-wide list files and their own files.

Shared-database version based on the autodaemon code.

In the shared-database version (which doesn't yet exist), wordlist
verification to avoid attacks on posters (thanks, Barry!). Emulate the
Vipul's Razor reputation scheme for people reporting tokens?
http://razor.sourceforge.net/

****

What this software is probably heading towards is a scheme in which
there's a general notion of tagged categories (spam being one), with
cluster analysis being applied to determine which categories a message
belongs to at above the 0.9 confidence level.

****

New Feature: Web based tool for wordlist management. Allow message
registration and whitelist management. HTML templatized for easy
integration with existing web mail systems.

****

New Feature: Add support for a user-configurable list of headers to
ignore. Any header (single or multi-line) whose name appears in the list
is skipped. The list should apply during both message registration and
evaluation.
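
A minimal sketch of how the header ignore list above could work, not
existing bogofilter code. All names here (ignored_headers,
header_filter, header_filter_drop) are hypothetical, and the list is
hard-coded where a real implementation would presumably read it from
the configuration file. It assumes headers arrive one line at a time
and that a line starting with a space or tab continues the previous
header, so a multi-line header is dropped as a whole.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ignore list; a real one would come from the config. */
static const char *const ignored_headers[] = { "Received", "X-Spam-Status" };

/* Carries the "currently skipping" state across continuation lines. */
struct header_filter {
    bool skipping;
};

/* Case-insensitive comparison of the first len bytes of line to name. */
static bool name_equals(const char *line, size_t len, const char *name)
{
    size_t i;

    if (strlen(name) != len)
        return false;
    for (i = 0; i < len; i++) {
        if (tolower((unsigned char)line[i]) != tolower((unsigned char)name[i]))
            return false;
    }
    return true;
}

/* True if line starts a header whose field name is on the list. */
static bool header_is_ignored(const char *line,
                              const char *const *list, size_t n)
{
    const char *colon = strchr(line, ':');
    size_t i;

    if (colon == NULL)
        return false;
    for (i = 0; i < n; i++) {
        if (name_equals(line, (size_t)(colon - line), list[i]))
            return true;
    }
    return false;
}

/* Return true if this header line should be dropped.  Continuation
 * lines inherit the decision made for the line that started the header. */
static bool header_filter_drop(struct header_filter *hf, const char *line,
                               const char *const *list, size_t n)
{
    assert(hf != NULL && line != NULL);
    if (line[0] == ' ' || line[0] == '\t')
        return hf->skipping;
    hf->skipping = header_is_ignored(line, list, n);
    return hf->skipping;
}
```

Calling header_filter_drop() on every header line from one shared code
path would keep registration and evaluation consistent, since both
would then see exactly the same set of headers.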