How to use this dictionary framework
====================================

Each dictionary has its own directory named after its ISO 639 language code,
e.g. Afrikaans uses af.  Each directory contains an aspell directory, a myspell
directory and, in some cases, an ispell directory.  These hold the
speller-specific information.

Files and directories
---------------------

|-- Makefile
|-- af
|   |-- COPYING
|   |-- CREDITS
|   |-- ChangeLog
|   |-- INSTALL
|   |-- Makefile
|   |-- README
|   |-- VERSION
|   |-- aspell
|   |   |-- Copyright
|   |   |-- info.in
|   |-- ispell
|   |   |-- README
|   |   `-- afrikaans.aff
|   |-- myspell
|   |   |-- README_af_ZA.txt
|   |   |-- af_ZA.aff
|   |-- wordlists
|   |   |-- wordlist.nieuwoudt.in

Prerequisites
=============
aspell - needs word-list-compress, which is included in the aspell package for
your system.  The other tools needed to package and build are located in the
utils directory.

myspell - the tools needed are in the utils directory.


Required Files
==============

ispell
------
The framework can build ispell dictionaries, but this is disabled by default
because we are not completely sure how to build them.

aspell
------
Copyright - details of the copyright holders etc.  The actual copyright text
is added based on a line in the info.in file.
info.in - some basic definitions for the aspell package, including copyright
holders, license, language name etc.  Check the Afrikaans one for a good
understanding of its construction.  If you need more details then look at the
instructions in the latest aspell dictionary build system.  ?URL?
What can be problematic in this file is the "special" line:
	special ' **- - -*- 4 -*- 6 -*-
It is simple to understand.  Each character is followed by three flags that say
whether it may appear at the beginning, in the middle and at the end of a word:
* means allowed, - means not allowed.  The above says ' (apostrophe) is
permissible at the beginning and in the middle of a word, - (dash) is allowed
only in the middle of a word, etc.
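
As an illustration only, the "special" line above can be decoded with a few
lines of Python.  This is a sketch, not part of the framework: the function
name parse_special is made up here, and it assumes the line is a whitespace
separated list of character/flags pairs as described above.

```python
def parse_special(line):
    """Parse an aspell 'special' line into {char: (begin, middle, end)}."""
    fields = line.split()
    if not fields or fields[0] != "special":
        raise ValueError("not a 'special' line: %r" % line)
    items = fields[1:]
    if len(items) % 2 != 0:
        raise ValueError("expected character/flags pairs")
    rules = {}
    # Walk the pairs: a character followed by its three position flags.
    for char, flags in zip(items[0::2], items[1::2]):
        if len(flags) != 3 or set(flags) - set("*-"):
            raise ValueError("bad flags %r for %r" % (flags, char))
        # '*' means the character is allowed in that position.
        rules[char] = tuple(flag == "*" for flag in flags)
    return rules


if __name__ == "__main__":
    rules = parse_special("special ' **- - -*- 4 -*- 6 -*-")
    for char, (begin, middle, end) in rules.items():
        print("%s begin=%s middle=%s end=%s" % (char, begin, middle, end))
```

Running it on your own info.in line is a quick way to check that each
character has exactly three flags before you start a build.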

myspell
-------
README_lang_REGION.txt - a README containing Copyright info and installation
instructions.
lang_REGION.aff - the affix file.  Mostly these are based on ispell affix
compression.  Follow the instructions ?here? for converting an ispell affix file
to myspell.  An affix file allows you to compress a wordlist and expand it to
the exact same set of words.  If you have not developed affix rules for the
language then the minimum you need is a SET and TRY line.

SET - the character set used by the language.  Only ISO 8859 character sets can
be used in MySpell, although I think you can define your own internal mapping
if your language does not match an ISO charset. (Need to confirm this)
TRY - a list of letters in order of frequency.  The Python script
src/wordlist/letter-frequency.py allows you to create a frequency list.
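
To make the TRY line concrete, here is a minimal sketch of how a frequency
list could be computed and turned into the two required lines.  It is NOT the
src/wordlist/letter-frequency.py script itself; the function names, the sample
words and the ISO8859-1 charset are assumptions made for the example.

```python
from collections import Counter


def letter_frequency(words):
    """Return the letters used in words, most frequent first."""
    counts = Counter()
    for word in words:
        # Count letters only; digits and punctuation are ignored.
        counts.update(ch for ch in word if ch.isalpha())
    # Sort by descending count, then alphabetically so ties are stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return "".join(ch for ch, _ in ordered)


def minimal_aff(charset, words):
    """Build the smallest useful .aff content: a SET and a TRY line."""
    return "SET %s\nTRY %s\n" % (charset, letter_frequency(words))


if __name__ == "__main__":
    sample = ["appel", "peer", "lepel"]  # made-up sample wordlist
    print(minimal_aff("ISO8859-1", sample), end="")
```

Ties are broken alphabetically here only to make the output repeatable; the
real script may order them differently.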

Other useful entries:
MAP - map similar characters, e.g. eêë
REP - create REPlacement maps that are useful for mapping common spelling
mistakes, e.g. REP ph f - as in phone.

Affix compression:
SFX - a suffix
PFX - a prefix
FIXME add more details on creating these.
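
Until more details are written up, the sketch below shows roughly what a
single SFX rule does when a wordlist is expanded.  It assumes the usual
MySpell rule layout "SFX flag strip add condition", where a strip or add of 0
means "nothing" and a condition of . matches any word.  The function name and
the English rules in the demo are invented for illustration and are not taken
from any affix file in this framework.

```python
import re


def apply_suffix(word, strip, add, condition):
    """Apply one MySpell-style SFX rule; return the new word or None."""
    # The condition is a small pattern that must match the end of the word.
    if condition != "." and not re.search("(?:%s)$" % condition, word):
        return None
    if strip != "0":
        if not word.endswith(strip):
            return None
        word = word[: -len(strip)]
    return word + ("" if add == "0" else add)


if __name__ == "__main__":
    # Hypothetical rules:  SFX S y ies [^aeiou]y   and   SFX S 0 s [^sxzhy]
    for word in ["pony", "day", "cat"]:
        print(word, apply_suffix(word, "y", "ies", "[^aeiou]y"),
              apply_suffix(word, "0", "s", "[^sxzhy]"))
```

A PFX rule works the same way at the start of the word.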

Setting up a new language
-------------------------
Apart from the input files required by each of the different spellcheckers, as
listed above, the main requirements are a wordlist and some definitions in the
language Makefile.

The wordlist is simply a text list of words, one per line.  Currently we store
these in UTF-8 to ensure ease of use in the future.  Lines that start with a #
are treated as comments and removed when the wordlist is processed.  We name
the wordlists wordlist.*.in, but because you list them in the Makefile you can
name them as you please.  We have kept existing wordlists intact and used
separate files for new additions.  In English we have grouped similar concepts
together, e.g. bird names, city names, etc.  Some languages group words
according to parts of speech, which may aid later use with advances in grammar
checkers, or in agglutinative languages that may have rules as to how words may
be joined together.
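
The wordlist format described above is simple enough to read with a few lines
of Python.  The sketch below is only an illustration of the format, not the
code the build uses; the function name and the sample words are made up, and
skipping empty lines is an assumption.

```python
import io


def read_wordlist(stream):
    """Return the words from a wordlist, dropping # comment lines."""
    words = []
    for line in stream:
        line = line.rstrip("\r\n")
        # Assumption: empty lines carry no word and are skipped.
        if not line or line.startswith("#"):
            continue
        words.append(line)
    return words


if __name__ == "__main__":
    sample = io.StringIO("# bird names\nhadeda\nloerie\n\n# city names\nDurban\n")
    print(read_wordlist(sample))
```

For a real file, open it with open(path, encoding="utf-8") and pass the file
object to read_wordlist.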

The language Makefile calls the generic Makefile, utils/Makefile.language.  The
language Makefile contains a number of definitions such as the name of the
language, its character set, etc.  If you need to understand some of the build
process steps then look at the generic Makefile.

Add a VERSION file.  We default to using the date, as spellcheckers are really
enhancements and refinements of wordlists, so a newer date should always
indicate a better spellchecker.

Also add your language to the Makefile in the dict/ directory, both as a build
rule and as a TARGET.


Building
--------

make - generate all dictionaries for that language
make count - will return simple stats on your wordlist
make aspell - create the aspell dictionary (relatively quick)
make myspell - create a myspell dictionary (looooong)
make clean - cleans up all packaged files

Outputs
-------
All outputs are placed in the respective spellchecker directories.

aspell - creates a tarball that is compiled and installed on the target platform

myspell - creates a few outputs:
	lang_REGION.zip - the basic MySpell spellchecker usable in OpenOffice.org
	pack-lang_REGION.xip - as above but installable by the offline installer
	lang-REGION.xpi - the same as lang_REGION.zip but installable in Mozilla

Resources - South African
-------------------------

Common names of fish - http://www.fishbase.org/search.cfm
Birds -
	Robert's Birds list - http://web.uct.ac.za/depts/fitzpatrick/docs/listintro.html
	http://www.wildlifesafari.info/south_african_birds.htm
	http://www.wildlifesafari.info/southern_africa_bird_checklist.htm
Trees - http://www.wildlifesafari.info/southern_africa_tree_list.html
Endangered species -
	http://www.unep-wcmc.org/index.html?http://sea.unep-wcmc.org/isdb/CITES/Taxonomy/?displaylanguage=eng~main
	http://www.e-gnu.com/check_005.html
	http://www.e-gnu.com/check_003.html
	http://www.e-gnu.com/check_004.html
Listed companies -
	http://www.jse.co.za/listed/companies/la.html
Name changes -
	http://africanhistory.about.com/cs/southafrica/a/sa_new_name.htm
Towns and suburbs -
	www.sapo.co.za - get downloadable postal codes for names of towns and suburbs