How to use this dictionary framework
====================================

Each dictionary has its own directory, named after the language's ISO 639 code;
for example, Afrikaans uses af.  Each directory contains an aspell directory, a
myspell directory and, in some cases, an ispell directory.  These hold the
speller-specific information.

Files and directories
---------------------

|-- Makefile
|-- af
|   |-- COPYING
|   |-- CREDITS
|   |-- ChangeLog
|   |-- INSTALL
|   |-- Makefile
|   |-- README
|   |-- VERSION
|   |-- aspell
|   |   |-- Copyright
|   |   |-- info.in
|   |-- ispell
|   |   |-- README
|   |   `-- afrikaans.aff
|   |-- myspell
|   |   |-- README_af_ZA.txt
|   |   |-- af_ZA.aff
|   |-- wordlists
|   |   |-- wordlist.nieuwoudt.in

Prerequisites
=============
aspell - needs prezip-bin (since aspell 0.60; it was called word-list-compress
in 0.50), which is contained in the aspell package for your system.  The other
tools needed to build and package are located in the utils directory.

myspell - the tools needed are in the utils directory.


Required Files
==============

ispell
------
The framework can build ispell dictionaries, but this is disabled by default
because we are not completely sure how ispell dictionaries should be built.

aspell
------
Copyright - details of the copyright holders etc.  The actual copyright text
is added based on a line in the info.in file.
info.in - some basic definitions for the aspell package, including copyright
holders, license, language name etc.
lang.dat - the language data file, used along with info.in.
The part of the language data file that can be problematic is the "special"
line:
	special ' **- - -*- 4 -*- 6 -*-
It is simple once you know the pattern.  Each character is followed by three
flags, one each for the beginning, middle and end of a word; * means the
character is allowed in that position and - means it is not.  The above says
' (apostrophe) is permissible at the beginning and in the middle of a word,
- (dash) is allowed only in the middle of a word, etc.
Check the Afrikaans lang.dat for a good understanding of its construction.  If
you need more details then look at the README instructions in the aspell
dictionary build system (the aspell-lang package):
	ftp://ftp.gnu.org/gnu/aspell/aspell-lang-20071024.tar.bz2
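
To check a "special" line before building, it can help to decode it into
something readable.  The following is only a minimal Python sketch, not part of
the framework: the function name parse_special is hypothetical, and it assumes
each character is followed by exactly three flags (beginning, middle, end of
word) where * means allowed and - means not allowed.

```python
def parse_special(line):
    """Decode an aspell lang.dat "special" line into a dict.

    Assumed layout: special <char> <begin><middle><end> [<char> <flags> ...]
    """
    tokens = line.split()
    if not tokens or tokens[0] != "special":
        raise ValueError("not a 'special' line")
    rest = tokens[1:]
    if len(rest) % 2 != 0:
        raise ValueError("every character needs a three-flag group")
    rules = {}
    for char, flags in zip(rest[0::2], rest[1::2]):
        if len(flags) != 3 or set(flags) - set("*-"):
            raise ValueError("bad flag group %r for %r" % (flags, char))
        rules[char] = {
            "begin": flags[0] == "*",
            "middle": flags[1] == "*",
            "end": flags[2] == "*",
        }
    return rules


if __name__ == "__main__":
    for char, where in parse_special("special ' **- - -*- 4 -*- 6 -*-").items():
        print(char, where)
```

Run on the example line above, this reports the apostrophe as allowed at the
beginning and in the middle of a word, and the dash, 4 and 6 as allowed in the
middle only.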

myspell
-------
README_lang_REGION.txt - a README containing copyright information and
installation instructions.
lang_REGION.aff - the affix file.  These are mostly based on ispell affix
compression.  Follow the instructions ?here? for converting an ispell affix file
to myspell.  An affix file allows you to compress a wordlist and expand it to
the exact same set of words.  If you have not developed affix rules for the
language then the minimum you need is a SET line and a TRY line.

SET - the character set used by the language.  Only ISO8859 character sets can
be used in MySpell, although it may be possible to define your own internal
mapping if your language does not match an ISO charset (this still needs to be
confirmed).
TRY - a list of letters in order of frequency.  The Python script
src/wordlist/letter-frequency.py allows you to create a frequency list.
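
As an illustration of what a minimal affix file contains, here is a short
Python sketch that counts characters in a wordlist and prints a SET and a TRY
line.  It is not the framework's letter-frequency.py script, whose behaviour
may differ; the function names and the ISO8859-1 default are assumptions chosen
for the example.

```python
from collections import Counter


def try_line(words):
    """Build a TRY line: characters ordered from most to least frequent."""
    counts = Counter(ch for word in words for ch in word if not ch.isspace())
    # Sort by descending count; break ties alphabetically so output is stable.
    ordered = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return "TRY " + "".join(ordered)


def minimal_aff(words, charset="ISO8859-1"):
    """Return the text of an affix file holding only SET and TRY."""
    return "SET %s\n%s\n" % (charset, try_line(words))


if __name__ == "__main__":
    print(minimal_aff(["aaa", "ab", "c"]), end="")
```

The sketch counts characters exactly as they appear, so upper and lower case
letters are ranked separately.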

Other useful entries:
MAP - maps similar characters, eg eêë
REP - creates REPlacement maps that are useful for mapping common spelling
mistakes, eg REP ph f - as in phone.
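
In affix files these entries are normally written as a small table: a header
line giving the number of entries, then one line per entry.  The Python sketch
below formats such a table.  The helper name table_lines is hypothetical and
the count header is an assumption based on common MySpell affix files, so check
an existing .aff file before relying on it.

```python
def table_lines(keyword, entries):
    """Return a count header followed by one line per entry."""
    lines = ["%s %d" % (keyword, len(entries))]
    lines.extend("%s %s" % (keyword, entry) for entry in entries)
    return lines


if __name__ == "__main__":
    for line in table_lines("MAP", ["eêë"]) + table_lines("REP", ["ph f"]):
        print(line)
```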

Affix compression:
SFX - a suffix
PFX - a prefix
FIXME add more details on creating these.
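
Until those details are written, the following Python sketch shows roughly what
a single suffix rule does to a word.  It assumes the commonly documented rule
fields (characters to strip, characters to add, and a condition the end of the
word must match, with 0 meaning "nothing") and treats the condition as a
regular expression.  The function name and the example rules are hypothetical
and are not taken from any shipped affix file.

```python
import re


def apply_suffix(word, strip, add, condition):
    """Apply one suffix rule; return None if the rule does not apply."""
    # The condition must match at the end of the word ("." matches anything).
    if not re.search("(?:%s)$" % condition, word):
        return None
    if strip != "0":
        if not word.endswith(strip):
            return None
        word = word[: len(word) - len(strip)]
    return word + ("" if add == "0" else add)


if __name__ == "__main__":
    print(apply_suffix("pony", "y", "ies", "[^aeiou]y"))
    print(apply_suffix("day", "y", "ies", "[^aeiou]y"))
    print(apply_suffix("cat", "0", "s", "."))
```

Compression is the reverse process: if a derived word can be produced from a
base word by a rule, the wordlist only needs the base word plus the rule's flag.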

Setting up a new language
-------------------------
Apart from the input files required by each of the different spellcheckers, as
listed above, the main requirements are a wordlist and some definitions in the
language Makefile.

The wordlist is simply a text file with one word per line.  We currently store
these in UTF-8 to ensure ease of use in the future.  Lines that start with a #
are treated as comments and removed when the wordlist is processed.  We name
the wordlists wordlist.*.in, but because you list them in the Makefile you can
name them as you please.  We have kept existing wordlists intact and used
separate files for new additions.  In English we have grouped similar concepts
together, eg bird names, city names, etc.  Some languages group words according
to parts of speech, which may aid later use with advances in grammar checkers,
or in agglutinative languages that may have rules as to how words may be joined
together.
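
The wordlist format is simple enough to read with a few lines of Python.  This
is only a sketch of the rules described above, not the framework's own
processing code: the function name is hypothetical, and skipping blank lines is
an assumption.

```python
def read_wordlist(lines):
    """Return the words from an iterable of wordlist lines."""
    words = []
    for line in lines:
        text = line.rstrip("\r\n")
        if text.startswith("#"):
            continue  # comment line
        if not text.strip():
            continue  # blank line (assumed to be ignorable)
        words.append(text)
    return words


if __name__ == "__main__":
    sample = ["# birds\n", "hadeda\n", "\n", "loerie\n"]
    print(read_wordlist(sample))
```

To read a real file, open it with encoding="utf-8" and pass the file object to
read_wordlist.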

The language Makefile calls the generic Makefile, utils/Makefile.language.  The
language Makefile contains a number of definitions such as the name of the
language, its character set, etc.  If you need to understand some of the build
process steps then look at the generic Makefile.

Add a VERSION file.  We default to using the date, as spellcheckers are really
enhancements and refinements of wordlists, so a newer date should always
indicate a better spellchecker.

Also add your language to the Makefile in the dict/ directory, both as a build
rule and as a TARGET.


Building
--------

make - generate all dictionaries for that language
make count - will return simple stats on your wordlist
make aspell - create the aspell dictionary (relatively quick)
make myspell - create a myspell dictionary (looooong)
make clean - cleans up all packaged files

Outputs
-------
All outputs are placed in the respective spellchecker directories.

aspell - creates a tarball to be compiled and installed on the target platform

myspell - creates a few outputs.
	lang_REGION.zip - the basic MySpell spellchecker usable in OpenOffice.org
	pack-lang_REGION.xip - as above but installable by the offline installer
	lang-REGION.xpi - the same as lang_REGION.zip but installable in Mozilla

Resources - South African
-------------------------

Common Names of fish - http://www.fishbase.org/search.cfm
Birds - 
	Robert's Birds list - http://web.uct.ac.za/depts/fitzpatrick/docs/listintro.html
	http://www.wildlifesafari.info/south_african_birds.htm
	http://www.wildlifesafari.info/southern_africa_bird_checklist.htm
Trees - http://www.wildlifesafari.info/southern_africa_tree_list.html
Endangered species -
http://www.unep-wcmc.org/index.html?http://sea.unep-wcmc.org/isdb/CITES/Taxonomy/?displaylanguage=eng~main
http://www.e-gnu.com/check_005.html
http://www.e-gnu.com/check_003.html
http://www.e-gnu.com/check_004.html
Listed companies:
http://www.jse.co.za/listed/companies/la.html
Name changes:
http://africanhistory.about.com/cs/southafrica/a/sa_new_name.htm
www.sapo.co.za - get downloadable postal codes for names of towns and suburbs