How to use this dictionary framework
====================================

Each dictionary has its own directory, named with the ISO 639 language code;
e.g. Afrikaans uses af.  Each directory contains an aspell and a myspell
subdirectory and, in some cases, an ispell subdirectory; these hold the
spellchecker-specific files.

Files and directories
---------------------

|-- Makefile
|-- af
|   |-- COPYING
|   |-- CREDITS
|   |-- ChangeLog
|   |-- INSTALL
|   |-- Makefile
|   |-- README
|   |-- VERSION
|   |-- aspell
|   |   |-- Copyright
|   |   |-- info.in
|   |-- ispell
|   |   |-- README
|   |   `-- afrikaans.aff
|   |-- myspell
|   |   |-- README_af_ZA.txt
|   |   |-- af_ZA.aff
|   |-- wordlists
|   |   |-- wordlist.nieuwoudt.in

Prerequisites
=============

aspell - needs word-list-compress, which is included in the aspell package for
your system.  The other tools needed to package and build the dictionary are
in the utils directory.

myspell - the tools needed are in the utils directory.


Required Files
==============

ispell
------
The framework can build ispell dictionaries, but this is disabled by default
because we are not completely sure how ispell dictionaries should be built.

aspell
------
Copyright - details of the copyright holders, etc.  The actual copyright text
is added based on a line in the info.in file.
info.in - basic definitions for the aspell package, including the copyright
holders, license, language name, etc.  See the Afrikaans file
(af/aspell/info.in) for a good example of how it is constructed.  For more
details, see the instructions in the latest aspell dictionary build system.
?URL?
One line in this file that can be tricky is the "special" line:
	special ' **- - -*- 4 -*- 6 -*-
Each listed character is followed by three flags saying whether it may appear
at the beginning, in the middle and at the end of a word; * means allowed and
- means not allowed.  So the line above allows ' (apostrophe) at the beginning
and in the middle of a word but not at the end, while - (dash), 4 and 6 are
allowed only in the middle of a word.
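
To make the three-flag layout concrete, here is a minimal Python sketch that
parses a special line into (begin, middle, end) permissions and then checks
where each special character appears in a word.  parse_special and
position_ok are hypothetical helpers for illustration only; they are not part
of aspell or of this framework, and the position check is a simplified model
rather than aspell's real word splitting.

```python
def parse_special(line):
    """Parse an aspell "special" line into {char: (begin, middle, end)}.

    Assumes whitespace-separated pairs: a character, then three flags for
    beginning, middle and end of a word ('*' = allowed, '-' = not allowed).
    """
    fields = line.split()
    if not fields or fields[0] != "special":
        raise ValueError("not a 'special' line")
    pairs = fields[1:]
    if len(pairs) % 2 != 0:
        raise ValueError("every character needs a three-flag string")
    rules = {}
    for char, flags in zip(pairs[0::2], pairs[1::2]):
        if len(flags) != 3 or set(flags) - {"*", "-"}:
            raise ValueError(f"bad flags {flags!r} for {char!r}")
        rules[char] = tuple(flag == "*" for flag in flags)
    return rules


def position_ok(word, rules):
    """Simplified check: is every special character in an allowed position?"""
    last = len(word) - 1
    for i, char in enumerate(word):
        if char not in rules:
            continue
        begin, middle, end = rules[char]
        if i == 0 and not begin:
            return False
        if i == last and not end:
            return False
        if 0 < i < last and not middle:
            return False
    return True
```

With the line from info.in, position_ok("'n", rules) is True (the Afrikaans
article starts with an apostrophe), while position_ok("n'", rules) is False.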

myspell
-------
README_lang_REGION.txt - a README containing copyright information and
installation instructions.
lang_REGION.aff - the affix file.  Most of these are based on ispell affix
compression.  Follow the instructions ?here? for converting an ispell affix
file to MySpell.  An affix file lets you compress a wordlist and later expand
it back to exactly the same set of words.  If you have not developed affix
rules for the language, the minimum you need is a SET line and a TRY line.

SET - the character set used by the language.  Only ISO 8859 character sets
can be used in MySpell.  It may be possible to define your own internal
mapping if your language does not match an ISO character set (this still
needs to be confirmed).
TRY - a list of letters in order of frequency.  The Python script
src/wordlist/letter-frequency.py can create this frequency list.
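
As a rough illustration of the minimal affix file, the sketch below counts
letters in a wordlist and writes a SET line and a TRY line.  It is a
hypothetical stand-in, not the contents of src/wordlist/letter-frequency.py;
folding case, skipping non-letters and breaking ties alphabetically are all
assumptions you may want to change for your language.

```python
from collections import Counter


def try_string(words):
    """Letters used in *words*, most frequent first (ties broken alphabetically).

    Assumption: case is folded and non-letters (digits, ', -) are skipped.
    """
    counts = Counter(ch for word in words for ch in word.lower() if ch.isalpha())
    return "".join(sorted(counts, key=lambda ch: (-counts[ch], ch)))


def minimal_aff(charset, words):
    """Smallest useful affix file: just a SET line and a TRY line."""
    return f"SET {charset}\nTRY {try_string(words)}\n"
```

For example, minimal_aff("ISO8859-1", ["aab", "ba", "c"]) returns
"SET ISO8859-1\nTRY abc\n".  The affix file itself has to be saved in the
encoding named on its SET line, not in the UTF-8 used for the wordlists.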

Other useful entries:
MAP - groups similar characters, e.g. eêë.
REP - defines replacement pairs for common spelling mistakes, e.g. REP ph f
(as in phone).
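
The sketch below is a simplified model of how REP and MAP entries can turn a
misspelling into suggestions: each REP pair (what, replacement) is applied at
one position at a time, and each character in a MAP group may be swapped for
another member of the same group.  The function names, the pair table and the
use of a Python set as the dictionary are assumptions for illustration;
MySpell's own suggestion code does more than this.

```python
def rep_suggestions(word, rep_pairs, dictionary):
    """Apply each (what, replacement) pair at one position at a time.

    Keeps only candidates that are in *dictionary*, in the order found.
    """
    found = []
    for what, replacement in rep_pairs:
        start = word.find(what)
        while start != -1:
            candidate = word[:start] + replacement + word[start + len(what):]
            if candidate in dictionary and candidate not in found:
                found.append(candidate)
            start = word.find(what, start + 1)
    return found


def map_suggestions(word, map_groups, dictionary):
    """Swap one character for another member of the same MAP group."""
    found = []
    for i, char in enumerate(word):
        for group in map_groups:
            if char not in group:
                continue
            for other in group:
                if other == char:
                    continue
                candidate = word[:i] + other + word[i + 1:]
                if candidate in dictionary and candidate not in found:
                    found.append(candidate)
    return found
```

For example, rep_suggestions("fone", [("ph", "f"), ("f", "ph")], {"phone"})
returns ["phone"], and map_suggestions("se", ["eêë"], {"sê"}) returns ["sê"].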

Affix compression:
SFX - a suffix
PFX - a prefix
FIXME add more details on creating these.
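
The following sketch shows how suffix and prefix rules expand a compressed
entry.  It assumes the usual MySpell layout: a header line
"SFX flag cross_product count" followed by rule lines
"SFX flag strip add condition", where 0 means nothing is stripped or added and
the condition is a simple pattern matched against the end of the word (SFX) or
its start (PFX); dictionary entries look like word/FLAGS.  The parser and the
sample rules are hypothetical, one-character flags are assumed, and prefixes
and suffixes are not combined into cross products.

```python
import re

# Hypothetical sample rules (English, for readability).
SAMPLE_AFF = """\
SFX A Y 2
SFX A 0 s [^y]
SFX A y ies [^aeiou]y
PFX B Y 1
PFX B 0 un .
"""


def parse_affixes(aff_text):
    """Collect SFX/PFX rule lines, keyed by flag.

    Header lines (type flag cross_product count) have four fields and are
    skipped; rule lines have at least five: type flag strip add condition.
    """
    rules = {}
    for line in aff_text.splitlines():
        parts = line.split()
        if len(parts) >= 5 and parts[0] in ("SFX", "PFX"):
            kind, flag, strip, add, condition = parts[:5]
            strip = "" if strip == "0" else strip
            add = "" if add == "0" else add
            rules.setdefault(flag, []).append((kind, strip, add, condition))
    return rules


def expand(entry, rules):
    """Expand a 'word/FLAGS' entry into the word plus every affixed form."""
    word, _, flags = entry.partition("/")
    forms = [word]
    for flag in flags:
        for kind, strip, add, condition in rules.get(flag, []):
            if kind == "SFX":
                # The condition must match the end of the unstripped word.
                if re.search(f"(?:{condition})$", word) and word.endswith(strip):
                    forms.append(word[:len(word) - len(strip)] + add)
            elif kind == "PFX" and re.match(condition, word) and word.startswith(strip):
                # The condition must match the start of the word.
                forms.append(add + word[len(strip):])
    return forms
```

Running expand("fly/A", parse_affixes(SAMPLE_AFF)) gives ["fly", "flies"].
Compression is the reverse job: choosing stems and flags whose expansion
reproduces exactly the original set of words.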

Setting up a new language
-------------------------
Apart from the input files required by each of the spellcheckers, as listed
above, the main requirements are a wordlist and some definitions in the
language Makefile.

The wordlist is simply a text file with one word per line.  We currently store
wordlists in UTF-8 so that they stay easy to use in the future.  Lines that
start with # are treated as comments and are removed when the wordlist is
processed.  The wordlists can have any name, because you list them explicitly
in the Makefile; we name ours wordlist.*.in.  We have kept existing wordlists
intact and used separate files for new additions.  In English we have grouped
similar concepts together, e.g. bird names, city names, etc.  Some languages
group words by part of speech, which may help later with grammar checkers, or
with agglutinative languages that have rules about how words may be joined
together.
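
The wordlist handling described above (UTF-8 files, # comment lines dropped,
several files merged) can be sketched in Python as follows.  load_wordlists is
a hypothetical helper, not the framework's own code; it also drops blank lines
and duplicates and sorts by Unicode code point, which may differ from what the
framework actually does.

```python
from pathlib import Path


def load_wordlists(paths):
    """Merge UTF-8 wordlists into one sorted list of unique words.

    Lines starting with '#' are comments and are dropped, as are blank lines.
    Sorting is by Unicode code point (an assumption; locale-aware sorting
    would order accented letters differently).
    """
    words = set()
    for path in paths:
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            if line.startswith("#"):
                continue
            word = line.strip()
            if word:
                words.add(word)
    return sorted(words)
```

Keeping additions in separate files, as we do, simply means passing more paths
to the same merge step.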

The language Makefile calls the generic Makefile, utils/Makefile.language.
The language Makefile contains a number of definitions, such as the name of
the language, its character set, etc.  To understand the individual build
steps, look at the generic Makefile.

Add a VERSION file.  By default we use the date, since a spellchecker is
really a series of enhancements and refinements of its wordlists, so a newer
date should always indicate a better spellchecker.

Also add your language to the Makefile in the dict/ directory, both as a build
rule and as a TARGET.


Building
--------

make - generate all dictionaries for the language
make count - report simple statistics about your wordlist
make aspell - create the aspell dictionary (relatively quick)
make myspell - create the MySpell dictionary (this takes a long time)
make clean - clean up all packaged files

Outputs
-------
All outputs are placed in the respective spellchecker directories.

aspell - creates a tarball that is then compiled and installed on the target
platform.

myspell - creates several outputs:
	lang_REGION.zip - the basic MySpell spellchecker, usable in OpenOffice.org
	pack-lang_REGION.xip - as above, but installable by the offline installer
	lang-REGION.xpi - the same as lang_REGION.zip, but installable in Mozilla

Resources - South African
-------------------------

Common names of fish - http://www.fishbase.org/search.cfm
Birds -
	Robert's Birds list - http://web.uct.ac.za/depts/fitzpatrick/docs/listintro.html
	http://www.wildlifesafari.info/south_african_birds.htm
	http://www.wildlifesafari.info/southern_africa_bird_checklist.htm
Trees - http://www.wildlifesafari.info/southern_africa_tree_list.html
Endangered species -
	http://www.unep-wcmc.org/index.html?http://sea.unep-wcmc.org/isdb/CITES/Taxonomy/?displaylanguage=eng~main
	http://www.e-gnu.com/check_005.html
	http://www.e-gnu.com/check_003.html
	http://www.e-gnu.com/check_004.html
Listed companies -
	http://www.jse.co.za/listed/companies/la.html
Name changes -
	http://africanhistory.about.com/cs/southafrica/a/sa_new_name.htm
www.sapo.co.za - downloadable postal codes, a source of town and suburb names