How to use this dictionary framework
====================================

Each dictionary has its own directory, named after its ISO 639 language code;
for example, Afrikaans uses af.  Each directory contains an aspell and a
myspell directory, and in some cases an ispell directory.  These hold the
speller-specific information.

Files and directories
---------------------

|-- Makefile
|-- af
|   |-- COPYING
|   |-- CREDITS
|   |-- ChangeLog
|   |-- INSTALL
|   |-- Makefile
|   |-- README
|   |-- VERSION
|   |-- aspell
|   |   |-- Copyright
|   |   |-- info.in
|   |-- ispell
|   |   |-- README
|   |   `-- afrikaans.aff
|   |-- myspell
|   |   |-- README_af_ZA.txt
|   |   |-- af_ZA.aff
|   |-- wordlists
|   |   |-- wordlist.nieuwoudt.in

Prerequisites
=============
aspell - needs word-list-compress, which is included in the aspell package for
your system.  The other tools needed to package and build are located in the
utils directory.

myspell - the tools needed are in the utils directory.


Required Files
==============

ispell
------
The framework can build ispell dictionaries, but this is disabled by default
because we are not completely sure how to build them.

aspell
------
Copyright - details of the copyright holders etc.  The actual copyright text
is added based on a line in the info.in file.
info.in - some basic definitions for the aspell package, including copyright
holders, license, language name etc.  Check the Afrikaans one for a good
understanding of its construction.  If you need more details then look at the
instructions in the latest aspell dictionary build system.  ?URL?
What can be problematic in this file is the "special" line:
	special ' **- - -*- 4 -*- 6 -*-
It is simple to understand.  Each character is followed by three flags that
cover the beginning, middle and end of a word: * means allowed and - means not
allowed.  The line above says that ' (apostrophe) is permitted at the beginning
and in the middle of a word, that - (hyphen) is permitted only in the middle of
a word, and so on.
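To make the reading of the "special" line concrete, here is a minimal Python
sketch that splits such a line into character/flag pairs.  The function name
parse_special is hypothetical and is not part of this framework or of aspell;
it only assumes that each character is followed by exactly three * or - flags.

```python
def parse_special(line):
    """Parse a 'special' line into {char: (begin, middle, end)} booleans."""
    tokens = line.split()
    if not tokens or tokens[0] != "special":
        raise ValueError("not a 'special' line")
    fields = tokens[1:]
    if len(fields) % 2 != 0:
        raise ValueError("expected character/flags pairs")
    rules = {}
    # Tokens alternate: a character, then its three position flags.
    for char, flags in zip(fields[0::2], fields[1::2]):
        if len(flags) != 3 or set(flags) - {"*", "-"}:
            raise ValueError(f"bad flags {flags!r} for {char!r}")
        # '*' means the character is allowed in that position.
        rules[char] = tuple(flag == "*" for flag in flags)
    return rules


rules = parse_special("special ' **- - -*- 4 -*- 6 -*-")
for char, (begin, middle, end) in rules.items():
    print(f"{char!r}: begin={begin} middle={middle} end={end}")
```

Running it on the Afrikaans line shows the apostrophe allowed at the beginning
and in the middle, and the hyphen, 4 and 6 allowed only in the middle.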

myspell
-------
README_lang_REGION.txt - a README containing copyright information and
installation instructions.
lang_REGION.aff - the affix file.  These are mostly based on ispell affix
compression.  Follow the instructions ?here? for converting an ispell affix file
to myspell.  An affix file allows you to compress a wordlist and later expand it
to exactly the same set of words.  If you have not developed affix rules for the
language then the minimum you need is a SET line and a TRY line.

SET - the character set used by the language.  Only ISO 8859 character sets can
be used in MySpell, although it may be possible to define your own internal
mapping if your language does not match an ISO charset.  (Need to confirm this)
TRY - a list of letters in order of frequency.  The Python script
src/wordlist/letter-frequency.py allows you to create a frequency list.
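As an illustration of how a TRY line can be derived, the following Python
sketch counts letters in a list of words and orders them by frequency.  It is
not the letter-frequency.py script itself; the function name try_line, the
lowercasing and the alphabetical tie-break are assumptions made for the
example.

```python
from collections import Counter


def try_line(words):
    """Build a TRY line from an iterable of words (illustrative only)."""
    # Count alphabetic characters only; lowercasing is an assumption here.
    counts = Counter(
        ch for word in words for ch in word.lower() if ch.isalpha()
    )
    # Most frequent first; ties are broken alphabetically for stable output.
    ordered = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return "TRY " + "".join(ordered)


print(try_line(["see", "sea", "tea"]))  # prints "TRY east"
```

A real wordlist would be read from the wordlist.*.in files instead of the
three sample words used here.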

Other useful entries:
MAP - map similar characters, e.g. eêë
REP - create REPlacement maps, which are useful for mapping common spelling
mistakes, e.g. REP ph f - as in phone.

Affix compression:
SFX - a suffix
PFX - a prefix
FIXME add more details on creating these.
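Until more details are written up, the following Python sketch may help show
what a single suffix rule does when a compressed wordlist is expanded.  It
assumes the usual MySpell rule layout of "SFX flag strip add condition", where
0 stands for "nothing" and the condition is matched against the end of the
word.  The function name apply_suffix and the English rules in the example are
hypothetical and are not taken from any affix file in this framework.

```python
import re


def apply_suffix(word, strip, add, condition):
    """Apply one suffix rule; return the new word, or None if it does not apply."""
    # In an affix file a literal 0 means "strip nothing" or "add nothing".
    strip = "" if strip == "0" else strip
    add = "" if add == "0" else add
    # The condition must match at the end of the word ('.' matches anything).
    if not re.search("(?:" + condition + ")$", word):
        return None
    if not word.endswith(strip):
        return None
    stem = word[: len(word) - len(strip)]
    return stem + add


# Hypothetical rule: SFX S y ies [^aeiou]y
print(apply_suffix("pony", "y", "ies", "[^aeiou]y"))
# Hypothetical rule: SFX S 0 s [aeiou]y
print(apply_suffix("day", "0", "s", "[aeiou]y"))
```

A prefix rule (PFX) works the same way at the start of the word.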

Setting up a new language
-------------------------
Apart from the input files required by each of the different spellcheckers, as
listed above, the main requirements are a wordlist and some definitions in the
language Makefile.

The wordlist is simply a text file of words, one per line.  We currently store
these in UTF-8 to ensure ease of use in the future.  Lines that start with a #
are treated as comments and removed when the wordlist is processed.  We name
the wordlists wordlist.*.in, but because you list them in the Makefile you can
name them as you please.  We have kept existing wordlists intact and used
separate files for new additions.  In English we have grouped similar concepts
together, e.g. bird names, city names, etc.  Some languages group words
according to parts of speech, which may aid later use with advances in grammar
checkers, or in agglutinative languages that may have rules as to how words may
be joined together.
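The wordlist format described above can be read with a few lines of Python.
This is only a sketch of the format, not the code the build actually runs; the
function name read_wordlist is hypothetical, and skipping blank lines and
trimming surrounding whitespace are assumptions that go beyond what is stated
above.

```python
import io


def read_wordlist(stream):
    """Return the words from a wordlist stream, dropping # comment lines."""
    words = []
    for line in stream:
        word = line.strip()
        # Skip blank lines (an assumption) and lines starting with '#'.
        if not word or word.startswith("#"):
            continue
        words.append(word)
    return words


sample = io.StringIO("# bird names\nhadeda\nloerie\n\n# city names\nPretoria\n")
print(read_wordlist(sample))
```

For a real file, pass open(path, encoding="utf-8") instead of the in-memory
sample, since the wordlists are stored in UTF-8.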

The language Makefile calls the generic Makefile, utils/Makefile.language.  The
language Makefile contains a number of definitions such as the name of the
language, its character set, etc.  If you need to understand some of the build
process steps then look at the generic Makefile.

Add a VERSION file.  We default to using the date, because spellchecker
releases are really enhancements and refinements of wordlists, so a newer date
should always indicate a better spellchecker.
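A date-based version is easy to generate.  The Python sketch below assumes a
YYYYMMDD layout, which is not specified above; it is chosen here only because
such strings sort in date order.  The function name version_string is
hypothetical.

```python
import datetime


def version_string(day=None):
    """Return a date-based version string (YYYYMMDD layout is an assumption)."""
    day = day or datetime.date.today()
    return day.strftime("%Y%m%d")


# A later date gives a string that also compares as greater.
print(version_string(datetime.date(2005, 3, 9)))
print(version_string(datetime.date(2005, 12, 1)))
```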

Also add your language to the Makefile in the dict/ directory, both as a build
rule and as a TARGET.


Building
--------

make - generate all dictionaries for that language
make count - return simple statistics on your wordlist
make aspell - create the aspell dictionary (relatively quick)
make myspell - create the myspell dictionary (looooong)
make clean - clean up all packaged files

Outputs
-------
All outputs are placed in the respective spellchecker directories.

aspell - creates a tarball that is compiled and installed on the target platform

myspell - creates a few outputs:
	lang_REGION.zip - the basic MySpell spellchecker usable in OpenOffice.org
	pack-lang_REGION.xip - as above but installable by the offline installer
	lang-REGION.xpi - the same as lang_REGION.zip but installable in Mozilla

Resources - South African
-------------------------

Common names of fish - http://www.fishbase.org/search.cfm
Birds -
	Robert's Birds list - http://web.uct.ac.za/depts/fitzpatrick/docs/listintro.html
	http://www.wildlifesafari.info/south_african_birds.htm
	http://www.wildlifesafari.info/southern_africa_bird_checklist.htm
Trees - http://www.wildlifesafari.info/southern_africa_tree_list.html
Endangered species -
http://www.unep-wcmc.org/index.html?http://sea.unep-wcmc.org/isdb/CITES/Taxonomy/?displaylanguage=eng~main
http://www.e-gnu.com/check_005.html
http://www.e-gnu.com/check_003.html
http://www.e-gnu.com/check_004.html
Listed companies:
http://www.jse.co.za/listed/companies/la.html
Name changes:
http://africanhistory.about.com/cs/southafrica/a/sa_new_name.htm
www.sapo.co.za - get downloadable postal codes for names of towns and suburbs