How to use this dictionary framework
====================================

Each dictionary has its own directory named after its ISO 639 language code,
e.g. Afrikaans uses af. Each directory contains an aspell and a myspell
directory and, in some cases, an ispell directory. These contain the
speller-specific information.

Files and directories
---------------------

|-- Makefile
|-- af
|   |-- COPYING
|   |-- CREDITS
|   |-- ChangeLog
|   |-- INSTALL
|   |-- Makefile
|   |-- README
|   |-- VERSION
|   |-- aspell
|   |   |-- Copyright
|   |   |-- info.in
|   |-- ispell
|   |   |-- README
|   |   `-- afrikaans.aff
|   |-- myspell
|   |   |-- README_af_ZA.txt
|   |   |-- af_ZA.aff
|   |-- wordlists
|   |   |-- wordlist.nieuwoudt.in

Prerequisites
=============
aspell - needs word-list-compress, which is included in the aspell package for
your system. Other tools needed to package and build are located in the utils
directory.

myspell - the tools needed are in the utils directory.


Required Files
==============

ispell
------
The framework can build ispell dictionaries, but this is disabled by default
because we are not completely sure how to build them.

aspell
------
Copyright - details of the copyright holders etc. The actual copyright text
is added based on a line in the info.in file.
info.in - some basic definitions for the aspell package, including copyright
holders, license, language name etc. Check the Afrikaans one for a good
understanding of its construction. If you need more details, look at the
instructions in the latest aspell dictionary build system. ?URL?
What can be problematic in this file is the "special" line:
special ' **- - -*- 4 -*- 6 -*-
It is simple to understand. Each character is followed by three flags for the
beginning, middle and end of a word: * means the character is allowed there,
- means it is not. The above says ' (apostrophe) is permissible at the
beginning and in the middle of a word, - (dash) is allowed in the middle of a
word, etc.

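A rough sketch of how such a "special" line can be read. It assumes each
character is followed by exactly three flags (beginning, middle, end) and that
* means allowed and - means not allowed. The function name parse_special is
hypothetical; it is not part of aspell or of this framework.

```python
def parse_special(line):
    """Return {character: (begin, middle, end)} for a 'special' line.

    Assumed layout: 'special' followed by character/flags pairs, where the
    flags are three characters, each either '*' (allowed) or '-' (not allowed).
    """
    tokens = line.split()
    if not tokens or tokens[0] != "special":
        raise ValueError("not a special line: %r" % line)
    fields = tokens[1:]
    if len(fields) % 2 != 0:
        raise ValueError("expected character/flags pairs")
    rules = {}
    # Even positions hold the characters, odd positions hold their flags.
    for char, flags in zip(fields[0::2], fields[1::2]):
        if len(flags) != 3 or set(flags) - set("*-"):
            raise ValueError("bad flags %r for %r" % (flags, char))
        rules[char] = tuple(flag == "*" for flag in flags)
    return rules


rules = parse_special("special ' **- - -*- 4 -*- 6 -*-")
for char, (begin, middle, end) in sorted(rules.items()):
    print("%r begin=%s middle=%s end=%s" % (char, begin, middle, end))
```

Running a new language's line through a check like this before building is a
quick way to confirm that every character has a complete set of flags.
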
myspell
-------
README_lang_REGION.txt - a README containing copyright information and
installation instructions.
lang_REGION.aff - the affix file. Mostly these are based on ispell affix
compression. Follow the instructions ?here? for converting an ispell affix file
to myspell. An affix file allows you to compress a wordlist and expand it to
the exact same set of words. If you have not developed affix rules for the
language, then the minimum you need is a SET line and a TRY line.

SET - the character set used by the language. Only ISO8859 character sets can
be used in MySpell, although I think you can define your own internal mapping
if your language does not match an ISO charset. (Need to confirm this)
TRY - a list of letters in order of frequency. The Python script
src/wordlist/letter-frequency.py allows you to create a frequency list.

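To illustrate what a TRY line contains, here is a minimal sketch that counts
letters in a list of words and writes a two-line affix file. It is not the
src/wordlist/letter-frequency.py script, whose behaviour is not shown here. The
function names, the sample words and the ISO8859-1 default are assumptions made
for the example.

```python
from collections import Counter


def letter_frequency(words):
    """Return the letters used in words, most frequent first."""
    counts = Counter()
    for word in words:
        # Count letters only; case is kept exactly as it appears in the word.
        counts.update(ch for ch in word if ch.isalpha())
    # Sort by descending count, breaking ties alphabetically for stable output.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return "".join(ch for ch, _ in ordered)


def minimal_aff(words, charset="ISO8859-1"):
    """Build the text of an affix file holding only a SET and a TRY line."""
    return "SET %s\nTRY %s\n" % (charset, letter_frequency(words))


sample = ["appel", "peer", "pa"]  # illustrative words only
print(minimal_aff(sample))
```

The charset must be the one your wordlist is actually converted to for
MySpell; ISO8859-1 is only a placeholder.
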
Other useful entries:
MAP - map similar characters, e.g. eêë
REP - create REPlacement maps that are useful for mapping common spelling
mistakes, e.g. REP ph f - as in phone.

Affix compression:
SFX - a suffix
PFX - a prefix
FIXME add more details on creating these.

Setting up a new language
-------------------------
Apart from the input files required by each of the different spellcheckers, as
listed above, the main requirements are a wordlist and some definitions in the
language Makefile.

The wordlist is simply a text list of words, one per line. Currently we store
these in UTF-8 to ensure ease of use in the future. Lines that start with a #
are treated as comments and removed when the wordlist is processed. We name
the wordlists wordlist.*.in, but because you list them in the Makefile you can
name them as you please. We have kept existing wordlists intact and used
separate files for new additions. In English we have grouped similar concepts
together, e.g. bird names, city names, etc. Some languages group words
according to parts of speech, which may aid later use with advances in grammar
checkers or in agglutinative languages that may have rules as to how words may
be joined together.

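The wordlist format can be sketched as follows. This is not the framework's
own processing, which lives in the Makefiles and utils directory. The function
names are hypothetical, and skipping blank lines and removing duplicates when
merging are assumptions that go beyond what is stated here.

```python
import io


def read_wordlist(lines):
    """Return the words from an iterable of lines, dropping # comments."""
    words = []
    for line in lines:
        if line.startswith("#"):
            continue  # comment line, removed during processing
        word = line.strip()
        if word:  # assumption: blank lines are ignored
            words.append(word)
    return words


def load_wordlists(paths):
    """Merge several UTF-8 wordlist files into one sorted list of unique words."""
    words = set()
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            words.update(read_wordlist(handle))
    return sorted(words)


sample = io.StringIO("# bird names\nhadeda\nloerie\n\nhadeda\n")
print(read_wordlist(sample))
```

Keeping each topic in its own file, as described above, means a merge step of
this kind is all that is needed to combine them.
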
The language Makefile calls the generic Makefile, utils/Makefile.language. The
language Makefile contains a number of definitions such as the name of the
language, its character set, etc. If you need to understand some of the build
process steps, look at the generic Makefile.

Add a VERSION file. We default to using the date as spellcheckers are really
enhancements and refinements of wordlists, so a newer date should always
indicate a better spellchecker.

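A small sketch of why a date works as a version. The YYYYMMDD layout is an
assumption, since the exact format of the VERSION file is not specified here,
and the dates are made up for the example.

```python
import datetime


def version_for(day):
    """Format a date as YYYYMMDD (assumed layout for the VERSION file)."""
    return day.strftime("%Y%m%d")


older = version_for(datetime.date(2005, 3, 9))
newer = version_for(datetime.date(2005, 11, 21))

# Zero-padded year-month-day strings sort in chronological order,
# so a plain string comparison is enough to tell which release is newer.
print(older, newer, older < newer)
```
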
Also add your language to the Makefile in the dict/ directory, both as a build
rule and as a TARGET.


Building
--------

make         - generate all dictionaries for that language
make count   - return simple stats on your wordlist
make aspell  - create the aspell dictionary (relatively quick)
make myspell - create the myspell dictionary (takes a long time)
make clean   - clean up all packaged files

Outputs
-------
All outputs are placed in the respective spellchecker directories.

aspell - creates a tarball that would be compiled and installed on the target platform

myspell - creates a few outputs:
  lang_REGION.zip - the basic MySpell spellchecker usable in OpenOffice.org
  pack-lang_REGION.xip - as above but installable by the offline installer
  lang-REGION.xpi - the same as lang_REGION.zip but installable in Mozilla

Resources - South African
-------------------------

Common names of fish - http://www.fishbase.org/search.cfm
Birds -
  Robert's Birds list - http://web.uct.ac.za/depts/fitzpatrick/docs/listintro.html
  http://www.wildlifesafari.info/south_african_birds.htm
  http://www.wildlifesafari.info/southern_africa_bird_checklist.htm
Trees - http://www.wildlifesafari.info/southern_africa_tree_list.html
Endangered species -
  http://www.unep-wcmc.org/index.html?http://sea.unep-wcmc.org/isdb/CITES/Taxonomy/?displaylanguage=eng~main
  http://www.e-gnu.com/check_005.html
  http://www.e-gnu.com/check_003.html
  http://www.e-gnu.com/check_004.html
Listed companies:
  http://www.jse.co.za/listed/companies/la.html
Name changes:
  http://africanhistory.about.com/cs/southafrica/a/sa_new_name.htm
www.sapo.co.za - get downloadable postal codes for names of towns and suburbs