README Swahili Myspell Dictionary
Release 1.1 2005-08-17

1. Intro 

Myspell Swahili Dictionary - Compiled by Alberto Escudero-Pascual aep@it46.se
http://www.it46.se


2. Word list Sources

   The  wordlists  have  been  compiled  based  on  the following
   resources:

   - Dr. Jason M. Githeko (githeko at egerton.ac.ke)
   Egerton University, Njoro, Kenya
   http://www.egerton.ac.ke/ict/kiswa.php
   (48340 words)

   - Prof. D.P.B. Massamba, Prof. A.M. Khamisi et al.
   TUKI English-Swahili Dictionary
   (18327 words)

   - Dr. Martin Benjamin et al. (swahili at yale.edu)
   The Kamusi Project,
   http://www.yale.edu/swahili/
   (15418 words)

   - Dr. Kevin P. Scannell (scannell at slu.edu)
   Corpus building for minority languages
   http://borel.slu.edu/crubadan/
   (+8008 words)

   Total words: 67901
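
   The per-source counts above add up to 90093, more than the total
   of 67901, which would be consistent with words shared between
   sources being counted once. This README does not describe the
   actual merge procedure, so the following is only a minimal sketch
   of one way to combine word lists into the MySpell .dic layout
   (an entry count on the first line, then one word per line). The
   function names and the sample words are hypothetical.

```python
from typing import Iterable


def merge_wordlists(wordlists: Iterable[Iterable[str]]) -> list[str]:
    """Merge several word lists into one sorted list of unique words."""
    unique = set()
    for words in wordlists:
        for word in words:
            word = word.strip()
            if word:  # skip blank entries
                unique.add(word)
    return sorted(unique)


def to_dic(words: list[str]) -> str:
    """Render words in .dic layout: entry count first, then one word per line."""
    return "\n".join([str(len(words)), *words]) + "\n"


# Hypothetical sample data, not taken from the real sources.
list_a = ["jambo", "habari", "asante"]
list_b = ["asante", "karibu", "  jambo  "]

merged = merge_wordlists([list_a, list_b])
dic_text = to_dic(merged)
```

   Here the duplicates "asante" and "jambo" are kept once, so the
   merged list is shorter than the two inputs combined.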

   In  addition,  the programming skills of the following persons
   have also contributed to the Jambo Spellchecker:
   Dwayne Bailey, Louise Berthilson, Iñaki Cívico Campos, Alberto
   Escudero-Pascual and Fredrik Lilieblad.

3. Licence

   The Jambo Spellchecker is released as free software (LGPL).

4. Final Notes

   - Kamusi Project wordlist:

   The  Kamusi  Project  is  an  ongoing  work  of  collaborative
   scholarship  that  is  developing a free online dictionary and
   learning resources for Swahili. Established in 1994, it is the
   world's  most-used  resource for the Swahili language, and the
   first  result  for "Swahili" delivered by most Internet search
   engines;    see    http://www.yale.edu/swahili/    for    more
   information.

   - An Crúbadán:

   The  Swahili  word  list  was  improved with the help of Kevin
   Scannell's  software  An  Crúbadán, a web crawler that targets
   minority  languages  and  languages with limited computational
   resources.

   In  December  2004,  the  web crawler visited over 6600 online
   Swahili  documents and collected about 10 million (non-unique)
   words.

   The  goal  of  An  Crúbadán  is to develop language technology
   for  as  many  languages  as  possible by applying statistical
   techniques  to the vast quantities of text freely available on
   the  web.  Text  corpora  have  been  created  for  nearly 200
   languages so far, and these data are available for use by open
   source  projects;  see http://borel.slu.edu/crubadan/ for more
   information.

5. TODO
 
   Work on the .aff file