The README file describes what Hspell is and what it includes. This file
explains how to build and install it.

===========================================================================

Hspell is normally installed and used in one of two ways:

1. Native Hspell: Hspell can be used as a command-line tool "hspell", and/or
   through the library libhspell, together with a dictionary in Hspell's own
   format.

2. Derivative dictionaries: Hspell's dictionary data is compiled into a
   format used by a common multi-lingual spell-checker, such as aspell,
   myspell or hunspell.

One benefit of the native Hspell method is much better performance: when
Hspell's native spell-checker is compared to hunspell, for example, it is
10 times smaller on disk, 10 times faster to start, uses half the memory,
and spell-checks hundreds of times (!) faster. Hspell's code also has
additional features that no multi-lingual spell-checker currently supports,
especially morphological analysis.

The benefits of generating dictionaries for one of the existing multi-lingual
spell-checkers like aspell or hunspell are obvious: no additional code needs
to be installed, so the dictionaries will work on any system where such a
multi-lingual spell-checker works. Even more importantly, large applications
such as OpenOffice, Firefox and even Google's Gmail, which already use aspell
or hunspell to provide spell-checking for many languages, gain Hebrew
spell-checking without any extra effort.

============================================ Native Hspell ===============

Installing Hspell on a Unix-compatible system (Linux, Unix, Mac OS X) is
usually as simple as running

        ./configure
        make
        make install

Note that before running "make install", if you want to run the hspell
executable from the build directory, you must tell it to expect the dictionary
files in the current directory, rather than in their final location. Do this
by running "hspell -Dhebrew.wgz".
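
As a rough sketch of that workflow, the shell function below builds Hspell
and then runs the freshly built executable against the dictionary in the
build directory. The function name try_hspell is made up for this example.
It assumes that you are at the top of the Hspell source tree and that the
built executable is ./hspell there.

```shell
# Build Hspell without installing it, then run the new executable from the
# build directory. try_hspell is a hypothetical helper name, not part of Hspell.
try_hspell() {
    # Stop early if configuring or building fails.
    ./configure && make || return 1
    # -Dhebrew.wgz: expect the dictionary files in the current directory
    # rather than in their final installed location. Any extra arguments
    # given to try_hspell are passed on to hspell unchanged.
    ./hspell -Dhebrew.wgz "$@"
}
```

Calling try_hspell with no arguments amounts to typing "./configure", "make"
and "./hspell -Dhebrew.wgz" by hand.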

By default, Hspell is built for installation in the /usr/local tree. If you
want to install it somewhere else, use "./configure --prefix=/some/dir".

The --prefix option is just one of configure's usual options that give
you more control over the way Hspell is compiled. Run "./configure -h"
to see the entire list of these options.

In addition to configure's usual options, Hspell's configure adds a few
options whose names start with "--enable-" and that enable optional features
in Hspell. These are the options you might want to use:

  --enable-fatverb
      Allow "objective kinuyim" on all forms of verbs. Because this adds
      as many as 130,000 correct but very rarely-used (in modern texts)
      inflections, a compile-time option is present for enabling or
      disabling these forms. The default in this version is not to enable
      them.

  --enable-linginfo
      Include a full morphological analyzer in "hspell -l", explaining how
      each correct word could be derived. This slows down the build and makes
      the installation about 4 times larger, but doesn't slow hspell if "-l"
      isn't used, so it is recommended.

These optional features are not turned on by default because they present
a feature/performance tradeoff (you get more features but slower build,
larger installation, and/or slower executable), or a feature/feature tradeoff
(when you add more rare word forms, you're allowing more spelling mistakes
to masquerade as real words).
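
To make the options above concrete, here is a minimal sketch of a helper
that configures, builds and installs Hspell with a chosen prefix and any of
the optional features. The function name build_hspell and its argument
layout are illustrative assumptions. Only the configure options themselves
come from this file.

```shell
# Usage: build_hspell PREFIX [extra configure options...]
# build_hspell is a hypothetical helper name, not something Hspell provides.
# Run it from the top of the Hspell source tree.
build_hspell() {
    prefix=$1
    shift
    # "$@" carries optional flags such as --enable-linginfo or
    # --enable-fatverb. Each step runs only if the previous one succeeded.
    ./configure --prefix="$prefix" "$@" &&
        make &&
        make install
}
```

For example, "build_hspell /usr/local --enable-linginfo" builds Hspell with
the morphological analyzer and installs it under /usr/local.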

============================================ Derivative Dictionaries =====

After you run "configure" as explained above, the Makefile has additional
targets for creating dictionaries for several common multi-lingual
spell-checkers and applications:

Except where otherwise noted, these dictionaries faithfully reproduce all of
Hebrew's morphological richness as understood by the native Hspell
spell-checker. This includes correctly allowing the various prefixes used in
Hebrew, and not allowing them when they are not appropriate (e.g., the
definite article on a verb).

"make hunspell" -
   Creates the files "he.dic" and "he.aff".

   These dictionaries are uncompressed. It is recommended that they be
   compressed with "hzip". While hzip compression is not as good as
   aspell's prezip-bin (or our own wzip), this is the compression format
   that hunspell understands, and it can still compress he.dic to a tenth
   of its original size.

   If you package these files, please also package misc/Copyright,
   so that users know they were generated by Hspell.
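
The hunspell steps above could be scripted roughly as follows. This is a
sketch under stated assumptions: make_hunspell_dict is a hypothetical helper
name, and the hzip call assumes that hunspell's hzip tool is on your PATH
and takes the files to compress as arguments, writing he.dic.hz and
he.aff.hz next to the originals. Check the hzip documentation of your
hunspell version before relying on it.

```shell
# Generate the hunspell dictionary files and compress them.
# make_hunspell_dict is a hypothetical helper name, not part of Hspell.
# Run it in the Hspell build directory after "configure" has been run.
make_hunspell_dict() {
    # Creates he.dic and he.aff.
    make hunspell || return 1
    # Assumption: hzip accepts the file names as arguments and writes
    # compressed copies with a .hz suffix.
    hzip he.dic he.aff
}
```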

"make aspell" -
   Creates the files "he_affix.dat" and "he.wl".

   Additionally, one can do "make he.rws" to create he.rws from he.wl.
   (rws is a dump of aspell's in-memory hash table, which allows aspell
   to mmap(2) the dictionary almost instantly, instead of reading it).

   Unfortunately, there is one case where the aspell dictionary cannot
   correctly reproduce Hspell (because it lacks hunspell's NEEDAFFIX
   extension): In Hebrew, the infinitive verb may be preceded by the
   prefixes lamed, bet, kaf or mem, but often must not come without any
   prefix. We have not yet found a way to express this in aspell,
   so words like ישון are incorrectly accepted as correct.

   If you package these files, please also package misc/Copyright,
   so that users know they were generated by Hspell.
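
A similar sketch for the aspell files is shown below. The helper name
make_aspell_dict and its optional "rws" argument are inventions for this
example. The make targets and file names are the ones described above.

```shell
# Usage: make_aspell_dict [rws]
# make_aspell_dict is a hypothetical helper name, not part of Hspell.
# Run it in the Hspell build directory after "configure" has been run.
make_aspell_dict() {
    make aspell || return 1
    # Check that both files named in this document were actually produced.
    for f in he_affix.dat he.wl; do
        if [ ! -f "$f" ]; then
            echo "expected $f was not created" >&2
            return 1
        fi
    done
    # Optionally build he.rws from he.wl as well.
    if [ "${1:-}" = rws ]; then
        make he.rws
    fi
}
```

Running "make_aspell_dict rws" builds all three files. Without the argument,
only he_affix.dat and he.wl are built.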

============================================ Additional Targets =====

Finally, there is a target, "make hif", for creating a full inflection list
which might be useful for other future applications besides spell-checking.
For more information, see README-hif.