README file for the distribution of the Norwegian dictionaries for ISPELL.

DESCRIPTION

This distribution contains a large collection of Norwegian words (both
bokmål and nynorsk) and support files to make useful things from it.

The main file norsk.source contains 747500 words from the Norwegian
language. Each word has a commonness indicator, and it is hyphenated
at compound points.

There is also a Makefile to assist in building dictionaries for Ispell
and other word processors, using a sensible subset of the available
words. A second Makefile, in the patterns directory, makes
hyphenation patterns for TeX based on the dictionary, as well as a
simple set of hyphenation patterns that works on non-compound words.

The latest version is available at

    http://spell-norwegian.alioth.debian.org/

Comments, suggestions and bug reports to i18n-no@lister.ping.uio.no.

There is also a SourceForge project with a similar goal. We should try to
join forces with them. <URL:http://sourceforge.net/projects/spell-no>


BUILDING A NORWEGIAN ISPELL DICTIONARY

* Get the ispell sources and unpack them.

      cd /source
      tar -zxvf ispell-3.1.20.tar.gz

  You can also unpack the sources for the Norwegian dictionary now:

      cd ispell-3.1/languages
      tar -zxvf ispell-norsk-2.0.tar.gz

* Patch Ispell

  I have made a patch for ispell based mainly on other patches found
  on the net. If you think you have found a bug in ispell, please
  make sure that it has nothing to do with this patch before
  reporting it to the ispell maintainer!

  The patch does the following:

  1. An attempt is made to fix the backslash bug. The patch for this
     was found at Ken Stevens' ispell.el site.

  2. Ispell can now parse HTML files thanks to a patch by Gerry
     Tierney. Basically this means that a patched copy of ispell will
     ignore any mark-up tags or HTML entities in an HTML document when
     spell-checking that document. Any text inside an `alt' attribute
     will however be checked.

     Examples: ispell index.html      # html tags will be ignored
               ispell -h README       # html tags will be ignored
               ispell -n index.html   # html tags will be spell-checked

     I have not been able to make the html mode work well when using
     ispell from emacs. That doesn't matter too much, since ispell.el
     has its own skipping mechanism.

  3. Buildhash now accepts all characters between A and z as flags,
     not only the alphanumeric ones, when MASKBITS=64. This is needed
     by the Norwegian affix file.

  4. The AMS and breqn math environments are now skipped by ispell.

  5. Ispell gets the ability to suggest "- as a separation character
     in addition to - and space. This only happens if such support is
     compiled in, i.e. the COMPOUNDBABEL flag must be defined, and it
     only happens in TeX mode and if the language is norsk. It is
     useful to mark compound points in words to ensure good
     hyphenation when using LaTeX with Babel. The Norwegian
     hyphenation patterns distributed in this package hyphenate almost
     every word in the Ispell dictionary correctly, but no guarantee is
     offered for other compound words.

  6. Added an -r switch, which is almost like the -a switch, but the
     suggestions are printed even if the word is found in the
     dictionary. This is useful for hyphenating words and for
     eliminating rare words close to very common words. There has to
     be some German out there wanting to make TeX hyphenate only
     compound words.

  7. Added a patch from the Red Hat RPM to avoid a compilation error in
     ijoin.c.

  So if you are feeling a little brave:

      cd ispell-3.1
      patch < languages/norsk/ispell-3.1.20.no.patch

  Additional patches might be needed on various systems. The Red Hat
  source RPM is a good place to look if something fails.

* CONFIGURE ISPELL

  The file Config.X in the ispell-3.1 distribution contains
  configuration information for ispell (no ./configure yet).
  The definitions are overridden by those in the file local.h, for
  which there is a sample, local.h.samp. The following local.h works
  for me on my Red Hat 6.0 system. You have to adapt the file to the
  languages you have dictionaries for.

-----------------------------------------------------------------------
#define MINIMENU        /* Display a mini-menu at the bottom of the screen */
#define USG             /* Define this on System V */

#define BINDIR  "/usr/bin"
#define LIBDIR  "/usr/lib"
#define MAN1DIR "/usr/man/man1"
#define MAN4DIR "/usr/man/man4"

#define LANGUAGES "{american,MASTERDICTS=american.med+,HASHFILES=americanmed+.hash,EXTRADICT=/usr/dict/words} {norsk}"
#define MASKBITS 64
#define LOOK "look"
#define CFLAGS "-O3"    /* Mostly to speed up my batch operations */
#define LDFLAGS "-s"
#define COMPOUNDBABEL
-----------------------------------------------------------------------

  It might be wise to build ispell only for English first, to test
  that everything works, and add new languages afterwards.

      cd ispell-3.1
      make all

  This takes some time, but almost nothing compared to building the
  Norwegian dictionary.

* ADD LANGUAGES

  Get dictionaries for the languages you want to install from the
  ispell home page. Unpack them in the appropriate directories.
  Update the LANGUAGES variable in local.h and remake.

  Make sure that there is enough free space to build the dictionary.
  If there isn't, the build process will fail miserably. About 120 MB
  is needed!

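  A quick way to check the free space before starting is sketched
  below. It is only an illustration: the function name have_free_mb is
  made up here and is not part of this distribution, the 120 MB figure
  is the estimate given above, and a POSIX df that accepts -P and -k is
  assumed.

```shell
# have_free_mb DIR MB: succeed if the file system holding DIR has at
# least MB megabytes available. Hypothetical helper, not shipped here.
have_free_mb() {
    dir=$1
    need_mb=$2
    # df -P prints one header line, then one line per file system:
    # filesystem, 1024-blocks, used, available, capacity, mount point.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ $((avail_kb / 1024)) -ge "$need_mb" ]
}

# Example use from the ispell-3.1 directory:
#   have_free_mb languages/norsk 120 || echo "not enough free space" >&2
```

  The check only looks at the file system of the build directory, so
  temporary files written elsewhere are not accounted for.
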
  The Norwegian dictionary can be configured. You can choose which
  categories of words to include, and how common a word has to be to
  be included. This is documented in the Makefile in languages/norsk.
  This flexibility has its price: it takes a very long time and a lot
  of disk space, up to 120 MB, to build the dictionary.

  You can also customize the affix file to remove or add some forms of
  words. For example, you could choose to allow or disallow the
  spelling `komitéen'. To do this, make the file norsk.aff,
  edit it according to your needs, and make norsk.hash afterwards.
  Look for the word `valgfritt' in the file. Bear in mind that
  norsk.aff depends on norsk.aff.in, so if you touch norsk.aff.in
  your edited norsk.aff will be overwritten. Changing norsk.aff.in
  itself will not work as expected.

* INSTALL

  Before you install, you might want to test if ispell works.

      cd languages/norsk
      echo vurderingskriterier | ../../ispell -a -d norsk.hash

  should find vurderingskriterium. Then

      make install


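In -a mode ispell answers each word with a one-line code: `*' for a
word found as it stands, `+ ROOT' for a word derived from ROOT through
the affix file, `-' for an accepted compound, `&' or `?' followed by
suggestions, and `#' when it has nothing to offer. The sketch below
turns those codes into readable text so that the test above can be
scripted. The function name explain_ispell_a is hypothetical, and the
answer mentioned after the code is what I would expect, not captured
output.

```shell
# explain_ispell_a: read `ispell -a' output on stdin and print one
# description per result line. Hypothetical helper for illustration.
explain_ispell_a() {
    while IFS= read -r line; do
        case $line in
            '')        ;;   # blank line: end of the results for one input line
            '@'*)      ;;   # version banner printed at start-up
            '*')       echo "found" ;;
            '+ '*)     echo "found via root ${line#+ }" ;;
            '-')       echo "accepted as compound" ;;
            '&'*|'?'*) echo "not found, suggestions: ${line#*: }" ;;
            '#'*)      echo "not found, no suggestions" ;;
            *)         echo "unrecognised: $line" ;;
        esac
    done
}

# Example, run from languages/norsk before installing:
#   echo vurderingskriterier | ../../ispell -a -d norsk.hash | explain_ispell_a
```

For the test word above the interesting case is the `+' line, which
should name vurderingskriterium as the root.
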
USING THE DICTIONARY

CHARACTER SETS

By default ispell assumes you use latin-1 encoding in your Norwegian
files. To spell-check such a file you just say

    ispell -d norsk mythesis.tex

In TeX you can use `{\aa}', `{\ae}', `{\o}', `\'e', `\'o' and `\^o' to
represent the special Norwegian characters. If you do this, you have
to say

    ispell -T plaintex -d norsk mythesis.tex

to spell-check a file. The characters æøåéòô will not be recognized
then, so unfortunately you have to choose one standard. If you use
`\aa{}' etc. instead, you should change the affix file or add a
similar entry in the affix file.

In a plain ASCII file `æ ø å' are sometimes represented as `ae oe aa'.
Use

    ispell -T ascii -d norsk mythesis.tex

to spell-check such a file.

The iso246 encoding puts æøå after z in the collating sequence.
If you use this encoding, say

    ispell -T iso246 -d norsk mythesis.tex

Does anybody use this??


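The four cases above differ only in the -T argument, so a small
wrapper can pick the options from a short name. This is a sketch: the
function name and the labels latin1, tex, ascii and iso246 on the left
are my own, while the ispell options on the right are the ones listed
above.

```shell
# norsk_ispell_args KIND: print the ispell options for one of the
# input conventions described above. Hypothetical helper.
norsk_ispell_args() {
    case $1 in
        latin1) echo "-d norsk" ;;
        tex)    echo "-T plaintex -d norsk" ;;
        ascii)  echo "-T ascii -d norsk" ;;
        iso246) echo "-T iso246 -d norsk" ;;
        *)      echo "unknown input convention: $1" >&2
                return 1 ;;
    esac
}

# Example (the unquoted substitution is split into words on purpose):
#   ispell $(norsk_ispell_args tex) mythesis.tex
```

An unknown label is rejected instead of silently falling back to
latin-1, since checking a file under the wrong convention flags every
word that contains a special character.
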
COMPOUND WORDS

The use of compound words is what makes it both fun and difficult to
produce a good and safe ispell dictionary and to make hyphenation
patterns for TeX.

Ispell has two very important switches, -B and -C, controlling whether
ispell accepts words formed by a root and another word as correct. If
the -C flag is given, ispell will accept words such as
`avdelingsbestyrerstilling', which is right, but also words such as
`premierene' (premie-rene), which is wrong. It is *not recommended*
to use the -C option with the Norwegian dictionary, since far too many
incorrect spellings will be accepted.

If you give neither the -B nor the -C flag, ispell will accept
compound words formed from a small subset of the words in the
dictionary. The subset depends on the configuration variables in the
Makefile. This is called controlled compoundwords mode. It is even
safer to give the -B option, so that only words in the dictionary are
regarded as correct. I would do that if I had written something
important.

The hyphenation patterns for TeX are only tested on words in the
dictionary, so these patterns might fail on compound words accepted in
controlled compoundwords mode. If you want to be absolutely certain
that there will be no bad hyphens in your document, you have to use
the -B switch. See `HYPHENATION IN TEX' below.


FIGHTING `ORD DELINGS SYNDROMET' (THE WORD-SPLITTING SYNDROME)

Most spell checkers, including ispell, suggest splitting compound words
they do not find in their dictionaries. If people follow these
suggestions blindly, the result is disaster: they get spelling errors
in the actual document and, even worse, they think they have learned
the correct spelling! (arkitekt tegnet hus i Holmenkoll åsen...)

I have done two things to fight this. First, ispell suggests `"-' in
addition to `-' and ` ' for compound words, which tells TeX that here
is a compound point and makes the spell-check skip the word next time.

The second thing is more important. The script inorsk-maybecompound
searches a document (or standard input) for two or three words
following each other that can be written as one word, hyphenates them
using TeX and prints the compound words to standard output. By
hyphenating, one avoids words like sommer (som mer), forlenge (for
lenge) etc. Use it!


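To make the idea concrete, here is a much simplified stand-in for what
inorsk-maybecompound looks for. It is not the script from this
distribution: it only checks pairs (not triples), uses a plain word
list instead of ispell, and skips the TeX hyphenation step, so it also
reports the false hits such as `som mer' that the real script avoids.
The function name and the word-list file are assumptions made for the
example.

```shell
# maybe_compound WORDLIST: read text on stdin and print every pair of
# adjacent words whose concatenation appears in WORDLIST (one
# lower-case word per line). Simplified illustration only.
maybe_compound() {
    awk -v wordlist="$1" '
        BEGIN {
            # Load the known words into an associative array.
            while ((getline w < wordlist) > 0) known[w] = 1
            close(wordlist)
        }
        {
            for (i = 1; i <= NF; i++) {
                cur = tolower($i)
                gsub(/[.,;:!?()"]/, "", cur)   # strip common punctuation
                if (cur == "") continue
                if (prev != "" && ((prev cur) in known))
                    print prev " " cur " -> " prev cur
                prev = cur                     # pairs may span line breaks
            }
        }
    '
}

# Example: maybe_compound words.txt < mythesis.tex
```

Every reported pair still has to be judged by a human; the sketch only
narrows down where to look.
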
EMACS

The version of `ispell.el' distributed with emacs-19.34 does not
support Norwegian. I suggest you get the latest ispell.el from
ftp://kdstevens.com/pub/stevens/ispell.el.gz. Good versions are also
found in emacs-20.[4567].

So make sure that your version of ispell.el uses the variable
ispell-local-dictionary-alist, and put a suitable subset of the
following in your .emacs file:

(setq
 ispell-local-dictionary-alist
 '(("norsk"                             ; 8 bit Norwegian mode
    "[A-Za-z\305\306\307\310\311\322\323\324\330\345\346\347\350\351\362\363\364\370]"
    "[^A-Za-z\305\306\307\310\311\322\323\324\330\345\346\347\350\351\362\363\364\370]"
    "[\".,;:]" t ("-B" "-S" "-d" "norsk") "~list" iso-8859-1)
   ("norsk7-tex"                        ; 7 bit Norwegian TeX mode
    "[A-Za-z{}\\'^`@]" "[^A-Za-z{}\\'^`@]"
    "[\".,;:]" t ("-B" "-S" "-d" "norsk" "-T" "plaintex") "~plaintex" nil)
   ("norsk7-html"                       ; 7 bit Norwegian html mode
    "[A-Za-z\&;]" "[^A-Za-z\&;]"        ; Don't use ispell's html-parser
    "[.,:]" t ("-B" "-S" "-n" "-d" "norsk") "~html" iso-8859-1)
   ("norsk7-ascii"                      ; 7 bit Norwegian (aa, ae, oe)
    "[A-Za-z]" "[^A-Za-z]"
    "[\".,;:]" t ("-B" "-S" "-d" "norsk") "~ascii" iso-8859-1)
   ("norsk7-iso246" "[][A-Za-z{}|\\]" "[^][A-Za-z{}|\\]"
    "[\".,;:]" nil ("-B" "-S" "-d" "norsk") "~iso246" iso-8859-1)
   ("norsk-comp"                        ; 8 bit Norwegian mode
    "[A-Za-z\305\306\307\310\311\322\323\324\330\345\346\347\350\351\362\363\364\370]"
    "[^A-Za-z\305\306\307\310\311\322\323\324\330\345\346\347\350\351\362\363\364\370]"
    "[\".,;:]" t ("-S" "-d" "norsk") "~list" iso-8859-1)
   ("norsk7-tex-comp"                   ; 7 bit Norwegian TeX mode
    "[A-Za-z{}\\'^`@]" "[^A-Za-z{}\\'^`@]"
    "[\".,;:]" t ("-S" "-d" "norsk" "-T" "plaintex") "~plaintex" nil)
   ("norsk7-html-comp"                  ; 7 bit Norwegian html mode
    "[A-Za-z\&;]" "[^A-Za-z\&;]"        ; Don't use ispell's html-parser
    "[.,:]" t ("-S" "-n" "-d" "norsk") "~html" iso-8859-1)
   ("norsk7-ascii-comp"                 ; 7 bit Norwegian (aa, ae, oe)
    "[A-Za-z]" "[^A-Za-z]"
    "[\".,;:]" t ("-S" "-d" "norsk") "~ascii" iso-8859-1)
   ;; Renamed from a second "norsk7-iso246" entry, and -B dropped, to
   ;; follow the pattern of the other -comp entries.
   ("norsk7-iso246-comp" "[][A-Za-z{}|\\]" "[^][A-Za-z{}|\\]"
    "[\".,;:]" nil ("-S" "-d" "norsk") "~iso246" iso-8859-1)
   ("nynorsk"                           ; 8 bit Norwegian mode
    "[A-Za-z\305\306\307\310\311\322\323\324\330\345\346\347\350\351\362\363\364\370]"
    "[^A-Za-z\305\306\307\310\311\322\323\324\330\345\346\347\350\351\362\363\364\370]"
    "[\".,;:]" t ("-B" "-S" "-d" "nynorsk") "~list" iso-8859-1)
   ("nynorsk7-tex"                      ; 7 bit Norwegian TeX mode
    "[A-Za-z{}\\'^`@]" "[^A-Za-z{}\\'^`@]"
    "[\".,;:]" t ("-B" "-S" "-d" "nynorsk" "-T" "plaintex") "~plaintex" nil)
   ("nynorsk7-html"                     ; 7 bit Norwegian html mode
    "[A-Za-z\&;]" "[^A-Za-z\&;]"        ; Don't use ispell's html-parser
    "[.,:]" t ("-B" "-S" "-n" "-d" "nynorsk") "~html" iso-8859-1)
   ("nynorsk7-ascii"                    ; 7 bit Norwegian (aa, ae, oe)
    "[A-Za-z]" "[^A-Za-z]"
    "[\".,;:]" t ("-B" "-S" "-d" "nynorsk") "~ascii" iso-8859-1)
   ("nynorsk7-iso246" "[][A-Za-z{}|\\]" "[^][A-Za-z{}|\\]"
    "[\".,;:]" nil ("-B" "-S" "-d" "nynorsk") "~iso246" iso-8859-1)
   ("nynorsk-comp"                      ; 8 bit Norwegian mode
    "[A-Za-z\305\306\307\310\311\322\323\324\330\345\346\347\350\351\362\363\364\370]"
    "[^A-Za-z\305\306\307\310\311\322\323\324\330\345\346\347\350\351\362\363\364\370]"
    "[\".,;:]" t ("-S" "-d" "nynorsk") "~list" iso-8859-1)
   ("nynorsk7-tex-comp"                 ; 7 bit Norwegian TeX mode
    "[A-Za-z{}\\'^`@]" "[^A-Za-z{}\\'^`@]"
    "[\".,;:]" t ("-S" "-d" "nynorsk" "-T" "plaintex") "~plaintex" nil)
   ("nynorsk7-html-comp"                ; 7 bit Norwegian html mode
    "[A-Za-z\&;]" "[^A-Za-z\&;]"        ; Don't use ispell's html-parser
    "[.,:]" t ("-S" "-n" "-d" "nynorsk") "~html" iso-8859-1)
   ("nynorsk7-ascii-comp"               ; 7 bit Norwegian (aa, ae, oe)
    "[A-Za-z]" "[^A-Za-z]"
    "[\".,;:]" t ("-S" "-d" "nynorsk") "~ascii" iso-8859-1)
   ;; Renamed from a second "nynorsk7-iso246" entry, and -B dropped, to
   ;; follow the pattern of the other -comp entries.
   ("nynorsk7-iso246-comp" "[][A-Za-z{}|\\]" "[^][A-Za-z{}|\\]"
    "[\".,;:]" nil ("-S" "-d" "nynorsk") "~iso246" iso-8859-1)
   ))

(load-library "ispell")

The above is very unpretty indeed. It is basically four copies of the
same list. If you come up with something better, please let me know.
I am a terrible lisp programmer!

As you see there are a lot of entries. The -comp entries put ispell
in controlled compoundwords mode, which is nice for a quick
spell-check. I recommend deleting the entries you don't plan to use.
I like to use the -S switch, i.e. not to sort the suggestions made by
ispell. Then it is more likely that the correct suggestion will be
early in the list.

In the future I hope that ispell will be able to sort the suggestions
it makes by commonness, at least for the most common words. That
should not be too difficult to implement. Just load the most common
words and their frequency indicator into memory, and do the necessary
lookups. Or use the external look program. Suggestions and
implementations are most welcome!

There is also a file flyspell.el around. This offers
spell-checking on the fly, and the interface is more like MS Word.
Flyspell-mode highlights incorrect words, and you can even click on
them to get suggestions for correct spelling. Being able to sort on
commonness would make flyspell's auto-correction mode much more
useful!


USING ISPELL IN BATCH MODE

I find ispell's batch mode very useful. The command

    cat myfile.tex | ispell -l -d norsk | sort | uniq -c | sort -n -r -s

prints all words in myfile.tex that are not in the Norwegian
dictionary, with the most frequent words first. This is nice for
spotting errors, or as a starting point for a local dictionary.


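The counting half of that pipeline is independent of ispell and can be
tried on any list of words. The sketch below wraps it in a function
(the name rank_words is made up) and tidies the output of uniq -c into
`count word' pairs. Ties are broken alphabetically instead of relying
on a stable sort.

```shell
# rank_words: read one word per line on stdin and print "count word",
# most frequent first, ties in alphabetical order. Hypothetical helper.
rank_words() {
    sort | uniq -c | sort -k1,1nr -k2,2 | awk '{ print $1, $2 }'
}

# Example with the dictionary from this package:
#   ispell -l -d norsk < myfile.tex | rank_words
```

Words that show up many times are usually names or technical terms
missing from the dictionary, while a word that shows up once is more
likely to be a typing error.
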
HYPHENATION IN TEX

Two sets of hyphenation patterns for the Norwegian language are
provided. The file norskb.tex hyphenates almost as TeX used to, and
the file nohyphbc.tex only splits compound words.

It is fairly easy to install the nohyphb.tex file. Just put it where
TeX can find it, edit the file language.dat to point to the correct
file, and remake the formats. If you use teTeX you just say texconfig
init.

If you want to install both sets of patterns, you have a TeX capacity
problem. The variable ssup_trie_size needs to be bigger than 65535
and trie_op_size bigger than 1501. I use 262143 and 3501. So you
need to change tex.ch (and omega.ch) and recompile TeX. If you are
using teTeX that should be quite easy. Here is a patch:

*** tex.ch~     Fri Jan 21 23:13:24 2000
--- tex.ch      Mon Jul 10 18:46:15 2000
***************
*** 196 ****
! @d ssup_trie_size == 65535
--- 196 ----
! @d ssup_trie_size == 262143
***************
*** 215 ****
! @!trie_op_size=1501; {space for ``opcodes'' in the hyphenation patterns;
--- 215 ----
! @!trie_op_size=3501; {space for ``opcodes'' in the hyphenation patterns;
***************
*** 217 ****
! @!neg_trie_op_size=-1501; {for lower |trie_op_hash| array bound;
--- 217 ----
! @!neg_trie_op_size=-3501; {for lower |trie_op_hash| array bound;

*** omega.ch~   Thu Jul 13 11:37:08 2000
--- omega.ch    Sun Jul 23 20:38:03 2000
***************
*** 125,127 ****
  @d ssup_trie_opcode == 65535
! @d ssup_trie_size == 100000

--- 125,127 ----
  @d ssup_trie_opcode == 65535
! @d ssup_trie_size == 262143

***************
*** 139,143 ****
    {Use |hash_offset=0| for compilers which cannot decrement pointers.}
! @!trie_op_size=1501; {space for ``opcodes'' in the hyphenation patterns;
    best if relatively prime to 313, 361, and 1009.}
! @!neg_trie_op_size=-1501; {for lower |trie_op_hash| array bound;
    must be equal to |-trie_op_size|.}
--- 139,143 ----
    {Use |hash_offset=0| for compilers which cannot decrement pointers.}
! @!trie_op_size=3501; {space for ``opcodes'' in the hyphenation patterns;
    best if relatively prime to 313, 361, and 1009.}
! @!neg_trie_op_size=-3501; {for lower |trie_op_hash| array bound;
    must be equal to |-trie_op_size|.}


The easiest way to use the norskbc patterns is to define the macros

    \def\goodhyphens{\lefthyphenmin2\righthyphenmin2\language=\l@norskc}
    \def\allhyphens{\lefthyphenmin1\righthyphenmin2\language=\l@norsk}

and switch whenever you want to. A better solution might be to define
norskc as another language in the Babel system and use the Babel
language switching system.


MAKING IT PERFECT

So you have installed these great new patterns. But TeX still might
fail on Norwegian words not in the dictionary, so if you don't feel
particularly lucky you will have to do something about that too.

There are two strategies. I tend to prefer the second one.

1. Mark the compound points in the compound word with "-, e.g.
   administrasjons"-sjef"-stillings"-søker. If you have patched
   ispell, you can do this during spell-checking most of the time.

2. Use the script inorsk-hyphenmaybe to print every word in your
   document not in the dictionary (nynorsk and bokmål) hyphenated by
   TeX. Then you can easily browse through this list and put the
   badly hyphenated words in a \hyphenation command. The next time
   you run the script it should produce correct hyphenation.

   For example, if inorsk-hyphenmaybe outputs `kon-flik-t-akse' and
   `kon-flik-t-ak-sen', you have to say \hyphenation{kon-flikt-akse
   kon-flikt-ak-sen} in your TeX document.

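Once the corrected forms are collected in a file, one per line,
wrapping them in a \hyphenation command is mechanical. A small sketch
follows; the function name make_hyphenation is hypothetical, and the
input is assumed to hold words that have already been corrected by
hand.

```shell
# make_hyphenation: read hyphen-marked words on stdin (one per line)
# and print a single \hyphenation{...} command. Hypothetical helper.
make_hyphenation() {
    awk '
        BEGIN { printf "\\hyphenation{" }
        NF    { printf "%s%s", sep, $1; sep = " " }   # skip empty lines
        END   { print "}" }
    '
}

# Example: make_hyphenation < fixed-words.txt >> myhyphens.tex
```

The file names in the example are placeholders. The generated command
goes in the preamble of the document, as in the example above.
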
But we are not done with hyphenation yet. Have you ever considered
the problem of hyphenating the word `villede' in TeX? Of course you
have. The hyphenation should be `vill-lede', so an extra `l' has to
be added at the break.

Most languages which have such hyphenation (in particular German, with
ss) support this in Babel. The convention is that you code villede as
vi"llede. Of course the Norwegian dictionary supports this. Babel-3.7
will also support this for Norwegian. Till then you can use the file
norsk.cfg to get this functionality (and some special hyphen points in
addition). The file itself offers more information.


THE FUTURE OF HYPHENATION IN TEX

In standard TeX today it is not possible to say that one hyphen point
is better than another, e.g. I like barnehage-assistent better than
barne-hageassistent. In the future TeX will be able to handle
multiple classes of hyphens, and different penalties can be assigned to
each class. Mathias Clasen has implemented this as a change file,
but it has not made it into the standard distributions yet. The stuff
at the end of the patterns/Makefile is about generating hyphenation
patterns for such a TeX.


LET'S MAKE THE DICTIONARY EVEN BETTER!

In the future I would like to add more word categories to the
dictionary. If you have a lot of text from within one field of
knowledge, and would like to help, you can start by saying

    cat allmytextfiles | inorsk-hyphenmaybe -e -p norskbc > mywords

You should install the hyphenation patterns norskbc for Norwegian to
get hyphenation only at compound points, and of course the full
dictionary with no words filtered out.

You will probably spot some new words, some of your own spelling
errors and some hyphenation errors. Fix that file, add flags defined
in the affix file etc.

Next you have to learn to use the munchlist program. Suppose you have
these words in the file mywords:

    gjennom-strømnings-mekanisme
    gjennom-strømnings-mekanismen
    gjennom-strømnings-mekanismens
    gjennom-strømnings-mekanismer
    gjennom-strømnings-mekanismene

If you run

    cat mywords \
      | tr '-' 'Î' \
      | munchlist -v -l norsk.aff.munch \
      | tr 'Î' '-'

the output should be

    gjennom-strømnings-mekanisme/AEG

which represents these five words. (Of course this only works if
ispell and munchlist are correctly installed.)

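The two tr calls above hide the hyphens from munchlist and put them
back afterwards. The sketch below gives the two steps names and writes
the placeholder as the octal byte 316, which is Î in latin-1, so that
the commands do not depend on the encoding of the terminal they are
typed in. The function names are hypothetical, and I am assuming that
the word list itself is latin-1 and contains no Î of its own.

```shell
# protect_hyphens / restore_hyphens: swap `-' with the latin-1 byte
# for Î (octal 316) and back. Hypothetical helpers for illustration.
protect_hyphens() { LC_ALL=C tr '-' '\316'; }
restore_hyphens() { LC_ALL=C tr '\316' '-'; }

# Example, equivalent to the pipeline above:
#   protect_hyphens < mywords \
#     | munchlist -v -l norsk.aff.munch \
#     | restore_hyphens
```

Running tr in the C locale keeps it working byte by byte, which is
what a single-byte placeholder needs.
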
Here is some elisp stuff I have used (provided as is, probably very badly coded):

(defun ispell-expand-affixes ()
  "Expand the affix flags of the dictionary entries in the region."
  (interactive)
  (shell-command-on-region
   (mark) (point)
   "sed -e \"s/[-0-9 :]//g\" | ispell -e -d norsk"))

(defun ispell-collect-affixes ()
  "Run munchlist on the words in the region to collect affix flags."
  (interactive)
  (shell-command
   (concat
    "echo \"" (buffer-substring-no-properties (mark) (point))
    "\" | sed -e \"s/-/î/g\" -e \"s/[0-9 :]//g\" | "
    "munchlist -l norsk.aff.munch | sed -e \"s/î/-/g\" &")))

(defun ispell-expand-line ()
  "Expand the affix flags of the dictionary entry on the current line."
  (interactive)
  (save-excursion
    (beginning-of-line)
    (let ((beg (point)))
      (end-of-line)
      (shell-command-on-region
       beg (point)
       "sed -e \"s/[-0-9 :]//g\" | ispell -d norsk -e"))))

;; We have to quote the `' characters to protect them from shell
;; expansion.

(defun current-line ()
  "Return the current line up to the first space, cleaned for the shell.
Quote characters are backslash-escaped; digits, blanks, colons, dots
and asterisks are removed."
  (save-excursion
    (beginning-of-line)
    (let ((beg (point)))
      (end-of-line)
      (let ((myvar (buffer-substring-no-properties beg (point))))
        ;; Drop everything from the first space onwards.
        (while (string-match " .*" myvar)
          (setq myvar (replace-match "" nil nil myvar)))
        ;; Put a backslash in front of quotes and a trailing backslash.
        (while (string-match "\\([^\\]\\)\\([`'\"]\\|\\\\$\\)" myvar)
          (setq myvar (replace-match "\\1\\\\\\2" nil nil myvar)))
        (while (string-match "[0-9 \t:.*]" myvar)
          (setq myvar (replace-match "" nil nil myvar)))
        myvar))))

(defun current-region ()
  "Return the region with quotes escaped and digits and blanks removed."
  (let ((myvar (buffer-substring-no-properties (mark) (point))))
    (while (string-match "\\([^\\]\\)\\([`'\"]\\|\\\\$\\)" myvar)
      (setq myvar (replace-match "\\1\\\\\\2" nil nil myvar)))
    (while (string-match "[0-9 \t]" myvar)
      (setq myvar (replace-match "" nil nil myvar)))
    myvar))