Suomi-malaga - Voikko edition
=============================

General information
-------------------

Suomi-malaga is a description of Finnish morphology written in Malaga
(http://home.arcor.de/bjoern-beutel/malaga/). You should use malaga
version 7.8 or later.

Currently Suomi-malaga is used in two different applications: the text
indexer Sukija and the spell checker/hyphenator Voikko. Version 1.0 and
later will work with both applications. This release builds the
Voikko morphology in version 2 of the dictionary format.

All of the documentation about Finnish morphology is in Finnish (see
README.fi and subdirectory doc). This README contains only build
and usage instructions for distribution packagers.


Build and installation
----------------------

Building Suomi-malaga from this package requires malaga, python
and make. No configuration is required: to build the code for Voikko,
you only need to run
    make voikko
Installation can be done by running
    make voikko-install DESTDIR=/usr/lib/voikko
(Replace /usr/lib/voikko with the directory you want to install the
files to. Installing to ~/.voikko will cause libvoikko to use this
version of Suomi-malaga only for the user who does the installation.)
Building the code for Sukija can be done by running
    make sukija
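
As a rough sketch of the per-user installation mentioned above, the helper
below only prints the two make invocations so that they can be reviewed
before being run in the source directory. The function name
voikko_user_install_cmds is made up for this illustration; the targets and
the DESTDIR variable are the ones described in this README.

```shell
#!/bin/sh
# Illustrative helper (not part of Suomi-malaga): print the commands for
# building the Voikko morphology and installing it to a chosen directory.
# With no argument, the per-user directory ~/.voikko is assumed.
voikko_user_install_cmds() {
    destdir="${1:-$HOME/.voikko}"
    printf 'make voikko\n'
    printf 'make voikko-install DESTDIR=%s\n' "$destdir"
}

# Show the commands for a per-user installation.
voikko_user_install_cmds
```

Piping the output to sh from the source directory
(voikko_user_install_cmds | sh) would run the build and the installation.
Paths that contain spaces would need extra quoting.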

Supported Make targets
----------------------

- sukija
  Builds the binary files needed by text indexer Sukija.
- voikko
  Builds the binary files needed by libvoikko.
- voikko-sukija
  Builds the binary files needed by libvoikko with a big
  dictionary that can be used by Sukija.
- voikko-install DESTDIR=/usr/lib/voikko
  Installs the binary files needed by libvoikko to the directory
  specified by DESTDIR. DESTDIR is optional and defaults to
  /usr/lib/voikko.
- dist-gzip
  Builds the full source package.
- clean
  Removes all files generated by other targets.
- update-vocabulary
  Updates the XML vocabulary from the nightly snapshot at
  joukahainen.puimula.org. This target requires wget to
  be available.
- TAGS
  Builds an Emacs tag table file from the vocabulary database.


Variables for tuning the build process
--------------------------------------

- make voikko:
  * VOIKKO_BUILDDIR=path/to/directory
    Specifies the directory where build files are written to while building
    for Voikko.
    Default: voikko (build within source directory)
  * GENLEX_OPTS="--option1=xxx --option2=yyy ..."
    Sets the option string passed to generate_lex.py.
    Available options for generate_lex.py are
    + --min-frequency=n
      Limits the words to be included in the .lex files to the
      specified or higher frequency class. Default is 9.
    + --extra-usage=usage1,usage2,...
      If a word has usage flags (it belongs to a special vocabulary), it is
      included in the vocabulary only if at least one of the usage flags is
      listed here. Available usage flags are listed in file
      vocabulary/flags.txt.
      Listing "sukija" here causes application-specific exclusions to be ignored
      (words marked with not_voikko will also be included).
      By default, no special vocabularies are included.
    + --style=style1,style2,...
      If a word has style flags (such as old, foreign or dialect), it is
      included in the vocabulary only if all of the style flags are listed
      here. Available style flags are listed in file vocabulary/flags.txt.
      Default: old,international,inappropriate
    + --sourceid
      Inserts word identifiers from Joukahainen into the lexicon and returns
      them during morphological analysis. This option has no effect unless
      VOIKKO_DEBUG=yes is set. By default source ids are not preserved.
  * EXTRA_LEX="path/to/file1.lex path/to/file2.lex ..."
    Adds extra malaga lexicon files to the vocabulary. By default, no extra
    lexicons are added.
  * VANHAHKOT_MUODOT=yes|no
    See voikko/doc/liput.txt. Default: yes
  * VANHAT_MUODOT=yes|no
    See voikko/doc/liput.txt. Default: no
  * SUKIJAN_MUODOT=yes|no
    Include words that exist just for Sukija. Default: no
  * VOIKKO_DEBUG=yes|no
    Include information that is not needed by libvoikko but may be needed
    for debugging or by external applications (full morphological analysis).
    Default: no
  * VOIKKO_VARIANT=variant
    Set the short name for the language variant of this vocabulary. The
    name should match the regular expression [a-z][a-z0-9_]*
    Default: standard
  * VOIKKO_DESCRIPTION="Description of the vocabulary"
    Set the long description for the language variant of this vocabulary.
  * SM_PATCHINFO="Information about applied patches"
    If you have modified the source code or are distributing prerelease
    versions, describe any modifications made to the released version here.
    It may be best to change this directly in the Makefile.
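
The variables above can be combined on a single make command line. The
following is a minimal sketch, not part of Suomi-malaga: the helper names
is_valid_variant and voikko_variant_cmd, the build directory naming and the
chosen option values are assumptions made for illustration. It checks a
candidate VOIKKO_VARIANT against the pattern [a-z][a-z0-9_]* and then prints
a matching "make voikko" invocation without running it.

```shell
#!/bin/sh
# Illustrative helper (not part of Suomi-malaga): succeed only if the
# argument matches [a-z][a-z0-9_]*, the pattern required for VOIKKO_VARIANT.
is_valid_variant() (
    LC_ALL=C   # keep the range a-z limited to ASCII letters
    case "$1" in
        '' | [!a-z]* | *[!a-z0-9_]*) exit 1 ;;
        *) exit 0 ;;
    esac
)

# Illustrative helper: print a "make voikko" command line for a variant.
# The option values are examples only; "sukija" and the style flags
# "old,international" are names that appear in this README, and the
# build directory name build-<variant> is an arbitrary choice.
voikko_variant_cmd() {
    is_valid_variant "$1" || { echo "invalid VOIKKO_VARIANT: $1" >&2; return 1; }
    printf 'make voikko VOIKKO_VARIANT=%s VOIKKO_BUILDDIR=%s GENLEX_OPTS="%s"\n' \
        "$1" "build-$1" "--extra-usage=sukija --style=old,international"
}

voikko_variant_cmd my_variant
```

Printing the command instead of running it keeps the sketch free of side
effects. Other usage or style flags should be taken from
vocabulary/flags.txt rather than guessed.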


Copyright and license information
---------------------------------

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version. See file COPYING for details.

Copyright (©) 2006 - 2015 Hannu Väisänen (Email: Hannu.Vaisanen@uef.fi)
and 2006 - 2015 Harri Pitkänen (hatapitk@iki.fi). Contributors listed
in file CONTRIBUTORS hold copyrights to the vocabulary data.