\input texinfo
@c %**start of header
@setfilename malaga.info
@settitle Malaga 7.12
@c %**end of header
@paragraphindent 0
@tex
\parindent 0cm
@end tex

@c Makeinfo 4.7 does not create umlauts properly when translating to
@c HTML, so we have to do it by hand.
@c In TeX, macros containing a command which must be on a line by
@c itself, such as a conditional, cannot be invoked in the middle of a line,
@c so we must handle umlauts for TeX separately.

@macro uuml
@html
&uuml;
@end html
@ifinfo
ue@c
@end ifinfo
@end macro

@macro ouml
@html
&ouml;
@end html
@ifinfo
oe@c
@end ifinfo
@end macro

@c Copyright (C) 1995 Bjoern Beutel.

@dircategory Malaga - a Natural Language Analysis System
@direntry
* Malaga: (malaga). A Grammar Development Environment for Natural Languages.
* malaga: (malaga)malaga. Analyse words/sentences using a Malaga grammar.
* mallex: (malaga)mallex. Run allomorph rules on lexicon entries.
* malmake: (malaga)malmake. Compile all files of a Malaga grammar.
* malrul: (malaga)malrul. Compile a rule file of a Malaga grammar.
* malsym: (malaga)malsym. Compile a symbol file of a Malaga grammar.
@end direntry

@c ----------------------------------------------------------------------------

@titlepage
@title Malaga 7.12
@subtitle User's and Programmer's Manual
@author 
Bj@"orn Beutel
@page
@vskip 0pt plus 1 filll
Copyright @copyright{} 1995 Bj@"orn Beutel.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation
approved by the Free Software Foundation.

@end titlepage

@c ----------------------------------------------------------------------------

@ifhtml
@node Top, Contents, (dir), (dir)
@end ifhtml
@ifinfo
@node Top, Introduction, (dir), (dir)
@majorheading Malaga 7.12
@end ifinfo

@ifnottex
This is the documentation for Malaga, a software package for the
development and application of grammars that are used for the analysis
of words and sentences of natural languages.

Copyright @copyright{} 1995 Bj@ouml{}rn Beutel.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation
approved by the Free Software Foundation.
@end ifnottex

@menu
@ifhtml
* Contents::     Table of Contents.
@end ifhtml
* Introduction:: What is Malaga?
* Formalism::    The grammar formalism used by Malaga.
* The Programs:: Invoking @code{malaga} and its friends.
* Commands::     Interactive commands for @code{malaga} and @code{mallex}.
* Options::      Interactive options for @code{malaga} and @code{mallex}.
* The Language:: Definition of the Programming Language Malaga.
* Index::        The Index.
@end menu

@c ----------------------------------------------------------------------------

@ifnotinfo 
@node Contents, Introduction, Top, Top
@contents
@end ifnotinfo

@c ----------------------------------------------------------------------------

@ifnotinfo
@node Introduction, Formalism, Contents, Top
@end ifnotinfo
@ifinfo
@node Introduction, Formalism, Top, Top
@end ifinfo

@chapter Introduction
The name ``Malaga'' is used with two different meanings: on the one
hand, it is the name of a special-purpose programming language, namely a
language for implementing grammars for natural languages. On the other hand,
it is the name of a program package for developing Malaga grammars
and testing them by analysing words and sentences. ``Malaga'' is an
acronym for ``@b{M}erely @b{a} @b{L}eft-@b{A}ssociative @b{G}rammar
@b{A}pplication''.

@ifnottex
The program package ``Malaga'' has been developed by Bj@ouml{}rn Beutel.
@cindex Beutel, Bj@ouml{}rn
Gerald Sch@uuml{}ller
@cindex Sch@uuml{}ller, Gerald
@end ifnottex
@iftex
The program package ``Malaga'' has been developed by Bj@"orn Beutel.
@cindex Beutel, Bj@"orn
Gerald Sch@"uller
@cindex Sch@"uller, Gerald
@end iftex
has implemented parts of the debugger, parts of the
Emacs Malaga mode, and the original Tree and Variable output via TCL/Tk.

So far, morphology grammars for several natural languages have been
developed with Malaga, including Albanian, Bulgarian, English,
Finnish, German, Italian, Korean and Spanish.

@c ----------------------------------------------------------------------------

@node Formalism, The Programs, Introduction, Top
@chapter Malaga's Grammar Formalism
@cindex Formalism
@cindex LAG

A formal grammar for a natural language can be used to check whether a
sentence or a word form is grammatically well-formed (a word form is a
special inflectional form of a word, so ``book'' and ``books'' are two
different word forms of the word ``book''). Furthermore, a grammar can
describe the structure and meaning of a sentence or a word form by a
data structure that has been constructed during the analysis process.

Malaga uses a formalism that is derived from Left-Associative
Grammar (LAG), which was developed by Roland Hausser. An LAG
analyses a sentence (or a word form) step by step:
its parts are concatenated from the left to the right, hence the name
``Left-Associative Grammar''. A single LAG rule can only join two
parts to a bigger one: it concatenates the state part (which is the
beginning of the sentence or word form that has already been analysed)
and the link part (which is the next word form or the next
allomorph). In contrast to LAG, Malaga's formalism already reads in
the first part of a word form or of a sentence by applying a rule.
Take a look at the following sentence:

@quotation
Shakespeare liked writing comedies.
@end quotation

The sentence is analysed by five rule applications:

@quotation
``'' + ``Shakespeare'' @*
``Shakespeare'' + ``liked'' @*
``Shakespeare liked'' + ``writing'' @*
``Shakespeare liked writing'' + ``comedies'' @*
``Shakespeare liked writing comedies'' + ``.'' @*
@end quotation
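
The pairs above can be produced mechanically: in each step, the state
is everything that has been read so far and the link is the next part.
The following Python sketch only illustrates this; it is not part of
Malaga. The function name @code{left_associative_steps} is made up for
this example, and joining the parts with blanks is a simplification.

@example
def left_associative_steps(parts):
    """Return one (state surface, link) pair per rule application."""
    steps = []
    for index, link in enumerate(parts):
        # The state is the input that has been combined so far;
        # it is empty before the first rule application.
        state = " ".join(parts[:index])
        steps.append((state, link))
    return steps


sentence = ["Shakespeare", "liked", "writing", "comedies", "."]
for state, link in left_associative_steps(sentence):
    print(repr(state), "+", repr(link))
@end example

The loop prints five lines, one for each rule application listed above.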

To apply a rule it's not sufficient to know the spelling of a word or an
allomorph. A rule also requires morphological and syntactic information, such
as word class, gender, meaning of a suffix etc. This information, which is
associated with an element of an utterance, like a sentence, a word form or an
allomorph, is called its @dfn{feature structure}. The analysis of a sentence or
a word form returns such a feature structure as result.

Now let us take a closer look at how a sentence is analysed.

@enumerate
@item 
Before we can start to analyse a sentence, the analysis automaton must be
in an @dfn{initial state}. The initial state includes:

@itemize @bullet
@item 
a feature structure for that state, and
@item 
the @dfn{combination rule} checking whether it is allowed to start with a
specific word form. This rule also builds the feature structure of the 
successor state (whose surface consists of the first word form).
@end itemize

@item 
The next word form that is going to be added is read and analysed
morphologically.  If there is no valid word form, the analysis process
aborts.

@item 
The feature structure that morphology assigns to this word form is called the
link's feature structure. The feature structure of the input that has been
analysed syntactically so far is called the state's feature structure.

@item
The active combination rule checks whether it is allowed to combine the state's
surface (which may be empty if the rule is operating on the initial state) with
the link, i.e., the next word form. The combination rule takes the feature
structures of the state and of the link as parameters.  They can be compared by
logical tests, and finally the feature structure of the successor state (whose
surface includes the word form that has been read), is constructed by the rule.
The rule also specifies which @dfn{successor rule} is active in the successor
state. Execution then continues at step 2.

Instead of specifying a successor rule, a rule can also @emph{accept} the
analysed sentence. In that case, the feature structure of the successor state
will be used as the feature structure of the complete analysed sentence.
@end enumerate

Morphological analysis operates analogously, except that a word form,
composed of allomorphs, is analysed. The link (step
2) is looked up in the allomorph lexicon.

This sketch is of course simplified. There can be ambiguities in an
analysis, induced by several causes:

@itemize @bullet
@item 
The initial state may contain several rules to analyse the first word
form or allomorph.
@item 
A rule may have multiple successor rules.
@item 
In morphology, the continuation of the input may match several trie entries.
@item 
In syntax analysis, the link may be assigned several feature structures
by morphology.
@end itemize

These ambiguities are handled by splitting the analysis into several
subanalyses: if there are two lexicon entries for a word form, for example, the
analysis continues using the first entry (and its feature structure) as well as
the second one. You can compare this with a branching path. The analyses will
be continued independently of each other. So, one analysis path can accept the
input while the other fails. Each analysis path can divide repeatedly when
other ambiguities are met. If several analysis paths are continued until they
accept, the analysis process returns more than one result.
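
The branching of analysis paths can be sketched in a few lines of
Python. This is a toy model under strong assumptions, not Malaga's
implementation: feature structures are reduced to plain strings, and
the lexicon, the three rules and all names (@code{LEXICON},
@code{RULES}, @code{analyse}) are hypothetical.

@example
# Toy lexicon: a word form may have several feature structures.
# A feature structure is reduced to a plain string here.
LEXICON = dict(
    the=["determiner"],
    fish=["noun", "verb"],  # ambiguous word form
    swim=["verb"],
)


def start_rule(state, link):
    if link == "determiner":
        return [("noun phrase begun", ["noun_rule"], False)]
    if link == "noun":
        return [("noun phrase", ["verb_rule"], False)]
    return []  # the rule fails for this link


def noun_rule(state, link):
    if link == "noun":
        return [("noun phrase", ["verb_rule"], False)]
    return []


def verb_rule(state, link):
    if link == "verb":
        return [("sentence", [], True)]  # accept the input
    return []


RULES = dict(start_rule=start_rule, noun_rule=noun_rule,
             verb_rule=verb_rule)


def analyse(words):
    """Return the feature structures of all accepting paths."""
    # A path is (state, names of active rules, accepted flag).
    paths = [("initial", ["start_rule"], False)]
    for word in words:
        successors = []
        for state, rule_names, accepted in paths:
            for link in LEXICON.get(word, []):
                for name in rule_names:
                    successors.extend(RULES[name](state, link))
        paths = successors  # paths without a successor are dropped
    return [state for state, names, accepted in paths if accepted]


print(analyse(["the", "fish", "swim"]))
print(analyse(["the", "swim"]))
@end example

In the first call, the path that reads ``fish'' as a verb fails while
the path that reads it as a noun goes on and accepts, so one result is
returned. The second call returns no result.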

@c ----------------------------------------------------------------------------

@node The Programs, Commands, Formalism, Top
@chapter The Malaga Programs

The Malaga programs are all started in a similar manner: you either give
the name of a @dfn{project file} as argument (this is not possible for
@code{malrul} and @code{malsym}), or you give the names of the
files that the program needs (this is not possible for @code{malmake} and
@code{malaga}, which require a project file as argument). The file
type is recognised by the file name ending.

Assume you've written a grammar that consists of a symbol file
@file{english.sym}, an allomorph rule file @file{english.all}, a lexicon
file @file{english.lex} and a morphology rule file @file{english.mor},
and you have also written a project file @file{english.pro}. You first
have to create binary files from these files:

@example
malmake english.pro
@end example

The source files must be in the Unicode UTF-8 format, which is also used for
input and output by the Malaga programs.

The binary files have the same name as their source counterparts, but have a
@file{_l} (for little endian processors like x86), a @file{_b}
(for big endian processors like HPPA) or a @file{_c} (for other architectures)
appended. Now you can start the program @code{malaga} by entering
the following command line: @code{malaga english.pro}.

The names of the grammar files will be read from the project file.
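
The naming scheme for binary files can be written down as a small
Python sketch. It assumes that the suffix is appended to the complete
source file name, as described above; the function name
@code{binary_name} is made up for this example. Python's
@code{sys.byteorder} only reports @code{little} or @code{big}, so the
@file{_c} branch is only reached if another value is passed explicitly.

@example
import sys


def binary_name(source_name, byteorder=sys.byteorder):
    """Append the suffix for the given byte order to a file name."""
    if byteorder == "little":
        suffix = "_l"
    elif byteorder == "big":
        suffix = "_b"
    else:
        suffix = "_c"  # any other architecture
    return source_name + suffix


print(binary_name("english.sym", "little"))
@end example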

If you want to know about the command line arguments of a Malaga
program, you can get help by using the option @samp{-help} or
@samp{-h}, as in @code{mallex -help}.
@cindex @code{help} (command line option)

If you just want to know which version of a Malaga program you are using, you
can get the version number by using the option @samp{-version} or
@samp{-v}, as in @code{malrul -version}.
@cindex @code{version} (command line option)

The program just emits a few lines with information about its version number
and about using and copying it.

@menu
* Projects::  Describing the parts of a Malaga grammar.
* Profiles::  Settings for @code{malaga}, @code{mallex} and @code{malshow}.
* malaga::    Analysing words and sentences.
* mallex::    Generating and debugging the allomorph lexicon.
* malmake::   Controlling the compilation of a Malaga grammar.
* malrul::    Compiling a Malaga rule file.
* malsym::    Compiling a Malaga symbol file.
@end menu

@c ----------------------------------------------------------------------------

@node Projects, Profiles, The Programs, The Programs
@section Projects
@cindex projects

A couple of files, taken together, form a Malaga grammar:

@table @asis
@item The @dfn{lexicon file} (@file{.lex})
A lexicon of base forms.

@item The @dfn{prelex file} (@file{.prelex}, optional)
A precompiled lexicon in binary format.

@item The @dfn{allomorph rule file} (@file{.all})
A file with rules which generate the allomorphs of the base forms.

@item The @dfn{morphology rule file} (@file{.mor})
A file with rules which combine allomorphs to word forms.

@item The @dfn{symbol file} (@file{.sym})
A file with the symbols that may be used in rules and feature structures.

@item The @dfn{syntax rule file} (@file{.syn}, optional)
A file with rules that combine word forms to sentences.

@item The @dfn{extended symbol file} (@file{.esym}, optional) 
A file with additional symbols that may only be used in a syntax rule file.

@end table

You can group these files together into a @dfn{project}. To do this, you
have to write a project file, with a name ending in @file{.pro}, in
which you list the names of these files, each one after a keyword
(each file type on a line of its own). Imagine you have written a
grammar that consists of the files @file{standard.sym},
@file{webster.lex}, @file{english.all}, @file{english.mor}, and
@file{english.syn}. The project file for this grammar will look like
this:

@example
sym: standard.sym
lex: webster.lex
all: english.all
mor: english.mor
syn: english.syn
@end example

In your source files, you can include further source files by using the 
@code{include} statement; so a binary file of your grammar may be dependent on
several source files. The program @code{malmake} uses the information in the
project file to check for dependencies between source files and binaries, so
the project file must contain the names of all source files for a specific
binary. Relative path names are always relative to the directory of
the project file.

Assume you have a lexicon file @file{webster.lex} that
looks like this:

@example
include "suffixes.lex";
include "verbs.lex";
include "adjectives.lex";
include "nouns.lex";
include "particles.lex";
include "abbreviations.lex";
include "names.lex";
include "numbers.lex";
@end example

In this case, you must write the names of all these files in the @samp{lex:}
line of your project file, after the name of the main lexicon file:

@example
lex: webster.lex suffixes.lex verbs.lex adjectives.lex
lex: nouns.lex particles.lex abbreviations.lex names.lex numbers.lex
@end example

Since there are several files in this example, the @samp{lex:} line has
been divided into two lines, each line starting with @samp{lex:}.
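
The way repeated keywords add up to one list of files can be sketched
in Python. This is only an illustration of the file format as described
here, not the parser that Malaga uses; the function name
@code{parse_project} is made up, and the sketch treats every entry as a
list of file names, so it is not suited for other kinds of project file
lines.

@example
def parse_project(text):
    """Collect the entries of a project file, grouped by keyword."""
    entries = dict()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines
        keyword, separator, rest = line.partition(":")
        if not separator:
            raise ValueError("missing keyword in line: " + line)
        # Repeated keywords extend the same list, so a long
        # "lex:" entry can be spread over several lines.
        entries.setdefault(keyword.strip(), []).extend(rest.split())
    return entries


project = parse_project(
    "sym: standard.sym\n"
    "lex: webster.lex suffixes.lex verbs.lex adjectives.lex\n"
    "lex: nouns.lex particles.lex abbreviations.lex names.lex numbers.lex\n"
)
print(project["lex"][0])    # the main lexicon file
print(len(project["lex"]))  # all files the lexicon depends on
@end example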

If you want to extend an existing project (for example, you might want to add a
syntax rule file to a morphology grammar), you can include the project file of
the morphology grammar in the project file of your syntax grammar by using a
line starting with @samp{include:}:

@example
include: /projects/grammars/english/english.pro
syn: english-syntax.syn
@end example

The file entries in the project file of the morphology grammar are
treated as if they replaced the @samp{include:} line. Relative paths in
the included project file are relative to the directory of the
@emph{included} file, not to the directory of the @emph{including} file.
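
The rule for relative paths can be sketched as follows. The Python
function @code{resolve} is made up for this example, and the path
@file{/home/me/syntax/syntax.pro} of the including project file is
hypothetical.

@example
import posixpath


def resolve(project_file, file_name):
    """Resolve a file name listed in the given project file."""
    if posixpath.isabs(file_name):
        return file_name
    # Relative names refer to the directory of the project file
    # in which they are written, even if that file was included.
    return posixpath.join(posixpath.dirname(project_file), file_name)


print(resolve("/projects/grammars/english/english.pro", "english.mor"))
print(resolve("/home/me/syntax/syntax.pro", "english-syntax.syn"))
@end example

The first name is resolved relative to @file{/projects/grammars/english},
the second one relative to @file{/home/me/syntax}.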

The programs @code{malaga} and @code{mallex} can set options like
@code{hidden} or @code{robust} from the project file, so you do not need
to set these options each time you start @code{malaga}. Each line in the
project file that starts with @samp{malaga:} is executed when
@code{malaga} starts, and each line that starts with @samp{mallex:} is
executed when @code{mallex} starts. Only the @code{set} command may be
used in these lines, so you can only set options in the project file.
Here is an example:

@example
  ...
malaga: set hidden +semantics
malaga: set robust-rule on
mallex: set hidden +semantics +syntax
  ...
@end example

When you start @code{malaga}, the commands @code{set hidden +semantics} and
@code{set robust-rule on} will be executed; when you start @code{mallex}, the
command @code{set hidden +semantics +syntax} will be executed.

Options in project files that are read in by @samp{include:} lines in other
project files will be executed as if they were in place of the 
@samp{include:} line.
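
How the option lines are selected for each program can be sketched in
Python. The function name @code{startup_commands} is made up for this
example, and raising an error for a command other than @code{set} is an
assumption; how Malaga itself reacts to such a line is not described
here.

@example
def startup_commands(project_lines, program):
    """Return the commands that a program runs at startup."""
    prefix = program + ":"
    commands = []
    for line in project_lines:
        line = line.strip()
        if line.startswith(prefix):
            command = line[len(prefix):].strip()
            # Only the "set" command is allowed in a project file.
            if command.split()[:1] != ["set"]:
                raise ValueError("only set is allowed: " + command)
            commands.append(command)
    return commands


lines = [
    "malaga: set hidden +semantics",
    "malaga: set robust-rule on",
    "mallex: set hidden +semantics +syntax",
]
print(startup_commands(lines, "malaga"))
@end example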

Lines in project files that start with @samp{info:} contain information
about the grammar. In @code{malaga}, you get this information if you use the
command @code{info}. Example:

@example
info: =====================================
info: Deutsche Malaga Morphologie 3.0
info: written by Oliver Lorenz, 11.04.1997
info: =====================================
@end example

@cindex Hangul
The Korean writing system, Hangul, needs special treatment, because the
characters it uses are syllables that must be split up into individual letters
for morphological analysis. Such a conversion is built into Malaga.
To activate this Hangul support, insert the following line into your
project file:

@example
split-hangul-syllables: yes
@end example
@cindex @code{split-hangul-syllables}

If Hangul support has been switched on, you may also enter Hangul text
in a Latin transcription that is based on the Yale transcription.
Transcribed text must be contained in curly brackets, and each
syllable must start with a dot, for example @samp{@{.cwess.ta@}}.
Malaga can also display Hangul text in Latin transcription if Hangul
support has been activated. This can be controlled by the option
@code{roman-hangul}.

When Malaga splits Hangul syllables, you must be aware that string
operations work with Hangul letters, even if you have entered
syllables in your grammar source code:
@itemize @bullet
@item 
In a pattern, a character class that contains a syllable will match
each single Hangul letter that is part of that syllable. Only use
single letters in character classes. If you want to select between
different syllables, use alternatives, separated by vertical bars. 
@item 
In a pattern, when a postfix operator (@samp{*}, @samp{?}, or
@samp{+}) follows a Hangul syllable, it will only operate on the last
letter of that syllable. If you want the operator to work on the whole
syllable, put the syllable in parentheses.
@item
The functions @code{substring} and @code{length} will count single
characters, not syllables.
@end itemize
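
The difference between counting syllables and counting letters can be
tried out in Python. The sketch assumes that the split resembles the
canonical decomposition of Unicode (NFD), which turns a precomposed
Hangul syllable into its single letters; the representation that Malaga
uses internally may differ. The function name @code{hangul_letters} is
made up for this example.

@example
import unicodedata


def hangul_letters(text):
    """Split precomposed Hangul syllables into single letters."""
    # Canonical decomposition (NFD) turns each syllable into its
    # leading consonant, vowel and optional trailing consonant.
    return unicodedata.normalize("NFD", text)


word = "\ud55c\uae00"  # two syllables
print(len(word))                  # number of syllables
print(len(hangul_letters(word)))  # number of letters
@end example

The word consists of two syllables with three letters each, so the
sketch prints 2 and 6.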

@c ----------------------------------------------------------------------------

@node Profiles, malaga, Projects, The Programs
@section The Malaga Profiles @file{.malagarc} or @file{malaga.ini}
@cindex @code{.malagarc} (file)
@cindex @code{malaga.ini} (file)
@cindex profile

If there are options that you want to use with every Malaga
project, you may create a personal profile. On POSIX systems, it is
located in your home directory and is called @file{.malagarc}. In
Microsoft Windows (NT based) systems, it is located in your user
profile directory and is called @file{malaga.ini}. In Microsoft
Windows (DOS based) systems, it is located in the root directory of
your system drive and is also called @file{malaga.ini}. You can enter
@code{malaga} and @code{mallex} options in the same manner as you do
in the project file:

@example
malaga: set display-cmd "malshow"
malaga: set use-display yes
mallex: set display-cmd "malshow"
mallex: set use-display yes
@end example

The settings in your personal profile override the settings in the
project file.

You can set some attributes of the graphical user interface
@code{malshow}, like the position, the size, and the font size of each
window that is opened by @code{malshow}. The following attributes are
available:
@table @code

@item *_geometry:
@cindex geometry
@cindex window geometry
Defines the size and/or position of a window. The ``*'' must be
replaced by the name of the window, which may be @code{allomorphs},
@code{path}, @code{result}, @code{tree}, @code{variables}, or
@code{expressions}. The attribute must be followed by an expression
like @samp{628x480+640+512}. The first two numbers (@samp{628x480})
define the width and the height of the window in pixels, the last two
numbers (@samp{+640+512}) define the position of its upper left
corner.

@item font:
@cindex font family
The attribute must be followed by the name of the font family to use.

@item font_size:
@cindex font size
The attribute must be followed by an integer font size, given in
points.  The available font sizes are 8, 10, 12, 14, 18, and 24
points.

@item show_indexes:
@cindex indexes, state
@cindex state indexes
The attribute must be followed by @code{yes} or @code{no}, which
determines whether state indexes are shown in the Tree and Path
windows.

@item hanging_style:
The attribute must be followed by @code{yes} or @code{no}, which
determines whether horizontally adjacent complex values are aligned at
their top lines (@dfn{hanging style}) or at their bottom lines
(@dfn{non-hanging style}).

@item inline_path:
The attribute must be followed by @code{yes} or @code{no}, which
determines whether the components of a state or a link in a path will
be arranged horizontally or vertically. For small feature structures,
e.g.@ in formal grammars, a horizontal arrangement is more readable,
while paths of full-blown natural language grammars look better in a
vertical arrangement.

@item show_tree:
A three-valued attribute that determines which states of a tree are
shown. Possible values are: @code{full}, @code{no_dead_ends} and
@code{result_paths}.

@end table

Here is an example which sets every option available:
@example
allomorphs_geometry: 628x480+640+0
path_geometry: 628x480+640+0
result_geometry: 628x480+640+0
tree_geometry: 628x480+640+512
variables_geometry: 628x480+640+512
expressions_geometry: 628x480+640+0

font: helvetica
font_size: 12

show_indexes: yes
hanging_style: yes
inline_path: yes
show_tree: no_dead_ends
@end example
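
The structure of a geometry value can be made explicit with a short
Python sketch. It only covers the form described above, where the size
and the position may each be omitted; the names @code{GEOMETRY} and
@code{parse_geometry} are made up for this example and are not part of
@code{malshow}.

@example
import re

GEOMETRY = re.compile(r"^(?:(\d+)x(\d+))?(?:\+(\d+)\+(\d+))?$")


def parse_geometry(value):
    """Return (width, height, x, y); missing parts are None."""
    match = GEOMETRY.match(value.strip())
    if match is None or not value.strip():
        raise ValueError("not a geometry value: " + value)
    return tuple(None if part is None else int(part)
                 for part in match.groups())


print(parse_geometry("628x480+640+512"))
print(parse_geometry("628x480"))
@end example

The first call returns the width, the height and the position of the
upper left corner; in the second call, the position is reported as
missing.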

@c ----------------------------------------------------------------------------

@node malaga, mallex, Profiles, The Programs
@section The Program @code{malaga}
@cindex @code{malaga} (program)

The program @code{malaga} is the user interface for analysing word forms and
sentences, displaying the results and finding bugs in a grammar. Start
@code{malaga} with the name of a project file as argument:

@example
malaga english.pro
@end example

When @code{malaga} has been started, it loads the symbol file, the lexicon
file, the morphology rule file and, if there is one, the syntax rule file. After
loading, the @dfn{prompt} appears. Then @code{malaga} is ready to execute your
commands:

@cartouche
@example
$ malaga english.pro
This is malaga, version 7.12.
Copyright (C) 1995 Bjoern Beutel.
This program is part of Malaga, a system for Natural Language Analysis.
You can distribute it under the terms of the GNU General Public License.
malaga> 
@end example
@end cartouche

You can now enter any @code{malaga} command. If you are not sure about
the name of a command, use the command @code{help} to get an overview of
all @code{malaga} commands.

If you want to quit @code{malaga}, enter the command @code{quit}.

You can use the following command line options when you start @code{malaga}:

@table @asis
@item @samp{-morphology} or @samp{-m} 
@cindex @code{morphology} (command line option)
Starts @code{malaga} in @dfn{morphology mode}. That is, word forms are
read in from the standard input stream and analysed (one word form
per line). The analysis result is written to the standard output
stream.

@item @samp{-syntax} or @samp{-s}
@cindex @code{syntax} (command line option)
Starts @code{malaga} in @dfn{syntax mode}. That is, sentences are
read in from the standard input stream and analysed (one sentence per
line). The analysis result is written to the standard output
stream.

@item @samp{-quoted} or @samp{-q}
@cindex @code{quoted} (command line option)
When @code{malaga} has been started in syntax or morphology mode with the
option @samp{-quoted}, each input line must be enclosed in double quotes,
which are removed prior to analysis. Within the double quotes, any
combination of printable characters may appear, but a backslash @samp{\}
or a double quote must be preceded by a @samp{\} (escape character).

@item @samp{-input} or @samp{-i}
@cindex @code{input} (command line option)
Starts @code{malaga} in @dfn{argument analysis mode}. That is, the
argument following @samp{-input} is analysed. Either the
@samp{-morphology} or the @samp{-syntax} option must also be
given. The analysis result is sent to the standard output stream
in a structured format.
@end table
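
If another program produces the input for @samp{-quoted}, it has to add
the quotes and escape characters itself. The following Python sketch
shows one way to do this and to undo it again. The function names
@code{quote_line} and @code{unquote_line} are made up for this example;
the sketch follows the description above and has not been checked
against @code{malaga} itself.

@example
def quote_line(text):
    """Enclose text in double quotes for the -quoted option."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def unquote_line(line):
    """Undo quote_line: strip the quotes and the escape characters."""
    if len(line) < 2 or line[0] != '"' or line[-1] != '"':
        raise ValueError("line is not enclosed in double quotes")
    result = []
    characters = iter(line[1:-1])
    for character in characters:
        if character == "\\":
            # The character after a backslash is taken literally.
            character = next(characters, "")
        result.append(character)
    return "".join(result)


print(quote_line('the "best" word'))
@end example

Backslashes are escaped before double quotes, so that the escape
characters added for the quotes are not escaped a second time.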

@c ----------------------------------------------------------------------------

@node mallex, malmake, malaga, The Programs
@section The Program @code{mallex}
@cindex @code{mallex} (program)

By using @code{mallex}, you can make the allomorph rules process the entries of
a base form lexicon. 

You can start @code{mallex} either with the name of a project file or with the
names of the needed grammar files:

@example
mallex english.pro
@end example

or

@example
mallex english.sym english.all english.lex
@end example

If you are not using a project file, you must give

@itemize @bullet
@item 
the name of the symbol file (@file{.sym}),
@item 
the name of the allomorph rule file (@file{.all}), 
@item 
the name of the lexicon file (@file{.lex}, in batch mode), and
@item
the name of the prelex file (@file{.prelex}, in batch mode, optional).
@end itemize

Normally, @code{mallex} runs interactively: it loads the symbol file and the
allomorph rule file. Then the @dfn{prompt} appears:

@cartouche
@example
$ mallex english.pro
This is mallex, version 7.12.
Copyright (C) 1995 Bjoern Beutel.
This program is part of Malaga, a system for Natural Language Analysis.
You can distribute it under the terms of the GNU General Public License.
mallex> 
@end example
@end cartouche

You can now enter any @code{mallex} command. If you do not remember the command
names, you can use the command @code{help} to see an overview of the 
@code{mallex} commands.

If you want to quit @code{mallex}, enter the command @code{quit}.

If you have started @code{mallex} by using the option @samp{-binary}
or @samp{-b}, it creates the run time lexicon file from the base form
lexicon file and the optional prelex file. If the lexicons are very
big or the allomorph rules are very complex, this can take some
time. After creation, @code{mallex} exits.

If you have started @code{mallex} by using the option @samp{-prelex}
or @samp{-p}, it creates a precompiled lexicon file from the source
lexicon file and the optional prelex file and exits.

You can use the following command line options when you start
@code{mallex}: 

@table @asis
@item @samp{-binary} or @samp{-b}
@cindex @code{binary} (command line option)
Runs @code{mallex} in batch mode and creates the run-time lexicon.

@item @samp{-readable} or @samp{-r}
@cindex @code{readable} (command line option)
Runs @code{mallex} in batch mode and outputs the allomorph lexicon in
readable form on the standard output stream.

@item @samp{-prelex} or @samp{-p}
@cindex @code{prelex} (command line option)
Runs @code{mallex} in batch mode, but doesn't apply the allomorph filter yet.
Outputs the allomorph lexicon as a @file{.prelex} binary file.

@end table

@c ----------------------------------------------------------------------------

@node malmake, malrul, mallex, The Programs
@section The Program @code{malmake}
@cindex @code{malmake} (program)

The program @code{malmake} reads a project file, checks that all
required grammar files exist, and translates all grammar files that have
not yet been translated or whose source files have changed since the
last translation. @code{malmake} itself calls the programs
@code{malsym}, @code{mallex} and @code{malrul} if needed. An example:
assume you have written a morphology grammar whose grammar files are
bundled in a project file @file{english.pro}:

@example
sym: rules/english.sym
all: rules/english.all
lex: rules/english.lex lex/adjectives.lex
lex: lex/particles.lex lex/suffixes.lex lex/verbs.lex
lex: lex/nouns.lex lex/abbreviations.lex lex/numbers.lex
mor: rules/english.mor
mallex: set hidden +semantics +syntax
malaga: set hidden +semantics
@end example

When you execute @code{malmake} on a project for the first time (in the
following transcript, a project with the project file @file{dmm.pro}), the
symbol file, the rule files and the lexicon file will be translated:

@cartouche
@example
$ malmake dmm.pro
compiling "dmm.sym"
compiling "dmm.all"
compiling "dmm.mor"
compiling "dmm.lex"
project is up to date
$
@end example
@end cartouche

If you want all files to be recompiled unconditionally, use the option
@samp{-new} or @samp{-n}.

The translation of a big lexicon can take some minutes, since the allomorph
rules have to be executed for each lexicon entry.
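
The decision whether a file has to be translated again can be sketched
in Python. The sketch assumes that a change is detected by comparing
modification times, which is a common technique but is not confirmed
here for @code{malmake}; the function name @code{needs_translation} is
made up for this example.

@example
import os


def needs_translation(binary_file, source_files):
    """Tell whether a binary file must be (re)created."""
    if not os.path.exists(binary_file):
        return True  # not translated yet
    binary_time = os.path.getmtime(binary_file)
    # A source file that is newer than the binary has changed
    # since the last translation.
    return any(os.path.getmtime(source) > binary_time
               for source in source_files)
@end example

For a lexicon, the list of source files would contain every file named
in the @samp{lex:} lines of the project file, which is why all included
files must be listed there.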

@c ----------------------------------------------------------------------------

@node malrul, malsym, malmake, The Programs
@section The Program @code{malrul}

The program @code{malrul} translates Malaga rule files, i.e.@ files that
have the endings @file{.all}, @file{.mor} or @file{.syn}. The compiled
file gets the suffix @file{_l}, @file{_b}, or @file{_c}, depending on the 
endianness of your processor.  Give the following arguments when starting
@code{malrul}:

@itemize @bullet
@item 
the name of the rule file that is to be translated, and
@item 
the name of the associated symbol file 
(@file{.sym} or @file{.esym}).
@end itemize

The order of the arguments is arbitrary. Here is an example:

@example
malrul english.mor english.sym
@end example

@c ----------------------------------------------------------------------------

@node malsym,  , malrul, The Programs
@section The Program @code{malsym}

@code{malsym} can translate Malaga symbol files, i.e.@ files having the
ending @file{.sym} or @file{.esym}. The translated file gets the suffix
@file{_l}, @file{_b}, or @file{_c}, depending on the endianness of your
processor.

For example:

@example
malsym english.sym
@end example

If you are translating an extended symbol file with the ending
@file{.esym}, enter the name of the compiled symbol file after the command
line option @samp{-use} or @samp{-u}:

@example
malsym english.esym -use english.sym
@end example

This argument is needed since extended symbol files are extensions of ordinary
symbol files.

If you use the command line option @samp{-split-hangul-syllables} when
starting @code{malsym}, the symbol file and all the Malaga files that
use it will split up Hangul syllables into individual letters
internally. This option is invoked by @code{malmake} if the project
file contains the line @samp{split-hangul-syllables: yes}.

@c ----------------------------------------------------------------------------

@node Commands, Options, The Programs, Top
@chapter The Commands of @code{malaga} and @code{mallex}
@cindex commands

Since the user interfaces of @code{malaga} and @code{mallex} are very
similar and since they have a bunch of commands in common, I will
describe them in a common chapter. Commands that can be used only in
@code{malaga} or only in @code{mallex} are marked by the name of the
program in which they can be used.

@menu
* backtrace::      Show where rule execution has stopped.
* break::          Add a new breakpoint.
* clear-cache::    Clear the word cache.
* continue::       Continue execution up to next breakpoint.
* debug-ga::       Debug Generating Allomorphs.
* debug-ga-file::  Debug Generating Allomorphs from a file.
* debug-ga-line::  Debug Generating Allomorphs from a single line in a file.
* debug-ma::       Debug Morphology Analysis.
* debug-ma-line::  Debug Morphology Analysis of a line in a file.
* debug-sa::       Debug Syntax Analysis.
* debug-sa-line::  Debug Syntax Analysis of a line in a file.
* debug-state::    Debug rule execution at an analysis state.
* delete::         Delete breakpoints.
* down::           Show code position and variables in called rule.
* finish::         Continue execution up to return or path termination.
* frame::          Show code position and variables of a frame.
* ga::             Generate Allomorphs.
* ga-file::        Generate Allomorphs from a file.
* ga-line::        Generate Allomorphs from a single line in a file.
* get::            Get current values of options.
* help::           Get help about commands and options.
* info::           Get info about current grammar.
* list::           List current breakpoints.
* ma::             Analyse a word.
* ma-file::        Analyse words in a file.
* ma-line::        Analyse a word at a line in a file.
* mg::             Generate words from allomorphs.
* next::           Continue execution up to next line, skip subrules.
* print::          Display a variable or constant or a part of it.
* quit::           Quit @code{malaga} or @code{mallex}.
* read-constants:: Read constant definitions in lexicon file.
* result::         Show results.
* run::            Continue execution up to the end.
* sa::             Analyse a sentence.
* sa-file::        Analyse sentences in a file.
* sa-line::        Analyse a sentence at a line in a file.
* set::            Set values of options.
* sg::             Generate sentences from words.
* step::           Continue execution up to next line, enter subrules.
* transmit::       Send value to transmit process and display answer.
* tree::           Display analysis tree.
* up::             Show code position and variables in calling rule.
* variables::      Display current variables.
* walk::           Execute until next rule.
* where::          Show current analysis state.
@end menu

@c ----------------------------------------------------------------------------

@node backtrace, break, Commands, Commands
@section The Command @code{backtrace}
@cindex @code{backtrace} (command)

If you are executing your rules in debug mode or the rules were interrupted
by an error, this command shows where rule execution currently stopped. If it
stopped in a subrule, all calling rules are also shown. The currently examined
rule is marked with a @samp{*}:

@cartouche
@example
debug> backtrace
*2: "dmm.mor", line 1218, rule "deletePOS"
 1: "dmm.mor", line 31, rule "Start"
debug>
@end example
@end cartouche

This means that rule execution stopped in frame 2, at line 1218 of
@file{dmm.mor}, in rule @code{deletePOS}. This subrule was called from
frame 1, at line 31 of @file{dmm.mor}, in rule @code{Start}.

@c ----------------------------------------------------------------------------

@node break, clear-cache, backtrace, Commands
@section The Command @code{break}
@cindex @code{break} (command)
@cindex breakpoints

If you want to stop the rules at a specific point, for example to take a look
at the variables, you can use the command @code{break} to set
@dfn{breakpoints}. A breakpoint is a point in the rule source text where rule
execution is interrupted, so you can enter commands in debug mode. Breakpoints
are only active in debug mode, that is, when you have started rule execution
with a debug command or have continued rule execution with one of the
commands @code{step}, @code{next}, @code{walk}, or @code{continue}.

After the command name, @code{break}, you can give one of the following
arguments:

@table @asis
@item A line number.
A breakpoint is set at this line in the current source file. If there is
no statement starting at this line, the breakpoint will be set at the
nearest line where a statement starts. You can, for example, set a
breakpoint at line 245 in the current source file by entering the
command

@example 
break 245
@end example

@item A file name and a line number.
A breakpoint is set at this line in this file. If there is no statement
starting at this line, the breakpoint will be set at the nearest line
where a statement starts. An example:

@example
break english.syn 59
@end example

@item A rule name.
A breakpoint is set at the first statement in this rule. An example:

@example
break final_rule
@end example
@end table

If the rule name or the file name is ambiguous, you can insert an abbreviation
for the rule system you refer to. Put it in front of the rule name or the file
name. The following abbreviations are used:

@itemize @bullet
@item
@samp{all} for allomorph rules,
@item
@samp{mor} for morphology rules,
@item
@samp{syn} for syntax rules.
@end itemize

If you omit the argument, the breakpoint is set on the current line in the
current file (this is helpful in debug mode).

Every breakpoint gets a unique number once it has been set, so you can delete
it later, when you do not need it any longer.

You can list the breakpoints using the command @code{list} and delete
them using @code{delete}.
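
As a sketch of a typical breakpoint session, the following commands set a
breakpoint at the first statement of the syntax rule @code{final_rule}
(the rule name is taken from the example above, with the @samp{syn}
abbreviation placed in front of it as described), list all breakpoints,
and delete the new one again. The number @samp{1} is an assumption; use
the number that @code{list} actually reports:

@example
break syn final_rule
list
delete 1
@end example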

@c ----------------------------------------------------------------------------

@node clear-cache, continue, break, Commands
@section The Command @code{clear-cache} (@code{malaga})
@cindex @code{clear-cache} (@code{malaga} command)

If you have changed your settings so that the wordform cache is no longer
valid, you can clear the cache using @code{clear-cache}. This can be necessary
if you have turned on/off input or output filters or modified switches.

@c ----------------------------------------------------------------------------

@node continue, debug-ga, clear-cache, Commands
@section The Command @code{continue}
@cindex @code{continue} (command)

This command can only be executed in debug mode. It resumes rule execution and
may be followed by:

@table @emph
@item Nothing.
Rule execution is continued until a breakpoint is met or the rules have
been executed completely.

@item A line number.
Rule execution is continued until a breakpoint is met, the rules have
been executed completely or the given line in the current source file is
met. If there is no statement starting at this line, execution will be
stopped at the nearest line where a statement starts. You can, for
example, continue execution until line 245 in the current source file is
met by entering the command

@example
continue 245
@end example

@item A file name and a line number. 
Rule execution is continued until a breakpoint is met, the rules have
been executed completely or the given line in the given file is met. If
there is no statement starting at this line, execution will be stopped
at the nearest line where a statement starts. An example:

@example
continue english.syn 59
@end example

@item A rule name.
Rule execution is continued until a breakpoint is met, the rules have
been executed completely or the first statement of the given rule is
met. An example:

@example     
continue final_rule
@end example

@item A comparison.
The comparison must be of the form @code{@var{variable} = @var{value}},
where @var{variable} may be any variable name, maybe followed by a path,
and @var{value} may be any Malaga value. Rule execution is continued
until a breakpoint is met, the rules have been executed completely or
until @var{variable} is defined and its value is @var{value}.
@end table
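
As a sketch of the comparison form, assume that the current rule has a
variable @code{$word} with an attribute @code{class} (both names are
only illustrative). The following command would then resume rule
execution until a breakpoint is met, the rules have been executed
completely, or @code{$word.class} is defined and has the value
@code{pronoun}:

@example
continue $word.class = pronoun
@end example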

@c ----------------------------------------------------------------------------

@node debug-ga, debug-ga-file, continue, Commands
@section The Command @code{debug-ga} (@code{mallex})
@cindex @code{debug-ga} (@code{mallex} command)

Use @code{debug-ga} to find errors in your allomorph rules. This command
works like @code{ga}, but the allomorph generation will be stopped before the
first statement of the first rule is executed:

@cartouche
@example
mallex> debug-ga [surface: "john", class: name]
at rule "irregular_verb"
debug> 
@end example
@end cartouche

The prompt @samp{debug>} that appears instead of @samp{mallex>} indicates
that @code{mallex} is currently executing the allomorph rules but has been
interrupted. Since this ability has been developed to support the 
@emph{debugging} of Malaga rules, this mode is called @dfn{debug mode}.

When @code{mallex} arrives at the start of a new rule in debug mode (as in the
example above), the name of this rule is displayed. When in debug mode, you can
always get the name of the current rule using the command @code{backtrace}.

If you are running @code{mallex} from Emacs, another Emacs window will display
the source file. An arrow points to the statement that will be executed
next:

@cartouche
@example
  ...
allo_rule irregular_verb ($entry):
=>? $entry.class = verb;
  ...
@end example
@end cartouche

In debug mode, you can, for example, get the variables that are
currently defined (using @code{variables} or @code{print}), and you can
execute statements (using @code{step}, @code{next}, @code{walk},
@code{continue}, or @code{run}). If you want to quit the debug mode,
just enter @code{run}. The remaining statements for generation will then
be executed without interruption.

@c ----------------------------------------------------------------------------

@node debug-ga-file, debug-ga-line, debug-ga, Commands
@section The Command @code{debug-ga-file} (@code{mallex})
@cindex @code{debug-ga-file} (@code{mallex} command)

Use the command @code{debug-ga-file} to make the allomorph rules work on
a lexicon file in debug mode. Assume you have written a lexicon file
@file{mini.lex}:

@example
[surface: "m@{a@}n", class: noun];
[surface: "table", class: noun];
[surface: "wise", class: adjective];
@end example

To let the rules process this lexicon in debug mode, enter:

@example
debug-ga-file mini.lex
@end example

@c ----------------------------------------------------------------------------

@node debug-ga-line, debug-ma, debug-ga-file, Commands
@section The Command @code{debug-ga-line} (@code{mallex})
@cindex @code{debug-ga-line} (@code{mallex} command)

Use the command @code{debug-ga-line} to make the allomorph rules generate
allomorphs for a single lexicon entry in debug mode. Assume you want to test
the second line in the lexicon file @file{mini.lex}:

@example
[surface: "m@{a@}n", class: noun];
[surface: "table", class: noun];
[surface: "wise", class: adjective];
@end example

Enter the following line:

@example
debug-ga-line mini.lex 2
@end example

Then @code{mallex} stops in debug mode at the entry of the first allomorph rule
that is executed for the lexicon entry:

@example
[surface: "table", class: noun];
@end example

If there is no lexicon entry at this line, the subsequent lexicon entry will be
taken.

@c ----------------------------------------------------------------------------

@node debug-ma, debug-ma-line, debug-ga-line, Commands
@section The Command @code{debug-ma} (@code{malaga})
@cindex @code{debug-ma} (@code{malaga} command)

Use the command @code{debug-ma} to find errors in your morphology combination
rules. This command analyses the rest of the command line morphologically and
executes the morphology combination rules in debug mode. Debug mode is
explained for the command @code{debug-ga}.
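
For instance, to step through the morphological analysis of the word
form @samp{house} (the same illustrative word form that is used for the
command @code{ma}), you could enter:

@example
debug-ma house
@end example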

@c ----------------------------------------------------------------------------

@node debug-ma-line, debug-sa, debug-ma, Commands
@section The Command @code{debug-ma-line} (@code{malaga})
@cindex @code{debug-ma-line} (@code{malaga} command)

Use the command @code{debug-ma-line} to find errors in your morphology
combination rules. This command analyses the word form on a single line of a
file morphologically and executes the morphology combination rules in debug
mode. Give the file name and the line number as arguments, as for
@code{ma-line}. Debug mode is explained for the command @code{debug-ga}.

@c ----------------------------------------------------------------------------

@node debug-sa, debug-sa-line, debug-ma-line, Commands
@section The Command @code{debug-sa} (@code{malaga})
@cindex @code{debug-sa} (@code{malaga} command)

Use the command @code{debug-sa} to find errors in your syntax combination
rules. This command analyses the rest of the command line syntactically and
executes the syntax combination rules in debug mode. Debug mode is explained
for the command @code{debug-ga}.
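
For instance, to step through the syntactic analysis of the sentence
that is used as an illustration for the command @code{sa}, you could
enter:

@example
debug-sa The man is in town.
@end example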

@c ----------------------------------------------------------------------------

@node debug-sa-line, debug-state, debug-sa, Commands
@section The Command @code{debug-sa-line} (@code{malaga})
@cindex @code{debug-sa-line} (@code{malaga} command)

Use the command @code{debug-sa-line} to find errors in your syntax
combination rules. This command analyses the sentence on a single line of a
file syntactically and executes the syntax combination rules in debug mode.
Give the file name and the line number as arguments, as for @code{sa-line}.
Debug mode is explained for the command @code{debug-ga}.

@c ----------------------------------------------------------------------------

@node debug-state, delete, debug-sa-line, Commands
@section The Command @code{debug-state} (@code{malaga})
@cindex @code{debug-state} (@code{malaga} command)

Use the command @code{debug-state} to execute the successor rules of a
specific LAG state in debug mode. You must already have analysed a word
or a sentence.  Let @code{malaga} display the analysis tree by entering
@code{tree}, move the mouse pointer over the state you want to debug,
and press the left mouse button. A window opens in which this state's
feature structure is shown.  The window's title line contains the index
of the state. Use this number as argument for @code{debug-state}. The
last analysis input will be analysed again; when the analysis reaches
the first successor rule of the specified state, it stops and
@code{malaga} switches to debug mode. Debug mode is explained for the
command @code{debug-ga}.
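
A session might look like the following sketch. The input sentence and
the state index @samp{5} are only assumptions for illustration; use the
index that is shown in the title line of the state's window:

@example
sa The man is in town.
tree
debug-state 5
@end example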

@c ----------------------------------------------------------------------------

@node delete, down, debug-state, Commands
@section The Command @code{delete}
@cindex @code{delete} (command)

If you want to delete a breakpoint, use the command @code{delete} with the
number of the breakpoint as argument.

Enter @samp{delete all} to delete all breakpoints.

@c ----------------------------------------------------------------------------

@node down, finish, delete, Commands
@section The Command @code{down}
@cindex @code{down} (command)

If you want to look at the source and the variables of the (sub)rule that is
currently being called by the current subrule, you can do this by entering
@code{down}. You can list the frames via @code{backtrace}.

@c ----------------------------------------------------------------------------

@node finish, frame, down, Commands
@section The Command @code{finish}
@cindex @code{finish} (command)

This command can only be executed in debug mode. The rule execution will be
resumed and continues until a @code{return} statement is met or until
the current rule path is terminated.

@c ----------------------------------------------------------------------------

@node frame, ga, finish, Commands
@section The Command @code{frame}
@cindex @code{frame} (command)

If you want to look at the source and the variables of a (sub)rule that has
called the current subrule, directly or indirectly, you can do this by typing
@code{frame} and the number of the frame you want to examine. You can list the
frames via @code{backtrace}.
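
For instance, suppose @code{backtrace} reports that execution stopped
in frame 2, in rule @code{deletePOS}, which was called from frame 1, in
rule @code{Start} (the situation shown for the command
@code{backtrace}). To examine the calling rule @code{Start}, you could
then enter:

@example
frame 1
@end example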

@c ----------------------------------------------------------------------------

@node ga, ga-file, frame, Commands
@section The Command @code{ga} (@code{mallex})
@cindex @code{ga} (@code{mallex} command)

Use the command @code{ga} (short for @emph{generate allomorphs}) to
generate allomorphs. This is useful for testing allomorph generation
from within @code{mallex}. When you enter the command, give a lexicon
entry as argument. All allomorphs that the allomorph rules generate from
this entry are displayed on screen. For example:

@cartouche
@example
mallex> ga [Lemma: "!", POS: Punctuation, Type: ExclamationMark]
"!": [POS: <Punctuation>,
      Punctuation: <[Allomorph: "!",
                     BaseForm: "!",
                     concatStem: no,
                     concatSx: no,
                     POS: Punctuation,
                     Type: ExclamationMark,
                     terminal: yes]>,
      Surface: "!"]
mallex>
@end example
@end cartouche

If the rules create multiple allomorphs from an entry, they are displayed one
after another.

@c ----------------------------------------------------------------------------

@node ga-file, ga-line, ga, Commands
@section The Command @code{ga-file} (@code{mallex})
@cindex @code{ga-file} (@code{mallex} command)

Use the command @code{ga-file} to make the allomorph rules generate allomorphs
for a lexicon file. Assume you have written a lexicon file @file{mini.lex}:

@example
[surface: "m@{a@}n", class: noun];
[surface: "table", class: noun];
[surface: "wise", class: adjective];
@end example

To generate the allomorphs for this lexicon, enter @samp{ga-file mini.lex}.

This will produce a readable allomorph file whose name ends in
@file{.out}; for @file{mini.lex} its name will be @file{mini.lex.out}:

@example
"man": [class: noun, syn: singular]
"men": [class: noun, syn: plural]
"table": [class: noun]
"wise": [class: adjective, restr: complete]
"wis": [class: adjective, restr: inflect]
@end example

@c ----------------------------------------------------------------------------

@node ga-line, get, ga-file, Commands
@section The Command @code{ga-line} (@code{mallex})
@cindex @code{ga-line} (@code{mallex} command)

Use the command @code{ga-line} to make the allomorph rules generate
allomorphs for a single lexicon entry. Assume you want to test
the second line in the lexicon file @file{mini.lex}:

@example
[surface: "m@{a@}n", class: noun];
[surface: "table", class: noun];
[surface: "wise", class: adjective];
@end example

Enter the following line:

@example
ga-line mini.lex 2
@end example

Then @code{mallex} generates allomorphs for 
@code{[surface: "table", class: noun];}.

If there is no lexicon entry at this line, the subsequent lexicon entry will be
taken.

@c ----------------------------------------------------------------------------

@node get, help, ga-line, Commands
@section The Command @code{get}
@cindex @code{get} (command)

This command is used to query settings of @code{malaga} or
@code{mallex}. Enter it together with the name of the option whose
setting you want to know. The possible options are described in the next
chapter. If you just enter @samp{get}, all settings will be shown.
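
As a short sketch, the first command below queries a single option
(@code{auto-tree} is used here only as an example of an option name),
and the second one shows all settings:

@example
get auto-tree
get
@end example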

@c ----------------------------------------------------------------------------

@node help, info, get, Commands
@section The Command @code{help}
@cindex @code{help} (command)

Use this command to get a list of the commands you can use. If you give the
name of a command or an option as argument, a short explanation of this item
will be displayed. If a name represents a command as well as an option, prepend
@samp{command} or @samp{option} to it.
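
For instance, the following commands should display short explanations
of the command @code{break} and of the option @code{use-display}
(both names are chosen only for illustration):

@example
help break
help use-display
@end example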

@c ----------------------------------------------------------------------------

@node info, list, help, Commands
@section The Command @code{info} (@code{malaga})
@cindex @code{info} (@code{malaga} command)

This command gives you information about the grammar you are using. It
takes no argument.

@c ----------------------------------------------------------------------------

@node list, ma, info, Commands
@section The Command @code{list}
@cindex @code{list} (command)

If you enter the command @code{list}, all breakpoints are listed. For each
breakpoint, its number, the name of the source file, and the source line are
shown.

@c ----------------------------------------------------------------------------

@node ma, ma-file, list, Commands
@section The Command @code{ma} (@code{malaga})
@cindex @code{ma} (@code{malaga} command)

The command @code{ma} (for @emph{morphological analysis}) starts a word form
analysis. Give the word form that you want to be analysed as argument:

@example
ma house
@end example

Malaga will show the results automatically, and it will also show the
analysis tree automatically if you specified it using the
@code{auto-tree} option. You can look at the results using
@code{result} or at the entire analysis tree using @code{tree}.

If you do not enter a word form after the command @code{ma}, @code{malaga}
re-analyses the last input.

@c ----------------------------------------------------------------------------

@node ma-file, ma-line, ma, Commands
@section The Command @code{ma-file} (@code{malaga})
@cindex @code{ma-file} (@code{malaga} command)

The command @code{ma-file} can be used to analyse files that contain
word lists. A word list consists of a number of word forms, each word
form on a line of its own. There may be empty lines in a word list. The
following example is a word list called @file{word-list}:

@example
table
men's
blue
handicap
@end example

To analyse this word list, enter:

@example
ma-file word-list result
@end example

This will produce a file @file{result} that contains the analysis
results. If the second argument is missing, the result will be written
to a file whose name ends in @file{.out}; for @file{word-list}, its name
will be @file{word-list.out}:

@example
1: "table": [class: noun, ...]
2: "men's": [class: noun, ...]
3: "blue": [class: noun, ...]
3: "blue": [class: adjective, ...]
3: "blue": [class: name, ...]
4: "handicap": unknown
@end example

The number at the line start represents the line number of the analysed
original word form. The output format can be changed by using the options
@code{result-format} and @code{unknown-format}.

If a runtime error occurs during the analysis of a word, the line will be
output in the format given by the option @code{error-format}.

After the analysis, some statistics will be displayed:

@itemize @bullet
@item The number of analysed word forms.
@item The number of recognised word forms.
@item The number of word forms recognised by combi-rules and end-rules.
@item The number of word forms recognised by robust-rules.
@item The number of word forms whose analyses produced errors.
@item The average number of results per word form.
@item The analysis run time.
@item The average number of word forms that have been analysed per second.
@item The number of cache accesses.
@item The number of cache hits.
@end itemize

@c ----------------------------------------------------------------------------

@node ma-line, mg, ma-file, Commands
@section The Command @code{ma-line} (@code{malaga})
@cindex @code{ma-line} (@code{malaga} command)

You can use this command to analyse a single line in a text file
morphologically. Assume you want to analyse the word in the third line in the
file @file{words}. Then enter the following command:

@example
ma-line words 3
@end example

Malaga will show the results automatically, and it will also show the
analysis tree automatically if you specified it using the
@code{auto-tree} option. You can look at the results using @code{result}
or at the entire analysis tree using @code{tree}.

@c ----------------------------------------------------------------------------

@node mg, next, ma-line, Commands
@section The Command @code{mg} (@code{malaga})
@cindex @code{mg} (@code{malaga} command)

Use the command @code{mg} to generate all word forms that consist of a
specified set of allomorphs. For example, the command

@example
mg 3 un able believe
@end example

generates all word forms that consist of up to three allomorphs,
where only the specified allomorphs (@samp{un}, @samp{able}, and
@samp{believe}) are used. The word forms are numbered from 1 onward, but
different analyses of the same word form get the same index. The output
will look like this:

@cartouche
@example
malaga> mg 3 un able believe
1: "able"
2: "believe"
3: "unable"
4: "unbelieveable"
malaga>
@end example
@end cartouche

Please note that generation does not take filters, pruning rules, and
default rules into account.

@c ----------------------------------------------------------------------------

@node next, print, mg, Commands
@section The Command @code{next}
@cindex @code{next} (command)

This command can only be executed in debug mode. The rule execution
will be resumed and continues until a different source line is met, a
different path is going to be executed since the old one has
terminated, or until the rules have been executed completely. It is
like @code{step}, but subrules will be executed without
interruption. If you specify a number as argument, the command will be
repeated as often as specified.
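
For instance, to advance over three source lines without stopping
inside subrules, you could enter the following command (the count
@samp{3} is arbitrary):

@example
next 3
@end example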

@c ----------------------------------------------------------------------------

@node print, quit, next, Commands
@section The Command @code{print}
@cindex @code{print} (command)

The command @code{print} is used to display the current values of Malaga
variables or named constants, or parts of them. You can specify any
variable or constant names (including the @samp{$} or @samp{@@}) as
arguments to this command; you can also specify a path of attributes
and/or indexes (with suffix @samp{L} or @samp{R}) behind each of the
variable or constant names. In that case, only the values of the
specified paths are displayed:

@cartouche
@example
debug> print $word
$word = [class: pronoun, 
         result: S2]
debug> print $word.class
$word.class = pronoun
debug> print @@plan.1L.name
@@plan.1L.name = declarative
debug>
@end example
@end cartouche

If the option @code{use-display} is on and @code{malshow} is used as
@code{display-cmd}, the expressions will be displayed in a window of
their own. If the @code{Expressions} window is not open yet, it will
be opened. If there is an open @code{Expressions} window, the new
expressions and their values will be displayed in this window.

You can left-click on an expression to make its value disappear or appear
again. You can middle-click or right-click on an expression to erase
it.

The @code{Expressions} window has a menu with some commands:

@table @code
@item Window
@table @code
@item Export Postscript...
Export the displayed expressions as an Encapsulated PostScript
file. Currently, only ASCII, Latin-1 Supplement, Hangul Compatibility
Jamo and Hangul Syllables can be converted to PostScript.
@item Close
Close the @code{Expressions} window.
@end table
@item Style
@table @code
@item Font Size ...
Select an item to adjust the font size.
@item Hanging
Normally, all values and subvalues are aligned at their bottom. If
this option is active, records are ``hanging down'': they are
aligned at their top.
@end table
@item Expressions
@table @code
@item Clear All
Clear all expressions.
@item Show All
Display the values of all expressions currently displayed.
@item Hide All
Suppress the values of all expressions currently displayed.
@end table
@end table

@c ----------------------------------------------------------------------------

@node quit, read-constants, print, Commands
@section The Command @code{quit}
@cindex @code{quit} (command)

Use this command to leave @code{malaga} or @code{mallex}.

@c ----------------------------------------------------------------------------

@node read-constants, result, quit, Commands
@section The Command @code{read-constants} (@code{mallex})
@cindex @code{read-constants} (@code{mallex} command)

If you want to parse lexicon entries that use Malaga constants (prefixed by
@samp{@@}), these constants can be read in using the command 
@samp{read-constants @var{lexicon-file}}. It parses @var{lexicon-file} and
memorizes all constant definitions in it.

@c ----------------------------------------------------------------------------

@node result, run, read-constants, Commands
@section The Command @code{result}
@cindex @code{result} (command)

If you have previously analysed a word form or a sentence using
@code{ma}, @code{ma-line}, @code{sa}, or @code{sa-line} (in
@code{malaga}), or you have generated allomorphs using @code{ga} or
@code{ga-line} (in @code{mallex}), you can display the results with
@code{result}.

@itemize @asis
@item @code{use-display} is off:
The results will be sent to standard output.

@item @code{use-display} is on and @code{malshow} is used as @code{display-cmd}:
The results will be shown in a window of their own, which is called
@code{Results} for @code{malaga} and @code{Allomorphs} for
@code{mallex}. They are numbered from 1 onward.

If you are executing the command @code{result} for the first time, or if
you have closed a @code{Results/Allomorphs} window that you had opened
before, a window will open, displaying the values of all
results/allomorphs of the last analysis/generation.

If there is a @code{Results/Allomorphs} window currently opened, the new
results/allomorphs will be displayed in this window.
@end itemize

The @code{Results/Allomorphs} window has a menu with some commands:

@table @code
@item Window
@table @code
@item Export Postscript...
Export the displayed results as an Encapsulated PostScript file.
Currently, only ASCII, Latin-1 Supplement, Hangul Compatibility Jamo
and Hangul Syllables can be converted to PostScript.
@item Close
Close the @code{Results/Allomorphs} window.
@end table
@item Style
@table @code
@item Font Size ...
Select an item to adjust the font size.
@item Hanging
Normally, all values and subvalues are aligned at their bottom. If this
option is active, records are ``hanging down'': they are aligned at
their top.
@end table
@end table

@c ----------------------------------------------------------------------------

@node run, sa, result, Commands
@section The Command @code{run}
@cindex @code{run} (command)

This command can only be used in debug mode. The rule execution will be
resumed, and the rules will be executed completely without any interruption.

If you have invoked debug mode by the command @code{debug-state}, rule
execution will be stopped again when another link is going to be analysed.

@c ----------------------------------------------------------------------------

@node sa, sa-file, run, Commands
@section The Command @code{sa} (@code{malaga})
@cindex @code{sa} (@code{malaga} command)

If you have started @code{malaga} with a syntax file in your command line or in
the project file, you can start syntactic analyses using the command @code{sa}
(short for @emph{syntactic analysis}). Give the sentence you want to
analyse as argument after the command name:

@example
sa The man is in town.
@end example

Malaga will show the results automatically, and it will also show the analysis
tree automatically if you specified it using the @code{auto-tree} option. You can
look at the results using @code{result} or at the entire analysis tree using
@code{tree}.

If you do not enter a sentence after the command @code{sa}, @code{malaga}
re-analyses the last input.

@c ----------------------------------------------------------------------------

@node sa-file, sa-line, sa, Commands
@section The Command @code{sa-file} (@code{malaga})
@cindex @code{sa-file} (@code{malaga} command)

Using the command @code{sa-file}, you can analyse files that contain
sentence lists. In a sentence list, each sentence stands on a line of
its own; empty lines are permitted. Here is an example, a sentence list
named @file{sentence-list}:

@example
He sleeps.
He slept.
He has slept.
He had slept.
@end example

To analyse this sentence list, enter:

@example
sa-file sentence-list result
@end example

This will produce a file @file{result} that contains the analysis
results. If the second argument is missing, the result will be written
to a file whose name ends in @file{.out}; for @file{sentence-list}, its
name will be @file{sentence-list.out}:

@example
1: "He sleeps.": [functor: [syn: <S3>, sem: <"sleep">]]
2: "He slept.": [functor: [syn: <S3>, sem: <"sleep">]]
3: "He has slept.": [functor: [syn: <S3>, sem: <"have", "sleep">]]
4: "He had slept.": [functor: [syn: <S3>, sem: <"have", "sleep">]]
@end example

The number at the line start represents the line number of the analysed
original sentence. The output format can be changed by using the options
@code{result-format} and @code{unknown-format}.

If a runtime error occurs during the analysis of a sentence, the
line's output will be in the format given by the option
@code{error-format}.

After the analysis, some statistics will be displayed:

@itemize @bullet
@item The number of analysed sentences.
@item The number of recognised sentences.
@item The number of sentences recognised by combi-rules and end-rules.
@item The number of sentences recognised by robust-rules.
@item The number of sentences whose analyses produced errors.
@item The average number of results per sentence.
@item The analysis run time.
@item The average number of sentences that have been analysed per second.
@item The number of cache accesses.
@item The number of cache hits.
@end itemize

@c ----------------------------------------------------------------------------

@node sa-line, set, sa-file, Commands
@section The Command @code{sa-line} (@code{malaga})
@cindex @code{sa-line} (@code{malaga} command)

If you have started @code{malaga} with a syntax file in your command
line or in the project file, you can analyse a single line of a file
using the command @code{sa-line}. Assume you
want to analyse the sentence in the third line of the file
@file{sentences}. Then enter the following command:

@example
sa-line sentences 3
@end example

Malaga will show the results automatically, and it will also show the
analysis tree automatically if you specified it using the
@code{auto-tree} option. You can look at the results using
@code{result} or at the entire analysis tree using @code{tree}.

@c ----------------------------------------------------------------------------

@node set, sg, sa-line, Commands
@section The Command @code{set}
@cindex @code{set} (command)

This command is used to change the settings of @code{malaga} or
@code{mallex}. The command line @samp{set @var{option argument}} changes
@var{option} to @var{argument}. If you want to get the current state of
an option, use the command @code{get}. Options can also be set in the
project file. The possible options are described in the next chapter.
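
As a small illustration, the following lines switch on the automatic
tree display in @code{malaga} and then query the option again. The
argument syntax of @code{get} is assumed here to mirror that of
@code{set}:

@example
@c Sketch only; the form of the "get" argument is an assumption.
set auto-tree yes
get auto-tree
@end example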

@c ----------------------------------------------------------------------------

@node sg, step, set, Commands
@section The Command @code{sg} (@code{malaga})
@cindex @code{sg} (@code{malaga} command)

Use @code{sg} to generate sentences that are composed of a specified set
of word forms. For example, enter:

@example
sg 3 . ? he she sleeps
@end example

This generates all sentences that consist of up to three word forms and that
use only the specified word forms (``.'', ``?'', ``he'', ``she'', and
``sleeps''). The sentences are numbered from 1 onward, but different analyses
of the same sentence get the same index. The output looks like this:

@cartouche
@example
malaga> sg 3 . ? he she sleeps
1: "he sleeps ."
2: "he sleeps ?"
3: "she sleeps ."
4: "she sleeps ?"
malaga>
@end example
@end cartouche

Please note that generation does not take filters, pruning rules and
default rules into account.

@c ----------------------------------------------------------------------------

@node step, transmit, sg, Commands
@section The Command @code{step}
@cindex @code{step} (command)

This command can only be executed in debug mode. Rule execution
will be resumed and will continue until a different source line is reached,
a different path is about to be executed because the old one has
terminated, or the rules have been executed completely.

@c ----------------------------------------------------------------------------

@node transmit, tree, step, Commands
@section The Command @code{transmit}
@cindex @code{transmit} (command)

If you have specified a transmit command line (to do this, use the option 
@code{transmit-cmd}), you can send a command to it:

@cartouche
@example
malaga> set transmit-cmd cat
malaga> transmit [surf: "go", POS: verb];
[POS: verb, 
 surf: "go"]
malaga>
@end example
@end cartouche

@c ----------------------------------------------------------------------------

@node tree, up, transmit, Commands
@section The Command @code{tree} (@code{malaga})
@cindex @code{tree} (@code{malaga} command)

If you've started a grammatical analysis using one of the commands @code{ma} or
@code{sa} (or their debug variants), you can make @code{malaga} display the
analysis tree by entering

@example
tree
@end example

If the analysis has not yet finished (in debug mode or in case of an error), a
partial tree will be shown.

If you're executing the command @code{tree} for the first time, or if you've
closed the @code{Tree} window before, a new tree window will open in which the
current analysis tree will be displayed.

If there is already a @code{Tree} window open, the new analysis tree will be
displayed in this window.

In the upper left corner of the @code{Tree} window, you will see the
sentence or the word form that has been analysed. Below, the analysis
tree is displayed. An analysis path always follows the edges from the
left to the right.

A circle node stands for a LAG state, a two-circle node stands for an end
state. A crossed circle stands for a LAG state that has been removed by a
pruning-rule, and a crossed two-circle node stands for an end state that is
invalid because some input still remains. A box node
is not a state, but a @dfn{dead end}, which means that no rule has created a
state at this position.

Above each edge, the link surface of the corresponding rule application is
displayed. Below the edge, you'll see the name of the applied rule.

You can click on a node using the left mouse button. Then another window will
open, namely the @code{Path} window. The @code{Path} window displays the
surface, the feature structure and the successor rules of the state you've
clicked on. The node will be highlighted by a red border.

If you press the right mouse button while the mouse is on a node, a
pop-up menu will appear. You can then either select that this node is
the first node of the path to be displayed, or you can select it to be
the last one. All rule applications, from the first node up to the
last node in the path, will be displayed in the @code{Path}
window. The corresponding path will be highlighted in the @code{Tree}
window.

If you click on a link surface using any mouse button, the surface
and its feature structure will be displayed in the @code{Path} window.

You can also click on rule names using any mouse button. Then the corresponding
rule application will be displayed in the @code{Path} window, i.e.@ the
surfaces and feature structures of the original state, the link, and the
successor state, and the successor rules.

There are some commands that can be initiated from the @code{Tree} menu bar:

@table @code
@item Window
@table @code
@item Export Postscript...
Export the displayed analysis tree as an Embedded Postscript file.
Currently, only ASCII, Latin-1 Supplement, Hangul Compatibility Jamo
and Hangul Syllables can be converted to Postscript.
@item Close
Close the @code{Tree} window.
@end table
@item Style
Select an item in this menu to adjust the font size.

@item Tree
Specify which nodes of the analysis tree are actually displayed and
whether state indexes are shown.
@table @code
@item Full Tree
All analysis states are displayed, and also boxes for rule
applications that did not succeed (dead ends).
@item No Dead Ends
All analysis states are displayed, but dead ends are omitted.
@item Complete paths
Only the nodes that are part of a complete analysis are displayed.
@item Show State Indexes
Toggles the display of the state's indexes.
@end table

@item End States
Select an end state to display in the @code{Path} window.
@table @code
@item Show First
Display the first end state.
@item Show Previous
If there is an end state displayed in the @code{Path} window, jump
to the previous one.
@item Show Next
If there is an end state displayed in the @code{Path} window, jump
to the next one.
@item Show Last 
Display the last end state.
@end table

@end table

The @code{Path} window has got its own menu bar which contains the
menus @code{Window}, @code{Style}, and @code{End States} with the same
menu items as the corresponding menus in the @code{Tree} window, and
two additional options in @code{Style}:

@table @code
@item Hanging
Normally, all values and subvalues are aligned at their bottom. If
this option is active, records are ``hanging down'': they are aligned
at their top.
@item Inline
Normally, a state is displayed with surface, feature structure and
rule set stacked. If this option is active, they are displayed aligned
on one line.
@end table

The @code{Path} window also has a menu @code{Path}, in which you can
specify whether state indexes are shown:
@table @code
@item Show State Indexes
Toggles the display of the state's indexes.
@end table

@c ----------------------------------------------------------------------------

@node up, variables, tree, Commands
@section The Command @code{up}
@cindex @code{up} (command)

If you want to look at the source and the variables of the (sub)rule that has
called the current subrule, you can do this by entering @code{up}. You can list
the frames via @code{backtrace}.

@c ----------------------------------------------------------------------------

@node variables, walk, up, Commands
@section The Command @code{variables}
@cindex @code{variables} (command)

If you invoke @code{variables}, you get the values of all Malaga
variables that are currently defined. The variables will be shown in the
order of their definitions. You can only use the command
@code{variables} in debug mode or if the previous analysis has stopped
with an error in the combination rules.

If the option @code{use-display} is off, the variables will be sent to
standard output:

@cartouche
@example
malaga> sa-debug You are so beautiful.
entering rule "Noun", surf: "", link: "You", state: 1
debug> variables
$sentence = [class: main_clause, 
             parts: <>]
$word = [class: pronoun, 
         result: S2]
debug>
@end example
@end cartouche

If the option @code{use-display} is on and @code{malshow} is used as
@code{display-cmd}, the variables will be displayed in a window of their
own. If the @code{Variables} window is not open yet, it will open
now. If there is an open @code{Variables} window, the new variable
contents will be displayed in this window.

You can left-click on a variable name to make its value disappear or appear
again.

The @code{Variables} window has a menu with some commands:

@table @code
@item Window
@table @code
@item Export Postscript...
Export the displayed variables as an Embedded Postscript
file. Currently, only ASCII, Latin-1 Supplement, Hangul Compatibility
Jamo and Hangul Syllables can be converted to Postscript.
@item Close
Close the @code{Variables} window.
@end table
@item Style
@table @code
@item Font Size ...
Select an item to adjust the font size.
@item Hanging
Normally, all values and subvalues are aligned at their bottom. If
this option is active, records are ``hanging down'': they are
aligned at their top.
@end table
@item Variables
@table @code
@item Show All
Display the values of all variables currently defined.
@item Hide All
Suppress the values of all variables currently defined.
@end table
@end table

@c ----------------------------------------------------------------------------

@node walk, where, variables, Commands
@section The Command @code{walk}
@cindex @code{walk} (command)

This command works in debug mode only. The rule execution will be continued and
stopped again as soon as a new rule is executed, a breakpoint is met or there
are no more rules to execute.

@c ----------------------------------------------------------------------------

@node where,  , walk, Commands
@section The Command @code{where}
@cindex @code{where} (command)

This command can only be used in debug mode or after rule execution
has been stopped by an error. It displays the name of the rule that
has been executed; additionally, the surfaces of state and link are
displayed in @code{malaga}. For example:

@cartouche
@example
debug> where
at rule "flexion", surf: "hous", link: "es", state: 2
debug>
@end example
@end cartouche

@c ----------------------------------------------------------------------------

@node Options, The Language, Commands, Top
@chapter The Options of @code{malaga} and @code{mallex}
@cindex options

The programs @code{malaga} and @code{mallex} share some of their
options, so I will describe them in a common chapter. Options can be set
using the command @code{set}, and you can get the current value of an
option using @code{get}. Options that can be used only in @code{malaga}
or only in @code{mallex} are marked by the name of the program in which
they can be used.

@menu
* alias::          Shortcuts for other commands.
* allo-format::    The output format for allomorphs in readable form.
* auto-tree::      Is the analysis tree displayed automatically after analysis?
* auto-variables:: Are variables displayed automatically in debug mode?
* cache-size::     The size of the word form cache.
* display-cmd::    The command line for the display GUI.
* error-format::   The output format for analyses that reported an error.
* hidden::         The attributes whose values are hidden in output.
* mor-incomplete:: Will we accept words that have been incompletely parsed?
* mor-out-filter:: Will the morphology output filter be executed?
* mor-pruning::    Number of states needed to call the morphology pruning rule.
* result-format::  The output format for successful analyses.
* result-list::    Pack all analysis results in a single list.
* robust-rule::    Will the robust rule be executed?
* roman-hangul::   Will Hangul be transcribed? (for Hangul grammars only)
* sort-records::   The order of the attributes in a record when displayed.
* switch::         User options that can be read by the grammar.
* syn-incomplete:: Will we accept sentences that have been incompletely parsed?
* syn-in-filter::  Will the syntax input filter be executed?
* syn-out-filter:: Will the syntax output filter be executed?
* syn-pruning::    Number of states needed to call the syntax pruning rule.
* transmit-cmd::   The command line for the transmit process.
* unknown-format:: The output format for analyses that got no results.
* use-display::    Will the program in @code{display-cmd} be used for output?
@end menu

@c ----------------------------------------------------------------------------

@node alias, allo-format, Options, Options
@section The Option @code{alias}
@cindex @code{alias} (option)

With @code{alias}, you can define abbreviations for longer command
lines. As arguments, give an alias name and an expansion (a command line
which the name will stand for). If the expansion contains spaces,
enclose it in double quotes. Use @code{set alias @var{name}} to delete
alias @var{name}.

If you type the name of an alias at your command line, its expansion
will be executed. The character sequence @samp{%a} in your alias
definition will be replaced by what follows the alias name in the
command line.

Aliases cannot be nested.
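
For instance, in @code{malaga} an alias for the command @code{sa-file}
might be defined, used and deleted as follows. The alias name
@code{saf} is purely illustrative, and @file{sentence-list} stands for
any file that contains a sentence list:

@example
@c "saf" is a hypothetical alias name chosen for this sketch.
set alias saf "sa-file %a"
saf sentence-list
set alias saf
@end example

With this definition, the second line should expand to
@samp{sa-file sentence-list}, since @samp{%a} stands for the text that
follows the alias name. The third line deletes the alias again.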

@c ----------------------------------------------------------------------------

@node allo-format, auto-tree, alias, Options
@section The Option @code{allo-format} (@code{mallex})
@cindex @code{allo-format} (@code{mallex} option)

With @code{allo-format}, you can change the output format for the
generated allomorphs. Enter a format string as argument. If the format
string contains spaces, enclose it in double quotes. If the argument is
an empty string (@code{""}), no allomorphs will be shown.

In the format string, the following sequences have a special meaning:

@table @samp
@item %c 
Will be replaced by the allomorph's feature structure.
@item %n 
Will be replaced by the allomorph's number.
@item %s 
Will be replaced by the allomorph's surface.
@end table
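
A format string that combines all three sequences could look like this.
It is only a sketch; the separators are arbitrary and not a documented
default:

@example
@c Illustrative format string: number, surface, feature structure.
set allo-format "%n: %s: %c"
@end example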

@c ----------------------------------------------------------------------------

@node auto-tree, auto-variables, allo-format, Options
@section The Option @code{auto-tree} (@code{malaga})
@cindex @code{auto-tree} (@code{malaga} option)

You can use @code{auto-tree} to make @code{malaga} execute the
@code{tree} command each time you invoke an analysis with @code{ma}
or @code{sa}. Set it in one of the following ways:

@table @code
@item set auto-tree yes
The @code{tree} command will be executed after each analysis.
@item set auto-tree no
The @code{tree} command will not be executed automatically.
@end table

@c ----------------------------------------------------------------------------

@node auto-variables, cache-size, auto-tree, Options
@section The Option @code{auto-variables}
@cindex @code{auto-variables} (option)

When @code{malaga} or @code{mallex} stops in debug mode while executing
a Malaga rule, it can automatically show the variables defined at this
point. Use the option @code{auto-variables} to set this behaviour.

@table @code
@item set auto-variables yes
The @code{variables} command will be executed each time
@code{malaga} or @code{mallex} stops in debug mode.
@item set auto-variables no
The @code{variables} command will not be executed automatically.
@end table

@c ----------------------------------------------------------------------------

@node cache-size, display-cmd, auto-variables, Options
@section The Option @code{cache-size} (@code{malaga})
@cindex @code{cache-size} (@code{malaga} option)

Malaga has a cache for word forms. You can set the cache size, i.e.@ the
maximum number of word forms in the cache, to @var{n} with
@code{set cache-size @var{n}}.
If you set the cache size to 0, the cache will be deactivated.

When malaga analyses a word form or sentence, it tries to get a word form from
the cache before it uses the morphology combination rules. To do this, malaga
separates the first word form from the remaining input. It uses spacing
characters as separators; so if a word form contains a space or does not end
with a space, caching will not work.
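
For example, the first line below limits the cache to 1000 word forms
and the second line switches it off. The value 1000 is an arbitrary
choice for illustration, not a recommended setting:

@example
@c 1000 is an arbitrary illustrative size; 0 deactivates the cache.
set cache-size 1000
set cache-size 0
@end example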

@c ----------------------------------------------------------------------------

@node display-cmd, error-format, cache-size, Options
@section The Option @code{display-cmd}
@cindex @code{display-cmd} (option)

The programs @code{malaga} and @code{mallex} normally use the program
@code{malshow} for GUI-based display of Malaga trees, results or
variables. If you want to use a different display program, set the
command line that starts this program with the @code{display-cmd} option,
like this:

@example
set display-cmd "java -classpath /opt/malaga/amalgam Amalgam"
@end example

@c ----------------------------------------------------------------------------

@node error-format, hidden, display-cmd, Options
@section The Option @code{error-format} (@code{malaga})
@cindex @code{error-format} (@code{malaga} option)

With @code{error-format}, you can change the output format for items
that produced an analysis error. Enter a format string as argument. If the
format string contains spaces, enclose it in double quotes. If the argument is
an empty string (@code{""}), no forms that produced an error will be shown.

In the format string, the following sequences have a special meaning:

@table @samp
@item %e
Will be replaced by the error message for the analysed form.
@item %l
Will be replaced by the line number of the analysed form.
@item %n 
Will be replaced by the number of analysis states for this form.
@item %s 
Will be replaced by the surface.
@end table
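
As a sketch, a format string that reports the line number, the surface
and the error message might be set like this. The literal text and the
separators are arbitrary choices:

@example
@c Illustrative format string, not a documented default.
set error-format "%l: %s: error: %e"
@end example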

@c ----------------------------------------------------------------------------

@node hidden, mor-incomplete, error-format, Options
@section The Option @code{hidden}
@cindex @code{hidden} (option)

Some grammars can produce very large feature structures, so it can be useful
not to show the values of some specified attributes. To achieve this, use the
option @code{hidden}. You can give any number of arguments to this option. The
following arguments are available:

@table @samp
@item +@var{attribute-name}
The specified attribute name will be put in parentheses if it occurs in
a value; the attribute value will not be shown.
@item -@var{attribute-name}
The specified attribute will be shown completely again in the future.
@item none
All attributes will be shown completely again in the future.
@end table
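
The following lines sketch how the arguments can be combined. The
attribute names @code{syn} and @code{sem} are only placeholders for
attributes that your grammar defines:

@example
@c "syn" and "sem" are placeholder attribute names for this sketch.
set hidden +syn +sem
set hidden -syn
set hidden none
@end example

After the first line, both attributes are shown without their values.
The second line makes @code{syn} fully visible again, and the third
line does so for all attributes.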

@c ----------------------------------------------------------------------------

@node mor-incomplete, mor-out-filter, hidden, Options
@section The Option @code{mor-incomplete} (@code{malaga})
@cindex @code{mor-incomplete} (@code{malaga} option)

If you want to get morphological analysis results not only for the whole input
line, but for any grammatically well-formed prefix of the input line, you can
use the option @code{mor-incomplete}:

@table @code
@item set mor-incomplete yes
Accept words that have been incompletely parsed.
@item set mor-incomplete no
Only accept words that have been completely parsed.
@end table

Note that this option has no effect in subordinate morphological analyses that
are needed by syntactic analysis.

@c ----------------------------------------------------------------------------

@node mor-out-filter, mor-pruning, mor-incomplete, Options
@section The Option @code{mor-out-filter} (@code{malaga})
@cindex @code{mor-out-filter} (@code{malaga} option)

Use the option @code{mor-out-filter} to switch the morphology output-filter
on or off:

@table @code
@item set mor-out-filter yes
Activate the filter.
@item set mor-out-filter no
Deactivate the filter.
@end table

@c ----------------------------------------------------------------------------

@node mor-pruning, result-format, mor-out-filter, Options
@section The Option @code{mor-pruning} (@code{malaga})
@cindex @code{mor-pruning} (@code{malaga} option)
@cindex Pruning

In your morphology rules, you may have specified a pruning rule that
can prune the morphology analysis tree, i.e.@ it can reduce the number of
parallel paths. If you want this pruning rule to be executed, use the
option @code{mor-pruning}.  Use one of the following arguments:

@table @code
@item set mor-pruning @var{n}
Call the morphology pruning rule whenever at least @var{n} states have consumed
the same amount of input, for @var{n} > 0.
@item set mor-pruning 0
Deactivate the morphology pruning rule.
@end table

@c ----------------------------------------------------------------------------

@node result-format, result-list, mor-pruning, Options
@section The Option @code{result-format} (@code{malaga})
@cindex @code{result-format} (@code{malaga} option)

With @code{result-format}, you can change the output format for analysed items
that have been recognised. Enter a format string as argument. If the format
string contains spaces, enclose it in double quotes. If the argument is an
empty string (@code{""}), no recognised forms will be shown.

In the format string, the following sequences have a special meaning:

@table @samp
@item %c
Will be replaced by the result feature structure of the analysis.
@item %l
Will be replaced by the line number of the analysed form.
@item %n
Will be replaced by the number of analysis states for this form.
@item %r 
Will be replaced by the reading index (the results for a form are
indexed from 1 to the number of results).
@item %s 
Will be replaced by the surface.
@end table
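
A format string that uses several of these sequences could look like
this. It is only a sketch, and the separators are arbitrary:

@example
@c Illustrative format string: line number, reading index, surface, result.
set result-format "%l.%r: %s: %c"
@end example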

@c ----------------------------------------------------------------------------

@node result-list, robust-rule, result-format, Options
@section The Option @code{result-list} (@code{malaga})
@cindex @code{result-list} (@code{malaga} option)

With this option, you can specify whether you want malaga to pack all
analysis results into a single list. This option only has an impact in
filter mode or when a file is being analysed. Even results of
different lengths are combined; this could not be achieved by an
output-filter. Results of different lengths can occur when the option
@code{mor-incomplete} or @code{syn-incomplete} is active.

@table @code
@item set result-list yes
Combine results into a single list.
@item set result-list no
Leave results unchanged.
@end table

@c ----------------------------------------------------------------------------

@node robust-rule, roman-hangul, result-list, Options
@section The Option @code{robust-rule} (@code{malaga})
@cindex @code{robust-rule} (@code{malaga} option)

With this option, you can specify whether you want to run a robust-rule for the
word forms that could not be recognised by LAG rules. The robust-rule gets the
surface of an unknown word form as parameter and it can create one or more
results by executing the @code{result} statement.

@table @code
@item set robust-rule yes
Enable the robust rule.
@item set robust-rule no
Disable the robust rule.
@end table

@c ----------------------------------------------------------------------------

@node roman-hangul, sort-records , robust-rule, Options
@section The Option @code{roman-hangul}
@cindex @code{roman-hangul} (option)
@cindex Hangul, transcribed
@cindex transcribed Hangul

If you are using the option @samp{split-hangul-syllables: yes} in
your project file, Malaga can transcribe Hangul using Latin
letters, based on the Yale system. The transcribed text is enclosed
in curly braces, and each syllable starts with a dot.

@table @code
@item set roman-hangul yes
Display Hangul using Latin transcription.
@item set roman-hangul no
Display Hangul directly.
@end table

@c ----------------------------------------------------------------------------

@node sort-records, switch, roman-hangul, Options
@section The Option @code{sort-records}
@cindex @code{sort-records} (option)
@cindex order, attribute
@cindex attribute order

There are different ways to determine the order in which the attributes of a
record are displayed. With @code{sort-records}, you can choose between three
ordering schemes:

@table @code
@item set sort-records internal
The attributes will be displayed in the order they have internally.
@item set sort-records alphabetic
The attributes will be ordered alphabetically by their names.
@item set sort-records definition
The attributes will be ordered in the same order in which their names
are defined in the symbol table.
@end table

@c ----------------------------------------------------------------------------

@node switch, syn-incomplete, sort-records, Options
@section The Option @code{switch}
@cindex @code{switch} (option)

Malaga rules can query simple Malaga values (@dfn{switches}) that you can
change during run time. Use the option @code{switch} to change the values:

@table @code
@item set switch @var{name} @var{value}
Set the switch @var{name}, which must be a symbol, to @var{value}, which
can be any Malaga value.
@end table
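
For illustration only, a switch could be set as follows. The switch
name @code{trace_level} is hypothetical and would have to be a symbol
known to your grammar; the value is an arbitrary number:

@example
@c "trace_level" is a hypothetical switch name; 2 is an arbitrary value.
set switch trace_level 2
@end example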

@c ----------------------------------------------------------------------------

@node syn-incomplete, syn-in-filter, switch, Options
@section The Option @code{syn-incomplete} (@code{malaga})
@cindex @code{syn-incomplete} (@code{malaga} option)

If you want to get syntactic analysis results not only for the whole input
line, but for any grammatically well-formed prefix of the sentence, you can use
the option @code{syn-incomplete}:

@table @code
@item set syn-incomplete yes
Accept sentences that have been incompletely parsed.
@item set syn-incomplete no
Only accept sentences that have been completely parsed.
@end table

@c ----------------------------------------------------------------------------

@node syn-in-filter, syn-out-filter, syn-incomplete, Options
@section The Option @code{syn-in-filter} (@code{malaga})
@cindex @code{syn-in-filter} (@code{malaga} option)

Use the option @code{syn-in-filter} to switch the syntax input-filter on or
off:

@table @code
@item set syn-in-filter yes
Activate the filter.
@item set syn-in-filter no
Deactivate the filter.
@end table

@c ----------------------------------------------------------------------------

@node syn-out-filter, syn-pruning, syn-in-filter, Options
@section The Option @code{syn-out-filter} (@code{malaga})
@cindex @code{syn-out-filter} (@code{malaga} option)

Use the option @code{syn-out-filter} to switch the syntax output-filter on
or off:

@table @code
@item set syn-out-filter yes
Activate the filter.
@item set syn-out-filter no
Deactivate the filter.
@end table

@c ----------------------------------------------------------------------------

@node syn-pruning, transmit-cmd, syn-out-filter, Options
@section The Option @code{syn-pruning} (@code{malaga})
@cindex @code{syn-pruning} (@code{malaga} option)
@cindex Pruning

In your syntax rules, you may have specified a pruning rule that can prune the
syntax analysis tree, i.e.@ it can reduce the number of parallel paths. If you
want this pruning rule to be executed, use the option @code{syn-pruning}.
Use one of the following arguments:

@table @code
@item set syn-pruning @var{n}
Call the syntax pruning rule whenever at least @var{n} states have consumed
the same amount of input, for @var{n} > 0.
@item set syn-pruning 0
Deactivate the syntax pruning rule.
@end table

@c ----------------------------------------------------------------------------

@node transmit-cmd, unknown-format, syn-pruning, Options
@section The Option @code{transmit-cmd}
@cindex @code{transmit-cmd} (option)

If you want to use the @code{transmit} function in @code{malaga} or
@code{mallex}, you have to set a command line that starts the transmit
process using the @code{transmit-cmd} option. Here is an example:

@example
set transmit-cmd "perl my-transmit-program.pl"
@end example

@c ----------------------------------------------------------------------------

@node unknown-format, use-display, transmit-cmd, Options
@section The Option @code{unknown-format} (@code{malaga})
@cindex @code{unknown-format} (@code{malaga} option)

With @code{unknown-format}, you can change the output format for analysed items
that have not been recognised. Enter a format string as argument. If the
format string contains spaces, enclose it in double quotes. If the argument is
an empty string (@code{""}), no unrecognised forms will be shown.

In the format string, the following sequences have a special meaning:

@table @samp
@item %l
Will be replaced by the line number of the analysed form.
@item %n
Will be replaced by the number of analysis states for this form.
@item %s
Will be replaced by the surface.
@end table
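
As a sketch, unrecognised forms could be reported with their line
number and surface like this. The literal text and the separators are
arbitrary choices:

@example
@c Illustrative format string, not a documented default.
set unknown-format "%l: %s: unknown"
@end example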

@c ----------------------------------------------------------------------------

@node use-display, , unknown-format, Options
@section The Option @code{use-display}
@cindex @code{use-display} (option)

If you want the output of the commands @code{result} and @code{variables} to be
shown by the @code{Display} process, use the option @code{use-display}:

@table @code
@item set use-display yes
Use the @code{Display} process to show the output of @code{result} and
@code{variables}.
@item set use-display no 
Send the output of @code{result} and @code{variables} to your terminal.
@end table

@c ----------------------------------------------------------------------------

@node The Language, Index, Options, Top
@chapter The Programming Language Malaga
@cindex Malaga, programming language

@menu
* Characterisation::    The abstract characteristics of the language.
* Source Texts::        General rules for Malaga source files.
* Values::              The types that make any Malaga data.
* Expressions::         How operators can combine values.
* Conditions::          Expressions yielding a boolean value.
* Boolean Operators::   The Operators @code{and}, @code{or} and @code{not}.
* Symbol Table::        All symbols have to be defined here.
* Initial State::       The initial LAG state.
* Constant Definition:: Constants can be used in lexicon and rule files.
* Rules::               Rules are comparable to functions in C.
* Statements::          The atoms of which a rule is constructed.
* Files::               The Files that make a Malaga grammar.
* Syntax Summary::      Formal Description of the Malaga syntax.
@end menu

@c ----------------------------------------------------------------------------

@node Characterisation, Source Texts, The Language, The Language
@section Characterisation of Malaga

A Malaga rule file closely resembles a program in languages like Pascal
or C (of course, those languages do not have a Left-Associative Grammar
formalism built in). As in compiled languages, a Malaga source file must
be translated before execution. However, the generated Malaga code is not
machine code but an @emph{intermediate code} that has to be executed
(@dfn{interpreted}) by an analysis program.
Malaga may be characterised as follows, as far as programming structures and
data structures are concerned:

@table @emph
@item structured values:
The basic values in Malaga are symbols (names that can be used e.g. for
categories or subcategories), numbers (floating point numbers), and
strings. Values can be combined to ordered lists or records (also known
as attribute-value matrixes). A value in a list or a record can be a list or a
record itself. An ``ambiguous'' symbol like @code{singular_plural} can
be assigned a list of symbols like @code{<singular, plural>}; such a
symbol is called a @dfn{multi-symbol}.

@item structured statements: 
In Malaga, the concept of statement blocks is implemented in a similar
way as it is in the programming language Pascal. There are structured
control statements to select or repeat a statement sequence. A variable
is always defined @dfn{locally}, i.e.@ it only exists from the point
where it has been defined up to the end of the statement sequence in
which it has been defined.

@item no type restrictions: 
Any value can be assigned to a variable and the programmer can freely
define the structure of values.

@item no side effects:
Malaga is, unlike programming languages like Pascal or C, free of side
effects. If a variable gets a value, no other variable will be
changed. Analysis paths are independent of each other.

@item termination:
A Malaga grammar that contains no recursive subrules and no
@code{repeat} statements is guaranteed to terminate, i.e.@ it can never
hang in a loop.

@item variables:
In a @code{define} statement, a variable is defined and gets an initial
value. Use an assignment to set a variable that has already been defined
to a new value.

@item operators: 
Many generative grammar theories or linguistic programming languages use the
concept of unification of feature structures. Malaga does not use unification,
but it offers some operators to build feature structures explicitly. Since
Malaga does without unification, analyses are much faster.  
@end table

@c ----------------------------------------------------------------------------

@node Source Texts, Values, Characterisation, The Language
@section Malaga Source Texts

Source texts in Malaga must be in the Unicode UTF-8 format. They are
format-free; this means that between lexical symbols (strings,
identifiers, keywords, numerals and symbols such as @samp{+}, @samp{~}
or @samp{:=}) there may be blanks or newlines (whitespaces) or
comments. Between two identifiers or two keywords there @emph{must} be
at least one whitespace to separate them syntactically.

@menu
* Comments::    How to insert comments in your source file.
* Include::     How to read other files from your source file.
* Identifiers:: Names in Malaga source files.
@end menu

@c ----------------------------------------------------------------------------

@node Comments, Include, Source Texts, Source Texts
@subsection Comments
@cindex comments

A comment may be inserted wherever whitespace may be inserted. A
comment begins with the symbol @samp{#} and extends to the end of the line.
Comments are ignored.
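
A small sketch of both placements; the @code{define} statement is only
used here as an arbitrary piece of source text:

@example
# This whole line is a comment.
define $count := 0;  # A comment may also follow other source text.
@end example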

@c ----------------------------------------------------------------------------

@node Include, Identifiers, Comments, Source Texts
@subsection The @code{include} Statement
@cindex @code{include} (statement)

A Malaga file may contain the statement

@example
include "@var{filename}";
@end example

In a rule file, it can stand wherever a rule can stand. In lexicon
files, it can stand in place of a value; in symbol files, it can replace
a symbol definition. The text of the included file is inserted verbatim
at the very location where the @code{include} statement occurs. The file
name has to be given relative to the directory of the file that
contains the @code{include} statement.

@c ----------------------------------------------------------------------------

@node Identifiers,  , Include, Source Texts
@subsection Identifiers
@cindex identifiers

In Malaga, names for variables, constants, symbols, and rules (see below
for explanation) are called @dfn{identifiers}. An identifier may consist of
uppercase and lowercase characters, the underscore @samp{_}, the ampersand
@samp{&}, the vertical bar @samp{|}, and, from the second character on,
also of digits. Uppercase and lowercase characters are not distinguished, i.e.,
Malaga is @emph{not} case-sensitive. Malaga keywords must not be used as
identifiers. A variable name must start with a @samp{$}, a constant name
must start with a @samp{@@}. The same identifier may be used as variable
name, constant name, symbol name, or rule name independently. Malaga can
distinguish them by the context in which they occur.

Valid identifiers would be @samp{Noun}, @samp{noun} (the same as the
first), @samp{R2D2}, @samp{Vb_aux}, @samp{A|G|D}, @samp{_INF}.
Identifiers like @samp{2Noun}, @samp{Verb.Frame}, @samp{OK?},
@samp{_~INF} are @emph{not} valid.

@c ----------------------------------------------------------------------------

@node Values, Expressions, Source Texts, The Language
@section Values
@cindex values

Malaga expressions can have values with very complex structures. To describe
how those values can be composed from simple values a few rules suffice. Simple
values in Malaga are @dfn{symbols}, @dfn{numbers}, and @dfn{strings},
which can be composed to form @dfn{records} and @dfn{lists}.

@menu
* Symbols:: The atomic datatype that is basic to Malaga.
* Numbers:: Floating point numbers, also used for indexes.
* Strings:: A sequence of characters, used to store text.
* Lists::   An ordered sequence of subvalues.
* Records:: A set of attribute-value pairs.
@end menu

@c ----------------------------------------------------------------------------

@node Symbols, Numbers, Values, Values
@subsection Symbols
@cindex symbols

The central data type in Malaga is the symbol. It is used for describing
syntactic or semantic properties of an allomorph, a word, or a
sentence. A symbol is an identifier like @samp{Verb}, @samp{reflexive},
@samp{Sing_1}. The symbols @samp{nil}, @samp{yes}, @samp{no},
@samp{symbol}, @samp{string}, @samp{number}, @samp{list}, and
@samp{record} are predefined and have special meanings.

@c ----------------------------------------------------------------------------

@node Numbers, Strings, Symbols, Values
@subsection Numbers
@cindex numbers

A number in Malaga consists of an integer part, an optional fractional
part and an optional exponent of the form @samp{E[+|-]n}. There must be
a dot between the integer part and the fractional part. Examples:
@samp{0}, @samp{1}, @samp{1.0}, @samp{13.75}, @samp{1.2E-5}.

Alternatively, a number may consist of an integer number followed by
@samp{L}, indicating that the number is intended as a list index
counting from the @emph{left} border, or by @samp{R}, indicating that
the number is intended as a list index counting from the @emph{right}
border. Examples: @code{5L} = @code{5}, @code{12R} = @code{-12}.

@c ----------------------------------------------------------------------------

@node Strings, Lists, Numbers, Values
@subsection Strings
@cindex strings

A string may consist of any number of characters (it may also be empty). It
must be enclosed in double quotes and must not extend over more than one line.
Within the double quotes there may be any combination of printable characters
except the backslash @samp{\} and the double quotes. When part of a
string, these characters must be preceded by a @samp{\} (escape character). 
@cindex escape character (@samp{\})
Examples: @code{"Hello"}, @code{"He says: \"Great\""}.

@c ----------------------------------------------------------------------------

@node Lists, Records, Strings, Values
@subsection Lists
@cindex lists

A list is an ordered sequence of values. The values are separated by commas and
enclosed in angle brackets:

@example
<@var{element1}, @var{element2}, ...>
@end example

A list may as well be empty. The elements in a list may be arbitrarily complex;
they may also be lists or records.
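
The following lines sketch a few list values. The symbols @code{Nom},
@code{Acc}, @code{Dat} and @code{Noun} are assumed to be defined in the
symbol table, and the notation @code{<>} for the empty list is inferred
from the general form above:

@example
<>                                # the empty list
<Nom, Acc, Dat>                   # a list of three symbols
<"a", 1, <2, 3>, [Class: Noun]>   # mixed elements, including a list and a record
@end example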

@c ----------------------------------------------------------------------------

@node Records,  , Lists, Values
@subsection Records
@cindex records
@cindex attributes

A record is a collection of attributes. An @emph{attribute} consists of a
symbol, the @emph{attribute name}, and an associated @emph{attribute value},
which can be an arbitrary Malaga value. The attribute name serves as an access
key for the attribute value, so all attributes in a record must have distinct
names. 

Records are noted down as follows:

@example
[@var{name1}: @var{value1}, @var{name2}: @var{value2}, ...]
@end example

where @var{name i} denotes an attribute name and @var{value i} the associated
attribute value. Example: @code{[Class: Verb, Reg: Reg, Val: dirObj]}.

A record with no attributes, @code{[]}, is called the @dfn{empty record}.
@cindex record, empty
@cindex empty record
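
Since an attribute value can be an arbitrary Malaga value, records may be
nested. A sketch with hypothetical attribute names and symbols:

@example
[Class: Noun,
 Surf: "house",
 Agr: [Num: singular, Cases: <Nom, Acc>]]
@end example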

@c ----------------------------------------------------------------------------

@node Expressions, Conditions, Values, The Language
@section Expressions
@cindex expressions

An expression is the form in which a value is used in Malaga. Values can be
written as follows:

@example
[Surf: "he", Class: Pron, Case&Number: S3]
@end example

Variables (these are placeholders for values within a rule) can as well be used
as expressions:

@example
$Pron
@end example

Furthermore, constants (placeholders for values in a rule file) can be used as
expressions:

@example
@@combination_table
@end example

All three forms can be mixed:

@example
[Surf: "he", Class: Pron, Case&Number: $result]
@end example

Furthermore, there are operators which modify values or combine two
values to form a new value. Complex values can be composed using those
operators. All operators have a priority assigned. An operator with
higher priority is applied before an operator with lower priority. If
two operators have the same priority, they are applied from the left to
the right. The order in which the operators are to be applied can be
changed by bracketing with round parentheses @samp{()}.
@cindex priority, operator
@cindex operator priority

@table @asis
@item unary @samp{-}
very high priority
@item @samp{.}
high priority
@item @samp{*}, @samp{/}
middle priority
@item @samp{+}, @samp{-}
low priority
@end table
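
The following expressions illustrate these rules; @code{$noun} is a
hypothetical variable holding a record whose attribute @code{Agr} is a list:

@example
2 + 3 * 4           # 14: * is applied before +
(2 + 3) * 4         # 20: parentheses override the priorities
10 - 4 - 3          # 3: equal priority, applied from left to right
-2 + 5              # 3: unary - is applied before +
$noun.Agr + <Dat>   # . is applied before +
@end example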

@menu
* Malaga Variables:: Containers for Malaga Values in a Rule.
* Constants::        Global containers for Malaga Values.
* Subrule Calls::    Call a subrule from another rule.
* Atoms::            The atoms of a multisymbol.
* Capital::          Does a string begin with a capital letter?
* Floor::            Round down to the next integer.
* Length::           The length of a list or a string.
* Multi::            The multisymbol of the given atoms list.
* Set::              Make a list contain unique elements only.
* Substring::        Get a substring of a string.
* Switch::           Get a user-defined value.
* Transmit::         Call the transmit process.
* Value_String::     Convert a value to a string.
* Value_Type::       Get the type of a value.
* If Expression::    Evaluate one of several expressions.
* Unary Minus::      Negate a value.
* Operator Dot::     Select an attribute or a list element.
* Operator Plus::    Concat strings, lists or records, or add.
* Operator Minus::   Delete an attribute or an element, or subtract.
* Operator Times::   Intersect lists, concat records, or multiply.
* Operator Divide::  Delete elements from a list, or divide.
@end menu

@c ----------------------------------------------------------------------------

@node Malaga Variables, Constants, Expressions, Expressions
@subsection Variables

A variable is marked by a @samp{$} preceding its name. The name may be any
valid identifier. A variable is defined by the @code{define} statement; it
receives a value and may from this point on be used in all expressions within
the statement sequence. In such a statement sequence (and all subordinated
statement sequences) a variable with the same name must not be defined again.
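
A minimal sketch, assuming the statement forms
@code{define @var{variable} := @var{expression};} and
@code{@var{variable} := @var{expression};} for definition and assignment:

@example
define $count := 1;      # defines $count with the initial value 1
$count := $count + 1;    # assignment: $count now has the value 2
@end example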

@c ----------------------------------------------------------------------------

@node Constants, Subrule Calls, Malaga Variables, Expressions
@subsection Constants
@cindex constants

A constant is marked by a @samp{@@} preceding its name. The name may be any
valid identifier. A constant is defined by a constant definition in a rule
file, outside a rule. It is assigned a value and can be used in subsequent
rules and constant definitions in that rule file.

@c ----------------------------------------------------------------------------

@node Subrule Calls, Atoms, Constants, Expressions
@subsection Subrule Invocations
@cindex subrules, calling

A subrule is invoked when an expression 
@code{@var{subrule}(@var{value1}, @var{value2}, ...)} is evaluated. 

The expression yields the value that is returned by the @code{return}
statement in the subrule.
@cindex @code{return} (statement)

The number of parameters in a subrule invocation must match the number of
parameters in the subrule definition.

A number of subrules are predefined. They are called
@dfn{functions}.
@cindex functions

@c ----------------------------------------------------------------------------

@node Atoms, Capital, Subrule Calls, Expressions
@subsection The Function @code{atoms}
@cindex @code{atoms} (function)

The expression @code{atoms(@var{symbol})} yields the list of atomic
symbols for @var{symbol}. If @var{symbol} is not a multi-symbol, it
yields the list @code{<@var{symbol}>}.
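
For example, assuming the symbol table defines @code{singular_plural} as a
multi-symbol for @code{<singular, plural>} as described in the
characterisation above:

@example
atoms(singular_plural)   # yields <singular, plural>
atoms(singular)          # yields <singular>
@end example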

@c ----------------------------------------------------------------------------

@node Capital, Floor, Atoms, Expressions
@subsection The Function @code{capital}
@cindex @code{capital} (function)

The expression @code{capital(@var{string})} yields @code{yes} if the
first character of @var{string} is a capital letter, else it yields
@code{no}.
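
For example:

@example
capital("Malaga")   # yields yes
capital("malaga")   # yields no
@end example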

@c ----------------------------------------------------------------------------

@node Floor, Length, Capital, Expressions
@subsection The Function @code{floor}
@cindex @code{floor} (function)

The expression @code{floor(@var{number})} yields the largest integer
number that is not greater than @var{number}.
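
Note that for negative numbers this rounds away from zero, as the
following sketch shows:

@example
floor(13.75)   # yields 13
floor(2)       # yields 2
floor(-1.5)    # yields -2
@end example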

@c ----------------------------------------------------------------------------

@node Length, Multi, Floor, Expressions
@subsection The Function @code{length}
@cindex @code{length} (function)

The expression @code{length(@var{list})} yields the number of
elements in @var{list}.

The expression @code{length(@var{string})} yields the number of
characters in @var{string}.
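
For example (the symbols are assumed to be defined in the symbol table):

@example
length(<Nom, Acc, Dat>)   # yields 3
length("Hello")           # yields 5
@end example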


@c ----------------------------------------------------------------------------

@node Multi, Set, Length, Expressions
@subsection The Function @code{multi}
@cindex @code{multi} (function)

The expression @code{multi(@var{list})}, where @var{list} is a list of
symbols, yields the multi-symbol whose atomic list corresponds to
@var{list}. If @var{list} contains a single atomic symbol, the expression
yields this symbol.
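
For example, again assuming that the symbol table defines
@code{singular_plural} as a multi-symbol for @code{<singular, plural>}:

@example
multi(<singular, plural>)   # yields singular_plural
multi(<singular>)           # yields singular
@end example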

@c ----------------------------------------------------------------------------

@node Set, Substring, Multi, Expressions
@subsection The Function @code{set}
@cindex @code{set} (function)

The expression @code{set(@var{list})} yields a list which contains
each element of @var{list}, but only once. That means, the list is
converted to a set.
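
A short sketch; the order of the elements in the result is not specified
here, so the comment only states which elements it contains:

@example
set(<Nom, Acc, Nom, Dat, Acc>)   # contains Nom, Acc and Dat exactly once each
@end example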

@c ----------------------------------------------------------------------------

@node Substring, Switch, Set, Expressions
@subsection The Function @code{substring}
@cindex @code{substring} (function)

The expression @code{substring(@var{string}, @var{start_index},
@var{end_index})} yields the substring of @var{string} that starts at
@var{start_index} and ends at @var{end_index}, both inclusive. A
positive index counts from the string start: @code{1L} is the index of
the first character; a negative index counts from the string end:
@code{1R} is the index of the last character. If @var{end_index} is
omitted, it is assumed to be the same as @var{start_index}, so
@code{substring(@var{string}, @var{index})} yields the character at
@var{index} in @var{string}.  If @var{end_index} is less than
@var{start_index}, the function yields an empty string.
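
The following calls apply these rules to the string @code{"Malaga"}:

@example
substring("Malaga", 1, 3)     # yields "Mal"
substring("Malaga", 2R, 1R)   # yields "ga"
substring("Malaga", 1R)       # yields "a", the last character
substring("Malaga", 3, 2)     # yields "", since the end index is less than the start index
@end example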

@c ----------------------------------------------------------------------------

@node Switch, Transmit, Substring, Expressions
@subsection The Function @code{switch}
@cindex @code{switch} (function)

The expression @code{switch(@var{symbol})} yields the current value of
the switch associated with @var{symbol}. Use the option @code{switch} to
change this value.

@c ----------------------------------------------------------------------------

@node Transmit, Value_String, Switch, Expressions
@subsection The Function @code{transmit}
@cindex @code{transmit} (function)

The expression @code{transmit(@var{value})} writes @var{value},
converted to text format, to the transmit process via pipe and reads a
value in text format from the transmit process via pipe. The answer is
converted to the internal Malaga value format and returned as the
result of the expression.

When this function is evaluated, the transmit process is started if it
is not running. The command line of the transmit process is specified by
the option @code{transmit}.

@c ----------------------------------------------------------------------------

@node Value_String, Value_Type, Transmit, Expressions
@subsection The Function @code{value_string}
@cindex @code{value_string} (function)

The expression @code{value_string(@var{value})} returns @var{value}
converted to text format as a string.

@c ----------------------------------------------------------------------------

@node Value_Type, If Expression, Value_String, Expressions
@subsection The Function @code{value_type}
@cindex @code{value_type} (function)

The expression @code{value_type(@var{value})} yields the type of
@var{value}. The type information is coded as one of the symbols
@code{symbol}, @code{string}, @code{number}, @code{list}, or
@code{record}.
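
One example for each type (the symbols @code{Verb}, @code{Nom}, @code{Acc}
and @code{Class} are assumed to be defined in the symbol table):

@example
value_type(Verb)            # yields symbol
value_type("he")            # yields string
value_type(1.5)             # yields number
value_type(<Nom, Acc>)      # yields list
value_type([Class: Verb])   # yields record
@end example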

@c ----------------------------------------------------------------------------

@node If Expression, Unary Minus, Value_Type, Expressions
@subsection The @code{if} Expression
@cindex @code{if} (expression)
@cindex @code{else} (keyword)
@cindex @code{elseif} (keyword)

An @code{if} expression has the following form:

@example
if @var{condition1} then
  @var{expression1}
elseif @var{condition2} then
  @var{expression2}
else 
  @var{expression3}
end if
@end example

The @code{elseif} part may be repeated any number of times (including zero).

First, @var{condition1} is evaluated. If it is satisfied, the
expression @var{expression1} is evaluated and yields the value of the
@code{if} expression.

If @var{condition1} is not satisfied, each condition following an
@code{elseif} keyword is evaluated in turn, until a condition is found
that is satisfied. The expression that follows this condition will be
evaluated and yields the value of the @code{if} expression.

If the @code{if} condition and @code{elseif} conditions all fail, the
expression @var{expression3} will be evaluated and yields the value of
the @code{if} expression.

The @code{if} after the @code{end} may be omitted.
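
As a sketch, the following statement uses an @code{if} expression as the
initial value of a variable. The variable @code{$subjects} (a list) and the
symbols @code{singular} and @code{plural} are hypothetical, and the
@code{define} syntax is assumed:

@example
define $num := if length($subjects) = 1 then
                 singular
               elseif length($subjects) greater 1 then
                 plural
               else
                 nil
               end if;
@end example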

@c ----------------------------------------------------------------------------

@node Unary Minus, Operator Dot, If Expression, Expressions
@subsection Unary @samp{-}

A @samp{-} in front of a value of type @code{number} negates that value.

@c ----------------------------------------------------------------------------

@node Operator Dot, Operator Plus, Unary Minus, Expressions
@subsection The Operator @samp{.}

This operator may only be used in the following ways:

@table @code
@item @var{record}.@var{symbol}
This yields the attribute value of the attribute of @var{record} whose
name is @var{symbol}. If there is no attribute in @var{record} whose
name is @var{symbol}, the expression yields the special symbol
@code{nil}.

@item @var{list}.@var{number}
This yields the element of @var{list} at position @var{number}. If
there is no element at position @var{number} in @var{list}, the
expression yields the special symbol @code{nil}.

@item @var{value}.@var{list}
Here, @var{list} must be a list @code{<@var{e1}, @var{e2}, ...>} of
symbols and/or numbers. This expression serves as an abbreviation for
@code{@var{value}.@var{e1}.@var{e2}...}.
@end table
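
The three forms can be sketched as follows, assuming a variable
@code{$verb} with the value
@code{[Class: Verb, Agr: <Sing_1, Sing_3>]} (all symbols are hypothetical):

@example
$verb.Class      # yields Verb
$verb.Tense      # yields nil, since there is no attribute Tense
$verb.Agr.1      # yields Sing_1
$verb.Agr.1R     # yields Sing_3, the first element from the right
$verb.Agr.3      # yields nil, since the list has only two elements
$verb.<Agr, 2>   # the same as $verb.Agr.2, yields Sing_3
@end example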

@c ----------------------------------------------------------------------------

@node Operator Plus, Operator Minus, Operator Dot, Expressions
@subsection The Operator @samp{+}
@cindex @code{+} (operator)

This operator may only be used in the following ways:

@table @code
@item @var{string1} + @var{string2}
This yields the concatenation of @var{string1} and @var{string2}.

@item @var{list1} + @var{list2}
This yields the concatenation of @var{list1} and @var{list2}.

@item @var{number1} + @var{number2}
This yields the sum of @var{number1} and @var{number2}.

@item @var{record1} + @var{record2}
This yields a record which consists of all attributes of @var{record1}
and @var{record2}. If @var{record1} and @var{record2} have common
attribute names, the corresponding attributes in the result record will
have the attribute values from @var{record2}, in contrast to the
operator @samp{*}.
@end table
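
One example for each form; the symbols are hypothetical, and the order of
the attributes in the resulting record is not meant to be significant:

@example
"un" + "do"          # yields "undo"
<Nom> + <Acc, Dat>   # yields <Nom, Acc, Dat>
1.5 + 2              # yields 3.5
[Class: Verb, Reg: Reg] + [Reg: Irreg, Val: dirObj]
    # yields a record with Class: Verb, Reg: Irreg, Val: dirObj
@end example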

@c ----------------------------------------------------------------------------

@node Operator Minus, Operator Times, Operator Plus, Expressions
@subsection The Operator @samp{-}
@cindex @code{-} (operator)

This operator may only be used in the following ways:

@table @code
@item @var{record} - @var{symbol}
This yields @var{record} without the attribute named @var{symbol}, if
@var{symbol} is an attribute name in @var{record}. If not, the
expression yields @var{record}.

@item @var{record} - @var{list}
Here, @var{list} must be a list of symbols. This yields @var{record}
without the attributes in @var{list}.

@item @var{list} - @var{number}
This yields @var{list} without the element at index @var{number}. If
this element does not exist, the expression yields @var{list}.

@item @var{list1} - @var{list2}
This yields the multi-set difference of the two lists @var{list1} and
@var{list2}. This means, it yields the list @var{list1}, but the first
@var{n} appearances of each element will be deleted, if that element
appears @var{n} times in @var{list2}.

@item @var{number1} - @var{number2}
This yields the difference of @var{number1} and @var{number2}.
@end table
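
The following sketch shows each form with hypothetical symbols:

@example
[Class: Verb, Reg: Reg] - Reg                       # yields [Class: Verb]
[Class: Verb, Reg: Reg] - Val                       # yields the record unchanged
[Class: Verb, Reg: Reg, Val: dirObj] - <Reg, Val>   # yields [Class: Verb]
<Nom, Acc, Dat> - 2                                 # yields <Nom, Dat>
<Nom, Acc, Nom, Dat, Nom> - <Nom, Nom>              # yields <Acc, Dat, Nom>
7 - 2.5                                             # yields 4.5
@end example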

@c ----------------------------------------------------------------------------

@node Operator Times, Operator Divide, Operator Minus, Expressions
@subsection The Operator @samp{*}
@cindex @code{*} (operator)

This operator may only be used in the following ways:

@table @code
@item @var{record} * @var{symbol}
This yields the record which only contains the attribute of @var{record}
whose name is @var{symbol}.

@item @var{record1} * @var{record2}
This yields a record which consists of all attributes of @var{record1}
and @var{record2}. If @var{record1} and @var{record2} have common
attribute names, the corresponding attributes in the result record will
have the attribute values from @var{record1}, in contrast to the
operator @samp{+}.

@item @var{record} * @var{list}
Here, @var{list} must be a list of symbols. This yields the record which
only contains the attributes of @var{record} whose names are in
@var{list}.

@item @var{list1} * @var{list2}
This yields the @dfn{intersection} of the lists interpreted as
multi-sets; if an element is @var{m} times contained in @var{list1} and
@var{n} times contained in @var{list2}, it will be @code{min(@var{m},
@var{n})} times contained in the result.

@item @var{number1} * @var{number2}
This yields the product of @var{number1} and @var{number2}.
@end table
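
The following sketch shows each form with hypothetical symbols. The order
of attributes and elements in the results is not meant to be significant:

@example
[Class: Verb, Reg: Reg, Val: dirObj] * Class          # yields [Class: Verb]
[Class: Verb, Reg: Reg, Val: dirObj] * <Class, Val>   # keeps only Class and Val
[Class: Verb, Reg: Reg] * [Reg: Irreg, Val: dirObj]
    # yields a record with Class: Verb, Reg: Reg, Val: dirObj
<Nom, Nom, Acc, Dat> * <Nom, Nom, Nom, Dat>           # contains Nom twice and Dat once
3 * 1.5                                               # yields 4.5
@end example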

@c ----------------------------------------------------------------------------

@node Operator Divide,  , Operator Times, Expressions
@subsection The Operator @samp{/}
@cindex @code{/} (operator)

This operator may only be used in the following ways:

@table @code
@item @var{list1} / @var{list2}
This yields the list which contains all elements of @var{list1} which
are not elements of @var{list2}.

@item @var{list} / @var{number}
This yields the list which contains all elements of @var{list} without
the leftmost @var{number} elements, if @var{number} is positive, or
without the rightmost -@var{number} elements, if @var{number} is
negative.

@item @var{number1} / @var{number2}
Here, @var{number2} must not be 0. This yields the quotient of
@var{number1} and @var{number2}.
@end table
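
The following sketch shows each form. The second line uses the operator
@samp{-} on the same lists for comparison, since @samp{/} removes every
occurrence of an element while @samp{-} removes only as many as appear
in the right operand:

@example
<Nom, Acc, Nom, Dat> / <Nom>   # yields <Acc, Dat>
<Nom, Acc, Nom, Dat> - <Nom>   # yields <Acc, Nom, Dat>
<1, 2, 3, 4> / 1               # yields <2, 3, 4>
<1, 2, 3, 4> / -1              # yields <1, 2, 3>
7 / 2                          # yields 3.5
@end example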

@c ----------------------------------------------------------------------------

@node Conditions, Boolean Operators, Expressions, The Language
@section Conditions
@cindex conditions
@cindex @code{yes} (symbol)
@cindex @code{no} (symbol)

A condition is either true or false, as in @code{Verb = Verb} or
@code{Verb = Noun}, respectively. An expression that evaluates to
one of the symbols @code{yes} or @code{no} is also a valid condition.

A condition can be used in all places where a non-constant value is
needed. It will evaluate to @code{yes} or @code{no}. In this case, the
condition must be surrounded by parentheses.
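
For instance, the following statement would store the outcome of an
equality test in a variable. The variable @code{$cat} and the
@code{define} syntax are assumptions made for this sketch:

@example
define $is_verb := ($cat.Class = Verb);   # $is_verb is yes or no
@end example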

@menu
* Equality Tests::      Compare any values for equality.
* Number Comparisons::  Test which number is greater.
* Congruency Tests::    Check lists or multi-symbols for common elements.
* Operator In::         Test an element or attribute for inclusion.
* Regular Expressions:: String patterns.
@end menu

@c ----------------------------------------------------------------------------

@node Equality Tests, Number Comparisons, Conditions, Conditions
@subsection The Operators @samp{=} and @samp{/=}
@cindex @code{=} (operator)
@cindex @code{/=} (operator)

The condition @code{@var{expr1} = @var{expr2}} tests whether the
expressions @var{expr1} and @var{expr2} are equal. Depending on the
types of @var{expr1} and @var{expr2}, equality is defined as follows:

@table @asis
@item @var{expr1} and @var{expr2} are both symbols or both numbers.
In this case @var{expr1} and @var{expr2} must be identical.
@item @var{expr1} and @var{expr2} are strings.
In this case @var{expr1} and @var{expr2} must be the same, but the
test is case-insensitive.
@item @var{expr1} and @var{expr2} are lists. 
In this case @var{expr1} and @var{expr2} must have the same length,
and, for each @var{i}, the @var{i}-th element of @var{expr1} must
be equal to the @var{i}-th element of @var{expr2}.
@item @var{expr1} and @var{expr2} are records.
In this case @var{expr1} and @var{expr2} must contain the same
attribute names, though not necessarily in the same order.
For each attribute name, the attribute value of @var{expr1} and the
attribute value of @var{expr2} must be equal.

@end table

If @var{expr1} and @var{expr2} do not have the same type and are both
different from the symbol @code{nil}, the test results in an error;
the symbol @code{nil} can be compared to any value without error message.

The test @code{@var{expr1} /= @var{expr2}} holds if and only if the
test @code{@var{expr1} = @var{expr2}} does not hold.
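
The following tests illustrate these rules; the symbols and the variable
@code{$cat} are hypothetical:

@example
Verb = Verb                                         # holds
"Haus" = "haus"                                     # holds: case is ignored for strings
<Nom, Acc> = <Acc, Nom>                             # does not hold: lists are ordered
[Class: Verb, Reg: Reg] = [Reg: Reg, Class: Verb]   # holds: attribute order is irrelevant
$cat.Val /= nil                                     # nil may be compared to any value
1 = "1"                                             # error: different types, neither is nil
@end example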

@c ----------------------------------------------------------------------------

@node Number Comparisons, Congruency Tests, Equality Tests, Conditions
@subsection The Operators @code{less}, @code{less_equal}, @code{greater}, @code{greater_equal}
@cindex @code{less} (operator)
@cindex @code{less_equal} (operator)
@cindex @code{greater} (operator)
@cindex @code{greater_equal} (operator)

A condition of type @code{@var{expr1} @var{operator} @var{expr2}} compares
two numbers. Here, @var{operator} can have the following values:

@table @code
@item less
The condition holds if @var{expr1} has a smaller value than @var{expr2}.
@item less_equal
The condition holds if @var{expr1} has a smaller value than
@var{expr2} or both numbers are equal.
@item greater
The condition holds if @var{expr1} has a bigger value than @var{expr2}.
@item greater_equal
The condition holds if @var{expr1} has a bigger value than
@var{expr2} or both numbers are equal.
@end table

If either @var{expr1} or @var{expr2} is not a number, an error will be
reported.
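
For example (@code{$list} is a hypothetical variable):

@example
2 less 3                        # holds
3 less_equal 3                  # holds
2 greater 3                     # does not hold
length($list) greater_equal 2   # holds if $list has two or more elements
@end example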

@c ----------------------------------------------------------------------------

@node Congruency Tests, Operator In, Number Comparisons, Conditions
@subsection The Operators @samp{~} and @samp{/~}
@cindex @code{~} (operator)
@cindex @code{/~} (operator)

The operator @samp{~} can be used in the following ways:

@table @code
@item @var{list1} ~ @var{list2}
This tests whether @var{list1} and @var{list2} do @dfn{congruate},
this means, whether they have at least one element in common.

@item @var{symbol1} ~ @var{symbol2}
This tests if @code{atoms(@var{symbol1})} and
@code{atoms(@var{symbol2})}, the lists of their atomic symbols, do
congruate.
@end table

The comparison @code{@var{expr1} /~ @var{expr2}} holds if and only if the
comparison @code{@var{expr1} ~ @var{expr2}} does not hold.
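
For example, assuming hypothetical case symbols and a multi-symbol
@code{singular_plural} whose atomic list is @code{<singular, plural>}:

@example
<Nom, Acc> ~ <Acc, Dat>      # holds: Acc is a common element
<Nom, Acc> /~ <Dat>          # holds: there is no common element
singular_plural ~ singular   # holds: both atomic lists contain singular
@end example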

@c ----------------------------------------------------------------------------

@node Operator In, Regular Expressions, Congruency Tests, Conditions
@subsection The Operator @code{in}
@cindex @code{in} (operator)

The operator @code{in} can only be used in the following ways:

@table @code
@item @var{symbol} in @var{record}
This condition holds if and only if @var{record} contains an attribute named
@var{symbol}.

@item @var{value} in @var{list}
This condition holds if and only if @var{value} is an element of
@var{list}.
@end table
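
For example, with hypothetical symbols:

@example
Class in [Class: Verb, Reg: Reg]   # holds
Val in [Class: Verb, Reg: Reg]     # does not hold
Acc in <Nom, Acc>                  # holds
"x" in <"a", "b">                  # does not hold
@end example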

@c ----------------------------------------------------------------------------

@node Regular Expressions,  , Operator In, Conditions
@subsection The @code{matches} Condition (Regular Expressions)
@cindex @code{matches} (operator)
@cindex expressions, regular
@cindex regular expressions
@cindex patterns, string
@cindex string patterns

The condition 
@code{@var{expr} matches @var{pattern}} 
or
@code{@var{expr} matches (@var{pattern})}
interprets @var{pattern} as a pattern (a regular expression) and
tests whether @var{expr} matches @var{pattern}. Patterns are defined as
follows:

@table @asis
@item @var{pattern} ::= @var{alternative} @{@samp{|} @var{alternative}@}
The string must be identical with one of the alternatives.

@item @var{alternative} ::= @{@var{atom} [@samp{*} | @samp{?} | @samp{+}]@}
An alternative is a (possibly empty) sequence of atoms. An atom in a
pattern corresponds to a character in a string. By using an optional
postfix operator it is possible to specify for any atom how often it may
be repeated within the string at that location: zero times or
once (@samp{?}), at least once (@samp{+}), or arbitrarily often,
including zero times (@samp{*}).

Normally, these operators are @emph{greedy}, i.e. they try to match as
much as possible. If you put a @samp{?} behind a postfix operator, it
will try to match as few characters as possible. This can make a
difference if you're assigning variables in your pattern.

@item @var{atom} ::= @samp{(} @var{pattern} @samp{)}
A pattern may be grouped by parentheses.

@item @var{atom} ::= @samp{[} [@samp{^}] @var{range} @{@var{range}@} @samp{]}
A character class. It represents exactly one character from one of the
ranges. If the symbol @samp{^} is the first one in the class, the
expression represents exactly one character that is @emph{not} contained
in one of the ranges.

@item @var{atom} ::= @samp{.}
Represents any character.

@item @var{atom} ::= @var{character}
Represents the character itself.

@item @var{range} ::= @var{character1} [@samp{-} @var{character2}]
The range contains any character with a code at least as big as the code
of @var{character1} and not bigger than the code of
@var{character2}. The code of @var{character2} must be at least as big
as the code of @var{character1}. If @var{character2} is omitted, the
range only contains @var{character1}.

@item @var{character} ::= Any character except @samp{*?+[]^-.\|()}
To use one of the characters @samp{*?+[]^-.|()}, it must be preceded by
a @samp{\\} (pattern escape). To insert the pattern escape itself, you
have to double it: @samp{\\\\}.

@end table
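
To illustrate the grammar above, here are a few simple conditions (not
complete statements). The variable @code{$surf} is assumed to hold a string,
and the patterns themselves are only examples:

@example
# One or more lower-case letters from a to z.
$surf matches "[a-z]+"

# Either "color" or "colour": the atom u is optional.
$surf matches "colou?r"

# Any non-empty string whose first character is not a digit.
$surf matches "[^0-9].*"
@end example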

You can divide the pattern into segments:

@example
$surf matches ("un|in|im|ir|il", ".*", "(en)?")
@end example

is the same as

@example
$surf matches ("(un|in|im|ir|il).*(en)?")
@end example

A section of the string can be stored in a variable by suffixing the respective
pattern with @samp{: @var{variable_name}}, as in

@example
$surf matches ("un|in|im|ir|il": $a, ".*")
@end example

For backwards compatibility, you may also prefix the pattern with the variable
name, as in

@example
$surf matches $a: "un|in|im|ir|il", ".*"
@end example

The variables defined by pattern matching are only defined in the statement
sequence which is being executed if the pattern matching is successful.
A @code{matches} condition may not have variable definitions in it if it
is

@itemize @bullet
@item 
contained in a disjunction (an @code{or} condition),
@item 
contained in a negation (a @code{not} condition), or
@item 
used as a truth value (e.g. in an assignment).
@end itemize
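
The following sketch shows how variables set by @code{matches} might be used
inside an @code{if} statement. The variable names @code{$stem},
@code{$prefix} and @code{$rest} are hypothetical:

@example
define $stem := $surf;
if $surf matches ("un|in|im|ir|il": $prefix, ".*": $rest) then
  # $prefix and $rest are only defined inside this branch.
  $stem := $rest;
end if;
@end example

Greedy and non-greedy operators can assign different values to such
variables. Assuming @code{$surf} is @code{"abcbc"}, the two conditions below
should both hold, but with different values for the hypothetical variable
@code{$a}:

@example
# Greedy: $a should be "abc", the rest "bc".
$surf matches (".*": $a, "b.*")

# Non-greedy: $a should be "a", the rest "bcbc".
$surf matches (".*?": $a, "b.*")
@end example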

@c ----------------------------------------------------------------------------

@node Boolean Operators, Symbol Table, Conditions, The Language
@section The Operators @code{not}, @code{and}, and @code{or}
@cindex @code{not} (operator)
@cindex @code{and} (operator)
@cindex @code{or} (operator)
@cindex boolean operators

Conditions can be combined logically:

@table @code
@item not @var{cond}
This is true if condition @var{cond} is false.

@item @var{cond1} and @var{cond2} and @var{cond3} and ...
This is true if all conditions @var{cond1}, @var{cond2}, @var{cond3},
... are true. The conditions are tested one by one from left to right
until one of them is false. This is called @dfn{short-cut evaluation}.

@item @var{cond1} or @var{cond2} or @var{cond3} or ...
This is true if at least one of the conditions @var{cond1},
@var{cond2}, @var{cond3}, ... is true. The conditions are tested one
by one from left to right until one of them is true. This is also a form
of short-cut evaluation.
@end table

The operator @code{not} takes exactly one argument. If its argument contains
another logical operator, put it in parentheses @samp{()}, as in 
@code{not (@var{cond1} or @var{cond2})}.

The operators @code{and} and @code{or} may not be mixed as in 
@code{@var{cond1} and @var{cond2} or @var{cond3}}; here the order of
evaluation would be ambiguous. Use parentheses @samp{()} to indicate in which
order the condition is to be evaluated, as in 
@code{(@var{cond1} and @var{cond2}) or @var{cond3}}.
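
A short sketch of these rules, assuming a record variable @code{$link} and
the hypothetical symbols @code{cat}, @code{number}, @code{noun},
@code{pronoun}, @code{verb}, @code{adverb} and @code{plural}:

@example
# Short-cut evaluation: the second condition is only tested
# if $link has a number attribute.
require number in $link and $link.number = plural;

# Mixing "and" and "or" requires parentheses.
require ($link.cat = noun and $link.number = plural) or $link.cat = pronoun;

# A negated compound condition must be put in parentheses.
require not ($link.cat = verb or $link.cat = adverb);
@end example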

@c ----------------------------------------------------------------------------

@node Symbol Table, Initial State, Boolean Operators, The Language
@section The Symbol Table
@cindex symbol table
@cindex symbol definition

Every symbol used in a grammar has to be defined at least once in the 
@dfn{symbol table}. Every symbol must be followed by a semicolon:
@code{verb; noun; adjective;}

Symbols that are being defined that way are called @dfn{atoms}. A
symbol can also be defined as a @dfn{molecule}. Then the entry for this
symbol has the following format:
@cindex atoms
@cindex molecules

@example
@var{symbol} := @var{list};
@end example

The @var{list} for this symbol must consist of at least two atoms; no atom may
occur more than once in the list. This list will be used by the operators
@samp{~} and @samp{/~}, @code{atoms}, and @code{multi}. The
lists in the symbol table must be different from each other; it does not
suffice that they only differ in the order of their elements. If a symbol is
defined more than once in the symbol table, the definitions must all match:
Either the symbol must always be defined atomic or it must always be molecular
with the same atom-list.
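
For illustration, a symbol table with four atoms and one molecule might look
like this; all symbol names are hypothetical:

@example
# Atoms.
nominative; genitive; dative; accusative;

# A molecule built from two of the atoms above.
nom_acc := <nominative, accusative>;
@end example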

@c ----------------------------------------------------------------------------

@node Initial State, Constant Definition, Symbol Table, The Language
@section The Initial State
@cindex state, initial
@cindex initial state

The initial state in a combination rule file is defined as follows:

@example
initial @var{value}, rules @var{rule1}, @var{rule2}, ...;
@end example

The initial state of a combi rule file specifies a feature structure and a list
of rules (behind the keyword @code{rules}). Each of the rules will be applied
to read in the first allomorph (in morphology) or word form (in syntax). The
list may be enclosed in parentheses.

@cindex failing rule
@cindex rule, failing
@cindex successful rule
@cindex rule, successful
A combi rule or an end rule is successful if it creates at least one
new state, otherwise it fails. If you want rules to be executed only
if all other rules failed, you can put their names behind the other rules'
names and write an @code{else} in front of them:

@example
initial @var{value}, rules @var{rule1}, @var{rule2} else
@var{rule3}, @var{rule4} else ...;
@end example

If both rules @var{rule1} and @var{rule2} fail, @var{rule3} and
@var{rule4} are executed. If these rules also fail, the next rules are
executed, and so on.
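
As a sketch, an initial state with a fallback rule could be written as
follows. The rule names @code{prefix}, @code{stem} and @code{unknown_word}
and the symbols @code{class} and @code{start} are hypothetical:

@example
# prefix and stem are tried first;
# unknown_word is only executed if both of them fail.
initial [class: start], rules prefix, stem else unknown_word;
@end example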

@c ----------------------------------------------------------------------------

@node Constant Definition, Rules, Initial State, The Language
@section The Constant Definition
@cindex constant definition
@cindex definition, constant

A constant definition is of the form

@example
define @@@var{constant} := @var{expr};
@end example

The constant expression @var{expr} will be evaluated and the constant
@@@var{constant} will be defined to have this value. The constant must
not have been defined previously. The constant is valid from this
definition up to the end of the rule file. If you use the keyword
@code{default} instead of @code{define}, you provide a default value for
@@@var{constant}. This means, the value is only preliminary and may be
changed by a normal constant definition. After a constant has been used
in an expression, its value may not be changed any more.
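
The interplay of @code{default} and @code{define} can be sketched as
follows; the constant names are hypothetical:

@example
# Preliminary value; a later define may still replace it.
default @@max_suffixes := 2;

# Normal definition; it replaces the default value above.
define @@max_suffixes := 3;

# The constant is used in an expression here, so its value
# may not be changed any more after this point.
define @@suffix_limit := @@max_suffixes + 1;
@end example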

@c ----------------------------------------------------------------------------

@node Rules, Statements, Constant Definition, The Language
@section Rules
@cindex rules
@cindex @code{allo_rule} (rule)
@cindex @code{combi_rule} (rule)
@cindex @code{end_rule} (rule)
@cindex @code{pruning_rule} (rule)
@cindex @code{robust_rule} (rule)
@cindex @code{input_filter} (rule)
@cindex @code{output_filter} (rule)
@cindex @code{subrule} (rule)

A rule is a sequence of statements that is executed as a unit:

@example
combi_rule @var{name}(@var{$param1}, @var{$param2}, ...):
  @var{statement1}
  @var{statement2}
  ...
end @var{name};
@end example

A rule has to begin with one of the keywords @code{allo_rule},
@code{combi_rule}, @code{end_rule}, @code{pruning_rule},
@code{robust_rule}, @code{input_filter}, @code{output_filter} or
@code{subrule}. It is followed by its @emph{parameter list}, a list of
variable names. The variables will be assigned the
parameter values when the rule is executed. The number of parameters
depends on the rule type. The rule names have the following meanings:

@table @code
@item allo_rule(@var{$lex_entry})
An allo-rule must occur exactly once in an allomorph rule file. It
analyses a lexical entry and must generate one or more allomorph entries
via @code{result}. An allomorph rule has one parameter, namely the
lexicon entry.

@item combi_rule(@var{$state}, @var{$link}, @var{$surf}, @var{$index})
Any number of combi-rules may occur in a combi-rule file. Before processing
such a rule, the @dfn{link} is read in, which is either the word form or
the allomorph that follows the state's surface. The first parameter of the rule
is the state's feature structure, the second is the link's feature structure,
the third is the link's surface, and the fourth is the link's index. The third
and the fourth parameter are optional. A combi-rule may state a successor rule
set or accept the analysed input (both via @code{result}).

@item end_rule(@var{$state}, @var{$remain_input})
Any number of end-rules may occur in a combi-rule file. The first parameter is
the state's feature structure, the second, which is optional, is the remaining
input. If the rule takes only one parameter, it is only called if the remaining
input is empty or begins with a space. An end rule may accept the analysed
input via @code{result}.

@item pruning_rule(@var{$list})
A pruning-rule may occur at most once in a combi-rule file. During
analysis, it can decide which states are still valid and which
are to be deleted. The parameter is a list of feature structures of the states
that have consumed the same input so far. The pruning-rule must execute
a @code{return} statement with a list of the symbols @code{yes} and/or
@code{no}. Each state in @var{$list} corresponds to a symbol in the
result list. If the symbol is @code{yes}, the corresponding state is
preserved. If the symbol is @code{no}, the state is abandoned.

@item robust_rule(@var{$surface}, @var{$remain_input})
A robust-rule may occur at most once in a morphology rule file. If
robust analysis has been switched on by the @code{robust} command, and a
word form could not be recognised by the combi-rules, the robust-rule is
executed with the surface of the next word form as its first
parameter. The next word form is defined as the remaining input up to
(but excluding) the next space. The optional second parameter contains
the whole remaining input. A robust-rule can accept any prefix of the
remaining input via @code{result}.

@item input_filter(@var{$feature_structure_list})
An input-filter may occur at most once in a syntax rule file. The
input-filter is called after a word form has been analysed. It gets one
parameter, namely the list of the analysis results, and it transforms it
to one or more filtered results (via @code{result}).

@item output_filter(@var{$feature_structure_list})
An output-filter may occur at most once in any rule file.

@table @emph
@item In allo-rule files:
The output-filter is called after all lexicon entries have been processed
by the allo-rules. The filter is called for every allomorph surface. It
gets one parameter, namely the list of the generated feature structures with
that surface, and it transforms it to one or more filtered allomorph
feature structures (via @code{result}).

@item In combi-rule files:
The output-filter is called after an item has been analysed. It gets one
parameter, namely the list of the analysis results, and it transforms it
to one or more filtered results (via @code{result}).
@end table

@item subrule(@var{$param1}, @var{$param2}, ...)
Any number of subrules may occur in any rule file. A subrule can be
invoked from other rules and it must return a value to this rule via
@code{return}. It can have any number of parameters (at least one).
@end table
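
The following sketch shows a subrule and a combi-rule that invokes it. The
rule names and the symbols @code{number} and @code{unmarked} are
hypothetical, and the example assumes that a subrule is invoked like a
function call inside an expression:

@example
subrule number_of($fs):
  if number in $fs then
    return $fs.number;
  else
    return unmarked;
  end if;
end number_of;

combi_rule agreement($state, $link):
  # Assumed invocation syntax: subrule name followed by its arguments.
  require number_of($state) = number_of($link);
  result $link, rules agreement;
end agreement;
@end example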

When a rule is executed, its statements are processed sequentially, and rule
execution terminates after the last statement. The @code{if} statement,
the @code{foreach} statement, and the @code{select} statement may change the
processing order. Special conditions apply if:

@enumerate
@item 
A condition in a @code{require} statement does not hold. In this case the
processing of the current rule path is terminated. This is not an error.
@item 
The @code{stop} statement was executed. In this case the
processing of the current rule path is terminated. This is not an error.
@item 
An @code{assert} condition does not hold. In this case the processing of
the whole grammar is terminated and an error message is displayed. This rule
termination can be used to find bugs in the rule system or in the lexicon.
@item 
The @code{error} statement was executed. In this case the processing of
the whole grammar is terminated and an error message is displayed. 
@item 
The @code{return} statement was executed in a subrule or in a pruning
rule. In a subrule, this terminates the subrule in the current rule path and
immediately returns to the calling rule. In a pruning rule, this terminates
the pruning rule.
@end enumerate

@c ----------------------------------------------------------------------------

@node Statements, Files, Rules, The Language
@section Statements
@cindex statements

A rule body contains a sequence of statements.

The statements are the assignment and the statements beginning with
@code{assert}, @code{break}, @code{choose}, @code{continue},
@code{define}, @code{error}, @code{foreach}, @code{if}, @code{repeat},
@code{require}, @code{result}, @code{return}, @code{select}, and @code{stop}.

@menu
* Assert::         Report an error if condition is not met.
* The Assignment:: Assign a new value to a variable.
* Break::          Break a @code{foreach} loop.
* Choose::         Branch the current path for different values.
* Continue::       Go to the next pass of a @code{foreach} loop.
* Define::         Define a new variable.
* Error::          Report an error.
* Foreach::        Repeat statements for a given number of iterations.
* If::             Conditionally execute statements.
* Repeat::         Repeat statements for an unknown number of iterations.
* Require::        Terminate the current path if condition is not met.
* Result::         Emit a result in a rule.
* Return::         Terminate the current subrule and return a value.
* Select::         Branch the current path for different statement sequences.
* Stop::           Terminate a path.

@end menu

@c ----------------------------------------------------------------------------

@node Assert, The Assignment, Statements, Statements
@subsection The @code{assert} Statement

The statement @code{assert @var{condition};} or @code{! 
@var{condition};} tests whether @var{condition} holds. If this is not
the case, an error message with the line number in the source code is
displayed and the processing of @emph{all} paths is terminated.

The @code{assert} statement should be used to check whether there are
structural flaws in the lexicon or the rule system.
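
For example, an allo-rule might check the structure of each lexicon entry
before using it. The parameter name @code{$lex_entry} and the symbols
@code{cat}, @code{noun}, @code{verb} and @code{adjective} are hypothetical:

@example
# Every lexicon entry is expected to have a cat attribute ...
assert cat in $lex_entry;

# ... whose value is one of the known categories (short form).
! $lex_entry.cat in <noun, verb, adjective>;
@end example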

@c ----------------------------------------------------------------------------

@node The Assignment, Break, Assert, Statements
@subsection The Assignment
@cindex assignment

To set the value of an already defined variable to a different value, use a
statement of the following form:

@example
@var{$var} := @var{expr};
@end example

The expression @var{expr} is evaluated and the result is assigned to the
variable @var{$var}. The variable must have already been defined.

You can assign the elements of a list value to multiple variables at once:
@cindex list assignment

@example
<@var{$var1}, @var{$var2}, ... > := @var{expr};
@end example

The first, second, ... element of @var{expr}, which must be a list, is
assigned to variable @var{$var1}, @var{$var2}, ... respectively. Any of
these variables may be followed by a path.
The number of variables must match the length of the list value.

You can optionally specify a path behind the variable that is to be set by an
assignment:

@example 
@var{$var}.@var{part1}.@var{part2} := @var{value};
@end example

In this case, only the value of @code{@var{$var}.@var{part1}.@var{part2}}
will be set to @var{value}; the remainder of the variable @var{$var}
will be unchanged. Each @var{part} must be an expression that evaluates
to a symbol, a number or a list of symbols and numbers.

You can also use one of four other assignment operators instead of the operator
@samp{:=}: The statement @code{@var{$var} :=+ @var{value};} is a
shorthand for @code{@var{$var} := @var{$var} + @var{value};}. The
same holds for the assignment operators @samp{:=-}, @samp{:=*}, and
@samp{:=/}. Here, @var{$var} may be followed by a path again.
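
The different forms of the assignment can be sketched as follows; the
variables and the symbols @code{cat}, @code{count}, @code{noun} and
@code{verb} are hypothetical:

@example
define $fs := [cat: noun, count: 1];
define $a := 0;
define $b := 0;

# Only the cat attribute changes; count is left untouched.
$fs.cat := verb;

# Shorthand for: $fs.count := $fs.count + 2;
$fs.count :=+ 2;

# List assignment: $a becomes 10 and $b becomes 20.
<$a, $b> := <10, 20>;
@end example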

@c ----------------------------------------------------------------------------

@node Break, Choose, The Assignment, Statements
@subsection The @code{break} Statement
@cindex @code{break} (statement)

The @code{break} statement leaves the @code{foreach} loop with @var{Label}.

@example
break @var{Label};
@end example

If the label is omitted, the break statement leaves the innermost
@code{foreach} loop it is contained in. The statement must be situated
in the body of the @code{foreach} loop it wants to leave.

@c ----------------------------------------------------------------------------

@node Choose, Continue, Break, Statements
@subsection The @code{choose} Statement
@cindex @code{choose} (statement)

The @code{choose} statement chooses an element of a list. Its format
is:

@example
choose @var{$var} in @var{expr};
@end example

For every element in the list @var{expr} a rule path is created; in this
rule path the element is stored in the variable @var{$var}. Thus the
number of rule paths can multiply. If, for example, @var{expr} has the
value @code{<A, B, C>}, the currently processed rule path has three
continuations: In the first one @var{$var} has the value @code{A}, in
the second one it has the value @code{B} and in the third one it has the
value @code{C}. The three paths behave independently from now on.

The @code{choose} statement can also be used for records. In that case, the
variable @var{$var} gets a different attribute name of the record
@var{expr} in each path.

The @code{choose} statement also works for numbers: 
@itemize @bullet
@item 
If @var{expr} is a positive number @var{n}, the variable @var{$var} is
assigned the numbers 1, 2, ..., @var{n}, respectively, in each path.
@item 
If @var{expr} is a negative number @var{-n}, the variable @var{$var} is
assigned the numbers -1, -2, ..., @var{-n}, respectively, in each path.
@end itemize
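
As a sketch, an allo-rule could use @code{choose} to create one allomorph
entry per case. The rule name, the attribute names @code{surf}, @code{base}
and @code{case}, and the case symbols are hypothetical:

@example
allo_rule cases($lex_entry):
  # The rule path splits in two: one path per list element.
  choose $case in <nominative, accusative>;

  # This statement is executed once in each of the two paths.
  result $lex_entry.surf, [base: $lex_entry, case: $case];
end cases;
@end example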

@c ----------------------------------------------------------------------------

@node Continue, Define, Choose, Statements
@subsection The @code{continue} Statement
@cindex @code{continue} (statement)

The @code{continue} statement terminates the current pass of the
@code{foreach} loop with @var{Label} and starts the next pass. If the
current pass is the last one, the loop will be left.

@example
continue @var{Label};
@end example

If the label is omitted, the statement affects the innermost
@code{foreach} loop it is contained in. The statement must be situated
in the body of the @code{foreach} loop it wants to affect.
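
The following sketch uses @code{continue} and @code{break} without a label
to find the first vowel in a list of letters; the variable names are
hypothetical:

@example
define $first_vowel := "";
foreach $letter in <"s", "t", "a", "r">:
  if $letter matches "[^aeiou]" then
    # Not a vowel: go on with the next pass.
    continue;
  end if;
  $first_vowel := $letter;
  # A vowel has been found: leave the loop.
  break;
end foreach;
# $first_vowel is now "a".
@end example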

@c ----------------------------------------------------------------------------

@node Define, Error, Continue, Statements
@subsection The @code{define} Statement
@cindex @code{define} (statement)

A @code{define} statement is of the form
@example
define @var{$var} := @var{expr};
@end example
The expression @var{expr} is evaluated and the result is assigned to the
variable @var{$var}. The variable may not be defined before this statement;
it is defined by the statement and only exists until the statement sequence in
which the assignment is situated has been processed fully.

You can assign the elements of a list value to multiple variables at once:
@example
define <@var{$var1}, @var{$var2}, ... > := @var{expr};
@end example
The first, second, ... element of @var{expr}, which must be a list, is
assigned to the new variable @var{$var1}, @var{$var2}, ... respectively.
The number of variables must match the length of the list value.
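
A sketch of both forms and of the scope of a defined variable; the variable
names and symbols are hypothetical:

@example
if cat in $link then
  define $cat := $link.cat;
  # $cat exists until the end of this statement sequence.
  require $cat in <noun, verb>;
end if;
# $cat is no longer defined here.

# Define two new variables at once from a list of two elements.
define <$person, $number> := <third, singular>;
@end example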

@c ----------------------------------------------------------------------------

@node Error, Foreach, Define, Statements
@subsection The @code{error} Statement
@cindex @code{error} (statement)

The statement @code{error} terminates the execution of @emph{all}
paths and displays the given expression, which must be a string, and
the line of the source text:

@example
error @var{message};
@end example

@c ----------------------------------------------------------------------------

@node Foreach, If, Error, Statements
@subsection The @code{foreach} Statement
@cindex @code{foreach} (statement)

You may wish to manipulate all elements of a list or a record
@emph{sequentially} in @emph{one} rule path. For this purpose, the
@code{foreach} statement was introduced. It has the following format:

@example
foreach @var{$var} in @var{expr}:
  @var{statements}
end foreach;
@end example

Sequentially, @var{$var} is assigned a number of values, depending on the
type of @var{expr}, and the statement sequence @var{statements} is executed
for each of those assignments. Every time the @var{statements} are being
walked through, the variable @var{$var} is defined again. Its scope is the
block @var{statements}.

@itemize @bullet
@item 
If @var{expr} is a list, @var{$var} is assigned the first, second,
third, ... element of @var{expr}.
@item 
If @var{expr} is a record, @var{$var} is assigned the first, second,
... attribute name of @var{expr}.
@item 
If @var{expr} is a positive number @var{n}, the variable @var{$var} is
assigned the numbers 1, 2, ..., @var{n} sequentially.
@item 
If @var{expr} is a negative number @var{-n}, the variable @var{$var} is
assigned the numbers -1, -2, ..., @var{-n} sequentially.
@item 
If @var{expr} is an empty list, an empty record or the number 0, the
foreach loop is terminated immediately.
@end itemize
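
Two small sketches, one iterating over a record and one over a number; the
variable names and symbols are hypothetical:

@example
# Count the attributes of a record.
define $count := 0;
foreach $attribute in [cat: noun, number: plural]:
  $count :=+ 1;
end foreach;
# $count is now 2.

# Add up the numbers 1, 2 and 3.
define $sum := 0;
foreach $i in 3:
  $sum :=+ $i;
end foreach;
# $sum is now 6.
@end example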

@c ----------------------------------------------------------------------------

@node If, Repeat, Foreach, Statements
@subsection The @code{if} Statement
@cindex @code{if} (statement)
@cindex @code{else} (keyword)
@cindex @code{elseif} (keyword)

An @code{if} statement has the following form:

@example
if @var{condition1} then
  @var{statements1}
elseif @var{condition2} then
  @var{statements2}
else 
  @var{statements3}
end if;
@end example

The @code{elseif} part may be repeated any number of times (including zero);
the @code{else} part may be omitted.

First, @var{condition1} is evaluated. If it is satisfied, the
statement sequence @var{statements1} is executed.

If the first condition is not satisfied, @var{condition2} is evaluated; if
the result is true, @var{statements2} is executed. This procedure is
repeated for every @code{elseif} part until a condition is satisfied.

If the @code{if} condition and @code{elseif} conditions fail, the statement
sequence @var{statements3} is executed (if it exists).

After the @code{if} statement has been processed, the following statement is
executed.

The @code{if} after the @code{end} may be omitted.
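
For example, the following sketch classifies a link by its category; the
variable @code{$class} and all symbols are hypothetical:

@example
define $class := other;
if $link.cat = noun then
  $class := nominal;
elseif $link.cat = verb then
  $class := verbal;
elseif $link.cat = adjective then
  $class := adjectival;
end if;
@end example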

@c ----------------------------------------------------------------------------

@node Repeat, Require, If, Statements
@subsection The @code{repeat} Statement
@cindex @code{repeat} (statement)

You may wish to repeat a sequence of statements while a specific condition
holds. This can be realised by the @code{repeat} loop. It has the following
form:

@example
repeat
  @var{statements1}
while @var{condition};
  @var{statements2}
end repeat;
@end example

The statements @var{statements1} are executed. Then,  @var{condition}
is tested. If it holds, the @var{statements2} are
executed and the @code{repeat} statement is executed again. If @var{condition}
does not hold, execution proceeds after the @code{repeat} statement.

If @var{statements1} is empty, the @code{repeat} loop is equivalent to a
while loop in C:

@example
repeat while @var{condition};
  @var{statements}
end repeat;
@end example

If @var{statements2} is empty, the @code{repeat} loop is equivalent to a
do-while loop in C:

@example
repeat
  @var{statements}
while @var{condition}; 
end repeat;
@end example
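
A sketch of the general form, with statements both before and after the
@code{while} part; the variable names are hypothetical:

@example
define $i := 0;
define $sum := 0;
repeat
  $i :=+ 1;
while $i /= 4;
  # Only reached while $i is 1, 2 or 3.
  $sum :=+ $i;
end repeat;
# Here $i is 4 and $sum is 6.
@end example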

@c ----------------------------------------------------------------------------

@node Require, Result, Repeat, Statements
@subsection The @code{require} Statement
@cindex @code{require} (statement)

A statement of the form

@example
require @var{condition};
@end example

or

@example
? @var{condition};
@end example

tests whether @var{condition} is true. If this is not the case, the rule path
is terminated @emph{without} an error message. @code{require} statements
should be used to decide whether the combination of a state and a link is
grammatical.

@c ----------------------------------------------------------------------------

@node Result, Return, Require, Statements
@subsection The @code{result} Statement
@cindex @code{result} (statement)
@cindex @code{accept} (keyword)

@table @emph
@item In combi rules:
The statement

@example 
result @var{expr}, rules @var{rule1}, @var{rule2}, ...;
@end example

specifies the result feature structure of the rule and the successor rules. The
value @var{expr} is the result feature structure. Behind the keyword
@code{rules} the names of all successor rules are enumerated. For every
successor rule that is being executed a new rule path will be created. The rule
set may be enclosed in parentheses.

If you want successor rules to be executed only if no other rule has
been successful, you can put their names behind the other rules' names
and write an @code{else} in front of them:

@example 
result @var{expr}, 
rules @var{rule1}, @var{rule2} else @var{rule3}, @var{rule4} else ...;
@end example

If none of the normal rules (here: @var{rule1} and @var{rule2}) has been
successful, @var{rule3} and @var{rule4} are executed. If these rules also fail,
the next rules are executed, and so on. A rule has been successful if at least
one @code{result} statement has been executed.

@item In combi-rules and end-rules:
If the input is to be accepted by the @code{result} statement (and
therefore no successor rules are to be called) the following format has
to be used:

@example 
result @var{expr}, accept;
@end example

If this statement is reached in a rule path, the input is accepted as
grammatically well-formed. The value @var{expr} is returned as the
result of the morphological or syntactic analysis.

@item In filters: 
The format of a @code{result} statement in a filter or robust-rule is

@example
result @var{expr};
@end example

If this statement is reached, the value @var{expr} is used as a result
of the executed rule.

@item In robust-rules:
The format of a @code{result} statement in a robust-rule:

@example
result @var{feature_structure};
@end example

or

@example
result @var{surface}, @var{feature_structure};
@end example

The word form @var{surface} with feature structure @var{feature_structure} is
used as a result of the robust-rule. @var{surface} must be a prefix of the
input that has not been parsed yet. If it is omitted, the input up to, but
excluding, the first space is taken.

@item In allo-rules:
The format of the @code{result} statement in an allo rule is:

@example 
result @var{surface}, @var{feature_structure};
@end example

It creates an entry in the allomorph lexicon. The allomorph surface
@var{surface} must be a string; @var{feature_structure} is the feature
structure of the allomorph.

@end table
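
The following sketch combines two of these forms in a combi-rule and an
end-rule. All rule names and the symbols @code{cat} and @code{stem} are
hypothetical:

@example
combi_rule stem($state, $link):
  require $link.cat = stem;
  # inflection and derivation are tried first;
  # compound is only executed if both of them fail.
  result $link, rules inflection, derivation else compound;
end stem;

end_rule finish($state):
  # Accept the input and return the state as the analysis result.
  result $state, accept;
end finish;
@end example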

@c ----------------------------------------------------------------------------

@node Return, Select, Result, Statements
@subsection The @code{return} Statement
@cindex @code{return} (statement)

In a subrule, the @code{return} statement is of the following form:

@example
return @var{expr};
@end example

The value of @var{expr} is returned to the rule that invoked this subrule and
the subrule execution is finished.

In a pruning rule, the @code{return} statement is of the same form. Here,
@var{expr} must be a list of the symbols @code{yes} and/or
@code{no}. Each state in the feature structure list, which is the pruning rule
parameter, corresponds to a symbol in the result list. If the symbol is
@code{yes}, the corresponding state is preserved. If the symbol is @code{no},
the state is abandoned.
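
A pruning rule might be sketched as follows. The rule name and the attribute
name @code{rejected} are hypothetical, and the example assumes that the
operator @samp{+} concatenates two lists:

@example
pruning_rule drop_rejected($list):
  define $answers := <>;
  foreach $fs in $list:
    if rejected in $fs then
      # Abandon states that carry a rejected attribute.
      $answers :=+ <no>;
    else
      $answers :=+ <yes>;
    end if;
  end foreach;
  # One symbol per state, in the same order as in $list.
  return $answers;
end drop_rejected;
@end example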

@c ----------------------------------------------------------------------------

@node Select, Stop, Return, Statements
@subsection The @code{select} Statement
@cindex @code{select} (statement)

By using the @code{select} statement, more than one continuation of an
analysis path can be generated. Its format is:

@example
select
  @var{statements1}
or
  @var{statements2}
or
  @var{statements3}
...
end select;
@end example

This creates as many rule paths as there are statement sequences. In the
first rule path, @var{statements1} are executed, in the second one
@var{statements2} are executed, etc. Each rule path continues by
executing the statements following the @code{select} statement.

The keyword @code{select} behind the @code{end} can be omitted.
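
As a sketch, the following statements create two rule paths, one for each
grammatical number; the variable and the symbols are hypothetical:

@example
define $number := singular;
select
  $number := singular;
or
  $number := plural;
end select;
# Both paths continue here, each with its own value of $number.
@end example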

@c ----------------------------------------------------------------------------

@node Stop,  , Select, Statements
@subsection The @code{stop} Statement
@cindex @code{stop} (statement)

The @code{stop} statement terminates the current rule path. Its format is:

@example 
stop;
@end example

@c ----------------------------------------------------------------------------

@node Files, Syntax Summary, Statements, The Language
@section Files
@cindex files

A Malaga grammar system comprises several files: a symbol file, a lexicon file,
an allomorph rule file, a morphology rule file, an extended symbol file
(optional), and a syntax rule file (optional). The type of a file can be
seen by the ending of the file name. A grammar for the English language may
consist of the files @file{english.sym}, @file{english.lex}, 
@file{english.all}, @file{english.mor} and @file{english.syn}.

@menu
* Symbol File::          The definition of all morphology symbols.
* Extended Symbol File:: Additional syntax symbols.
* Lexicon File::         The lexicon from which allomorphs will be created.
* Allomorph Rule File::  The rules that create the allomorphs.
* Combi-Rule Files::     The LAG rules that combine the allomorphs or words.
@end menu

@c ----------------------------------------------------------------------------

@node Symbol File, Extended Symbol File, Files, Files
@subsection The Symbol File
@cindex files, symbol
@cindex symbol files

A symbol file has the suffix @file{.sym}. It contains the symbol table.

@c ----------------------------------------------------------------------------

@node Extended Symbol File, Lexicon File, Symbol File, Files
@subsection The Extended Symbol File
@cindex files, extended symbol
@cindex symbol files, extended
@cindex extended symbol files

An extended symbol file has the suffix @file{.esym}. It contains an
additional symbol table that contains symbols which may only be used in the
syntax rule file.

@c ----------------------------------------------------------------------------

@node Lexicon File, Allomorph Rule File, Extended Symbol File, Files
@subsection The Lexicon File
@cindex files, lexicon
@cindex lexicon files

A lexicon file has the suffix @file{.lex}. It consists of any number of
values and constant definitions, each terminated by a semicolon. Each
value stands for a lexical entry. A value may contain named constants
and the operators @samp{.}, @samp{+}, @samp{-}, @samp{*}, and @samp{/}.
The format of the lexical entries is free,
although it should be consistent with the conception of the whole rule
system.
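
A small lexicon file might look like this. The attribute names, symbols and
the constant name are hypothetical; since the format of the entries is free,
this is only one possible layout:

@example
# A named constant shared by several entries.
define @@regular_noun := [cat: noun, inflection: regular];

# Two lexical entries, each terminated by a semicolon.
[surf: "house", class: @@regular_noun];
[surf: "man", class: [cat: noun, inflection: irregular], plural: "men"];
@end example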

@c ----------------------------------------------------------------------------

@node Allomorph Rule File, Combi-Rule Files, Lexicon File, Files
@subsection The Allomorph Rule File
@cindex files, allomorph rule
@cindex rule files, allomorph
@cindex allomorph rule files

The allomorph lexicon is generated from the base form lexicon by applying the
allo-rule to the base form entries. The allomorph generation rule file has
the suffix @file{.all} and consists of one allo-rule, an optional
output-filter, and any number of subrules and constant definitions.

For every lexical entry, the allo-rule is executed with the value of the
lexicon entry as parameter. The allo-rule can generate allomorphs using the
@code{result} statement.

After all allomorphs have been produced, the output-filter is executed once for
each surface in the (intermediate) allomorph lexicon. As parameter, the
output-filter gets the list of feature structures that share that surface. An
entry in the final allomorph lexicon is created every time the @code{result}
statement is executed. The surface cannot be changed by the output-filter.

@c ----------------------------------------------------------------------------

@node Combi-Rule Files,  , Allomorph Rule File, Files
@subsection The Combi-Rule Files
@cindex files, combi-rule
@cindex files, syntax rule
@cindex files, morphology rule
@cindex rule files, syntax
@cindex rule files, morphology
@cindex combi-rule files
@cindex syntax rule files
@cindex morphology rule files

A grammar system includes up to two combination rule files: one for
morphological combination with the suffix @file{.mor} and (optionally) one
for syntactic combination with the suffix @file{.syn}.

A combination rule file consists of an initial state and any number of
combi-rules, subrules, and constant definitions. A syntax rule file
may contain one optional pruning-rule, one optional input-filter and
one optional output-filter; a morphology rule file may contain one
optional robust-rule, one optional pruning-rule and one optional
output-filter. 

Beginning with the rules listed in the initial state, the rules and
their successors are processed until a @code{result} statement with the
keyword @code{accept} is encountered in every path. A path dies if there is no
more input (from the lexicon or from the morphology) that can be processed.
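
The overall shape of a combination rule file can be sketched as follows.
This is an illustration only: the rule names, the attribute
@code{Class}, the symbol @code{noun}, and the choice of two rule
parameters are assumptions and are not taken from an actual grammar.

@example
# Start with an empty record and try the rule "stem" first.
initial [], rules stem;

combi_rule stem($state, $link):
  # Abandon this path unless the next item is a noun.
  require $link.Class = noun;
  result $link, rules finish;
end combi_rule stem;

end_rule finish($state):
  # Accept the analysis with the state reached so far.
  result $state, accept;
end end_rule finish;
@end example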

In morphology, if analysis has created no result and robust analysis has been
switched on, the robust-rule will be called with the analysis surface and can
create a result.

In syntax, when a new word form has been imported from morphology, the
input-filter can inspect its feature structures and create new result
feature structures.

If a pruning-rule is present, pruning has been activated, and the
number of current LAG states is not less than @code{mor-pruning} (in
morphology) or @code{syn-pruning} (in syntax), the concatenation of
the next allomorph (in morphology) or word form (in syntax) is
preceded by the following step: The feature structures of all current
LAG states are merged into a list, which is the parameter of the
pruning-rule. The pruning-rule must execute a @code{return} statement
with a list of the symbols @code{yes} and @code{no}. Each state in the
feature structure list corresponds to a symbol in the result list. If
the symbol is @code{yes}, the corresponding state is preserved. If the
symbol is @code{no}, the state is abandoned.
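
A pruning-rule along these lines might be written as shown below. The
rule name, the attribute @code{Class}, and the symbol @code{noun} are
hypothetical, and the criterion used (keep only states whose class is
@code{noun}) is merely an example of a possible pruning strategy.

@example
pruning_rule keep_nouns($states):
  # Collect one answer, yes or no, per state, in the same order.
  define $keep := <>;
  foreach $fs in $states:
    if $fs.Class = noun then
      $keep := $keep + <yes>;
    else
      $keep := $keep + <no>;
    end if;
  end foreach;
  return $keep;
end pruning_rule keep_nouns;
@end example

The returned list has exactly as many elements as the parameter list, so
that each state has a corresponding symbol.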

After analysis has completed, the output-filter can inspect all result
feature structures and create new result feature structures. This can
be used to merge similar feature structures or to drop some results.

@c ----------------------------------------------------------------------------

@node Syntax Summary,  , Files, The Language
@section Summary of the Malaga Syntax
@cindex syntax, Malaga

The syntax of Malaga source texts is defined formally in a variant of
EBNF notation:

@itemize @bullet
@item 
Terminals like @code{assert} and @samp{:=} stand for themselves. 
@item 
Nonterminals like @var{assignment} are defined by @dfn{productions}.
@item
A bar `|' separates alternatives.
@item 
Brackets `[]' enclose optional parts.
@item 
Curly braces `@{@}' enclose parts that are repeated zero or more times.
@item
Parentheses `()' are used for grouping.
@end itemize
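
As an illustration of how to read this notation, consider the production
for @var{break-statement}, which consists of the terminal @code{break},
an optional @var{label}, and the terminal @samp{;}. Both of the
following lines are therefore instances of it. The label name
@code{outer_loop} is hypothetical.

@example
break;
break outer_loop;
@end example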

The start productions for Malaga source texts are
@var{lexicon-file}, @var{rule-file}, and @var{symbol-file}. A
nonterminal marked with @samp{*} in its definition is a lexical
symbol.

@table @var

@item assert-statement:
(@code{assert} | @samp{!}) @var{condition} @samp{;}

@item assignment:
@var{path} (@samp{:=} | @samp{:=+} | @samp{:=-} | @samp{:=*} |
@samp{:=/}) @var{expression} @samp{;} | @samp{<} @var{path} @{@samp{,}
@var{path}@} @samp{>} @samp{:=} @var{expression} @samp{;}

@item break-statement:
@code{break} [@var{label}] @samp{;}

@item choose-statement:
@code{choose} @var{variable} @code{in} @var{expression} @samp{;}

@item comment*:
@samp{#} @{@var{printing-char}@}

@item comparison:
[@code{not}] (@var{expression} [@var{comparison-operator}
@var{expression}] | @var{match-comparison})

@item comparison-operator:
@samp{=} | @samp{/=} | @samp{~} | @samp{/~} | @code{in} | @code{less} |
@code{greater} | @code{less_equal} | @code{greater_equal}

@item condition:
@var{comparison} (@{@code{and} @var{comparison}@} | @{@code{or}
@var{comparison}@})

@item constant*:
@samp{@@} @var{identifier}

@item constant-definition:
(@code{define} | @code{default}) @var{constant} @samp{:=}
@var{constant-expression} @samp{;}

@item constant-expression:
@var{expression}

@item continue-statement:
@code{continue} [@var{label}] @samp{;}

@item define-statement:
@code{define} @var{variable} @samp{:=} @var{expression} @samp{;} |
@code{define} @samp{<} @var{variable} @{@samp{,} @var{variable}@}
@samp{>} @samp{:=} @var{expression} @samp{;}

@item error-statement:
@code{error} @var{expression} @samp{;}

@item expression:
@var{term} @{(@samp{+} | @samp{-}) @var{term}@}

@item factor:
@var{value} @{@samp{.} @var{value}@}

@item foreach-statement:
[@var{label} @samp{:}] @code{foreach} @var{variable} @code{in}
@var{expression} @samp{:} @var{statements} @code{end}
[@code{foreach}] @samp{;}

@item identifier*:
(@var{letter} | @samp{_} | @samp{&}) @{@var{letter} | @var{digit} |
@samp{_} | @samp{&}@}

@item if-statement:
@code{if} @var{condition} @code{then} @var{statements}
@{@code{elseif} @var{condition} @code{then} @var{statements}@}
[@code{else} @var{statements}] @code{end} [@code{if}] @samp{;}

@item if-expression:
@code{if} @var{condition} @code{then} @var{expression}
@{@code{elseif} @var{condition} @code{then} @var{expression}@}
@code{else} @var{expression} @code{end} [@code{if}]

@item include:
@code{include} @var{string} @samp{;}

@item initial:
@code{initial} @var{constant-expression} @samp{,} @var{rule-set} @samp{;}

@item label:
@var{identifier}

@item lexicon-file:
@{@var{constant-definition} | @var{constant-expression} @samp{;}@}

@item list:
@samp{<} [@var{expression} @{@samp{,} @var{expression}@}] @samp{>}

@item match:
@var{constant-expression} [@samp{:} @var{variable}] | @var{variable}
@samp{:} @var{constant-expression}

@item match-comparison:
@var{expression} @code{matches} ( @samp{(} @var{match} @{@samp{,}
@var{match}@} @samp{)} | @var{match} @{@samp{,} @var{match}@} )

@item number*:
@var{digit} @{@var{digit}@} ( @samp{L} | @samp{R} | [@samp{.} @var{digit}
@{@var{digit}@}] [@samp{E} @var{digit} @{@var{digit}@}] )

@item path:
@var{variable} @{@samp{.} @var{value}@}

@item record:
@samp{[} [@var{symbol-value-pair} @{@samp{,}
@var{symbol-value-pair}@}] @samp{]}

@item repeat-statement:
@code{repeat} @var{statements} @code{while} @var{condition} @samp{;}
@var{statements} @code{end} [@code{repeat}] @samp{;}

@item require-statement:
(@code{require} | @samp{?}) @var{condition} @samp{;}

@item result-statement:
@code{result} @var{expression} [@samp{,} (@var{rule-set} |
@code{accept})] @samp{;}

@item return-statement:
@code{return} @var{expression} @samp{;}

@item rule:
@var{rule-type} @var{rule-name} @samp{(} @var{variable} @{@samp{,}
@var{variable}@} @samp{)} @samp{:} @var{statements} @code{end}
[@var{rule-type}] [@var{rule-name}] @samp{;}

@item rule-file:
@{@var{rule} | @var{constant-definition} | @var{initial} |
@var{include}@}

@item rule-name:
@var{identifier}

@item rule-set:
@code{rules} (@var{rules} @{@code{else} @var{rules}@} | @samp{(}
@var{rules} @{@code{else} @var{rules}@} @samp{)})

@item rule-type:
@code{allo_rule} | @code{combi_rule} | @code{end_rule} |
@code{pruning_rule} | @code{robust_rule} | @code{input_filter} |
@code{output_filter} | @code{subrule}

@item rules:
@var{rule-name} @{@samp{,} @var{rule-name}@}

@item select-statement:
@code{select} @var{statements} @{@code{or} @var{statements}@}
@code{end} [@code{select}] @samp{;}

@item statements:
@{@var{assert-statement} | @var{assignment} | @var{break-statement} | 
@var{choose-statement} | @var{continue-statement} | @var{define-statement} | 
@var{error-statement} | @var{foreach-statement} | @var{if-statement} | 
@var{select-statement} | @var{repeat-statement} | @var{require-statement} | 
@var{result-statement} | @var{return-statement} | @var{stop-statement}@}  

@item stop-statement:
@code{stop} @samp{;}

@item string*:
@samp{"} @{@var{char-except-double-quotes} | @samp{\"} | @samp{\\}@} @samp{"}

@item subrule-invocation:
@var{rule-name} @samp{(} @var{expression} @{@samp{,} @var{expression}@}
@samp{)}

@item symbol:
@var{identifier}

@item symbol-definition:
@var{symbol} [@samp{:=} @samp{<} @var{symbol} @{@samp{,} @var{symbol}@}
@samp{>}] @samp{;}

@item symbol-file:
@{@var{symbol-definition} | @var{include}@}

@item symbol-value-pair:
@var{expression} @samp{:} @var{expression}

@item term:
@var{factor} @{(@samp{*} | @samp{/}) @var{factor}@}

@item value:
[@samp{-}] (@var{symbol} | @var{string} | @var{number} | @var{list} |
@var{record} | @var{constant} | @var{subrule-invocation} |
@var{variable} | @samp{(} @var{condition} @samp{)}) | @var{if-expression}

@item variable*:
@samp{$} @var{identifier}
@end table

@c ----------------------------------------------------------------------------

@node Index,  , The Language, Top
@unnumbered Index
@printindex cp

@bye

@c End of file. ===============================================================