@node Message Translation, Searching and Sorting, Locales, Top
@c %MENU% How to make the program speak the user's language
@chapter Message Translation

The program's interface with the user should be designed to ease the user's
task.  One way to ease the user's task is to use messages in whatever
language the user prefers.

Printing messages in different languages can be implemented in different
ways.  One could add all the different languages in the source code and
choose among the variants every time a message has to be printed.  This is
certainly not a good solution since extending the set of languages is
cumbersome (the code must be changed) and the code itself can become
really big with dozens of message sets.

A better solution is to keep the message sets for each language
in separate files which are loaded at runtime depending on the language
selection of the user.

@Theglibc{} provides two different sets of functions to support
message translation.  The problem is that neither of the interfaces is
officially defined by the POSIX standard.  The @code{catgets} family of
functions is defined in the X/Open standard but this is derived from
industry decisions and therefore not necessarily based on reasonable
decisions.

As mentioned above, the message catalog handling provides easy
extendability by using external data files which contain the message
translations.  I.e., these files contain for each of the messages used
in the program a translation for the appropriate language.  So the tasks
of the message handling functions are

@itemize @bullet
@item
locate the external data file with the appropriate translations
@item
load the data and make it possible to address the messages
@item
map a given key to the translated message
@end itemize

The two approaches mainly differ in the implementation of this last
step.  Decisions made in the last step influence the rest of the design.

@menu
* Message catalogs a la X/Open::  The @code{catgets} family of functions.
* The Uniforum approach::         The @code{gettext} family of functions.
@end menu


@node Message catalogs a la X/Open
@section X/Open Message Catalog Handling

The @code{catgets} functions are based on the simple scheme:

@quotation
Associate every message to translate in the source code with a unique
identifier.  To retrieve a message from a catalog file solely the
identifier is used.
@end quotation

This means for the author of the program that s/he will have to make
sure the meaning of the identifier in the program code and in the
message catalogs is always the same.

Before a message can be translated the catalog file must be located.
The user of the program must be able to guide the responsible function
to find whatever catalog the user wants.  This choice is independent of
what the programmer had in mind.

All the types, constants and functions for the @code{catgets} functions
are defined/declared in the @file{nl_types.h} header file.

@menu
* The catgets Functions::      The @code{catgets} function family.
* The message catalog files::  Format of the message catalog files.
* The gencat program::         How to generate message catalog files which
                                can be used by the functions.
* Common Usage::               How to use the @code{catgets} interface.
@end menu


@node The catgets Functions
@subsection The @code{catgets} function family

@deftypefun nl_catd catopen (const char *@var{cat_name}, int @var{flag})
@standards{X/Open, nl_types.h}
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c catopen @mtsenv @ascuheap @acsmem
@c  strchr ok
@c  setlocale(,NULL) ok
@c  getenv @mtsenv
@c  strlen ok
@c  alloca ok
@c  stpcpy ok
@c  malloc @ascuheap @acsmem
@c  __open_catalog @ascuheap @acsmem
@c   strchr ok
@c   open_not_cancel_2 @acsfd
@c   strlen ok
@c   ENOUGH ok
@c    alloca ok
@c    memcpy ok
@c   fxstat64 ok
@c   __set_errno ok
@c   mmap @acsmem
@c   malloc dup @ascuheap @acsmem
@c   read_not_cancel ok
@c   free dup @ascuheap @acsmem
@c   munmap ok
@c   close_not_cancel_no_status ok
@c  free @ascuheap @acsmem
The @code{catopen} function tries to locate the message data file named
@var{cat_name} and loads it when found.  The return value is of an
opaque type and can be used in calls to the other functions to refer to
this loaded catalog.

The return value is @code{(nl_catd) -1} in case the function failed and
no catalog was loaded.  The global variable @var{errno} contains a code
for the error causing the failure.  But even if the function call
succeeded this does not mean that all messages can be translated.
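
The failure check described above can be sketched as follows.  This is
only a minimal illustration: the helper name
@code{open_catalog_or_report} is hypothetical and not part of
@file{nl_types.h}, and the choice of @code{NL_CAT_LOCALE} for the
@var{flag} argument is an assumption made for the example.

```c
#include <errno.h>
#include <nl_types.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of any library): open the catalog NAME
   and report a failure on stderr.  Returns whatever catopen returned,
   which is (nl_catd) -1 when no catalog was loaded.  */
nl_catd
open_catalog_or_report (const char *name)
{
  nl_catd desc = catopen (name, NL_CAT_LOCALE);

  if (desc == (nl_catd) -1)
    /* errno was set by catopen and describes the failure.  */
    fprintf (stderr, "cannot open message catalog %s: %s\n",
             name, strerror (errno));

  return desc;
}
```

A program would typically call such a helper once at startup and keep
the returned descriptor for all later lookups.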

Locating the catalog file must happen in a way which lets the user of
the program influence the decision.  It is up to the user to decide
about the language to use and sometimes it is useful to use alternate
catalog files.  All this can be specified by the user by setting some
environment variables.

The first problem is to find out where all the message catalogs are
stored.  Every program could have its own place to keep all the
different files but usually the catalog files are grouped by languages
and the catalogs for all programs are kept in the same place.

@cindex NLSPATH environment variable
To tell the @code{catopen} function where the catalog for the program
can be found the user can set the environment variable @code{NLSPATH} to
a value which describes her/his choice.  Since this value must be usable
for different languages and locales it cannot be a simple string.
Instead it is a format string (similar to @code{printf}'s).  An example
is

@smallexample
/usr/share/locale/%L/%N:/usr/share/locale/%L/LC_MESSAGES/%N
@end smallexample

First one can see that more than one directory can be specified (with
the usual syntax of separating them by colons).  The next things to
observe are the format elements, @code{%L} and @code{%N} in this case.
The @code{catopen} function knows about several of them and the
replacement for all of them is of course different.

@table @code
@item %N
This format element is substituted with the name of the catalog file.
This is the value of the @var{cat_name} argument given to
@code{catopen}.

@item %L
This format element is substituted with the name of the currently
selected locale for translating messages.  How this is determined is
explained below.

@item %l
(This is the lowercase ell.) This format element is substituted with the
language element of the locale name.  The string describing the selected
locale is expected to have the form
@code{@var{lang}[_@var{terr}[.@var{codeset}]]} and this format uses the
first part @var{lang}.

@item %t
This format element is substituted by the territory part @var{terr} of
the name of the currently selected locale.  See the explanation of the
format above.

@item %c
This format element is substituted by the codeset part @var{codeset} of
the name of the currently selected locale.  See the explanation of the
format above.

@item %%
Since @code{%} is used as a meta character there must be a way to
express the @code{%} character in the result itself.  Using @code{%%}
does this just like it works for @code{printf}.
@end table
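
To make the substitution rules in the table above concrete, here is a
rough sketch of how a single @code{NLSPATH} entry could be expanded.
It is not the code used by @code{catopen}; the function name
@code{expand_nlspath_entry} and its interface are assumptions made for
this illustration, and splitting the value at colons is left out.

```c
#include <stddef.h>
#include <string.h>

/* Append the first LEN bytes of SRC to OUT at *POS without writing past
   SIZE - 1.  *POS always advances so truncation can be detected.  */
static void
append (char *out, size_t size, size_t *pos, const char *src, size_t len)
{
  for (size_t i = 0; i < len; ++i, ++*pos)
    if (*pos + 1 < size)
      out[*pos] = src[i];
}

/* Hypothetical helper: expand one NLSPATH entry TMPL into OUT.  LOCALE
   has the form lang[_terr[.codeset]]; NAME is the catalog name.
   Returns 0 on success and -1 if OUT was too small.  */
int
expand_nlspath_entry (const char *tmpl, const char *locale,
                      const char *name, char *out, size_t size)
{
  /* Split the locale name into its parts.  */
  size_t lang_len = strcspn (locale, "_.");
  const char *terr = locale[lang_len] == '_' ? locale + lang_len + 1 : "";
  size_t terr_len = strcspn (terr, ".");
  const char *dot = strchr (locale, '.');
  const char *codeset = dot != NULL ? dot + 1 : "";
  size_t pos = 0;

  if (size == 0)
    return -1;

  for (const char *p = tmpl; *p != '\0'; ++p)
    {
      if (*p != '%' || p[1] == '\0')
        {
          /* Ordinary character (or a trailing '%'): copy it.  */
          append (out, size, &pos, p, 1);
          continue;
        }
      switch (*++p)
        {
        case 'N': append (out, size, &pos, name, strlen (name)); break;
        case 'L': append (out, size, &pos, locale, strlen (locale)); break;
        case 'l': append (out, size, &pos, locale, lang_len); break;
        case 't': append (out, size, &pos, terr, terr_len); break;
        case 'c': append (out, size, &pos, codeset, strlen (codeset)); break;
        case '%': append (out, size, &pos, "%", 1); break;
        default:
          /* Unknown element: this sketch copies it unchanged.  */
          append (out, size, &pos, p - 1, 2);
          break;
        }
    }

  out[pos < size ? pos : size - 1] = '\0';
  return pos < size ? 0 : -1;
}
```

With the locale name @code{de_DE.UTF-8} and the catalog name
@code{prog}, the entry @code{/usr/share/locale/%L/%N} expands to
@code{/usr/share/locale/de_DE.UTF-8/prog} in this sketch.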


Using @code{NLSPATH} allows arbitrary directories to be searched for
message catalogs while still allowing different languages to be used.
If the @code{NLSPATH} environment variable is not set, the default value
is

@smallexample
@var{prefix}/share/locale/%L/%N:@var{prefix}/share/locale/%L/LC_MESSAGES/%N
@end smallexample

@noindent
where @var{prefix} is given to @code{configure} while installing @theglibc{}
(this value is in many cases @code{/usr} or the empty string).

The remaining problem is to decide which locale must be used.  The locale
name determines the substitution of the format elements mentioned above.
First of all the user can specify a path in the message catalog name
(i.e., the name contains a slash character).  In this situation the
@code{NLSPATH} environment variable is not used.  The catalog must exist
as specified in the program, perhaps relative to the current working
directory.  This situation is not desirable and catalog names should
never be written this way.  Besides this, this behavior is not portable
to all other platforms providing the @code{catgets} interface.

@cindex LC_ALL environment variable
@cindex LC_MESSAGES environment variable
@cindex LANG environment variable
Otherwise the values of environment variables from the standard
environment are examined (@pxref{Standard Environment}).  Which
variables are examined is decided by the @var{flag} parameter of
@code{catopen}.  If the value is @code{NL_CAT_LOCALE} (which is defined
in @file{nl_types.h}) then the @code{catopen} function uses the name of
the locale currently selected for the @code{LC_MESSAGES} category.

If @var{flag} is zero the @code{LANG} environment variable is examined.
This is a left-over from the early days when the concept of locales
had not even reached the level of POSIX locales.

The environment variable and the locale name should have a value of the
form @code{@var{lang}[_@var{terr}[.@var{codeset}]]} as explained above.
If no environment variable is set the @code{"C"} locale is used which
prevents any translation.
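
The selection rules just described can be summarized in a few lines of
code.  The function @code{catalog_locale_name} below is a hypothetical
helper written for this illustration only; it mirrors the description
in the text and is not taken from the implementation of
@code{catopen}.

```c
#include <locale.h>
#include <nl_types.h>
#include <stdlib.h>

/* Hypothetical helper mirroring the rules above: return the locale
   name which would be substituted for %L when catopen is called
   with FLAG.  */
const char *
catalog_locale_name (int flag)
{
  const char *name;

  if (flag == NL_CAT_LOCALE)
    /* Name of the locale currently selected for LC_MESSAGES.  */
    name = setlocale (LC_MESSAGES, NULL);
  else
    /* Historic behavior: only LANG is consulted.  */
    name = getenv ("LANG");

  /* Without any usable value the "C" locale is assumed.  */
  return name != NULL && *name != '\0' ? name : "C";
}
```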

The return value of @code{catgets} is in any case a valid string.  Either
it is a translation from a message catalog or it is the same as the
@var{string} parameter.  So a piece of code to decide whether a
translation actually happened must look like this:

@smallexample
@{
  char *trans = catgets (desc, set, msg, input_string);
  if (trans == input_string)
    @{
      /* Something went wrong.  */
    @}
@}
@end smallexample

@noindent
When an error occurs the global variable @var{errno} is set to

@table @var
@item EBADF
The catalog does not exist.
@item ENOMSG
The set/message tuple does not name an existing element in the
message catalog.
@end table

While it sometimes can be useful to test for errors programs normally
will avoid any test.  If the translation is not available it is no big
problem if the original, untranslated message is printed.  Either the
user understands this as well or s/he will look for the reason why the
messages are not translated.
@end deftypefun

Please note that the currently selected locale does not depend on a call
to the @code{setlocale} function.  It is not necessary that the locale
data files for this locale exist and calling @code{setlocale} succeeds.
The @code{catopen} function directly reads the values of the environment
variables.


@deftypefun {char *} catgets (nl_catd @var{catalog_desc}, int @var{set}, int @var{message}, const char *@var{string})
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The function @code{catgets} has to be used to access the message catalog
previously opened using the @code{catopen} function.  The
@var{catalog_desc} parameter must be a value previously returned by
@code{catopen}.

The next two parameters, @var{set} and @var{message}, reflect the
internal organization of the message catalog files.  This will be
explained in detail below.  For now it is interesting to know that a
catalog can consist of several sets and the messages in each set are
individually numbered.  Neither the set numbers nor the message numbers
need be consecutive.  They can be arbitrarily chosen.
But each message (unless equal to another one) must have its own unique
pair of set and message numbers.

Since it is not guaranteed that the message catalog for the language
selected by the user exists the last parameter @var{string} helps to
handle this case gracefully.  If no matching string can be found
@var{string} is returned.  This means for the programmer that

@itemize @bullet
@item
the @var{string} parameters should contain reasonable text (this also
helps to understand the program since otherwise there would be no hint
on the string which is expected to be returned).
@item
all @var{string} arguments should be written in the same language.
@end itemize
@end deftypefun
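
The fallback behavior of @code{catgets} suggests a small wrapper such
as the one sketched here.  The name @code{translate} is hypothetical.
Checking the descriptor before calling @code{catgets} is a cautious
design choice of this sketch, so that a descriptor from a failed
@code{catopen} call is never passed on.

```c
#include <nl_types.h>

/* Hypothetical wrapper: look up message MSG of set SET in CATALOG and
   fall back to DEFAULT_TEXT.  The descriptor is checked first so that
   the result of a failed catopen call never reaches catgets.  */
const char *
translate (nl_catd catalog, int set, int msg, const char *default_text)
{
  if (catalog == (nl_catd) -1)
    return default_text;

  /* catgets itself returns DEFAULT_TEXT if no translation is found.  */
  return catgets (catalog, set, msg, default_text);
}
```

Because the default text is returned unchanged whenever no translation
is available, the caller can print the result without any further test.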

It is somewhat uncomfortable to write a program using the @code{catgets}
functions if no supporting functionality is available.  Since each
set/message number tuple must be unique the programmer must keep lists
of the messages at the same time the code is written.  And the work
between several people working on the same project must be coordinated.
We will see how some of these problems can be relaxed a bit (@pxref{Common
Usage}).

@deftypefun int catclose (nl_catd @var{catalog_desc})
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}}
@c catclose @ascuheap @acucorrupt @acsmem
@c  __set_errno ok
@c  munmap ok
@c  free @ascuheap @acsmem
The @code{catclose} function can be used to free the resources
associated with a message catalog which previously was opened by a call
to @code{catopen}.  If the resources can be successfully freed the
function returns @code{0}.  Otherwise it returns @code{@minus{}1} and the
global variable @var{errno} is set.  Errors can occur if the catalog
descriptor @var{catalog_desc} is not valid in which case @var{errno} is
set to @code{EBADF}.
@end deftypefun


@node The message catalog files
@subsection Format of the message catalog files

The only reasonable way to translate all the messages of a program and
store the result in a message catalog file which can be read by the
@code{catopen} function is to give all the message texts to the
translator and let her/him translate them all.  I.e., we must have a
file with entries which associate the set/message tuple with a specific
translation.  This file format is specified in the X/Open standard and
is as follows:

@itemize @bullet
@item
Lines containing only whitespace characters or empty lines are ignored.

@item
Lines which contain as the first non-whitespace character a @code{$}
followed by a whitespace character are comments and are also ignored.

@item
If a line contains as the first non-whitespace characters the sequence
@code{$set} followed by a whitespace character an additional argument
is required to follow.  This argument can either be:

@itemize @minus
@item
a number.  In this case the value of this number determines the set
to which the following messages are added.

@item
an identifier consisting of alphanumeric characters plus the underscore
character.  In this case the set is automatically assigned a number.
This value is one added to the largest set number which so far appeared.

How to use the symbolic names is explained in section @ref{Common Usage}.

It is an error if a symbol name appears more than once.  All following
messages are placed in a set with this number.
@end itemize

@item
If a line contains as the first non-whitespace characters the sequence
@code{$delset} followed by a whitespace character an additional argument
is required to follow.  This argument can either be:

@itemize @minus
@item
a number.  In this case the value of this number determines the set
which will be deleted.

@item
an identifier consisting of alphanumeric characters plus the underscore
character.  This symbolic identifier must match a name for a set which
previously was defined.  It is an error if the name is unknown.
@end itemize

In both cases all messages in the specified set will be removed.  They
will not appear in the output.  But if this set is later selected again
with a @code{$set} command, messages can be added again and these
messages will appear in the output.

@item
If a line contains after leading whitespaces the sequence
@code{$quote}, the quoting character used for this input file is
changed to the first non-whitespace character following
@code{$quote}.  If no non-whitespace character is present before the
line ends quoting is disabled.

By default no quoting character is used.  In this mode strings are
terminated with the first unescaped line break.  If there is a
@code{$quote} sequence present newline need not be escaped.  Instead a
string is terminated with the first unescaped appearance of the quote
character.

A common usage of this feature would be to set the quote character to
@code{"}.  Then any appearance of the @code{"} in the strings must
be escaped using the backslash (i.e., @code{\"} must be written).

@item
Any other line must start with a number or an alphanumeric identifier
(with the underscore character included).  The following characters
(starting after the first whitespace character) will form the string
which gets associated with the currently selected set and the message
number represented by the number or identifier, respectively.

If the start of the line is a number the message number is obvious.  It
is an error if the same message number already appeared for this set.

If the leading token was an identifier the message number gets
automatically assigned.  The value is the current maximum message
number for this set plus one.  It is an error if the identifier was
already used for a message in this set.  It is OK to reuse the
identifier for a message in another set.  How to use the symbolic
identifiers will be explained below (@pxref{Common Usage}).  There is
one limitation with the identifier: it must not be @code{Set}.  The
reason will be explained below.

The text of the messages can contain escape characters.  The usual bunch
of characters known from the @w{ISO C} language are recognized
(@code{\n}, @code{\t}, @code{\v}, @code{\b}, @code{\r}, @code{\f},
@code{\\}, and @code{\@var{nnn}}, where @var{nnn} is the octal coding of
a character code).
@end itemize
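
The line-oriented rules listed above can be sketched as a small
classifier.  This is not how @code{gencat} is implemented; the
enumeration and the function names @code{is_directive} and
@code{classify_line} are assumptions made for the illustration, and the
sketch only looks at the beginning of a line (arguments, quoting and
continuation lines are not examined).

```c
#include <ctype.h>
#include <string.h>

enum line_kind
{
  LINE_BLANK, LINE_COMMENT, LINE_SET, LINE_DELSET, LINE_QUOTE,
  LINE_MESSAGE, LINE_INVALID
};

/* Return nonzero if LINE starts with WORD followed by whitespace or
   the end of the line.  */
static int
is_directive (const char *line, const char *word)
{
  size_t len = strlen (word);
  return strncmp (line, word, len) == 0
         && (line[len] == '\0' || isspace ((unsigned char) line[len]));
}

/* Hypothetical helper: classify one line of a message catalog source
   file according to the rules listed above.  */
enum line_kind
classify_line (const char *line)
{
  /* Leading whitespace is not significant.  */
  while (isspace ((unsigned char) *line))
    ++line;

  if (*line == '\0')
    return LINE_BLANK;
  if (is_directive (line, "$set"))
    return LINE_SET;
  if (is_directive (line, "$delset"))
    return LINE_DELSET;
  if (is_directive (line, "$quote"))
    return LINE_QUOTE;
  /* A lone '$' followed by whitespace starts a comment.  */
  if (is_directive (line, "$"))
    return LINE_COMMENT;
  /* Messages start with a number or an identifier (GNU extension).  */
  if (isalnum ((unsigned char) *line) || *line == '_')
    return LINE_MESSAGE;
  return LINE_INVALID;
}
```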

@strong{Important:} The handling of identifiers instead of numbers for
the set and messages is a GNU extension.  Systems strictly following the
X/Open specification do not have this feature.  An example for a message
catalog file is this:

@smallexample
$ This is a leading comment.
$quote "

$set SetOne
1 Message with ID 1.
two "   Message with ID \"two\", which gets the value 2 assigned"

$set SetTwo
$ Since the last set got the number 1 assigned this set has number 2.
4000 "The numbers can be arbitrary, they need not start at one."
@end smallexample

This small example shows various aspects:
@itemize @bullet
@item
Lines 1 and 9 are comments since they start with @code{$} followed by
a whitespace.
@item
The quoting character is set to @code{"}.  Otherwise the quotes in the
message definition would have to be omitted and in this case the
message with the identifier @code{two} would lose its leading whitespace.
@item
Mixing numbered messages with messages having symbolic names is no
problem and the numbering happens automatically.
@end itemize


While this file format is pretty easy it is not the best possible for
use in a running program.  The @code{catopen} function would have to
parse the file and handle syntactic errors gracefully.  This is not so
easy and the whole process is pretty slow.  Therefore the @code{catgets}
functions expect the data in another more compact and ready-to-use file
format.  There is a special program @code{gencat} which is explained in
detail in the next section.

Files in this other format are not human readable.  To be easy to use by
programs it is a binary file.  But the format is byte order independent
so translation files can be shared by systems of arbitrary architecture
(as long as they use @theglibc{}).

Details about the binary file format are not important to know since
these files are always created by the @code{gencat} program.  The
sources of @theglibc{} also provide the sources for the
@code{gencat} program and so the interested reader can look through
these source files to learn about the file format.


@node The gencat program
@subsection Generate Message Catalog Files

@cindex gencat
The @code{gencat} program is specified in the X/Open standard and the
GNU implementation follows this specification and so processes
all correctly formed input files.  Additionally some extensions are
implemented which help to work in a more reasonable way with the
@code{catgets} functions.

The @code{gencat} program can be invoked in two ways:

@example
`gencat [@var{Option} @dots{}] [@var{Output-File} [@var{Input-File} @dots{}]]`
@end example

This is the interface defined in the X/Open standard.  If no
@var{Input-File} parameter is given, input will be read from standard
input.  Multiple input files will be read as if they were concatenated.
If @var{Output-File} is also missing, the output will be written to
standard output.  To match the interface familiar from other programs,
a second interface is provided.

@smallexample
`gencat [@var{Option} @dots{}] -o @var{Output-File} [@var{Input-File} @dots{}]`
@end smallexample

The option @samp{-o} is used to specify the output file and all file
arguments are used as input files.

Besides this one can use @file{-} or @file{/dev/stdin} for
@var{Input-File} to denote the standard input.  Correspondingly one can
use @file{-} and @file{/dev/stdout} for @var{Output-File} to denote
standard output.  Using @file{-} as a file name is allowed in X/Open
while using the device names is a GNU extension.

The @code{gencat} program works by concatenating all input files and
then @strong{merging} the resulting collection of message sets with a
possibly existing output file.  This is done by removing all messages
with set/message number tuples matching any of the generated messages
from the output file and then adding all the new messages.  To
regenerate a catalog file while ignoring the old contents therefore
requires removing the output file if it exists.  If the output is
written to standard output no merging takes place.

@noindent
The following table shows the options understood by the @code{gencat}
program.  The X/Open standard does not specify any options for the
program so all of these are GNU extensions.

@table @samp
@item -V
@itemx --version
Print the version information and exit.
@item -h
@itemx --help
Print a usage message listing all available options, then exit successfully.
@item --new
Do not merge the new messages from the input files with the old content
of the output file.  The old content of the output file is discarded.
@item -H
@itemx --header=name
This option is used to emit the symbolic names given to sets and
messages in the input files for use in the program.  Details about how
to use this are given in the next section.  The @var{name} parameter to
this option specifies the name of the output file.  It will contain a
number of C preprocessor @code{#define}s to associate a name with a
number.

Please note that the generated file only contains the symbols from the
input files.  If the output is merged with the previous content of the
output file the possibly existing symbols from the file(s) which
generated the old output files are not in the generated header file.
@end table
Packit 6c4009
Packit 6c4009
Packit 6c4009
@node Common Usage
Packit 6c4009
@subsection How to use the @code{catgets} interface
Packit 6c4009
Packit 6c4009
The @code{catgets} functions can be used in two different ways.  By
Packit 6c4009
following slavishly the X/Open specs and not relying on the extension
Packit 6c4009
and by using the GNU extensions.  We will take a look at the former
Packit 6c4009
method first to understand the benefits of extensions.
Packit 6c4009
Packit 6c4009
@subsubsection Not using symbolic names
Packit 6c4009
Packit 6c4009
Since the X/Open format of the message catalog files does not allow
Packit 6c4009
symbol names we have to work with numbers all the time.  When we start
Packit 6c4009
writing a program we have to replace all appearances of translatable
Packit 6c4009
strings with something like
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
catgets (catdesc, set, msg, "string")
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
@noindent
Packit 6c4009
@var{catgets} is retrieved from a call to @code{catopen} which is
Packit 6c4009
normally done once at the program start.  The @code{"string"} is the
Packit 6c4009
string we want to translate.  The problems start with the set and
Packit 6c4009
message numbers.
Packit 6c4009
Packit 6c4009
In a bigger program several programmers usually work on the program at
the same time, so coordinating the number allocation is crucial.
Although no two different strings may be indexed by the same tuple of
numbers, it is highly desirable to reuse the numbers for equal strings
with equal translations (please note that there might be strings which
are equal in one language but have different translations due to
different contexts).
Packit 6c4009
Packit 6c4009
The allocation process can be relaxed a bit by using different set
numbers for different parts of the program.  This reduces the number of
developers who have to coordinate the allocation.  But lists that keep
track of the allocation must still be maintained and errors can easily
happen.  These errors cannot be discovered by the compiler or the
@code{catgets} functions.  Only the user of the program might see wrong
messages printed.  In the worst case the messages are so misleading
that they cannot be recognized as wrong.  Think about the translations
for @code{"true"} and @code{"false"} being exchanged.  This could result
in a disaster.
Packit 6c4009
Packit 6c4009
Packit 6c4009
@subsubsection Using symbolic names
Packit 6c4009
Packit 6c4009
The problems mentioned in the last section derive from the fact that:
Packit 6c4009
Packit 6c4009
@enumerate
Packit 6c4009
@item
Packit 6c4009
the numbers are allocated once and due to the possibly frequent use of
Packit 6c4009
them it is difficult to change a number later.
Packit 6c4009
@item
Packit 6c4009
the numbers do not allow guessing anything about the string and
Packit 6c4009
therefore collisions can easily happen.
Packit 6c4009
@end enumerate
Packit 6c4009
Packit 6c4009
By consistently using symbolic names and by providing a method which maps
the string content to a symbolic name (however this is done) one can
prevent both problems above.  The cost of this is that the programmer
has to write a complete message catalog file while writing the
program itself.
Packit 6c4009
Packit 6c4009
This is necessary since the symbolic names must be mapped to numbers
Packit 6c4009
before the program sources can be compiled.  In the last section it was
Packit 6c4009
described how to generate a header containing the mapping of the names.
Packit 6c4009
E.g., for the example message file given in the last section we could
Packit 6c4009
call the @code{gencat} program as follows (assume @file{ex.msg} contains
Packit 6c4009
the sources).
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
gencat -H ex.h -o ex.cat ex.msg
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
@noindent
Packit 6c4009
This generates a header file with the following content:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
#define SetTwoSet 0x2   /* ex.msg:8 */
Packit 6c4009
Packit 6c4009
#define SetOneSet 0x1   /* ex.msg:4 */
Packit 6c4009
#define SetOnetwo 0x2   /* ex.msg:6 */
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
As can be seen the various symbols given in the source file are mangled
to generate unique identifiers and these identifiers get numbers
assigned.  Reading the source file and knowing the rules allows one
to predict the content of the header file (it is deterministic)
but this is not necessary.  The @code{gencat} program can take care of
everything.  All the programmer has to do is to put the generated header
file in the dependency list of the source files of the project and
add a rule to regenerate the header if any of the input files change.
Packit 6c4009
Packit 6c4009
One word about the symbol mangling.  Every symbol consists of two parts:
Packit 6c4009
the name of the message set plus the name of the message or the special
Packit 6c4009
string @code{Set}.  So @code{SetOnetwo} means this macro can be used to
Packit 6c4009
access the translation with identifier @code{two} in the message set
Packit 6c4009
@code{SetOne}.
Packit 6c4009
Packit 6c4009
The other names denote the names of the message sets.  The special
Packit 6c4009
string @code{Set} is used in the place of the message identifier.
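
The naming rule can be illustrated with a few lines of C.  This is only
a sketch of the rule as described here, not the code @code{gencat}
uses; the function name @code{mangle_symbol} and its interface are
assumptions made for this example.

```c
#include <stdio.h>

/* Build the identifier for message MSG of set SET in BUF (of size
   LEN): the set name followed by the message name.  A null MSG asks
   for the identifier of the set itself, which uses the special string
   "Set" in place of the message name.  Returns what snprintf
   returns.  */
int
mangle_symbol (char *buf, size_t len, const char *set, const char *msg)
{
  return snprintf (buf, len, "%s%s", set, msg != NULL ? msg : "Set");
}
```

Called with the set name @code{SetOne} and the message name @code{two}
the function stores @code{SetOnetwo}; with a null pointer as the message
name it stores @code{SetOneSet}.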
Packit 6c4009
Packit 6c4009
If the code uses the second string of the set @code{SetOne}, the C
code should look like this:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
catgets (catdesc, SetOneSet, SetOnetwo,
Packit 6c4009
         "   Message with ID \"two\", which gets the value 2 assigned")
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
Writing the function call this way allows changing the message number
and even the set number without requiring any change in the C source
code.  (The text of the string is normally not the same as the one in
the message catalog; this is only the case in this example.)
Packit 6c4009
Packit 6c4009
Packit 6c4009
@subsubsection How does this allow one to develop
Packit 6c4009
Packit 6c4009
To illustrate the usual way to work with the symbolic names
here is a little example.  Assume we want to write the very complex and
famous greeting program.  We start by writing the code as usual:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
#include <stdio.h>
Packit 6c4009
int
Packit 6c4009
main (void)
Packit 6c4009
@{
Packit 6c4009
  printf ("Hello, world!\n");
Packit 6c4009
  return 0;
Packit 6c4009
@}
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
Now we want to internationalize the message and therefore replace the
Packit 6c4009
message with whatever the user wants.
Packit 6c4009
Packit 6c4009
@smallexample
#include <locale.h>
#include <nl_types.h>
#include <stdio.h>
#include "msgnrs.h"
int
main (void)
@{
  /* NL_CAT_LOCALE selects the catalog by the LC_MESSAGES locale,
     so the locale has to be set from the environment first.  */
  setlocale (LC_ALL, "");
  nl_catd catdesc = catopen ("hello.cat", NL_CAT_LOCALE);
  printf (catgets (catdesc, MainSet, MainHello,
                   "Hello, world!\n"));
  catclose (catdesc);
  return 0;
@}
@end smallexample

We see how the catalog object is opened and the returned descriptor used
in the other function calls.  It is not really necessary to check for
failure of any of the functions since even in these situations the
functions will behave reasonably.  If the catalog cannot be opened or a
message is missing, @code{catgets} simply returns the default string.

What remains unspecified here are the constants @code{MainSet} and
@code{MainHello}.  These are the symbolic names describing the
message.  To get the actual definitions which match the information in
the catalog file we have to create the message catalog source file and
process it using the @code{gencat} program.
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
$ Messages for the famous greeting program.
Packit 6c4009
$quote "
Packit 6c4009
Packit 6c4009
$set Main
Packit 6c4009
Hello "Hallo, Welt!\n"
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
Now we can start building the program (assume the message catalog source
Packit 6c4009
file is named @file{hello.msg} and the program source file @file{hello.c}):
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
% gencat -H msgnrs.h -o hello.cat hello.msg
Packit 6c4009
% cat msgnrs.h
Packit 6c4009
#define MainSet 0x1     /* hello.msg:4 */
Packit 6c4009
#define MainHello 0x1   /* hello.msg:5 */
Packit 6c4009
% gcc -o hello hello.c -I.
Packit 6c4009
% cp hello.cat /usr/share/locale/de/LC_MESSAGES
Packit 6c4009
% echo $LC_ALL
Packit 6c4009
de
Packit 6c4009
% ./hello
Packit 6c4009
Hallo, Welt!
Packit 6c4009
%
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
The call of the @code{gencat} program creates the missing header file
@file{msgnrs.h} as well as the message catalog binary.  The former is
used in the compilation of @file{hello.c} while the latter is placed in a
directory in which the @code{catopen} function will try to locate it.
Please check the @code{LC_ALL} environment variable and the default path
for @code{catopen} presented in the description above.
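
A program that does want to know whether a catalog was found can test
the descriptor returned by @code{catopen} against @code{(nl_catd) -1}.
The following is a minimal sketch of this; the function name
@code{print_message} and its interface are assumptions made for the
illustration, and the caller is assumed to have set the locale already.

```c
#include <nl_types.h>
#include <stdio.h>

/* Print message MSG of set SET from the catalog NAME, falling back to
   FALLBACK.  Returns 1 if the catalog could be opened and 0 if not.
   The caller is assumed to have called setlocale before.  */
int
print_message (const char *name, int set, int msg, const char *fallback)
{
  nl_catd catdesc = catopen (name, NL_CAT_LOCALE);
  int found = catdesc != (nl_catd) -1;

  /* catgets copes with the descriptor of a failed catopen and then
     returns FALLBACK.  */
  fputs (catgets (catdesc, set, msg, fallback), stdout);
  if (found)
    catclose (catdesc);
  return found;
}
```

The return value can be used, e.g., to print a single diagnostic about
a missing catalog instead of failing silently.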
Packit 6c4009
Packit 6c4009
Packit 6c4009
@node The Uniforum approach
Packit 6c4009
@section The Uniforum approach to Message Translation
Packit 6c4009
Packit 6c4009
Sun Microsystems tried to standardize a different approach to message
Packit 6c4009
translation in the Uniforum group.  There never was a real standard
Packit 6c4009
defined but still the interface was used in Sun's operating systems.
Packit 6c4009
Since this approach fits better in the development process of free
Packit 6c4009
software it is also used throughout the GNU project and the GNU
Packit 6c4009
@file{gettext} package provides support for this outside @theglibc{}.
Packit 6c4009
Packit 6c4009
The code of the @file{libintl} from GNU @file{gettext} is the same as
Packit 6c4009
the code in @theglibc{}.  So the documentation in the GNU
Packit 6c4009
@file{gettext} manual is also valid for the functionality here.  The
Packit 6c4009
following text will describe the library functions in detail.  But the
Packit 6c4009
numerous helper programs are not described in this manual.  Instead
Packit 6c4009
people should read the GNU @file{gettext} manual
Packit 6c4009
(@pxref{Top,,GNU gettext utilities,gettext,Native Language Support Library and Tools}).
Packit 6c4009
We will only give a short overview.
Packit 6c4009
Packit 6c4009
Though the @code{catgets} functions are available by default on more
Packit 6c4009
systems the @code{gettext} interface is at least as portable as the
Packit 6c4009
former.  The GNU @file{gettext} package can be used wherever the
Packit 6c4009
functions are not available.
Packit 6c4009
Packit 6c4009
Packit 6c4009
@menu
Packit 6c4009
* Message catalogs with gettext::  The @code{gettext} family of functions.
Packit 6c4009
* Helper programs for gettext::    Programs to handle message catalogs
Packit 6c4009
                                    for @code{gettext}.
Packit 6c4009
@end menu
Packit 6c4009
Packit 6c4009
Packit 6c4009
@node Message catalogs with gettext
Packit 6c4009
@subsection The @code{gettext} family of functions
Packit 6c4009
Packit 6c4009
Although the paradigm underlying the @code{gettext} approach to message
translation is different from that of the @code{catgets} functions, the
basic functionality is equivalent.  There are functions of the following
categories:
Packit 6c4009
Packit 6c4009
@menu
Packit 6c4009
* Translation with gettext::       What has to be done to translate a message.
Packit 6c4009
* Locating gettext catalog::       How to determine which catalog to be used.
Packit 6c4009
* Advanced gettext functions::     Additional functions for more complicated
Packit 6c4009
                                    situations.
Packit 6c4009
* Charset conversion in gettext::  How to specify the output character set
Packit 6c4009
                                    @code{gettext} uses.
Packit 6c4009
* GUI program problems::           How to use @code{gettext} in GUI programs.
Packit 6c4009
* Using gettextized software::     The possibilities of the user to influence
Packit 6c4009
                                    the way @code{gettext} works.
Packit 6c4009
@end menu
Packit 6c4009
Packit 6c4009
@node Translation with gettext
Packit 6c4009
@subsubsection What has to be done to translate a message?
Packit 6c4009
Packit 6c4009
The @code{gettext} functions have a very simple interface.  The most
Packit 6c4009
basic function just takes the string which shall be translated as the
Packit 6c4009
argument and it returns the translation.  This is fundamentally
Packit 6c4009
different from the @code{catgets} approach where an extra key is
Packit 6c4009
necessary and the original string is only used for the error case.
Packit 6c4009
Packit 6c4009
If the string which has to be translated is the only argument this of
Packit 6c4009
course means the string itself is the key.  I.e., the translation will
Packit 6c4009
be selected based on the original string.  The message catalogs must
Packit 6c4009
therefore contain the original strings plus one translation for any such
Packit 6c4009
string.  The task of the @code{gettext} function is to compare the
Packit 6c4009
argument string with the available strings in the catalog and return the
Packit 6c4009
appropriate translation.  Of course this lookup is optimized so that
it is not more expensive than an access using an atomic key
like in @code{catgets}.
Packit 6c4009
Packit 6c4009
The @code{gettext} approach has some advantages but also some
Packit 6c4009
disadvantages.  Please see the GNU @file{gettext} manual for a detailed
Packit 6c4009
discussion of the pros and cons.
Packit 6c4009
Packit 6c4009
All the definitions and declarations for @code{gettext} can be found in
Packit 6c4009
the @file{libintl.h} header file.  On systems where these functions are
Packit 6c4009
not part of the C library they can be found in a separate library named
Packit 6c4009
@file{libintl.a} (or accordingly different for shared libraries).
Packit 6c4009
Packit 6c4009
@deftypefun {char *} gettext (const char *@var{msgid})
Packit 6c4009
@standards{GNU, libintl.h}
Packit 6c4009
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
Packit 6c4009
@c Wrapper for dcgettext.
Packit 6c4009
The @code{gettext} function searches the currently selected message
Packit 6c4009
catalogs for a string which is equal to @var{msgid}.  If there is such a
Packit 6c4009
string available it is returned.  Otherwise the argument string
Packit 6c4009
@var{msgid} is returned.
Packit 6c4009
Packit 6c4009
Please note that although the return value is @code{char *} the
Packit 6c4009
returned string must not be changed.  This broken type results from the
Packit 6c4009
history of the function and does not reflect the way the function should
Packit 6c4009
be used.
Packit 6c4009
Packit 6c4009
Please note that above we wrote ``message catalogs'' (plural).  This is
Packit 6c4009
a specialty of the GNU implementation of these functions and we will
Packit 6c4009
say more about this when we talk about the ways message catalogs are
Packit 6c4009
selected (@pxref{Locating gettext catalog}).
Packit 6c4009
Packit 6c4009
The @code{gettext} function does not modify the value of the global
Packit 6c4009
@var{errno} variable.  This is necessary to make it possible to write
Packit 6c4009
something like
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
  printf (gettext ("Operation failed: %m\n"));
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
Here the @var{errno} value is used by the @code{printf} function while
processing the @code{%m} format element.  If the @code{gettext}
function changed this value (it is called before @code{printf} is
called) we would get a wrong message.
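
This guarantee can be observed with a small helper.  The sketch below
is only an illustration; the function name
@code{errno_survives_gettext} is an assumption made for this example.

```c
#include <errno.h>
#include <libintl.h>

/* Set errno to VALUE, call gettext, and report whether errno still
   has the same value afterwards.  */
int
errno_survives_gettext (int value)
{
  errno = value;
  (void) gettext ("Operation failed");
  return errno == value;
}
```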
Packit 6c4009
Packit 6c4009
So there is no easy way to detect a missing message catalog besides
comparing the argument string with the result.  But it is normally the
task of the user to react to missing catalogs.  The program cannot guess
when a message catalog is really necessary since for a user who speaks
the language the program was developed in, the message does not need any
translation.
Packit 6c4009
@end deftypefun
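
The comparison of the argument with the result mentioned in the
description of @code{gettext} could be sketched as follows.  The
function name @code{is_translated} is an assumption made for this
example.

```c
#include <libintl.h>
#include <string.h>

/* Return 1 if gettext delivers a string different from MSGID and 0
   otherwise.  A result of 0 does not prove that the catalog is
   missing: the translation may be identical to the original.  */
int
is_translated (const char *msgid)
{
  return strcmp (gettext (msgid), msgid) != 0;
}
```

Because of the ambiguity named in the comment, such a test is at best a
hint and should not be used to produce error messages.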
Packit 6c4009
Packit 6c4009
The remaining two functions to access the message catalog add some
Packit 6c4009
functionality to select a message catalog which is not the default one.
Packit 6c4009
This is important if parts of the program are developed independently.
Packit 6c4009
Every part can have its own message catalog and all of them can be used
Packit 6c4009
at the same time.  The C library itself is an example: internally it
Packit 6c4009
uses the @code{gettext} functions but since it must not depend on a
Packit 6c4009
currently selected default message catalog it must specify all ambiguous
Packit 6c4009
information.
Packit 6c4009
Packit 6c4009
@deftypefun {char *} dgettext (const char *@var{domainname}, const char *@var{msgid})
Packit 6c4009
@standards{GNU, libintl.h}
Packit 6c4009
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
Packit 6c4009
@c Wrapper for dcgettext.
Packit 6c4009
The @code{dgettext} function acts just like the @code{gettext}
Packit 6c4009
function.  It only takes an additional first argument @var{domainname}
Packit 6c4009
which guides the selection of the message catalogs which are searched
Packit 6c4009
for the translation.  If the @var{domainname} parameter is the null
Packit 6c4009
pointer the @code{dgettext} function is exactly equivalent to
Packit 6c4009
@code{gettext} since the default value for the domain name is used.
Packit 6c4009
Packit 6c4009
As for @code{gettext} the return value type is @code{char *} which is an
Packit 6c4009
anachronism.  The returned string must never be modified.
Packit 6c4009
@end deftypefun
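
A library would typically wrap its messages with @code{dgettext} so
that it never depends on the default domain chosen by the program.
The following is a minimal sketch; the domain name @code{libfoo} and
the function @code{libfoo_status_text} are hypothetical.

```c
#include <libintl.h>

/* Hypothetical domain name of the library's own message catalog.  */
#define LIBFOO_DOMAIN "libfoo"

/* Return a description of the hypothetical status CODE, translated if
   a catalog for the domain is available.  The result must not be
   modified by the caller.  */
const char *
libfoo_status_text (int code)
{
  const char *msgid = code == 0 ? "Success" : "Unknown failure";
  return dgettext (LIBFOO_DOMAIN, msgid);
}
```

If no catalog for the domain can be found the original strings are
returned, just as with @code{gettext}.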
Packit 6c4009
Packit 6c4009
@deftypefun {char *} dcgettext (const char *@var{domainname}, const char *@var{msgid}, int @var{category})
Packit 6c4009
@standards{GNU, libintl.h}
Packit 6c4009
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
Packit 6c4009
@c dcgettext @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
Packit 6c4009
@c  dcigettext @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
Packit 6c4009
@c   libc_rwlock_rdlock @asulock @aculock
Packit 6c4009
@c   current_locale_name ok [protected from @mtslocale]
Packit 6c4009
@c   tfind ok
Packit 6c4009
@c   libc_rwlock_unlock ok
Packit 6c4009
@c   plural_lookup ok
Packit 6c4009
@c    plural_eval ok
Packit 6c4009
@c    rawmemchr ok
Packit 6c4009
@c   DETERMINE_SECURE ok, nothing
Packit 6c4009
@c   strcmp ok
Packit 6c4009
@c   strlen ok
Packit 6c4009
@c   getcwd @ascuheap @acsmem @acsfd
Packit 6c4009
@c   strchr ok
Packit 6c4009
@c   stpcpy ok
Packit 6c4009
@c   category_to_name ok
Packit 6c4009
@c   guess_category_value @mtsenv
Packit 6c4009
@c    getenv @mtsenv
Packit 6c4009
@c    current_locale_name dup ok [protected from @mtslocale by dcigettext]
Packit 6c4009
@c    strcmp ok
Packit 6c4009
@c   ENABLE_SECURE ok
Packit 6c4009
@c   _nl_find_domain @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
Packit 6c4009
@c    libc_rwlock_rdlock dup @asulock @aculock
Packit 6c4009
@c    _nl_make_l10nflist dup @ascuheap @acsmem
Packit 6c4009
@c    libc_rwlock_unlock dup ok
Packit 6c4009
@c    _nl_load_domain @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
Packit 6c4009
@c     libc_lock_lock_recursive @aculock
Packit 6c4009
@c     libc_lock_unlock_recursive @aculock
Packit 6c4009
@c     open->open_not_cancel_2 @acsfd
Packit 6c4009
@c     fstat ok
Packit 6c4009
@c     mmap dup @acsmem
Packit 6c4009
@c     close->close_not_cancel_no_status @acsfd
Packit 6c4009
@c     malloc dup @ascuheap @acsmem
Packit 6c4009
@c     read->read_not_cancel ok
Packit 6c4009
@c     munmap dup @acsmem
Packit 6c4009
@c     W dup ok
Packit 6c4009
@c     strlen dup ok
Packit 6c4009
@c     get_sysdep_segment_value ok
Packit 6c4009
@c     memcpy dup ok
Packit 6c4009
@c     hash_string dup ok
Packit 6c4009
@c     free dup @ascuheap @acsmem
Packit 6c4009
@c     libc_rwlock_init ok
Packit 6c4009
@c     _nl_find_msg dup @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
Packit 6c4009
@c     libc_rwlock_fini ok
Packit 6c4009
@c     EXTRACT_PLURAL_EXPRESSION @ascuheap @acsmem
Packit 6c4009
@c      strstr dup ok
Packit 6c4009
@c      isspace ok
Packit 6c4009
@c      strtoul ok
Packit 6c4009
@c      PLURAL_PARSE @ascuheap @acsmem
Packit 6c4009
@c       malloc dup @ascuheap @acsmem
Packit 6c4009
@c       free dup @ascuheap @acsmem
Packit 6c4009
@c      INIT_GERMANIC_PLURAL ok, nothing
Packit 6c4009
@c        the pre-C99 variant is @acucorrupt [protected from @mtuinit by dcigettext]
Packit 6c4009
@c    _nl_expand_alias dup @ascuheap @asulock @acsmem @acsfd @aculock
Packit 6c4009
@c    _nl_explode_name dup @ascuheap @acsmem
Packit 6c4009
@c    libc_rwlock_wrlock dup @asulock @aculock
Packit 6c4009
@c    free dup @asulock @aculock @acsfd @acsmem
Packit 6c4009
@c   _nl_find_msg @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
Packit 6c4009
@c    _nl_load_domain dup @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
Packit 6c4009
@c    strlen ok
Packit 6c4009
@c    hash_string ok
Packit 6c4009
@c    W ok
Packit 6c4009
@c     SWAP ok
Packit 6c4009
@c      bswap_32 ok
Packit 6c4009
@c    strcmp ok
Packit 6c4009
@c    get_output_charset @mtsenv @ascuheap @acsmem
Packit 6c4009
@c     getenv dup @mtsenv
Packit 6c4009
@c     strlen dup ok
Packit 6c4009
@c     malloc dup @ascuheap @acsmem
Packit 6c4009
@c     memcpy dup ok
Packit 6c4009
@c    libc_rwlock_rdlock dup @asulock @aculock
Packit 6c4009
@c    libc_rwlock_unlock dup ok
Packit 6c4009
@c    libc_rwlock_wrlock dup @asulock @aculock
Packit 6c4009
@c    realloc @ascuheap @acsmem
Packit 6c4009
@c    strdup @ascuheap @acsmem
Packit 6c4009
@c    strstr ok
Packit 6c4009
@c    strcspn ok
Packit 6c4009
@c    mempcpy dup ok
Packit 6c4009
@c    norm_add_slashes dup ok
Packit 6c4009
@c    gconv_open @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
Packit 6c4009
@c     [protected from @mtslocale by dcigettext locale lock]
Packit 6c4009
@c    free dup @ascuheap @acsmem
Packit 6c4009
@c    libc_lock_lock @asulock @aculock
Packit 6c4009
@c    calloc @ascuheap @acsmem
Packit 6c4009
@c    gconv dup @acucorrupt [protected from @mtsrace and @asucorrupt by lock]
Packit 6c4009
@c    libc_lock_unlock ok
Packit 6c4009
@c   malloc @ascuheap @acsmem
Packit 6c4009
@c   mempcpy ok
Packit 6c4009
@c   memcpy ok
Packit 6c4009
@c   strcpy ok
Packit 6c4009
@c   libc_rwlock_wrlock @asulock @aculock
Packit 6c4009
@c   tsearch @ascuheap @acucorrupt @acsmem [protected from @mtsrace and @asucorrupt]
Packit 6c4009
@c    transcmp ok
Packit 6c4009
@c     strmp dup ok
Packit 6c4009
@c   free @ascuheap @acsmem
Packit 6c4009
The @code{dcgettext} function adds another argument to those which
@code{dgettext} takes.  This argument @var{category} specifies the last
piece of information needed to locate the message catalog.  I.e., the
domain name and the locale category exactly specify which message
catalog has to be used (relative to a given directory, see below).
Packit 6c4009
Packit 6c4009
The @code{dgettext} function can be expressed in terms of
Packit 6c4009
@code{dcgettext} by using
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
dcgettext (domain, string, LC_MESSAGES)
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
@noindent
Packit 6c4009
instead of
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
dgettext (domain, string)
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
This also shows which values are expected for the third parameter.  One
has to use the category selectors defined in
@file{locale.h}.  Normally the available values are @code{LC_CTYPE},
@code{LC_COLLATE}, @code{LC_MESSAGES}, @code{LC_MONETARY},
@code{LC_NUMERIC}, and @code{LC_TIME}.  Please note that @code{LC_ALL}
must not be used and, even though the names might suggest this, there is
no relation to the environment variables of these names.
Packit 6c4009
Packit 6c4009
The @code{dcgettext} function is only implemented for compatibility with
other systems which have @code{gettext} functions.  There is not really
any situation where it is necessary (or useful) to use a different value
than @code{LC_MESSAGES} for the @var{category} parameter.  We are
dealing with messages here and any other choice can only be confusing.
Packit 6c4009
Packit 6c4009
As for @code{gettext} the return value type is @code{char *} which is an
Packit 6c4009
anachronism.  The returned string must never be modified.
Packit 6c4009
@end deftypefun
Packit 6c4009
Packit 6c4009
When using the three functions above in a program it is a frequent case
that the @var{msgid} argument is a constant string.  So it is worthwhile to
optimize this case.  A moment's thought shows that
as long as no new message catalog is loaded the translation of a message
will not change.  This optimization is actually implemented by the
@code{gettext}, @code{dgettext} and @code{dcgettext} functions.
Packit 6c4009
Packit 6c4009
Packit 6c4009
@node Locating gettext catalog
Packit 6c4009
@subsubsection How to determine which catalog to be used
Packit 6c4009
Packit 6c4009
The functions to retrieve the translations for a given message have a
remarkably simple interface.  But to still give the user of the program
the opportunity to select exactly the translation s/he wants, and
to give the programmer the possibility to influence the way the
catalog files are located, there is a quite complicated
underlying mechanism which controls all this.  The code is complicated
but the use is easy.
Packit 6c4009
Packit 6c4009
Basically we have two different tasks to perform which can also be
Packit 6c4009
performed by the @code{catgets} functions:
Packit 6c4009
Packit 6c4009
@enumerate
Packit 6c4009
@item
Packit 6c4009
Locate the set of message catalogs.  There are a number of files for
Packit 6c4009
different languages which all belong to the package.  Usually they
Packit 6c4009
are all stored in the filesystem below a certain directory.
Packit 6c4009
Packit 6c4009
There can be arbitrarily many packages installed and they can follow
Packit 6c4009
different guidelines for the placement of their files.
Packit 6c4009
Packit 6c4009
@item
Packit 6c4009
Relative to the location specified by the package the actual translation
Packit 6c4009
files must be searched, based on the wishes of the user.  I.e., for each
Packit 6c4009
language the user selects the program should be able to locate the
Packit 6c4009
appropriate file.
Packit 6c4009
@end enumerate
Packit 6c4009
Packit 6c4009
This is the functionality required by the specifications for
Packit 6c4009
@code{gettext} and this is also what the @code{catgets} functions are
Packit 6c4009
able to do.  But there are some problems unresolved:
Packit 6c4009
Packit 6c4009
@itemize @bullet
Packit 6c4009
@item
Packit 6c4009
The language to be used can be specified in several different ways.
Packit 6c4009
There is no generally accepted standard for this and the user always
Packit 6c4009
expects the program to understand what s/he means.  E.g., to select the
Packit 6c4009
German translation one could write @code{de}, @code{german}, or
Packit 6c4009
@code{deutsch} and the program should always react the same.
Packit 6c4009
Packit 6c4009
@item
Packit 6c4009
Sometimes the specification of the user is too detailed.  If s/he, e.g.,
Packit 6c4009
specifies @code{de_DE.ISO-8859-1} which means German, spoken in Germany,
Packit 6c4009
coded using the @w{ISO 8859-1} character set there is the possibility
Packit 6c4009
that a message catalog matching this exactly is not available.  But
Packit 6c4009
there could be a catalog matching @code{de} and if the character set
Packit 6c4009
used on the machine is always @w{ISO 8859-1} there is no reason why this
Packit 6c4009
latter message catalog should not be used.  (We call this @dfn{message
Packit 6c4009
inheritance}.)
Packit 6c4009
Packit 6c4009
@item
Packit 6c4009
If a catalog for a wanted language is not available it is not always the
Packit 6c4009
second best choice to fall back on the language of the developer and
Packit 6c4009
simply not translate any message.  Instead a user might be better able
Packit 6c4009
to read the messages in another language and so the user of the program
Packit 6c4009
should be able to define a precedence order of languages.
Packit 6c4009
@end itemize
Packit 6c4009
Packit 6c4009
We can divide the configuration actions into two parts: one is
performed by the programmer, the other by the user.  We will start with
the functions the programmer can use since the user configuration will
be based on this.
Packit 6c4009
Packit 6c4009
As the descriptions of the functions in the previous sections already
mention, separate sets of messages can be selected by a @dfn{domain
name}.  This is a simple string which should be unique for each program
part that uses a separate domain.  It is possible to use arbitrarily
many domains in one program at the same time.  E.g., @theglibc{} itself
uses a domain named @code{libc} while the program using the C Library
could use a domain named @code{foo}.  The important point is that at any
time exactly one domain is active.  This is controlled with the
following function.
Packit 6c4009
Packit 6c4009
@deftypefun {char *} textdomain (const char *@var{domainname})
Packit 6c4009
@standards{GNU, libintl.h}
Packit 6c4009
@safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}}
Packit 6c4009
@c textdomain @asulock @ascuheap @aculock @acsmem
Packit 6c4009
@c  libc_rwlock_wrlock @asulock @aculock
Packit 6c4009
@c  strcmp ok
Packit 6c4009
@c  strdup @ascuheap @acsmem
Packit 6c4009
@c  free @ascuheap @acsmem
Packit 6c4009
@c  libc_rwlock_unlock ok
Packit 6c4009
The @code{textdomain} function sets the default domain, which is used in
Packit 6c4009
all future @code{gettext} calls, to @var{domainname}.  Please note that
Packit 6c4009
@code{dgettext} and @code{dcgettext} calls are not influenced if the
Packit 6c4009
@var{domainname} parameter of these functions is not the null pointer.
Packit 6c4009
Packit 6c4009
Before the first call to @code{textdomain} the default domain is
Packit 6c4009
@code{messages}.  This is the name specified in the specification of
Packit 6c4009
the @code{gettext} API.  This name is as good as any other name.  No
Packit 6c4009
program should ever really use a domain with this name since this can
Packit 6c4009
only lead to problems.
Packit 6c4009
Packit 6c4009
The function returns the value which is from now on taken as the default
Packit 6c4009
domain.  If the system ran out of memory the returned value is
Packit 6c4009
@code{NULL} and the global variable @var{errno} is set to @code{ENOMEM}.
Packit 6c4009
Despite the return value type being @code{char *} the return string must
Packit 6c4009
not be changed.  It is allocated internally by the @code{textdomain}
Packit 6c4009
function.
Packit 6c4009
Packit 6c4009
If the @var{domainname} parameter is the null pointer no new default
Packit 6c4009
domain is set.  Instead the currently selected default domain is
Packit 6c4009
returned.
Packit 6c4009
Packit 6c4009
If the @var{domainname} parameter is the empty string the default domain
Packit 6c4009
is reset to its initial value, the domain with the name @code{messages}.
Packit 6c4009
Using this possibility is questionable since the domain @code{messages}
should never really be used.
Packit 6c4009
@end deftypefun
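
The three uses of @code{textdomain} described above (setting, querying
and resetting the default domain) can be sketched with two small
helpers.  The function names @code{select_domain} and
@code{current_domain} are assumptions made for this example.

```c
#include <libintl.h>
#include <stddef.h>

/* Make DOMAIN the default domain; the empty string resets it to the
   initial value.  Returns 0 on success and -1 if textdomain failed,
   in which case errno is ENOMEM.  */
int
select_domain (const char *domain)
{
  return textdomain (domain) != NULL ? 0 : -1;
}

/* Return the name of the current default domain without changing it.
   The returned string must not be modified.  */
const char *
current_domain (void)
{
  return textdomain (NULL);
}
```

Before any call to @code{select_domain} the second helper returns
@code{messages}; after @code{select_domain ("foo")} it returns
@code{foo}.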
Packit 6c4009
Packit 6c4009
@deftypefun {char *} bindtextdomain (const char *@var{domainname}, const char *@var{dirname})
Packit 6c4009
@standards{GNU, libintl.h}
Packit 6c4009
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
Packit 6c4009
@c bindtextdomain @ascuheap @acsmem
Packit 6c4009
@c  set_binding_values @ascuheap @acsmem
Packit 6c4009
@c   libc_rwlock_wrlock dup @asulock @aculock
Packit 6c4009
@c   strcmp dup ok
Packit 6c4009
@c   strdup dup @ascuheap @acsmem
Packit 6c4009
@c   free dup @ascuheap @acsmem
Packit 6c4009
@c   malloc dup @ascuheap @acsmem
Packit 6c4009
The @code{bindtextdomain} function can be used to specify the directory
which contains the message catalogs for domain @var{domainname} for the
different languages.  More precisely, this is the directory where the
hierarchy of directories is expected.  Details are explained below.
Packit 6c4009
Packit 6c4009
For the programmer it is important to note that the translations which
come with the program have to be placed in a directory hierarchy starting
at, say, @file{/foo/bar}.  Then the program should make a
@code{bindtextdomain} call to bind the domain for the current program to
this directory.  This ensures that the catalogs are found.  A correctly
running program does not depend on the user setting an environment
variable.
Packit 6c4009
Packit 6c4009
The @code{bindtextdomain} function can be used several times and if the
Packit 6c4009
@var{domainname} argument is different the previously bound domains
Packit 6c4009
will not be overwritten.
Packit 6c4009
Packit 6c4009
If a program which uses @code{bindtextdomain} at some point calls the
@code{chdir} function to change the current working directory, it is
important that the @var{dirname} string is an absolute pathname.
Otherwise the directory it refers to might vary over time.
Packit 6c4009
Packit 6c4009
If the @var{dirname} parameter is the null pointer @code{bindtextdomain}
Packit 6c4009
returns the currently selected directory for the domain with the name
Packit 6c4009
@var{domainname}.
Packit 6c4009
Packit 6c4009
The @code{bindtextdomain} function returns a pointer to a string
containing the name of the selected directory.  The string is
allocated internally in the function and must not be changed by the
user.  If the system runs out of memory during the execution of
@code{bindtextdomain} the return value is @code{NULL} and the global
variable @var{errno} is set accordingly.
Packit 6c4009
@end deftypefun
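A typical start-up sequence combining @code{bindtextdomain} and
@code{textdomain} might look like the following sketch.  The helper name
@code{init_translations} is hypothetical, and the domain name and
directory passed to it are up to the caller.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: bind DOMAIN to the catalog hierarchy rooted at
   DIR and make DOMAIN the default domain.  DIR should be an absolute
   pathname.  Returns 0 on success, -1 on error.  */
int
init_translations (const char *domain, const char *dir)
{
  if (bindtextdomain (domain, dir) == NULL)
    {
      perror ("bindtextdomain");
      return -1;
    }
  if (textdomain (domain) == NULL)
    {
      perror ("textdomain");
      return -1;
    }
  return 0;
}
```

After a call such as @code{init_translations ("myprog", "/foo/bar")}, a
later @code{bindtextdomain ("myprog", NULL)} call can be used to query
the directory bound to the domain.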
Packit 6c4009
Packit 6c4009
Packit 6c4009
@node Advanced gettext functions
Packit 6c4009
@subsubsection Additional functions for more complicated situations
Packit 6c4009
Packit 6c4009
The functions of the @code{gettext} family described so far (and all the
Packit 6c4009
@code{catgets} functions as well) have one problem in the real world
Packit 6c4009
which has been neglected completely in all existing approaches.  What
Packit 6c4009
is meant here is the handling of plural forms.
Packit 6c4009
Packit 6c4009
Looking through Unix source code before the time anybody thought about
Packit 6c4009
internationalization (and, sadly, even afterwards) one can often find
Packit 6c4009
code similar to the following:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
   printf ("%d file%s deleted", n, n == 1 ? "" : "s");
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
@noindent
Packit 6c4009
After the first complaints from people internationalizing the code,
programmers either completely avoided formulations like this or used
strings like @code{"file(s)"}.  Both look unnatural and should be
avoided.  The first attempts to solve the problem correctly looked like
this:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
   if (n == 1)
Packit 6c4009
     printf ("%d file deleted", n);
Packit 6c4009
   else
Packit 6c4009
     printf ("%d files deleted", n);
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
But this does not solve the problem.  It helps languages where the
plural form of a noun is not simply constructed by adding an `s' but
that is all.  Once again people fell into the trap of believing the
rules their language uses are universal.  But the handling of plural
forms differs widely between the language families.  There are two
things which can differ between (and even inside) language families:
Packit 6c4009
Packit 6c4009
@itemize @bullet
Packit 6c4009
@item
Packit 6c4009
The way plural forms are built differs.  This is a problem with
languages which have many irregularities.  German, for instance, is a
drastic case.  Though English and German are part of the same language
family (Germanic), the almost regular forming of plural noun forms
(appending an `s') is hardly found in German.
Packit 6c4009
Packit 6c4009
@item
Packit 6c4009
The number of plural forms differs.  This is somewhat surprising for
those who only have experience with Romance and Germanic languages
since here the number is the same (there are two).

But other language families have only one form or many forms.  More
information on this is given in a separate section below.
Packit 6c4009
@end itemize
Packit 6c4009
Packit 6c4009
The consequence of this is that application writers should not try to
Packit 6c4009
solve the problem in their code.  This would be localization since it is
Packit 6c4009
only usable for certain, hardcoded language environments.  Instead the
Packit 6c4009
extended @code{gettext} interface should be used.
Packit 6c4009
Packit 6c4009
These extra functions take, instead of the one key string, two
strings and a numerical argument.  The idea behind this is that using
the numerical argument and the first string as a key, the implementation
can select the right plural form using rules specified by the
translator.  The two string arguments are then used to provide a return
value in case no message catalog is found (similar to the normal
@code{gettext} behavior).  In this case the rules for Germanic languages
are used and it is assumed that the first string argument is the singular
form, the second the plural form.
Packit 6c4009
Packit 6c4009
This has the consequence that programs without language catalogs can
display the correct strings only if the program itself is written using
a Germanic language.  This is a limitation but since @theglibc{}
(as well as the GNU @code{gettext} package) is written as part of the
GNU project and the coding standards for the GNU project require programs
to be written in English, this solution nevertheless fulfills its
purpose.
Packit 6c4009
Packit 6c4009
@deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
Packit 6c4009
@standards{GNU, libintl.h}
Packit 6c4009
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
Packit 6c4009
@c Wrapper for dcngettext.
Packit 6c4009
The @code{ngettext} function is similar to the @code{gettext} function
as it finds the message catalogs in the same way.  But it takes two
extra arguments.  The @var{msgid1} parameter must contain the singular
form of the string to be translated.  It is also used as the key for the
search in the catalog.  The @var{msgid2} parameter is the plural form.
The parameter @var{n} is used to determine the plural form.  If no
message catalog is found @var{msgid1} is returned if @code{n == 1},
otherwise @var{msgid2}.
Packit 6c4009
Packit 6c4009
An example for the use of this function is:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
  printf (ngettext ("%d file removed", "%d files removed", n), n);
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
Please note that the numeric value @var{n} has to be passed to the
Packit 6c4009
@code{printf} function as well.  It is not sufficient to pass it only to
Packit 6c4009
@code{ngettext}.
Packit 6c4009
@end deftypefun
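The fallback behavior when no catalog is found can be observed with a
small sketch.  The helper name @code{format_removed} is hypothetical;
the format strings use @code{%lu} here because @var{n} is declared as
@code{unsigned long int} in this example.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: format a "files removed" message for N into BUF.
   N is passed twice, once to ngettext to select the form and once to
   snprintf to be printed.  Returns the value of snprintf.  */
int
format_removed (char *buf, size_t size, unsigned long int n)
{
  return snprintf (buf, size,
                   ngettext ("%lu file removed", "%lu files removed", n),
                   n);
}
```

In a program that has no catalog installed for its domain, this selects
the first string only for @code{n == 1} and the second string for every
other value, including zero.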
Packit 6c4009
Packit 6c4009
@deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
Packit 6c4009
@standards{GNU, libintl.h}
Packit 6c4009
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
Packit 6c4009
@c Wrapper for dcngettext.
Packit 6c4009
The @code{dngettext} function is similar to the @code{dgettext} function in the
Packit 6c4009
way the message catalog is selected.  The difference is that it takes
Packit 6c4009
two extra parameters to provide the correct plural form.  These two
Packit 6c4009
parameters are handled in the same way @code{ngettext} handles them.
Packit 6c4009
@end deftypefun
Packit 6c4009
Packit 6c4009
@deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})
Packit 6c4009
@standards{GNU, libintl.h}
Packit 6c4009
@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
Packit 6c4009
@c Wrapper for dcigettext.
Packit 6c4009
The @code{dcngettext} function is similar to the @code{dcgettext} function in the
Packit 6c4009
way the message catalog is selected.  The difference is that it takes
Packit 6c4009
two extra parameters to provide the correct plural form.  These two
Packit 6c4009
parameters are handled in the same way @code{ngettext} handles them.
Packit 6c4009
@end deftypefun
Packit 6c4009
Packit 6c4009
@subsubheading The problem of plural forms
Packit 6c4009
Packit 6c4009
A description of the problem can be found at the beginning of the last
section.  The question now is how to solve it.  Without the input
of linguists (which was not available) it was not possible to determine
whether there are only a few different ways in which plural forms are
formed or whether the number can increase with every new supported
language.
Packit 6c4009
Packit 6c4009
Therefore the solution implemented is to allow the translator to specify
Packit 6c4009
the rules of how to select the plural form.  Since the formula varies
Packit 6c4009
with every language this is the only viable solution except for
Packit 6c4009
hardcoding the information in the code (which still would require the
Packit 6c4009
possibility of extensions to not prevent the use of new languages).  The
Packit 6c4009
details are explained in the GNU @code{gettext} manual.  Here only a
Packit 6c4009
bit of information is provided.
Packit 6c4009
Packit 6c4009
The information about the plural form selection has to be stored in the
Packit 6c4009
header entry (the one with the empty @code{msgid} string).  It looks
Packit 6c4009
like this:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
The @code{nplurals} value must be a decimal number which specifies how
Packit 6c4009
many different plural forms exist for this language.  The string
Packit 6c4009
following @code{plural} is an expression using the C language
Packit 6c4009
syntax.  Exceptions are that no negative numbers are allowed, numbers
Packit 6c4009
must be decimal, and the only variable allowed is @code{n}.  This
Packit 6c4009
expression will be evaluated whenever one of the functions
Packit 6c4009
@code{ngettext}, @code{dngettext}, or @code{dcngettext} is called.  The
Packit 6c4009
numeric value passed to these functions is then substituted for all uses
Packit 6c4009
of the variable @code{n} in the expression.  The resulting value then
must be greater than or equal to zero and smaller than the value given
as the value of @code{nplurals}.
Packit 6c4009
Packit 6c4009
@noindent
Packit 6c4009
The following rules are known at this point.  The languages are listed
together with their families.  But this does not necessarily mean the
information can be generalized for the whole family (as can be easily
seen in the table below).@footnote{Additions are welcome.  Send
appropriate information to @email{bug-glibc-manual@@gnu.org}.}
Packit 6c4009
Packit 6c4009
@table @asis
Packit 6c4009
@item Only one form:
Packit 6c4009
Some languages only require one single form.  There is no distinction
Packit 6c4009
between the singular and plural form.  An appropriate header entry
Packit 6c4009
would look like this:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
Plural-Forms: nplurals=1; plural=0;
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
@noindent
Packit 6c4009
Languages with this property include:
Packit 6c4009
Packit 6c4009
@table @asis
Packit 6c4009
@item Finno-Ugric family
Packit 6c4009
Hungarian
Packit 6c4009
@item Asian family
Packit 6c4009
Japanese, Korean
Packit 6c4009
@item Turkic/Altaic family
Packit 6c4009
Turkish
Packit 6c4009
@end table
Packit 6c4009
Packit 6c4009
@item Two forms, singular used for one only
Packit 6c4009
This is the form used in most existing programs since it is what English
Packit 6c4009
uses.  A header entry would look like this:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
Plural-Forms: nplurals=2; plural=n != 1;
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
(Note: this uses the feature of C expressions that boolean expressions
have the value zero or one.)
Packit 6c4009
Packit 6c4009
@noindent
Packit 6c4009
Languages with this property include:
Packit 6c4009
Packit 6c4009
@table @asis
Packit 6c4009
@item Germanic family
Packit 6c4009
Danish, Dutch, English, German, Norwegian, Swedish
Packit 6c4009
@item Finno-Ugric family
Packit 6c4009
Estonian, Finnish
Packit 6c4009
@item Latin/Greek family
Packit 6c4009
Greek
Packit 6c4009
@item Semitic family
Packit 6c4009
Hebrew
Packit 6c4009
@item Romance family
Packit 6c4009
Italian, Portuguese, Spanish
Packit 6c4009
@item Artificial
Packit 6c4009
Esperanto
Packit 6c4009
@end table
Packit 6c4009
Packit 6c4009
@item Two forms, singular used for zero and one
Packit 6c4009
Exceptional case in the language family.  The header entry would be:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
Plural-Forms: nplurals=2; plural=n>1;
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
@noindent
Packit 6c4009
Languages with this property include:
Packit 6c4009
Packit 6c4009
@table @asis
Packit 6c4009
@item Romance family
Packit 6c4009
French, Brazilian Portuguese
Packit 6c4009
@end table
Packit 6c4009
Packit 6c4009
@item Three forms, special case for zero
Packit 6c4009
The header entry would be:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
@noindent
Packit 6c4009
Languages with this property include:
Packit 6c4009
Packit 6c4009
@table @asis
Packit 6c4009
@item Baltic family
Packit 6c4009
Latvian
Packit 6c4009
@end table
Packit 6c4009
Packit 6c4009
@item Three forms, special cases for one and two
Packit 6c4009
The header entry would be:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
@noindent
Packit 6c4009
Languages with this property include:
Packit 6c4009
Packit 6c4009
@table @asis
Packit 6c4009
@item Celtic
Packit 6c4009
Gaeilge (Irish)
Packit 6c4009
@end table
Packit 6c4009
Packit 6c4009
@item Three forms, special case for numbers ending in 1[2-9]
Packit 6c4009
The header entry would look like this:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
Plural-Forms: nplurals=3; \
Packit 6c4009
    plural=n%10==1 && n%100!=11 ? 0 : \
Packit 6c4009
           n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
@noindent
Packit 6c4009
Languages with this property include:
Packit 6c4009
Packit 6c4009
@table @asis
Packit 6c4009
@item Baltic family
Packit 6c4009
Lithuanian
Packit 6c4009
@end table
Packit 6c4009
Packit 6c4009
@item Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
Packit 6c4009
The header entry would look like this:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
Plural-Forms: nplurals=3; \
Packit 6c4009
    plural=n%100/10==1 ? 2 : n%10==1 ? 0 : (n+9)%10>3 ? 2 : 1;
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
@noindent
Packit 6c4009
Languages with this property include:
Packit 6c4009
Packit 6c4009
@table @asis
Packit 6c4009
@item Slavic family
Packit 6c4009
Croatian, Czech, Russian, Ukrainian
Packit 6c4009
@end table
Packit 6c4009
Packit 6c4009
@item Three forms, special cases for 1 and 2, 3, 4
Packit 6c4009
The header entry would look like this:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
Plural-Forms: nplurals=3; \
Packit 6c4009
    plural=(n==1) ? 1 : (n>=2 && n<=4) ? 2 : 0;
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
@noindent
Packit 6c4009
Languages with this property include:
Packit 6c4009
Packit 6c4009
@table @asis
Packit 6c4009
@item Slavic family
Packit 6c4009
Slovak
Packit 6c4009
@end table
Packit 6c4009
Packit 6c4009
@item Three forms, special case for one and some numbers ending in 2, 3, or 4
Packit 6c4009
The header entry would look like this:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
Plural-Forms: nplurals=3; \
Packit 6c4009
    plural=n==1 ? 0 : \
Packit 6c4009
           n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
@noindent
Packit 6c4009
Languages with this property include:
Packit 6c4009
Packit 6c4009
@table @asis
Packit 6c4009
@item Slavic family
Packit 6c4009
Polish
Packit 6c4009
@end table
Packit 6c4009
Packit 6c4009
@item Four forms, special case for one and all numbers ending in 02, 03, or 04
Packit 6c4009
The header entry would look like this:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
Plural-Forms: nplurals=4; \
Packit 6c4009
    plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
@noindent
Packit 6c4009
Languages with this property include:
Packit 6c4009
Packit 6c4009
@table @asis
Packit 6c4009
@item Slavic family
Packit 6c4009
Slovenian
Packit 6c4009
@end table
Packit 6c4009
@end table
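Since the @code{plural} expressions in the table above use C syntax,
their effect can be checked by writing them out as C functions.  The
following sketch does this for two of the entries; the function names
are hypothetical and exist only for this illustration.

```c
/* Hypothetical helpers: the `plural=' expressions from two of the
   header entries above, written out as C functions.  Each returns the
   index of the plural form selected for N.  */

/* Two forms, singular (index 0) used for zero and one.  */
unsigned long int
plural_index_french (unsigned long int n)
{
  return n > 1;
}

/* Three forms, special case for one and some numbers ending in
   2, 3, or 4 (the Polish entry).  */
unsigned long int
plural_index_polish (unsigned long int n)
{
  return n == 1 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}
```

For the Polish rule, the values 2, 22, and 102 select form 1, while 5,
12, and 112 select form 2.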
Packit 6c4009
Packit 6c4009
Packit 6c4009
@node Charset conversion in gettext
Packit 6c4009
@subsubsection How to specify the output character set @code{gettext} uses
Packit 6c4009
Packit 6c4009
@code{gettext} not only looks up a translation in a message catalog, it
Packit 6c4009
also converts the translation on the fly to the desired output character
Packit 6c4009
set.  This is useful if the user is working in a different character set
Packit 6c4009
than the translator who created the message catalog, because it avoids
Packit 6c4009
distributing variants of message catalogs which differ only in the
Packit 6c4009
character set.
Packit 6c4009
Packit 6c4009
The output character set is, by default, the value of @code{nl_langinfo
Packit 6c4009
(CODESET)}, which depends on the @code{LC_CTYPE} part of the current
Packit 6c4009
locale.  But programs which store strings in a locale independent way
Packit 6c4009
(e.g. UTF-8) can request that @code{gettext} and related functions
Packit 6c4009
return the translations in that encoding, by use of the
Packit 6c4009
@code{bind_textdomain_codeset} function.
Packit 6c4009
Packit 6c4009
Note that the @var{msgid} argument to @code{gettext} is not subject to
Packit 6c4009
character set conversion.  Also, when @code{gettext} does not find a
Packit 6c4009
translation for @var{msgid}, it returns @var{msgid} unchanged --
Packit 6c4009
independently of the current output character set.  It is therefore
Packit 6c4009
recommended that all @var{msgid}s be US-ASCII strings.
Packit 6c4009
Packit 6c4009
@deftypefun {char *} bind_textdomain_codeset (const char *@var{domainname}, const char *@var{codeset})
Packit 6c4009
@standards{GNU, libintl.h}
Packit 6c4009
@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
Packit 6c4009
@c bind_textdomain_codeset @ascuheap @acsmem
Packit 6c4009
@c  set_binding_values dup @ascuheap @acsmem
Packit 6c4009
The @code{bind_textdomain_codeset} function can be used to specify the
Packit 6c4009
output character set for message catalogs for domain @var{domainname}.
Packit 6c4009
The @var{codeset} argument must be a valid codeset name which can be used
Packit 6c4009
for the @code{iconv_open} function, or a null pointer.
Packit 6c4009
Packit 6c4009
If the @var{codeset} parameter is the null pointer,
Packit 6c4009
@code{bind_textdomain_codeset} returns the currently selected codeset
Packit 6c4009
for the domain with the name @var{domainname}.  It returns @code{NULL} if
Packit 6c4009
no codeset has yet been selected.
Packit 6c4009
Packit 6c4009
The @code{bind_textdomain_codeset} function can be used several times.
Packit 6c4009
If used multiple times with the same @var{domainname} argument, the
Packit 6c4009
later call overrides the settings made by the earlier one.
Packit 6c4009
Packit 6c4009
The @code{bind_textdomain_codeset} function returns a pointer to a
string containing the name of the selected codeset.  The string is
allocated internally in the function and must not be changed by the
user.  If the system runs out of memory during the execution of
@code{bind_textdomain_codeset}, the return value is @code{NULL} and the
global variable @var{errno} is set accordingly.
Packit 6c4009
@end deftypefun
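A program which keeps its strings in UTF-8 could request that encoding
as in the following sketch.  The helper name @code{request_utf8} is
hypothetical, and the domain name is supplied by the caller.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: ask for translations of DOMAIN to be returned
   in UTF-8.  Returns 0 on success, -1 on error.  */
int
request_utf8 (const char *domain)
{
  if (bind_textdomain_codeset (domain, "UTF-8") == NULL)
    {
      perror ("bind_textdomain_codeset");
      return -1;
    }
  return 0;
}
```

A subsequent @code{bind_textdomain_codeset (domain, NULL)} call can be
used to query the codeset selected for the domain.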
Packit 6c4009
Packit 6c4009
Packit 6c4009
@node GUI program problems
Packit 6c4009
@subsubsection How to use @code{gettext} in GUI programs
Packit 6c4009
Packit 6c4009
One place where the @code{gettext} functions, if used normally, have big
problems is within programs with graphical user interfaces (GUIs).  The
problem is that many of the strings which have to be translated are very
short.  They have to appear in pull-down menus, which restricts the
length.  But strings which do not contain entire sentences or at
least large fragments of a sentence may appear in more than one
situation in the program and might need a different translation in
each.  This is especially true for the one-word strings which are
frequently used in GUI programs.
Packit 6c4009
Packit 6c4009
As a consequence many people say that the @code{gettext} approach is
wrong and instead @code{catgets} should be used which indeed does not
have this problem.  But there is a very simple and powerful method to
handle this kind of problem with the @code{gettext} functions.
Packit 6c4009
Packit 6c4009
@noindent
Packit 6c4009
As an example consider the following fictional situation.  A GUI program
Packit 6c4009
has a menu bar with the following entries:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
+------------+------------+--------------------------------------+
Packit 6c4009
| File       | Printer    |                                      |
Packit 6c4009
+------------+------------+--------------------------------------+
Packit 6c4009
| Open     | | Select   |
Packit 6c4009
| New      | | Open     |
Packit 6c4009
+----------+ | Connect  |
Packit 6c4009
             +----------+
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
To have the strings @code{File}, @code{Printer}, @code{Open},
Packit 6c4009
@code{New}, @code{Select}, and @code{Connect} translated there has to be
Packit 6c4009
at some point in the code a call to a function of the @code{gettext}
Packit 6c4009
family.  But in two places the string passed into the function would be
Packit 6c4009
@code{Open}.  The translations might not be the same and therefore we
Packit 6c4009
are in the dilemma described above.
Packit 6c4009
Packit 6c4009
One solution to this problem is to artificially extend the strings
Packit 6c4009
to make them unambiguous.  But what would the program do if no
Packit 6c4009
translation is available?  The extended string is not what should be
Packit 6c4009
printed.  So we should use a slightly modified version of the functions.
Packit 6c4009
Packit 6c4009
To extend the strings a uniform method should be used.  E.g., in the
Packit 6c4009
example above, the strings could be chosen as
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
Menu|File
Packit 6c4009
Menu|Printer
Packit 6c4009
Menu|File|Open
Packit 6c4009
Menu|File|New
Packit 6c4009
Menu|Printer|Select
Packit 6c4009
Menu|Printer|Open
Packit 6c4009
Menu|Printer|Connect
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
Now all the strings are different and if now instead of @code{gettext}
Packit 6c4009
the following little wrapper function is used, everything works just
Packit 6c4009
fine:
Packit 6c4009
Packit 6c4009
@cindex sgettext
Packit 6c4009
@smallexample
Packit 6c4009
  #include <libintl.h>
  #include <string.h>

  char *
  sgettext (const char *msgid)
  @{
    char *msgval = gettext (msgid);
    /* gettext returns its argument if no translation exists.  */
    if (msgval == msgid)
      msgval = strrchr (msgid, '|') + 1;
    return msgval;
  @}
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
What this little function does is to recognize the case when no
Packit 6c4009
translation is available.  This can be done very efficiently by a
Packit 6c4009
pointer comparison since the return value is the input value.  If there
Packit 6c4009
is no translation we know that the input string is in the format we used
Packit 6c4009
for the Menu entries and therefore contains a @code{|} character.  We
Packit 6c4009
simply search for the last occurrence of this character and return a
Packit 6c4009
pointer to the character following it.  That's it!
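The wrapper relies on every key containing a @code{|} character.  Where
that cannot be guaranteed, a slightly more defensive variant could be
used.  The following is only a sketch; the name @code{sgettext_checked}
and the choice of a @code{const char *} return type are assumptions made
for this example.

```c
#include <libintl.h>
#include <string.h>

/* Hypothetical variant of sgettext: also copes with keys that
   contain no '|' character.  */
const char *
sgettext_checked (const char *msgid)
{
  const char *msgval = gettext (msgid);
  if (msgval == msgid)
    {
      /* No translation found: strip the context prefix, if any.  */
      const char *sep = strrchr (msgid, '|');
      if (sep != NULL)
        msgval = sep + 1;
    }
  return msgval;
}
```

Keys without a prefix are then returned unchanged when no translation is
available, at the cost of one extra comparison.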
Packit 6c4009
Packit 6c4009
If one now consistently uses the extended string form and replaces
Packit 6c4009
the @code{gettext} calls with calls to @code{sgettext} (this is normally
Packit 6c4009
limited to very few places in the GUI implementation) then it is
Packit 6c4009
possible to produce a program which can be internationalized.
Packit 6c4009
Packit 6c4009
With advanced compilers (such as GNU C) one can write the
@code{sgettext} function as an inline function or as a macro like this:
Packit 6c4009
Packit 6c4009
@cindex sgettext
Packit 6c4009
@smallexample
Packit 6c4009
#define sgettext(msgid) \
  (@{ const char *__msgid = (msgid);            \
     char *__msgval = gettext (__msgid);       \
     if (__msgval == __msgid)                  \
       __msgval = strrchr (__msgid, '|') + 1;  \
     __msgval; @})
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
The other @code{gettext} functions (@code{dgettext}, @code{dcgettext}
Packit 6c4009
and the @code{ngettext} equivalents) can and should have corresponding
Packit 6c4009
functions as well which look almost identical, except for the parameters
Packit 6c4009
and the call to the underlying function.
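As an illustration, a counterpart for @code{dgettext} might be sketched
as follows.  The name @code{sdgettext} is hypothetical, and the function
is declared to return @code{const char *} in this example so that the
result of @code{strrchr} can be used without a cast.

```c
#include <libintl.h>
#include <string.h>

/* Hypothetical counterpart of sgettext for dgettext.  Like the
   original wrapper, it assumes MSGID contains at least one '|'.  */
const char *
sdgettext (const char *domainname, const char *msgid)
{
  const char *msgval = dgettext (domainname, msgid);
  /* dgettext returns MSGID itself if no translation exists.  */
  if (msgval == msgid)
    msgval = strrchr (msgid, '|') + 1;
  return msgval;
}
```

Only the parameter list and the underlying call differ from the
@code{gettext} wrapper.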
Packit 6c4009
Packit 6c4009
Now there is of course the question why such functions do not exist in
Packit 6c4009
@theglibc{}?  There are two parts of the answer to this question.
Packit 6c4009
Packit 6c4009
@itemize @bullet
Packit 6c4009
@item
Packit 6c4009
They are easy to write and therefore can be provided by the project they
Packit 6c4009
are used in.  This is not an answer by itself and must be seen together
Packit 6c4009
with the second part which is:
Packit 6c4009
Packit 6c4009
@item
Packit 6c4009
There is no way the C library can contain a version which can work
Packit 6c4009
everywhere.  The problem is the selection of the character to separate
Packit 6c4009
the prefix from the actual string in the extended string.  The
Packit 6c4009
examples above used @code{|} which is a quite good choice because it
Packit 6c4009
resembles a notation frequently used in this context and it also is a
Packit 6c4009
character not often used in message strings.
Packit 6c4009
Packit 6c4009
But what if the character is used in message strings?  Or what if the
chosen character is not available in the character set of the machine
one compiles on (e.g., @code{|} is not required to exist for @w{ISO C};
this is why the @file{iso646.h} file exists in @w{ISO C} programming
environments)?
Packit 6c4009
@end itemize
Packit 6c4009
Packit 6c4009
There is only one more comment left to make.  The wrapper function above
requires that the translation strings are not extended themselves.
This is only logical.  There is no need to disambiguate the strings
(since they are never used as keys for a search) and one also saves
quite some memory and disk space by doing this.
Packit 6c4009
Packit 6c4009
Packit 6c4009
@node Using gettextized software
Packit 6c4009
@subsubsection User influence on @code{gettext}
Packit 6c4009
Packit 6c4009
The last sections described what the programmer can do to
internationalize the messages of the program.  But it is finally up to
the user to select the messages s/he wants to see.  S/He must understand
them.
Packit 6c4009
Packit 6c4009
The POSIX locale model uses the environment variables @code{LC_COLLATE},
Packit 6c4009
@code{LC_CTYPE}, @code{LC_MESSAGES}, @code{LC_MONETARY}, @code{LC_NUMERIC},
Packit 6c4009
and @code{LC_TIME} to select the locale which is to be used.  This way
Packit 6c4009
the user can influence lots of functions.  As we mentioned above, the
Packit 6c4009
@code{gettext} functions also take advantage of this.
Packit 6c4009
Packit 6c4009
To understand how this happens it is necessary to take a look at the
Packit 6c4009
various components of the filename which gets computed to locate a
Packit 6c4009
message catalog.  It is composed as follows:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
@var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
The default value for @var{dir_name} is system specific.  It is computed
Packit 6c4009
from the value given as the prefix while configuring the C library.
Packit 6c4009
This value normally is @file{/usr} or @file{/}.  For the former the
Packit 6c4009
complete @var{dir_name} is:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
/usr/share/locale
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
We can use @file{/usr/share} since the @file{.mo} files containing the
Packit 6c4009
message catalogs are system independent, so all systems can use the same
Packit 6c4009
files.  If the program executed the @code{bindtextdomain} function for
Packit 6c4009
the message domain that is currently handled, the @code{dir_name}
Packit 6c4009
component is exactly the value which was given to the function as
Packit 6c4009
the second parameter.  I.e., @code{bindtextdomain} allows overwriting
Packit 6c4009
the only system dependent and fixed value to make it possible to
Packit 6c4009
address files anywhere in the filesystem.
Packit 6c4009
Packit 6c4009
The @var{category} is the name of the locale category which was selected
Packit 6c4009
in the program code.  For @code{gettext} and @code{dgettext} this is
always @code{LC_MESSAGES}; for @code{dcgettext} it is selected by the
value of the third parameter.  As said above, using a category other
than @code{LC_MESSAGES} should be avoided.
Packit 6c4009
Packit 6c4009
The @var{locale} component is computed based on the category used.  Just
as for the @code{setlocale} function, this is where the user's selection
comes into play.  Some environment variables are examined in a fixed
order and the first environment variable that is set determines the
result of the lookup process.  In detail, for the category
@code{LC_xxx} the following variables are examined in this order:
Packit 6c4009
Packit 6c4009
@table @code
Packit 6c4009
@item LANGUAGE
Packit 6c4009
@item LC_ALL
Packit 6c4009
@item LC_xxx
Packit 6c4009
@item LANG
Packit 6c4009
@end table
Packit 6c4009
Packit 6c4009
This looks very familiar.  With the exception of the @code{LANGUAGE}
Packit 6c4009
environment variable this is exactly the lookup order the
Packit 6c4009
@code{setlocale} function uses.  But why introduce the @code{LANGUAGE}
Packit 6c4009
variable?
Packit 6c4009
Packit 6c4009
The reason is that the syntax of the values these variables can have is
different from what is expected by the @code{setlocale} function.  If we
set @code{LC_ALL} to a value following the extended syntax, the
@code{setlocale} function would no longer be able to use the value of
this variable.  An additional variable removes this problem, and it also
lets us select the language independently of the locale setting, which
is sometimes useful.
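
The lookup order for the environment variables can be sketched as
follows.  The function name @code{first_locale_setting} is an
assumption made for this example.  The sketch is a simplification: it
treats an empty value like an unset variable, and it leaves out
further rules of the real implementation, such as ignoring
@code{LANGUAGE} when the selected locale is @code{"C"}.

```c
#include <stdlib.h>

/* Hypothetical helper: return the value of the first variable that is
   set to a non-empty string, examined in the order LANGUAGE, LC_ALL,
   CATEGORY_NAME (e.g. "LC_MESSAGES"), LANG.  Returns NULL if none of
   them is set.  */
static const char *
first_locale_setting (const char *category_name)
{
  const char *names[] = { "LANGUAGE", "LC_ALL", category_name, "LANG" };

  for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
    {
      const char *value = getenv (names[i]);
      if (value != NULL && value[0] != '\0')
        return value;
    }
  return NULL;
}
```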
Packit 6c4009
Packit 6c4009
While for the @code{LC_xxx} variables the value should consist of
exactly one specification of a locale, the @code{LANGUAGE} variable's
value can consist of a colon-separated list of locale names.  The
attentive reader will realize that this is the way we manage to
implement one of our additional demands above: we want to be able to
specify an ordered list of languages.
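
Walking through such a colon-separated list can be sketched like this.
The helper @code{language_entry} is hypothetical and serves only to
show the order in which the entries would be tried.  Treating an empty
entry as an error is a choice made for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy entry number INDEX (counting from zero) of
   the colon-separated LIST into BUF.  Returns 1 on success and 0 if
   there is no such entry, the entry is empty, or it does not fit
   into BUF.  */
static int
language_entry (const char *list, size_t index, char *buf, size_t size)
{
  const char *start = list;

  for (;;)
    {
      size_t len = strcspn (start, ":");
      if (index == 0)
        {
          if (len == 0 || len >= size)
            return 0;
          memcpy (buf, start, len);
          buf[len] = '\0';
          return 1;
        }
      if (start[len] == '\0')
        return 0;               /* List exhausted.  */
      start += len + 1;         /* Skip the entry and the colon.  */
      index--;
    }
}
```

A caller would ask for entry 0, 1, 2, and so on, and stop at the first
entry for which a message catalog exists.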
Packit 6c4009
Packit 6c4009
Returning to the constructed filename, only one component is missing.
Packit 6c4009
The @var{domain_name} part is the name which was either registered using
Packit 6c4009
the @code{textdomain} function or which was given to @code{dgettext} or
Packit 6c4009
@code{dcgettext} as the first parameter.  Now it becomes obvious that a
Packit 6c4009
good choice for the domain name in the program code is a string which is
Packit 6c4009
closely related to the program/package name.  E.g., for @theglibc{}
Packit 6c4009
the domain name is @code{libc}.
Packit 6c4009
Packit 6c4009
@noindent
Packit 6c4009
A limited piece of example code should show how the program is supposed
Packit 6c4009
to work:
Packit 6c4009
Packit 6c4009
@smallexample
#include <libintl.h>
#include <locale.h>
#include <stdio.h>
int main (void)
@{
  setlocale (LC_ALL, "");
  textdomain ("test-package");
  bindtextdomain ("test-package", "/usr/local/share/locale");
  puts (gettext ("Hello, world!"));
@}
@end smallexample
Packit 6c4009
Packit 6c4009
At the program start the default domain is @code{messages}, and the
default locale is @code{"C"}.  The @code{setlocale} call sets the locale
Packit 6c4009
according to the user's environment variables; remember that correct
Packit 6c4009
functioning of @code{gettext} relies on the correct setting of the
Packit 6c4009
@code{LC_MESSAGES} locale (for looking up the message catalog) and
Packit 6c4009
of the @code{LC_CTYPE} locale (for the character set conversion).
Packit 6c4009
The @code{textdomain} call changes the default domain to
Packit 6c4009
@code{test-package}.  The @code{bindtextdomain} call specifies that
Packit 6c4009
the message catalogs for the domain @code{test-package} can be found
Packit 6c4009
below the directory @file{/usr/local/share/locale}.
Packit 6c4009
Packit 6c4009
If the user sets the variable @code{LANGUAGE} to @code{de} in her/his
environment, the @code{gettext} function will try to use the
translations from the file
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
/usr/local/share/locale/de/LC_MESSAGES/test-package.mo
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
From the above descriptions it should be clear which component of this
Packit 6c4009
filename is determined by which source.
Packit 6c4009
Packit 6c4009
In the above example we assumed the @code{LANGUAGE} environment
variable to be @code{de}.  This might be an appropriate selection, but
what happens if the user wants to use @code{LC_ALL} because it applies
more widely, and the value required there is @code{de_DE.ISO-8859-1}?
We already mentioned above that a situation like this is not infrequent.
E.g., a person might prefer reading a dialect and, if this is not
available, fall back on the standard language.
Packit 6c4009
Packit 6c4009
The @code{gettext} functions know about situations like this and can
handle them gracefully.  The functions recognize the format of the value
of the environment variable.  They can split the value into different
pieces and, by leaving out one or the other part, construct new
values.  This happens of course in a predictable way.  To understand
this one must know the format of the environment variable value.  There
is one more or less standardized form, originally from the X/Open
specification:
Packit 6c4009
Packit 6c4009
@code{language[_territory[.codeset]][@@modifier]}
Packit 6c4009
Packit 6c4009
Less specific locale names are derived by stripping components in the
order of the following list:
Packit 6c4009
Packit 6c4009
@enumerate
Packit 6c4009
@item
Packit 6c4009
@code{codeset}
Packit 6c4009
@item
Packit 6c4009
@code{normalized codeset}
Packit 6c4009
@item
Packit 6c4009
@code{territory}
Packit 6c4009
@item
Packit 6c4009
@code{modifier}
Packit 6c4009
@end enumerate
Packit 6c4009
Packit 6c4009
The @code{language} field will never be dropped for obvious reasons.
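
The stripping order can be illustrated with a simplified sketch.  All
names here (@code{locale_candidates}, @code{MAX_CANDIDATES},
@code{MAX_NAME}) are assumptions made for this example.  The sketch
drops the components cumulatively and skips the normalized codeset
step; the real implementation considers more combinations, so this
should be read as an approximation of the idea, not as the actual
algorithm.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CANDIDATES 4
#define MAX_NAME 64

/* Hypothetical helper: split NAME of the form
   language[_territory[.codeset]][@modifier] and store progressively
   less specific names in OUT, most specific first.  Returns the number
   of distinct candidates, or 0 if NAME is too long.  */
static int
locale_candidates (const char *name, char out[MAX_CANDIDATES][MAX_NAME])
{
  char base[MAX_NAME];
  const char *mod = "", *code = "", *terr = "";
  char *p;
  int count = 0;

  if (strlen (name) >= MAX_NAME)
    return 0;
  strcpy (base, name);

  /* Cut the name into pieces from right to left.  */
  if ((p = strchr (base, '@')) != NULL)
    { *p = '\0'; mod = p + 1; }
  if ((p = strchr (base, '.')) != NULL)
    { *p = '\0'; code = p + 1; }
  if ((p = strchr (base, '_')) != NULL)
    { *p = '\0'; terr = p + 1; }
  /* BASE now holds only the language, which is never dropped.  */

  for (int step = 0; step < MAX_CANDIDATES; step++)
    {
      int use_code = step < 1 && code[0] != '\0';
      int use_terr = step < 2 && terr[0] != '\0';
      int use_mod = step < 3 && mod[0] != '\0';
      char cand[MAX_NAME];

      snprintf (cand, sizeof cand, "%s%s%s%s%s%s%s", base,
                use_terr ? "_" : "", use_terr ? terr : "",
                use_code ? "." : "", use_code ? code : "",
                use_mod ? "@" : "", use_mod ? mod : "");
      /* Skip a candidate identical to the previous one.  */
      if (count == 0 || strcmp (out[count - 1], cand) != 0)
        strcpy (out[count++], cand);
    }
  return count;
}
```

For the name @code{de_DE.ISO-8859-1} this sketch yields the full name,
then @code{de_DE}, then @code{de}.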
Packit 6c4009
Packit 6c4009
The only new thing is the @code{normalized codeset} entry.  This is
Packit 6c4009
another goodie which is introduced to help reduce the chaos which
Packit 6c4009
derives from the inability of people to standardize the names of
Packit 6c4009
character sets.  Instead of @w{ISO-8859-1} one can often see @w{8859-1},
Packit 6c4009
@w{88591}, @w{iso8859-1}, or @w{iso_8859-1}.  The @code{normalized
Packit 6c4009
codeset} value is generated from the user-provided character set name by
Packit 6c4009
applying the following rules:
Packit 6c4009
Packit 6c4009
@enumerate
Packit 6c4009
@item
Packit 6c4009
Remove all characters besides numbers and letters.
Packit 6c4009
@item
Packit 6c4009
Fold letters to lowercase.
Packit 6c4009
@item
Packit 6c4009
If the result contains only digits, prepend the string @code{"iso"}.
Packit 6c4009
@end enumerate
Packit 6c4009
Packit 6c4009
@noindent
Packit 6c4009
So all of the above names will be normalized to @code{iso88591}.  This
Packit 6c4009
allows the program user much more freedom in choosing the locale name.
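
The three normalization rules can be written down directly.  The
function name @code{normalize_codeset} and its buffer interface are
assumptions made for this sketch, which also assumes that the
character classification functions behave as in the @code{"C"} locale.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: apply the three normalization rules to CODESET
   and store the result in BUF.  Returns 1 on success, 0 if BUF is too
   small.  */
static int
normalize_codeset (const char *codeset, char *buf, size_t size)
{
  size_t len = 0;
  int only_digits = 1;
  char *out = buf;

  /* Count the characters that survive rule 1 and check for rule 3.  */
  for (const char *p = codeset; *p != '\0'; p++)
    if (isalnum ((unsigned char) *p))
      {
        len++;
        if (!isdigit ((unsigned char) *p))
          only_digits = 0;
      }

  if (len + (only_digits ? 3 : 0) + 1 > size)
    return 0;

  if (only_digits)
    {
      memcpy (out, "iso", 3);           /* Rule 3.  */
      out += 3;
    }
  for (const char *p = codeset; *p != '\0'; p++)
    if (isalnum ((unsigned char) *p))   /* Rule 1.  */
      *out++ = (char) tolower ((unsigned char) *p);   /* Rule 2.  */
  *out = '\0';
  return 1;
}
```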
Packit 6c4009
Packit 6c4009
Even this extended functionality still does not help to solve the
Packit 6c4009
problem that completely different names can be used to denote the same
Packit 6c4009
locale (e.g., @code{de} and @code{german}).  To be of help in this
Packit 6c4009
situation the locale implementation and also the @code{gettext}
Packit 6c4009
functions know about aliases.
Packit 6c4009
Packit 6c4009
The file @file{/usr/share/locale/locale.alias} (replace @file{/usr} with
Packit 6c4009
whatever prefix you used for configuring the C library) contains a
Packit 6c4009
mapping of alternative names to more regular names.  The system manager
is free to add new entries to meet her/his own needs.  The locale
selected from the environment is compared with the entries in the first
column of this file, ignoring case.  If they match, the value of the
second column is used instead for the further handling.
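
The alias lookup amounts to a case-insensitive search in a two-column
table.  In the following sketch the table is hard-coded and its entries
are purely illustrative; the names @code{struct locale_alias},
@code{sample_aliases}, and @code{expand_alias} are assumptions made
for this example, and the real implementation reads the mapping from
the alias file instead.

```c
#include <stddef.h>
#include <strings.h>

struct locale_alias
{
  const char *alias;    /* First column of the alias file.  */
  const char *name;     /* Second column of the alias file.  */
};

/* Illustrative entries only, not the contents of a real alias file.  */
static const struct locale_alias sample_aliases[] =
{
  { "german", "de_DE.ISO-8859-1" },
  { "french", "fr_FR.ISO-8859-1" },
};

/* Hypothetical helper: return the replacement for LOCALE if it matches
   an alias, ignoring case; otherwise return LOCALE unchanged.  */
static const char *
expand_alias (const char *locale)
{
  size_t n = sizeof sample_aliases / sizeof sample_aliases[0];

  for (size_t i = 0; i < n; i++)
    if (strcasecmp (locale, sample_aliases[i].alias) == 0)
      return sample_aliases[i].name;
  return locale;
}
```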
Packit 6c4009
Packit 6c4009
In the description of the format of the environment variables we already
Packit 6c4009
mentioned the character set as a factor in the selection of the message
Packit 6c4009
catalog.  In fact, only catalogs which contain text written using the
Packit 6c4009
character set of the system/program can be used (directly; there will
Packit 6c4009
come a solution for this some day).  This means for the user that s/he
Packit 6c4009
will always have to take care of this.  If in the collection of the
Packit 6c4009
message catalogs there are files for the same language but coded using
Packit 6c4009
different character sets the user has to be careful.
Packit 6c4009
Packit 6c4009
Packit 6c4009
@node Helper programs for gettext
Packit 6c4009
@subsection Programs to handle message catalogs for @code{gettext}
Packit 6c4009
Packit 6c4009
@Theglibc{} does not contain the source code for the programs to
Packit 6c4009
handle message catalogs for the @code{gettext} functions.  As part of
Packit 6c4009
the GNU project the GNU gettext package contains everything the
Packit 6c4009
developer needs.  The functionality provided by the tools in this
Packit 6c4009
package by far exceeds the abilities of the @code{gencat} program
Packit 6c4009
described above for the @code{catgets} functions.
Packit 6c4009
Packit 6c4009
There is a program @code{msgfmt} which is the equivalent program to the
Packit 6c4009
@code{gencat} program.  It generates from the human-readable and
Packit 6c4009
-editable form of the message catalog a binary file which can be used by
Packit 6c4009
the @code{gettext} functions.  But there are several more programs
Packit 6c4009
available.
Packit 6c4009
Packit 6c4009
The @code{xgettext} program can be used to automatically extract the
Packit 6c4009
translatable messages from a source file.  I.e., the programmer need not
Packit 6c4009
take care of the translations and the list of messages which have to be
Packit 6c4009
translated.  S/He will simply wrap the translatable strings in calls to
@code{gettext} et al., and the rest will be done by @code{xgettext}.  This
Packit 6c4009
program has a lot of options which help to customize the output or
Packit 6c4009
help to understand the input better.
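
Marking a string for translation can look like the following sketch.
The macro @code{_} is a widespread convention, not something this
section prescribes, and the function @code{greeting} is hypothetical.
If such a macro is used, @code{xgettext} may have to be told about it
through its keyword option.

```c
#include <libintl.h>

/* Conventional shorthand for gettext; an assumption of this sketch.  */
#define _(msgid) gettext (msgid)

/* Hypothetical function returning a translatable message.  If no
   translation is found, gettext returns the original string.  */
static const char *
greeting (void)
{
  return _("Hello, world!");
}
```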
Packit 6c4009
Packit 6c4009
Other programs help to manage the development cycle when new messages appear
Packit 6c4009
in the source files or when a new translation of the messages appears.
Packit 6c4009
Here it should only be noted that using all the tools in GNU gettext it
Packit 6c4009
is possible to @emph{completely} automate the handling of message
Packit 6c4009
catalogs.  Apart from marking the translatable strings in the source
code and generating the translations, the developers do not have
anything to do themselves.