@node Character Handling, String and Array Utilities, Memory, Top
@c %MENU% Character testing and conversion functions
@chapter Character Handling

Programs that work with characters and strings often need to classify a
character---is it alphabetic, is it a digit, is it whitespace, and so
on---and perform case conversion operations on characters.  The
functions in the header file @file{ctype.h} are provided for this
purpose.
@pindex ctype.h

Since the choice of locale and character set can alter the
classifications of particular character codes, all of these functions
are affected by the current locale.  (More precisely, they are affected
by the locale currently selected for character classification---the
@code{LC_CTYPE} category; see @ref{Locale Categories}.)

The @w{ISO C} standard specifies two different sets of functions.  One
set works on @code{char} type characters, the other on
@code{wchar_t} wide characters (@pxref{Extended Char Intro}).

@menu
* Classification of Characters::       Testing whether characters are
                                        letters, digits, punctuation, etc.

* Case Conversion::                    Case mapping, and the like.
* Classification of Wide Characters::  Character class determination for
                                        wide characters.
* Using Wide Char Classes::            Notes on using the wide character
                                        classes.
* Wide Character Case Conversion::     Mapping of wide characters.
@end menu

@node Classification of Characters, Case Conversion,  , Character Handling
@section Classification of Characters
@cindex character testing
@cindex classification of characters
@cindex predicates on characters
@cindex character predicates

This section explains the library functions for classifying characters.
For example, @code{isalpha} is the function to test for an alphabetic
character.  It takes one argument, the character to test, and returns a
nonzero integer if the character is alphabetic, and zero otherwise.  You
would use it like this:

@smallexample
if (isalpha (c))
  printf ("The character `%c' is alphabetic.\n", c);
@end smallexample

Each of the functions in this section tests for membership in a
particular class of characters; each has a name starting with @samp{is}.
Each of them takes one argument, which is a character to test, and
returns an @code{int} which is treated as a boolean value.  The
character argument is passed as an @code{int}; it must be either a
value representable as an @code{unsigned char} or the constant value
@code{EOF}.

The attributes of any given character can vary between locales.
@xref{Locales}, for more information on locales.@refill

These functions are declared in the header file @file{ctype.h}.
@pindex ctype.h
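
The following sketch shows one common way to pass characters taken from
a @code{char} string to these functions.  The function name
@code{count_alpha} is hypothetical.  Where plain @code{char} is signed,
bytes with the high bit set are negative, and passing such a value
(other than @code{EOF}) has undefined behavior in @w{ISO C}, so each
byte is converted to @code{unsigned char} first.

@verbatim
```c
#include <ctype.h>
#include <stddef.h>

/* Count the alphabetic characters in the NUL-terminated string S, as
   classified by isalpha in the current LC_CTYPE locale.  Each byte is
   converted to unsigned char first: where plain char is signed, bytes
   with the high bit set would otherwise be passed as negative values
   other than EOF, which is undefined behavior.  */
size_t
count_alpha (const char *s)
{
  size_t n = 0;

  for (; *s != '\0'; s++)
    if (isalpha ((unsigned char) *s))
      n++;
  return n;
}
```
@end verbatim

In the @code{"C"} locale, @code{count_alpha ("A-b_C")} returns 3.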

@cindex lower-case character
@deftypefun int islower (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@c The is* macros call __ctype_b_loc to get the ctype array from the
@c current locale, and then index it by c.  __ctype_b_loc reads from
@c thread-local memory the (indirect) pointer to the ctype array, which
@c may involve one word access to the global locale object, if that's
@c the active locale for the thread, and the array, being part of the
@c locale data, is undeletable, so there's no thread-safety issue.  We
@c might want to mark these with @mtslocale to flag to callers that
@c changing locales might affect them, even if not these simpler
@c functions.
Returns true if @var{c} is a lower-case letter.  The letter need not be
from the Latin alphabet; any representable alphabet is valid.
@end deftypefun

@cindex upper-case character
@deftypefun int isupper (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is an upper-case letter.  The letter need not be
from the Latin alphabet; any representable alphabet is valid.
@end deftypefun

@cindex alphabetic character
@deftypefun int isalpha (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is an alphabetic character (a letter).  If
@code{islower} or @code{isupper} is true of a character, then
@code{isalpha} is also true.

In some locales, there may be additional characters for which
@code{isalpha} is true---letters which are neither upper case nor lower
case.  But in the standard @code{"C"} locale, there are no such
additional characters.
@end deftypefun

@cindex digit character
@cindex decimal digit character
@deftypefun int isdigit (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a decimal digit (@samp{0} through @samp{9}).
@end deftypefun

@cindex alphanumeric character
@deftypefun int isalnum (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is an alphanumeric character (a letter or
digit); in other words, if either @code{isalpha} or @code{isdigit} is
true of a character, then @code{isalnum} is also true.
@end deftypefun

@cindex hexadecimal digit character
@deftypefun int isxdigit (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a hexadecimal digit.
Hexadecimal digits include the normal decimal digits @samp{0} through
@samp{9} and the letters @samp{A} through @samp{F} and
@samp{a} through @samp{f}.
@end deftypefun
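
As an illustration, @code{isxdigit} can guard the conversion of a
hexadecimal digit to its numeric value.  This is a minimal sketch: the
name @code{hex_digit_value} is hypothetical, and the letter arithmetic
assumes an ASCII-compatible character set in which @samp{a} through
@samp{f} have consecutive codes (@w{ISO C} guarantees this only for the
decimal digits).

@verbatim
```c
#include <ctype.h>

/* Return the value (0 to 15) of the hexadecimal digit C, or -1 if
   isxdigit rejects C.  Assumes an ASCII-compatible character set, in
   which 'a' through 'f' are contiguous; ISO C only guarantees this
   for the decimal digits.  */
int
hex_digit_value (int c)
{
  if (!isxdigit (c))
    return -1;
  if (isdigit (c))
    return c - '0';
  /* Fold 'A'-'F' to 'a'-'f' before computing the offset.  */
  return tolower (c) - 'a' + 10;
}
```
@end verbatim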

@cindex punctuation character
@deftypefun int ispunct (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a punctuation character.
This means any printing character that is not alphanumeric or a space
character.
@end deftypefun

@cindex whitespace character
@deftypefun int isspace (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a @dfn{whitespace} character.  In the standard
@code{"C"} locale, @code{isspace} returns true for only the standard
whitespace characters:

@table @code
@item ' '
space

@item '\f'
formfeed

@item '\n'
newline

@item '\r'
carriage return

@item '\t'
horizontal tab

@item '\v'
vertical tab
@end table
@end deftypefun
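
A common use of @code{isspace} is skipping leading whitespace, for
example before parsing a number.  The following is a minimal sketch
with the hypothetical name @code{skip_space}; as with all of these
functions, each byte is converted to @code{unsigned char} before the
call.

@verbatim
```c
#include <ctype.h>

/* Return a pointer to the first character of S that isspace does not
   classify as whitespace, or to the terminating NUL if there is none
   (isspace is false for '\0', so the loop stops there).  The cast to
   unsigned char keeps the argument in the range isspace accepts.  */
const char *
skip_space (const char *s)
{
  while (isspace ((unsigned char) *s))
    s++;
  return s;
}
```
@end verbatim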

@cindex blank character
@deftypefun int isblank (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a blank character; that is, a space or a tab.
This function was originally a GNU extension, but was added in @w{ISO C99}.
@end deftypefun

@cindex graphic character
@deftypefun int isgraph (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a graphic character; that is, a character
that has a glyph associated with it.  The whitespace characters are not
considered graphic.
@end deftypefun

@cindex printing character
@deftypefun int isprint (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a printing character.  Printing characters
include all the graphic characters, plus the space (@samp{ }) character.
@end deftypefun

@cindex control character
@deftypefun int iscntrl (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a control character (that is, a character that
is not a printing character).
@end deftypefun

@cindex ASCII character
@deftypefun int isascii (int @var{c})
@standards{SVID, ctype.h}
@standards{BSD, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
Returns true if @var{c} is a 7-bit @code{unsigned char} value that fits
into the US/UK ASCII character set.  This function is a BSD extension
and is also an SVID extension.
@end deftypefun

@node Case Conversion, Classification of Wide Characters, Classification of Characters, Character Handling
@section Case Conversion
@cindex character case conversion
@cindex case conversion of characters
@cindex converting case of characters

This section explains the library functions for performing conversions
such as case mappings on characters.  For example, @code{toupper}
converts any character to upper case if possible.  If the character
can't be converted, @code{toupper} returns it unchanged.

These functions take one argument of type @code{int}, which is the
character to convert, and return the converted character as an
@code{int}.  If the conversion is not applicable to the argument given,
the argument is returned unchanged.

@strong{Compatibility Note:} In pre-@w{ISO C} dialects, instead of
returning the argument unchanged, these functions may fail when the
argument is not suitable for the conversion.  Thus for portability, you
may need to write @code{islower(c) ? toupper(c) : c} rather than just
@code{toupper(c)}.

These functions are declared in the header file @file{ctype.h}.
@pindex ctype.h

@deftypefun int tolower (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@c The to* macros/functions call different functions that use different
@c arrays than those of __ctype_b_loc, but the access patterns and
@c thus safety guarantees are the same.
If @var{c} is an upper-case letter, @code{tolower} returns the corresponding
lower-case letter.  If @var{c} is not an upper-case letter,
@var{c} is returned unchanged.
@end deftypefun

@deftypefun int toupper (int @var{c})
@standards{ISO, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
If @var{c} is a lower-case letter, @code{toupper} returns the corresponding
upper-case letter.  Otherwise @var{c} is returned unchanged.
@end deftypefun
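
For example, a string can be converted to upper case one byte at a
time.  In this sketch (the name @code{str_toupper} is hypothetical) no
@code{islower} test is needed, because @code{toupper} returns characters
it cannot convert unchanged.  It assumes single-byte characters;
multibyte characters, such as those of a UTF-8 locale, are not handled
by a byte-wise loop.

@verbatim
```c
#include <ctype.h>

/* Convert the NUL-terminated string S to upper case in place and
   return S.  toupper returns characters it cannot convert unchanged,
   so no islower test is needed on ISO C systems.  The loop works byte
   by byte, so multibyte characters are not handled.  */
char *
str_toupper (char *s)
{
  for (char *p = s; *p != '\0'; p++)
    *p = (char) toupper ((unsigned char) *p);
  return s;
}
```
@end verbatim

Applied to a buffer containing @samp{abC1!}, it produces @samp{ABC1!}.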

@deftypefun int toascii (int @var{c})
@standards{SVID, ctype.h}
@standards{BSD, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This function converts @var{c} to a 7-bit @code{unsigned char} value
that fits into the US/UK ASCII character set, by clearing the high-order
bits.  This function is a BSD extension and is also an SVID extension.
@end deftypefun

@deftypefun int _tolower (int @var{c})
@standards{SVID, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This is identical to @code{tolower}, and is provided for compatibility
with the SVID.  @xref{SVID}.@refill
@end deftypefun

@deftypefun int _toupper (int @var{c})
@standards{SVID, ctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
This is identical to @code{toupper}, and is provided for compatibility
with the SVID.
@end deftypefun


@node Classification of Wide Characters, Using Wide Char Classes, Case Conversion, Character Handling
@section Character class determination for wide characters

@w{Amendment 1} to @w{ISO C90} defines functions to classify wide
characters.  Although the original @w{ISO C90} standard already defined
the type @code{wchar_t}, no functions operating on it were defined.

The classification functions for wide characters have a more general
design.  They allow extensions to the set of available classifications,
beyond those which are always available.  The POSIX standard specifies
how extensions can be made, and this is already implemented in the
@glibcadj{} implementation of the @code{localedef} program.

The character class functions are normally implemented with bitsets,
with a bitset per character.  For a given character, the appropriate
bitset is read from a table and a test is performed as to whether a
certain bit is set.  Which bit is tested is determined by the class.

For the wide character classification functions this is made visible.
There is a type for classification values, a function to retrieve the
value for a given class, and a function that uses this value to test
whether a given character is in the class.  Using these, equivalents
of the normal classification functions for
@code{char} objects can be defined.
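
The following toy sketch illustrates this design for single-byte
characters.  It is not how @theglibc{} lays out its tables: the names
@code{my_class_t}, @code{my_class} and @code{my_is_class}, the four
class bits and the ASCII-only table are all hypothetical.
@code{my_class} plays the role of the function that retrieves a
classification value, and @code{my_is_class} the role of the test
function; a class such as @samp{alpha} can simply combine several bits.

@verbatim
```c
#include <string.h>

/* Hypothetical classification value: one bit per class.  */
typedef unsigned int my_class_t;

enum
{
  MY_DIGIT = 1u << 0,
  MY_UPPER = 1u << 1,
  MY_LOWER = 1u << 2,
  MY_SPACE = 1u << 3
};

/* One bitset per single-byte character.  */
static my_class_t my_table[256];

/* Fill the table for ASCII only; a real locale would supply it.  */
void
my_init_table (void)
{
  for (int c = 0; c < 256; c++)
    {
      my_class_t bits = 0;

      if (c >= '0' && c <= '9')
        bits |= MY_DIGIT;
      if (c >= 'A' && c <= 'Z')
        bits |= MY_UPPER;
      if (c >= 'a' && c <= 'z')
        bits |= MY_LOWER;
      /* strchr would match the terminating NUL, so exclude it.  */
      if (c != '\0' && strchr (" \f\n\r\t\v", c) != NULL)
        bits |= MY_SPACE;
      my_table[c] = bits;
    }
}

/* Counterpart of wctype: map a class name to its bits, 0 if unknown.  */
my_class_t
my_class (const char *name)
{
  if (strcmp (name, "digit") == 0)
    return MY_DIGIT;
  if (strcmp (name, "upper") == 0)
    return MY_UPPER;
  if (strcmp (name, "lower") == 0)
    return MY_LOWER;
  if (strcmp (name, "alpha") == 0)
    return MY_UPPER | MY_LOWER;
  if (strcmp (name, "space") == 0)
    return MY_SPACE;
  return 0;
}

/* Counterpart of iswctype: read the character's bitset and test the
   bits selected by DESC.  */
int
my_is_class (unsigned char c, my_class_t desc)
{
  return (my_table[c] & desc) != 0;
}
```
@end verbatim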

@deftp {Data type} wctype_t
@standards{ISO, wctype.h}
A value of type @code{wctype_t} represents a character class.
The only defined way to generate such a value is by using the
@code{wctype} function.

@pindex wctype.h
This type is defined in @file{wctype.h}.
@end deftp

@deftypefun wctype_t wctype (const char *@var{property})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
@c Although the source code of wctype contains multiple references to
@c the locale, that could each reference different locale_data objects
@c should the global locale object change while active, the compiler can
@c and does combine them all into a single dereference that resolves
@c once to the LC_CTYPE locale object used throughout the function, so it
@c is safe in (optimized) practice, if not in theory, even when the
@c locale changes.  Ideally we'd explicitly save the resolved
@c locale_data object to make it visibly safe instead of safe only under
@c compiler optimizations, but given the decision that setlocale is
@c MT-Unsafe, all this would afford us would be the ability to not mark
@c this function with @mtslocale.
@code{wctype} returns a value representing the class of wide
characters identified by the string @var{property}.  Besides some
standard properties, each locale can define its own.  If no property
with the given name is known in the locale currently selected for the
@code{LC_CTYPE} category, the function returns zero.

@noindent
The properties known in every locale are:

@multitable @columnfractions .25 .25 .25 .25
@item
@code{"alnum"} @tab @code{"alpha"} @tab @code{"blank"} @tab @code{"cntrl"}
@item
@code{"digit"} @tab @code{"graph"} @tab @code{"lower"} @tab @code{"print"}
@item
@code{"punct"} @tab @code{"space"} @tab @code{"upper"} @tab @code{"xdigit"}
@end multitable

@pindex wctype.h
This function is declared in @file{wctype.h}.
@end deftypefun

To test the membership of a character in one of the non-standard
classes, the @w{ISO C} standard defines a completely new function.

@deftypefun int iswctype (wint_t @var{wc}, wctype_t @var{desc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@c The compressed lookup table returned by wctype is read-only.
This function returns a nonzero value if @var{wc} is in the character
class specified by @var{desc}.  @var{desc} must be a value previously
returned by a successful call to @code{wctype}.

@pindex wctype.h
This function is declared in @file{wctype.h}.
@end deftypefun
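
As a sketch of how the two functions fit together, the following
hypothetical helper @code{all_in_class} looks up a class by name once
and then tests each character of a wide string.  It relies only on
@code{wctype} returning zero for unknown names, as described above.

@verbatim
```c
#include <wchar.h>
#include <wctype.h>

/* Return nonzero if every wide character of the NUL-terminated string
   WS belongs to the class named PROPERTY (for example "alpha").
   wctype is called only once; if the current LC_CTYPE locale does not
   know PROPERTY, wctype returns zero and so does this function.  */
int
all_in_class (const wchar_t *ws, const char *property)
{
  wctype_t desc = wctype (property);

  if (desc == 0)
    return 0;
  for (; *ws != L'\0'; ws++)
    if (!iswctype ((wint_t) *ws, desc))
      return 0;
  return 1;
}
```
@end verbatim

In the @code{"C"} locale, @code{all_in_class (L"2024", "digit")} returns
1 and @code{all_in_class (L"ab1", "alpha")} returns 0.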

For convenience, the commonly used classification functions are also
defined in the C library, so there is no need to use @code{wctype} if
the property string is one of the known character classes.  In some
situations, however, it is desirable to construct the property strings
at run time, and then it is important that @code{wctype} can also
handle the standard classes.

@cindex alphanumeric character
@deftypefun int iswalnum (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
@c The implicit wctype call in the isw* functions is actually an
@c optimized version because the category has a known offset, but the
@c wctype is equally safe when optimized, unsafe with changing locales
@c if not optimized (thus @mtslocale).  Since it's not a macro, we
@c always optimize, and the locale can't change in any MT-Safe way, it's
@c fine.  The test whether wc is ASCII to use the non-wide is*
@c macro/function doesn't bring any other safety issues: the test does
@c not depend on the locale, and each path after the decision resolves
@c the locale object only once.
This function returns a nonzero value if @var{wc} is an alphanumeric
character (a letter or digit); in other words, if either @code{iswalpha}
or @code{iswdigit} is true of a character, then @code{iswalnum} is also
true.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("alnum"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex alphabetic character
@deftypefun int iswalpha (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is an alphabetic character (a letter).  If
@code{iswlower} or @code{iswupper} is true of a character, then
@code{iswalpha} is also true.

In some locales, there may be additional characters for which
@code{iswalpha} is true---letters which are neither upper case nor lower
case.  But in the standard @code{"C"} locale, there are no such
additional characters.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("alpha"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex control character
@deftypefun int iswcntrl (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a control character (that is, a character that
is not a printing character).

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("cntrl"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex digit character
@deftypefun int iswdigit (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a digit (e.g., @samp{0} through @samp{9}).
Note that this function may return a nonzero value not only for
@emph{decimal} digits but for other kinds of digits as well.
Consequently, code like the following will @strong{not} work
unconditionally for wide characters:

@smallexample
n = 0;
while (iswdigit (*wc))
  @{
    n *= 10;
    n += *wc++ - L'0';
  @}
@end smallexample

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("digit"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun
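
If the intent of the loop above is to parse decimal numbers only, one
option is to test the range @code{L'0'} to @code{L'9'} explicitly
instead of calling @code{iswdigit}.  This sketch (the name
@code{parse_decimal} is hypothetical) relies on those ten codes being
consecutive, as they are in the ISO 10646 values @theglibc{} uses for
@code{wchar_t}; it does not check for overflow.

@verbatim
```c
#include <wchar.h>

/* Parse the leading run of the digits L'0' through L'9' in WS, store
   a pointer to the first unparsed character in *END (if END is not
   NULL) and return the value.  The explicit range test, rather than
   iswdigit, guarantees that *WS - L'0' is between 0 and 9.
   Overflow is not checked.  */
unsigned long
parse_decimal (const wchar_t *ws, const wchar_t **end)
{
  unsigned long n = 0;

  while (*ws >= L'0' && *ws <= L'9')
    {
      n = n * 10 + (unsigned long) (*ws - L'0');
      ws++;
    }
  if (end != NULL)
    *end = ws;
  return n;
}
```
@end verbatim

For the string @code{L"123abc"} it returns 123 and leaves @code{*end}
pointing at @code{L'a'}.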

@cindex graphic character
@deftypefun int iswgraph (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a graphic character; that is, a character
that has a glyph associated with it.  The whitespace characters are not
considered graphic.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("graph"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex lower-case character
@deftypefun int iswlower (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a lower-case letter.  The letter need not be
from the Latin alphabet; any representable alphabet is valid.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("lower"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex printing character
@deftypefun int iswprint (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a printing character.  Printing characters
include all the graphic characters, plus the space (@samp{ }) character.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("print"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex punctuation character
@deftypefun int iswpunct (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a punctuation character.
This means any printing character that is not alphanumeric or a space
character.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("punct"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex whitespace character
@deftypefun int iswspace (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a @dfn{whitespace} character.  In the standard
@code{"C"} locale, @code{iswspace} returns true for only the standard
whitespace characters:

@table @code
@item L' '
space

@item L'\f'
formfeed

@item L'\n'
newline

@item L'\r'
carriage return

@item L'\t'
horizontal tab

@item L'\v'
vertical tab
@end table

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("space"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex upper-case character
@deftypefun int iswupper (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is an upper-case letter.  The letter need not be
from the Latin alphabet; any representable alphabet is valid.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("upper"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@cindex hexadecimal digit character
@deftypefun int iswxdigit (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a hexadecimal digit.
Hexadecimal digits include the normal decimal digits @samp{0} through
@samp{9} and the letters @samp{A} through @samp{F} and
@samp{a} through @samp{f}.

@noindent
This function can be implemented using

@smallexample
iswctype (wc, wctype ("xdigit"))
@end smallexample

@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@Theglibc{} also provides the following function, which was not part
of @w{Amendment 1} to @w{ISO C90} and which also has a version for
single-byte characters.

@cindex blank character
@deftypefun int iswblank (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
Returns true if @var{wc} is a blank character; that is, a space or a tab.
This function was originally a GNU extension, but was added in @w{ISO C99}.
@pindex wctype.h
It is declared in @file{wctype.h}.
@end deftypefun

@node Using Wide Char Classes, Wide Character Case Conversion, Classification of Wide Characters, Character Handling
@section Notes on using the wide character classes

The first note is probably not surprising, but it is still occasionally
a cause of problems.  The @code{isw@var{XXX}} functions can be
implemented as macros, and in fact @theglibc{} does this.  They are
still available as real functions, but when the @file{wctype.h} header
is included the macros are used.  This is the same as for the
@code{char} type versions of these functions.

The second note covers something new.  It is best illustrated by a
(real-world) example.  The first piece of code is an excerpt from the
original code.  It is truncated a bit, but the intention should be clear.

@smallexample
int
is_in_class (int c, const char *class)
@{
  if (strcmp (class, "alnum") == 0)
    return isalnum (c);
  if (strcmp (class, "alpha") == 0)
    return isalpha (c);
  if (strcmp (class, "cntrl") == 0)
    return iscntrl (c);
  @dots{}
  return 0;
@}
@end smallexample

Now, with the @code{wctype} and @code{iswctype} functions you can avoid
the @code{if} cascades, but rewriting the code as follows is wrong:

@smallexample
int
is_in_class (int c, const char *class)
@{
  wctype_t desc = wctype (class);
  return desc ? iswctype ((wint_t) c, desc) : 0;
@}
@end smallexample
Packit 6c4009
Packit 6c4009
The problem is that it is not guaranteed that the wide character
Packit 6c4009
representation of a single-byte character can be found using casting.
Packit 6c4009
In fact, usually this fails miserably.  The correct solution to this
Packit 6c4009
problem is to write the code as follows:
Packit 6c4009
Packit 6c4009
@smallexample
Packit 6c4009
int
Packit 6c4009
is_in_class (int c, const char *class)
Packit 6c4009
@{
Packit 6c4009
  wctype_t desc = wctype (class);
Packit 6c4009
  return desc ? iswctype (btowc (c), desc) : 0;
Packit 6c4009
@}
Packit 6c4009
@end smallexample
Packit 6c4009
Packit 6c4009
@xref{Converting a Character}, for more information on @code{btowc}.
Packit 6c4009
Note that this change probably does not improve the performance
Packit 6c4009
of the program a lot since the @code{wctype} function still has to make
Packit 6c4009
the string comparisons.  It gets really interesting if the
Packit 6c4009
@code{is_in_class} function is called more than once for the
Packit 6c4009
same class name.  In this case the variable @var{desc} could be computed
Packit 6c4009
once and reused for all the calls.  Therefore the above form of the
Packit 6c4009
function is probably not the final one.
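
One way to take advantage of this is to call @code{wctype} once and
reuse the descriptor in a loop.  The following is a minimal sketch; the
function @code{count_in_class} is a hypothetical example, not a library
function.  It also converts each byte through @code{unsigned char}
before calling @code{btowc}, so that bytes with the high bit set are not
passed as negative values where @code{char} is signed:

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count the bytes of the NUL-terminated
   single-byte string S that belong to the character class named
   CLASS.  The class name is looked up with wctype only once, and the
   descriptor is reused for every character.  */
size_t
count_in_class (const char *s, const char *class)
{
  wctype_t desc = wctype (class);
  size_t n = 0;

  if (desc == 0)
    return 0;   /* Unknown class name in the current locale.  */

  for (; *s != '\0'; ++s)
    /* Widen with btowc, not with a cast; go through unsigned char so
       that a signed char is never passed as a negative value.  */
    if (iswctype (btowc ((unsigned char) *s), desc))
      ++n;
  return n;
}
```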

@node Wide Character Case Conversion, , Using Wide Char Classes, Character Handling
@section Mapping of wide characters

The case mapping functions are also generalized by the @w{ISO C}
standard.  Instead of allowing only the two standard mappings, a locale
can contain others.  Again, the @code{localedef} program already
supports generating such locale data files.

@deftp {Data Type} wctrans_t
@standards{ISO, wctype.h}
This data type is defined as a scalar type which can hold a value
representing a locale-dependent character mapping.  There is no way to
construct such a value apart from using the return value of the
@code{wctrans} function.

@pindex wctype.h
@noindent
This type is defined in @file{wctype.h}.
@end deftp

@deftypefun wctrans_t wctrans (const char *@var{property})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
@c Similar implementation, same caveats as wctype.
The @code{wctrans} function is used to find out whether a named
mapping is defined in the locale currently selected for the
@code{LC_CTYPE} category.  If the return value is non-zero, you can use
it in subsequent calls to @code{towctrans}.  If the return value is
zero, no such mapping is known in the current locale.

Besides locale-specific mappings, there are two mappings which are
guaranteed to be available in every locale:

@multitable @columnfractions .5 .5
@item
@code{"tolower"} @tab @code{"toupper"}
@end multitable

@pindex wctype.h
@noindent
This function is declared in @file{wctype.h}.
@end deftypefun

@deftypefun wint_t towctrans (wint_t @var{wc}, wctrans_t @var{desc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@c Same caveats as iswctype.
@code{towctrans} maps the input character @var{wc}
according to the rules of the mapping for which @var{desc} is a
descriptor, and returns the value it finds.  @var{desc} must be
obtained by a successful call to @code{wctrans}.

@pindex wctype.h
@noindent
This function is declared in @file{wctype.h}.
@end deftypefun
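
The following minimal sketch shows the intended calling sequence: look
up a mapping by name with @code{wctrans}, check for failure, and then
apply it with @code{towctrans}.  The helper @code{map_wide_string} is
hypothetical; the tests use only the two mappings that every locale
must provide, since other names depend on the locale:

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: apply the mapping named NAME (for example
   "toupper") to every character of the wide string S, in place.
   Returns 0 on success, or -1 if the current LC_CTYPE locale does not
   define the mapping (S is then left unchanged).  */
int
map_wide_string (wchar_t *s, const char *name)
{
  wctrans_t desc = wctrans (name);

  if (desc == 0)
    return -1;   /* No such mapping in the current locale.  */

  /* The mapping is looked up once; DESC is reused for each character.  */
  for (; *s != L'\0'; ++s)
    *s = towctrans (*s, desc);
  return 0;
}
```

For example, @code{map_wide_string (buf, "toupper")} converts @var{buf}
to upper case, while an unknown mapping name makes the function return
@code{-1} without modifying the string.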

For the generally available mappings, the @w{ISO C} standard defines
convenient shortcuts so that it is not necessary to call @code{wctrans}
for them.

@deftypefun wint_t towlower (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
@c Same caveats as iswalnum, just using a wctrans rather than a wctype
@c table.
If @var{wc} is an upper-case letter, @code{towlower} returns the corresponding
lower-case letter.  If @var{wc} is not an upper-case letter,
@var{wc} is returned unchanged.

@noindent
@code{towlower} can be implemented using

@smallexample
towctrans (wc, wctrans ("tolower"))
@end smallexample

@pindex wctype.h
@noindent
This function is declared in @file{wctype.h}.
@end deftypefun

@deftypefun wint_t towupper (wint_t @var{wc})
@standards{ISO, wctype.h}
@safety{@prelim{}@mtsafe{@mtslocale{}}@assafe{}@acsafe{}}
If @var{wc} is a lower-case letter, @code{towupper} returns the corresponding
upper-case letter.  Otherwise @var{wc} is returned unchanged.

@noindent
@code{towupper} can be implemented using

@smallexample
towctrans (wc, wctrans ("toupper"))
@end smallexample

@pindex wctype.h
@noindent
This function is declared in @file{wctype.h}.
@end deftypefun
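
As an illustration of these shortcuts, the hypothetical helper below
compares two wide strings while ignoring case by folding each character
with @code{towlower}.  It is a minimal character-by-character sketch; it
does not handle case mappings that change the length of a string, which
a one-to-one function like @code{towlower} cannot express:

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: return nonzero if the wide strings A and B are
   equal when each character is folded with towlower, zero otherwise.  */
int
wide_equal_nocase (const wchar_t *a, const wchar_t *b)
{
  while (*a != L'\0' && *b != L'\0')
    {
      if (towlower (*a) != towlower (*b))
        return 0;
      ++a;
      ++b;
    }
  /* Equal only if both strings ended at the same position.  */
  return *a == *b;
}
```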

The same warnings given in the previous section for the use of the wide
character classification functions apply here.  It is not possible to
simply cast a @code{char} type value to a @code{wint_t} and use it as an
argument to @code{towctrans}, @code{towlower}, or @code{towupper};
convert it with @code{btowc} instead.
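
The following minimal sketch, using a hypothetical helper name, shows
the conversion in both directions: the byte is widened with
@code{btowc} rather than a cast, mapped with @code{towupper}, and
narrowed again with @code{wctob}.  It assumes, as for @code{toupper},
that the argument is either @code{EOF} or a value representable as
@code{unsigned char}:

```c
#include <stdio.h>   /* EOF */
#include <wchar.h>   /* btowc, wctob, WEOF */
#include <wctype.h>  /* towupper */

/* Hypothetical helper: upper-case the single-byte character C by way
   of the wide character functions.  Returns EOF if C is EOF or not a
   valid single-byte character in the current locale, and returns C
   unchanged if its upper-case form has no single-byte representation.  */
int
toupper_via_wide (int c)
{
  wint_t wc = btowc (c);   /* Not (wint_t) c: that cast is unreliable.  */
  int result;

  if (wc == WEOF)
    return EOF;

  result = wctob (towupper (wc));
  return result == EOF ? c : result;
}
```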