Fri Jun  3 12:20:17 IDT 2005
============================

As noted in the NEWS file, as of 3.1.5, gawk uses character values instead
of byte values for `index', `length', `substr' and `match'.  This works
in multibyte and Unicode locales.

Wed Jun 18 16:47:31 IDT 2003
============================

Multibyte locales can cause occasional weirdness, in particular with
ranges inside brackets: /[....]/.  Something that works great for ASCII
will choke for, e.g., en_US.UTF-8.  One such program is test/gsubtst5.awk.

By default, the test suite runs with LC_ALL=C and LANG=C. You
can change this by running (from a Bourne-style shell):

	$ GAWKLOCALE=some_locale make check

The test suite will then set LC_ALL and LANG to the given locale.
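
The selection just described can be sketched in shell as follows.  This
is an illustration only, not the actual rule from the gawk Makefile, and
the function name test_locale is hypothetical.

```shell
# Hypothetical helper: print the locale the tests would run under.
# Falls back to C when GAWKLOCALE is unset or empty.
test_locale() {
    echo "${GAWKLOCALE:-C}"
}

# Both LC_ALL and LANG receive the same value.
loc=$(test_locale)
echo "tests would run with LC_ALL=$loc LANG=$loc"
```

Running `locale -a' first shows which locale names are available on
your system.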

As of this writing, this works for en_US.UTF-8, and all tests
pass except gsubtst5.

For the normal case of RS = "\n", the locale is largely irrelevant.
For other single-byte record separators, using LC_ALL=C will give you
much better performance when reading records.  Otherwise, gawk has to
make several function calls *per input character* to find the record
terminator.  You have been warned.