Blame NKF.mod/NKF.pm

# Copyright (c) 1987, Fujitsu LTD. (Itaru ICHIKAWA).
# Copyright (c) 1996-2015, The nkf Project.
# All rights reserved.
#
# This software is provided 'as-is', without any express or implied
# warranty. In no event will the authors be held liable for any damages
# arising from the use of this software.
#
# Permission is granted to anyone to use this software for any purpose,
# including commercial applications, and to alter it and redistribute it
# freely, subject to the following restrictions:
#
# 1. The origin of this software must not be misrepresented; you must not
# claim that you wrote the original software. If you use this software
# in a product, an acknowledgment in the product documentation would be
# appreciated but is not required.
#
# 2. Altered source versions must be plainly marked as such, and must not be
# misrepresented as being the original software.
#
# 3. This notice may not be removed or altered from any source distribution.

package NKF;

use strict;
use vars qw($VERSION @ISA @EXPORT @EXPORT_OK);

require Exporter;
require DynaLoader;

@ISA = qw(Exporter DynaLoader);
# Items to export into callers namespace by default. Note: do not export
# names by default without a very good reason. Use EXPORT_OK instead.
# Do not simply export all your public functions/methods/constants.
@EXPORT = qw(
	nkf	nkf_continue	inputcode
);
$VERSION = '2.14';

bootstrap NKF $VERSION;

# Preloaded methods go here.

# Autoload methods go after =cut, and are processed by the autosplit program.

1;
__END__

#
# Text from =begin up to =begin COMMAND is the Perl/NKF documentation.
# Text from =begin COMMAND up to =end is the nkf command documentation.
#

=head1 NAME

=begin

NKF - Perl extension for Network Kanji Filter

=begin COMMAND

nkf - Network Kanji Filter

=end

=head1 SYNOPSIS

=begin

  use NKF;
  $output = nkf("-s",$input);

=begin COMMAND

nkf B<[-butjnesliohrTVvwWJESZxXFfmMBOcdILg]> B<[>I<file ...>B<]>

=end

=head1 DESCRIPTION

=begin

This is a Perl extension version of nkf (Network Kanji Filter).
It converts the last argument and returns the converted result. Conversion
details are specified by the flags given before the last argument.

=end

B<nkf> is yet another kanji code converter for use among networks, hosts and terminals.
It converts the input kanji code to a designated kanji code
such as ISO-2022-JP, Shift_JIS, EUC-JP, UTF-8, UTF-16 or UTF-32.

A distinctive feature of B<nkf> is that it guesses the input kanji encoding.
It currently recognizes ISO-2022-JP, Shift_JIS, EUC-JP, UTF-8, UTF-16 and UTF-32,
so users usually need not set the input kanji code explicitly.

By default, X0201 kana is converted into X0208 kana.
For X0201 kana, SO/SI, SS2 and ESC-(-I methods are supported.
For automatic code detection, nkf assumes no X0201 kana in Shift_JIS.
To accept X0201 in Shift_JIS, use B<-X>, B<-x> or B<-S>.

Multiple options are specified as separate strings; every argument
except the last is an option, and the last is the string to convert:

  print nkf('--ic=UTF8-MAC', '-w', $string), "\n";
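
As a slightly fuller sketch of the same calling convention, each flag is
passed as its own argument and the text to convert comes last. The byte
strings and variable names here are illustrative assumptions, not part of
the module:

  use strict;
  use warnings;
  use NKF;

  # Three kanji ("Nihongo") as raw UTF-8 bytes, written with hex escapes so
  # the example does not depend on the encoding of this source file.
  my $utf8 = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

  # -W: the input is UTF-8, -e: the output is EUC-JP.
  my $euc = nkf('-W', '-e', $utf8);

  # Long options are passed the same way, one per argument.
  my $sjis = nkf('--ic=UTF-8', '--oc=Shift_JIS', $utf8);

  printf "%d UTF-8 bytes -> %d EUC-JP bytes, %d Shift_JIS bytes\n",
      length($utf8), length($euc), length($sjis);

Naming the input encoding explicitly, as done here, sidesteps the automatic
guess, which may be preferable for very short strings.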

=head1 OPTIONS

=over

=item B<-J -S -E -W -W16 -W32 -j -s -e -w -w16 -w32>

Specify input and output encodings. Upper-case flags set the input encoding;
lower-case flags set the output encoding. See also B<--ic> and B<--oc>.

=over

=item B<-J>

ISO-2022-JP (JIS code).

=item B<-S>

Shift_JIS and JIS X 0201 kana.
EUC-JP is recognized as X0201 kana. Without the B<-x> flag,
JIS X 0201 Katakana (a.k.a. halfwidth kana) is converted into JIS X 0208.
If you use Windows, see Windows-31J (CP932).

=item B<-E>

EUC-JP.

=item B<-W>

UTF-8N.

=item B<-W16[BL][0]>

UTF-16.
B or L selects big-endian or little-endian byte order.
The optional 0 controls whether a BOM is written.

=item B<-W32[BL][0]>

UTF-32.
B or L selects big-endian or little-endian byte order.
The optional 0 controls whether a BOM is written.

=back

=item B<-b -u>

With B<-b>, output is buffered (DEFAULT). With B<-u>, output is unbuffered.

=item B<-t>

No conversion.

=item B<-i[@B]>

Specify the escape sequence for JIS X 0208.

=over

=item B<-i@>

Use ESC $ @. (JIS X 0208-1978)

=item B<-iB>

Use ESC $ B. (JIS X 0208-1983/1990 DEFAULT)

=back

=item B<-o[BJ]>

Specify the escape sequence for US-ASCII/JIS X 0201 Roman. (DEFAULT B)

=item B<-r>

Encrypt or decrypt with ROT13/47.

=item B<-h[123] --hiragana --katakana --katakana-hiragana>

=over

=item B<-h1 --hiragana>

Katakana to Hiragana conversion.

=item B<-h2 --katakana>

Hiragana to Katakana conversion.

=item B<-h3 --katakana-hiragana>

Katakana to Hiragana and Hiragana to Katakana conversion.

=back

=item B<-T>

Text mode output (MS-DOS).

=item B<-f[I<m> [- I<n>]]>

Fold lines at length I<m> with a margin of I<n>.
If I<m> and I<n> are omitted, the fold length is 60 and the fold margin is 10.

=item B<-F>

Line folding that preserves existing newlines.

=item B<-Z[0-3]>

Convert X0208 alphabet (Fullwidth Alphabets) to ASCII.

=over

=item B<-Z -Z0>

Convert X0208 alphabet to ASCII.

=item B<-Z1>

Convert X0208 kankaku (fullwidth space) to a single ASCII space.

=item B<-Z2>

Convert X0208 kankaku (fullwidth space) to two ASCII spaces.

=item B<-Z3>

Replace fullwidth >, <, ", & with '&gt;', '&lt;', '&quot;', '&amp;' as in HTML.

=back

=item B<-X -x>

With B<-X> or without this option, X0201 kana is converted into X0208 kana.
With B<-x>, X0201 kana is preserved and not converted to X0208.
In JIS output, ESC-(-I is used. In EUC output, SS2 is used.

=item B<-B[0-2]>

Assume broken JIS-Kanji input that has lost its ESC characters.
Useful when your site is using the old B-News Nihongo patch.

=over

=item B<-B1>

Allow any characters after ESC-( or ESC-$.

=item B<-B2>

Force ASCII after NL.

=back

=item B<-I>

Replace non-ISO-2022-JP characters with a geta character
(the substitute character in Japanese).

=item B<-m[BQN0]>

MIME ISO-2022-JP/ISO8859-1 decode. (DEFAULT)
To see ISO8859-1 (Latin-1), B<-l> is also necessary.

=over

=item B<-mB>

Decode a MIME base64 encoded stream. Remove the header or other parts before
conversion.

=item B<-mQ>

Decode a MIME quoted stream. '_' in a quoted stream is converted to a space.

=item B<-mN>

Non-strict decoding.
It allows a line break in the middle of the base64 encoding.

=item B<-m0>

No MIME decode.

=back

=item B<-M>

MIME encode. Header style. All ASCII codes and control characters are left intact.

=over

=item B<-MB>

MIME encode a Base64 stream.
Kanji conversion is performed before encoding, so this cannot be used to encode binary data such as pictures.

=item B<-MQ>

Perform quoted encoding.

=back

=item B<-l>

Input and output code is ISO8859-1 (Latin-1) and ISO-2022-JP.
B<-s>, B<-e> and B<-x> are not compatible with this option.

=item B<-L[uwm] -d -c>

Convert line breaks.

=over

=item B<-Lu -d>

unix (LF)

=item B<-Lw -c>

windows (CRLF)

=item B<-Lm>

mac (CR)

=back

Without this option, nkf doesn't convert line breaks.

=item B<--fj --unix --mac --msdos --windows>

Convert for these systems.

=item B<--jis --euc --sjis --mime --base64>

Convert to the named code.

=item B<--jis-input --euc-input --sjis-input --mime-input --base64-input>

Assume the named input code.

=item B<--ic=I<input codeset> --oc=I<output codeset>>

Set the input or output codeset.
nkf supports the following codesets; codeset names are case-insensitive.

=over

=item ISO-2022-JP

a.k.a. RFC1468, 7bit JIS, JUNET

=item EUC-JP (eucJP-nkf)

a.k.a. AT&T JIS, Japanese EUC, UJIS

=item eucJP-ascii

=item eucJP-ms

=item CP51932

Microsoft Version of EUC-JP.

=item Shift_JIS

a.k.a. SJIS, MS_Kanji

=item Windows-31J

a.k.a. CP932

=item UTF-8

same as UTF-8N

=item UTF-8N

UTF-8 without BOM

=item UTF-8-BOM

UTF-8 with BOM

=item UTF8-MAC (input only)

decomposed UTF-8

=item UTF-16

same as UTF-16BE

=item UTF-16BE

UTF-16 Big Endian without BOM

=item UTF-16BE-BOM

UTF-16 Big Endian with BOM

=item UTF-16LE

UTF-16 Little Endian without BOM

=item UTF-16LE-BOM

UTF-16 Little Endian with BOM

=item UTF-32

same as UTF-32BE

=item UTF-32BE

UTF-32 Big Endian without BOM

=item UTF-32BE-BOM

UTF-32 Big Endian with BOM

=item UTF-32LE

UTF-32 Little Endian without BOM

=item UTF-32LE-BOM

UTF-32 Little Endian with BOM

=back

=item B<--fb-{skip, html, xml, perl, java, subchar}>

Specify how nkf handles unassigned characters.
Without this option, B<--fb-skip> is assumed.

=item B<--prefix=I<escape character>I<target character>..>

When nkf converts to Shift_JIS,
it inserts the specified escape character before the second byte of a Shift_JIS
character when that byte is one of the specified target characters.
The first byte of the argument is the escape character and the following bytes are the target characters.

=item B<--no-cp932ext>

Handle the characters extended in CP932 as unassigned characters.

=item B<--no-best-fit-chars>

When converting from Unicode to an encoded byte sequence,
do not convert characters that are not round-trip safe.
When converting from Unicode to Unicode,
nkf can be used as a UTF converter with this option and B<-x>.
(In other words, without this option and B<-x>, nkf does not preserve some characters.)

You should use this option when nkf converts strings that relate to paths.

=item B<--cap-input>

Decode hex-encoded characters.

=item B<--url-input>

Unescape percent-escaped characters.

=item B<--numchar-input>

Decode character references, such as "&#....;".

=begin COMMAND

=item B<--in-place[=>I<SUFFIX>B<]>  B<--overwrite[=>I<SUFFIX>B<]>

Overwrite the B<original> listed files with the filtered result.

B<Note> --overwrite preserves the timestamps of the original files.

=item B<--guess=[12]>

Print the guessed encoding and newline type. (2 is the default; 1 prints only the encoding.)

=item B<--help>

Print nkf's help.

=item B<--version>

Print nkf's version.

=end

=item B<-->

Ignore the rest of the -options.

=back
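
The following sketch illustrates a few of the options above when called from
Perl. It assumes UTF-8 input supplied as raw bytes; the variable names are
illustrative only and not part of the module.

  use strict;
  use warnings;
  use NKF;

  # Fullwidth "ABC" (U+FF21..U+FF23) as UTF-8 bytes.
  my $wide = "\xEF\xBC\xA1\xEF\xBC\xA2\xEF\xBC\xA3";
  # -Z: convert the X0208 (fullwidth) alphabet to ASCII.
  # -W and -w keep both input and output in UTF-8.
  my $ascii = nkf('-W', '-w', '-Z', $wide);

  # Two katakana characters (U+30AB, U+30CA) as UTF-8 bytes.
  my $kata = "\xE3\x82\xAB\xE3\x83\x8A";
  # --hiragana (same as -h1): katakana to hiragana.
  my $hira = nkf('-W', '-w', '--hiragana', $kata);

  # -Lw: convert line breaks to CRLF.
  my $crlf = nkf('-W', '-w', '-Lw', "line one\nline two\n");

Each call returns a byte string, so any further handling (printing, writing
to a file) should treat the result as already encoded.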

=head1 AUTHOR

Copyright (c) 1987, Fujitsu LTD. (Itaru ICHIKAWA).

Copyright (c) 1996-2015, The nkf Project.

=begin

=head1 SEE ALSO

perl(1).   nkf(1)

=end

=cut