.\" -*- coding: us-ascii -*-
.if \n(.g .ds T< \\FC
.if \n(.g .ds T> \\F[\n[.fam]]
.de URL
\\$2 \(la\\$1\(ra\\$3
..
.if \n(.g .mso www.tmac
.TH utf8trans 1 "3 March 2007" "docbook2X 0.8.8" docbook2X
.SH NAME
utf8trans \- Transliterate UTF-8 characters according to a table
.SH SYNOPSIS
'nh
.fi
.ad l
\fButf8trans\fR \kx
.if (\nx>(\n(.l/2)) .nr x (\n(.l/5)
'in \n(.iu+\nxu
\fIcharmap\fR [\fIfile\fR]\&...
'in \n(.iu-\nxu
.ad b
'hy
.SH DESCRIPTION
\fButf8trans\fR transliterates characters in the specified files (or
standard input, if no files are specified) and writes the output to
standard output. All input and output is in the UTF-8 encoding.
.PP
This program is usually used to render characters in Unicode text files
as markup escapes or ASCII transliterations.
(It is not intended for general charset conversions.)
It provides functionality similar to the character maps
in XSLT 2.0 (Extensible Stylesheet Language Transformations, version 2.0).
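.PP
As a rough sketch of typical invocations, based only on the synopsis
(the file names \*(T<quotes.map\*(T>, \*(T<in.txt\*(T> and
\*(T<out.txt\*(T> are hypothetical), the first command below
transliterates a named file and the second filters standard input:
.PP
.nf
\*(T<utf8trans quotes.map in.txt > out.txt
utf8trans quotes.map < in.txt > out.txt\*(T>
.fi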
.SH OPTIONS
.TP
\*(T<\fB\-m\fR\*(T>, \*(T<\fB\-\-modify\fR\*(T>
Modifies the given files in place, replacing their contents with the
transliterated output instead of writing it to standard output.

This option is useful for efficient transliteration of many files
at once.
.TP
\*(T<\fB\-\-help\fR\*(T>
Show brief usage information and exit.
.TP
\*(T<\fB\-\-version\fR\*(T>
Show version and exit.
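.PP
For example, a set of files could probably be rewritten in one run as
sketched below. The file names are hypothetical, and placing the option
before \fIcharmap\fR is an assumption based on the usual command-line
convention rather than something stated in this page. Since the files
are overwritten, working on copies is prudent.
.PP
.nf
\*(T<utf8trans \-m quotes.map chapter1.txt chapter2.txt\*(T>
.fi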
.SH USAGE
The translation is done according to the rules in the \(oqcharacter
map\(cq, which is read from the file \fIcharmap\fR. The file
has the following format:
.TP 0.4i
1.
Each line represents a translation entry, except for
blank lines and comment lines, which are ignored.
.TP 0.4i
2.
Any amount of whitespace (space or tab) may precede
the start of an entry.
.TP 0.4i
3.
Comment lines begin with \*(T<#\*(T>.
The rest of the line is ignored.
.TP 0.4i
4.
Each entry consists of the Unicode codepoint of the
character to translate, in hexadecimal, followed by
\fIone\fR space or tab, followed by the translation
string, up to the end of the line.
.TP 0.4i
5.
The translation string is taken literally, including any
leading and trailing spaces (except the delimiter between the codepoint
and the translation string), and all types of characters. The newline
at the end is not included.
.PP
The above format is intended to be restrictive, to keep
\fButf8trans\fR simple. But if an XML-based format is desired,
the docbook2X distribution includes a
\*(T<\fIxmlcharmap2utf8trans\fR\*(T> script that converts character
maps in XSLT 2.0 format to the \fButf8trans\fR format.
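.PP
As an illustration of the format rules, a small character map might look
like the following sketch. The chosen codepoints and replacement strings
are examples only, and writing the codepoint as bare hexadecimal digits
(without a prefix such as \*(T<U+\*(T> or \*(T<0x\*(T>) is an assumption.
.PP
.nf
\*(T<# Replace typographic punctuation with ASCII equivalents
2018 \(aq
2019 \(aq
201C "
201D "
2014 \-\-
00A9 (c)\*(T>
.fi
.PP
Because only one space or tab is consumed as the delimiter, a second
space after the codepoint would become part of the translation string.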
.SH LIMITATIONS
.TP 0.2i
\(bu
\fButf8trans\fR does not work with binary files, because malformed
UTF-8 sequences in the input are substituted with
U+FFFD characters. However, null characters in the input
are handled correctly. This limitation may be removed in the future.
.TP 0.2i
\(bu
There is no way to include a newline or null character in the
translation string.
.SH AUTHOR
Steve Cheng <\*(T<stevecheng@users.sourceforge.net\*(T>>.