.\" -*- coding: us-ascii -*-
.if \n(.g .ds T< \\FC
.if \n(.g .ds T> \\F[\n[.fam]]
.de URL
\\$2 \(la\\$1\(ra\\$3
..
.if \n(.g .mso www.tmac
.TH utf8trans 1 "3 March 2007" "docbook2X 0.8.8" docbook2X
.SH NAME
utf8trans \- Transliterate UTF-8 characters according to a table
.SH SYNOPSIS
'nh
.fi
.ad l
\fButf8trans\fR \kx
.if (\nx>(\n(.l/2)) .nr x (\n(.l/5)
'in \n(.iu+\nxu
\fIcharmap\fR [\fIfile\fR]\&...
'in \n(.iu-\nxu
.ad b
'hy
.SH DESCRIPTION
\fButf8trans\fR transliterates characters in the specified files (or
standard input, if no files are specified) and writes the output to
standard output. All input and output is in the UTF-8 encoding.
.PP
This program is usually used to render characters in Unicode text files
as markup escapes or ASCII transliterations.
(It is not intended for general charset conversions.)
It provides functionality similar to the character maps
in XSLT 2.0 (Extensible Stylesheet Language Transformations, version 2.0).
.SH OPTIONS
.TP 
\*(T<\fB\-m\fR\*(T>, \*(T<\fB\-\-modify\fR\*(T>
Modifies the given files in-place with their transliterated output,
instead of sending it to standard output.

This option is useful for efficient transliteration of many files
at once.
.TP 
\*(T<\fB\-\-help\fR\*(T>
Show brief usage information and exit.
.TP 
\*(T<\fB\-\-version\fR\*(T>
Show version and exit.
.SH USAGE
The translation is done according to the rules in the \(oqcharacter
map\(cq, which is read from the file \fIcharmap\fR. It
has the following format:
.TP 0.4i
1.
Each line represents a translation entry, except for
blank lines and comment lines, which are ignored.
.TP 0.4i
2.
Any amount of whitespace (space or tab) may precede
the start of an entry.
.TP 0.4i
3.
Comment lines begin with \*(T<#\*(T>.
Everything else on the same line is ignored.
.TP 0.4i
4.
Each entry consists of the Unicode codepoint of the
character to translate, in hexadecimal, followed by
\fIone\fR space or tab, followed by the translation
string, up to the end of the line.
.TP 0.4i
5.
The translation string is taken literally, including any
leading and trailing spaces (except the delimiter between the codepoint
and the translation string), and all types of characters. The newline
at the end is not included.
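.PP
As an illustration only, the five rules above can be sketched in Python.
This is not the \fButf8trans\fR source code: the function names
\*(T<parse_charmap\*(T> and \*(T<transliterate\*(T> and the sample map
are hypothetical, and the handling of malformed entries is an assumption.
.PP
.nf
```python
# Sketch of the charmap rules above; not the utf8trans source code.

def parse_charmap(text):
    """Return a dict mapping each listed character to its translation."""
    table = {}
    for line in text.splitlines():          # the trailing newline is dropped
        entry = line.lstrip(" " + chr(9))   # rule 2: skip leading spaces/tabs
        if entry == "" or entry.startswith("#"):
            continue                        # rules 1 and 3: blank or comment
        # Rule 4: hexadecimal codepoint, then exactly one space or tab.
        end = 0
        while end < len(entry) and entry[end] not in (" ", chr(9)):
            end += 1
        codepoint = int(entry[:end], 16)
        # Rule 5: everything after the single delimiter is kept literally.
        table[chr(codepoint)] = entry[end + 1:]
    return table


def transliterate(text, table):
    """Replace every character found in the table; copy the rest."""
    return "".join(table.get(ch, ch) for ch in text)


# Hypothetical sample map: U+00E9 becomes "e", U+2014 becomes " -- ".
SAMPLE = "# sample map" + chr(10) + "E9 e" + chr(10) + "  2014  -- "
TABLE = parse_charmap(SAMPLE)
RESULT = transliterate("caf" + chr(0xE9) + chr(0x2014) + "bar", TABLE)
```
.fi
.PP
In this sketch the second entry is indented and has two spaces after the
codepoint: the first is the delimiter, the second becomes part of the
translation string, so \*(T<RESULT\*(T> is \(oqcafe -- bar\(cq.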
.PP
The above format is intended to be restrictive, to keep
\fButf8trans\fR simple. If an XML-based format is desired,
the \*(T<\fIxmlcharmap2utf8trans\fR\*(T> script that
comes with the docbook2X distribution converts character
maps in XSLT 2.0 format to the \fButf8trans\fR format.
.SH LIMITATIONS
.TP 0.2i
\(bu
\fButf8trans\fR does not work with binary files, because malformed
UTF-8 sequences in the input are substituted with
U+FFFD characters. However, null characters in the input
are handled correctly. This limitation may be removed in the future.
.TP 0.2i
\(bu
There is no way to include a newline or null character in the substitution string.
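.PP
The first limitation can be illustrated with the Python UTF-8 decoder,
used here only as a stand-in: it is an assumption that it replaces
malformed bytes in the same way as \fButf8trans\fR. The variable names are
hypothetical.
.PP
.nf
```python
# Illustration with the Python decoder, not with utf8trans itself.
raw = bytes([0x41, 0xFF, 0x00, 0x42])    # "A", a stray 0xFF byte, NUL, "B"
text = raw.decode("utf-8", errors="replace")
roundtrip = text.encode("utf-8")

# The stray byte became U+FFFD, so the output no longer matches the input.
lossy = roundtrip != raw
```
.fi
.PP
The null byte survives the round trip, but the malformed byte does not,
which is why binary input cannot be passed through unchanged.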
.SH AUTHOR
Steve Cheng <\*(T<stevecheng@users.sourceforge.net\*(T>>.