<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="HTML Tidy for Linux/x86 (vers 1 September 2005), see www.w3.org" />
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii" />
<title>docbook2X: Character set conversion</title>
<link rel="stylesheet" href="docbook2X.css" type="text/css" />
<link rev="made" href="mailto:stevecheng@users.sourceforge.net" />
<meta name="generator" content="DocBook XSL Stylesheets V1.68.1" />
<meta name="description" content="Discussion on reproducing non-ASCII characters in the converted output" />
<link rel="start" href="docbook2X.html" title="docbook2X: Documentation Table of Contents" />
<link rel="up" href="docbook2X.html" title="docbook2X: Documentation Table of Contents" />
<link rel="prev" href="sgml2xml-isoent.html" title="docbook2X: sgml2xml-isoent" />
<link rel="next" href="utf8trans.html" title="docbook2X: utf8trans" />
</head>
<body>
<div class="navheader">
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3">Character set conversion</th>
</tr>
<tr>
<td><a href="sgml2xml-isoent.html">&lt;&lt; Previous</a></td>
<th></th>
<td><a href="utf8trans.html">Next &gt;&gt;</a></td>
</tr>
</table>
<hr />
</div>
<h2 class="title">Character set conversion</h2>
<a id="id2537983" class="indexterm" name="id2537983"></a>
<a id="id2537990" class="indexterm" name="id2537990"></a>
<a id="id2537997" class="indexterm" name="id2537997"></a>
<a id="id2538612" class="indexterm" name="id2538612"></a>
<a id="id2538619" class="indexterm" name="id2538619"></a>
<a id="id2538625" class="indexterm" name="id2538625"></a>
<a id="id2538632" class="indexterm" name="id2538632"></a>
<a id="id2538639" class="indexterm" name="id2538639"></a>
<a id="id2538649" class="indexterm" name="id2538649"></a>
<a id="id2538656" class="indexterm" name="id2538656"></a>
<p>When XML is translated to legacy ASCII-based formats with poor Unicode support, such as man pages and Texinfo, the Unicode characters in the source document must also be translated into something the target format can represent.</p>
<p>A straightforward character set conversion from Unicode does not suffice, because the target character set, usually US-ASCII or ISO Latin-1, does not contain common characters such as dashes and directional quotation marks that are widely used in XML documents. However, the document formatters (man and Texinfo) allow such characters to be entered with a markup escape: for example, <code class="markup">\(lq</code> for the left directional quote <code class="literal">“</code>. If no markup-level escape is available, an ASCII transliteration can be used instead: for example, the ASCII less-than sign <code class="literal">&lt;</code> for the left angle bracket <code class="literal">〈</code>.</p>
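<p>As a minimal sketch of the two ideas in the preceding paragraph, the following Python first shows a plain US-ASCII encoding failing on a directional quote. It then replaces characters using a table of markup escapes, falling back to an ASCII transliteration. This is not docbook2X code. The table contents, the choice of U+2329 for the angle bracket, and the function name <code>to_roff_ascii</code> are hypothetical and chosen only for illustration.</p>

<pre class="programlisting">
# Hypothetical sketch, not docbook2X code: a plain US-ASCII encoding
# fails on typographic characters, while a lookup table of markup
# escapes, with ASCII transliterations as a fallback, avoids the failure.

# Markup-level escapes understood by the formatter (roff in this sketch).
ESCAPES = {
    "\u201c": "\\(lq",  # left directional quote, roff escape \(lq
}

# ASCII transliterations for characters without a markup-level escape.
TRANSLITERATIONS = {
    "\u2329": "\x3c",  # left angle bracket, ASCII less-than sign (0x3C)
}


def to_roff_ascii(text):
    """Replace known characters, preferring escapes over transliterations."""
    out = []
    for ch in text:
        if ch in ESCAPES:
            out.append(ESCAPES[ch])
        elif ch in TRANSLITERATIONS:
            out.append(TRANSLITERATIONS[ch])
        else:
            out.append(ch)  # unknown characters pass through unchanged
    return "".join(out)


sample = "\u201chello \u2329x"

# A straightforward conversion to US-ASCII raises an error.
try:
    sample.encode("ascii")
except UnicodeEncodeError as err:
    print("plain encoding failed:", err.reason)

# After the replacements, every character in sample is ASCII.
converted = to_roff_ascii(sample)
print(converted.encode("ascii"))
</pre>

<p>Escapes are tried before transliterations because an escape lets the formatter render the real character. The transliteration only approximates it.</p>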
<p>So the Unicode character problem can be solved in two steps:</p>
<ol>
<li>
<p><span class="command"><strong>utf8trans</strong></span>, a program included in docbook2X, maps Unicode characters to markup-level escapes or transliterations.</p>
<p>Since there is not necessarily a fixed, official mapping of Unicode characters to such escapes, <span class="command"><strong>utf8trans</strong></span>, unlike most character set converters, can read user-modifiable character mappings from text files and apply them.</p>
<p>The files <code class="filename">charmaps/man/roff.charmap</code> and <code class="filename">charmaps/man/texi.charmap</code> contain character maps that may be used for man-page and Texinfo conversion, respectively. The programs <a href="db2x_manxml.html"><span class="command"><strong>db2x_manxml</strong></span></a> and <a href="db2x_texixml.html"><span class="command"><strong>db2x_texixml</strong></span></a> apply these character maps, or another character map specified by the user, automatically.</p>
</li>
<li>
<p>The rest of the Unicode text is converted to some other character set (encoding). For example, a French document with accented characters (such as é) might be converted to ISO Latin-1.</p>
<p>This step is applied after the <span class="command"><strong>utf8trans</strong></span> character mapping, using the <span class="citerefentry"><span class="refentrytitle"><span class="command"><strong>iconv</strong></span></span></span> encoding conversion tool. Both <span class="command"><strong>db2x_manxml</strong></span> and <a href="db2x_texixml.html"><span class="command"><strong>db2x_texixml</strong></span></a> can call <span class="citerefentry"><span class="command"><strong>iconv</strong></span></span> automatically when producing their output.</p>
</li>
</ol>
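<p>The two steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not how <span class="command"><strong>utf8trans</strong></span> or <span class="command"><strong>db2x_manxml</strong></span> are implemented. The character map format shown (a hexadecimal code point, whitespace, then the replacement text, with <code>#</code> comments) is assumed for illustration and may differ from the real charmap syntax. The helper names are hypothetical, and Python's <code>latin-1</code> codec stands in for <span class="command"><strong>iconv</strong></span>. The last lines show why the order matters: encoding before mapping fails because ISO Latin-1 has no directional quotation marks.</p>

<pre class="programlisting">
# Hypothetical sketch of the two-step conversion, not docbook2X code.
# Step 1 imitates a utf8trans-style character map; step 2 uses
# Python's latin-1 codec as a stand-in for iconv.

# Assumed charmap format (illustration only): one entry per line,
# a hexadecimal code point, whitespace, then the replacement text.
# Blank lines and lines starting with "#" are ignored.
CHARMAP_TEXT = """
# left and right directional quotes
201C \\(lq
201D \\(rq
# em dash
2014 \\(em
"""


def parse_charmap(text):
    """Return a dict mapping characters to replacement strings."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        replacement = parts[1] if len(parts) == 2 else ""
        table[chr(int(parts[0], 16))] = replacement
    return table


def apply_charmap(text, table):
    """Step 1: replace each mapped character; keep the rest unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


def to_latin1(text):
    """Step 2: encode the remaining Unicode text as ISO Latin-1."""
    return text.encode("latin-1")


source = "\u201cCaf\u00e9\u201d \u2014 menu"
step1 = apply_charmap(source, parse_charmap(CHARMAP_TEXT))
step2 = to_latin1(step1)
print(step1)
print(step2)

# Reversing the order fails: Latin-1 has no directional quotes.
try:
    to_latin1(source)
except UnicodeEncodeError:
    print("encoding before mapping fails")
</pre>

<p>On the command line, the second step corresponds roughly to <code>iconv -f UTF-8 -t ISO-8859-1</code>, where <code>-f</code> and <code>-t</code> name the source and target encodings.</p>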
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td><a href="sgml2xml-isoent.html">&lt;&lt; Previous</a></td>
<td></td>
<td><a href="utf8trans.html">Next &gt;&gt;</a></td>
</tr>
<tr>
<td><span class="command"><strong>sgml2xml-isoent</strong></span></td>
<td><a href="docbook2X.html">Table of Contents</a></td>
<td>utf8trans</td>
</tr>
</table>
</div>
<p><a href="http://docbook2x.sourceforge.net/" title="docbook2X: Home page">docbook2X home page</a></p>
</body>
</html>