<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="HTML Tidy for Linux/x86 (vers 1 September 2005), see www.w3.org" />
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii" />
<title>docbook2X: Character set conversion</title>
<link rel="stylesheet" href="docbook2X.css" type="text/css" />
<link rev="made" href="mailto:stevecheng@users.sourceforge.net" />
<meta name="generator" content="DocBook XSL Stylesheets V1.68.1" />
<meta name="description" content="Discussion on reproducing non-ASCII characters in the converted output" />
</head>
<body>
Character set conversion
<a id="id2537983" class="indexterm" name="id2537983"></a>
<a id="id2537990" class="indexterm" name="id2537990"></a>
<a id="id2537997" class="indexterm" name="id2537997"></a>
<a id="id2538612" class="indexterm" name="id2538612"></a>
<a id="id2538619" class="indexterm" name="id2538619"></a>
<a id="id2538625" class="indexterm" name="id2538625"></a>
<a id="id2538632" class="indexterm" name="id2538632"></a>
<a id="id2538639" class="indexterm" name="id2538639"></a>
<a id="id2538649" class="indexterm" name="id2538649"></a>
<a id="id2538656" class="indexterm" name="id2538656"></a>

When XML is translated to legacy ASCII-based formats with poor Unicode support, such as man pages and Texinfo, the Unicode characters in the source document have to be translated as well.

A straightforward character set conversion from Unicode does not suffice, because the target character set, usually US-ASCII or ISO Latin-1, does not contain common characters such as dashes and directional quotation marks that are widely used in XML documents. The document formatters (man and Texinfo), however, allow such characters to be entered through a markup escape: for example, the roff escape \(lq for the left directional quote “. If no markup-level escape is available, an ASCII transliteration can be used instead: for example, the ASCII less-than sign &lt; for the angle quotation mark .
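As a minimal sketch of the choice just described, the Python below prefers a markup-level escape, falls back to an ASCII transliteration, and otherwise leaves the character for the later encoding step. The tables and the names translate_char and translate are hypothetical illustrations, not the contents of the docbook2X character maps: \(lq comes from the text above, \(rq and \(em are standard roff escapes, and the arrow-to-"->" transliteration is an arbitrary example.

```python
# Per-character choice between a markup escape and an ASCII transliteration.
# HYPOTHETICAL tables for illustration only; not docbook2X's shipped charmaps.

# Characters the target formatter (roff) can express with a markup escape.
ROFF_ESCAPES = {
    "\u201c": r"\(lq",  # left double quotation mark
    "\u201d": r"\(rq",  # right double quotation mark
    "\u2014": r"\(em",  # em dash
}

# Plain-ASCII stand-ins for characters that have no entry in ROFF_ESCAPES.
ASCII_FALLBACKS = {
    "\u2192": "->",  # rightwards arrow (arbitrary example)
}


def translate_char(ch: str) -> str:
    """Prefer a markup escape, then a transliteration, else keep ch as-is."""
    if ch in ROFF_ESCAPES:
        return ROFF_ESCAPES[ch]
    if ch in ASCII_FALLBACKS:
        return ASCII_FALLBACKS[ch]
    return ch  # e.g. accented letters, left for the encoding-conversion step


def translate(text: str) -> str:
    """Translate every character of text with translate_char."""
    return "".join(translate_char(ch) for ch in text)


if __name__ == "__main__":
    print(translate("\u201cquoted\u201d a \u2192 b"))
```

Keeping the two tables separate mirrors the order of preference in the text: an escape is tried first because the formatter renders it as the real glyph, whereas a transliteration only approximates it.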

So the Unicode character problem can be solved in two steps:

  1. utf8trans, a program included in docbook2X, maps Unicode characters
    to markup-level escapes or transliterations.

    Since there is not necessarily a fixed, official mapping for Unicode
    characters, utf8trans, unlike most character set converters, can read
    user-modifiable character maps from text files and apply them.

    The files charmaps/man/roff.charmap and charmaps/man/texi.charmap
    contain character maps that can be used for man-page and Texinfo
    conversion. The programs <a href="db2x_manxml.html">db2x_manxml</a>
    and <a href="db2x_texixml.html">db2x_texixml</a> apply these
    character maps, or another character map specified by the user,
    automatically.

  2. The rest of the Unicode text is converted to another character set
    (encoding). For example, a French document with accented characters
    (such as é) might be converted to ISO Latin-1.

    This step is applied after the utf8trans character mapping, using the
    iconv encoding conversion tool. Both db2x_manxml and
    <a href="db2x_texixml.html">db2x_texixml</a> can call iconv
    automatically when producing their output.
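The two steps above can be sketched in Python as follows. This is not utf8trans itself: the character-map format assumed here (a hexadecimal code point, whitespace, then the replacement text, with # starting a comment line) is a simplification for illustration, the sample entries and the names parse_charmap and convert are hypothetical, and Python's built-in iso-8859-1 codec stands in for iconv.

```python
# Sketch of the two-step conversion: character mapping, then re-encoding.
# ASSUMPTION: the charmap format below (hex code point, whitespace,
# replacement; '#' comment lines) is simplified for illustration.

SAMPLE_CHARMAP = """\
# hypothetical roff-style entries
201C \\(lq
201D \\(rq
2014 \\(em
"""


def parse_charmap(text: str) -> dict[str, str]:
    """Parse the assumed charmap format into {character: replacement}."""
    mapping: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split(None, 1)
        replacement = fields[1] if len(fields) > 1 else ""
        mapping[chr(int(fields[0], 16))] = replacement
    return mapping


def convert(text: str, charmap: dict[str, str],
            encoding: str = "iso-8859-1") -> bytes:
    """Apply the character map, then encode whatever text remains."""
    # Step 1: replace mapped characters (the role utf8trans plays).
    mapped = text.translate(str.maketrans(charmap))
    # Step 2: encode the rest; Python's codec stands in for iconv here.
    # Strict error handling raises UnicodeEncodeError for unencodable text.
    return mapped.encode(encoding)


if __name__ == "__main__":
    charmap = parse_charmap(SAMPLE_CHARMAP)
    print(convert("\u201cCaf\u00e9\u201d \u2014 menu", charmap))
```

With the default strict error handling, a character that the map does not cover and that ISO Latin-1 cannot represent (U+2192, for instance) raises UnicodeEncodeError rather than disappearing silently. That is why the mapping step runs first: it removes the characters the target encoding lacks before the encoding conversion sees them.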


<a href="http://docbook2x.sourceforge.net/" title="docbook2X: Home page">docbook2X home page</a>

    </body>
    </html>