<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><link rel="SHORTCUT ICON" href="/favicon.ico" /><style type="text/css">
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
</style><title>Entities or no entities</title></head><body bgcolor="#8b7765" text="#000000" link="#a06060" vlink="#000000">

The XML C parser and toolkit of Gnome

Entities or no entities


Entities in principle are similar to simple C macros. An entity defines an
abbreviation for a given string that you can reuse many times throughout the
content of your document. Entities are especially useful when a given string
may occur frequently within a document, or to confine the change needed to a
document to a restricted area in the internal subset of the document (at the
beginning). Example:

1 <?xml version="1.0"?>
2 <!DOCTYPE EXAMPLE SYSTEM "example.dtd" [
3 <!ENTITY xml "Extensible Markup Language">
4 ]>
5 <EXAMPLE>
6    &xml;
7 </EXAMPLE>

Line 3 declares the xml entity. Line 6 uses the xml entity by prefixing
its name with '&' and following it with ';', without any added spaces. XML
defines five predefined entities, all supported by libxml2, that let you
escape characters with a special meaning in some parts of the XML document
content:
&lt; for the character '<', &gt; for the character '>',
&apos; for the character ''', &quot; for the character '"', and
&amp; for the character '&'.
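
As a minimal sketch of what escaping with the five predefined entities
amounts to, here is a small stand-alone C helper. The name
escape_predefined() is hypothetical and is not part of libxml2; it only
illustrates the character-to-entity mapping listed above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a libxml2 function: returns a newly allocated
 * copy of src in which the five characters with a predefined entity are
 * replaced by the matching entity reference. The caller frees the result.
 * Returns NULL if the allocation fails. */
char *escape_predefined(const char *src)
{
    assert(src != NULL);

    /* Worst case: every byte expands to "&quot;" or "&apos;" (6 bytes). */
    size_t len = strlen(src);
    char *out = malloc(len * 6 + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (const char *s = src; *s != '\0'; s++) {
        const char *rep = NULL;
        switch (*s) {
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '&':  rep = "&amp;";  break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&apos;"; break;
        default:   break;
        }
        if (rep != NULL) {
            /* Copy the entity reference in place of the character. */
            size_t n = strlen(rep);
            memcpy(p, rep, n);
            p += n;
        } else {
            /* Ordinary character: copy it unchanged. */
            *p++ = *s;
        }
    }
    *p = '\0';
    return out;
}
```

The '&' character has to be handled in the same pass as the others, as
done here; escaping it in a second pass would also rewrite the '&' of the
references already produced.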

One of the problems related to entities is that you may want the parser to
substitute an entity's content so that you can see the replacement text in
your application. Or you may prefer to keep entity references as such in the
content, to be able to save the document back without losing this usually
precious information (if the user went through the pain of explicitly
defining entities, they may have a rather negative attitude if you blindly
substitute them at saving time). The xmlSubstituteEntitiesDefault()
function allows you to check and change the behaviour, which is to not
substitute entities by default.

Here is the DOM tree built by libxml2 for the previous document in the
default case:

/gnome/src/gnome-xml -> ./xmllint --debug test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content=
     ENTITY_REF
       INTERNAL_GENERAL_ENTITY xml
       content=Extensible Markup Language
     TEXT
     content=

And here is the result when substituting entities:

/gnome/src/gnome-xml -> ./xmllint --debug --noent test/ent1
DOCUMENT
version=1.0
   ELEMENT EXAMPLE
     TEXT
     content=     Extensible Markup Language

So, entities or no entities? Basically, it depends on your use case. I
suggest that you keep the non-substituting default behaviour and avoid using
entities in your XML document or data if you are not willing to handle the
entity reference nodes in the DOM tree.

Note that at save time libxml2 enforces the conversion of characters to the
predefined entities where necessary to prevent well-formedness problems. When
parsing, it transparently replaces the predefined entities with the
corresponding characters (i.e. it will not generate entity reference nodes
in the DOM tree or call the reference() SAX callback when finding them in
the input).

WARNING: handling entities on top of the libxml2 SAX interface is difficult!
If you plan to use non-predefined entities in your documents, then the
learning curve to handle them using the SAX API may be long. If you plan to
use complex documents, I strongly suggest you consider using the DOM
interface instead and let libxml2 deal with the complexity rather than
trying to do it yourself.

Daniel Veillard

</body></html>