<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><link rel="SHORTCUT ICON" href="/favicon.ico" /><style type="text/css">
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
</style><title>The tree output</title></head><body bgcolor="#8b7765" text="#000000" link="#a06060" vlink="#000000">

The XML C parser and toolkit of Gnome

The tree output


The parser returns a tree built during the document analysis. The value returned is an xmlDocPtr (i.e., a pointer to an xmlDoc structure). This structure contains information such as the file name, the document type, and a children pointer to the first top-level node of the document (the xmlDoc itself is the root of the tree). The tree is made of xmlNodes, chained in doubly linked lists of siblings and linked to each other through children and parent pointers. An xmlNode can also carry properties (a chain of xmlAttr structures). An attribute may have a value, which is a list of TEXT or ENTITY_REF nodes.
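
As a rough illustration of the links just described, here is a minimal sketch in C. The Node and Attr types and the add_child and count_children helpers are hypothetical, simplified stand-ins written for this page, not the libxml2 definitions; the real xmlNode and xmlAttr structures carry more fields.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for xmlNode and xmlAttr (assumption: only the
 * linking fields are modelled here, the real structures have more). */
typedef struct Node Node;

typedef struct Attr {
    const char  *name;
    Node        *children;   /* value: a list of TEXT / ENTITY_REF nodes */
    struct Attr *next;       /* next attribute on the same element */
} Attr;

struct Node {
    const char *name;
    Node *parent;      /* link up to the enclosing node */
    Node *children;    /* first child */
    Node *last;        /* last child */
    Node *next;        /* next sibling */
    Node *prev;        /* previous sibling */
    Attr *properties;  /* chain of attributes */
};

/* Append child at the end of parent's child list, wiring the parent
 * link and both directions of the sibling list. */
static void add_child(Node *parent, Node *child)
{
    assert(parent != NULL && child != NULL);
    child->parent = parent;
    child->next = NULL;
    child->prev = parent->last;
    if (parent->last != NULL)
        parent->last->next = child;
    else
        parent->children = child;
    parent->last = child;
}

/* Count the direct children of a node by following the sibling chain. */
static int count_children(const Node *node)
{
    int n = 0;
    for (const Node *cur = node->children; cur != NULL; cur = cur->next)
        n++;
    return n;
}
```

Walking from children through next visits the siblings in document order, and each node's parent pointer leads back up, which is the same navigation pattern the tree described above allows.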

Here is an example (erroneous with respect to the XML spec, since there should be only one ELEMENT under the root):

[Figure: structure.gif]

In the source package there is a small program (not installed by default) called xmllint, which parses the XML files given as arguments and prints them back as parsed. This is useful for detecting errors both in XML code and in the XML parser itself. It has a --debug option which prints the actual in-memory structure of the document; here is the result with the example given before:

DOCUMENT
version=1.0
standalone=true
  ELEMENT EXAMPLE
    ATTRIBUTE prop1
      TEXT
      content=gnome is great
    ATTRIBUTE prop2
      ENTITY_REF
      TEXT
      content= linux too 
    ELEMENT head
      ELEMENT title
        TEXT
        content=Welcome to Gnome
    ELEMENT chapter
      ELEMENT title
        TEXT
        content=The Linux adventure
      ELEMENT p
        TEXT
        content=bla bla bla ...
      ELEMENT image
        ATTRIBUTE href
          TEXT
          content=linus.gif
      ELEMENT p
        TEXT
        content=...

This should be useful for learning the internal representation model.
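
To make the shape of that dump easier to relate to the tree, here is a small sketch in C of a recursive walk that prints ELEMENT and TEXT nodes with two spaces of indentation per level. It is not the xmllint implementation: the Node type, the NodeType enum and dump_node are hypothetical names over a simplified model, and attributes and entity references are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical subset of node kinds, for illustration only. */
typedef enum { NODE_ELEMENT, NODE_TEXT } NodeType;

typedef struct Node {
    NodeType type;
    const char *name;      /* element name, unused for TEXT */
    const char *content;   /* text content, unused for ELEMENT */
    struct Node *children; /* first child */
    struct Node *next;     /* next sibling */
} Node;

/* Print node and its following siblings to out, recursing into the
 * children of each element one level deeper. Returns the number of
 * lines written. */
static int dump_node(FILE *out, const Node *node, int depth)
{
    int lines = 0;

    assert(out != NULL);
    for (; node != NULL; node = node->next) {
        if (node->type == NODE_ELEMENT) {
            fprintf(out, "%*sELEMENT %s\n", depth * 2, "", node->name);
            lines += 1 + dump_node(out, node->children, depth + 1);
        } else {
            /* A TEXT node takes two lines: the kind, then its content. */
            fprintf(out, "%*sTEXT\n", depth * 2, "");
            fprintf(out, "%*scontent=%s\n", depth * 2, "", node->content);
            lines += 2;
        }
    }
    return lines;
}
```

Calling dump_node(stdout, head, 2) on a head element holding a title element with one TEXT child writes four lines, indented like the head subtree in the listing above.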

Daniel Veillard

</body></html>