<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><link rel="SHORTCUT ICON" href="/favicon.ico" /><style type="text/css">
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
</style><title>The parser interfaces</title></head><body bgcolor="#8b7765" text="#000000" link="#a06060" vlink="#000000">

The XML C parser and toolkit of Gnome

The parser interfaces


This section is intended to help programmers get bootstrapped using the
XML toolkit from the C language. It is not intended to be extensive. I hope
the automatically generated documents will provide the completeness
required, but as a separate set of documents. The interfaces of the XML
parser are by principle low level. Those interested in a higher level API
should look at DOM.

The parser interfaces for XML are separated from the HTML parser
interfaces. Let's have a look at how the XML parser can be called:

Invoking the parser: the pull method

Usually, the first thing to do is to read an XML input. The parser accepts
documents either from in-memory strings or from files. The functions are
defined in "parser.h":

xmlDocPtr xmlParseMemory(const char *buffer, int size);

Parse a document held in an in-memory buffer of size bytes.

xmlDocPtr xmlParseFile(const char *filename);

Parse an XML document contained in a (possibly compressed) file.

The parser returns a pointer to the document structure (or NULL in case of
failure).

Invoking the parser: the push method

In order for the application to keep control while the document is being
fetched (which is common for GUI based programs), libxml2 also provides a
push interface, as of version 1.8.3. Here are the interface functions:

xmlParserCtxtPtr xmlCreatePushParserCtxt(xmlSAXHandlerPtr sax,
                                         void *user_data,
                                         const char *chunk,
                                         int size,
                                         const char *filename);
int              xmlParseChunk          (xmlParserCtxtPtr ctxt,
                                         const char *chunk,
                                         int size,
                                         int terminate);

and here is a simple example showing how to use the interface:

            FILE *f;

            f = fopen(filename, "rb");
            if (f != NULL) {
                int res, size = 1024;
                char chars[1024];
                xmlParserCtxtPtr ctxt;

                /* hand over the first 4 bytes when creating the context */
                res = fread(chars, 1, 4, f);
                if (res > 0) {
                    ctxt = xmlCreatePushParserCtxt(NULL, NULL,
                                chars, res, filename);
                    /* feed the rest of the file chunk by chunk */
                    while ((res = fread(chars, 1, size, f)) > 0) {
                        xmlParseChunk(ctxt, chars, res, 0);
                    }
                    /* signal the end of the input, then fetch the tree */
                    xmlParseChunk(ctxt, chars, 0, 1);
                    doc = ctxt->myDoc;
                    xmlFreeParserCtxt(ctxt);
                }
                fclose(f);
            }

The HTML parser embedded into libxml2 also has a push interface; the
functions are just prefixed by "html" rather than "xml".

Invoking the parser: the SAX interface

The tree-building interface makes the parser memory-hungry, first loading
the document in memory and then building the tree itself. Reading a document
without building the tree is possible using the SAX interfaces (see SAX.h and
James Henstridge's documentation). Note also that the push interface can be
limited to SAX: just use the first two arguments of
xmlCreatePushParserCtxt().

Building a tree from scratch

The other way to get an XML tree in memory is by building it. Basically
there is a set of functions dedicated to building new elements. (These are
also described in <libxml/tree.h>.) For example, here is a piece of
code that produces the XML document used in the previous examples:

    #include <libxml/tree.h>
    xmlDocPtr doc;
    xmlNodePtr tree, subtree;

    /* BAD_CAST converts C string literals to the xmlChar * the API expects */
    doc = xmlNewDoc(BAD_CAST "1.0");
    doc->children = xmlNewDocNode(doc, NULL, BAD_CAST "EXAMPLE", NULL);
    xmlSetProp(doc->children, BAD_CAST "prop1", BAD_CAST "gnome is great");
    xmlSetProp(doc->children, BAD_CAST "prop2", BAD_CAST "& linux too");
    tree = xmlNewChild(doc->children, NULL, BAD_CAST "head", NULL);
    subtree = xmlNewChild(tree, NULL, BAD_CAST "title",
                          BAD_CAST "Welcome to Gnome");
    tree = xmlNewChild(doc->children, NULL, BAD_CAST "chapter", NULL);
    subtree = xmlNewChild(tree, NULL, BAD_CAST "title",
                          BAD_CAST "The Linux adventure");
    subtree = xmlNewChild(tree, NULL, BAD_CAST "p", BAD_CAST "bla bla bla ...");
    subtree = xmlNewChild(tree, NULL, BAD_CAST "image", NULL);
    xmlSetProp(subtree, BAD_CAST "href", BAD_CAST "linus.gif");

Not really rocket science ...

Traversing the tree

Basically, by including "tree.h", your code has access to the internal
structure of all the elements of the tree. The field names are simple ones
like parent, children, next, prev, properties, etc. For example, still with
the previous example:

doc->children->children->children

points to the title element,

doc->children->children->next->children->children

points to the text node containing the chapter title "The Linux
adventure".

NOTE: XML allows PIs and comments to be present before the document root,
so doc->children may point to a node which is not the document Root Element;
a function xmlDocGetRootElement() was added for this purpose.

Modifying the tree

Functions are provided for reading and writing the document content. Here
is an excerpt from the tree API:

xmlAttrPtr xmlSetProp(xmlNodePtr node, const xmlChar *name, const xmlChar *value);

This sets (or changes) an attribute carried by an ELEMENT node. The value
can be NULL.

xmlChar *xmlGetProp(xmlNodePtr node, const xmlChar *name);

This function returns a pointer to a new copy of the property content. Note
that the user must deallocate the result (with xmlFree()).

Two functions are provided for reading and writing the text associated
with elements:

xmlNodePtr xmlStringGetNodeList(xmlDocPtr doc, const xmlChar *value);

This function takes an "external" string and converts it to one text node
or possibly to a list of entity and text nodes. All non-predefined entity
references like &Gnome; will be stored internally as entity nodes, hence the
result of the function may not be a single node.

xmlChar *xmlNodeListGetString(xmlDocPtr doc, xmlNodePtr list, int inLine);

This function is the inverse of xmlStringGetNodeList(). It generates a new
string containing the content of the text and entity nodes. Note the extra
argument inLine. If this argument is set to 1, the function will expand
entity references. For example, instead of returning the &Gnome; XML
encoding in the string, it will substitute it with its value (say, "GNU
Network Object Model Environment").

Saving a tree

Basically three options are possible:

void xmlDocDumpMemory(xmlDocPtr cur, xmlChar **mem, int *size);

Stores in *mem a newly allocated buffer into which the document has been
saved, and its length in *size. The caller must free the buffer with
xmlFree().

int xmlDocDump(FILE *f, xmlDocPtr doc);

Dumps a document to an open file stream.

int xmlSaveFile(const char *filename, xmlDocPtr cur);

Saves the document to a file. In this case, the compression interface is
triggered if it has been turned on.

Compression

The library transparently handles compression when doing file-based
accesses. The level of compression on saves can be set either globally or
individually for one document:

int xmlGetDocCompressMode (xmlDocPtr doc);

Gets the document compression ratio (0-9).

void xmlSetDocCompressMode (xmlDocPtr doc, int mode);

Sets the document compression ratio.

int xmlGetCompressMode(void);

Gets the default compression ratio.

void xmlSetCompressMode(int mode);

Sets the default compression ratio.

Daniel Veillard

</body></html>