<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><link rel="SHORTCUT ICON" href="/favicon.ico" /><style type="text/css">
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
</style><title>The parser interfaces</title></head><body bgcolor="#8b7765" text="#000000" link="#a06060" vlink="#000000">

The XML C parser and toolkit of Gnome

The parser interfaces


This section is intended to help programmers get started using the XML
toolkit from the C language. It is not intended to be exhaustive. I hope the
automatically generated documents will provide the completeness required, but
as a separate set of documents. The interfaces of the XML parser are by
design low level. Those interested in a higher-level API should look at DOM.

The parser interfaces for XML are separated from the HTML parser
interfaces. Let's have a look at how the XML parser can be called:

Invoking the parser: the pull method

Usually, the first thing to do is to read an XML input. The parser accepts
documents either from in-memory strings or from files. The functions are
defined in "parser.h":

xmlDocPtr xmlParseMemory(const char *buffer, int size);

Parse an in-memory buffer of size bytes containing the document.

xmlDocPtr xmlParseFile(const char *filename);

Parse an XML document contained in a (possibly compressed) file.

The parser returns a pointer to the document structure (or NULL in case of
failure).

Invoking the parser: the push method

In order for the application to keep control while the document is being
fetched (which is common for GUI-based programs), libxml2 also provides a
push interface, as of version 1.8.3. Here are the interface functions:

xmlParserCtxtPtr xmlCreatePushParserCtxt(xmlSAXHandlerPtr sax,
                                         void *user_data,
                                         const char *chunk,
                                         int size,
                                         const char *filename);
int              xmlParseChunk          (xmlParserCtxtPtr ctxt,
                                         const char *chunk,
                                         int size,
                                         int terminate);

and here is a simple example showing how to use the interface:

            /* filename (const char *) and doc (xmlDocPtr) are declared
             * by the surrounding code */
            FILE *f;

            f = fopen(filename, "r");
            if (f != NULL) {
                int res, size = 1024;
                char chars[1024];
                xmlParserCtxtPtr ctxt;

                /* read the first 4 bytes to create the parser context */
                res = fread(chars, 1, 4, f);
                if (res > 0) {
                    ctxt = xmlCreatePushParserCtxt(NULL, NULL,
                                chars, res, filename);
                    /* feed the rest of the file chunk by chunk */
                    while ((res = fread(chars, 1, size, f)) > 0) {
                        xmlParseChunk(ctxt, chars, res, 0);
                    }
                    /* a last call with terminate set ends the parse */
                    xmlParseChunk(ctxt, chars, 0, 1);
                    doc = ctxt->myDoc;
                    xmlFreeParserCtxt(ctxt);
                }
                fclose(f);
            }

The HTML parser embedded into libxml2 also has a push interface; the
functions are just prefixed by "html" rather than "xml".

Invoking the parser: the SAX interface

The tree-building interface makes the parser memory-hungry, first loading
the document in memory and then building the tree itself. Reading a document
without building the tree is possible using the SAX interfaces (see SAX.h and
James Henstridge's documentation). Note also that the push interface can be
limited to SAX: just use the first two arguments of
xmlCreatePushParserCtxt().

Building a tree from scratch

The other way to get an XML tree in memory is by building it. Basically
there is a set of functions dedicated to building new elements. (These are
also described in <libxml/tree.h>.) For example, here is a piece of
code that produces the XML document used in the previous examples:

    #include <libxml/tree.h>

    xmlDocPtr doc;
    xmlNodePtr tree, subtree;

    /* BAD_CAST casts C string literals to the xmlChar * used by the API */
    doc = xmlNewDoc(BAD_CAST "1.0");
    doc->children = xmlNewDocNode(doc, NULL, BAD_CAST "EXAMPLE", NULL);
    xmlSetProp(doc->children, BAD_CAST "prop1", BAD_CAST "gnome is great");
    xmlSetProp(doc->children, BAD_CAST "prop2", BAD_CAST "& linux too");
    tree = xmlNewChild(doc->children, NULL, BAD_CAST "head", NULL);
    subtree = xmlNewChild(tree, NULL, BAD_CAST "title", BAD_CAST "Welcome to Gnome");
    tree = xmlNewChild(doc->children, NULL, BAD_CAST "chapter", NULL);
    subtree = xmlNewChild(tree, NULL, BAD_CAST "title", BAD_CAST "The Linux adventure");
    subtree = xmlNewChild(tree, NULL, BAD_CAST "p", BAD_CAST "bla bla bla ...");
    subtree = xmlNewChild(tree, NULL, BAD_CAST "image", NULL);
    xmlSetProp(subtree, BAD_CAST "href", BAD_CAST "linus.gif");

Not really rocket science ...

Traversing the tree

By including "tree.h", your code has access to the internal structure of
all the elements of the tree. The field names are meant to be simple:
parent, children, next, prev, properties, etc. For example, still with the
previous example:

doc->children->children->children

points to the title element,

doc->children->children->next->children->children

points to the text node containing the chapter title "The Linux
adventure".

NOTE: XML allows PIs and comments to be present before the document root,
so doc->children may point to a node that is not the document root element;
the function xmlDocGetRootElement() was added for this purpose.

Modifying the tree

Functions are provided for reading and writing the document content. Here
is an excerpt from the tree API:

xmlAttrPtr xmlSetProp(xmlNodePtr node, const xmlChar *name, const xmlChar *value);

This sets (or changes) an attribute carried by an ELEMENT node. The value
can be NULL.

xmlChar *xmlGetProp(xmlNodePtr node, const xmlChar *name);

This function returns a pointer to a new copy of the property content. Note
that the user must deallocate the result with xmlFree().

Two functions are provided for reading and writing the text associated
with elements:

xmlNodePtr xmlStringGetNodeList(xmlDocPtr doc, const xmlChar *value);

This function takes an "external" string and converts it to one text node
or possibly to a list of entity and text nodes. All non-predefined entity
references like &Gnome; will be stored internally as entity nodes, hence the
result of the function may not be a single node.

xmlChar *xmlNodeListGetString(xmlDocPtr doc, xmlNodePtr list, int inLine);

This function is the inverse of xmlStringGetNodeList(). It generates a new
string containing the content of the text and entity nodes. Note the extra
argument inLine. If this argument is set to 1, the function will expand
entity references. For example, instead of returning the &Gnome; XML
encoding in the string, it will substitute it with its value (say, "GNU
Network Object Model Environment").

Saving a tree

Three options are available:

void xmlDocDumpMemory(xmlDocPtr cur, xmlChar **mem, int *size);

Stores in *mem a newly allocated buffer into which the document has been
saved, and its length in *size. The caller releases the buffer with
xmlFree().

int xmlDocDump(FILE *f, xmlDocPtr doc);

Dumps a document to an open file stream.

int xmlSaveFile(const char *filename, xmlDocPtr cur);

Saves the document to a file. In this case, the compression interface is
triggered if it has been turned on.

Compression

The library transparently handles compression when doing file-based
accesses. The level of compression on saves can be set either globally or
individually for one document:

int xmlGetDocCompressMode(xmlDocPtr doc);

Gets the document compression ratio (0-9).

void xmlSetDocCompressMode(xmlDocPtr doc, int mode);

Sets the document compression ratio.

int xmlGetCompressMode(void);

Gets the default compression ratio.

void xmlSetCompressMode(int mode);

Sets the default compression ratio.

Daniel Veillard

</body></html>