
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><link rel="SHORTCUT ICON" href="/favicon.ico" /><style type="text/css">
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
</style><title>I/O Interfaces</title></head><body bgcolor="#8b7765" text="#000000" link="#a06060" vlink="#000000">
The XML C parser and toolkit of Gnome

I/O Interfaces

Table of Contents:

  1. General overview
  2. The basic buffer type
  3. Input I/O handlers
  4. Output I/O handlers
  5. The entities loader
  6. Example of customized I/O

    General overview

    The module xmlIO.h provides the interfaces to the libxml2 I/O system.
    This consists of 4 main parts:

    • Entities loader: a routine which tries to fetch the entities (files)
      based on their PUBLIC and SYSTEM identifiers. The default loader does
      not look at the public identifier since libxml2 does not maintain a
      catalog. You can install your own entity loader by using
      xmlGetExternalEntityLoader() and xmlSetExternalEntityLoader(). Check
      the example in the section on the entities loader.
    • Input I/O buffers: a convenience structure used by the input layer of
      the parser(s) to fetch the data that feeds the parser. It provides
      buffering and is also the place where the encoding converters to
      UTF-8 are attached.
    • Output I/O buffers: similar to the Input ones, they fulfill the same
      task when generating a serialization from a tree.
    • A mechanism to register sets of I/O callbacks and associate them with
      specific naming schemes, such as the protocol part of a URI. This
      affects the default I/O operations and allows specific I/O handlers
      to be used for certain names.

      The general mechanism used when loading http://rpmfind.net/xml.html,
      for example in the HTML parser, is the following:

      1. The default entity loader calls xmlNewInputFromFile() with the
         parsing context and the URI string.
      2. The URI string is checked against the existing registered handlers
         using their match() callback function. If the HTTP module was
         compiled in, it is registered and its match() function will succeed.
      3. The open() function of the handler is called and, if successful,
         returns an I/O Input buffer.
      4. The parser then starts reading from this buffer and progressively
         fetches information from the resource, calling the read() function
         of the handler until the resource is exhausted.
      5. If an encoding change is detected, it is installed on the input
         buffer, providing buffering and efficient use of the conversion
         routines.
      6. Once the parser has finished, the close() function of the handler
         is called once and the Input buffer and associated resources are
         deallocated.

        User-defined callbacks are checked first, so they can override the
        default libxml2 I/O routines.
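
The match/open/read/close sequence described above can be modeled in a few lines of plain C. This is a simplified stand-in and not libxml2 code: the io_handler struct, register_handler(), find_handler(), load_resource() and the "mem:" scheme are hypothetical names chosen for illustration. The sketch assumes that the most recently registered handler is tried first, which is one way of letting user-defined callbacks take precedence over the defaults. In libxml2 itself, the registration entry point is xmlRegisterInputCallbacks().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one registered set of input callbacks. */
typedef struct {
    int   (*match_cb)(const char *uri);              /* nonzero if uri is handled */
    void *(*open_cb)(const char *uri);               /* context, or NULL on error */
    int   (*read_cb)(void *ctx, char *buf, int len); /* bytes read, 0 at the end */
    int   (*close_cb)(void *ctx);
} io_handler;

#define MAX_HANDLERS 8
static io_handler handlers[MAX_HANDLERS];
static int n_handlers = 0;

/* Append a handler; returns its index, or -1 if the table is full. */
int register_handler(io_handler h) {
    assert(h.match_cb && h.open_cb && h.read_cb && h.close_cb);
    if (n_handlers >= MAX_HANDLERS)
        return -1;
    handlers[n_handlers] = h;
    return n_handlers++;
}

/* Scan from newest to oldest so later registrations take precedence. */
const io_handler *find_handler(const char *uri) {
    for (int i = n_handlers - 1; i >= 0; i--)
        if (handlers[i].match_cb(uri))
            return &handlers[i];
    return NULL;
}

/* Toy "mem:" scheme: the resource content is the text after the prefix. */
typedef struct { const char *data; size_t pos; } mem_ctx;
static mem_ctx the_ctx; /* one static context keeps the sketch short */

int mem_match(const char *uri) { return strncmp(uri, "mem:", 4) == 0; }

void *mem_open(const char *uri) {
    the_ctx.data = uri + 4;
    the_ctx.pos = 0;
    return &the_ctx;
}

int mem_read(void *ctx, char *buf, int len) {
    mem_ctx *m = ctx;
    size_t left = strlen(m->data) - m->pos;
    size_t n = left < (size_t)len ? left : (size_t)len;
    memcpy(buf, m->data + m->pos, n);
    m->pos += n;
    return (int)n;
}

int mem_close(void *ctx) { (void)ctx; return 0; }

/* match -> open -> read until exhausted -> close.
 * Returns the number of bytes stored in out, or -1 if no handler
 * matched or open failed. */
int load_resource(const char *uri, char *out, int outlen) {
    const io_handler *h = find_handler(uri);
    if (h == NULL) return -1;
    void *ctx = h->open_cb(uri);
    if (ctx == NULL) return -1;
    int total = 0;
    while (total < outlen) {
        int room = outlen - total;
        /* small chunks, to force several read calls */
        int n = h->read_cb(ctx, out + total, room < 4 ? room : 4);
        if (n <= 0) break;
        total += n;
    }
    h->close_cb(ctx);
    return total;
}
```

Registering { mem_match, mem_open, mem_read, mem_close } and then calling load_resource() on "mem:hello world" walks through every step of the list, while a URI with no matching handler is rejected before any open call is made.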

        The basic buffer type

        All buffer manipulation is done using the xmlBuffer type defined in
        tree.h, which is a resizable memory buffer. The buffer allocation
        strategy can be selected to be either best-fit or exponential
        doubling (a CPU vs. memory use trade-off). The values are
        XML_BUFFER_ALLOC_EXACT and XML_BUFFER_ALLOC_DOUBLEIT. The strategy
        can be set for an individual buffer using
        xmlBufferSetAllocationScheme(), or on a system-wide basis using
        xmlSetBufferAllocationScheme(). A number of functions for
        manipulating buffers are available, with names starting with the
        xmlBuffer... prefix.
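
To make the CPU vs. memory trade-off concrete, here is a minimal sketch of a growable buffer with two growth policies. It is a model written for this page, not the xmlBuffer implementation: grow_buf, ALLOC_EXACT, ALLOC_DOUBLEIT, count_reallocs() and the initial size of 16 bytes are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names mirroring the two strategies described above. */
typedef enum { ALLOC_EXACT, ALLOC_DOUBLEIT } alloc_scheme;

typedef struct {
    char        *content;
    size_t       use;      /* bytes stored */
    size_t       size;     /* bytes allocated */
    alloc_scheme scheme;
    unsigned     reallocs; /* how many times the storage had to grow */
} grow_buf;

void grow_buf_init(grow_buf *b, alloc_scheme scheme) {
    memset(b, 0, sizeof *b);
    b->scheme = scheme;
}

/* Append len bytes; returns 0 on success, -1 on allocation failure. */
int grow_buf_add(grow_buf *b, const char *data, size_t len) {
    assert(b != NULL && data != NULL);
    size_t needed = b->use + len;
    if (needed > b->size) {
        size_t new_size;
        if (b->scheme == ALLOC_EXACT) {
            new_size = needed;                 /* best fit: no slack at all */
        } else {
            new_size = b->size ? b->size : 16; /* assumed initial size */
            while (new_size < needed)
                new_size *= 2;                 /* exponential doubling */
        }
        char *p = realloc(b->content, new_size);
        if (p == NULL) return -1;
        b->content = p;
        b->size = new_size;
        b->reallocs++;
    }
    memcpy(b->content + b->use, data, len);
    b->use += len;
    return 0;
}

void grow_buf_free(grow_buf *b) {
    free(b->content);
    memset(b, 0, sizeof *b);
}

/* Append n single bytes and report how many reallocations were needed. */
unsigned count_reallocs(alloc_scheme scheme, size_t n) {
    grow_buf b;
    grow_buf_init(&b, scheme);
    for (size_t i = 0; i < n; i++)
        if (grow_buf_add(&b, "x", 1) != 0) break;
    unsigned r = b.reallocs;
    grow_buf_free(&b);
    return r;
}
```

With this model, appending 1000 single bytes costs 1000 reallocations under the exact policy and 7 under doubling (16, 32, ... 1024 bytes), at the price of 24 unused bytes at the end.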

        Input I/O handlers

        An Input I/O handler is a simple structure, xmlParserInputBuffer,
        containing a context associated with the resource (file descriptor,
        or pointer to a protocol handler), the read() and close() callbacks
        to use, and an xmlBuffer. An extra xmlBuffer and a charset encoding
        handler are also present to support charset conversion when needed.
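
The role of the second buffer and of the encoding handler may be easier to see in a small model. The sketch below is not the xmlParserInputBuffer definition: input_buf, input_buf_fill(), mem_src and latin1_to_utf8() are hypothetical names, the buffers are fixed-size arrays for brevity, and ISO-8859-1 is picked as the source encoding only because its conversion to UTF-8 fits in a few lines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*read_fn)(void *context, char *buffer, int len);
typedef int (*close_fn)(void *context);
/* Converts inlen bytes to UTF-8; returns bytes written, or -1 if out is full. */
typedef int (*convert_fn)(const unsigned char *in, int inlen,
                          unsigned char *out, int outlen);

typedef struct {
    void         *context;       /* file descriptor, protocol state, ... */
    read_fn       readcallback;
    close_fn      closecallback;
    convert_fn    encoder;       /* NULL when the input is already UTF-8 */
    unsigned char raw[64];       /* bytes as read, before conversion */
    unsigned char buffer[128];   /* UTF-8 content handed to the parser */
    int           use;           /* valid bytes in buffer */
} input_buf;

/* ISO-8859-1 to UTF-8: bytes >= 0x80 become two-byte sequences. */
int latin1_to_utf8(const unsigned char *in, int inlen,
                   unsigned char *out, int outlen) {
    int o = 0;
    for (int i = 0; i < inlen; i++) {
        if (in[i] < 0x80) {
            if (o + 1 > outlen) return -1;
            out[o++] = in[i];
        } else {
            if (o + 2 > outlen) return -1;
            out[o++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[o++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return o;
}

/* Read one chunk through the read callback and append it to in->buffer,
 * converting on the way if an encoder is installed.
 * Returns UTF-8 bytes added, 0 at end of input, -1 on error or no room. */
int input_buf_fill(input_buf *in) {
    assert(in != NULL && in->readcallback != NULL);
    int room = (int)sizeof in->buffer - in->use;
    int want = room / 2;   /* conversion can double the size */
    if (want > (int)sizeof in->raw) want = (int)sizeof in->raw;
    if (want <= 0) return -1;
    int n = in->readcallback(in->context, (char *)in->raw, want);
    if (n <= 0) return n;
    int added;
    if (in->encoder != NULL) {
        added = in->encoder(in->raw, n, in->buffer + in->use, room);
        if (added < 0) return -1;
    } else {
        memcpy(in->buffer + in->use, in->raw, (size_t)n);
        added = n;
    }
    in->use += added;
    return added;
}

/* A read callback over an in-memory byte string, for demonstration. */
typedef struct { const char *data; int len; int pos; } mem_src;

int mem_src_read(void *context, char *buffer, int len) {
    mem_src *s = context;
    int n = s->len - s->pos;
    if (n > len) n = len;
    memcpy(buffer, s->data + s->pos, (size_t)n);
    s->pos += n;
    return n;
}
```

Calling input_buf_fill() in a loop until it returns 0 drains the resource. The raw bytes never reach the consumer of buffer, which only ever sees UTF-8.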

        Output I/O handlers

        An Output handler, xmlOutputBuffer, is very similar to an Input one,
        except that the callbacks are write() and close().

        The entities loader

        The entity loader resolves requests for new entities and creates
        inputs for the parser. Creating an input from a filename or a URI
        string is done through the xmlNewInputFromFile() routine. The default
        entity loader does not handle the PUBLIC identifier associated with
        an entity (if any), so it just calls xmlNewInputFromFile() with the
        SYSTEM identifier (which is mandatory in XML).

        If you want to hook up a catalog mechanism then you simply need to
        override the default entity loader. Here is an example:

        #include <libxml/parser.h>
        #include <libxml/parserInternals.h>  /* xmlNewInputFromFile() */
        #include <libxml/xmlIO.h>

        /* Loader that was installed before ours; used as a fallback. */
        xmlExternalEntityLoader defaultLoader = NULL;

        xmlParserInputPtr
        xmlMyExternalEntityLoader(const char *URL, const char *ID,
                                  xmlParserCtxtPtr ctxt) {
            xmlParserInputPtr ret = NULL;
            const char *fileID = NULL;
            /* lookup for the fileID depending on ID */

            /* Only try the local file if the lookup found one. */
            if (fileID != NULL)
                ret = xmlNewInputFromFile(ctxt, fileID);
            if (ret != NULL)
                return(ret);
            /* Otherwise fall back to the previously installed loader. */
            if (defaultLoader != NULL)
                ret = defaultLoader(URL, ID, ctxt);
            return(ret);
        }

        int main(int argc, char **argv) {
            ...

            /*
             * Install our own entity loader
             */
            defaultLoader = xmlGetExternalEntityLoader();
            xmlSetExternalEntityLoader(xmlMyExternalEntityLoader);

            ...
        }
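
The lookup step is left open in the loader above. One possible shape for it, shown purely as an illustration, is a static table mapping PUBLIC identifiers to local files. The function name lookupFileID(), the catalogEntry type and the file paths are all made up for this sketch; a real catalog would normally be loaded from configuration rather than compiled in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical catalog entry: PUBLIC identifier -> local file name. */
typedef struct {
    const char *publicID;
    const char *fileID;
} catalogEntry;

/* Illustrative entries only; the paths are invented for the example. */
static const catalogEntry catalog[] = {
    { "-//W3C//DTD XHTML 1.0 Strict//EN", "/usr/share/xml/xhtml1-strict.dtd" },
    { "-//OASIS//DTD DocBook XML V4.1.2//EN", "/usr/share/xml/docbookx.dtd" },
};

/* Return the local file registered for ID, or NULL when ID is NULL or
 * not present in the table. */
const char *lookupFileID(const char *ID) {
    size_t i;

    if (ID == NULL)
        return NULL;
    for (i = 0; i < sizeof(catalog) / sizeof(catalog[0]); i++) {
        /* table entries must be complete */
        assert(catalog[i].publicID != NULL && catalog[i].fileID != NULL);
        if (strcmp(catalog[i].publicID, ID) == 0)
            return catalog[i].fileID;
    }
    return NULL;
}
```

Returning NULL for an unknown identifier lets the loader skip the local file and hand the request to the previously installed loader.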

        Example of customized I/O

        This example comes from a real use case: xmlDocDump() closes the
        FILE * passed by the application, and this was a problem. The
        solution was to define a new output handler with the closing call
        deactivated:

        1. First define a new I/O output allocator where the output doesn't
           close the file:

          /*
           * Note: xmlOutputCallbackInitialized and xmlFileWrite() are defined
           * inside xmlIO.c; code built outside the library may need its own
           * equivalents.
           */
          xmlOutputBufferPtr
          xmlOutputBufferCreateOwn(FILE *file, xmlCharEncodingHandlerPtr encoder) {
              xmlOutputBufferPtr ret;

              if (xmlOutputCallbackInitialized == 0)
                  xmlRegisterDefaultOutputCallbacks();

              if (file == NULL) return(NULL);
              ret = xmlAllocOutputBuffer(encoder);
              if (ret != NULL) {
                  ret->context = file;
                  ret->writecallback = xmlFileWrite;
                  ret->closecallback = NULL;  /* No close callback */
              }
              return(ret);
          }

        2. And then use it to save the document:

          FILE *f;
          xmlOutputBufferPtr output;
          xmlDocPtr doc;
          int res;

          f = ...
          doc = ....

          output = xmlOutputBufferCreateOwn(f, NULL);
          /* xmlSaveFileTo() closes the output buffer; f itself stays open */
          res = xmlSaveFileTo(output, doc, NULL);

          Daniel Veillard

          </body></html>