<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><link rel="SHORTCUT ICON" href="/favicon.ico" /><style type="text/css">
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
</style><title>Catalog support</title></head><body bgcolor="#8b7765" text="#000000" link="#a06060" vlink="#000000">

The XML C parser and toolkit of Gnome

Catalog support


Table of Contents:

  1. General overview
  2. The definitions
  3. Using catalogs
  4. Some examples
  5. How to tune catalog usage
  6. How to debug catalog processing
  7. How to create and maintain catalogs
  8. The implementor corner quick review of the API
  9. Other resources

    General overview

    What is a catalog? Basically it's a lookup mechanism used when an entity
    (a file or a remote resource) references another entity. The catalog lookup
    is inserted between the moment the reference is recognized by the software
    (XML parser, stylesheet processing, or even images referenced for inclusion
    in a rendering) and the moment when loading of that resource actually
    starts.

    It is basically used for 3 things:

    • mapping from "logical" names, the public identifiers, to a more
      concrete name usable for download (a URI). For example it can associate
      the logical name

      "-//OASIS//DTD DocBook XML V4.1.2//EN"

      of the DocBook 4.1.2 XML DTD with the actual URL where it can be
      downloaded

      http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd

    • remapping from a given URL to another one, like an HTTP indirection
      saying that

      "http://www.oasis-open.org/committes/tr.xsl"

      should really be looked up at

      "http://www.oasis-open.org/committes/entity/stylesheets/base/tr.xsl"

    • providing a local cache mechanism for loading the entities associated
      with public identifiers or remote resources. This is a really
      important feature for any significant deployment of XML or SGML since it
      avoids the hazards and delays associated with fetching remote
      resources.

      The definitions

      Libxml, as of 2.4.3, implements 2 kinds of catalogs:

      • the older SGML catalogs. The official spec is SGML Open Technical
        Resolution TR9401:1997, but it is better understood by reading the SP
        Catalog page from James Clark. This is relatively old and not the
        preferred mode of operation of libxml.
      • XML Catalogs, which are far more flexible, more recent, use an XML
        syntax and should scale much better. This is the default option of
        libxml.

        Using catalogs

        In a normal environment libxml2 will by default check for the presence
        of a catalog in /etc/xml/catalog, and assuming it has been correctly
        populated, the processing is completely transparent to the document
        user. To take a concrete example, suppose you are authoring a DocBook
        document that starts with the following DOCTYPE definition:

        <?xml version='1.0'?>
        <!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
                  "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd">

        When validating the document with libxml, the catalog will be
        automatically consulted to look up the public identifier "-//Norman
        Walsh//DTD DocBk XML V3.1.4//EN" and the system identifier
        "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd". If these entities have
        been installed on your system and the catalogs actually point to them,
        libxml will fetch them from the local disk.

        Note: don't actually use this DOCTYPE in real documents. It is a really
        old version, but it is fine as an example.

        Libxml2 will check the catalog each time it is requested to load an
        entity; this includes DTDs, external parsed entities, stylesheets, etc.
        If your system is correctly configured, all of the authoring and
        processing phases should use only local files, while your document
        stays portable because it uses the canonical public and system IDs
        referencing the remote document.

        Some examples:

        Here are a couple of fragments from XML Catalogs used in libxml2's early
        regression tests in test/catalogs:

        <?xml version="1.0"?>
        <!DOCTYPE catalog PUBLIC 
           "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
           "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
        <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
          <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
           uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/>
        ...

        This is the beginning of a catalog for DocBook 4.1.2. XML Catalogs are
        written in XML, and there is a specific namespace for catalog elements,
        "urn:oasis:names:tc:entity:xmlns:xml:catalog". The first entry in this
        catalog is a public mapping; it associates a Public Identifier with a
        URI.
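To make the public mapping concrete, here is a minimal sketch in Python (standard library only) that reads a small catalog of this shape and looks up a Public Identifier. It only illustrates the idea and is not how libxml2 implements it. The function name lookup_public is hypothetical, and real resolvers also normalize whitespace in public identifiers, which this sketch skips.

```python
import xml.etree.ElementTree as ET

# Namespace used by XML Catalog elements, as quoted in the text.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# A tiny catalog holding the single public entry from the fragment.
CATALOG = """<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
   uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/>
</catalog>
"""


def lookup_public(catalog_text, public_id):
    """Return the uri of the first <public> entry whose publicId matches
    exactly, or None when no entry matches (hypothetical helper)."""
    root = ET.fromstring(catalog_text)
    for entry in root.iter("{%s}public" % CATALOG_NS):
        if entry.get("publicId") == public_id:
            return entry.get("uri")
    return None


if __name__ == "__main__":
    print(lookup_public(CATALOG, "-//OASIS//DTD DocBook XML V4.1.2//EN"))
```

Returning None on a miss mirrors the fact that a failed catalog lookup simply leaves the original identifier to be loaded as usual.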

        ...
            <rewriteSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                           rewritePrefix="file:///usr/share/xml/docbook/"/>
        ...

        A rewriteSystem is a very powerful instruction. It says that any URI
        starting with a given prefix should be looked up at another URI,
        constructed by replacing the prefix with a new one. In effect this acts
        like a cache system for a full area of the Web. In practice it is
        extremely useful with a file prefix if you have installed a copy of
        those resources on your local system.
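The prefix replacement can be sketched in a few lines of Python. This is an illustration under stated assumptions, not libxml2 code: the function name rewrite_system and the list-of-pairs representation are hypothetical, and the rule that the longest matching prefix wins follows the XML Catalogs specification.

```python
def rewrite_system(system_id, rules):
    """Apply rewriteSystem-style rules to a system identifier.

    rules is a list of (systemIdStartString, rewritePrefix) pairs.
    The longest matching start string is used; None means no rule matched.
    """
    best = None
    for start, prefix in rules:
        if system_id.startswith(start):
            if best is None or len(start) > len(best[0]):
                best = (start, prefix)
    if best is None:
        return None
    start, prefix = best
    # Replace the matched prefix and keep the remainder of the identifier.
    return prefix + system_id[len(start):]


# The single rule from the catalog fragment.
RULES = [
    ("http://www.oasis-open.org/docbook/", "file:///usr/share/xml/docbook/"),
]

if __name__ == "__main__":
    print(rewrite_system(
        "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd", RULES))
```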

        ...
        <delegatePublic publicIdStartString="-//OASIS//DTD XML Catalog //"
                        catalog="file:///usr/share/xml/docbook.xml"/>
        <delegatePublic publicIdStartString="-//OASIS//ENTITIES DocBook XML"
                        catalog="file:///usr/share/xml/docbook.xml"/>
        <delegatePublic publicIdStartString="-//OASIS//DTD DocBook XML"
                        catalog="file:///usr/share/xml/docbook.xml"/>
        <delegateSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                        catalog="file:///usr/share/xml/docbook.xml"/>
        <delegateURI uriStartString="http://www.oasis-open.org/docbook/"
                        catalog="file:///usr/share/xml/docbook.xml"/>
        ...

        Delegation is the core feature which allows building a tree of
        catalogs, which is easier to maintain than a single catalog. Based on
        Public Identifier, System Identifier or URI prefixes, it instructs the
        catalog software to look up entries in another resource. The set of
        entries presented should be sufficient to redirect the resolution of
        all DocBook references to the specific catalog in
        /usr/share/xml/docbook.xml. This one in turn could delegate all
        references for DocBook 4.2.1 to a specific catalog installed at the
        same time as the DocBook resources on the local machine.
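As a rough illustration of delegation, the following Python sketch selects which catalogs to consult for a given identifier. It is a simplification and not libxml2's implementation: the function name delegate_catalogs and the pair representation are hypothetical, and the ordering (longest matching prefix first) follows the XML Catalogs specification.

```python
def delegate_catalogs(identifier, delegates):
    """Return the catalogs to consult for identifier, most specific first.

    delegates is a list of (startString, catalog) pairs, as found in
    delegatePublic, delegateSystem or delegateURI entries.
    """
    matches = [(start, cat) for start, cat in delegates
               if identifier.startswith(start)]
    # Longer start strings are more specific, so they are tried first.
    matches.sort(key=lambda m: len(m[0]), reverse=True)
    ordered = []
    for _, cat in matches:
        if cat not in ordered:  # consult each catalog only once
            ordered.append(cat)
    return ordered


# The delegatePublic entries from the fragment.
DELEGATES = [
    ("-//OASIS//DTD XML Catalog //", "file:///usr/share/xml/docbook.xml"),
    ("-//OASIS//ENTITIES DocBook XML", "file:///usr/share/xml/docbook.xml"),
    ("-//OASIS//DTD DocBook XML", "file:///usr/share/xml/docbook.xml"),
]

if __name__ == "__main__":
    print(delegate_catalogs("-//OASIS//DTD DocBook XML V4.1.2//EN", DELEGATES))
```

The delegated catalog is then searched with the same identifier, which is what lets a top-level catalog stay small while each package ships its own.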

        How to tune catalog usage:

        Users can change the default catalog behaviour by redirecting queries
        to their own set of catalogs. This can be done by setting the
        XML_CATALOG_FILES environment variable to a list of catalogs; an
        empty value should deactivate loading of the default /etc/xml/catalog
        catalog.
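When a tool is launched from a script, the variable can be set for that one child process only. The Python sketch below builds such an environment. It assumes the list is space-separated, as described in the xmlcatalog manual page; the helper name catalog_env and the example paths are hypothetical.

```python
import os


def catalog_env(catalog_files, base_env=None):
    """Return a copy of the environment with XML_CATALOG_FILES set.

    catalog_files is a list of catalog paths or URIs; an empty list gives
    an empty variable, which should disable the default catalog.
    Assumption: entries are joined with single spaces.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["XML_CATALOG_FILES"] = " ".join(catalog_files)
    return env


if __name__ == "__main__":
    env = catalog_env(["/home/user/xmlcatalog", "/etc/xml/catalog"])
    print(env["XML_CATALOG_FILES"])
    # The dictionary can then be passed to a child process, for example:
    #   subprocess.run(["xmllint", "--noout", "doc.xml"], env=env)
```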

        How to debug catalog processing:

        Setting the XML_DEBUG_CATALOG environment variable will make libxml2
        output debugging information for each catalog operation, for
        example:

        orchis:~/XML -> xmllint --memory --noout test/ent2
        warning: failed to load external entity "title.xml"
        orchis:~/XML -> export XML_DEBUG_CATALOG=
        orchis:~/XML -> xmllint --memory --noout test/ent2
        Failed to parse catalog /etc/xml/catalog
        Failed to parse catalog /etc/xml/catalog
        warning: failed to load external entity "title.xml"
        Catalogs cleanup
        orchis:~/XML -> 

        The test/ent2 document references an entity. Running the parser from
        memory makes the base URI unavailable, so the "title.xml" entity cannot
        be loaded. Setting the debug environment variable reveals that an
        attempt is made to load /etc/xml/catalog, but since it is not present
        the resolution fails.

        But the most advanced way to debug XML catalog processing is to use the
        xmlcatalog command shipped with libxml2. It can load catalogs and make
        resolution queries to show what is going on. This is also used for the
        regression tests:

        orchis:~/XML -> ./xmlcatalog test/catalogs/docbook.xml \
                           "-//OASIS//DTD DocBook XML V4.1.2//EN"
        http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
        orchis:~/XML -> 

        For debugging what is going on, adding a -v flag increases the
        verbosity level to indicate the processing done (adding a second flag
        also indicates what elements are recognized at parsing):

        orchis:~/XML -> ./xmlcatalog -v test/catalogs/docbook.xml \
                           "-//OASIS//DTD DocBook XML V4.1.2//EN"
        Parsing catalog test/catalogs/docbook.xml's content
        Found public match -//OASIS//DTD DocBook XML V4.1.2//EN
        http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
        Catalogs cleanup
        orchis:~/XML -> 

        A shell interface is also available to debug and process multiple queries
        (and for regression tests):

        orchis:~/XML -> ./xmlcatalog -shell test/catalogs/docbook.xml \
                           "-//OASIS//DTD DocBook XML V4.1.2//EN"
        > help   
        Commands available:
        public PublicID: make a PUBLIC identifier lookup
        system SystemID: make a SYSTEM identifier lookup
        resolve PublicID SystemID: do a full resolver lookup
        add 'type' 'orig' 'replace' : add an entry
        del 'values' : remove values
        dump: print the current catalog state
        debug: increase the verbosity level
        quiet: decrease the verbosity level
        exit:  quit the shell
        > public "-//OASIS//DTD DocBook XML V4.1.2//EN"
        http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
        > quit
        orchis:~/XML -> 

        This should be sufficient for most debugging purposes; it was actually
        used heavily to debug the XML Catalog implementation itself.

        How to create and maintain catalogs:

        Basically XML Catalogs are XML files, so you can either use XML tools
        to manage them or use xmlcatalog for this. The basic step is to create
        a catalog; the --create option provides this facility:

        orchis:~/XML -> ./xmlcatalog --create tst.xml
        <?xml version="1.0"?>
        <!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
                 "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
        <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/>
        orchis:~/XML -> 

        By default xmlcatalog does not overwrite the original catalog and
        writes the result to the standard output; this can be overridden using
        the --noout option. The --add option adds entries to the catalog:

        orchis:~/XML -> ./xmlcatalog --noout --create --add "public" \
          "-//OASIS//DTD DocBook XML V4.1.2//EN" \
          http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd tst.xml
        orchis:~/XML -> cat tst.xml
        <?xml version="1.0"?>
        <!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
          "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
        <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
        <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
                uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/>
        </catalog>
        orchis:~/XML -> 

        The --add option always takes 3 parameters, even though some of the XML
        Catalog constructs (like nextCatalog) have only a single argument. In
        that case just pass an empty string as the third parameter; it will be
        ignored.

        Similarly the --del option removes matching entries from the
        catalog:

        orchis:~/XML -> ./xmlcatalog --del \
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" tst.xml
        <?xml version="1.0"?>
        <!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
            "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
        <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/>
        orchis:~/XML -> 

        The catalog is now empty. Note that the matching done by --del is
        exact, and that it would have worked in a similar fashion with the
        Public ID string.

        This is rudimentary but should be sufficient to manage a not too complex
        catalog tree of resources.

        The implementor corner quick review of the API:

        First, as for every other module of libxml, there is an automatically
        generated API page for catalog support.

        The header for the catalog interfaces should be included as:

        #include <libxml/catalog.h>

        The API is deliberately kept very simple. First, it is not obvious that
        applications really need access to it, since catalog lookup is the
        default behaviour of libxml2. (Note: it is possible to completely
        override the libxml2 default catalog by using
        xmlSetExternalEntityLoader to plug in an application-specific
        resolver.)

        Basically libxml2 supports 2 catalog lists:

        • the default one, global and shared by the whole application
        • a per-document catalog. This one is built if the document uses the
          oasis-xml-catalog PIs to specify its own catalog list. It is
          associated with the parser context and destroyed when the parsing
          context is destroyed.

          The document one will be used first if it exists.
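For readers unfamiliar with the oasis-xml-catalog processing instruction, the Python sketch below shows what collecting a document's own catalog list might look like. It is only an illustration using the standard library, not what the libxml2 parser does internally. The function name document_catalogs and the catalog path in the sample document are hypothetical; the catalog="..." pseudo-attribute form comes from the XML Catalogs specification.

```python
import re
from xml.dom import minidom

# Matches the catalog pseudo-attribute inside the PI data.
PSEUDO_ATTR = re.compile(r"""catalog\s*=\s*(["'])(.*?)\1""")


def document_catalogs(xml_text):
    """Return the catalogs named by oasis-xml-catalog PIs found before
    the root element, in document order (hypothetical helper)."""
    doc = minidom.parseString(xml_text)
    found = []
    for node in doc.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            break  # only PIs in the prolog are considered
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "oasis-xml-catalog"):
            match = PSEUDO_ATTR.search(node.data)
            if match:
                found.append(match.group(2))
    return found


# Sample document; the catalog location is made up for the example.
DOC = """<?xml version="1.0"?>
<?oasis-xml-catalog catalog="file:///home/user/mycatalog.xml"?>
<book><title>Example</title></book>
"""

if __name__ == "__main__":
    print(document_catalogs(DOC))
```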

          Initialization routines:

          xmlInitializeCatalog(), xmlLoadCatalog() and xmlLoadCatalogs() should
          be used at startup to initialize the catalog. If the catalog should
          be initialized with specific values, xmlLoadCatalog() or
          xmlLoadCatalogs() should be called before xmlInitializeCatalog(),
          which would otherwise do a default initialization first.

          The xmlCatalogAddLocal() call is used by the parser to grow the
          document's own catalog list if needed.

          Preferences setup:

          The XML Catalog spec requires the possibility to select default
          preferences between public and system delegation;
          xmlCatalogSetDefaultPrefer() allows this. xmlCatalogSetDefaults() and
          xmlCatalogGetDefaults() control whether XML Catalogs resolution
          should be forbidden, allowed for the global catalog, for the document
          catalog, or for both; the default is to allow both.

          And of course xmlCatalogSetDebug() enables the generation of debug
          messages (through the xmlGenericError() mechanism).

          Querying routines:

          xmlCatalogResolve(), xmlCatalogResolveSystem(), xmlCatalogResolvePublic()
          and xmlCatalogResolveURI() are relatively self-explanatory if you
          have read the XML Catalog specification; they correspond to the
          section 7 algorithms. They should also work, with simplified
          semantics, if you have loaded an SGML catalog.
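To give a feel for what a full resolver lookup involves, here is a heavily simplified Python sketch of the order in which entries are tried: system entries first, then system rewrites, then public entries subject to the prefer setting. It is an assumption-laden illustration of the specification's algorithm, not a description of the libxml2 functions. The name resolve, the dictionary layout and the prefer parameter are all hypothetical, and delegation and nextCatalog are left out.

```python
def resolve(public_id, system_id, catalog, prefer="public"):
    """Resolve an external identifier against a single in-memory catalog.

    catalog is a dict with optional keys "system" and "public" (exact
    mappings) and "rewriteSystem" (list of (start, prefix) pairs).
    Returns the resolved URI or None.
    """
    if system_id is not None:
        # 1. Exact system entries take priority.
        if system_id in catalog.get("system", {}):
            return catalog["system"][system_id]
        # 2. Then the longest matching rewriteSystem prefix.
        rewrites = [(s, p) for s, p in catalog.get("rewriteSystem", [])
                    if system_id.startswith(s)]
        if rewrites:
            start, prefix = max(rewrites, key=lambda r: len(r[0]))
            return prefix + system_id[len(start):]
    if public_id is not None:
        # 3. With prefer="system", public entries are ignored whenever
        #    the document supplied a system identifier.
        if system_id is not None and prefer == "system":
            return None
        return catalog.get("public", {}).get(public_id)
    return None


# Example catalog built from entries shown earlier in this page.
EXAMPLE = {
    "public": {
        "-//OASIS//DTD DocBook XML V4.1.2//EN":
            "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd",
    },
    "rewriteSystem": [
        ("http://www.oasis-open.org/docbook/",
         "file:///usr/share/xml/docbook/"),
    ],
}

if __name__ == "__main__":
    print(resolve(None,
                  "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd",
                  EXAMPLE))
```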

          xmlCatalogLocalResolve() and xmlCatalogLocalResolveURI() are the same but
          operate on the document catalog list.

          Cleanup and Miscellaneous:

          xmlCatalogCleanup() frees the global catalog; xmlCatalogFreeLocal() is
          the per-document equivalent.

          xmlCatalogAdd() and xmlCatalogRemove() are used to dynamically modify
          the first catalog in the global list, and xmlCatalogDump() dumps a
          catalog's state. Those routines are primarily designed for
          xmlcatalog; I'm not sure that exposing more complex interfaces (like
          navigation ones) would be really useful.

          xmlParseCatalogFile() is a function used to load XML Catalog files.
          It is similar to xmlParseFile() except that it bypasses all catalog
          lookups. It is provided because this functionality may be useful for
          client tools.

          Threaded environments:

          Since the catalog tree is built progressively, some care has been
          taken to avoid trouble in multithreaded environments. The code is now
          thread safe, assuming that the libxml2 library has been compiled with
          thread support.

          Other resources

          The XML Catalog specification is relatively recent so there isn't much
          literature to point at:

          • You can find a good rant from Norm Walsh about the need for
            catalogs. It provides a lot of context information even if I don't
            agree with everything presented. Norm also wrote a more recent
            article, XML entities and URI resolvers, describing them.
          • An old XML catalog proposal from John Cowan
          • The Resource Directory Description Language (RDDL), another catalog
            system but more oriented toward providing metadata for XML
            namespaces.
          • The page from the OASIS Technical Committee on Entity Resolution,
            which maintains XML Catalog. There you will find pointers to the
            specification update, some background and pointers to other tools
            providing XML Catalog support.
          • There is a shell script to generate XML Catalogs for DocBook 4.1.2.
            If it can write to the /etc/xml/ directory, it will set up
            /etc/xml/catalog and /etc/xml/docbook based on the resources found
            on the system. Otherwise it will just create ~/xmlcatalog and
            ~/dbkxmlcatalog, and doing:

            export XML_CATALOG_FILES=$HOME/xmlcatalog

            should allow processing DocBook documentation without requiring
            network access for the DTD or stylesheets.
          • I have uploaded a small tarball containing XML Catalogs for DocBook
            4.1.2 which seems to work fine for me too
          • The xmlcatalog manual page

            If you have suggestions for corrections or additions, simply contact
            me:

            Daniel Veillard

            </body></html>