<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><link rel="SHORTCUT ICON" href="/favicon.ico" /><style type="text/css">
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
</style><title>Catalog support</title></head><body bgcolor="#8b7765" text="#000000" link="#a06060" vlink="#000000">

The XML C parser and toolkit of Gnome

Catalog support


Table of Contents:

  1. General overview
  2. The definitions
  3. Using catalogs
  4. Some examples
  5. How to tune catalog usage
  6. How to debug catalog processing
  7. How to create and maintain catalogs
  8. The implementor corner: quick review of the API
  9. Other resources

    General overview

    What is a catalog? Basically, it's a lookup mechanism used when an entity
    (a file or a remote resource) references another entity. The catalog lookup
    is inserted between the moment the software recognizes the reference (an XML
    parser, a stylesheet processor, or even a renderer including referenced
    images) and the moment it actually starts loading that resource.

    It is used for three things:

    • mapping from "logical" names, the public identifiers, to a more
      concrete name usable for download (a URI). For example, it can associate
      the logical name

        "-//OASIS//DTD DocBook XML V4.1.2//EN"

      of the DocBook 4.1.2 XML DTD with the actual URL where it can be
      downloaded:

        http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd

    • remapping from a given URL to another one, like an HTTP indirection
      saying that

        "http://www.oasis-open.org/committes/tr.xsl"

      should really be looked up at

        "http://www.oasis-open.org/committes/entity/stylesheets/base/tr.xsl"

    • providing a local cache mechanism for loading the entities associated
      with public identifiers or remote resources. This is a really important
      feature for any significant deployment of XML or SGML, since it avoids
      the unpredictability and delays of fetching remote resources.
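    The lookup step can be pictured with a deliberately tiny model in plain C.
    It does not use libxml2. The `toy_entry` table, the `toy_resolve()` function
    and the local file paths are all hypothetical. The order of the checks
    (system entries before public ones, then the unchanged system identifier as
    a fallback) loosely follows the OASIS XML Catalogs resolution algorithm. The
    model ignores everything else that algorithm specifies, such as rewriting,
    delegation and `prefer`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy catalog: each entry maps an identifier to a local URI. */
typedef enum { TOY_PUBLIC, TOY_SYSTEM } toy_kind;

typedef struct {
    toy_kind kind;
    const char *key;    /* public identifier or system identifier */
    const char *target; /* where the resource should really be loaded from */
} toy_entry;

/* Returns the URI to load: a mapped one if an entry matches, otherwise the
 * original system identifier (the resource is fetched as referenced). */
const char *toy_resolve(const toy_entry *cat, size_t n,
                        const char *public_id, const char *system_id)
{
    /* System identifiers are checked first... */
    if (system_id != NULL)
        for (size_t i = 0; i < n; i++)
            if (cat[i].kind == TOY_SYSTEM && strcmp(cat[i].key, system_id) == 0)
                return cat[i].target;

    /* ...then public identifiers. */
    if (public_id != NULL)
        for (size_t i = 0; i < n; i++)
            if (cat[i].kind == TOY_PUBLIC && strcmp(cat[i].key, public_id) == 0)
                return cat[i].target;

    /* No entry applies: load the resource from its original location. */
    return system_id;
}
```

    A reference with no matching entry comes back unchanged. This corresponds
    to the resource being fetched from its original location, with no local
    copy involved.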

      The definitions

      As of version 2.4.3, libxml implements two kinds of catalogs:

      • the older SGML catalogs: the official specification is SGML Open
        Technical Resolution TR9401:1997, but it is better understood by reading
        the SP Catalog page from James Clark. This format is relatively old and
        is not the preferred mode of operation of libxml.
      • XML Catalogs: far more flexible and more recent, they use an XML syntax
        and should scale much better. This is the default option of libxml.

        Using catalogs

        In a normal environment, libxml2 will by default check for the presence
        of a catalog in /etc/xml/catalog. Assuming it has been correctly
        populated, the processing is completely transparent to the document
        user. To take a concrete example, suppose you are authoring a DocBook
        document that starts with the following DOCTYPE definition:

        <?xml version='1.0'?>
        <!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
                  "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd">

        When the document is validated with libxml, the catalog will
        automatically be consulted to look up the public identifier
        "-//Norman Walsh//DTD DocBk XML V3.1.4//EN" and the system identifier
        "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd". If these entities have
        been installed on your system and the catalogs actually point to them,
        libxml will fetch them from the local disk.

        Note: Don't actually use this DOCTYPE in practice; it refers to a really
        old version, but it is fine as an example.

        Libxml2 will check the catalog each time it is requested to load an
        entity. This includes DTDs, external parsed entities, stylesheets, etc.
        If your system is correctly configured, all authoring and processing
        should use only local files. Your documents still stay portable, because
        they use the canonical public and system IDs referencing the remote
        document.

        Some examples:

        Here are a couple of fragments from XML Catalogs used in libxml2's early
        regression tests in test/catalogs:

        <?xml version="1.0"?>
        <!DOCTYPE catalog PUBLIC
           "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
           "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
        <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
          <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
           uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/>
        ...

        This is the beginning of a catalog for DocBook 4.1.2. XML Catalogs are
        written in XML, and catalog elements use a specific namespace:
        "urn:oasis:names:tc:entity:xmlns:xml:catalog". The first entry in this
        catalog is a public mapping. It associates a Public Identifier with a
        URI.

        ...
          <rewriteSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                         rewritePrefix="file:///usr/share/xml/docbook/"/>
        ...

        A rewriteSystem is a very powerful instruction. It says that any system
        identifier starting with a given prefix should be looked up at another
        URI, constructed by replacing that prefix with a new one. In effect, this
        acts like a cache for a whole area of the Web. In practice it is
        extremely useful with a file prefix if you have installed a copy of
        those resources on your local system.
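        The prefix substitution performed by rewriteSystem takes only a few
        lines of plain C. This is an illustrative sketch, not libxml2's code.
        The `rewrite_rule` type and `rewrite_system()` function are
        hypothetical. The rule that the longest matching systemIdStartString
        wins is the one stated in the OASIS XML Catalogs specification.

```c
#include <stdlib.h>
#include <string.h>

/* One rewriteSystem entry (mirrors the two attributes of the element). */
typedef struct {
    const char *system_id_start;  /* systemIdStartString */
    const char *rewrite_prefix;   /* rewritePrefix */
} rewrite_rule;

/* Returns a newly allocated URI with the matching prefix replaced, or NULL
 * if no rule applies. When several rules match, the one with the longest
 * systemIdStartString is used. */
char *rewrite_system(const rewrite_rule *rules, size_t n, const char *system_id)
{
    const rewrite_rule *best = NULL;
    size_t best_len = 0;

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(rules[i].system_id_start);
        if (len > best_len &&
            strncmp(system_id, rules[i].system_id_start, len) == 0) {
            best = &rules[i];
            best_len = len;
        }
    }
    if (best == NULL)
        return NULL;

    /* New URI = rewritePrefix + the part of the system ID after the match. */
    const char *rest = system_id + best_len;
    size_t plen = strlen(best->rewrite_prefix);
    char *out = malloc(plen + strlen(rest) + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, best->rewrite_prefix, plen);
    strcpy(out + plen, rest);
    return out;
}
```

        With the rule from the fragment above,
        http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd becomes
        file:///usr/share/xml/docbook/xml/4.1.2/docbookx.dtd.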

        ...
          <delegatePublic publicIdStartString="-//OASIS//DTD XML Catalog //"
                          catalog="file:///usr/share/xml/docbook.xml"/>
          <delegatePublic publicIdStartString="-//OASIS//ENTITIES DocBook XML"
                          catalog="file:///usr/share/xml/docbook.xml"/>
          <delegatePublic publicIdStartString="-//OASIS//DTD DocBook XML"
                          catalog="file:///usr/share/xml/docbook.xml"/>
          <delegateSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                          catalog="file:///usr/share/xml/docbook.xml"/>
          <delegateURI uriStartString="http://www.oasis-open.org/docbook/"
                          catalog="file:///usr/share/xml/docbook.xml"/>
        ...

        Delegation is the core feature that allows building a tree of catalogs,
        which is easier to maintain than a single catalog. Based on Public
        Identifier, System Identifier or URI prefixes, it instructs the catalog
        software to look up entries in another resource, which makes catalog
        hierarchies possible. The set of entries presented above should be
        sufficient to redirect the resolution of all DocBook references to the
        specific catalog in /usr/share/xml/docbook.xml. That catalog could in
        turn delegate all references for DocBook 4.2.1 to a specific catalog
        installed at the same time as the DocBook resources on the local
        machine.
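        Delegation does not return a URI directly. Instead, it produces a new
        list of catalogs to search. The plain C sketch below computes that list
        for delegatePublic entries. The names are hypothetical. The ordering,
        longest matching publicIdStartString first, follows the OASIS XML
        Catalogs specification. Listing each catalog only once is an assumption
        of this sketch, and libxml2's own implementation may differ in details.

```c
#include <stddef.h>
#include <string.h>

/* One delegatePublic entry from a catalog (hypothetical in-memory form). */
typedef struct {
    const char *public_id_start;  /* publicIdStartString */
    const char *catalog;          /* catalog attribute */
} delegate_rule;

/* Stores in `out` (capacity `max`) the catalogs to consult for `public_id`,
 * longest matching prefix first, each catalog listed once. Returns how many
 * were stored; 0 means that no delegation applies. */
size_t delegate_public(const delegate_rule *rules, size_t n,
                       const char *public_id, const char **out, size_t max)
{
    size_t count = 0;
    size_t limit = (size_t) -1;  /* only prefixes shorter than this remain */

    for (;;) {
        /* Longest matching prefix length not handled yet. */
        size_t best = 0;
        for (size_t i = 0; i < n; i++) {
            size_t len = strlen(rules[i].public_id_start);
            if (len > best && len < limit &&
                strncmp(public_id, rules[i].public_id_start, len) == 0)
                best = len;
        }
        if (best == 0)
            break;

        /* Emit every rule whose prefix has exactly that length, in order. */
        for (size_t i = 0; i < n; i++) {
            const char *start = rules[i].public_id_start;
            if (strlen(start) != best || strncmp(public_id, start, best) != 0)
                continue;
            int seen = 0;
            for (size_t j = 0; j < count; j++)
                if (strcmp(out[j], rules[i].catalog) == 0)
                    seen = 1;
            if (!seen && count < max)
                out[count++] = rules[i].catalog;
        }
        limit = best;
    }
    return count;
}
```

        For -//OASIS//DTD DocBook XML V4.1.2//EN, only the
        "-//OASIS//DTD DocBook XML" entry of the fragment above matches. The
        only catalog consulted is therefore file:///usr/share/xml/docbook.xml.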

        How to tune catalog usage:

        The user can change the default catalog behaviour by redirecting queries
        to their own set of catalogs. To do this, set the XML_CATALOG_FILES
        environment variable to a space-separated list of catalogs. An empty
        value should deactivate loading of the default /etc/xml/catalog catalog.

        How to debug catalog processing:

        Setting the XML_DEBUG_CATALOG environment variable will make libxml2
        output debugging information for each catalog operation, for example:

        orchis:~/XML -> xmllint --memory --noout test/ent2
        warning: failed to load external entity "title.xml"
        orchis:~/XML -> export XML_DEBUG_CATALOG=
        orchis:~/XML -> xmllint --memory --noout test/ent2
        Failed to parse catalog /etc/xml/catalog
        Failed to parse catalog /etc/xml/catalog
        warning: failed to load external entity "title.xml"
        Catalogs cleanup
        orchis:~/XML ->

        The test/ent2 document references an entity. Running the parser from
        memory makes the base URI unavailable, so the "title.xml" entity cannot
        be loaded. With the debug environment variable set, you can see that an
        attempt is made to load /etc/xml/catalog. Since that catalog is not
        present, the resolution fails.

        But the most advanced way to debug XML catalog processing is to use the
        xmlcatalog command shipped with libxml2. It lets you load catalogs and
        make resolution queries to see what is going on. This is also used for
        the regression tests:

        orchis:~/XML -> ./xmlcatalog test/catalogs/docbook.xml \
                           "-//OASIS//DTD DocBook XML V4.1.2//EN"
        http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
        orchis:~/XML ->

        For debugging what is going on, adding one -v flag increases the
        verbosity level to show the processing done. Adding a second flag also
        shows which elements are recognized during parsing:

        orchis:~/XML -> ./xmlcatalog -v test/catalogs/docbook.xml \
                           "-//OASIS//DTD DocBook XML V4.1.2//EN"
        Parsing catalog test/catalogs/docbook.xml's content
        Found public match -//OASIS//DTD DocBook XML V4.1.2//EN
        http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
        Catalogs cleanup
        orchis:~/XML ->

        A shell interface is also available to debug and process multiple
        queries (and for regression tests):

        orchis:~/XML -> ./xmlcatalog -shell test/catalogs/docbook.xml \
                           "-//OASIS//DTD DocBook XML V4.1.2//EN"
        > help
        Commands available:
        public PublicID: make a PUBLIC identifier lookup
        system SystemID: make a SYSTEM identifier lookup
        resolve PublicID SystemID: do a full resolver lookup
        add 'type' 'orig' 'replace' : add an entry
        del 'values' : remove values
        dump: print the current catalog state
        debug: increase the verbosity level
        quiet: decrease the verbosity level
        exit:  quit the shell
        > public "-//OASIS//DTD DocBook XML V4.1.2//EN"
        http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
        > quit
        orchis:~/XML ->

        This should be sufficient for most debugging purposes. It was actually
        used heavily to debug the XML Catalog implementation itself.

        How to create and maintain catalogs:

        Basically, XML Catalogs are XML files, so you can either manage them
        with XML tools or use xmlcatalog. The basic step is to create a catalog,
        and the --create option provides this facility:

        orchis:~/XML -> ./xmlcatalog --create tst.xml
        <?xml version="1.0"?>
        <!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
                 "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
        <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/>
        orchis:~/XML ->

        By default, xmlcatalog does not overwrite the original catalog. It
        writes the result to the standard output instead, which can be
        overridden with the --noout option. The --add command adds entries to
        the catalog:

        orchis:~/XML -> ./xmlcatalog --noout --create --add "public" \
          "-//OASIS//DTD DocBook XML V4.1.2//EN" \
          http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd tst.xml
        orchis:~/XML -> cat tst.xml
        <?xml version="1.0"?>
        <!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
          "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
        <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
        <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
                uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/>
        </catalog>
        orchis:~/XML ->

        The --add option always takes 3 parameters, even though some XML Catalog
        constructs (like nextCatalog) have only a single argument. In that case,
        just pass an empty string as the third one; it will be ignored.

        Similarly, the --del option removes matching entries from the catalog:

        orchis:~/XML -> ./xmlcatalog --del \
          "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" tst.xml
        <?xml version="1.0"?>
        <!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
            "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
        <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/>
        orchis:~/XML ->

        The catalog is now empty. Note that --del matches entries exactly, and
        it would have worked the same way with the Public ID string.

        This is rudimentary but should be sufficient to manage a not too complex
        catalog tree of resources.

        The implementor corner: quick review of the API:

        First, as for every other module of libxml, there is an automatically
        generated API page for catalog support.

        The header for the catalog interfaces should be included as:

        #include <libxml/catalog.h>

        The API is intentionally kept very simple. First, it is not obvious that
        applications really need access to it, since catalog resolution is the
        default behaviour of libxml2. (Note: it is possible to completely
        override libxml2's default catalog handling by using
        xmlSetExternalEntityLoader to plug in an application-specific
        resolver.)

        Basically, libxml2 supports 2 catalog lists:

        • the default one, global and shared by the whole application
        • a per-document catalog list, built if the document uses the
          oasis-xml-catalog PIs to specify its own catalog list. It is
          associated with the parser context and destroyed when the parsing
          context is destroyed.

        The document one will be used first if it exists.

          Initialization routines:

          xmlInitializeCatalog(), xmlLoadCatalog() and xmlLoadCatalogs() should be
          used at startup to initialize the catalog. If the catalog should be
          initialized with specific values, call xmlLoadCatalog() or
          xmlLoadCatalogs() before xmlInitializeCatalog(). Otherwise,
          xmlInitializeCatalog() performs a default initialization first.

          The parser uses the xmlCatalogAddLocal() call to grow the document's
          own catalog list when needed.

          Preferences setup:

          The XML Catalog specification requires the ability to select default
          preferences between public and system delegation, and
          xmlCatalogSetDefaultPrefer() allows this. xmlCatalogSetDefaults() and
          xmlCatalogGetDefaults() control whether XML Catalog resolution is
          forbidden, allowed for the global catalog, allowed for the document
          catalog, or allowed for both. The default is to allow both.

          And of course xmlCatalogSetDebug() generates debug messages (through
          the xmlGenericError() mechanism).

          Querying routines:

          xmlCatalogResolve(), xmlCatalogResolveSystem(), xmlCatalogResolvePublic()
          and xmlCatalogResolveURI() are relatively self-explanatory if you read
          the XML Catalog specification, because they correspond to the
          algorithms of its section 7. If you have loaded an SGML catalog, they
          should also work, with simplified semantics.

          xmlCatalogLocalResolve() and xmlCatalogLocalResolveURI() are the same
          but operate on the document catalog list.

          Cleanup and Miscellaneous:

          xmlCatalogCleanup() frees up the global catalog, and
          xmlCatalogFreeLocal() is the per-document equivalent.

          xmlCatalogAdd() and xmlCatalogRemove() dynamically modify the first
          catalog in the global list, and xmlCatalogDump() dumps a catalog's
          state. These routines are primarily designed for xmlcatalog. I'm not
          sure that exposing more complex interfaces (like navigation ones)
          would be really useful.

          xmlParseCatalogFile() loads XML Catalog files. It is similar to
          xmlParseFile(), except that it bypasses all catalog lookups. It is
          provided because this functionality may be useful for client tools.

          Threaded environments:

          Since the catalog tree is built progressively, some care has been taken
          to avoid trouble in multithreaded environments. The code is now
          thread-safe, assuming that the libxml2 library has been compiled with
          thread support.

          Other resources

          The XML Catalog specification is relatively recent, so there isn't
          much literature to point at:

          • You can find a good rant from Norm Walsh about the need for
            catalogs. It provides a lot of context information, even if I don't
            agree with everything presented. Norm also wrote a more recent
            article, XML entities and URI resolvers, describing them.
          • An old XML catalog proposal from John Cowan
          • The Resource Directory Description Language (RDDL), another catalog
            system, but more oriented toward providing metadata for XML
            namespaces.
          • The page from the OASIS Technical Committee on Entity Resolution,
            which maintains XML Catalog. There you will find pointers to the
            specification update, some background, and pointers to other tools
            providing XML Catalog support.
          • There is a shell script to generate XML Catalogs for DocBook 4.1.2.
            If it can write to the /etc/xml/ directory, it will set up
            /etc/xml/catalog and /etc/xml/docbook based on the resources found
            on the system. Otherwise it will just create ~/xmlcatalog and
            ~/dbkxmlcatalog, and doing:

              export XML_CATALOG_FILES=$HOME/xmlcatalog

            should allow processing DocBook documentation without requiring
            network access for the DTD or stylesheets.
          • I have uploaded a small tarball containing XML Catalogs for DocBook
            4.1.2, which seems to work fine for me too.
          • The xmlcatalog manual page
            If you have suggestions for corrections or additions, simply contact
            me:

            Daniel Veillard

            </body></html>