<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html">
  <style type="text/css"><!--
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
  --></style>
  <title>XML resources publication guidelines</title>
</head>
<body bgcolor="#fffacd" text="#000000">

XML resources publication guidelines

The goal of this document is to provide a set of guidelines and tips to help
publish and deploy <a href="http://www.w3.org/XML/">XML</a> resources for the
<a href="http://www.gnome.org/">GNOME project</a>. However, it is not tied to
GNOME and may be helpful more generally. I welcome
<a href="mailto:veillard@redhat.com">feedback</a> on this document.

The intended audience is software developers who have started using XML
for some of their project's resources, whether as a storage format or for
data exchange, validation, or transformations. An increasing number of new
XML formats have been defined, but, possibly because of a lack of
documentation, not all the steps needed to truly gain the benefits of XML
have been taken. These guidelines aim to improve the situation and to give
a better overview of XML processing as a whole and of the steps needed to
deploy it successfully.

Table of contents:

  1. Design guidelines
  2. Canonical URL
  3. Catalog setup
  4. Package integration

Design guidelines

This part focuses on the design of the XML format itself. It may come a
bit too late if the structure of the documents is already cast in existing,
deployed code. Still, here are a few rules that may help when designing a
new XML vocabulary or revising an existing format:

Reuse existing formats:

This may sound a bit simplistic, but before designing your own format, try
to look up existing XML vocabularies for similar data. Ideally this allows
you to reuse them, in which case many of the existing tools, such as DTDs,
schemas and stylesheets, may already be available. If you are looking at a
documentation format, DocBook should handle your needs. If reuse is not
possible because some semantic or use-case aspects are too different, the
search will still help you avoid design errors such as targeting the
vocabulary at the wrong level of abstraction. During this design phase, try
to be concise: make sure you express the real content of your data, and use
the XML structure to express the semantics and context of that data.

DTD rules:

Building a DTD (Document Type Definition) or a Schema describing the
structure allowed in instances is at the core of the vocabulary design
process. Here are a few tips:

  • use meaningful words for element and attribute names.
  • do not use attributes for general textual content: attribute values
    are normalized by the parser before reaching the application, so
    whitespace and line-break information will be modified.
  • use a separate element for every string that might be subject to
    localization. The canonical way to localize XML content is to use
    sibling elements carrying different xml:lang attributes, as in the
    following:

      <welcome>
        <msg xml:lang="en">hello</msg>
        <msg xml:lang="fr">bonjour</msg>
      </welcome>

  • use attributes to refine the content of an element, but avoid them for
    more complex tasks: parsing an attribute is not cheaper than parsing an
    element, and it is far easier to make element content more complex
    later, while attributes have to remain very simple.

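An application consuming such a document usually has to pick one of the sibling elements according to the user's language. The following Python sketch uses the standard library's xml.etree.ElementTree rather than libxml2; the helper name pick_message and its fallback order (exact language tag, then primary subtag, then English) are illustrative assumptions, not part of any standard API.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the predeclared xml: prefix in "{uri}local" form.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

WELCOME = """<welcome>
  <msg xml:lang="en">hello</msg>
  <msg xml:lang="fr">bonjour</msg>
</welcome>"""


def pick_message(parent, lang, fallback="en"):
    """Return the text of the msg child best matching `lang`, or None.

    Hypothetical helper: tries the exact tag ("fr-CA"), then the primary
    subtag ("fr"), then the fallback language.
    """
    messages = {m.get(XML_LANG): m.text for m in parent.findall("msg")}
    for candidate in (lang, lang.split("-")[0], fallback):
        if candidate in messages:
            return messages[candidate]
    return None


welcome = ET.fromstring(WELCOME)
print(pick_message(welcome, "fr-CA"))  # bonjour
print(pick_message(welcome, "de"))     # hello (English fallback)
```

Keeping each translation in its own element is what makes this lookup trivial; the same strings packed into attributes would need an ad hoc naming scheme.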

Versioning:

As part of the design, make sure the structure you define will remain usable
for future extensions that you may not anticipate for the current version.
There are two parts to this:

  • Make sure the instance contains a version number, which makes backward
    compatibility easy to handle. Something as simple as a version="1.0"
    attribute on the root element of the instance is sufficient.
  • When writing the code that analyzes the data provided by the XML
    parser, make sure it can work with unknown versions: generate a UI
    warning and process only the elements recognized by your version, but
    keep in mind that you should not break on unknown elements if the
    version attribute was not in the recognized set.

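Here is a minimal Python sketch of those two rules, using xml.etree.ElementTree. The element names (config, name, size), the sets of known versions and elements, and the use of warnings.warn in place of a real UI warning are all hypothetical. It reads the last rule one particular way: unknown elements are tolerated when the version is unknown, and treated as an error only for a version the code claims to understand.

```python
import warnings
import xml.etree.ElementTree as ET

# Hypothetical: the versions and child elements this code understands.
KNOWN_VERSIONS = {"1.0", "1.1"}
KNOWN_ELEMENTS = {"name", "size"}


def load_config(text):
    """Parse an instance, tolerating unknown versions and elements."""
    root = ET.fromstring(text)
    version = root.get("version")
    known_version = version in KNOWN_VERSIONS
    if not known_version:
        # Stand-in for a UI warning: keep going instead of failing.
        warnings.warn(f"unknown format version {version!r}; "
                      "processing recognized elements only")
    data = {}
    for child in root:
        if child.tag in KNOWN_ELEMENTS:
            data[child.tag] = child.text
        elif known_version:
            # A version we claim to understand should not contain this.
            raise ValueError(f"unexpected element <{child.tag}>")
        # Unknown version: silently skip elements we do not recognize.
    return data


newer = '<config version="2.0"><name>demo</name><color>red</color></config>'
print(load_config(newer))  # {'name': 'demo'}; the warning goes to stderr
```

Raising on unknown elements for a known version is a design choice; a more lenient reader could simply skip them in both cases.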

Other design parts:

While defining your vocabulary, try to think in terms of other uses of your
data, for example how XSLT stylesheets could be used to produce an HTML view
of your data or to convert it into a different format. It is also important
to look at XML Schemas and consider defining one, since a Schema allows more
complete validation and datatyping of your data structures; this helps avoid
some mistakes in the design phase.

Namespace:

If you expect your XML vocabulary to be used or recognized outside of your
application (for example, binding specific processing from a graphical shell
like Nautilus to an instance of your data), then you should really define an
<a href="http://www.w3.org/TR/REC-xml-names/">XML namespace</a> for your
vocabulary. A namespace name is a URL (more precisely, an absolute URI). It
is generally recommended to anchor it as an HTTP resource on a server
associated with the software project; see the next section about this. In
practice, this means that XML parsers will not handle your element names
as-is but as a pair made of the namespace name and the element name. This
allows applications to recognize your elements and disambiguate processing.
The uniqueness of the namespace name can, for the most part, be guaranteed
by the DNS registry. Namespaces can also carry versioning information, for
example:

  "http://www.gnome.org/project/projectname/1.0/"

An easy way to use a namespace is to make it the default namespace on the
root element of the XML instance, as in:

  <structure xmlns="http://www.gnome.org/project/projectname/1.0/">
    <data>
    ...
    </data>
  </structure>

In that document, structure and all its descendant elements, such as data,
are in the given namespace.

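To see the (namespace name, element name) pair in practice, the following Python sketch parses the example instance with xml.etree.ElementTree, which reports qualified names in "{namespace}local" notation. The namespace_version helper is hypothetical: it only works under the convention, shown in the example URL above, of ending the namespace name with a version segment, and parsers do not interpret namespace names this way themselves.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

INSTANCE = """<structure xmlns="http://www.gnome.org/project/projectname/1.0/">
  <data>...</data>
</structure>"""

structure = ET.fromstring(INSTANCE)

# Each tag is stored as "{namespace name}local name".
ns_uri, _, local = structure.tag[1:].partition("}")
print(ns_uri)  # http://www.gnome.org/project/projectname/1.0/
print(local)   # structure

# The default namespace also applies to descendants, so a bare
# "data" lookup finds nothing; the qualified name is required.
print(structure.find("data"))                    # None
print(structure.find(f"{{{ns_uri}}}data").text)  # ...


def namespace_version(uri):
    """Hypothetical helper: the last non-empty path segment, e.g. "1.0"."""
    segments = [s for s in urlsplit(uri).path.split("/") if s]
    return segments[-1] if segments else None


print(namespace_version(ns_uri))  # 1.0
```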

Canonical URL

As seen in the previous namespace section, while XML processing is not
tied to the Web, there is a natural synergy between the two. XML was
designed to be available on the Web, and keeping the infrastructure that
way helps deploy XML resources. The core of this issue is the notion of the
"Canonical URL" of an XML resource. The resource can be an XML document, a
DTD, a stylesheet, a schema, or even non-XML data associated with an XML
resource; its canonical URL is the URL where the "master" copy of that
resource is expected to be present on the Web. Usually, when processing
XML, a copy of the resource will be present on the local disk, maybe in
/usr/share/xml or /usr/share/sgml, maybe in /opt, or even in C:\projectname\
(horror!). The key point is that the way the resource is named should be
independent of where it actually resides on disk, if it is available
locally at all, and that processing should still work when there is no
local copy (provided the machine doing the processing is connected to the
Internet).

In practice, this means that one should never reference a resource by its
local name, but always by its canonical URL. For example, the following
should not be used in a DocBook instance:

  <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
                           "/usr/share/xml/docbook/4.2/docbookx.dtd">

But always reference the canonical URL for the DTD:

  <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
                           "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">

Similarly, the document instance may reference the
<a href="http://www.w3.org/TR/xslt">XSLT</a> stylesheets needed to process it
into HTML, and the canonical URL should be used there too:

  <?xml-stylesheet
    href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"
    type="text/xsl"?>

Defining the canonical URLs for the resources needed should obey a few
simple rules, similar to those used to design namespace names:

  • use a DNS name that you know is associated with the project and will
    remain available in the long term
  • within that server space, reserve the subtree where you intend to keep
    those data
  • version the URL so that multiple versions of the resources can be
    hosted concurrently

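The three rules can be illustrated with a small Python helper that assembles a canonical URL from its parts. The helper name, the host www.example.org and the xml/myformat subtree are placeholders, not real project locations; the point is only that the DNS name, the reserved subtree and the version each get a fixed place in the URL.

```python
from urllib.parse import quote, urlunsplit


def canonical_url(host, subtree, version, resource, scheme="http"):
    """Build scheme://host/subtree/version/resource.

    host:     a DNS name owned by the project (rule 1)
    subtree:  the part of that server reserved for these data (rule 2)
    version:  a version segment so several versions coexist (rule 3)
    resource: the file name of the DTD, schema, stylesheet, ...
    """
    parts = [p.strip("/") for p in (subtree, version, resource)]
    if not all(parts):
        raise ValueError("subtree, version and resource must be non-empty")
    # quote() leaves "/" alone by default, so nested subtrees survive.
    path = "/" + "/".join(quote(p) for p in parts)
    return urlunsplit((scheme, host, path, "", ""))


# Placeholder project: two versions published side by side.
for v in ("1.0", "1.1"):
    print(canonical_url("www.example.org", "xml/myformat", v, "myformat.dtd"))
# http://www.example.org/xml/myformat/1.0/myformat.dtd
# http://www.example.org/xml/myformat/1.1/myformat.dtd
```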

Catalog setup

How catalogs work:

Catalogs are the technical mechanism that allows XML processing tools to
use a local copy of a resource, if one is available, even when the instance
document references its canonical URL.
<a href="http://www.oasis-open.org/committees/entity/">XML Catalogs</a> are
anchored in the root catalog (usually /etc/xml/catalog, or one defined by
the user). They form a tree of XML documents defining the mappings between
the canonical naming space and the locally installed copies; this can be
seen as a static cache structure.

When the XML processor is asked to process a resource, it automatically
tests for a locally available version in the catalog, starting from the
root catalog and possibly fetching sub-catalog resources, until it
determines whether or not the catalog has that resource. If it does not,
the default processing of fetching the resource from the Web is done, which
in most cases allows recovery from a catalog miss. The key point is that
the document instances are totally independent of the availability of a
catalog and of the actual place where the local resources they reference
may be installed. This greatly improves the management of the documents in
the long run, making them independent of the platform or toolchain used to
process them. The figure below tries to express that mechanism:

[Figure: Picture describing the catalog]

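The lookup-then-fallback behavior described above can be modeled in a few lines of Python. This is a deliberately simplified sketch: it assumes a catalog is just a dictionary of exact mappings plus a list of child catalogs, whereas real XML Catalogs (and libxml2's implementation of them) support many more entry types and ordering rules. The Catalog class and the resolve function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Catalog:
    """Hypothetical, simplified catalog: exact mappings plus subcatalogs."""
    mappings: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def lookup(self, uri: str) -> Optional[str]:
        """Search this catalog, then its subcatalogs; None on a miss."""
        if uri in self.mappings:
            return self.mappings[uri]
        for child in self.children:
            found = child.lookup(uri)
            if found is not None:
                return found
        return None


def resolve(uri: str, root_catalog: Optional[Catalog]) -> str:
    """Return the local copy if the catalog knows one, else the URI itself."""
    if root_catalog is not None:
        local = root_catalog.lookup(uri)
        if local is not None:
            return local
    return uri  # catalog miss (or no catalog): fetch from the Web as usual


dtd = "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
docbook = Catalog({dtd: "/usr/share/xml/docbook/4.2/docbookx.dtd"})
system_catalog = Catalog(children=[docbook])

print(resolve(dtd, system_catalog))  # /usr/share/xml/docbook/4.2/docbookx.dtd
print(resolve("http://www.example.org/other.dtd", system_catalog))
# http://www.example.org/other.dtd
```

Because the instance only ever names the canonical URL, moving the local copy only requires changing a mapping, never the documents.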

Usual catalog setup:

Usually, the catalogs for a project are set up as a two-level hierarchical
cache, the root catalog containing only "delegates" that point to a separate
subcatalog dedicated to the project. The goal is to keep the root catalog
clean and to simplify maintenance by using a separate catalog per project.
For example, when creating a catalog for the
<a href="http://www.w3.org/TR/xhtml1">XHTML1</a> DTDs, only 3 items are added
to the root catalog:

  <delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/>
  <delegateSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/>
  <delegateURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/>

They are all "delegates", meaning that if the catalog system is asked to
resolve a reference matching one of them, it has to look it up in a
subcatalog. Here the subcatalog was installed as
/usr/share/sgml/xhtml1/xmlcatalog in the local tree. That decision is left
to the sysadmin or the packager for that system and may obey different
rules, but the actual place on the filesystem (or in a resource cache on
the local network) will not influence the processing as long as it is
available. The first rule indicates that if the reference uses a PUBLIC
identifier beginning with the substring

  "-//W3C//DTD XHTML 1.0"

then the catalog lookup should be limited to the specific subcatalog given.
Similarly, the second and third entries define the same delegation for
SYSTEM identifiers (such as the one in a DOCTYPE) and for ordinary URI
references, when the URL starts with the "http://www.w3.org/TR/xhtml1/DTD"
substring. That prefix indicates the location on the W3C server where the
XHTML1 resources are stored, and it begins every Canonical URL for XHTML1
resources. In practice, these three rules are sufficient to capture all
references to XHTML1 resources and direct the processing tools to the right
subcatalog.

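Delegation can be sketched in Python by reading the three root-catalog entries above with xml.etree.ElementTree and selecting the subcatalogs whose prefix matches a reference. Two assumptions: the entries are wrapped in a catalog element in the OASIS catalog namespace, as a real root catalog would be, and, as I read the OASIS XML Catalogs specification, matching delegates are consulted longest prefix first. The function name delegates_for is hypothetical and this is not libxml2's code.

```python
import xml.etree.ElementTree as ET

NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"
SUBCATALOG_URL = "file:///usr/share/sgml/xhtml1/xmlcatalog"

# The three delegate entries from the text, wrapped in a catalog element.
ROOT_CATALOG = f"""<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0"
                  catalog="{SUBCATALOG_URL}"/>
  <delegateSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
                  catalog="{SUBCATALOG_URL}"/>
  <delegateURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
               catalog="{SUBCATALOG_URL}"/>
</catalog>"""

# Delegate element and prefix attribute for each kind of reference.
KINDS = {
    "public": ("delegatePublic", "publicIdStartString"),
    "system": ("delegateSystem", "systemIdStartString"),
    "uri": ("delegateURI", "uriStartString"),
}


def delegates_for(catalog, kind, ident):
    """Return the subcatalogs to consult for `ident`, longest prefix first."""
    tag, attr = KINDS[kind]
    matches = [
        (len(entry.get(attr)), entry.get("catalog"))
        for entry in catalog.iter(NS + tag)
        if ident.startswith(entry.get(attr))
    ]
    matches.sort(key=lambda m: m[0], reverse=True)
    return [url for _, url in matches]


root_catalog = ET.fromstring(ROOT_CATALOG)
print(delegates_for(root_catalog, "public",
                    "-//W3C//DTD XHTML 1.0 Strict//EN"))
# ['file:///usr/share/sgml/xhtml1/xmlcatalog']
print(delegates_for(root_catalog, "system",
                    "http://www.example.org/other.dtd"))
# []
```

An empty result means the root catalog has nothing to say about that reference, so processing falls back to the canonical URL.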

A subcatalog example:

Here is the complete subcatalog used for XHTML1:

  <?xml version="1.0"?>
  <!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
            "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
  <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
            uri="xhtml1-20020801/DTD/xhtml1-strict.dtd"/>
    <public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
            uri="xhtml1-20020801/DTD/xhtml1-transitional.dtd"/>
    <public publicId="-//W3C//DTD XHTML 1.0 Frameset//EN"
            uri="xhtml1-20020801/DTD/xhtml1-frameset.dtd"/>
    <rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
            rewritePrefix="xhtml1-20020801/DTD"/>
    <rewriteURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
            rewritePrefix="xhtml1-20020801/DTD"/>
  </catalog>

There are a few things to notice:

  • this is an XML resource: it points to its DTD using a Canonical URL,
    and the root element defines a namespace (based on a URN rather than an
    HTTP URL).
  • it contains 5 rules: the first 3 are direct mappings associating the 3
    PUBLIC identifiers defined by the XHTML1 specification with the local
    resources containing the DTDs; the last 2 are rewrite rules that build
    the local filename for any URL starting with
    "http://www.w3.org/TR/xhtml1/DTD". The local cache keeps these rules
    simple by mirroring the structure of the on-line server at the
    Canonical URL.
  • the local resources are designated using URI references (the uri and
    rewritePrefix attributes) whose base is the URL of the containing
    subcatalog. Since that base is the file
    /usr/share/sgml/xhtml1/xmlcatalog, relative resolution replaces its
    last path segment, so in practice the copy of the XHTML1 strict DTD is
    stored locally in
    /usr/share/sgml/xhtml1/xhtml1-20020801/DTD/xhtml1-strict.dtd

Those 5 rules are sufficient to cover all references to the resources held
at the Canonical URL for the XHTML1 DTDs.

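The subcatalog rules can likewise be sketched in Python: public entries map an identifier directly, rewriteSystem entries replace a matching prefix, and every result is resolved against the subcatalog's own URL with urllib.parse.urljoin. Note that standard relative-URI resolution replaces the last path segment of the base, here the catalog file name. This is a simplification under stated assumptions: only two of the five entries are reproduced, only public and rewriteSystem are handled, and when several rewriteSystem prefixes match the longest one wins, as the OASIS specification describes. It is not how libxml2 is implemented.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"
# Base URL of the subcatalog itself, as referenced by the root delegates.
BASE = "file:///usr/share/sgml/xhtml1/xmlcatalog"

# Two of the five entries from the XHTML1 subcatalog.
XHTML1_SUBCATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="xhtml1-20020801/DTD/xhtml1-strict.dtd"/>
  <rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
          rewritePrefix="xhtml1-20020801/DTD"/>
</catalog>"""


def resolve_public(cat, public_id):
    """Direct mapping: return the absolute local URI, or None."""
    for e in cat.iter(NS + "public"):
        if e.get("publicId") == public_id:
            return urljoin(BASE, e.get("uri"))
    return None


def resolve_system(cat, system_id):
    """Rewrite the longest matching prefix, or return None on a miss."""
    best = None
    for e in cat.iter(NS + "rewriteSystem"):
        prefix = e.get("systemIdStartString")
        if system_id.startswith(prefix) and (
                best is None or len(prefix) > len(best[0])):
            best = (prefix, e.get("rewritePrefix"))
    if best is None:
        return None
    prefix, replacement = best
    return urljoin(BASE, replacement + system_id[len(prefix):])


cat = ET.fromstring(XHTML1_SUBCATALOG)
print(resolve_public(cat, "-//W3C//DTD XHTML 1.0 Strict//EN"))
# file:///usr/share/sgml/xhtml1/xhtml1-20020801/DTD/xhtml1-strict.dtd
print(resolve_system(
    cat, "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"))
# file:///usr/share/sgml/xhtml1/xhtml1-20020801/DTD/xhtml1-transitional.dtd
```

Mirroring the server layout is what lets a single rewrite rule cover every file under the DTD directory.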

Package integration

Creating and removing catalogs should be handled as part of the process of
(un)installing the local copy of the resources. Since catalog files are XML
resources, they should be processed with XML-based tools to avoid problems
with the generated files; the xmlcatalog command that comes with libxml2
allows you to create catalogs and to add or remove rules at that time. Here
is a complete example taken from the post-install script of the RPM for the
XHTML1 DTDs. While this example is platform- and packaging-specific, it can
be useful as an example in other contexts:

  %post
  CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
  #
  # Register it in the super catalog with the appropriate delegates
  #
  ROOTCATALOG=/etc/xml/catalog

  if [ ! -r $ROOTCATALOG ]
  then
      /usr/bin/xmlcatalog --noout --create $ROOTCATALOG
  fi

  if [ -w $ROOTCATALOG ]
  then
          /usr/bin/xmlcatalog --noout --add "delegatePublic" \
                  "-//W3C//DTD XHTML 1.0" \
                  "file://$CATALOG" $ROOTCATALOG
          /usr/bin/xmlcatalog --noout --add "delegateSystem" \
                  "http://www.w3.org/TR/xhtml1/DTD" \
                  "file://$CATALOG" $ROOTCATALOG
          /usr/bin/xmlcatalog --noout --add "delegateURI" \
                  "http://www.w3.org/TR/xhtml1/DTD" \
                  "file://$CATALOG" $ROOTCATALOG
  fi

In this case the XHTML1 subcatalog is not created on the fly: it is
installed as one of the package's files. So the only work needed is to make
sure the root catalog exists and to register the delegate rules.

Similarly, the post-uninstall script just removes the rules from the
catalog:

  %postun
  #
  # On removal, unregister the xmlcatalog from the supercatalog
  #
  if [ "$1" = 0 ]; then
      CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
      ROOTCATALOG=/etc/xml/catalog

      if [ -w $ROOTCATALOG ]
      then
              /usr/bin/xmlcatalog --noout --del \
                      "-//W3C//DTD XHTML 1.0" $ROOTCATALOG
              /usr/bin/xmlcatalog --noout --del \
                      "http://www.w3.org/TR/xhtml1/DTD" $ROOTCATALOG
              /usr/bin/xmlcatalog --noout --del \
                      "http://www.w3.org/TR/xhtml1/DTD" $ROOTCATALOG
      fi
  fi

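Note the test on $1: in an RPM %postun script, $1 is the number of
instances of the package that remain installed after the operation, so it
is 0 only on a real removal. The test is needed so that the delegate rules
are not removed when the package is upgraded.
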
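Where calling xmlcatalog is not convenient, the registration step could in principle be done with another XML-aware tool. The following Python sketch, using only xml.etree.ElementTree, creates the root catalog if it is missing and adds the three XHTML1 delegate entries, skipping any that are already present. The function name and the duplicate check are assumptions, and it does not reproduce xmlcatalog's exact output (no DOCTYPE declaration is written, for instance), so treat it as an illustration rather than a replacement for the script above.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
ET.register_namespace("", CATALOG_NS)  # write xmlns="..." instead of ns0:

SUBCATALOG_URL = "file:///usr/share/sgml/xhtml1/xmlcatalog"
DELEGATES = [
    ("delegatePublic", "publicIdStartString", "-//W3C//DTD XHTML 1.0"),
    ("delegateSystem", "systemIdStartString",
     "http://www.w3.org/TR/xhtml1/DTD"),
    ("delegateURI", "uriStartString", "http://www.w3.org/TR/xhtml1/DTD"),
]


def register_delegates(root_catalog_path):
    """Hypothetical %post equivalent: create the root catalog if needed,
    then add each delegate unless an identical entry already exists."""
    if os.path.exists(root_catalog_path):
        tree = ET.parse(root_catalog_path)
    else:
        tree = ET.ElementTree(ET.Element(f"{{{CATALOG_NS}}}catalog"))
    catalog = tree.getroot()
    for tag, attr, prefix in DELEGATES:
        qname = f"{{{CATALOG_NS}}}{tag}"
        present = any(e.get(attr) == prefix
                      and e.get("catalog") == SUBCATALOG_URL
                      for e in catalog.iter(qname))
        if not present:
            ET.SubElement(catalog, qname,
                          {attr: prefix, "catalog": SUBCATALOG_URL})
    tree.write(root_catalog_path, encoding="utf-8", xml_declaration=True)
    return catalog


# Try it on a scratch file instead of /etc/xml/catalog.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "catalog")
    register_delegates(path)
    print(len(register_delegates(path)))  # 3: the second run adds nothing
```

Making the step idempotent matters for packaging, since %post scripts also run on upgrades.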

Following the guidelines and tips provided in this document should help
deploy XML resources in the GNOME framework without much pain and ensure a
smooth evolution of the resources and their instances.

Daniel Veillard

$Id$

            </body>
            </html>