<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><link rel="SHORTCUT ICON" href="/favicon.ico" /><style type="text/css">
TD {font-family: Verdana,Arial,Helvetica}
BODY {font-family: Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right: 0em}
H1 {font-family: Verdana,Arial,Helvetica}
H2 {font-family: Verdana,Arial,Helvetica}
H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }
</style><title>Validation &amp; DTDs</title></head><body bgcolor="#8b7765" text="#000000" link="#a06060" vlink="#000000">

The XML C parser and toolkit of Gnome

Validation & DTDs


Table of Contents:

  1. General overview
  2. The definition
  3. Simple rules
    1. How to reference a DTD from a document
    2. Declaring elements
    3. Declaring attributes
  4. Some examples
  5. How to validate
  6. Other resources

      General overview

      Well, what is validation, and what is a DTD?

      DTD is the acronym for Document Type Definition. A DTD is a description of
      the content for a family of XML files. It is part of the XML 1.0
      specification, and allows one to describe the set of rules detailing the
      structure and content of a document, and to verify that a given document
      instance conforms to them.

      Validation is the process of checking a document against a DTD (more
      generally, against a set of construction rules).

      The validation process and building DTDs are the two most difficult parts
      of the XML life cycle. Briefly, a DTD defines all the possible elements to
      be found within your document and the formal shape of your document tree
      (by defining the allowed content of each element: either text, a regular
      expression for the allowed list of children, or mixed content, i.e. both
      text and children). The DTD also defines the valid attributes for all
      elements and the types of those attributes.

      The definition

      The W3C XML Recommendation (Tim Bray's annotated version of Rev1):

      • Declaring elements
      • Declaring attributes

        (Unfortunately) all this is inherited from the SGML world; the syntax is
        ancient...

        Simple rules

        Writing DTDs can be done in many ways. The rules for building them can be
        radically different depending on whether you need something permanent or
        something which can evolve over time. Really complex DTDs like the DocBook
        ones are flexible but much harder to design. I will just focus on DTDs for
        formats with a fixed, simple structure. What follows is just a set of
        basic rules; it is definitely not exhaustive, nor usable for complex DTD
        design.

        How to reference a DTD from a document:

        Assuming the top element of the document is spec and the DTD is placed in
        the file mydtd in the subdirectory dtds of the directory from which the
        document was loaded:

        <!DOCTYPE spec SYSTEM "dtds/mydtd">

        Notes:

        • The system string is actually a URI-Reference (as defined in RFC 2396),
          so you can use a full URL string indicating the location of your DTD on
          the Web. This is a really good thing to do if you want others to
          validate your document.
        • It is also possible to associate a PUBLIC identifier (a magic string)
          so that the DTD is looked up in catalogs on the client side without
          having to locate it on the web.
        • A DTD contains a set of element and attribute declarations, but they
          don't define what the root of the document should be. This is
          explicitly told to the parser/validator as the first element of the
          DOCTYPE declaration.
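
        As a sketch of the second note, a DOCTYPE can carry both a PUBLIC
        identifier and a system URI. The public identifier string and the
        example.org URL below are hypothetical placeholders, not a real published
        DTD, and front and body are assumed to be declared by that DTD:

        ```xml
        <!-- Hypothetical identifiers: neither the public identifier nor the
             URL refers to a real published DTD. -->
        <!DOCTYPE spec PUBLIC "-//Example//DTD Spec 1.0//EN"
                              "http://www.example.org/dtds/mydtd">
        <spec>
          <front>...</front>
          <body>...</body>
        </spec>
        ```

        A parser configured with catalogs can map the public identifier to a
        local copy of the DTD; otherwise the system URI is used to fetch it.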

          Declaring elements:

          The following declares an element spec:

          <!ELEMENT spec (front, body, back?)>

          It also expresses that the spec element contains one front, one body
          and one optional back child element, in this order. An element and its
          content are declared together in a single declaration. Similarly, the
          following declares div1 elements:

          <!ELEMENT div1 (head, (p | list | note)*, div2?)>

          which means div1 contains one head, then any number (possibly zero) of
          p, list and note elements in any order, and then an optional div2. And
          last but not least, an element can contain text:

          <!ELEMENT b (#PCDATA)>

          b contains text. An element can also be of mixed content (text and
          elements in no particular order):

          <!ELEMENT p (#PCDATA|a|ul|b|i|em)*>

          p can contain text or a, ul, b, i or em elements in no particular
          order.
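
          Putting these declarations together, here is a minimal sketch of a
          complete document whose DTD is carried in the internal subset. The
          content models chosen for front, body and back, and the simplified
          model for p, are assumptions made for illustration; the text above
          only gives the declarations for spec, b and p:

          ```xml
          <!DOCTYPE spec [
            <!-- From the text above: front, then body, then an optional back -->
            <!ELEMENT spec (front, body, back?)>
            <!-- Assumed content models, for illustration only -->
            <!ELEMENT front (#PCDATA)>
            <!ELEMENT body (p)*>
            <!ELEMENT back (#PCDATA)>
            <!-- Simplified mixed content: text and b elements in any order -->
            <!ELEMENT p (#PCDATA | b)*>
            <!ELEMENT b (#PCDATA)>
          ]>
          <spec>
            <front>Title page</front>
            <body>
              <p>Some <b>bold</b> text.</p>
            </body>
          </spec>
          ```

          The back element is left out here, which the ? in the spec content
          model allows.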

          Declaring attributes:

          Again, an attribute declaration includes the definition of its content:

          <!ATTLIST termdef name CDATA #IMPLIED>

          means that the element termdef can have a name attribute which contains
          text (CDATA) and is optional (#IMPLIED). The attribute value can also
          be defined within a set:

          <!ATTLIST list type (bullets|ordered|glossary) "ordered">

          means that the list element has a type attribute with 3 allowed values,
          "bullets", "ordered" or "glossary", which defaults to "ordered" if the
          attribute is not explicitly specified.

          The content type of an attribute can be text (CDATA),
          anchor/reference/references (ID/IDREF/IDREFS), entity(ies)
          (ENTITY/ENTITIES) or name(s) (NMTOKEN/NMTOKENS). The following defines
          that a chapter element can have an optional id attribute of type ID,
          usable for reference from attributes of type IDREF:

          <!ATTLIST chapter id ID #IMPLIED>

          The last value of an attribute definition can be #REQUIRED, meaning
          that the attribute has to be given, #IMPLIED, meaning that it is
          optional, or the default value (possibly prefixed by #FIXED if it is
          the only value allowed).

          Notes:

          • Usually the attributes pertaining to a given element are declared in a
            single expression, but it is just a convention adopted by a lot of DTD
            writers:

            <!ATTLIST termdef
                      id      ID      #REQUIRED
                      name    CDATA   #IMPLIED>

            The previous construct defines both id and name attributes for the
            element termdef.
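
            The attribute rules above can be sketched in one small document. The
            chapter and list declarations come from the text; the doc and ref
            elements and the target and version attributes are hypothetical
            names introduced only for this illustration:

            ```xml
            <!DOCTYPE doc [
              <!-- doc and ref are hypothetical elements used for illustration -->
              <!ELEMENT doc (chapter+, list, ref)>
              <!ELEMENT chapter (#PCDATA)>
              <!ELEMENT list (#PCDATA)>
              <!ELEMENT ref EMPTY>
              <!-- Optional ID attribute, as declared in the text -->
              <!ATTLIST chapter id ID #IMPLIED>
              <!-- Enumerated attribute with a default value -->
              <!ATTLIST list type (bullets|ordered|glossary) "ordered">
              <!-- Two attributes of one element declared in a single ATTLIST:
                   a required IDREF and a fixed value -->
              <!ATTLIST ref
                        target  IDREF  #REQUIRED
                        version CDATA  #FIXED "1.0">
            ]>
            <doc>
              <chapter id="intro">Introduction</chapter>
              <chapter>A chapter without an id, which #IMPLIED allows</chapter>
              <list>No type is given, so it defaults to "ordered"</list>
              <ref target="intro"/>
            </doc>
            ```

            The target value must match an id that exists somewhere in the
            document, otherwise the document is not valid.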

            Some examples

            The directory test/valid/dtds/ in the libxml2 distribution contains
            some complex DTD examples. The example in the file test/valid/dia.xml
            shows an XML file where the simple DTD is directly included within the
            document.

            How to validate

            The simplest way is to use the xmllint program included with libxml.
            The --valid option turns on validation of the files given as input.
            For example, the following validates a copy of the first revision of
            the XML 1.0 specification:

            xmllint --valid --noout test/valid/REC-xml-19980210.xml

            The --noout option is used to disable output of the resulting tree.
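
            To see what validation catches, here is a small sketch of a document
            that is well-formed but not valid. The content models for front, body
            and back are assumptions made for illustration. Saved under a
            hypothetical name such as invalid.xml and passed to xmllint with
            --valid --noout, it should be reported as failing validation, because
            the children of spec are not in the declared order:

            ```xml
            <!DOCTYPE spec [
              <!ELEMENT spec (front, body, back?)>
              <!-- Assumed content models, for illustration only -->
              <!ELEMENT front (#PCDATA)>
              <!ELEMENT body (#PCDATA)>
              <!ELEMENT back (#PCDATA)>
            ]>
            <!-- Well-formed, but not valid: back appears before body -->
            <spec>
              <front>Title page</front>
              <back>Appendix</back>
              <body>Main text</body>
            </spec>
            ```

            Swapping the back and body elements makes the document valid again.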

            The --dtdvalid dtd option allows validation of the document(s)
            against a given DTD.

            Libxml2 exports an API to handle DTDs and validation; check the
            associated description.

            Other resources

            DTDs are as old as SGML, so there may be a number of examples online.
            I will just list one for now; other pointers are welcome:

            • XML-101 DTD

              I suggest looking at the examples found under test/valid/dtds and any
              of the large number of books available on XML. The dia example in
              test/valid should be both simple and complete enough to allow you to
              build your own.

              Daniel Veillard

              </body></html>