<html>
  <head>
    <title>Advanced Features of the Perl SAX 2.0 Binding</title>
    <meta name="keywords" content="XML SGML SAX Perl libxml libxml-perl" />
  </head>
  <body>

Advanced SAX

The classes, methods, and features described below are
not commonly used in most applications and can be ignored by most
users. If, however, you find that you are not getting the granularity
you expect from Basic SAX, this is the place to look for more.
Advanced SAX isn't advanced in the sense that it is harder or requires
better programming skills. It is simply more complete, and has been
separated out to keep Basic SAX simple in terms of the number of events
one has to deal with.

SAX Parsers

SAX supports several classes of event handlers: content handlers,
declaration handlers, DTD handlers, error handlers, entity resolvers,
and other extensions.  For each class of events, a separate handler
can be used to handle those events.  If a handler is not defined for a
class of events, then the default handler, <tt>Handler</tt>, is used.
Each of these handlers is described in the sections below.
Applications may change an event handler in the middle of the parse
and the SAX parser will begin using the new handler immediately.
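The fallback and mid-parse swap described above can be sketched without a real parser. In the following minimal sketch, <tt>TinyDriver</tt>, its <tt>fire()</tt> method, and the <tt>Recorder</tt> class are hypothetical stand-ins invented for illustration; only the option names <tt>Handler</tt> and <tt>ContentHandler</tt> and the <tt>start_element</tt> event come from this document. A real SAX driver may organize its dispatch differently.

```perl
use strict;
use warnings;

package TinyDriver;   # hypothetical stand-in, not a real parser class

sub new {
    my ($class, %options) = @_;
    return bless {%options}, $class;
}

# Deliver one event. The handler is looked up on every call, so a
# handler that is replaced mid-parse is used for the very next event.
sub fire {
    my ($self, $slot, $event, $data) = @_;
    my $handler = $self->{$slot} || $self->{Handler};   # fall back to the default
    return unless $handler && $handler->can($event);
    return $handler->$event($data);
}

package Recorder;     # hypothetical handler that records what it sees

sub new {
    my ($class, $label) = @_;
    return bless { label => $label, seen => [] }, $class;
}

sub start_element {
    my ($self, $element) = @_;
    push @{ $self->{seen} }, "$self->{label}:$element->{Name}";
    return;
}

package main;

my $default = Recorder->new('default');
my $content = Recorder->new('content');
my $driver  = TinyDriver->new(Handler => $default);

# No ContentHandler registered yet: the default Handler receives the event.
$driver->fire(ContentHandler => 'start_element', { Name => 'a' });

# Swap in a dedicated content handler "in the middle of the parse".
$driver->{ContentHandler} = $content;
$driver->fire(ContentHandler => 'start_element', { Name => 'b' });
```

The per-event lookup is the design choice that makes the swap take effect immediately; a driver that cached the handler once at the start of the parse would not behave as described.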

SAX's basic interface defines methods for parsing system
identifiers (URIs), open files, and strings.  Behind the scenes,
though, SAX uses a <tt>Source</tt> hash that contains that
information, plus encoding, system and public identifiers if
available.  These are described below under the <tt>Source</tt>
option.

SAX parsers accept all features as options to the <tt>parse()</tt>
methods and on the parser's constructor.  Features are described in
the next section.

<tt class='function'>parse</tt>(options)
Parses the XML instance identified by the <tt>Source</tt> option.
options can be a list of option-value pairs or a hash.
<tt>parse()</tt> returns the result of calling the
<tt>end_document()</tt> handler.

<tt>ContentHandler</tt>
Object to receive document content events.  The
<tt>ContentHandler</tt>, with additional events defined below, is the
class of events described in Basic SAX Handler.  If the application
does not register a content handler or content event handlers on the
default handler, content events reported by the SAX parser will be
silently ignored.

<tt>DTDHandler</tt>
Object to receive basic DTD events.  If the application does not
register a DTD handler or DTD event handlers on the default handler,
DTD events reported by the SAX parser will be silently
ignored.

<tt>EntityResolver</tt>
Object to resolve external entities.  If the application does not
register an entity resolver or entity events on the default handler,
the SAX parser will perform its own default resolution.

<tt>ErrorHandler</tt>
Object to receive error-message events.  If the application does not
register an error handler or error event handlers on the default
handler, all error events reported by the SAX parser will be silently
ignored; however, normal processing may not continue. It is highly
recommended that all SAX applications implement an error handler to
avoid unexpected bugs.

<tt>Source</tt>
A hash containing information about the XML instance to be parsed.
See Input Sources below. Note that
<tt>Source</tt> cannot be changed during the parse.

<tt>Features</tt>
A hash containing Feature information, as described below.
Features can be set at runtime, but not reliably by writing to the
Features hash directly: doing so gives the parser no chance to look
at what you have set, so it cannot react properly to errors or to
Features that it doesn't support, and the results might not be what
you expect. You should use the set_feature() method instead.

Features

Features are as defined in
<a href="http://sax.sourceforge.net/apidoc/org/xml/sax/package-summary.html#package_description">SAX2: Features
and Properties</a>, but are of course not limited to those. You may add
your own Features. Also, Java has an artificial distinction between
Features and Properties which is unnecessary. In Perl, both have been
merged under the same name.

Features can be passed as options when creating a parser or calling
a <tt>parse()</tt> method. They may also be set using the
set_feature() method.

    # Features can be given to the constructor...
    my $parser = AnySAXParser->new(
        Features => {
            'http://xml.org/sax/features/namespaces' => 0,
        },
    );

    # ...or to an individual parse() call...
    $parser->parse(
        Features => {
            'http://xml.org/sax/features/namespaces' => 0,
        },
    );

    # ...or set and queried one at a time.
    $parser->set_feature('http://xml.org/sax/properties/xml-string', 1);
    my $string = $parser->get_feature('http://xml.org/sax/properties/xml-string');

When performing namespace processing, Perl SAX parsers always provide
both the raw tag name in <tt>Name</tt> and the namespace names in
<tt>NamespaceURI</tt>, <tt>LocalName</tt>, and <tt>Prefix</tt>.
Therefore, the
"<tt>http://xml.org/sax/features/namespace-prefixes</tt>" Feature is
ignored.

Also, Features are things that are supposed to be turned
on, and thus should normally be off by default, especially if
the parser doesn't support turning them off. Due to backwards
compatibility problems, the one exception to this rule is the
"<tt>http://xml.org/sax/features/namespaces</tt>" Feature, which is on by
default and which a number of parsers may not be able to turn off. Thus,
a parser claiming to support this Feature (and all SAX2 parsers must
support it) may in fact only support turning it on. This is only a minor
problem, as turning it off basically amounts to returning to SAX1, which
can be accomplished by a filter (e.g. XML::Filter::SAX2toSAX1).

In addition to the Features described in the SAX spec
itself, a number of new ones may be defined for Perl. An example of
this would be http://xmlns.perl.org/sax/node-factory, which,
when supported by the parser, would be settable to a NodeFactory object
that would be in charge of creating SAX nodes different from those that
are normally received by event handlers. See
http://xmlns.perl.org/ (currently
in alpha state) for details on how to register Features.

The following methods are used to get and set features:

<tt class='function'>get_feature</tt>(name)
Look up the value of a feature.

The feature name is any fully-qualified URI. It is possible for a
SAX parser to recognize a feature name but to be unable to return its
value; this is especially true in the case of an adapter for a SAX1
Parser, which has no way of knowing whether the underlying parser is
validating, for example.

Some feature values may be available only in specific contexts,
such as before, during, or after a parse.

<tt>get_feature()</tt> returns the value of the feature, which is usually
either a boolean or an object, and will throw
<tt>XML::SAX::Exception::NotRecognized</tt> when the SAX parser does not
recognize the feature name and <tt>XML::SAX::Exception::NotSupported</tt>
when the SAX parser recognizes the feature name but cannot determine its
value at this time.

<tt class='function'>set_feature</tt>(name, value)
Set the state of a feature.

The feature name is any fully-qualified URI. It is possible for a
SAX parser to recognize a feature name but to be unable to set its
value; this is especially true in the case of an adapter for a SAX1
Parser, which has no way of affecting whether the underlying parser is
validating, for example.

Some feature values may be immutable or mutable only in specific
contexts, such as before, during, or after a parse.

<tt>set_feature()</tt> will throw <tt>XML::SAX::Exception::NotRecognized</tt>
when the SAX parser does not recognize the feature name and
<tt>XML::SAX::Exception::NotSupported</tt> when the SAX parser recognizes the
feature name but cannot set the requested value.

This method is also the standard mechanism for setting extended handlers,
such as "http://xml.org/sax/handlers/DeclHandler".

get_features()
Look up all Features that this parser claims to support.

This method returns a hash of Features which the parser
claims to support. The values of the hash are currently
unspecified, though they may be used later. This method is meant
to be inherited so that Features supported by the base parser
class (XML::SAX::Base) are declared to be supported by
subclasses.

Calling this method is probably only moderately useful to end
users. It is mostly meant for use by XML::SAX, so that it can
query parsers for Feature support and return an appropriate
parser depending on the Features that are required.
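The two failure modes of <tt>get_feature()</tt> and <tt>set_feature()</tt> can be illustrated with a small self-contained sketch. Everything in the <tt>Sketch::</tt> namespace below is hypothetical: the exception classes merely stand in for <tt>XML::SAX::Exception::NotRecognized</tt> and <tt>XML::SAX::Exception::NotSupported</tt> so that the example runs without XML::SAX installed, and the toy parser is assumed to recognize two Features, of which it can only switch namespaces on and cannot report validation at all.

```perl
use strict;
use warnings;

package Sketch::Exception;          # stand-in for XML::SAX::Exception
sub throw {
    my ($class, %args) = @_;
    die bless {%args}, $class;
}

package Sketch::NotRecognized;
our @ISA = ('Sketch::Exception');

package Sketch::NotSupported;
our @ISA = ('Sketch::Exception');

package Sketch::Parser;             # toy feature store, not a real parser

my $NAMESPACES = 'http://xml.org/sax/features/namespaces';
my $VALIDATION = 'http://xml.org/sax/features/validation';

sub new {
    my ($class) = @_;
    my %known = ($NAMESPACES => 1, $VALIDATION => 1);
    # Only the namespaces value is known; validation is recognized but
    # its value cannot be determined.
    return bless { known => \%known, value => { $NAMESPACES => 1 } }, $class;
}

sub get_feature {
    my ($self, $name) = @_;
    Sketch::NotRecognized->throw(Message => "unknown feature: $name")
        unless $self->{known}{$name};
    Sketch::NotSupported->throw(Message => "value not available: $name")
        unless exists $self->{value}{$name};
    return $self->{value}{$name};
}

sub set_feature {
    my ($self, $name, $value) = @_;
    Sketch::NotRecognized->throw(Message => "unknown feature: $name")
        unless $self->{known}{$name};
    # The toy parser can only turn namespace processing on.
    Sketch::NotSupported->throw(Message => "cannot set $name to '$value'")
        unless $name eq $NAMESPACES && $value;
    $self->{value}{$name} = $value;
    return;
}

package main;

my $parser = Sketch::Parser->new;
my @outcome;
for my $attempt ([$NAMESPACES, 1], [$NAMESPACES, 0],
                 ['http://example.org/no-such-feature', 1]) {
    my ($name, $value) = @$attempt;
    my $ok = eval { $parser->set_feature($name, $value); 1 };
    push @outcome, $ok ? 'ok' : ref $@;   # class of the exception, if any
}

my $validating = eval { $parser->get_feature($VALIDATION) };
my $get_error  = ref $@;
```

Wrapping each call in <tt>eval</tt> and inspecting the class of <tt>$@</tt> lets an application distinguish "this parser has never heard of the Feature" from "this parser knows the Feature but cannot honor the request".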

Input Sources

Input sources may be provided to parser objects or returned by
entity resolvers.  An input source is a hash with these
properties:

<tt>PublicId</tt>
The public identifier of this input source.

The public identifier is always optional: if the application writer
includes one, it will be provided as part of the location
information.

<tt>SystemId</tt>
The system identifier (URI) of this input source.

The system identifier is optional if there is a byte stream or a
character stream, but it is still useful to provide one, since the
application can use it to resolve relative URIs and can include it in
error messages and warnings (the parser will attempt to open a
connection to the URI only if there is no byte stream or character
stream specified).

If the application knows the character encoding of the object
pointed to by the system identifier, it can register the encoding
using the <tt>Encoding</tt> property.

<tt>ByteStream</tt>
The byte stream for this input source.

The SAX parser will ignore this if there is also a character stream
specified, but it will use a byte stream in preference to opening a
URI connection itself.

If the application knows the character encoding of the byte stream, it
should set the <tt>Encoding</tt> property.

<tt>CharacterStream</tt>
The character stream for this input source.

If there is a character stream specified, the SAX parser will
ignore any byte stream and will not attempt to open a URI connection
to the system identifier.

Note: A CharacterStream is a filehandle that does not need any encoding
translation done on it. It is implemented as a regular filehandle
and only works under Perl 5.7.2 or higher using PerlIO. To read a single
character, or a number of characters, from it, use the Perl core read()
function. To read a single byte (or a number of bytes) from it, you can
use sysread(). The encoding of the stream should be in the Encoding
entry for the Source.

<tt>Encoding</tt>
The character encoding, if known.

The encoding must be a string acceptable for an XML encoding
declaration (see section 4.3.3 of the XML 1.0 recommendation).

This property has no effect when the application provides a character
stream.
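The precedence among these properties can be sketched as a small selection routine. The function <tt>choose_input()</tt> is a hypothetical helper written for this illustration, not part of any SAX module, and the URI is a placeholder that is never opened; the sketch only encodes the rules stated above (character stream first, then byte stream, then system identifier, with <tt>Encoding</tt> ignored for character streams).

```perl
use strict;
use warnings;

# Return which property a parser would read from, plus the encoding
# that applies to it, following the precedence described above.
sub choose_input {
    my ($source) = @_;
    # A character stream wins outright, and Encoding has no effect on it.
    return ('CharacterStream', undef)
        if defined $source->{CharacterStream};
    # A byte stream is preferred to opening the URI.
    return ('ByteStream', $source->{Encoding})
        if defined $source->{ByteStream};
    return ('SystemId', $source->{Encoding})
        if defined $source->{SystemId};
    die "input source has no CharacterStream, ByteStream, or SystemId\n";
}

# An in-memory filehandle plays the role of the byte stream.
my $xml = '<doc/>';
open my $bytes, '<', \$xml or die "cannot open in-memory handle: $!";

my %source = (
    SystemId   => 'http://example.org/doc.xml',   # placeholder, used for messages only
    ByteStream => $bytes,
    Encoding   => 'UTF-8',
);

my ($kind, $encoding) = choose_input(\%source);
```

Keeping the <tt>SystemId</tt> in the hash even though the byte stream is what gets read matches the advice above: it remains available for resolving relative URIs and for error messages.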

SAX Handlers

SAX supports several classes of event handlers: content handlers,
declaration handlers, DTD handlers, error handlers, entity resolvers,
and other extensions.  This section defines each of these classes of
events.

Content Events

This is the main interface that most SAX applications implement: if
the application needs to be informed of basic parsing events, it
implements this interface and registers an instance with the SAX
parser using the <tt>ContentHandler</tt> property. The parser uses
the instance to report basic document-related events like the start
and end of elements and character data.

The order of events in this interface is very important, and
mirrors the order of information in the document itself. For example,
all of an element's content (character data, processing instructions,
and/or subelements) will appear, in order, between the
<tt>start_element</tt> event and the corresponding
<tt>end_element</tt> event.

<tt class='function'>set_document_locator</tt>(locator)
Receive an object for locating the origin of SAX document events.

SAX parsers are strongly encouraged (though not absolutely
required) to supply a locator: if a parser does so, it must supply the
locator to the application by invoking this method before invoking any
of the other methods in the ContentHandler interface.

The locator allows the application to determine the end position of
any document-related event, even if the parser is not reporting an
error.  Typically, the application will use this information for
reporting its own errors (such as character content that does not
match an application's business rules).  The information provided by
the locator is probably not sufficient for use with a search
engine.

Note that the locator will provide correct information only during
the invocation of the events in this interface. The application should
not attempt to use it at any other time.

The locator is a hash with these properties:

<tt>ColumnNumber</tt>
The column number of the end of the text for the current document
event.
<tt>LineNumber</tt>
The line number of the end of the text for the current document
event.
<tt>PublicId</tt>
The public identifier of the entity in which the current document
event occurred.
<tt>SystemId</tt>
The system identifier of the entity in which the current document
event occurred.
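A handler that uses the locator for its own error reports might look like the following sketch. The <tt>PriceChecker</tt> class and its "must be a number" rule are invented for illustration, and the events are driven by hand rather than by a parser. The sketch also assumes that the parser keeps updating the same locator hash in place, which is why the handler stores the reference and reads it only inside an event.

```perl
use strict;
use warnings;

package PriceChecker;   # hypothetical application handler

sub new {
    my ($class) = @_;
    return bless { problems => [] }, $class;
}

# Called before any other content event; keep the reference for later.
sub set_document_locator {
    my ($self, $locator) = @_;
    $self->{locator} = $locator;
    return;
}

# Application rule: character data must look like a number.
sub characters {
    my ($self, $chars) = @_;
    return if $chars->{Data} =~ /\A\s*\d+(?:\.\d+)?\s*\z/;
    my $loc = $self->{locator} || {};
    push @{ $self->{problems} }, sprintf '%s:%d:%d: not a number: %s',
        defined $loc->{SystemId} ? $loc->{SystemId} : '(unknown)',
        $loc->{LineNumber}   || 0,
        $loc->{ColumnNumber} || 0,
        $chars->{Data};
    return;
}

package main;

# Hand-driven events standing in for a real parse.
my %locator = (
    SystemId     => 'prices.xml',
    PublicId     => undef,
    LineNumber   => 1,
    ColumnNumber => 1,
);
my $handler = PriceChecker->new;
$handler->set_document_locator(\%locator);

@locator{qw(LineNumber ColumnNumber)} = (3, 18);
$handler->characters({ Data => '19.99' });

@locator{qw(LineNumber ColumnNumber)} = (4, 16);
$handler->characters({ Data => 'cheap' });
```

Because the position is read at the moment of the event, the report points at the end of the offending text, as the locator description states.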

<tt class='function'>start_prefix_mapping</tt>(mapping)
Begin the scope of a prefix-URI Namespace mapping.

The information from this event is not necessary for normal
Namespace processing: the SAX XML reader will automatically replace
prefixes for element and attribute names when the
"<tt>http://xml.org/sax/features/namespaces</tt>" feature is true (the
default).

There are cases, however, when applications need to use prefixes in
character data or in attribute values, where they cannot safely be
expanded automatically; the start/end_prefix_mapping events supply the
information the application needs to expand prefixes in those contexts
itself, if necessary.

Note that <tt>start</tt>/<tt>end_prefix_mapping()</tt> events are
not guaranteed to be properly nested relative to each other: all
<tt>start_prefix_mapping()</tt> events will occur before the
corresponding <tt>start_element()</tt> event, and all
<tt>end_prefix_mapping()</tt> events will occur after the corresponding
<tt>end_element()</tt> event, but their order is not
guaranteed.

mapping is a hash with these properties:

<tt>Prefix</tt>
The Namespace prefix being declared.
<tt>NamespaceURI</tt>
The Namespace URI the prefix is mapped to.

<tt class='function'>end_prefix_mapping</tt>(mapping)
End the scope of a prefix-URI mapping.

See <tt>start_prefix_mapping()</tt> for details. This event will
always occur after the corresponding <tt>end_element</tt> event, but
the order of <tt>end_prefix_mapping</tt> events is not otherwise
guaranteed.

mapping is a hash with this property:

<tt>Prefix</tt>
The Namespace prefix that was being mapped.
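As an illustration of expanding prefixes in character data, the sketch below tracks the mappings that are in scope and rewrites a prefixed name such as <tt>xsd:string</tt> into <tt>{URI}local</tt> form. The <tt>QNameExpander</tt> class, the per-prefix stack, and the output notation are choices made for this example only; the events are fired by hand in the order a parser would be expected to report them.

```perl
use strict;
use warnings;

package QNameExpander;   # hypothetical handler

sub new {
    my ($class) = @_;
    return bless { ns => {}, expanded => [] }, $class;
}

# Each prefix gets its own stack, so the handler does not depend on
# the mapping events being nested relative to each other.
sub start_prefix_mapping {
    my ($self, $mapping) = @_;
    push @{ $self->{ns}{ $mapping->{Prefix} } }, $mapping->{NamespaceURI};
    return;
}

sub end_prefix_mapping {
    my ($self, $mapping) = @_;
    pop @{ $self->{ns}{ $mapping->{Prefix} } };
    return;
}

# Treat character data that looks like a (possibly prefixed) name as a
# QName and expand it using the mappings currently in scope.
sub characters {
    my ($self, $chars) = @_;
    my ($prefix, $local) = $chars->{Data} =~ /\A(?:([^:\s]+):)?([^:\s]+)\z/
        or return;
    my $stack = $self->{ns}{ defined($prefix) ? $prefix : '' } || [];
    my $uri   = @$stack ? $stack->[-1] : '';
    push @{ $self->{expanded} }, '{' . $uri . '}' . $local;
    return;
}

package main;

my $expander = QNameExpander->new;
$expander->start_prefix_mapping({
    Prefix       => 'xsd',
    NamespaceURI => 'http://www.w3.org/2001/XMLSchema',
});
$expander->characters({ Data => 'xsd:string' });
$expander->end_prefix_mapping({ Prefix => 'xsd' });
```

Using one stack per prefix, rather than a single stack of mappings, is what lets <tt>end_prefix_mapping</tt> events arrive in any order without corrupting the table.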

<tt class='function'>processing_instruction</tt>(pi)
Receive notification of a processing instruction.

The Parser will invoke this method once for each processing
instruction found: note that processing instructions may occur before
or after the main document element.

A SAX parser should never report an XML declaration (XML 1.0,
section 2.8) or a text declaration (XML 1.0, section 4.3.1) using this
method.

pi is a hash with these properties:

<tt>Target</tt>
The processing instruction target.
<tt>Data</tt>
The processing instruction data, or <tt>undef</tt> if none was
supplied.
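A typical use of this event is picking out one kind of processing instruction and ignoring the rest. The sketch below looks for <tt>xml-stylesheet</tt> instructions and extracts the <tt>href</tt> pseudo-attribute from <tt>Data</tt>; the <tt>StylesheetFinder</tt> class is hypothetical, the regular expression is a deliberately simple approximation, and the events are supplied by hand.

```perl
use strict;
use warnings;

package StylesheetFinder;   # hypothetical handler

sub new {
    my ($class) = @_;
    return bless { hrefs => [] }, $class;
}

sub processing_instruction {
    my ($self, $pi) = @_;
    return unless $pi->{Target} eq 'xml-stylesheet';
    # Data may be undef when the instruction carries no data.
    my $data = defined($pi->{Data}) ? $pi->{Data} : '';
    my ($double_quoted, $single_quoted) =
        $data =~ /\bhref\s*=\s*(?:"([^"]*)"|'([^']*)')/
        or return;
    push @{ $self->{hrefs} },
        defined($double_quoted) ? $double_quoted : $single_quoted;
    return;
}

package main;

my $finder = StylesheetFinder->new;
$finder->processing_instruction({
    Target => 'xml-stylesheet',
    Data   => 'type="text/xsl" href="style.xsl"',
});
$finder->processing_instruction({ Target => 'page-break', Data => undef });
```

Since processing instructions may appear before or after the main document element, a handler like this should not assume that an element is open when the event arrives.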

<tt class='function'>skipped_entity</tt>(entity)
Receive notification of a skipped entity.

The Parser will invoke this method once for each entity skipped.
Non-validating processors may skip entities if they have not seen the
declarations (because, for example, the entity was declared in an
external DTD subset). All processors may skip external entities,
depending on the values of the
"<tt>http://xml.org/sax/features/external-general-entities</tt>" and the
"<tt>http://xml.org/sax/features/external-parameter-entities</tt>"
Features.

entity is a hash with this property:

<tt>Name</tt>
The name of the skipped entity. If it is a parameter
entity, the name will begin with '<tt>%</tt>'.

Declaration Events

This is an optional extension handler for SAX2 to provide
information about DTD declarations in an XML document. XML readers are
not required to support this handler.

Note that data-related DTD declarations (unparsed entities and
notations) are already reported through the DTDHandler interface.

If you are using the declaration handler together with a lexical
handler, all of the events will occur between the <tt>start_dtd</tt>
and the <tt>end_dtd</tt> events.

To set a separate DeclHandler for an XML reader, set the
"<tt>http://xml.org/sax/handlers/DeclHandler</tt>" Feature to the
object that is to receive declaration events.  If the reader does not support
declaration events, it will throw an <tt>XML::SAX::Exception::NotRecognized</tt>
or an <tt>XML::SAX::Exception::NotSupported</tt> when you attempt to register
the handler.  Declaration event handlers on the default handler are
automatically recognized and used.

<tt class='function'>element_decl</tt>(element)
Report an element type declaration.

The content model will consist of the string "EMPTY", the string
"ANY", or a parenthesised group, optionally followed by an occurrence
indicator. The model will be normalized so that all whitespace is
removed, and will include the enclosing parentheses.

element is a hash with these properties:

<tt>Name</tt>
The element type name.
<tt>Model</tt>
The content model as a normalized string.

<tt class='function'>attribute_decl</tt>(attribute)
Report an attribute type declaration.

Only the effective (first) declaration for an attribute will be
reported.  The type will be one of the strings "<tt>CDATA</tt>",
"<tt>ID</tt>", "<tt>IDREF</tt>", "<tt>IDREFS</tt>",
"<tt>NMTOKEN</tt>", "<tt>NMTOKENS</tt>", "<tt>ENTITY</tt>",
"<tt>ENTITIES</tt>", or "<tt>NOTATION</tt>", or a parenthesized token
group with the separator "<tt>|</tt>" and all whitespace removed.

attribute is a hash with these properties:

<tt>eName</tt>
The name of the associated element.
<tt>aName</tt>
The name of the attribute.
<tt>Type</tt>
A string representing the attribute type.
<tt>ValueDefault</tt>
A string representing the attribute default ("<tt>#IMPLIED</tt>",
"<tt>#REQUIRED</tt>", or "<tt>#FIXED</tt>") or <tt>undef</tt> if none of these
applies.
<tt>Value</tt>
A string representing the attribute's default value, or <tt>undef</tt> if
there is none.

<tt class='function'>internal_entity_decl</tt>(entity)
Report an internal entity declaration.

Only the effective (first) declaration for each entity will be
reported.

entity is a hash with these properties:

<tt>Name</tt>
The name of the entity. If it is a parameter entity, the name will
begin with '%'.
<tt>Value</tt>
The replacement text of the entity.

<tt class='function'>external_entity_decl</tt>(entity)
Report a parsed external entity declaration.

Only the effective (first) declaration for each entity will be
reported.

entity is a hash with these properties:

<tt>Name</tt>
The name of the entity. If it is a parameter entity, the name will
begin with '%'.
<tt>PublicId</tt>
The public identifier of the entity, or <tt>undef</tt> if none was
declared.
<tt>SystemId</tt>
The system identifier of the entity.
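A declaration handler usually just records what it is told. The sketch below collects element and attribute declarations into nested hashes; the <tt>DeclCollector</tt> class and its storage layout are invented for this example, and the two events are supplied by hand in the shape a parser would be expected to report for the declarations shown in the comments. With a real parser, such an object would be registered through the "<tt>http://xml.org/sax/handlers/DeclHandler</tt>" Feature.

```perl
use strict;
use warnings;

package DeclCollector;   # hypothetical declaration handler

sub new {
    my ($class) = @_;
    return bless { elements => {}, attributes => {} }, $class;
}

sub element_decl {
    my ($self, $element) = @_;
    $self->{elements}{ $element->{Name} } = $element->{Model};
    return;
}

# Only the effective (first) declaration is reported, so a plain
# assignment is enough; there is nothing to merge.
sub attribute_decl {
    my ($self, $attribute) = @_;
    $self->{attributes}{ $attribute->{eName} }{ $attribute->{aName} } = {
        Type         => $attribute->{Type},
        ValueDefault => $attribute->{ValueDefault},
        Value        => $attribute->{Value},
    };
    return;
}

package main;

my $decls = DeclCollector->new;

# <!ELEMENT note (to, body)>
$decls->element_decl({ Name => 'note', Model => '(to,body)' });

# <!ATTLIST note lang CDATA #IMPLIED>
$decls->attribute_decl({
    eName        => 'note',
    aName        => 'lang',
    Type         => 'CDATA',
    ValueDefault => '#IMPLIED',
    Value        => undef,
});
```

Note that the model string arrives already normalized, with whitespace removed, so it can be compared or stored without further cleanup.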

DTD Events

If a SAX application needs information about notations and unparsed
entities, then the application implements this interface.  The parser
uses the instance to report notation and unparsed entity declarations
to the application.

The SAX parser may report these events in any order, regardless of
the order in which the notations and unparsed entities were declared;
however, all DTD events must be reported after the document handler's
<tt>start_document()</tt> event, and before the first
<tt>start_element()</tt> event.

It is up to the application to store the information for future use
(perhaps in a hash table or object tree). If the application
encounters attributes of type "<tt>NOTATION</tt>", "<tt>ENTITY</tt>",
or "<tt>ENTITIES</tt>", it can use the information that it obtained
through this interface to find the entity and/or notation
corresponding with the attribute value.

<tt class='function'>notation_decl</tt>(notation)
Receive notification of a notation declaration event.

It is up to the application to record the notation for later
reference, if necessary.

If a system identifier is present, and it is a URL, the SAX parser
must resolve it fully before passing it to the application.

notation is a hash with these properties:

<tt>Name</tt>
The notation name.
<tt>PublicId</tt>
The public identifier of the notation, or <tt>undef</tt> if none was
declared.
<tt>SystemId</tt>
The system identifier of the notation, or <tt>undef</tt> if none was
declared.

<tt class='function'>unparsed_entity_decl</tt>(entity)
Receive notification of an unparsed entity declaration event.

Note that the notation name corresponds to a notation reported by
the <tt>notation_decl()</tt> event. It is up to the application to
record the entity for later reference, if necessary.

If the system identifier is a URL, the parser must resolve it fully
before passing it to the application.

entity is a hash with these properties:

<tt>Name</tt>
The unparsed entity's name.
<tt>PublicId</tt>
The public identifier of the entity, or <tt>undef</tt> if none was
declared.
<tt>SystemId</tt>
The system identifier of the entity.
<tt>Notation</tt>
The name of the associated notation.
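Storing these declarations "for future use" can be as simple as two hashes keyed by name. In the sketch below, <tt>DTDStore</tt> and its <tt>describe_entity()</tt> helper are hypothetical (the helper is not a SAX event), the public identifier and URL are placeholders, and the events are supplied by hand. Because the parser may report the two kinds of event in any order, the lookup joins them only when asked.

```perl
use strict;
use warnings;

package DTDStore;   # hypothetical DTD handler

sub new {
    my ($class) = @_;
    return bless { notations => {}, entities => {} }, $class;
}

sub notation_decl {
    my ($self, $notation) = @_;
    $self->{notations}{ $notation->{Name} } = $notation;
    return;
}

sub unparsed_entity_decl {
    my ($self, $entity) = @_;
    $self->{entities}{ $entity->{Name} } = $entity;
    return;
}

# Application helper: given the value of an ENTITY attribute, return
# where the entity lives and how its notation was identified.
sub describe_entity {
    my ($self, $name) = @_;
    my $entity   = $self->{entities}{$name} or return;
    my $notation = $self->{notations}{ $entity->{Notation} } || {};
    return {
        SystemId         => $entity->{SystemId},
        Notation         => $entity->{Notation},
        NotationPublicId => $notation->{PublicId},
    };
}

package main;

my $store = DTDStore->new;

# <!ENTITY logo SYSTEM "http://example.org/logo.gif" NDATA gif>
$store->unparsed_entity_decl({
    Name     => 'logo',
    PublicId => undef,
    SystemId => 'http://example.org/logo.gif',
    Notation => 'gif',
});

# <!NOTATION gif PUBLIC "-//Example//NOTATION GIF//EN">
$store->notation_decl({
    Name     => 'gif',
    PublicId => '-//Example//NOTATION GIF//EN',
    SystemId => undef,
});

my $logo    = $store->describe_entity('logo');
my $missing = $store->describe_entity('no-such-entity');
```

The entity is deliberately declared before its notation here to show that deferring the join to lookup time copes with either reporting order.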

Entity Resolver

If a SAX application needs to implement customized handling for
external entities, it must implement this interface.

The parser will then allow the application to intercept any
external entities (including the external DTD subset and external
parameter entities, if any) before including them.

Many SAX applications will not need to implement this interface,
but it will be especially useful for applications that build XML
documents from databases or other specialised input sources, or for
applications that use URI types that are either not URLs, or that
have schemes unknown to the parser.

Packit d27c7e
Packit d27c7e

Packit d27c7e
<tt class='function'>resolve_entity</tt>(entity)
Packit d27c7e
Packit d27c7e
Allow the application to resolve external entities.
Packit d27c7e
Packit d27c7e

The parser will call this method before opening any external entity
other than the top-level document entity. This includes the external DTD
subset, external entities referenced within the DTD, and external
entities referenced within the document element. The application may
request that the parser resolve the entity itself, that it use an
alternative URI, or that it use an entirely different input
source.

Packit d27c7e
Packit d27c7e

Application writers can use this method to redirect external system

Packit d27c7e
identifiers to secure and/or local URIs, to look up public identifiers
Packit d27c7e
in a catalogue, or to read an entity from a database or other input
Packit d27c7e
source (including, for example, a dialog box).

Packit d27c7e
Packit d27c7e

If the system identifier is a URL, the SAX parser must resolve it

Packit d27c7e
fully before reporting it to the application.

Packit d27c7e
Packit d27c7e

entity is a hash with these properties:

Packit d27c7e
Packit d27c7e
Packit d27c7e
Packit d27c7e
<tt>PublicId</tt>
Packit d27c7e
The public identifier of the entity being referenced, or
Packit d27c7e
<tt>undef</tt> if none was declared.
Packit d27c7e
<tt>SystemId</tt>
Packit d27c7e
The system identifier of the entity being referenced.
Packit d27c7e
Packit d27c7e
Packit d27c7e
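As an illustration of redirecting entities to local copies, the sketch below maps public identifiers to local system identifiers. It is only a sketch under assumptions: the class name <tt>LocalCatalogResolver</tt>, the catalogue contents, and the example URIs are hypothetical, and the return value is assumed to be a <tt>Source</tt>-style hash carrying <tt>PublicId</tt> and <tt>SystemId</tt>. Check your parser's documentation for what it actually expects back.

```perl
use strict;
use warnings;

# Hypothetical resolver backed by an in-memory catalogue.
package LocalCatalogResolver;

# %catalog maps public identifiers to local replacement system identifiers.
sub new {
    my ($class, %catalog) = @_;
    return bless { catalog => {%catalog} }, $class;
}

# Receives the entity hash described above.
# ASSUMPTION: the answer is a Source-style hash with PublicId and SystemId.
sub resolve_entity {
    my ($self, $entity) = @_;
    my $public = $entity->{PublicId};
    my $system = $entity->{SystemId};

    # Redirect known public identifiers to their local copy.
    if (defined $public && exists $self->{catalog}{$public}) {
        $system = $self->{catalog}{$public};
    }
    return { PublicId => $public, SystemId => $system };
}

package main;

my $resolver = LocalCatalogResolver->new(
    '-//Example//DTD Memo 1.0//EN' => 'file:///usr/share/xml/memo.dtd',
);

# Simulate the call a parser would make before opening the external DTD subset.
my $source = $resolver->resolve_entity({
    PublicId => '-//Example//DTD Memo 1.0//EN',
    SystemId => 'http://example.com/dtd/memo.dtd',
});
print "$source->{SystemId}\n";    # prints "file:///usr/share/xml/memo.dtd"
```

Entities that are not in the catalogue come back with their identifiers unchanged, which leaves the parser free to fetch them as usual.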

Packit d27c7e
Packit d27c7e

Error Events

Packit d27c7e
Packit d27c7e

If a SAX application needs to implement customized error handling,

Packit d27c7e
it must implement this interface.  The parser will then report all
Packit d27c7e
errors and warnings through this interface.

Packit d27c7e
Packit d27c7e

The parser shall use this interface to report errors instead of, or in
addition to, throwing an exception. For errors and warnings, the recommended
approach is to let the application throw its own exceptions rather than
throwing them in the parser. For fatal errors, however, the parser will
commonly throw an exception after reporting the error, since a fatal error
makes it impossible to continue parsing.
Packit d27c7e

Packit d27c7e
Packit d27c7e

All error handlers receive a hash, exception, with the properties
defined in <a href="sax-2.0.html#Exceptions">Exceptions</a>.

Packit d27c7e
Packit d27c7e

Packit d27c7e
<tt class='function'>warning</tt>(exception)
Packit d27c7e
Packit d27c7e
Receive notification of a warning.
Packit d27c7e
Packit d27c7e

SAX parsers will use this method to report conditions that are not

Packit d27c7e
errors or fatal errors as defined by the XML 1.0 recommendation. The
Packit d27c7e
default behaviour is to take no action.

Packit d27c7e
Packit d27c7e
The SAX parser must continue to provide normal parsing events after
Packit d27c7e
invoking this method: it should still be possible for the application
Packit d27c7e
to process the document through to the end.

Packit d27c7e
Packit d27c7e

Packit d27c7e
<tt class='function'>error</tt>(exception)
Packit d27c7e
Packit d27c7e
Receive notification of a recoverable error.
Packit d27c7e
Packit d27c7e

This corresponds to the definition of "error" in section 1.2 of the

Packit d27c7e
W3C XML 1.0 Recommendation.  For example, a validating parser would use
Packit d27c7e
this callback to report the violation of a validity constraint.  The
Packit d27c7e
default behaviour is to take no action.

Packit d27c7e
Packit d27c7e
The SAX parser must continue to provide normal parsing events after
Packit d27c7e
invoking this method: it should still be possible for the application
Packit d27c7e
to process the document through to the end.  If the application cannot
Packit d27c7e
do so, then the parser should report a fatal error even if the XML 1.0
Packit d27c7e
recommendation does not require it to do so.

Packit d27c7e
Packit d27c7e

Packit d27c7e
<tt class='function'>fatal_error</tt>(exception)
Packit d27c7e
Packit d27c7e
Receive notification of a non-recoverable error.
Packit d27c7e
Packit d27c7e

This corresponds to the definition of "fatal error" in section 1.2

Packit d27c7e
of the W3C XML 1.0 Recommendation.  For example, a parser would use
Packit d27c7e
this callback to report the violation of a well-formedness
Packit d27c7e
constraint.

Packit d27c7e
Packit d27c7e
The application must assume that the document is unusable after the
parser has invoked this method, and should continue (if at all) only
for the sake of collecting additional error messages: in fact, SAX
parsers are free to stop reporting any other events once this method
has been invoked.

Packit d27c7e
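The three callbacks above can be sketched as a single handler that collects warnings and recoverable errors and throws only on fatal errors. This is a minimal sketch: the class name <tt>CollectingErrorHandler</tt> is hypothetical, the <tt>Message</tt> and <tt>LineNumber</tt> keys are assumed to be among the properties listed under Exceptions, and the events are simulated by direct method calls.

```perl
use strict;
use warnings;

# Hypothetical error handler: collect what is recoverable, die on the rest.
package CollectingErrorHandler;

sub new { return bless { warnings => [], errors => [] }, shift }

# Warnings and recoverable errors are stored; parsing is expected to continue.
sub warning {
    my ($self, $exception) = @_;
    push @{ $self->{warnings} }, $exception->{Message};
    return;
}

sub error {
    my ($self, $exception) = @_;
    push @{ $self->{errors} }, $exception->{Message};
    return;
}

# The document is unusable after a fatal error, so the application throws.
# ASSUMPTION: the exception hash carries Message and LineNumber.
sub fatal_error {
    my ($self, $exception) = @_;
    die "fatal error at line $exception->{LineNumber}: $exception->{Message}\n";
}

package main;

my $handler = CollectingErrorHandler->new;

# Simulated events; a real parser would generate these while parsing.
$handler->warning({ Message => 'encoding not declared', LineNumber => 1 });
$handler->error({ Message => 'element "b" not allowed here', LineNumber => 7 });

my $ok = eval {
    $handler->fatal_error({ Message => 'mismatched end tag', LineNumber => 9 });
    1;
};
my $fatal = $ok ? '' : $@;

print scalar(@{ $handler->{warnings} }), " warning(s), ",
      scalar(@{ $handler->{errors} }), " error(s)\n";
print $fatal;
```

Throwing from <tt>fatal_error()</tt> in the application, rather than relying on the parser to do so, follows the recommendation above that the application raise its own exceptions.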
Packit d27c7e

Lexical Events

Packit d27c7e
Packit d27c7e

This is an optional extension handler for SAX2 to provide lexical

Packit d27c7e
information about an XML document, such as comments and CDATA section
Packit d27c7e
boundaries; XML readers are not required to support this handler.

Packit d27c7e
Packit d27c7e

The events in the lexical handler apply to the entire document, not

Packit d27c7e
just to the document element, and all lexical handler events must
Packit d27c7e
appear between the content handler's <tt>start_document()</tt> and
Packit d27c7e
<tt>end_document()</tt> events.

Packit d27c7e
Packit d27c7e

To set the LexicalHandler for an XML reader, set the Feature
"<tt>http://xml.org/sax/handlers/LexicalHandler</tt>" on the parser to
the object to receive lexical events.  If the reader does not support
lexical events, it will throw an <tt>XML::SAX::Exception::NotRecognized</tt> or
an <tt>XML::SAX::Exception::NotSupported</tt> when you attempt to register the
handler.

Packit d27c7e
Packit d27c7e

Packit d27c7e
<tt class='function'>start_dtd</tt>(dtd)
Packit d27c7e
Packit d27c7e
Report the start of DTD declarations, if any.
Packit d27c7e
Packit d27c7e

Any declarations are assumed to be in the internal subset unless

Packit d27c7e
otherwise indicated by a start_entity event.

Packit d27c7e
Packit d27c7e

Note that the <tt>start</tt>/<tt>end_dtd()</tt> events will appear

Packit d27c7e
within the <tt>start</tt>/<tt>end_document()</tt> events from Content
Packit d27c7e
Handler and before the first <tt>start_element()</tt> event.

Packit d27c7e
Packit d27c7e

dtd is a hash with these properties:

Packit d27c7e
Packit d27c7e
Packit d27c7e
Packit d27c7e
<tt>Name</tt>
Packit d27c7e
The document type name.
Packit d27c7e
<tt>PublicId</tt>
Packit d27c7e
The declared public identifier for the external DTD subset, or
Packit d27c7e
<tt>undef</tt> if none was declared.
Packit d27c7e
<tt>SystemId</tt>
Packit d27c7e
The declared system identifier for the external DTD subset, or
Packit d27c7e
<tt>undef</tt> if none was declared.
Packit d27c7e
Packit d27c7e
Packit d27c7e

Packit d27c7e
Packit d27c7e

Packit d27c7e
<tt class='function'>end_dtd</tt>(dtd)
Packit d27c7e
Packit d27c7e
Report the end of DTD declarations.
Packit d27c7e
Packit d27c7e

No properties are defined for this event (dtd is

Packit d27c7e
empty).

Packit d27c7e
Packit d27c7e

Packit d27c7e
<tt class='function'>start_entity</tt>(entity)
Packit d27c7e
Packit d27c7e
Report the beginning of an entity in content.
Packit d27c7e
Packit d27c7e

NOTE: entity references in attribute values -- and the start

Packit d27c7e
and end of the document entity -- are never reported.

Packit d27c7e
Packit d27c7e

The start and end of the external DTD subset are reported using the

Packit d27c7e
pseudo-name "[dtd]". All other events must be properly nested within
Packit d27c7e
start/end entity events.

Packit d27c7e
Packit d27c7e

Note that skipped entities will be reported through the

Packit d27c7e
<tt>skipped_entity()</tt> event, which is part of the ContentHandler
Packit d27c7e
interface.

Packit d27c7e
Packit d27c7e

entity is a hash with these properties:

Packit d27c7e
Packit d27c7e
Packit d27c7e
Packit d27c7e
<tt>Name</tt>
Packit d27c7e
The name of the entity. If it is a parameter entity, the
Packit d27c7e
name will begin with '%'.
Packit d27c7e
Packit d27c7e
Packit d27c7e

Packit d27c7e
Packit d27c7e

Packit d27c7e
<tt class='function'>end_entity</tt>(entity)
Packit d27c7e
Packit d27c7e
Report the end of an entity.
Packit d27c7e
Packit d27c7e

entity is a hash with these properties:

Packit d27c7e
Packit d27c7e
Packit d27c7e
Packit d27c7e
<tt>Name</tt>
Packit d27c7e
The name of the entity that is ending.
Packit d27c7e
Packit d27c7e
Packit d27c7e

Packit d27c7e
Packit d27c7e

Packit d27c7e
<tt class='function'>start_cdata</tt>(cdata)
Packit d27c7e
Packit d27c7e
Report the start of a CDATA section.
Packit d27c7e
Packit d27c7e

The contents of the CDATA section will be reported through the

Packit d27c7e
regular characters event.

Packit d27c7e
Packit d27c7e

No properties are defined for this event (cdata is

Packit d27c7e
empty).

Packit d27c7e
Packit d27c7e

Packit d27c7e
<tt class='function'>end_cdata</tt>(cdata)
Packit d27c7e
Packit d27c7e
Report the end of a CDATA section.
Packit d27c7e
Packit d27c7e

No properties are defined for this event (cdata is

Packit d27c7e
empty).

Packit d27c7e
Packit d27c7e

Packit d27c7e
<tt class='function'>comment</tt>(comment)
Packit d27c7e
Packit d27c7e
Report an XML comment anywhere in the document.
Packit d27c7e
Packit d27c7e

This callback will be used for comments inside or outside the

Packit d27c7e
document element, including comments in the external DTD subset (if
Packit d27c7e
read).

Packit d27c7e
Packit d27c7e

comment is a hash with these properties:

Packit d27c7e
Packit d27c7e
Packit d27c7e
Packit d27c7e
<tt>Data</tt>
Packit d27c7e
The comment characters.
Packit d27c7e
Packit d27c7e
Packit d27c7e
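A lexical handler is often combined with the content handler's characters event, for example to tell CDATA text apart from ordinary text. The sketch below assumes a hypothetical class, <tt>LexicalLogger</tt>, that receives both kinds of events, and assumes the characters event carries its text in a <tt>Data</tt> property as in Basic SAX. The events are simulated by direct calls instead of a real parse.

```perl
use strict;
use warnings;

# Hypothetical handler that receives lexical events plus characters().
package LexicalLogger;

sub new {
    return bless { in_cdata => 0, comments => [], cdata_text => '' }, shift;
}

# CDATA boundaries carry no properties; they only toggle a flag.
sub start_cdata { $_[0]{in_cdata} = 1; return }
sub end_cdata   { $_[0]{in_cdata} = 0; return }

# Comments arrive with their text in the Data property.
sub comment {
    my ($self, $comment) = @_;
    push @{ $self->{comments} }, $comment->{Data};
    return;
}

# Content handler event: keep only the text reported inside a CDATA section.
sub characters {
    my ($self, $chars) = @_;
    $self->{cdata_text} .= $chars->{Data} if $self->{in_cdata};
    return;
}

package main;

# Simulate the events for:  <!-- build 42 -->x <![CDATA[<raw>]]> y
my $logger = LexicalLogger->new;
$logger->comment({ Data => ' build 42 ' });
$logger->characters({ Data => 'x ' });
$logger->start_cdata({});
$logger->characters({ Data => '<raw>' });
$logger->end_cdata({});
$logger->characters({ Data => ' y' });

print "CDATA text: $logger->{cdata_text}\n";    # prints "CDATA text: <raw>"
print "comments: ", scalar(@{ $logger->{comments} }), "\n";
```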

Packit d27c7e
Packit d27c7e

SAX Filters

Packit d27c7e
Packit d27c7e

An XML filter is like an XML event generator, except that it

Packit d27c7e
obtains its events from another XML event generator rather than a
Packit d27c7e
primary source like an XML document or database.  Filters can modify a
Packit d27c7e
stream of events as they pass on to the final application.

Packit d27c7e
Packit d27c7e

Packit d27c7e
<tt>Parent</tt>
Packit d27c7e
Packit d27c7e
The parent reader.
Packit d27c7e
Packit d27c7e

This Feature allows the application to link the filter to a parent

Packit d27c7e
event generator (which may be another filter).

Packit d27c7e
Packit d27c7e

Packit d27c7e
  See the XML::SAX::Base module for more on filters. It is meant to be
Packit d27c7e
  used as a base class for filters and drivers, and makes them much
Packit d27c7e
  easier to implement.
Packit d27c7e
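To show the idea of a filter without depending on any installed module, here is a hand-rolled sketch that upper-cases character data on its way to the final handler. It deliberately does not use XML::SAX::Base, which remains the recommended base class in practice. The class names <tt>UppercaseFilter</tt> and <tt>TextCollector</tt> and the <tt>_forward()</tt> helper are hypothetical, the filter is wired to its downstream consumer through a <tt>Handler</tt> parameter rather than through <tt>Parent</tt>, and events are simulated by direct calls.

```perl
use strict;
use warnings;

# Hypothetical filter: modifies characters(), forwards other events as-is.
package UppercaseFilter;

sub new {
    my ($class, %args) = @_;
    return bless { Handler => $args{Handler} }, $class;
}

# Forward an event only if the downstream handler implements it.
sub _forward {
    my ($self, $event, $data) = @_;
    my $method = $self->{Handler}->can($event) or return;
    return $self->{Handler}->$method($data);
}

sub start_element { return $_[0]->_forward(start_element => $_[1]) }
sub end_element   { return $_[0]->_forward(end_element   => $_[1]) }

# Pass on a modified copy so the caller's hash is left untouched.
sub characters {
    my ($self, $chars) = @_;
    return $self->_forward(characters => { %$chars, Data => uc $chars->{Data} });
}

# Hypothetical final handler that only cares about text.
package TextCollector;

sub new { return bless { text => '' }, shift }
sub characters { $_[0]{text} .= $_[1]{Data}; return }

package main;

my $collector = TextCollector->new;
my $filter    = UppercaseFilter->new(Handler => $collector);

# Simulate the events an upstream generator would send for
# <greeting>hello, sax</greeting>
$filter->start_element({ Name => 'greeting' });
$filter->characters({ Data => 'hello, sax' });
$filter->end_element({ Name => 'greeting' });

print "$collector->{text}\n";    # prints "HELLO, SAX"
```

The <tt>can()</tt> check mirrors the way parsers dynamically test handlers for the methods they support, so <tt>TextCollector</tt> does not need to define element events it has no use for.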

Packit d27c7e
Packit d27c7e

Java Compatibility

Packit d27c7e
Packit d27c7e
The Perl SAX 2.0 binding differs from the Java binding in these ways:
Packit d27c7e
Packit d27c7e
  • Takes parameters to <tt>new()</tt>, to <tt>parse()</tt>, and to be
    set directly in the object, instead of requiring set/get calls (see
    below).
  • Allows a default <tt>Handler</tt> parameter to be used for all
    handlers.
  • No base classes are enforced. Instead, parsers dynamically
    check the handlers for what methods they support. Note however that
    using XML::SAX::Base as your base class for Drivers and Filters will
    make your code a lot simpler, less error prone, and probably much more
    correct with regard to this spec. Only reimplement that functionality
    if you really need to.
  • The Attribute, InputSource, and SAXException (XML::SAX::Exception)
    classes are only described as hashes (see below).
  • Handlers are passed a hash (Node) containing properties as an
    argument instead of positional arguments.
  • <tt>parse()</tt> methods return the value returned by calling the
    <tt>end_document()</tt> handler.
  • Method names have been converted to lower-case with underscores.
    Parameters are all mixed case with initial upper-case.

    Packit d27c7e
      If compatibility is a problem for you, consider writing a Filter that
      converts from this style to the one you want. It is likely that such
      a Filter will be available from CPAN in the not too distant future.
    Packit d27c7e

    Packit d27c7e
    Packit d27c7e
    Packit d27c7e

    [FIXME: need to list package/class name equivalents for all

    Packit d27c7e
    hashes.]

    Packit d27c7e
    -->
    Packit d27c7e
    Packit d27c7e
    </body>
    Packit d27c7e
    </html>