=head1 SAX for Perl

=head2 What is SAX?

SAX (Simple API for XML) is a common parser interface for XML
parsers.  It allows application writers to write applications that use
XML parsers, but are independent of which parser is actually used.

This document describes a version of SAX used by Perl modules.  The
original version of SAX, for Java, is described at
<http://www.megginson.com/SAX/>.

There are two basic interfaces in the Perl version of SAX, the parser
interface and the handler interface.  The parser interface creates new
parser instances, initiates parsing, and provides additional
information to handlers on request.  The handler interface is used to
receive parse events from the parser.

=head2 Deviations from the Java version

=over 4

=item *

Takes parameters to `C<new()>' instead of using `set*' calls.

=item *

Allows a default Handler parameter to be used for all handlers.

=item *

No base classes are implemented.  Instead, parsers dynamically check
the handlers for what methods they support.

=item *

The AttributeList, InputSource, and SAXException classes have been
replaced by anonymous hashes.

=item *

Handlers are passed a hash containing properties as an argument in
place of positional arguments.

=item *

`C<parse()>' returns the value returned by calling the
`C<end_document()>' handler.

=item *

Method names have been converted to lower-case with underscores.
Parameters are all mixed case with initial upper-case.

=back
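
The deviations listed above can be seen together in a small sketch.
It uses no real parser: C<replay_events()> and the C<ElementCounter>
class are hypothetical names invented for this illustration, and the
event list is written out by hand.  The sketch only shows the calling
convention (one hash reference per event, methods looked up
dynamically, and the value of `C<end_document()>' handed back to the
caller).

```perl
use strict;
use warnings;

# A handler is an ordinary object; it defines only the events it wants.
{
    package ElementCounter;

    sub new { return bless { Count => 0 }, shift }

    # Each event receives a single hash reference of named parameters.
    sub start_element {
        my ($self, $event) = @_;
        $self->{Count}++;
        return;
    }

    # The value returned here becomes the return value of the parse.
    sub end_document {
        my ($self, $event) = @_;
        return $self->{Count};
    }
}

# Hypothetical stand-in for a parser: it replays a fixed event list and
# calls only the methods that the handler actually supports.
sub replay_events {
    my ($handler, @events) = @_;
    my $result;
    for my $event (@events) {
        my ($method, $params) = @$event;
        next unless $handler->can($method);
        $result = $handler->$method($params);
    }
    return $result;    # the last event is expected to be end_document
}

my $count = replay_events(
    ElementCounter->new,
    [ start_document => {} ],
    [ start_element  => { Name => 'doc',  Attributes => {} } ],
    [ start_element  => { Name => 'para', Attributes => {} } ],
    [ end_element    => { Name => 'para' } ],
    [ end_element    => { Name => 'doc' } ],
    [ end_document   => {} ],
);
print "elements seen: $count\n";
```

Because the handler defines neither `C<start_document()>' nor
`C<end_element()>', those events are skipped rather than treated as
errors.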

=head1 Parser Interface

SAX parsers are reusable but not re-entrant: the application may reuse
a parser object (possibly with a different input source) once the
first parse has completed successfully, but it may not invoke the
`C<parse()>' method recursively within a parse.

Parser objects contain the following options.  A new or different
handler option may be provided in the middle of a parse, and the SAX
parser must begin using the new handler immediately.  The `C<Locale>'
option must not be changed in the middle of a parse.  If an
application does not provide a handler for a particular set of events,
those events will be silently ignored unless otherwise stated.  If an
`C<EntityResolver>' is not provided, the parser will resolve system
identifiers and open connections to entities itself.

    Handler          default handler to receive events
    DocumentHandler  handler to receive document events
    DTDHandler       handler to receive DTD events
    ErrorHandler     handler to receive error events
    EntityResolver   handler to resolve entities
    Locale           locale to provide localisation for errors

If no handlers are provided then all events will be silently ignored,
except for `C<fatal_error()>' which will cause a `C<die()>' to be
called after calling `C<end_document()>'.
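
A parser might implement these rules along the following lines.  This
is a sketch under stated assumptions, not the code of any particular
parser: C<handler_for()>, C<deliver()>, C<report_fatal_error()>, and the
C<Logger> class are hypothetical, and the sketch assumes that the
default `C<Handler>' is consulted method by method whenever the more
specific handler is missing or lacks the method.

```perl
use strict;
use warnings;

# Hypothetical helper: choose the object that should receive $method.
# A specific handler (e.g. DocumentHandler) wins over the default Handler.
sub handler_for {
    my ($options, $specific, $method) = @_;
    for my $candidate ($options->{$specific}, $options->{Handler}) {
        return $candidate if defined $candidate && $candidate->can($method);
    }
    return;    # nobody handles this event
}

# Hypothetical helper: deliver one event, or silently ignore it.
sub deliver {
    my ($options, $specific, $method, $params) = @_;
    my $handler = handler_for($options, $specific, $method);
    return $handler ? $handler->$method($params) : undef;
}

# With no handler for fatal_error(), finish the document and then die.
sub report_fatal_error {
    my ($options, $params) = @_;
    if (handler_for($options, 'ErrorHandler', 'fatal_error')) {
        return deliver($options, 'ErrorHandler', 'fatal_error', $params);
    }
    deliver($options, 'DocumentHandler', 'end_document', {});
    die "$params->{Message}\n";
}

{
    package Logger;
    sub new { return bless { Seen => [] }, shift }
    sub start_element { push @{ $_[0]{Seen} }, $_[1]{Name}; return }
}

my $logger  = Logger->new;
my %options = (Handler => $logger);

deliver(\%options, 'DocumentHandler', 'start_element', { Name => 'doc' });
deliver(\%options, 'DocumentHandler', 'characters',    { Data => 'hi' });   # ignored

my $died  = !eval { report_fatal_error(\%options, { Message => 'not well-formed' }); 1 };
my $error = $@;
```

Here C<Logger> defines no `C<fatal_error()>' method, so the fatal error
surfaces as an exception that the application can catch with C<eval>.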

All handler methods are called with a single hash argument containing
the parameters for that method.  `C<new()>' methods can be called with
a hash or a list of key-value pairs containing the parameters.
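
A constructor that accepts both calling styles can be sketched as
follows.  C<SketchParser> is a hypothetical class used only for
illustration; it stores its options and does no parsing, and it assumes
that "a hash" means a hash reference.

```perl
use strict;
use warnings;

{
    package SketchParser;    # hypothetical class, not a real parser

    # Accept either new({ Key => $value }) or new(Key => $value).
    sub new {
        my $class   = shift;
        my %options = (@_ == 1 && ref($_[0]) eq 'HASH') ? %{ $_[0] } : @_;
        return bless \%options, $class;
    }
}

my $from_list = SketchParser->new(Locale => 'en', Handler => undef);
my $from_hash = SketchParser->new({ Locale => 'en', Handler => undef });
print "both styles agree\n"
    if $from_list->{Locale} eq $from_hash->{Locale};
```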

All SAX parsers must implement this basic interface: it allows
applications to provide handlers for different types of events and to
initiate a parse from a URI, a byte stream, or a character stream.

=over 4

=item new( I<OPTIONS> )

Creates a Parser that will be used to parse XML sources.  Any
parameters passed to `C<new()>' will be used for subsequent parses.
I<OPTIONS> may be a list of key, value pairs or a hash.

=item parse( I<OPTIONS> )

Parse an XML document.

The application can use this method to instruct the SAX parser to
begin parsing an XML document from any valid input source (a character
stream, a byte stream, or a URI).  I<OPTIONS> may be a list of key,
value pairs or a hash.  I<OPTIONS> passed to `C<parse()>' override
options given when the parser instance was created with `C<new()>'.

Applications may not invoke this method while a parse is in progress
(they should create a new Parser instead for each additional XML
document).  Once a parse is complete, an application may reuse the same
Parser object, possibly with a different input source.

`C<parse()>' returns the result of calling the handler method
`C<end_document()>'.

A `C<Source>' parameter must have been provided to either the
`C<parse()>' or `C<new()>' methods.  The `C<Source>' parameter is a
hash containing the following parameters:

=over 4

=item PublicId

The public identifier for this input source.

The public identifier is always optional: if the application writer
includes one, it will be provided as part of the location information.

=item SystemId

The system identifier for this input source.

The system identifier is optional if there is a byte stream, a
character stream, or a string, but it is still useful to provide one,
since the application can use it to resolve relative URIs and can
include it in error messages and warnings (the parser will attempt to
open a connection to the URI only if there is no byte stream or
character stream specified).

If the application knows the character encoding of the object pointed
to by the system identifier, it can provide the encoding using the
`C<Encoding>' parameter.

If the system ID is a URL, it must be fully resolved.

=item String

A scalar value containing XML text to be parsed.

The SAX parser will ignore this if there is also a byte or character
stream, but it will use a string in preference to opening a URI
connection.

=item ByteStream

The byte stream (file handle) for this input source.

The SAX parser will ignore this if there is also a character stream
specified, but it will use a byte stream in preference to opening a
URI connection itself or using `C<String>'.

If the application knows the character encoding of the byte stream, it
should set it with the `C<Encoding>' parameter.

=item CharacterStream

FOR FUTURE USE ONLY -- Perl does not currently support any character
streams; use only the `C<ByteStream>', `C<SystemId>', or `C<String>'
parameters.

The character stream (file handle) for this input source.

If there is a character stream specified, the SAX parser will ignore
any byte stream and will not attempt to open a URI connection to the
system identifier.

=item Encoding

The character encoding, if known.

The encoding must be a string acceptable for an XML encoding
declaration (see section 4.3.3 of the XML 1.0 recommendation).

This parameter has no effect when the application provides a character
stream.

=back

=back
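
The interaction between the `C<Source>' parameters, and between the
options given to `C<new()>' and `C<parse()>', can be summarised in a
short sketch.  C<merge_options()> and C<choose_input()> are
hypothetical helpers, not part of any parser; the sketch assumes a
shallow merge, so a `C<Source>' passed to `C<parse()>' replaces the one
given to `C<new()>' as a whole.

```perl
use strict;
use warnings;

# Hypothetical helper: options passed to parse() override those from new().
sub merge_options {
    my ($new_options, $parse_options) = @_;
    my %merged = (%$new_options, %$parse_options);
    return \%merged;
}

# Hypothetical helper: report which part of a Source hash a parser would
# read, following the precedence described above.
sub choose_input {
    my ($source) = @_;
    for my $key (qw(CharacterStream ByteStream String SystemId)) {
        return $key if defined $source->{$key};
    }
    die "Source needs a ByteStream, String, or SystemId\n";
}

my $options = merge_options(
    { Source => { SystemId => 'file:///tmp/a.xml' } },
    { Source => { SystemId => 'file:///tmp/b.xml', String => '<doc/>' } },
);
my $chosen = choose_input($options->{Source});
print "parser would read from: $chosen\n";
```

Here the string is used even though a system identifier is present;
the system identifier remains available for error messages and for
resolving relative URIs.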

=head2 Locator

Interface for associating a SAX event with a document location.

If a SAX parser provides location information to the SAX application,
it does so by implementing the following methods and then calling the
`C<set_document_locator()>' handler method.  The handler can use the
object to obtain the location of any other document handler event in
the XML source document.

Note that the results returned by the object will be valid only during
the scope of each document handler method: the application will
receive unpredictable results if it attempts to use the locator at any
other time.

SAX parsers are not required to supply a locator, but they are very
strongly encouraged to do so.

=over 4

=item location()

Return the location information for the current event.

Returns a hash containing the following parameters:

  ColumnNumber The column number, or undef if none is available.
  LineNumber   The line number, or undef if none is available.
  PublicId     A string containing the public identifier, or undef if
               none is available.
  SystemId     A string containing the system identifier, or undef if
               none is available.

=back
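
A handler might use a locator as sketched below.  C<SketchLocator> is
a hypothetical stand-in for the object a real parser would supply, and
the sketch assumes that `C<location()>' returns a hash reference.

```perl
use strict;
use warnings;

{
    package SketchLocator;    # hypothetical locator a parser might supply

    sub new {
        my ($class, %position) = @_;
        return bless \%position, $class;
    }

    # Returns the location properties; unknown ones are undef.
    sub location {
        my ($self) = @_;
        return {
            ColumnNumber => $self->{ColumnNumber},
            LineNumber   => $self->{LineNumber},
            PublicId     => $self->{PublicId},
            SystemId     => $self->{SystemId},
        };
    }
}

{
    package PositionReporter;

    sub new { return bless { Report => [] }, shift }

    # Keep the locator so that later events can ask where they occurred.
    sub set_document_locator {
        my ($self, $event) = @_;
        $self->{Locator} = $event->{Locator};
        return;
    }

    sub start_element {
        my ($self, $event) = @_;
        # Query the locator only while an event is being delivered.
        my $where = $self->{Locator} ? $self->{Locator}->location : {};
        my $line  = defined $where->{LineNumber} ? $where->{LineNumber} : '?';
        push @{ $self->{Report} }, "$event->{Name} at line $line";
        return;
    }
}

my $locator  = SketchLocator->new(LineNumber => 3, ColumnNumber => 7);
my $reporter = PositionReporter->new;
$reporter->set_document_locator({ Locator => $locator });
$reporter->start_element({ Name => 'para', Attributes => {} });
print "$_\n" for @{ $reporter->{Report} };
```

The handler copes with a missing locator, since parsers are not
required to supply one.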

=head1 Handler Interfaces

SAX handler methods are grouped into four interfaces: the document
handler for receiving normal document events, the DTD handler for
receiving notation and unparsed entity events, the error handler for
receiving errors and warnings, and the entity resolver for redirecting
external system identifiers.

The application may choose to implement each interface in one package
or in separate packages, as long as the objects provided as parameters
to the parser provide the matching interface.

Parsers may implement additional methods in each of these categories;
refer to the parser documentation for further information.

All handlers are called with a single hash argument containing the
parameters for that handler.

Application writers who do not want to implement the entire interface
can leave those methods undefined.  Events whose handler methods are
undefined will be ignored unless otherwise stated.

=head2 DocumentHandler

This is the main interface that most SAX applications implement: if
the application needs to be informed of basic parsing events, it
implements this interface and provides an instance to the SAX parser
using the `C<DocumentHandler>' parameter.  The parser uses the instance
to report basic document-related events like the start and end of
elements and character data.

The order of events in this interface is very important, and mirrors
the order of information in the document itself.  For example, all of
an element's content (character data, processing instructions, and/or
subelements) will appear, in order, between the `C<start_element()>'
event and the corresponding `C<end_element()>' event.

The application can find the location of any event using the Locator
interface supplied by the Parser through the
`C<set_document_locator()>' method.

=over 4

=item set_document_locator( { Locator => $locator } )

Receive an object for locating the origin of SAX document events.

SAX parsers are strongly encouraged (though not absolutely required)
to supply a locator: if a parser does so, it must supply the locator to
the application by invoking this method before invoking any of the
other methods in the DocumentHandler interface.

The locator allows the application to determine the end position of
any document-related event, even if the parser is not reporting an
error.  Typically, the application will use this information for
reporting its own errors (such as character content that does not
match an application's business rules).  The information returned by
the locator is probably not sufficient for use with a search engine.

Note that the locator will return correct information only during the
invocation of the events in this interface.  The application should not
attempt to use it at any other time.

Parameters:

  Locator     An object that can return the location of any SAX document
              event.

=item start_document( { } )

Receive notification of the beginning of a document.

The SAX parser will invoke this method only once, before any other
methods in this interface or in DTDHandler.

=item end_document( { } )

Receive notification of the end of a document.  No parameters are
passed for the end of a document.

The SAX parser will invoke this method only once, and it will be the
last method invoked during the parse.  The parser shall not invoke
this method until it has either abandoned parsing (because of an
unrecoverable error) or reached the end of input.

The value returned by calling `C<end_document()>' will be the value
returned by `C<parse()>'.

=item start_element( { Name => $name, Attributes => $attributes } )

Receive notification of the beginning of an element.

The Parser will invoke this method at the beginning of every element
in the XML document; there will be a corresponding `C<end_element()>'
event for every `C<start_element()>' event (even when the element is
empty).  All of the element's content will be reported, in order,
before the corresponding `C<end_element()>' event.

If the element name has a namespace prefix, the prefix will still be
attached.  Note that the attribute list provided will contain only
attributes with explicit values (specified or defaulted): #IMPLIED
attributes will be omitted.

Parameters:

  Name        The element type name.
  Attributes  The attributes attached to the element, if any.

=item end_element( { Name => $name } )

Receive notification of the end of an element.

The SAX parser will invoke this method at the end of every element in
the XML document; there will be a corresponding `C<start_element()>'
event for every `C<end_element()>' event (even when the element is
empty).

If the element name has a namespace prefix, the prefix will still be
attached to the name.

Parameters:

  Name        The element type name.

=item characters( { Data => $characters } )

Receive notification of character data.

The Parser will call this method to report each chunk of character
data.  SAX parsers may return all contiguous character data in a
single chunk, or they may split it into several chunks; however, all
of the characters in any single event must come from the same external
entity, so that the Locator provides useful information.

Note that some parsers will report whitespace using the
`C<ignorable_whitespace()>' method rather than this one (validating
parsers must do so).

Parameters:

  Data        The characters from the XML document.

=item ignorable_whitespace( { Data => $whitespace } )

Receive notification of ignorable whitespace in element content.

Validating Parsers must use this method to report each chunk of
ignorable whitespace (see the W3C XML 1.0 recommendation, section
2.10); non-validating parsers may also use this method if they are
capable of parsing and using content models.

SAX parsers may return all contiguous whitespace in a single chunk, or
they may split it into several chunks; however, all of the characters
in any single event must come from the same external entity, so that
the Locator provides useful information.

The application must not attempt to read from the array outside of the
specified range.

Parameters:

  Data        The characters from the XML document.

=item processing_instruction( { Target => $target, Data => $data } )

Receive notification of a processing instruction.

The Parser will invoke this method once for each processing
instruction found: note that processing instructions may occur before
or after the main document element.

A SAX parser should never report an XML declaration (XML 1.0, section
2.8) or a text declaration (XML 1.0, section 4.3.1) using this method.

Parameters:

  Target      The processing instruction target.
  Data        The processing instruction data, if any.

=back
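
A small document handler built on these events might look like the
following sketch.  The C<TextCollector> class is hypothetical, and the
events are issued by hand in place of a real parser; the point is that
`C<characters()>' must append to what it has already received, because
one run of text may be delivered in several chunks.

```perl
use strict;
use warnings;

{
    package TextCollector;

    sub new { return bless { Stack => [], Text => {} }, shift }

    sub start_element {
        my ($self, $event) = @_;
        push @{ $self->{Stack} }, $event->{Name};
        return;
    }

    # Character data may arrive in several chunks, so append, never assign.
    sub characters {
        my ($self, $event) = @_;
        my $current = $self->{Stack}[-1];
        $self->{Text}{$current} .= $event->{Data} if defined $current;
        return;
    }

    sub end_element {
        my ($self, $event) = @_;
        pop @{ $self->{Stack} };
        return;
    }

    # Whatever is returned here is what parse() would return.
    sub end_document {
        my ($self, $event) = @_;
        return $self->{Text};
    }
}

# Events for <greeting>Hello, <b>world</b></greeting>, written out by hand
# in place of a real parser; "Hello, " is split to mimic chunked delivery.
my $handler = TextCollector->new;
$handler->start_document({}) if $handler->can('start_document');
$handler->start_element({ Name => 'greeting', Attributes => {} });
$handler->characters({ Data => 'Hel' });
$handler->characters({ Data => 'lo, ' });
$handler->start_element({ Name => 'b', Attributes => {} });
$handler->characters({ Data => 'world' });
$handler->end_element({ Name => 'b' });
$handler->end_element({ Name => 'greeting' });
my $text = $handler->end_document({});

print "$_: '$text->{$_}'\n" for sort keys %$text;
```

The stack of open element names relies on the ordering guarantee
described above: every `C<start_element()>' is matched by an
`C<end_element()>', with the element's content reported in between.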

=head2 ErrorHandler

Basic interface for SAX error handlers.

If a SAX application needs to implement customized error handling, it
must implement this interface and then provide an instance to the SAX
parser using the parser's `C<ErrorHandler>' parameter.  The parser
will then report all errors and warnings through this interface.

The parser shall use this interface instead of throwing an exception:
it is up to the application whether to throw an exception for
different types of errors and warnings.  Note, however, that there is
no requirement that the parser continue to provide useful information
after a call to `C<fatal_error()>' (in other words, a SAX driver class
could catch an exception and report a fatal error).

All error handlers receive the following I<PARAMS>.  The
`C<PublicId>', `C<SystemId>', `C<LineNumber>', and `C<ColumnNumber>'
are provided only if the parser has that information available.

  Message      The error or warning message, or undef to use the message
               from the EvalError parameter.
  PublicId     The public identifier of the entity that generated the
               error or warning.
  SystemId     The system identifier of the entity that generated the
               error or warning.
  LineNumber   The line number of the end of the text that caused the
               error or warning.
  ColumnNumber The column number of the end of the text that caused the
               error or warning.
  EvalError    The error value returned from a lower level interface.

Application writers who do not want to implement the entire interface
can leave those methods undefined.  If not defined, calls to the
`C<warning()>' and `C<error()>' handlers will be ignored, and
processing will be terminated (going straight to `C<end_document()>')
after the call to `C<fatal_error()>'.

=over 4

=item warning( { I<PARAMS> } )

Receive notification of a warning.

SAX parsers will use this method to report conditions that are not
errors or fatal errors as defined by the XML 1.0 recommendation.  The
default behaviour is to take no action.

The SAX parser must continue to provide normal parsing events after
invoking this method: it should still be possible for the application
to process the document through to the end.

=item error( { I<PARAMS> } )

Receive notification of a recoverable error.

This corresponds to the definition of "error" in section 1.2 of the
W3C XML 1.0 Recommendation.  For example, a validating parser would use
this callback to report the violation of a validity constraint.  The
default behaviour is to take no action.

The SAX parser must continue to provide normal parsing events after
invoking this method: it should still be possible for the application
to process the document through to the end.  If the application cannot
do so, then the parser should report a fatal error even if the XML 1.0
recommendation does not require it to do so.

=item fatal_error( { I<PARAMS> } )

Receive notification of a non-recoverable error.

This corresponds to the definition of "fatal error" in section 1.2 of
the W3C XML 1.0 Recommendation.  For example, a parser would use this
callback to report the violation of a well-formedness constraint.

The application must assume that the document is unusable after the
parser has invoked this method, and should continue (if at all) only
for the sake of collecting additional error messages: in fact, SAX
parsers are free to stop reporting any other events once this method
has been invoked.

=back
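
One possible error handler is sketched below.  The
C<CollectingErrorHandler> class and its C<_describe()> helper are
hypothetical; the choice to record warnings and errors but to C<die>
on fatal errors is only one policy an application might adopt.

```perl
use strict;
use warnings;

{
    package CollectingErrorHandler;

    sub new { return bless { Problems => [] }, shift }

    # Build "message at systemId line N column M" from whatever is present.
    # Message may be undef, in which case EvalError supplies the text.
    sub _describe {
        my ($self, $params) = @_;
        my $text = defined $params->{Message}   ? $params->{Message}
                 : defined $params->{EvalError} ? $params->{EvalError}
                 :                                'unknown error';
        $text =~ s/\s+\z//;
        $text .= " at $params->{SystemId}"         if defined $params->{SystemId};
        $text .= " line $params->{LineNumber}"     if defined $params->{LineNumber};
        $text .= " column $params->{ColumnNumber}" if defined $params->{ColumnNumber};
        return $text;
    }

    sub warning {
        my ($self, $params) = @_;
        push @{ $self->{Problems} }, 'warning: ' . $self->_describe($params);
        return;
    }

    sub error {
        my ($self, $params) = @_;
        push @{ $self->{Problems} }, 'error: ' . $self->_describe($params);
        return;
    }

    # Treat fatal errors as exceptions; the document is unusable anyway.
    sub fatal_error {
        my ($self, $params) = @_;
        die 'fatal: ' . $self->_describe($params) . "\n";
    }
}

my $errors = CollectingErrorHandler->new;
$errors->warning({ Message => 'unused notation', LineNumber => 4 });
$errors->error({ EvalError => "invalid content\n", SystemId => 'doc.xml' });

my $ok = eval {
    $errors->fatal_error(
        { Message => 'mismatched tag', LineNumber => 9, ColumnNumber => 2 });
    1;
};
my $fatal = $@;
print "$_\n" for @{ $errors->{Problems} };
print $fatal unless $ok;
```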

=head2 DTDHandler

Receive notification of basic DTD-related events.

If a SAX application needs information about notations and unparsed
entities, then the application implements this interface and provides
an instance to the SAX parser using the parser's `C<DTDHandler>'
parameter.  The parser uses the instance to report notation and
unparsed entity declarations to the application.

The SAX parser may report these events in any order, regardless of the
order in which the notations and unparsed entities were declared;
however, all DTD events must be reported after the document handler's
`C<start_document()>' event, and before the first `C<start_element()>'
event.

It is up to the application to store the information for future use
(perhaps in a hash table or object tree).  If the application
encounters attributes of type "NOTATION", "ENTITY", or "ENTITIES", it
can use the information that it obtained through this interface to
find the entity and/or notation corresponding with the attribute
value.

Application writers who do not want to implement the entire interface
can leave those methods undefined.  Events whose handler methods are
undefined will be ignored.

=over 4

=item notation_decl( { I<PARAMS> } )

Receive notification of a notation declaration event.

It is up to the application to record the notation for later
reference, if necessary.

If a system identifier is present, and it is a URL, the SAX parser
must resolve it fully before passing it to the application.

I<PARAMS>:

  Name        The notation name.
  PublicId    The notation's public identifier, or undef if none was given.
  SystemId    The notation's system identifier, or undef if none was given.

=item unparsed_entity_decl( { I<PARAMS> } )

Receive notification of an unparsed entity declaration event.

Note that the notation name corresponds to a notation reported by the
`C<notation_decl()>' event.  It is up to the application to record the
entity for later reference, if necessary.

If the system identifier is a URL, the parser must resolve it fully
before passing it to the application.

I<PARAMS>:

  Name         The unparsed entity's name.
  PublicId     The entity's public identifier, or undef if none was given.
  SystemId     The entity's system identifier (it must always have one).
  NotationName The name of the associated notation.

=back
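
Recording DTD events in hash tables, as suggested above, might be
sketched like this.  The C<DTDRecorder> class and its
C<lookup_entity()> method are hypothetical, and the declarations are
supplied by hand.  The entity is deliberately reported before its
notation, since the parser may deliver these events in any order.

```perl
use strict;
use warnings;

{
    package DTDRecorder;

    sub new { return bless { Notations => {}, Entities => {} }, shift }

    sub notation_decl {
        my ($self, $params) = @_;
        $self->{Notations}{ $params->{Name} } = {
            PublicId => $params->{PublicId},
            SystemId => $params->{SystemId},
        };
        return;
    }

    # Store a copy so later changes to the event hash cannot affect us.
    sub unparsed_entity_decl {
        my ($self, $params) = @_;
        my %copy = %$params;
        $self->{Entities}{ $params->{Name} } = \%copy;
        return;
    }

    # Resolve an ENTITY attribute value to its entity and notation records.
    sub lookup_entity {
        my ($self, $name) = @_;
        my $entity = $self->{Entities}{$name} or return;
        return ($entity, $self->{Notations}{ $entity->{NotationName} });
    }
}

my $dtd = DTDRecorder->new;
$dtd->unparsed_entity_decl({
    Name         => 'logo',
    PublicId     => undef,
    SystemId     => 'http://example.com/logo.gif',
    NotationName => 'gif',
});
$dtd->notation_decl({ Name => 'gif', PublicId => undef, SystemId => 'image/gif' });

my ($entity, $notation) = $dtd->lookup_entity('logo');
print "$entity->{SystemId} uses notation $notation->{SystemId}\n";
```

The lookup is deferred until an attribute value needs it, by which time
all DTD events have been reported.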

=head2 EntityResolver

Basic interface for resolving entities.

If a SAX application needs to implement customized handling for
external entities, it must implement this interface and provide an
instance to the SAX parser using the parser's `C<EntityResolver>'
parameter.

The parser will then allow the application to intercept any external
entities (including the external DTD subset and external parameter
entities, if any) before including them.

Many SAX applications will not need to implement this interface, but
it will be especially useful for applications that build XML documents
from databases or other specialised input sources, or for applications
that use URI types other than URLs.

The application can also use this interface to redirect system
identifiers to local URIs or to look up replacements in a catalog
(possibly by using the public identifier).

=over 4

=item resolve_entity( { PublicId => $public_id, SystemId => $system_id } )

Allow the application to resolve external entities.

The Parser will call this method before opening any external entity
except the top-level document entity (including the external DTD
subset, external entities referenced within the DTD, and external
entities referenced within the document element): the application may
request that the parser resolve the entity itself, that it use an
alternative URI, or that it use an entirely different input source.

Application writers can use this method to redirect external system
identifiers to secure and/or local URIs, to look up public identifiers
in a catalogue, or to read an entity from a database or other input
source (including, for example, a dialog box).

If the system identifier is a URL, the SAX parser must resolve it
fully before reporting it to the application.

Parameters:

  PublicId    The public identifier of the external entity being
              referenced, or undef if none was supplied.
  SystemId    The system identifier of the external entity being
              referenced.

`C<resolve_entity()>' returns undef to request that the parser open a
regular URI connection to the system identifier, or returns a hash
containing the same parameters as the `C<Source>' parameter to
Parser's `C<parse()>' method, summarized here:

  PublicId    The public identifier of the external entity being
              referenced, or undef if none was supplied.
  SystemId    The system identifier of the external entity being
              referenced.
  String      String containing XML text.
  ByteStream  An open file handle.
  CharacterStream
              An open file handle.
  Encoding    The character encoding, if known.

See Parser's `C<parse()>' method for complete details on how these
parameters interact.

=back
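
A resolver that looks up public identifiers in a catalogue might be
sketched as follows.  The C<CatalogResolver> class, the public
identifier, and the URLs are all hypothetical, and the sketch assumes
that the returned hash is a hash reference.

```perl
use strict;
use warnings;

{
    package CatalogResolver;

    # %catalog maps public identifiers to replacement XML text
    # (hypothetical data supplied by the application).
    sub new {
        my ($class, %catalog) = @_;
        return bless { Catalog => \%catalog }, $class;
    }

    sub resolve_entity {
        my ($self, $params) = @_;
        my $public_id = $params->{PublicId};

        # undef asks the parser to open the system identifier itself.
        return undef
            unless defined $public_id && exists $self->{Catalog}{$public_id};

        # Otherwise return a Source-style hash; String avoids any URI fetch.
        return {
            PublicId => $public_id,
            SystemId => $params->{SystemId},
            String   => $self->{Catalog}{$public_id},
        };
    }
}

my $resolver = CatalogResolver->new(
    '-//Example//ENTITIES Boilerplate//EN' =>
        '<!ENTITY legal "All rights reserved.">',
);

my $local = $resolver->resolve_entity({
    PublicId => '-//Example//ENTITIES Boilerplate//EN',
    SystemId => 'http://example.com/boilerplate.ent',
});
my $remote = $resolver->resolve_entity({
    PublicId => undef,
    SystemId => 'http://example.com/other.ent',
});

print "catalogue hit: $local->{String}\n";
print "no catalogue entry, parser resolves the URI itself\n"
    unless defined $remote;
```

Keeping the original system identifier in the returned hash lets the
parser continue to use it in error messages and when resolving
relative URIs.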

=head1 Contributors

SAX <http://www.megginson.com/SAX/> was developed collaboratively by
the members of the XML-DEV mailing list.  Please see the ``SAX History
and Contributors'' page for the people who did the real work behind
SAX.  Much of the content of this document was copied from the SAX 1.0
Java Implementation documentation.

The SAX for Python specification was helpful in creating this
specification.
<http://www.stud.ifi.uio.no/~larsga/download/python/xml/saxlib.html>

Thanks to the following people who contributed to Perl SAX.

 Eduard (Enno) Derksen
 Ken MacLeod
 Eric Prud'hommeaux
 Larry Wall