Blob Blame History Raw
'\" t
.\" Uncomment the next line to get a man page accurate for MS-DOS
.\"nr Os 1
.\" Uncomment the next line if tracing is enabled.
.\"nr Tr 1
.if \n(.g .if !r Os .nr Os 0
.tr \(ts"
.ds S \s-1SGML\s0
.de TS
.br
.sp .5
..
.de TE
.br
.sp .5
..
.de TQ
.br
.ns
.TP \\$1
..
.TH SGMLS 1
.SH NAME
sgmls \- a validating SGML parser
.sp
An \*S System Conforming to
.if n .br
International Standard ISO 8879 \(em
.br
Standard Generalized Markup Language
.SH SYNOPSIS
.B sgmls
[
.B \-deglprsuv
]
[
.BI \-c file
]
.if \n(Os=1 \{\
[
.BI \-f file
]
.\}
[
.BI \-i name
]
.if \n(Tr \{\
[
.BI \-x flags
]
[
.BI \-y flags
]
.\}
[
.I filename\|.\|.\|.
]
.SH DESCRIPTION
.I Sgmls
parses and validates
the \*S document entity in
.I filename\|.\|.\|.
and prints on the standard output a simple \s-1ASCII\s0 representation of its
Element Structure Information Set.
(This is the information set which a structure-controlled
conforming \*S application should act upon.)
Note that the document entity may be spread amongst several files;
for example, the SGML declaration, document type declaration and document
instance set could each be in a separate file.
If no filenames are specified, then
.I sgmls
will read the document entity from the standard input.
A filename of
.B \-
can also be used to refer to the standard input.
.LP
The following options are available:
.TP
.BI \-c file
Write a report of capacity usage to
.IR file .
The report is in the format of a RACT result.
RACT is the Reference Application for Capacity Testing defined in the
Proposed American National Standard
.I
Conformance Testing for Standard Generalized Markup Language (SGL) Systems
(X3.190-199X),
Draft July 1991.
.TP
.B \-d
Warn about duplicate entity declarations.
.TP
.B \-e
Describe open entities in error messages.
Error messages always include the position of the most recently
opened external entity.
.if \n(Os=1 \{\
.TP
.BI \-f file
Redirect errors to
.IR file .
.\}
.TP
.B \-g
Show the \s-1GI\s0s of open elements in error messages.
.TP
.BI \-i name
Pretend that
.RS
.IP
.BI <!ENTITY\ %\  name\  \(tsINCLUDE\(ts>
.LP
occurs at the start of the document type declaration subset
in the \*S document entity.
Since repeated definitions of an entity are ignored,
this definition will take precedence over any other definitions
of this entity in the document type declaration.
Multiple
.B \-i
options are allowed.
If the \*S declaration replaces the reserved name
.B INCLUDE
then the new reserved name will be the replacement text of the entity.
Typically the document type declaration will contain
.IP
.BI <!ENTITY\ %\  name\  \(tsIGNORE\(ts>
.LP
and will use
.BI % name ;
in the status keyword specification of a marked section declaration.
In this case the effect of the option will be to cause the marked
section not to be ignored.
.RE
.TP
.B \-l
Output
.B L
commands giving the current line number and filename.
.TP
.B \-p
Parse only the prolog.
.I Sgmls
will exit after parsing the document type declaration.
Implies
.BR \-s .
.TP
.B \-r
Warn about defaulted references.
.TP
.B \-s
Suppress output.
Error messages will still be printed.
.TP
.B \-u
Warn about undefined elements: elements used in the DTD but not defined.
Also warn about undefined short reference maps.
.TP
.B \-v
Print the version number.
.if \n(Tr \{\
.TP
.BI \-x flags
.br
.ns
.TP
.BI \-y flags
Enable debugging output;
.B \-x
applies to the document body,
.B \-y
to the prolog.
Each character in the
.I flags
argument enables tracing of a particular activity.
.RS
.TP
.B t
Trace state transitions.
.TP
.B a
Trace attribute activity.
.TP
.B c
Trace context checking.
.TP
.B d
Trace declaration parsing.
.TP
.B e
Trace entities.
.TP
.B g
Trace groups.
.TP
.B i
Trace \s-1ID\s0s.
.TP
.B m
Trace marked sections.
.TP
.B n
Trace notations.
.RE
.\}
.SS "Entity Manager"
An external entity resides in one or more files.
The entity manager component of
.I sgmls
maps a sequence of files into an entity in three sequential stages:
.IP 1.
each carriage return character is turned into a non-SGML character;
.IP 2.
each newline character is turned into a record end character,
and at the same time
a record start character is inserted at the beginning of each line;
.IP 3.
the files are concatenated.
.LP
A system identifier is
interpreted as a list of filenames separated by
.if \n(Os=0 colons.
.if \n(Os=1 semi-colons.
A filename of
.B \-
can be used to refer to the standard input.
If no system identifier is supplied, then the entity manager will
attempt to generate a filename using the public identifier
(if there is one) and other information available to it.
Notation identifiers are not subject to this treatment.
This process is controlled by the environment variable
.BR \s-1SGML_PATH\s0 ;
this contains a
.if \n(Os=0 colon-separated
.if \n(Os=1 semicolon-separated
list of filename templates.
A filename template is a filename that may contain
substitution fields; a substitution field is a
.B %
character followed by a single letter that indicates the value
of the substitution.
If
.B \s-1SGML_PATH\s0
uses the
.B %S
field (the value of which is the system identifier),
then the entity manager will also use
.B \s-1SGML_PATH\s0
to generate a filename
when a system identifier that does not contain any
.if \n(Os=0 colons
.if \n(Os=1 semi-colons
is supplied.
The value of a substitution can either be a string
or it can be
.IR null .
The entity manager transforms the list of
filename templates into a list of filenames by substituting for each
substitution field and discarding any template
that contained a substitution field whose value was null.
It then uses the first resulting filename that exists and is readable.
Substitution values are transformed before being used for substitution:
firstly, any names that were subject to upper case substitution
are folded to lower case;
secondly,
.if \n(Os=0 \{\
.\" Unix
space characters are mapped to underscores
and slashes are mapped to percents.
.\}
.if \n(Os=1 \{\
.\" MS-DOS
the characters
.B +,./:=?
and space characters are deleted.
.\}
The value of the
.B %S
field is not transformed.
The values of substitution fields are as follows:
.TP
.B %%
A single
.BR % .
.TP
.B %D
The entity's data content notation.
This substitution will succeed only for external data entities.
.TP
.B %N
The entity, notation or document type name.
.TP
.B %P
The public identifier if there was a public identifier,
otherwise null.
.TP
.B %S
The system identifier if there was a system identifier
otherwise null.
.TP
.B %X
(This is provided mainly for compatibility with \s-1ARCSGML\s0.)
A three-letter string chosen as follows:
.LP
.RS
.ne 11
.TS
tab(&);
c|c|c s
c|c|c s
c|c|c|c
c|c|c|c
l|lB|lB|lB.
&&With public identifier
&&_
&No public&Device&Device
&identifier&independent&dependent
_
Data or subdocument entity&nsd&pns&vns
General SGML text entity&gml&pge&vge
Parameter entity&spe&ppe&vpe
Document type definition&dtd&pdt&vdt
Link process definition&lpd&plp&vlp
.TE
.LP
The device dependent version is selected if the public text class
allows a public text display version but no public text display
version was specified.
.RE
.TP
.B %Y
The type of thing for which the filename is being generated:
.TS
tab(&);
l lB.
SGML subdocument entity&sgml
Data entity&data
General text entity&text
Parameter entity&parm
Document type definition&dtd
Link process definition&lpd
.TE
.LP
The value of the following substitution fields will be null
unless a valid formal public identifier was supplied.
.TP
.B %A
Null if the text identifier in the
formal public identifier contains an unavailable text indicator,
otherwise the empty string.
.TP
.B %C
The public text class, mapped to lower case.
.TP
.B %E
The public text designating sequence (escape sequence)
if the public text class is
.BR \s-1CHARSET\s0 ,
otherwise null.
.TP
.B %I
The empty string if the owner identifier in the formal public identifier
is an \s-1ISO\s0 owner identifier,
otherwise null.
.TP
.B %L
The public text language, mapped to lower case,
unless the public text class is
.BR \s-1CHARSET\s0 ,
in which case null.
.TP
.B %O
The owner identifier (with the
.B +//
or
.B \-//
prefix stripped.)
.TP
.B %R
The empty string if the owner identifier in the formal public identifier
is a registered owner identifier,
otherwise null.
.TP
.B %T
The public text description.
.TP
.B %U
The empty string if the owner identifier in the formal public identifier
is an unregistered owner identifier,
otherwise null.
.TP
.B %V
The public text display version.
This substitution will be null if the public text class
does not allow a display version or if no version was specified.
If an empty version was specified, a value of
.B default
will be used.
.br
.ne 18
.SS "System declaration"
The system declaration for
.I sgmls
is as follows:
.LP
.TS
tab(&);
c1 s1 s1 s1 s1 s1 s1 s1 s
c s s s s s s s s
l l s s s s s s s
l l s s s s s s s
l l s s s s s s s
l l l s s s s s s
c s s s s s s s s
l l l l l l l l l
l l l l l l l l l
l l l l l l l l l
l l s s s s s s s
l l l s s s s s s
l l l s s s s s s
c s s s s s s s s
l l l l l l l l l.
SYSTEM "ISO 8879:1986"
CHARSET
BASESET&"ISO 646-1983//CHARSET
&\h'\w'"'u'International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET&0\0128\00
CAPACITY&PUBLIC&"ISO 8879:1986//CAPACITY Reference//EN"
FEATURES
MINIMIZE&DATATAG&NO&OMITTAG&YES&RANK&NO&SHORTTAG&YES
LINK&SIMPLE&NO&IMPLICIT&NO&EXPLICIT&NO
OTHER&CONCUR&NO&SUBDOC&YES 1&FORMAL&YES
SCOPE&DOCUMENT
SYNTAX&PUBLIC&"ISO 8879:1986//SYNTAX Reference//EN"
SYNTAX&PUBLIC&"ISO 8879:1986//SYNTAX Core//EN"
VALIDATE
&GENERAL&YES&MODEL&YES&EXCLUDE&YES&CAPACITY&YES
&NONSGML&YES&SGML&YES&FORMAL&YES
.T&
c s s s s s s s s
l l l l l l l l l.
SDIF
&PACK&NO&UNPACK&NO
.TE
.LP
The memory usage of
.I sgmls
is not a function of the capacity points used by a document;
however,
.I sgmls
can handle capacities significantly greater than the reference capacity set.
.LP
In some environments,
higher values may be supported for the \s-1SUBDOC\s0 parameter.
.LP
Documents that do not use optional features are also supported.
For example, if
.B FORMAL\ NO
is specified in the \*S declaration,
public identifiers will not be required to be valid formal public identifiers.
.LP
Certain parts of the concrete syntax may be changed:
.RS
.LP
The shunned character numbers can be changed.
.LP
Eight bit characters can be assigned to
\s-1LCNMSTRT\s0, \s-1UCNMSTRT\s0, \s-1LCNMCHAR\s0 and \s-1UCNMCHAR\s0.
Declaring this requires that the syntax reference character set be declared
like this:
.RS
.ne 3
.TS
tab(&);
l l.
BASESET&"ISO Registration Number 100//CHARSET
&\h'\w'"'u'ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
DESCSET&0\0256\00
.TE
.RE
.LP
Uppercase substitution can be performed or not performed
both for entity names and for other names.
.LP
Either short reference delimiters assigned by the reference delimiter set
or no short reference delimiters are supported.
.LP
The reserved names can be changed.
.LP
The quantity set can be increased within certain limits
subject to there being sufficient memory available.
The upper limit on \s-1\%NAMELEN\s0 is 239.
The upper limits on
\s-1\%ATTCNT\s0, \s-1\%ATTSPLEN\s0, \s-1\%BSEQLEN\s0, \s-1\%ENTLVL\s0,
\s-1\%LITLEN\s0, \s-1\%PILEN\s0, \s-1\%TAGLEN\s0, and \s-1\%TAGLVL\s0
are more than thirty times greater than the reference limits.
The upper limit on
\s-1\%GRPCNT\s0, \s-1\%GRPGTCNT\s0, and \s-1\%GRPLVL\s0 is 253.
\s-1\%NORMSEP\s0
cannot be changed.
\s-1\%DTAGLEN\s0
are
\s-1\%DTEMPLEN\s0
irrelevant since
.I sgmls
does not support the
\s-1\%DATATAG\s0
feature.
.RE
.SS "\*S declaration"
The \*S declaration may be omitted,
the following declaration will be implied:
.TS
tab(&);
c1 s1 s1 s1 s1 s1 s1 s1 s
c s s s s s s s s
l l s s s s s s s.
<!SGML "ISO 8879:1986"
CHARSET
BASESET&"ISO 646-1983//CHARSET
&\h'\w'"'u'International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET&\0\00\0\09\0UNUSED
&\0\09\0\02\0\09
&\011\0\02\0UNUSED
&\013\0\01\013
&\014\018\0UNUSED
&\032\095\032
&127\0\01\0UNUSED
.T&
l l l s s s s s s
l l s s s s s s s
l l l s s s s s s
c s s s s s s s s
l l l l l l l l l.
CAPACITY&PUBLIC&"ISO 8879:1986//CAPACITY Reference//EN"
SCOPE&DOCUMENT
SYNTAX&PUBLIC&"ISO 8879:1986//SYNTAX Reference//EN"
FEATURES
MINIMIZE&DATATAG&NO&OMITTAG&YES&RANK&NO&SHORTTAG&YES
LINK&SIMPLE&NO&IMPLICIT&NO&EXPLICIT&NO
OTHER&CONCUR&NO&SUBDOC&YES 99999999&FORMAL&YES
.T&
c s s s s s s s s.
APPINFO NONE>
.TE
with the exception that characters 128 through 254 will be assigned to
\s-1DATACHAR\s0.
When exporting documents that use characters in this range,
an accurate description of the upper half of the document character set
should be added to this declaration.
For ISO Latin-1, an appropriate description would be:
.br
.ne 5
.TS
tab(&);
l l.
BASESET&"ISO Registration Number 100//CHARSET
&\h'\w'"'u'ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
DESCSET&128\032\0UNUSED
&160\095\032
&255\0\01\0UNUSED
.TE
.SS "Output format"
The output is a series of lines.
Lines can be arbitrarily long.
Each line consists of an initial command character
and one or more arguments.
Arguments are separated by a single space,
but when a command takes a fixed number of arguments
the last argument can contain spaces.
There is no space between the command character and the first argument.
Arguments can contain the following escape sequences.
.TP
.B \e\e
A
.BR \e.
.TP
.B \en
A record end character.
.TP
.B \e|
Internal \s-1SDATA\s0 entities are bracketed by these.
.TP
.BI \e nnn
The character whose code is
.I nnn
octal.
.LP
A record start character will be represented by
.BR \e012 .
Most applications will need to ignore
.B \e012
and translate
.B \en
into newline.
.LP
The possible command characters and arguments are as follows:
.TP
.BI ( gi
The start of an element whose generic identifier is
.IR gi .
Any attributes for this element
will have been specified with
.B A
commands.
.TP
.BI ) gi
The end an element whose generic identifier is
.IR gi .
.TP
.BI \- data
Data.
.TP
.BI & name
A reference to an external data entity
.IR name ;
.I name
will have been defined using an
.B E
command.
.TP
.BI ? pi
A processing instruction with data
.IR pi .
.TP
.BI A name\ val
The next element to start has an attribute
.I name
with value
.I val
which takes one of the following forms:
.RS
.TP
.B IMPLIED
The value of the attribute is implied.
.TP
.BI CDATA\  data
The attribute is character data.
This is used for attributes whose declared value is
.BR \s-1CDATA\s0 .
.TP
.BI NOTATION\  nname
The attribute is a notation name;
.I nname
will have been defined using a
.B N
command.
This is used for attributes whose declared value is
.BR \s-1NOTATION\s0 .
.TP
.BI ENTITY\  name\|.\|.\|.
The attribute is a list of general entity names.
Each entity name will have been defined using an
.BR I ,
.B E
or
.B S
command.
This is used for attributes whose declared value is
.B \s-1ENTITY\s0
or
.BR \s-1ENTITIES\s0 .
.TP
.BI TOKEN\  token\|.\|.\|.
The attribute is a list of tokens.
This is used for attributes whose declared value is anything else.
.RE
.TP
.BI D ename\ name\ val
This is the same as the
.B A
command, except that it specifies a data attribute for an
external entity named
.IR ename .
Any
.B D
commands will come after the
.B E
command that defines the entity to which they apply, but
before any
.B &
or
.B A
commands that reference the entity.
.TP
.BI N nname
.IR nname.
Define a notation
This command will be preceded by a
.B p
command if the notation was declared with a public identifier,
and by a
.B s
command if the notation was declared with a system identifier.
A notation will only be defined if it is to be referenced
in an
.B E
command or in an
.B A
command for an attribute with a declared value of
.BR \s-1NOTATION\s0 .
.TP
.BI E ename\ typ\ nname
Define an external data entity named
.I ename
with type
.I typ
.RB ( \s-1CDATA\s0 ,
.B \s-1NDATA\s0
or
.BR \s-1SDATA\s0 )
and notation
.IR not.
This command will be preceded by one or more
.B f
commands giving the filenames generated by the entity manager from the system
and public identifiers,
by a
.B p
command if a public identifier was declared for the entity,
and by a
.B s
command if a system identifier was declared for the entity.
.I not
will have been defined using a
.B N
command.
Data attributes may be specified for the entity using
.B D
commands.
An external data entity will only be defined if it is to be referenced in a
.B &
command or in an
.B A
command for an attribute whose declared value is
.B \s-1ENTITY\s0
or
.BR \s-1ENTITIES\s0 .
.TP
.BI I ename\ typ\ text
Define an internal data entity named
.I ename
with type
.I typ
.RB ( \s-1CDATA\s0
or
.BR \s-1SDATA\s0 )
and entity text
.IR text .
An internal data entity will only be defined if it is referenced in an
.B A
command for an attribute whose declared value is
.B \s-1ENTITY\s0
or
.BR \s-1ENTITIES\s0 .
.TP
.BI S ename
Define a subdocument entity named
.IR ename .
This command will be preceded by one or more
.B f
commands giving the filenames generated by the entity manager from the system
and public identifiers,
by a
.B p
command if a public identifier was declared for the entity,
and by a
.B s
command if a system identifier was declared for the entity.
A subdocument entity will only be defined if it is referenced
in a
.B {
command
or in an
.B A
command for an attribute whose declared value is
.B \s-1ENTITY\s0
or
.BR \s-1ENTITIES\s0 .
.TP
.BI s sysid
This command applies to the next
.BR E ,
.B S
or
.B N
command and specifies the associated system identifier.
.TP
.BI p pubid
This command applies to the next
.BR E ,
.B S
or
.B N
command and specifies the associated public identifier.
.TP
.BI f filename
This command applies to the next
.B E
or
.B S
command and specifies an associated filename.
There will be more than one
.B f
command for a single
.B E
or
.B S
command if the system identifier used a
.if \n(Os=0 colon.
.if \n(Os=1 semi-colon.
.TP
.BI { ename
The start of the \*S subdocument entity
.IR ename ;
.I ename
will have been defined using a
.B S
command.
.TP
.BI } ename
The end of the \*S subdocument entity
.IR ename .
.TP
.BI L lineno\ file
.TQ
.BI L lineno
Set the current line number and filename.
The
.I filename
argument will be omitted if only the line number has changed.
This will be output only if the
.B \-l
option has been given.
.TP
.BI # text
An \s-1APPINFO\s0 parameter of
.I text
was specified in the \*S declaration.
This is not strictly part of the ESIS, but a structure-controlled
application is permitted to act on it.
No
.B #
command will be output if
.B \s-1APPINFO\s0\ \s-1NONE\s0
was specified.
A
.B #
command will occur at most once,
and may be preceded only by a single
.B L
command.
.TP
.B C
This command indicates that the document was a conforming \*S document.
If this command is output, it will be the last command.
An \*S document is not conforming if it references a subdocument entity
that is not conforming.
.SH BUGS
Some non-SGML characters in literals are counted as two characters for the
purposes of quantity and capacity calculations.
.SH "SEE ALSO"
The \*S Handbook, Charles F. Goldfarb
.br
\s-1ISO\s0 8879 (Standard Generalized Markup Language),
International Organization for Standardization
.SH ORIGIN
\s-1ARCSGML\s0 was written by Charles F. Goldfarb.
.LP
.I Sgmls
was derived from \s-1ARCSGML\s0 by James Clark (jjc@jclark.com),
to whom bugs should be reported.