SGMLS(1)                                                 SGMLS(1)


NAME
       sgmls - a validating SGML parser

       An SGML System Conforming to
       International Standard ISO 8879 --
       Standard Generalized Markup Language

SYNOPSIS
       sgmls  [  -deglprsuv ] [ -cfile ] [ -iname ] [ filename...
       ]

DESCRIPTION
       Sgmls parses and validates the  SGML  document  entity  in
       filename...   and  prints  on the standard output a simple
       ASCII representation of its Element Structure  Information
       Set.   (This  is  the  information  set which a structure-
       controlled conforming SGML application should  act  upon.)
       Note  that  the document entity may be spread amongst sev-
       eral files; for example, the  SGML  declaration,  document
       type  declaration  and document instance set could each be
       in a separate file.  If no filenames are  specified,  then
       sgmls  will  read  the  document  entity from the standard
       input.  A filename of - can also be used to refer  to  the
       standard input.

       The following options are available:

       -cfile Write  a  report  of  capacity  usage to file.  The
              report is in the format of a RACT result.  RACT  is
              the  Reference  Application  for  Capacity  Testing
              defined in the Proposed American National  Standard
              Conformance Testing for Standard Generalized Markup
              Language (SGML) Systems (X3.190-199X),  Draft  July
              1991.

       -d     Warn about duplicate entity declarations.

       -e     Describe  open  entities  in error messages.  Error
              messages always include the position  of  the  most
              recently opened external entity.

       -g     Show the GIs of open elements in error messages.

       -iname Pretend that

                     <!ENTITY % name "INCLUDE">

              occurs  at  the start of the document type declara-
              tion subset in the  SGML  document  entity.   Since
              repeated definitions of an entity are ignored, this
              definition will take precedence over any other def-
              initions of this entity in the document type decla-
              ration.  Multiple -i options are allowed.   If  the
              SGML declaration replaces the reserved name INCLUDE
              then the new reserved name will be the  replacement
              text  of  the  entity.  Typically the document type
              declaration will contain

                     <!ENTITY % name "IGNORE">

              and will use %name; in the status keyword  specifi-
              cation  of  a  marked section declaration.  In this
              case the effect of the option will be to cause  the
              marked section not to be ignored.

       -l     Output  L  commands  giving the current line number
              and filename.

       -p     Parse only the prolog.  Sgmls will exit after pars-
              ing the document type declaration.  Implies -s.

       -r     Warn about defaulted references.

       -s     Suppress  output.   Error  messages  will  still be
              printed.

       -u     Warn about undefined elements: elements used in the
              DTD  but  not  defined.   Also warn about undefined
              short reference maps.

       -v     Print the version number.

   Entity Manager
       An external entity resides in  one  or  more  files.   The
       entity manager component of sgmls maps a sequence of files
       into an entity in three sequential stages:

       1.     each carriage return character  is  turned  into  a
              non-SGML character;

       2.     each  newline character is turned into a record end
              character, and at the  same  time  a  record  start
              character  is  inserted  at  the  beginning of each
              line;

       3.     the files are concatenated.
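
       The three stages above can be sketched in Python.  This is
       an illustration only, not the sgmls source.  The record
       start code (10) is taken from the \012 mentioned under
       Output format; the record end code (13) is assumed from the
       reference concrete syntax; NONSGML is a hypothetical
       stand-in, since the text does not say which non-SGML
       character is used.  The function names are invented, and
       the handling of a final line with no newline is a guess.

```python
RS = chr(10)        # record start (the \012 of the output format)
RE = chr(13)        # record end: assumed, reference concrete syntax
NONSGML = chr(127)  # hypothetical placeholder for "a non-SGML character"


def file_to_records(text):
    """Apply stages 1 and 2 to the contents of one file."""
    # Stage 1: each carriage return becomes a non-SGML character.
    text = text.replace("\r", NONSGML)
    # Stage 2: each newline becomes RE, and RS is put before each line.
    pieces = text.split("\n")
    last = pieces.pop()  # text after the final newline, possibly empty
    out = [RS + piece + RE for piece in pieces]
    if last:
        # Assumption: an unterminated final line gets RS but no RE.
        out.append(RS + last)
    return "".join(out)


def build_entity(file_texts):
    """Stage 3: concatenate the transformed files into one entity."""
    return "".join(file_to_records(text) for text in file_texts)
```

       For example, build_entity(["a\r\nb\n", "c"]) yields three
       records, the first of which carries the placeholder for
       the carriage return.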

       A system identifier is interpreted as a list of  filenames
       separated by colons.  A filename of - can be used to refer
       to the standard input.  If no system  identifier  is  sup-
       plied,  then the entity manager will attempt to generate a
       filename using the public identifier (if there is one) and
       other  information  available to it.  Notation identifiers
       are not subject to this treatment.  This process  is  con-
       trolled  by  the environment variable SGML_PATH; this con-
       tains a colon-separated list  of  filename  templates.   A
       filename template is a filename that may contain substitu-
       tion  fields;  a  substitution  field  is  a  %  character
       followed  by  a  single letter that indicates the value of
       the substitution.  If SGML_PATH uses  the  %S  field  (the
       value  of which is the system identifier), then the entity
       manager will also use SGML_PATH  to  generate  a  filename
       when  a system identifier that does not contain any colons
       is supplied.  The value of a substitution can either be  a
       string  or  it can be null.  The entity manager transforms
       the list of filename templates into a list of filenames by
       substituting  for  each  substitution field and discarding
       any template that contained  a  substitution  field  whose
       value was null.  It then uses the first resulting filename
       that exists and  is  readable.   Substitution  values  are
       transformed  before  being used for substitution: firstly,
       any names that were subject to upper case substitution are
       folded  to  lower  case;  secondly,  space  characters are
       mapped to underscores and slashes are mapped to  percents.
       The  value of the %S field is not transformed.  The values
       of substitution fields are as follows:

       %%     A single %.

       %D     The entity's data content notation.  This substitu-
              tion  will succeed only for external data entities.

       %N     The entity, notation or document type name.

       %P     The public identifier if there was a public identi-
              fier, otherwise null.

       %S     The system identifier if there was a system identi-
              fier, otherwise null.

       %X     (This is provided  mainly  for  compatibility  with
              ARCSGML.)  A three-letter string chosen as follows:
                                         |            |
                                         |            | With public identifier
                                         |            +-------------+-----------
                                         | No public  |   Device    |  Device
                                         | identifier | independent | dependent
              ---------------------------+------------+-------------+-----------
              Data or subdocument entity | nsd        | pns         | vns
              General SGML text entity   | gml        | pge         | vge
              Parameter entity           | spe        | ppe         | vpe
              Document type definition   | dtd        | pdt         | vdt
              Link process definition    | lpd        | plp         | vlp

              The device dependent version  is  selected  if  the
              public text class allows a public text display ver-
              sion but no public text display version was  speci-
              fied.

       %Y     The  type  of thing for which the filename is being
              generated:

              SGML subdocument entity    sgml
              Data entity                data
              General text entity        text
              Parameter entity           parm
              Document type definition   dtd
              Link process definition    lpd

       The value of the following  substitution  fields  will  be
       null unless a valid formal public identifier was supplied.

       %A     Null if the text identifier in  the  formal  public
              identifier  contains an unavailable text indicator,
              otherwise the empty string.

       %C     The public text class, mapped to lower case.

       %E     The  public  text  designating   sequence   (escape
              sequence) if the public text class is CHARSET, oth-
              erwise null.

       %I     The empty string if the  owner  identifier  in  the
              formal  public  identifier  is an ISO owner identi-
              fier, otherwise null.

       %L     The public text language,  mapped  to  lower  case,
              unless  the  public text class is CHARSET, in which
              case null.

       %O     The owner identifier (with the +//  or  -//  prefix
              stripped).

       %R     The  empty  string  if  the owner identifier in the
              formal public  identifier  is  a  registered  owner
              identifier, otherwise null.

       %T     The public text description.

       %U     The  empty  string  if  the owner identifier in the
              formal public identifier is an  unregistered  owner
              identifier, otherwise null.

       %V     The public text display version.  This substitution
              will be null if the  public  text  class  does  not
              allow a display version or if no version was speci-
              fied.  If an empty version was specified,  a  value
              of default will be used.
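
       The template expansion described in this section can be
       sketched in Python as follows.  This is a minimal sketch
       under stated assumptions, not the sgmls implementation:
       the function names are invented, the caller is expected to
       supply already-computed field values (None standing for
       null), and a template that uses a field letter missing from
       that mapping is discarded as if the field were null.

```python
import os


def transform(value, folded=False):
    """Transform a substitution value before use (not applied to %S)."""
    if folded:
        # Names subject to upper case substitution are folded to lower case.
        value = value.lower()
    # Spaces map to underscores and slashes map to percents.
    return value.replace(" ", "_").replace("/", "%")


def expand_template(template, values):
    """Expand one template; return None if any field used is null."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            letter = template[i + 1]
            i += 2
            if letter == "%":
                out.append("%")       # %% is a single %
                continue
            value = values.get(letter)
            if value is None:         # null field: discard the template
                return None
            out.append(value)
        else:
            out.append(ch)            # ordinary character (or a trailing %)
            i += 1
    return "".join(out)


def candidate_filenames(sgml_path, values):
    """Turn the colon-separated template list into a list of filenames."""
    names = []
    for template in sgml_path.split(":"):
        name = expand_template(template, values)
        if name is not None:
            names.append(name)
    return names


def first_readable(names):
    """Return the first filename that exists and is readable, else None."""
    for name in names:
        if os.path.isfile(name) and os.access(name, os.R_OK):
            return name
    return None
```

       With SGML_PATH set to "lib/%N.dtd:%S:pub/%P" and no system
       identifier, the middle template is dropped and the other
       two are tried in order.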

   System declaration
       The system declaration for sgmls is as follows:

                          SYSTEM "ISO 8879:1986"
                                  CHARSET
       BASESET  "ISO 646-1983//CHARSET
                 International Reference Version (IRV)//ESC 2/5 4/0"
       DESCSET  0 128 0
       CAPACITY PUBLIC  "ISO 8879:1986//CAPACITY Reference//EN"
                                 FEATURES
       MINIMIZE DATATAG NO  OMITTAG  YES   RANK     NO  SHORTTAG YES
       LINK     SIMPLE  NO  IMPLICIT NO    EXPLICIT NO
       OTHER    CONCUR  NO  SUBDOC   YES 1 FORMAL   YES
       SCOPE    DOCUMENT
       SYNTAX   PUBLIC  "ISO 8879:1986//SYNTAX Reference//EN"
       SYNTAX   PUBLIC  "ISO 8879:1986//SYNTAX Core//EN"
                                 VALIDATE
                GENERAL YES MODEL    YES   EXCLUDE  YES CAPACITY YES
                NONSGML YES SGML     YES   FORMAL   YES
                                   SDIF
                PACK    NO  UNPACK   NO

       The  memory usage of sgmls is not a function of the capac-
       ity points used by a document; however, sgmls  can  handle
       capacities significantly greater than the reference capac-
       ity set.

       In some environments, higher values may be  supported  for
       the SUBDOC parameter.

       Documents  that do not use optional features are also sup-
       ported.  For example, if FORMAL NO  is  specified  in  the
       SGML  declaration, public identifiers will not be required
       to be valid formal public identifiers.

       Certain parts of the concrete syntax may be changed:

              The shunned character numbers can be changed.

              Eight bit characters can be assigned  to  LCNMSTRT,
              UCNMSTRT,  LCNMCHAR  and  UCNMCHAR.  Declaring this
              requires that the syntax reference character set be
              declared like this:
                     BASESET   "ISO Registration Number 100//CHARSET
                                ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
                     DESCSET   0 256 0

              Uppercase substitution can be performed or not per-
              formed both for entity names and for other names.

              Either short reference delimiters assigned  by  the
              reference  delimiter  set  or  no  short  reference
              delimiters are supported.

              The reserved names can be changed.

              The quantity set can be  increased  within  certain
              limits  subject  to  there  being sufficient memory
              available.  The upper limit on NAMELEN is 239.  The
              upper  limits on ATTCNT, ATTSPLEN, BSEQLEN, ENTLVL,
              LITLEN, PILEN, TAGLEN, and  TAGLVL  are  more  than
              thirty  times  greater  than  the reference limits.
              The upper limit on GRPCNT, GRPGTCNT, and GRPLVL  is
              253.   NORMSEP  cannot  be  changed.   DTAGLEN  and
              DTEMPLEN are irrelevant since sgmls does not support
              the DATATAG feature.

   SGML declaration
       The  SGML declaration may be omitted; the following decla-
       ration will be implied:
                             <!SGML "ISO 8879:1986"
                                     CHARSET
       BASESET  "ISO 646-1983//CHARSET
                 International Reference Version (IRV)//ESC 2/5 4/0"
       DESCSET    0  9 UNUSED
                  9  2  9
                 11  2 UNUSED
                 13  1 13
                 14 18 UNUSED
                 32 95 32
                127  1 UNUSED
       CAPACITY PUBLIC  "ISO 8879:1986//CAPACITY Reference//EN"
       SCOPE    DOCUMENT
       SYNTAX   PUBLIC  "ISO 8879:1986//SYNTAX Reference//EN"
                                    FEATURES
       MINIMIZE DATATAG NO OMITTAG  YES          RANK     NO  SHORTTAG YES
       LINK     SIMPLE  NO IMPLICIT NO           EXPLICIT NO
       OTHER    CONCUR  NO SUBDOC   YES 99999999 FORMAL   YES
                                  APPINFO NONE>
       with the exception that characters 128 through 254 will be
       assigned  to  DATACHAR.  When exporting documents that use
       characters in this range, an accurate description  of  the
       upper  half  of the document character set should be added
       to this declaration.   For  ISO  Latin-1,  an  appropriate
       description would be:
       BASESET   "ISO Registration Number 100//CHARSET
                  ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
       DESCSET   128 32 UNUSED
                 160 95 32
                 255  1 UNUSED

   Output format
       The output is a series of lines.  Lines can be arbitrarily
       long.  Each line consists of an initial command  character
       and  one  or more arguments.  Arguments are separated by a
       single space, but when a command takes a fixed  number  of
       arguments  the last argument can contain spaces.  There is
       no space between  the  command  character  and  the  first
       argument.   Arguments  can  contain  the  following escape
       sequences.

       \\     A \.

       \n     A record end character.

       \|     Internal SDATA entities are bracketed by these.

       \nnn   The character whose code is nnn octal.

       A record start character  will  be  represented  by  \012.
       Most  applications  will need to ignore \012 and translate
       \n into newline.
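
       A consumer might decode these escape sequences along the
       following lines.  This is a hedged sketch: the function
       name and the sdata_marker parameter are invented for
       illustration, and it assumes that the \nnn form always
       carries exactly three octal digits.

```python
import re

# One escape: backslash followed by \, n, | or three octal digits.
_ESCAPE = re.compile(r"\\(\\|n|\||[0-7]{3})")


def unescape(arg, sdata_marker=""):
    """Decode the escape sequences in one argument of an output line."""
    def replace(match):
        token = match.group(1)
        if token == "\\":
            return "\\"
        if token == "n":
            return "\n"              # record end, translated to newline
        if token == "|":
            return sdata_marker      # brackets internal SDATA entities
        if token == "012":
            return ""                # record start, ignored
        return chr(int(token, 8))    # any other octal character code
    return _ESCAPE.sub(replace, arg)
```

       By default the SDATA brackets are dropped; pass a marker
       string to keep them visible in the decoded text.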

       The possible command characters and arguments are as  fol-
       lows:

       (gi    The start of an element whose generic identifier is
              gi.  Any attributes for this element will have been
              specified with A commands.

       )gi    The end of an element whose generic identifier is gi.

       -data  Data.

       &name  A reference to an external data entity  name;  name
              will have been defined using an E command.

       ?pi    A processing instruction with data pi.

       Aname val
              The  next  element  to  start has an attribute name
              with value val which takes  one  of  the  following
              forms:

              IMPLIED
                     The value of the attribute is implied.

              CDATA data
                     The  attribute  is  character data.  This is
                     used for attributes whose declared value  is
                     CDATA.

              NOTATION nname
                     The attribute is a notation name; nname will
                     have been defined using a N  command.   This
                     is  used for attributes whose declared value
                     is NOTATION.

              ENTITY name...
                     The attribute is a list  of  general  entity
                     names.   Each  entity  name  will  have been
                     defined using an I, E or S command.  This is
                     used  for attributes whose declared value is
                     ENTITY or ENTITIES.

              TOKEN token...
                     The attribute is a list of tokens.  This  is
                     used  for attributes whose declared value is
                     anything else.

       Dename name val
              This is the same as the A command, except  that  it
              specifies  a  data attribute for an external entity
              named ename.  Any D commands will come after the  E
              command  that  defines  the  entity  to  which they
              apply, but before any & or A commands  that  refer-
              ence the entity.

       Nnname Define a notation nname.  This command will be pre-
              ceded by a p command if the notation  was  declared
              with a public identifier, and by a s command if the
              notation was declared with a system identifier.   A
              notation will only be defined if it is to be refer-
              enced in an E command or in an  A  command  for  an
              attribute with a declared value of NOTATION.

       Eename typ nname
              Define  an  external  data  entity named ename with
              type typ (CDATA, NDATA or SDATA) and notation nname.
              This command will be preceded by one or more f com-
              mands giving the filenames generated by the  entity
              manager  from the system and public identifiers, by
              a p command if a public identifier was declared for
              the  entity, and by a s command if a system identi-
              fier was declared for the entity.  nname will  have
              been  defined  using  a N command.  Data attributes
              may be specified for the entity using  D  commands.
              An  external data entity will only be defined if it
              is to be referenced in a & command or in an A  com-
              mand  for  an  attribute  whose  declared  value is
              ENTITY or ENTITIES.

       Iename typ text
              Define an internal data  entity  named  ename  with
              type typ (CDATA or SDATA) and entity text text.  An
              internal data entity will only be defined if it  is
              referenced  in  an A command for an attribute whose
              declared value is ENTITY or ENTITIES.

       Sename Define a subdocument entity named ename.  This com-
              mand  will  be  preceded  by one or more f commands
              giving the filenames generated by the  entity  man-
              ager from the system and public identifiers, by a p
              command if a public identifier was declared for the
              entity,  and  by a s command if a system identifier
              was declared for the entity.  A subdocument  entity
              will  only  be  defined  if it is referenced in a {
              command or in an A command for an  attribute  whose
              declared value is ENTITY or ENTITIES.

       ssysid This  command applies to the next E, S or N command
              and specifies the associated system identifier.

       ppubid This command applies to the next E, S or N  command
              and specifies the associated public identifier.

       ffilename
              This command applies to the next E or S command and
              specifies an associated filename.   There  will  be
              more than one f command for a single E or S command
              if the system identifier used a colon.

       {ename The start of the  SGML  subdocument  entity  ename;
              ename will have been defined using a S command.

       }ename The end of the SGML subdocument entity ename.

       Llineno file
       Llineno
              Set  the  current  line  number  and filename.  The
              filename argument will be omitted if only the  line
              number  has  changed.   This will be output only if
              the -l option has been given.

       #text  An APPINFO parameter of text was specified  in  the
              SGML declaration.  This is not strictly part of the
              ESIS, but  a  structure-controlled  application  is
              permitted  to act on it.  No # command will be out-
              put if APPINFO NONE was  specified.   A  #  command
              will  occur  at most once, and may be preceded only
              by a single L command.

       C      This command indicates that the document was a con-
              forming  SGML document.  If this command is output,
              it will be the last command.  An SGML  document  is
              not  conforming  if  it  references  a  subdocument
              entity that is not conforming.
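
       As an illustration of how an application might consume
       this output, the sketch below reads the A, (, ), - and C
       commands and ignores the rest.  The function name and the
       event tuples are invented for this example; arguments are
       left in their escaped form.

```python
def parse_esis(lines):
    """Return (events, conforming) for an iterable of output lines.

    events is a list of ("start", gi, attrs), ("data", text) and
    ("end", gi) tuples.  attrs maps an attribute name to a
    (form, value) pair, where value is None for IMPLIED.
    """
    events = []
    pending = {}          # attributes waiting for the next ( command
    conforming = False
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        command, rest = line[0], line[1:]
        if command == "A":
            # "name form [value]"; the last argument may contain spaces.
            parts = rest.split(" ", 2)
            name, form = parts[0], parts[1]
            pending[name] = (form, parts[2] if len(parts) > 2 else None)
        elif command == "(":
            events.append(("start", rest, pending))
            pending = {}
        elif command == ")":
            events.append(("end", rest))
        elif command == "-":
            events.append(("data", rest))
        elif command == "C":
            conforming = True
        # All other commands are skipped in this sketch.
    return events, conforming
```

       Because A commands precede the ( command they apply to,
       the attributes are buffered until the element starts.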

BUGS
       Some non-SGML characters in literals are  counted  as  two
       characters  for the purposes of quantity and capacity cal-
       culations.

SEE ALSO
       The SGML Handbook, Charles F. Goldfarb
       ISO 8879 (Standard Generalized Markup Language),  Interna-
       tional Organization for Standardization

ORIGIN
       ARCSGML was written by Charles F. Goldfarb.

       Sgmls   was   derived   from   ARCSGML   by   James  Clark
       (jjc@jclark.com), to whom bugs should be reported.