If the SGML declaration is omitted and there is no applicable SGMLDECL entry in a catalog, the following declaration will be implied:
<!SGML "ISO 8879:1986"
CHARSET
BASESET  "ISO 646-1983//CHARSET International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET    0  9 UNUSED
           9  2  9
          11  2 UNUSED
          13  1 13
          14 18 UNUSED
          32 95 32
         127  1 UNUSED
CAPACITY PUBLIC "ISO 8879:1986//CAPACITY Reference//EN"
SCOPE    DOCUMENT
SYNTAX
SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
         20 21 22 23 24 25 26 27 28 29 30 31 127 255
BASESET  "ISO 646-1983//CHARSET International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET  0 128 0
FUNCTION RE 13
         RS 10
         SPACE 32
         TAB SEPCHAR 9
NAMING   LCNMSTRT ""
         UCNMSTRT ""
         LCNMCHAR "-."
         UCNMCHAR "-."
         NAMECASE GENERAL YES
                  ENTITY NO
DELIM    GENERAL SGMLREF
         SHORTREF SGMLREF
NAMES    SGMLREF
QUANTITY SGMLREF
         ATTCNT 99999999
         ATTSPLEN 99999999
         DTEMPLEN 24000
         ENTLVL 99999999
         GRPCNT 99999999
         GRPGTCNT 99999999
         GRPLVL 99999999
         LITLEN 24000
         NAMELEN 99999999
         PILEN 24000
         TAGLEN 99999999
         TAGLVL 99999999
FEATURES
MINIMIZE DATATAG NO OMITTAG YES RANK YES SHORTTAG YES
LINK     SIMPLE YES 1000 IMPLICIT YES EXPLICIT YES 1
OTHER    CONCUR NO SUBDOC YES 99999999 FORMAL YES
APPINFO NONE>
with the exception that all characters that are neither significant nor shunned will be assigned to DATACHAR.
A character in a base character set is described either by giving its number in a universal character set or by specifying a minimum literal. The choice of universal character set is subject to three constraints: characters that are significant in the SGML reference concrete syntax must be in the universal character set and must have the same number there as in ISO 646; each character in the character set must be represented by exactly one number; and character numbers in the ranges 0 to 31 and 127 to 159 are control characters (for the purpose of enforcing SHUNCHAR CONTROLS). It is recommended that ISO 10646 (Unicode) be used as the universal character set, except in environments where the normal document character sets are large character sets that cannot be compactly described in terms of ISO 10646.
The public identifier of a base character set can be associated with an entity that describes it by using a PUBLIC entry in the catalog entry file. The entity must be a fragment of an SGML declaration consisting of the portion of a character set description that follows the DESCSET keyword; that is, it must be a sequence of character descriptions, where each character description specifies a described character number, the number of characters, and either a character number in the universal character set, a minimum literal, or the keyword UNUSED. Character numbers in the universal character set can be as large as 99999999.
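The character description format can be illustrated with a small Python sketch. It is not part of SP and makes several simplifying assumptions: the fragment contains no SGML comments, the function names parse_descriptions and expand are hypothetical, and a minimum literal is simply stored as a string for every character in its range. The sample input is the DESCSET portion of the document character set in the implied declaration above.

```python
import re
from typing import Dict, List, Optional, Tuple, Union

MAX_UNIVERSAL = 99999999  # largest universal character number mentioned in the text

# A token is a quoted minimum literal or a run of non-space characters.
TOKEN = re.compile(r'"[^"]*"|\'[^\']*\'|\S+')

# Target of a description: universal number, minimum literal, or None for UNUSED.
Target = Union[int, str, None]


def parse_descriptions(fragment: str) -> List[Tuple[int, int, Target]]:
    """Split a fragment into (described number, count, target) triples."""
    tokens = TOKEN.findall(fragment)
    if len(tokens) % 3 != 0:
        raise ValueError("each character description needs three parts")
    triples: List[Tuple[int, int, Target]] = []
    for i in range(0, len(tokens), 3):
        start, count, what = tokens[i:i + 3]
        target: Target
        if what.upper() == "UNUSED":
            target = None
        elif what[0] in "\"'":
            target = what[1:-1]  # minimum literal, quotes stripped
        else:
            target = int(what)
            if target > MAX_UNIVERSAL:
                raise ValueError("universal character number too large")
        triples.append((int(start), int(count), target))
    return triples


def expand(triples: List[Tuple[int, int, Target]]) -> Dict[int, Target]:
    """Map every described character number to its target."""
    mapping: Dict[int, Target] = {}
    for start, count, target in triples:
        for offset in range(count):
            if isinstance(target, int):
                # Consecutive described numbers map to consecutive universal numbers.
                mapping[start + offset] = target + offset
            else:
                mapping[start + offset] = target
    return mapping


# DESCSET portion of the implied document character set.
IMPLIED_DESCSET = (
    "0 9 UNUSED 9 2 9 11 2 UNUSED 13 1 13 "
    "14 18 UNUSED 32 95 32 127 1 UNUSED"
)

mapping = expand(parse_descriptions(IMPLIED_DESCSET))
used = sorted(n for n, target in mapping.items() if target is not None)
```

In this sketch, used holds the character numbers that the implied declaration actually describes: the tab, the two record function characters, and the printable range 32 to 126.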
In addition, SP has built-in knowledge of a few character sets. These are identified by the designating sequence in the public identifier. The following designating sequences are recognized:
All the above character sets will be treated as mapping character numbers 0 to 127 inclusive as in ISO 646.
It is not necessary for every character set used in the SGML declaration to be known to SP, provided that two conditions hold. First, characters in the document character set that are significant both in the reference concrete syntax and in the described concrete syntax must be described using known base character sets. Second, characters that are significant in the described concrete syntax must be described using the same base character sets or the same minimum literals in both the document character set description and the syntax reference character set description.
The public identifier for a public concrete syntax can be associated with an entity that describes it by using a PUBLIC entry in the catalog entry file. The entity must be a fragment of an SGML declaration consisting of a concrete syntax description starting with the SHUNCHAR keyword as in an SGML declaration. The entity can also make use of the following extensions:
The public identifier for a public capacity set can be associated with an entity that describes it by using a PUBLIC entry in the catalog entry file. The entity must be a fragment of an SGML declaration consisting of a sequence of capacity names and numbers.
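As a rough illustration of how a PUBLIC entry ties a public identifier to a capacity set fragment, the following Python sketch resolves an identifier through a one-line catalog and then reads the fragment as name and number pairs. Everything specific in it is an assumption made for the example: the public identifier "-//Example//CAPACITY Large//EN", the file name bigcap.dcl, the capacity values, and the function names. It handles only PUBLIC entries with a quoted public identifier and ignores catalog comments and other entry types; it does not show how SP itself reads catalogs.

```python
import re
from typing import Dict

# PUBLIC entry: quoted public identifier, then a quoted or bare system identifier.
PUBLIC_ENTRY = re.compile(r'PUBLIC\s+"([^"]*)"\s+(?:"([^"]*)"|(\S+))')


def normalize(public_id: str) -> str:
    """Collapse runs of white space so equivalent identifiers compare equal."""
    return " ".join(public_id.split())


def parse_catalog(text: str) -> Dict[str, str]:
    """Collect PUBLIC entries as {public identifier: system identifier}."""
    entries: Dict[str, str] = {}
    for match in PUBLIC_ENTRY.finditer(text):
        system_id = match.group(2) if match.group(2) is not None else match.group(3)
        # Keeping the first entry seen is an assumption of this sketch.
        entries.setdefault(normalize(match.group(1)), system_id)
    return entries


def parse_capacity_set(fragment: str) -> Dict[str, int]:
    """Read a sequence of capacity names and numbers."""
    tokens = fragment.split()
    if len(tokens) % 2 != 0:
        raise ValueError("capacity names and numbers must come in pairs")
    return {
        name.upper(): int(number)
        for name, number in zip(tokens[0::2], tokens[1::2])
    }


# Hypothetical catalog and entity contents, held in memory to stay self-contained.
CATALOG_TEXT = 'PUBLIC "-//Example//CAPACITY Large//EN" "bigcap.dcl"\n'
ENTITY_TEXT = {"bigcap.dcl": "TOTALCAP 200000 ENTCAP 100000 ELEMCAP 100000"}

catalog = parse_catalog(CATALOG_TEXT)
system_id = catalog[normalize("-//Example//CAPACITY  Large//EN")]
capacities = parse_capacity_set(ENTITY_TEXT[system_id])
```

The same lookup step would apply to the entities for base character sets and public concrete syntaxes; only the parsing of the fragment differs.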
James Clark