Class EntityClassifier
Classifies DTD entities with respect to their role by attempting
to parse their content against different start
symbols using a TunedDTDParser.
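The trial-parsing idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class name TrialClassifier, the three category labels, and the use of regular expressions as stand-ins for parser start symbols. The actual class delegates to a TunedDTDParser, whose API is not shown on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/** Hypothetical sketch: classify replacement text by trying several "start symbols" in turn. */
final class TrialClassifier {

    // Each entry stands in for one parser start symbol (assumption: regexes
    // replace the real parser). Insertion order is the order of the attempts.
    private static final Map<String, Predicate<String>> START_SYMBOLS = new LinkedHashMap<>();

    static {
        START_SYMBOLS.put("enumerated type",
            fullMatch("\\(\\s*[A-Za-z_][\\w.-]*(\\s*\\|\\s*[A-Za-z_][\\w.-]*)*\\s*\\)"));
        START_SYMBOLS.put("default declaration",
            fullMatch("#IMPLIED|#REQUIRED|(#FIXED\\s+)?(\"[^\"<&]*\"|'[^'<&]*')"));
        START_SYMBOLS.put("name",
            fullMatch("[A-Za-z_][\\w.-]*"));
    }

    /** Builds a recogniser that accepts only if the whole trimmed text matches. */
    private static Predicate<String> fullMatch(String regex) {
        Pattern p = Pattern.compile(regex);
        return s -> p.matcher(s.trim()).matches();
    }

    /** Returns the label of the first start symbol that accepts the text. */
    static String classify(String replacementText) {
        for (Map.Entry<String, Predicate<String>> e : START_SYMBOLS.entrySet()) {
            if (e.getValue().test(replacementText)) {
                return e.getKey();
            }
        }
        return "none of the above";
    }

    public static void main(String[] args) {
        for (String text : new String[] {"(a|b)", "#IMPLIED", "ab", "<!ELEMENT"}) {
            System.out.println(text + " -> " + classify(text));
        }
    }
}
```

The ordered map makes the sequence of attempts explicit. A text that no recogniser accepts falls through to a catch-all category.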
We proposed an extended Dtd.AttDef structure where the
usage of classified entities could be encoded as follows:
ent()   means full contents of entity
ent(0)  means contents at leaf nr. 0 of parse tree of entity
ent,0   means 2-tuple of reference to entity and this index number

input                        name    nameabbrev  type    typeabbrev  value   valueabbrev
%ent; (a|b) #IMPLIED         ent()   ent         (a|b)   null        #IMP    null
ab %ent; #IMPLIED            ab      null        ent()   ent         #IMP    null
ab %ent;                     ab      null        ent(0)  ent,0       ent(1)  ent,1
one attdef in ent: %ent;     ent(0)  ent,0       ent(1)  ent,1       ent(2)  ent,2
more attdefs in ent: %ent;   null    ent,-1      null    null        null    null
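A minimal sketch of how such an extended structure might look in Java follows. The class names EntityRef and ExtAttDef, the nullable leaf index, and the abbreviated() helper are all hypothetical. They only illustrate the pairing of each expanded component with an optional entity reference and are not the actual Dtd.AttDef definition.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the "ent" / "ent,0" reference to a classified entity. */
final class EntityRef {
    final String entity;
    final Integer leaf; // null: whole entity ("ent"); otherwise a leaf index ("ent,0")

    EntityRef(String entity, Integer leaf) {
        this.entity = entity;
        this.leaf = leaf;
    }

    @Override
    public String toString() {
        return leaf == null ? entity : entity + "," + leaf;
    }
}

/** Hypothetical extended attribute definition: each component has an optional abbreviation. */
final class ExtAttDef {
    final String name, type, value;                      // expanded texts, may be null
    final EntityRef nameAbbrev, typeAbbrev, valueAbbrev; // null where no entity was used

    ExtAttDef(String name, EntityRef nameAbbrev,
              String type, EntityRef typeAbbrev,
              String value, EntityRef valueAbbrev) {
        this.name = name;   this.nameAbbrev = nameAbbrev;
        this.type = type;   this.typeAbbrev = typeAbbrev;
        this.value = value; this.valueAbbrev = valueAbbrev;
    }

    /** Re-creates the abbreviated form: an entity reference where recorded, plain text otherwise. */
    String abbreviated() {
        List<String> parts = new ArrayList<>();
        String[] texts = {name, type, value};
        EntityRef[] refs = {nameAbbrev, typeAbbrev, valueAbbrev};
        String lastEntity = null; // avoids printing %ent; twice for adjacent leaves of one entity
        for (int i = 0; i < texts.length; i++) {
            if (refs[i] != null) {
                if (!refs[i].entity.equals(lastEntity)) {
                    parts.add("%" + refs[i].entity + ";");
                }
                lastEntity = refs[i].entity;
            } else if (texts[i] != null) {
                parts.add(texts[i]);
                lastEntity = null;
            }
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        ExtAttDef d = new ExtAttDef("ab", null, "(a|b)", new EntityRef("ent", null), "#IMPLIED", null);
        System.out.println(d.abbreviated());
    }
}
```

Keeping both the expanded text and the reference lets a printer choose between the fully expanded declaration and the abbreviated one that mentions the entity.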
FIXME insert "?" in DTD.umod : AttDef FORMAT "$tabular{0>name,20>type?,40>value?}"
FIXME insert case "only nameabbrev" in attdef ?? print "nameabbrev{asAtts}
Nested Class Summary
Nested Classes
Modifier and Type / Class / Description
class: Classify entity replacement text w.r.t. usability in content models.
Nested classes/interfaces inherited from class
TunableParser.CharSet, TunableParser.ExtensionalCharSet
Field Summary
Fields inherited from class
currentElementName, entityUsage, errorOnExpand, generalEntities, ignoreErrors, LAZY_ENTITY_ERROR, nicePE, parameterEntities, parsingFailed, UNPARSED_CONTENTS
Fields inherited from class
asciiLetterSet, decDigitSet, encNameSet, hexDigitSet, initialSet, nameSet, PREFIX_GE, PREFIX_PE, pubidCharSet, sNoPESet, sSet, STRINGCONSTANT_IGNORE, STRINGCONSTANT_INCLUDE, versionNumSet
Fields inherited from class
base, in, messageReceiver, topleveldocumentid
Constructor Summary
Constructors
EntityClassifier()
Method Summary
Modifier and Type / Method / Description
"classify()" only called for entities in an AttList context.
Methods inherited from class
attDef, attlistDecl, attType, attValue, AUX_convert, children, comment, conditionalSection, constructReplacementText, content, cp, declareGeneralEntity, declareParameterEntity, defaultDecl, dtd, elementDecl, encName, encodingDecl, entityDecl, entityValue, enumerated, eq, eRef, externalId, extSubset, fatalError, ignore, markupDecl, mixed, modifierOpt, name, niceEntityValue, nmtoken, notationDecl, parse, parse, parseId, parselocal, peRef, pi, pubidLiteral, resolve, retrieveGeneralEntity, retrieveParameterEntity, retrieveReplacementText, s, sNoPE, sOpt, sOptNoPE, storeEntityUsage, sWsOpt, systemLiteral, versionInfo, versionNum, xmlDecl
Methods inherited from class
lookaheadPe, prefixedEntityName
Methods inherited from class
consume, eof, error, failure, getMessageReceiver, lookahead, lookahead, lookahead, lookahead, lookahead, lookaheadEOF, match, match, match, match, matchahead, matchahead, matchahead, matchUpto, matchUpto, readExternal, setBase, setMessageReceiver, skipUpto, warning, word
Constructor Details
public EntityClassifier()
Method Details
"classify()" is only called for entities in an AttList context. Entities in a content-definition context are handled in TunedDTDParser by "nicePE()", as in the original bt code.
The roles are coded in EntityRole:
N_V_ContentModel         = ident (attribute N-ame or V-alue or tag in content model)
N_V_ContentModel_IncIgn  = as above, or "IGNORE" "INCLUDE"
NT                       = attribute name AND type
NTV                      = attribute name AND type AND value
NTVs                     = idem, more than one of them
T                        = type of an attribute
T_or_ContentModel        = part of a disjunction or attribute enumeration type
TV                       = attribute type and value
V                        = attribute value (with "#FIXED" or "#IMPLIED" or with quotes)
ContentModel             = must be content model
IncIgn                   = USED at an inc/ign place
CrUdE                    = none of the above
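For orientation, the role list could be written down as a Java enum along the following lines. This is a sketch only: the page does not show how EntityRole is actually declared, so the enum form and the description field are assumptions. The constant names and their meanings are taken from the list above.

```java
/** Hypothetical rendering of the role list as an enum; the real EntityRole may differ. */
enum EntityRole {
    N_V_ContentModel("ident (attribute name or value or tag in content model)"),
    N_V_ContentModel_IncIgn("as above, or \"IGNORE\" \"INCLUDE\""),
    NT("attribute name AND type"),
    NTV("attribute name AND type AND value"),
    NTVs("idem, more than one of them"),
    T("type of an attribute"),
    T_or_ContentModel("part of a disjunction or attribute enumeration type"),
    TV("attribute type and value"),
    V("attribute value (with \"#FIXED\" or \"#IMPLIED\" or with quotes)"),
    ContentModel("must be content model"),
    IncIgn("used at an inc/ign place"),
    CrUdE("none of the above");

    /** Human-readable meaning of the role (assumed field, for illustration). */
    final String description;

    EntityRole(String description) {
        this.description = description;
    }
}

/** Small driver that lists all roles with their meaning. */
final class EntityRoleDemo {
    public static void main(String[] args) {
        for (EntityRole role : EntityRole.values()) {
            System.out.println(role + " = " + role.description);
        }
    }
}
```

With an enum, CrUdE serves as an explicit fallback value, so a classification result never has to be null.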