Package eu.bandm.tools.dtd
Class EntityClassifier
java.lang.Object
eu.bandm.tools.rdparser.TunableParser<XMLDocumentIdentifier>
eu.bandm.tools.rdparser.TunableParserForXml<XMLDocumentIdentifier>
eu.bandm.tools.dtd.TunedDTDParser
eu.bandm.tools.dtd.EntityClassifier
Classifies DTD entities with respect to their EntityRole
by attempting to parse their content against different start
symbols, using a TunedDTDParser.
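The trial-parse idea can be pictured with a small, self-contained sketch. It does not use the real TunedDTDParser: the class StartSymbolTrialSketch, its reduced Role enum and the two regex-based recognizers are all hypothetical stand-ins, chosen only to show the "try each start symbol in turn, first success wins" pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Illustration only; nothing here is part of eu.bandm.tools.dtd. */
final class StartSymbolTrialSketch {

    /** Hypothetical, heavily reduced stand-in for EntityRole. */
    enum Role { V, T, CrUdE }

    /** Trial order matters: the first start symbol that accepts the text wins. */
    private static final Map<Role, Predicate<String>> TRIALS = new LinkedHashMap<>();
    static {
        TRIALS.put(Role.V, StartSymbolTrialSketch::looksLikeDefaultDecl);
        TRIALS.put(Role.T, StartSymbolTrialSketch::looksLikeEnumeration);
    }

    /**
     * Toy recognizer for an attribute default: #REQUIRED, #IMPLIED, or an
     * optionally #FIXED quoted literal (entity references are not handled).
     */
    static boolean looksLikeDefaultDecl(String text) {
        return text.matches("#REQUIRED|#IMPLIED|(#FIXED\\s+)?(\"[^<&\"]*\"|'[^<&']*')");
    }

    /** Toy recognizer for an enumerated attribute type such as (a|b). */
    static boolean looksLikeEnumeration(String text) {
        return text.matches("\\(\\s*[\\w.:-]+(\\s*\\|\\s*[\\w.:-]+)*\\s*\\)");
    }

    /** Tries every recognizer in order; falls back to CrUdE if none accepts. */
    static Role classify(String replacementText) {
        String text = replacementText.trim();
        for (Map.Entry<Role, Predicate<String>> trial : TRIALS.entrySet()) {
            if (trial.getValue().test(text)) {
                return trial.getKey();
            }
        }
        return Role.CrUdE;
    }
}
```

In this sketch a failed recognizer simply returns false; a real recursive-descent parser would more likely signal failure through an exception or error flag and would have to be reset between attempts.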
We proposed an extended Dtd.AttDef structure where the
usage of classified entities could be encoded as follows:
ent() means full contents of entity
ent(0) means contents at leaf nr. 0 of parse tree of entity
ent,0 means 2-tuple of reference to entity and this index number
attdef source            name    nameabbrev  type    typeabbrev  value   valueabbrev
%ent; (a|b) #IMPLIED     ent()   ent         (a|b)   null        #IMP    null
ab %ent; #IMPLIED        ab      null        ent()   ent         #IMP    null
ab %ent;                 ab      null        ent(0)  ent,0       ent(1)  ent,1
one attdef in ent:
%ent;                    ent(0)  ent,0       ent(1)  ent,1       ent(2)  ent,2
more attdefs in ent:
%ent;                    null    ent,-1      null    null        null    null
FIXME insert "?" in DTD.umod : AttDef
FORMAT "$tabular{0>name,20>type?,40>value?}"
insert case "only nameabbrev" in attdef ??
print "nameabbrev{asAtts}
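The ent(), ent(i) and ent,i notation used in the proposal above can be modelled with a very small value class. The following is only a sketch under assumptions: EntityPartSketch and all of its methods are hypothetical names, not part of Dtd.AttDef or of this package, and a negative index is taken to mean the "more attdefs in ent" case, where the content column is null.

```java
import java.util.List;

/** Illustration only: a reference to an entity, optionally narrowed to one parse-tree leaf. */
final class EntityPartSketch {
    private final String entity;
    private final Integer leafIndex; // null means "full contents of the entity"

    private EntityPartSketch(String entity, Integer leafIndex) {
        this.entity = entity;
        this.leafIndex = leafIndex;
    }

    static EntityPartSketch whole(String entity) {
        return new EntityPartSketch(entity, null);
    }

    static EntityPartSketch leaf(String entity, int index) {
        return new EntityPartSketch(entity, index);
    }

    /** Content notation: "ent()" or "ent(0)"; null for a negative index. */
    String contents() {
        if (leafIndex == null) {
            return entity + "()";
        }
        return leafIndex < 0 ? null : entity + "(" + leafIndex + ")";
    }

    /** Abbreviation notation: "ent" for the whole entity, else "ent,0", "ent,-1", ... */
    String abbrev() {
        return leafIndex == null ? entity : entity + "," + leafIndex;
    }

    /** Picks the text this part stands for, given the entity's text and its parse-tree leaves. */
    String resolve(String replacementText, List<String> leaves) {
        if (leafIndex == null) {
            return replacementText;
        }
        return leafIndex < 0 ? null : leaves.get(leafIndex);
    }
}
```

For an entity whose replacement text is a single attdef, leaf("ent", 0), leaf("ent", 1) and leaf("ent", 2) would then correspond to the name, type and value columns of the "one attdef in ent" row.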
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
class                         Classify entity replacement text w.r.t. usability in content models.
Nested classes/interfaces inherited from class eu.bandm.tools.dtd.TunedDTDParser
TunedDTDParser.ParsingFailed
Nested classes/interfaces inherited from class eu.bandm.tools.rdparser.TunableParser
TunableParser.CharSet, TunableParser.ExtensionalCharSet
Field Summary
Fields inherited from class eu.bandm.tools.dtd.TunedDTDParser
currentElementName, entityUsage, errorOnExpand, generalEntities, ignoreErrors, LAZY_ENTITY_ERROR, nicePE, parameterEntities, parsingFailed, UNPARSED_CONTENTS
Fields inherited from class eu.bandm.tools.rdparser.TunableParserForXml
asciiLetterSet, decDigitSet, encNameSet, hexDigitSet, initialSet, nameSet, PREFIX_GE, PREFIX_PE, pubidCharSet, sNoPESet, sSet, STRINGCONSTANT_IGNORE, STRINGCONSTANT_INCLUDE, versionNumSet
Fields inherited from class eu.bandm.tools.rdparser.TunableParser
base, in, messageReceiver, startColumnNumber, startLineNumber, topleveldocumentid
Constructor Summary
Constructors
Method Summary
Modifier and Type    Method    Description
"classify()" is only called for entities in an AttList context.
Methods inherited from class eu.bandm.tools.dtd.TunedDTDParser
attDef, attlistDecl, attType, attValue, AUX_convert, children, comment, conditionalSection, constructReplacementText, content, cp, declareGeneralEntity, declareParameterEntity, defaultDecl, dtd, elementDecl, encName, encodingDecl, entityDecl, entityValue, enumerated, eq, eRef, externalId, extSubset, fatalError, ignore, markupDecl, mixed, modifierOpt, name, niceEntityValue, nmtoken, notationDecl, parse, parse, parseId, parselocal, peRef, pi, pubidLiteral, resolve, retrieveGeneralEntity, retrieveParameterEntity, retrieveReplacementText, s, sNoPE, sOpt, sOptNoPE, storeEntityUsage, sWsOpt, systemLiteral, versionInfo, versionNum, xmlDecl
Methods inherited from class eu.bandm.tools.rdparser.TunableParserForXml
lookaheadPe, prefixedEntityName
Methods inherited from class eu.bandm.tools.rdparser.TunableParser
consume, eof, error, failure, getMessageReceiver, lookahead, lookahead, lookahead, lookahead, lookahead, lookaheadEOF, match, match, match, match, matchahead, matchahead, matchahead, matchUpto, matchUpto, readExternal, setBase, setMessageReceiver, skipUpto, warning, word
Constructor Details
EntityClassifier
public EntityClassifier()
Method Details
classify
"classify()" is only called for entities in an AttList context. Entities in a content-definition context are handled in TunedDTDParser by "nicePE()", as in the original bt code.
The roles are coded in EntityRole:
N_V_ContentModel         = ident (attribute N-ame or V-alue, or tag in a content model)
N_V_ContentModel_IncIgn  = as above, or "IGNORE" "INCLUDE"
NT                       = attribute name AND type
NTV                      = attribute name AND type AND value
NTVs                     = idem, more than one of them
T                        = type of an attribute
T_or_ContentModel        = part of a disjunction or attribute enumeration type
TV                       = attribute type and value
V                        = attribute value (with "#FIXED" or "#IMPLIED" or with quotes)
ContentModel             = must be content model
IncIgn                   = USED at an inc/ign place
CrUdE                    = none of the above
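For the purely attribute-related roles, the name spells out which parts of an attribute definition the entity supplies (N = name, T = type, V = value). The sketch below makes that reading explicit. It is not the real EntityRole: RoleSlotsSketch, Slot, AttRole and remaining() are hypothetical names, and only the five roles NT, NTV, TV, T and V are covered.

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustration only; the real EntityRole may be organised quite differently. */
final class RoleSlotsSketch {

    /** The three parts of an attribute definition. */
    enum Slot { NAME, TYPE, VALUE }

    /** Hypothetical subset of the roles listed above, with the slots each one fills. */
    enum AttRole {
        NT(EnumSet.of(Slot.NAME, Slot.TYPE)),
        NTV(EnumSet.allOf(Slot.class)),
        TV(EnumSet.of(Slot.TYPE, Slot.VALUE)),
        T(EnumSet.of(Slot.TYPE)),
        V(EnumSet.of(Slot.VALUE));

        private final Set<Slot> slots;

        AttRole(Set<Slot> slots) {
            this.slots = slots;
        }

        boolean fills(Slot slot) {
            return slots.contains(slot);
        }
    }

    /** Slots that must still be written literally next to an entity of the given role. */
    static Set<Slot> remaining(AttRole role) {
        EnumSet<Slot> rest = EnumSet.allOf(Slot.class);
        rest.removeAll(role.slots);
        return rest;
    }
}
```

Under this reading, an entity of role TV leaves only the name to be written out, and remaining(AttRole.TV) returns the set containing just NAME.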