Class EntityClassifier


public class EntityClassifier extends TunedDTDParser
Classifies DTD entities with respect to EntityRole by attempting to parse their content against different start symbols using a TunedDTDParser. We proposed an extended Dtd.AttDef structure in which the usage of classified entities could be encoded as follows:
  ent() means the full contents of the entity
  ent(0) means the contents at leaf no. 0 of the entity's parse tree
  ent,0 means a 2-tuple of a reference to the entity and this index number

                               +name
                               |       +nameabbrev
                               |       |       +type
                               |       |       |       +typeabbrev
                               |       |       |       |       +value
                               |       |       |       |       |       +valueabbrev
                               |       |       |       |       |       |

  %ent;  (a|b)  #IMPLIED       ent()   ent     (a|b)   null    #IMP    null  
  ab     %ent;  #IMPLIED       ab      null    ent()   ent     #IMP    null
  ab     %ent;                 ab      null    ent(0)  ent,0   ent(1)  ent,1
one attdef in ent:
  %ent;                        ent(0)  ent,0   ent(1)  ent,1   ent(2)  ent,2
more attdefs in ent:
  %ent;                        null    ent,-1  null    null    null    null
                                        V
                                        V
 
FIXME insert "?" in DTD.umod : AttDef
       FORMAT "$tabular{0>name,20>type?,40>value?}"
     insert case "only nameabbrev" in attdef ??
     print "nameabbrev{asAtts}
 
  • Constructor Details

    • EntityClassifier

      public EntityClassifier()
  • Method Details

    • classify

      public EntityRole classify(String text)
      "classify()" is only called for entities in an AttList context. Entities in a content-definition context are handled in TunedDTDParser by "nicePE()", as in the original bt code.
      The roles are coded in EntityRole:
      N_V_ContentModel        = ident (attribute N-ame or V-alue, or tag in a content model)
      N_V_ContentModel_IncIgn = as above, or "IGNORE" / "INCLUDE"
      NT                      = attribute name AND type
      NTV                     = attribute name AND type AND value
      NTVs                    = as NTV, but more than one of them
      T                       = type of an attribute
      T_or_ContentModel       = part of a disjunction or attribute enumeration type
      TV                      = attribute type and value
      V                       = attribute value (with "#FIXED" or "#IMPLIED" or with quotes)
      ContentModel            = must be a content model
      IncIgn                  = used at an INCLUDE/IGNORE place
      CrUdE                   = none of the above
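
      The idea of classifying by trial parsing against several start symbols can be illustrated with a minimal, self-contained sketch. It is not the implementation of this class: the real classifier parses with a TunedDTDParser, whereas the sketch below substitutes simplified regular expressions for the grammar. The class RoleSketch, its Role enum (a subset of the role names listed above) and all patterns are hypothetical and exist only for illustration. Each candidate start symbol is tried in a fixed order, and the first one that accepts the whole text determines the role.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the documented API.
class RoleSketch {
    // Subset of the role names listed above, mirrored in a local enum (assumption).
    enum Role { NTV, NT, TV, V, T, N_V_ContentModel, CrUdE }

    // Simplified stand-ins for grammar start symbols (assumptions, not the real DTD grammar).
    static final String NAME = "[A-Za-z_:][A-Za-z0-9._:-]*";
    static final String TYPE =
        "(?:CDATA|ID|IDREF|IDREFS|ENTITY|ENTITIES|NMTOKEN|NMTOKENS|\\([^()]*\\))";
    static final String VALUE =
        "(?:#REQUIRED|#IMPLIED|(?:#FIXED\\s+)?(?:\"[^\"]*\"|'[^']*'))";

    // Insertion order is the trial order: longer combinations first, bare names last.
    static final Map<Role, Pattern> START_SYMBOLS = new LinkedHashMap<>();
    static {
        START_SYMBOLS.put(Role.NTV, Pattern.compile(NAME + "\\s+" + TYPE + "\\s+" + VALUE));
        START_SYMBOLS.put(Role.NT, Pattern.compile(NAME + "\\s+" + TYPE));
        START_SYMBOLS.put(Role.TV, Pattern.compile(TYPE + "\\s+" + VALUE));
        START_SYMBOLS.put(Role.V, Pattern.compile(VALUE));
        START_SYMBOLS.put(Role.T, Pattern.compile(TYPE));
        START_SYMBOLS.put(Role.N_V_ContentModel, Pattern.compile(NAME));
    }

    // Returns the role of the first start symbol that accepts the whole entity text.
    static Role classify(String text) {
        String trimmed = text.trim();
        for (Map.Entry<Role, Pattern> candidate : START_SYMBOLS.entrySet()) {
            if (candidate.getValue().matcher(trimmed).matches()) {
                return candidate.getKey();
            }
        }
        return Role.CrUdE; // nothing matched: "none of the above"
    }
}
```

      In this sketch, RoleSketch.classify("id ID #REQUIRED") yields NTV and RoleSketch.classify("CDATA #IMPLIED") yields TV. A text such as "CDATA" is accepted by both the type pattern and the name pattern; the sketch resolves this purely by trial order, which is one plausible reason the role list above contains combined roles such as T_or_ContentModel.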