Package eu.bandm.tools.d2d2.base
Class Text2Udom
java.lang.Object
eu.bandm.tools.d2d2.base.Text2Udom
Parse a d2d text input into xml nodes.
- 
Nested Class Summary
Nested Classes
Modifier and Type, Class: Description
protected class: Thrown to signal to an upper processing loop that parsing cannot be continued.
(package private) class: Dedicated exception to signal this fact to the upper execution loop.
static class Text2Udom.ErrorStrategy: Configuration object for the different ways of reacting to input errors.
(package private) static enum Text2Udom.modes: Kind of the text to parse.
static enum Text2Udom.parsingState: Realizes the parsing state on lower=character level.
protected class: Class to signal an attempt to read beyond the limit of the input data to the upper processing loop.
Field Summary
Fields
Modifier and Type, Field: Description
protected static final Map<Enumeration, String[]> allSortedKeys
protected final Navigate.CharSetCalc charSetCalc: Instance to evaluate character expressions.
protected Text2Udom.parsingState currentState: Current micro-state, values from Text2Udom.parsingState.
static final Text2Udom.ErrorStrategy default_ErrorStrategy: Evident.
protected final Text2Udom.ErrorStrategy errorStrategy: Applied error strategy, different for interactive and programmatic use, etc.
static final Text2Udom.ErrorStrategy interactive_ErrorStrategy: Evident. Allows partial documents and prints the stack context of a parsing error to the console.
protected Text2Udom.parsingState interruptedState: Needed to treat setcommand, setcomment and numeric input (i.e. pseudo-tags, which look like a tag, but do not behave so) like whitespace = resuming the interrupted state.
protected boolean lastCloseWasXslt: Whether we come from an xslt element.
protected List<ResultingChars> leading_ws: Accumulator for ws, not yet clear whether to deliver or to ignore.
static final String MEMSTRING_ID_SYNTHETIC
protected final MessageCounter messageCounter: Target of most messages.
protected Definition meta_assumedXsltOutput
protected Definition meta_expected
protected Definition meta_kind
protected Definition meta_location
protected Definition meta_messageText
protected Definition meta_parsingError
protected Definition meta_skipped
protected Definition meta_tag
protected Definition meta_warning
(package private) Text2Udom.modes mode: Kind of the text to parse.
protected final MessageReceiver<SimpleMessage<String>> msg: Target of most messages.
static final Text2Udom.ErrorStrategy non_interactive_ErrorStrategy: Used for situations when basic diagnosis data is required by the "programmer" of some using code, not by an interactive user.
protected MemScanner<String> scanner: Single source of text input.
protected ResultingStructure skipContainer: For reporting skipped input.
protected State state: The currently growing stackframe, see State.
(package private) final TextFileHeader textFileHeader: The result of parsing the input header.
protected ResultingStructure top_result: Toplevel result of parsing the input text.
protected final int traceLevel: Determines the verbosity: ==0 stands for complete silence; ==1 stands for minimal output: few loggings and not all warnings; ==2, 3 stands for more output: loggings and warnings; ==10 shows some synthesized source texts; ==20 stands for full debugging.
(package private) final Map<Integer, ResultingChars> unicodeResults
protected int verbatimSuppress: How many command characters in verbatim mode will not generate a warning.
protected final MessageReceiver<SimpleMessage<XMLDocumentIdentifier>> xml_msg: Target of messages for some called classes which generate XML-file-locations.
protected @Opt Expression xslt_alt_ubiquituous: Contains all xslt tags which can appear anywhere in the target elements.
protected Expression xslt_alt_ubiquituous_repeated: Contains all xslt tags which can appear anywhere in the target elements.
protected Module xslt_module: The loaded xslt module.
Constructor Summary
ConstructorsConstructorDescriptionText2Udom(MessageReceiver<SimpleMessage<String>> msg, Text2Udom.ErrorStrategy errorStrategy, int traceLevel) Only constructor. - 
Method Summary
Modifier and Type, Method: Description
protected boolean accept_superfluous_end_tag(String tag, boolean defIsEmpty, Definition def, Location<String> startLoc): Lets the scanner accept and discard an end tag, either normal or forced or by a parenthesis character.
protected void addRestPerm(State_perm permstate, CheckedList<Expression> misslist): Add obligate members of a "perm" expression to the term which describes the missing input.
protected void bicERROR
protected boolean builtInMetaCommands
protected static @Opt ResultingStructure consume_enumeration(MemScanner<String> scan, Enumeration etype): Used by tag parsers AND character parsers (with multiple parallel scanners!)
protected void deliver(ResultingChars chars)
protected void deliver: Append the argument to the top-most resulting structure.
protected void deliver
protected void deliver_last(): Deliver the last scanned character data, and set the "is whitespace" flag according to the token type returned by scanner.
protected void deliver_numeric(Location<String> loc, int val): Append the unicode character to the top-most resulting structure.
protected void deliver_spontanuous: Append the argument to the top-most resulting structure which is not encoded as xml attribute.
protected void deliver_to_singletonstate
protected void digest(): Main loop, consumes input data according to current "micro"-state currentState.
protected void digest_consume_characters(): When an element may receive character data, and some non-ws character data has been recognized at the reading position.
protected boolean digest_look_for_tag(): After a command char has been read.
protected void digest_nothing_open(): Initial (micro-)state, or after an explicit close.
protected void digest_skip_for_command(): Micro-state for error recovery: skip until command char and then go to tag-reading mode again.
protected void digest_verbatim(): Parses "verbatim input" mode, which (a) requires an explicit #/tag end tag, and (b) accepts only tags of sub-elements immediately contained in its definition.
protected void error: Emits an error message and further context information depending on the values in errorStrategy.
protected void error: Calls error(eu.bandm.tools.location.Location<java.lang.String>,java.lang.String,java.lang.Object...) with no location.
protected void failure: Throws a corresponding message exception.
protected State_singleton find_top_singleton: Called from return_to_upper_input_mode().
protected State_singleton find_top_singleton(boolean mustBeNonAtt)
fromFile(File f, ModuleRegistry moduleRegistry): Parse the contents of the given file.
fromFile(String s, ModuleRegistry moduleRegistry): Parse the contents of the file found at the given location.
fromFile(String locationText, File f, ModuleRegistry moduleRegistry): Parse the contents of the given file, using the given location text for all error messages.
fromMemString(String locationText, MemString<String> text, ModuleRegistry moduleRegistry): Parse the contents of the given MemString object, which includes a text type declaration header.
fromMemString(String locationText, MemString<String> text, ModuleRegistry moduleRegistry, @Opt XRegExp toplevelXRegExp, Text2Udom.modes mode): Parse the contents of the given MemString object, containing just the text body.
fromReader(String locationText, Reader r, ModuleRegistry moduleRegistry): Parse the contents of the given file, using the given location text for all error messages.
getXsltInputNamespaces(): Get all xslt input namespaces, as defined in the header of the input file.
protected void hint: Emits a hint iff traceLevel > 0.
protected void hint_xml(Location<XMLDocumentIdentifier> loc, String text, Object... args): Emits a message iff traceLevel > 0.
protected void ignore_superfluous_end_tag(String tag, boolean defIsEmpty, Definition def, Location<String> startLoc): Called after an empty tag, a parser or an enum have been consumed.
static void insertPlainChars(ResultingStructure host, Definition tag, String chars): Aux function to insert synthesized character data in the output, which is NOT contained in the source, e.g. error messages.
protected boolean isXsltDef(Definition def): Whether a definition is an xslt element.
(package private) boolean isXsltMode(): Whether to parse an XSLT source.
protected Definition loadMetaDefinition(Map<String, Definition> defs, String name): Find the definition from the loaded metamodule.
protected void loadMetaModule(ModuleRegistry moduleRegistry): Load the general d2d meta module and find the required definitions.
protected void logEnd: Emits a message iff traceLevel > 0.
protected void logStart: Emits a message iff traceLevel > 0.
protected void makeGlobalSkipContainer(ResultingStructure errormsg)
static void P: /
protected boolean parseVerbatimSuppres()
protected void process_char_parser_error(String tag, MemString<String> charsStart): Is called after a non-acceptance by a char parser or by an enumeration has been detected.
protected void process_close_char(): Called from digest_nothing_open/look_for_tag/consume_chars, in case a closing parenthesis character has been recognized.
protected void process_close_tag(String tag, boolean force): Called by digest_look_for_tag, process_open_tag (in case of empty element declarations), process_close_char.
protected void process_close_tag_inner(@Opt String tag, boolean force, Location<String> closeTagDefLoc): Search the state space for "nexttag" as possible close tag.
protected boolean process_open_tag(String tag, boolean isCharData): Only method for processing an "open" tag.
protected void report_assumed_xslt_output(CheckedList<Expression> misslist): Is called when a target-language open/close tag has been found after an xslt template call, to report the necessary expansion.
protected void report_missing_elements(boolean isOpen, String tag, boolean frameFound, CheckedList<Expression> misslist, List<SimpleMessage<String>> messlist, Location<String> closeTagDefLoc): This procedure can be called from process_open_tag(String, boolean) or process_close_tag(String, boolean).
protected void return_to_upper_input_mode(): Called to re-enter the input mode (verbatim or nothing_open) after an explicit close (tags had been opened and closed) or an implicit close (enum or character parser have consumed enough) has happened.
protected static String[] sortKeys
protected void warning: Emits a warning iff traceLevel > 0.
protected void warning: Emits a warning iff traceLevel > 0.
- 
Field Details
- 
default_ErrorStrategy
Evident. See the default field values of this class. - 
interactive_ErrorStrategy
Evident. Allows partial documents and prints the stack context of a parsing error to the console.
non_interactive_ErrorStrategy
Used for situations when basic diagnosis data is required by the "programmer" of some using code, not by an interactive user. - 
scanner
Single source of text input. - 
msg
Target of most messages. - 
messageCounter
Target of most messages. - 
xml_msg
Target of messages for some called classes which generate XML-file-locations. - 
errorStrategy
Applied error strategy, different for interactive and programmatic use, etc.
charSetCalc
Instance to evaluate character expressions. Maintains a cache.
traceLevel
protected final int traceLevel
Determines the verbosity:
 ==0 stands for complete silence
 ==1 stands for minimal output: few loggings and not all warnings
 ==2, 3 stands for more output: loggings and warnings
 ==10 shows some synthesized source texts
 ==20 stands for full debugging.
meta_warning
 - 
meta_location
 - 
meta_messageText
 - 
meta_kind
 - 
meta_tag
 - 
meta_parsingError
 - 
meta_expected
 - 
meta_skipped
 - 
meta_assumedXsltOutput
 - 
textFileHeader
The result of parsing the input header. Globally needed only for getXsltInputNamespaces() and "getXsltMode()".
mode
Text2Udom.modes mode
Kind of the text to parse.
top_result
Toplevel result of parsing the input text. Set once when opening the State for the first tag. Globally needed for error handling. - 
xslt_alt_ubiquituous
Contains all xslt tags which can appear anywhere in the target elements. This field is != null iff xslt mode. - 
xslt_alt_ubiquituous_repeated
Contains all xslt tags which can appear anywhere in the target elements. This field is != null iff xslt mode. - 
xslt_module
The loaded xslt module. Only needed for isXsltDef(eu.bandm.tools.d2d2.model.Definition). This field is != null iff xslt mode.
state
The currently growing stackframe, see State.
leading_ws
Accumulator for ws, not yet clear whether to deliver or to ignore. - 
currentState
Current micro-state, values from Text2Udom.parsingState.
verbatimSuppress
protected int verbatimSuppress
How many command characters in verbatim mode will not generate a warning.
interruptedState
Needed to treat setcommand, setcomment and numeric input (i.e. pseudo-tags, which look like a tag, but do not behave so) like whitespace = resuming the interrupted state.
skipContainer
For reporting skipped input - 
lastCloseWasXslt
protected boolean lastCloseWasXslt
Whether we come from an xslt element. Any explicitly closed xslt element is no longer in the stack, but a subsequent target-language element must nevertheless be treated as coming from "inxslt".
allSortedKeys
 - 
MEMSTRING_ID_SYNTHETIC
 
 - 
synthLocation
 - 
bicLoc
 - 
unicodeResults
 
 - 
 - 
Constructor Details
- 
Text2Udom
public Text2Udom(MessageReceiver<SimpleMessage<String>> msg, Text2Udom.ErrorStrategy errorStrategy, int traceLevel) Only constructor. 
 - 
 - 
Method Details
- 
P
/ - 
getXsltInputNamespaces
Get all xslt input namespaces, as defined in the header of the input file. A valid result can be obtained only after successful parsing.
isXsltMode
boolean isXsltMode()
Whether to parse an XSLT source.
isXsltDef
Whether a definition is an xslt element. - 
fromFile
Parse the contents of the file found at the given location. The calling graph of the main scanner functions is:
  fromFile(String filename,..)
  -> fromFile(File,..)
  -> fromMemString(locationText,MemString,..)
  -> scanForHeader
  -> Starter.install
  -> digest()   // Starting in "nothing open" mode, no verbatim or charparser
                // is supported as top-level element
fromFile
Parse the contents of the given file. - 
fromFile
public @Opt ResultingStructure fromFile(String locationText, File f, ModuleRegistry moduleRegistry)
Parse the contents of the given file, using the given location text for all error messages.
fromReader
public @Opt ResultingStructure fromReader(String locationText, Reader r, ModuleRegistry moduleRegistry)
Parse the contents supplied by the given reader, using the given location text for all error messages.
fromMemString
public @Opt ResultingStructure fromMemString(String locationText, MemString<String> text, ModuleRegistry moduleRegistry)
Parse the contents of the given MemString object, which includes a text type declaration header.
- Parameters:
 locationText - use this in error messages.
 - 
fromMemString
public @Opt ResultingStructure fromMemString(String locationText, MemString<String> text, ModuleRegistry moduleRegistry, @Opt XRegExp toplevelXRegExp, Text2Udom.modes mode)
Parse the contents of the given MemString object, containing just the text body. The text format etc. are already known and fixed, and are given as argument values.
- Parameters:
 locationText - use this in error messages.
 text - the text to parse.
 moduleRegistry - used to load the xslt code and the meta module.
 toplevelXRegExp - the top regexp of the target, either directly or as target indication for the xslt source to parse. (== null iff mode == xsltText)
 mode - whether to parse XML or XSLT or XSLT for mere text.
 - 
warning
Emits a warning iff traceLevel > 0.
warning
Emits a warning iff traceLevel > 0.
hint
Emits a hint iff traceLevel > 0.
logStart
Emits a message iff traceLevel > 0.
logEnd
Emits a message iff traceLevel > 0.
hint_xml
Emits a message iff traceLevel > 0.
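The gating shared by warning, hint, logStart, logEnd and hint_xml can be pictured with a small stand-alone sketch. It is not the class's actual code: the class name TraceGateSketch, the list used as message sink and the message format are all assumptions made for illustration; only the rule "emit iff traceLevel > 0" comes from the descriptions above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in, not part of eu.bandm.tools.d2d2.base. */
final class TraceGateSketch {
    private final int traceLevel;
    /** Collects emitted messages; stands in for a real message receiver (assumption). */
    final List<String> emitted = new ArrayList<>();

    TraceGateSketch(int traceLevel) { this.traceLevel = traceLevel; }

    void warning(String format, Object... args) { emit("warning", format, args); }

    void hint(String format, Object... args) { emit("hint", format, args); }

    private void emit(String kind, String format, Object... args) {
        if (traceLevel > 0) {   // level 0 stands for complete silence
            emitted.add(kind + ": " + String.format(format, args));
        }
    }

    public static void main(String[] args) {
        TraceGateSketch silent = new TraceGateSketch(0);
        silent.warning("unexpected tag %s", "foo");
        System.out.println(silent.emitted);    // nothing was recorded

        TraceGateSketch verbose = new TraceGateSketch(2);
        verbose.warning("unexpected tag %s", "foo");
        System.out.println(verbose.emitted);   // one warning was recorded
    }
}
```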
error
Emits an error message and further context information depending on the values in errorStrategy.
error
Calls error(eu.bandm.tools.location.Location<java.lang.String>,java.lang.String,java.lang.Object...) with no location.
failure
Throws a corresponding message exception. - 
loadMetaDefinition
Find the definition from the loaded metamodule. - 
loadMetaModule
Load the general d2d meta module and find the required definitions. Store them into the global constant fields "meta_warning", "meta_location", etc., for later use. Is required for xslt mode and when "partially correct documents" are allowed.
digest
protected void digest()
Main loop, consumes input data according to the current "micro"-state currentState. Only called once, from fromMemString(String,MemString,ModuleRegistry). Calls the digest_<> delegate functions according to currentState. These are only called from here and could be in-lined. They mostly do not change the micro-state themselves, but call process_open_tag(String,boolean), process_close_tag(String,boolean) or process_close_char(). These can change the micro-state, which has consequences when re-entering this "digest" loop.
Each loop pass (and each delegate method) ends with calling scanner.accept(). Therefore, when each loop pass (and each delegate method) starts, the next-to-consume token is already lexically recognized and ready for consumption.
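The loop discipline described for digest() can be sketched on a toy token stream. This is a hedged illustration, not the real implementation: the class DigestLoopSketch, its two-value MicroState enum, the String[] token source and the "#" command token are assumptions; only the pattern (dispatch on the current micro-state, every delegate ends with accept()) is taken from the text.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of a micro-state dispatch loop. */
final class DigestLoopSketch {
    enum MicroState { NOTHING_OPEN, LOOK_FOR_TAG }

    private final String[] tokens;
    private int next = 0;
    private String current;   // already recognized, not yet consumed token
    private MicroState currentState = MicroState.NOTHING_OPEN;
    final List<String> events = new ArrayList<>();

    DigestLoopSketch(String... tokens) { this.tokens = tokens; }

    /** Stands in for the scanner's accept(): makes the next token current. */
    private void accept() {
        current = next < tokens.length ? tokens[next++] : null;
    }

    void digest() {
        accept();   // the first token must be available before the first pass
        while (current != null) {
            switch (currentState) {
                case NOTHING_OPEN: digestNothingOpen(); break;
                case LOOK_FOR_TAG: digestLookForTag(); break;
            }
        }
    }

    private void digestNothingOpen() {
        if (current.equals("#")) {
            currentState = MicroState.LOOK_FOR_TAG;   // command char seen
        } else {
            events.add("chars:" + current);
        }
        accept();   // every delegate ends by accepting the next token
    }

    private void digestLookForTag() {
        events.add("open:" + current);
        currentState = MicroState.NOTHING_OPEN;
        accept();
    }

    public static void main(String[] args) {
        DigestLoopSketch d = new DigestLoopSketch("#", "a", "hello", "#", "b");
        d.digest();
        System.out.println(d.events);   // prints [open:a, chars:hello, open:b]
    }
}
```

Because each delegate leaves the scanner one token ahead, a delegate that changes the micro-state never has to re-read input: the next pass simply dispatches the already recognized token to a different delegate.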
digest_nothing_open
protected void digest_nothing_open()
Initial (micro-)state, or after an explicit close. Switches currentState on a command char and on non-whitespace character data, which implies the invisible "char-data-tag". If the current top expression does NOT accept characters, then non-whitespace is implicitly tagged as such, and that tag is searched upward, as usual. Whitespace however is accumulated and delayed for possible delivery, until this happens. If the current top expression DOES accept characters, then whitespace is immediately treated as valid character data and delivered.
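The two whitespace policies can be illustrated with a small model. This is a sketch under stated assumptions, not the behaviour of Text2Udom: the class LeadingWhitespaceSketch and its methods are hypothetical, and in particular the resolution of the delayed whitespace (delivered once character data follows, dropped once a tag follows) is an assumption; the source only says that the decision is deferred.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of delayed versus immediate whitespace delivery. */
final class LeadingWhitespaceSketch {
    private final List<String> leadingWs = new ArrayList<>();   // delayed whitespace
    final List<String> delivered = new ArrayList<>();
    private final boolean acceptsChars;

    LeadingWhitespaceSketch(boolean topExpressionAcceptsChars) {
        this.acceptsChars = topExpressionAcceptsChars;
    }

    void whitespace(String ws) {
        if (acceptsChars) {
            delivered.add(ws);    // valid character data, deliver at once
        } else {
            leadingWs.add(ws);    // not yet clear whether to deliver or to ignore
        }
    }

    void text(String chars) {
        // ASSUMPTION: character data follows, so the delayed whitespace counts
        delivered.addAll(leadingWs);
        leadingWs.clear();
        delivered.add(chars);
    }

    void tag(String name) {
        // ASSUMPTION: whitespace in front of a tag is insignificant and dropped
        leadingWs.clear();
        delivered.add("<" + name + ">");
    }

    public static void main(String[] args) {
        LeadingWhitespaceSketch s = new LeadingWhitespaceSketch(false);
        s.whitespace(" ");
        s.tag("a");          // the blank before the tag is dropped
        s.whitespace("\n");
        s.text("x");         // the newline before the text is delivered
        System.out.println(s.delivered.size());   // prints 3
    }
}
```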
digest_look_for_tag
protected boolean digest_look_for_tag()
After a command char has been read. An open tag or (two kinds of) close tag may follow. Whitespace, comments and further command chars are discarded; other chars lead to an error.
- Returns:
 whether the end-of-input token (= pseudo tag) has been accepted.
 
 - 
process_open_tag
Only method for processing an "open" tag. Is called from digest_nothing_open() for character data, from digest_look_for_tag() for an explicit tag, and from digest_verbatim().
ATTENTION: must be called in an "un-accepted" state of MemScanner, because next may come an "open tag modifier", which has to be decoded explicitly, not by MemScanner.accept(). This may be followed by characters which are consumed by a "chars parser" or an "enum" definition, again not by the MemScanner! In these cases the outer state machine is not altered, but returned to.
In case of a non-empty "tags parser", the state stack is extended accordingly by one level (or two levels in case of "#implicit").
In case of xslt-mode, the state machine is:
  startInXslt=startXslt      // whether current frame is an xslt
      |
      |  ascend into non-xslt, and NO xslt components are missing
      |  // if xslt itself is incomplete (CANNOT HAPPEN currently!),
      |  // then treat ALL missings as ERROR.
      V
  found_weakMode=true        // use "weak_firsts" modified director map
      |                      // interpret all missing as "assumed"
      |
      |  cross open element boundary
      V
  all false, normal parsing  // report "assumed" and collect all FURTHER
                             // missings as errors
process_char_parser_error
Is called after a non-acceptance by a char parser or by an enumeration has been detected. If no "partial doc output" is enabled: throw an exception and abort parsing. Otherwise add diagnostic outputs with tags from "d2d-meta" into the result udom, set up the skip container and switch the parser state to Text2Udom.parsingState.skip_for_command.
sortKeys
 - 
consume_enumeration
protected static @Opt ResultingStructure consume_enumeration(MemScanner<String> scan, Enumeration etype)
Used by tag parsers AND character parsers (with multiple parallel scanners!)
process_close_char
protected void process_close_char()
Called from digest_nothing_open/look_for_tag/consume_chars, in case a closing parenthesis character has been recognized. Resets the scanner and simulates the reaction to a "complete" close tag.
process_close_tag
Called by digest_look_for_tag, process_open_tag (in case of empty element declarations), process_close_char. The tag may be null for an unspecified "close lowest element".
process_close_tag_inner
protected void process_close_tag_inner(@Opt String tag, boolean force, Location<String> closeTagDefLoc)
Search the state space for "nexttag" as possible close tag. Then adjust the stack accordingly. Whenever obligate entries are missing in between, store them to "found_missing" and execute the error diagnosis.
- Parameters:
 tag - the tag of the element to close, or null to close the last opened element.
 - 
return_to_upper_input_mode
protected void return_to_upper_input_mode()
Called to re-enter the input mode (verbatim or nothing_open) after an explicit close (tags had been opened and closed) or an implicit close (enum or character parser have consumed enough) has happened.
Assume the stack frame of the parser/enum/closed tag is already popped from the stack.
Assume that "inputmode verbatim" is only allowed with "content = (#PCDATA|..)*". TYPECHECKER MISSING FIXME.
insertPlainChars
Aux function to insert synthesized character data in the output, which is NOT contained in the source, e.g. error messages. DISLOC FIXME (better a general solution like "IncludingCharBuffer" etc.)
- Parameters:
 host - the toplevel element
 tag - the definition for the content
 chars - the content
 - 
makeGlobalSkipContainer
 - 
report_assumed_xslt_output
Is called when a target-language open/close tag has been found after an xslt template call, to report the necessary expansion. Or as soon as the possible range of the template's output (one content model) is left, so that all further missing elements will be treated as normal "missing errors".
report_missing_elements
protected void report_missing_elements(boolean isOpen, String tag, boolean frameFound, CheckedList<Expression> misslist, List<SimpleMessage<String>> messlist, Location<String> closeTagDefLoc)
This procedure can be called from process_open_tag(String, boolean) or process_close_tag(String, boolean).
Basically, there are two classes of error recovery:
 - framefound = true ==> the tag is KNOWN, but tags before it are missing; then continue with a shrunk stack state.
 - framefound = false ==> the tag is NOT KNOWN; then discard all input up to the next tag and try again.

General issue: these "Errors" behave like "Warnings" in case of "partialDocs==true".
FIXME DocumentError should have TYPED fields like expression, etc.
FIXME parameter for controlling the kind of reaction MISSING FIXME
 - 
digest_consume_characters
protected void digest_consume_characters()
When an element may receive character data, and some non-ws character data has been recognized at the reading position. Discards comments, delivers text, whitespace and other chars, and switches state on a command char.
ignore_superfluous_end_tag
protected void ignore_superfluous_end_tag(String tag, boolean defIsEmpty, Definition def, Location<String> startLoc)
Called after an empty tag, a parser or an enum have been consumed. Called with the scanner in the "un-accepted" state, but leaves after calling "accept()" (so the next token to digest == scanner.current).
After a character parser there MAY appear an explicit closing tag:
  #tag 12[345]!!! #/tag
or
  #tag 12[345]!!! #///tag
or
  #tag 12[345]!!! #/      <-ws!
or
  #tag$ 12[345]!!! $
All these are to be IGNORED, since the application of the character parser must be finished (= it must have reached a final state OR an error state!) anyhow.
For empty content the situation is similar:
  #tag continue text         do NOT consume anything.
  #tag#/tag continue text    DO consume end-tag, but nothing more (esp. not the following blank!)
  #tag# /tag continue text   this is ok, and treated the same way.
  #tag #/tag continue text   this is NOT OK, and should be rejected.
  #tag-continue word         do NOT consume anything.
  #tag!! continue            do consume the parentheses (has been done by lexer).
  #tag#///tag                should possibly NOT be supported. If it is, then treat it like a simple end tag!
Operation starts with an accept(), meaning "decode at current point of reading".
On return, an accept() has also been performed, meaning that the next lexer token to be digested by the top-level "digest" loop (= what follows after the ended element) is now reflected by the scanner output fields.
accept_superfluous_end_tag
protected boolean accept_superfluous_end_tag(String tag, boolean defIsEmpty, Definition def, Location<String> startLoc)
Lets the scanner accept and discard an end tag, either normal or forced or by a parenthesis character. Only caller is ignore_superfluous_end_tag(String,boolean,Definition,Location).
digest_verbatim
protected void digest_verbatim()
Parses "verbatim input" mode, which (a) requires an explicit #/tag end tag, and (b) accepts only tags of sub-elements immediately contained in its definition.
digest_skip_for_command
protected void digest_skip_for_command()
Micro-state for error recovery: skip until a command char and then go to tag-reading mode again. Store the skipped input chars into skipContainer for meta-error-elements.
bicERROR
 - 
builtInMetaCommands
 - 
parseVerbatimSuppres
protected boolean parseVerbatimSuppres() - 
deliver_to_singletonstate
 - 
deliver_numeric
Append the unicode character to the top-most resulting structure. A Java string needs two (2) "code units", i.e. 16-bit "chars", to represent a code point > 0xFFFF (int = 32 bit, char = 16 bit).
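The remark on code units can be checked with the standard Java API alone. The following is a minimal stand-alone sketch, independent of Text2Udom; the class name CodePointDemo and the helper encode are made up for illustration.

```java
/** Hypothetical demo class; uses only java.lang. */
final class CodePointDemo {
    /** Encodes one Unicode code point as a Java string of one or two UTF-16 code units. */
    static String encode(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a Unicode code point: " + codePoint);
        }
        return new StringBuilder().appendCodePoint(codePoint).toString();
    }

    public static void main(String[] args) {
        String bmp = encode(0x00E4);       // inside the Basic Multilingual Plane
        String astral = encode(0x1F600);   // above 0xFFFF, needs a surrogate pair
        System.out.println(bmp.length());      // prints 1
        System.out.println(astral.length());   // prints 2
        System.out.println(astral.codePointCount(0, astral.length()));   // prints 1
    }
}
```

String.length() counts 16-bit code units, so any code that measures or indexes delivered text by length() sees two positions for one such character.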
deliver
Append the argument to the top-most resulting structure. - 
deliver_spontanuous
Append the argument to the top-most resulting structure which is not encoded as xml attribute. - 
deliver
 - 
deliver
 - 
deliver_last
protected void deliver_last()
Deliver the last scanned character data, and set the "is whitespace" flag according to the token type returned by the scanner.
addRestPerm
Add obligate members of a "perm" expression to the term which describes the missing input. - 
find_top_singleton
Called from return_to_upper_input_mode(), and .... DELIVER()??? AND OPEN XSLTMODE ??
find_top_singleton
 
 -