Package eu.bandm.tools.d2d2.base
Class Text2Udom
java.lang.Object
eu.bandm.tools.d2d2.base.Text2Udom
Parse a d2d text input into xml nodes.
-
Nested Class Summary
Modifier and TypeClassDescriptionprotected class
Thrown to signal to an upper processing loop that parsing cannot be continued.(package private) class
Dedicated exception to signal this fact to the upper execution loop.static class
Configuration object for the different ways of reacting to input errors.(package private) static enum
Kind of the text to parse.static enum
Realizes the parsing state on lower=character level.protected class
Class to signal an attempt to read beyond the limit of the input data to the upper processing loop. -
Field Summary
Modifier and TypeFieldDescriptionprotected static final Map<Enumeration,
String[]> protected final Navigate.CharSetCalc
Instance to evaluate character expressions.protected Text2Udom.parsingState
Current micro-state, values from Text2Udom.parsingState
.static final Text2Udom.ErrorStrategy
Evident.protected final Text2Udom.ErrorStrategy
Applied error strategy, different for interactive and programmatic use, etc.static final Text2Udom.ErrorStrategy
Evident. Allows partial documents and prints the stack context of a parsing error to the console.protected Text2Udom.parsingState
Needed to treat setcommand, setcomment
and numeric input (i.e.protected boolean
Whether we come from an xslt element.protected List<ResultingChars>
Accumulator for ws, not yet clear whether to deliver or to ignore.static final String
protected final MessageCounter
Target of most messages.protected Definition
protected Definition
protected Definition
protected Definition
protected Definition
protected Definition
protected Definition
protected Definition
protected Definition
(package private) Text2Udom.modes
Kind of the text to parse.protected final MessageReceiver<SimpleMessage<String>>
Target of most messages.static final Text2Udom.ErrorStrategy
Used for situations when basic diagnosis data is required by the "programmer" of some using code, not by an interactive user.protected MemScanner<String>
Single source of text input.protected ResultingStructure
For reporting skipped inputprotected State
The currently growing stackframe, see State
.(package private) final TextFileHeader
The result of parsing the input header.protected ResultingStructure
Toplevel result of parsing the input text.protected final int
Determines the verbosity: ==0 stands for complete silence, ==1 stands for minimal output (few loggings and not all warnings), ==2, 3 stands for more output (loggings and warnings), ==10 shows some synthesized source texts, ==20 stands for full debugging.(package private) final Map<Integer,
ResultingChars> protected int
How many command characters in verbatim mode will not generate a warning.protected final MessageReceiver<SimpleMessage<XMLDocumentIdentifier>>
Target of messages for some called classes which generate XML-file-locations.protected @Opt Expression
Contains all xslt tags which can appear anywhere in the target elements.protected Expression
Contains all xslt tags which can appear anywhere in the target elements.protected Module
The loaded xslt module. -
Constructor Summary
ConstructorDescriptionText2Udom
(MessageReceiver<SimpleMessage<String>> msg, Text2Udom.ErrorStrategy errorStrategy, int traceLevel) Only constructor. -
Method Summary
Modifier and TypeMethodDescriptionprotected boolean
accept_superfluous_end_tag
(String tag, boolean defIsEmpty, Definition def, Location<String> startLoc) Lets the scanner accept and discard an end tag, either normal or forced or by a parenthesis character.protected void
addRestPerm
(State_perm permstate, CheckedList<Expression> misslist) Add obligate members of a "perm" expression to the term which describes the missing input.protected void
protected boolean
protected static @Opt ResultingStructure
consume_enumeration
(MemScanner<String> scan, Enumeration etype) Used by tag parsers AND character parsers (with multiple parallel scanners !)protected void
deliver
(ResultingChars chars) protected void
Append the argument to the top-most resulting structure.protected void
protected void
Deliver the last scanned character data, and set the "is whitespace" flag according to the token type returned by scanner.protected void
deliver_numeric
(int val) Append the unicode character to the top-most resulting structure.protected void
Append the argument to the top-most resulting structure which is not encoded as xml attribute.protected void
protected void
digest()
Main loop, consumes input data according to the current "micro"-state currentState
.protected void
When an element may receive character data, and some non-ws character data has been recognized at the reading position.protected boolean
After a command char has been read.protected void
Initial (micro-)state, or after an explicit close.protected void
Micro-state for error recovery: skip until command char and then goto tag-reading mode again.protected void
Parses "verbatim input" mode, which (a) requires an explicit #/tag end tag, and (b) accepts only tags of sub-elements immediately contained in its definition.protected void
Emits an error message and further context information depending on the values in errorStrategy
.protected void
Calls error(eu.bandm.tools.location.Location<java.lang.String>,java.lang.String,java.lang.Object...)
with no location.protected void
Throws a corresponding message exception.protected State_singleton
Called from return_to_upper_input_mode()
.protected State_singleton
find_top_singleton
(boolean mustBeNonAtt) fromFile
(File f, ModuleRegistry moduleRegistry) Parse the contents of the given file.fromFile
(String s, ModuleRegistry moduleRegistry) Parse the contents of the file found at the given location.fromFile
(String locationText, File f, ModuleRegistry moduleRegistry) Parse the contents of the given file, using the given location text for all error messages.fromMemString
(String locationText, MemString<String> text, ModuleRegistry moduleRegistry) Parse the contents of the given MemString object, which includes a text type declaration header.fromMemString
(String locationText, MemString<String> text, ModuleRegistry moduleRegistry, @Opt XRegExp toplevelXRegExp, Text2Udom.modes mode) Parse the contents of the given MemString object, containing just the text body.Get all xslt input name spaces, as defined in the header of the input file.protected void
Emits hint iff traceLevel > 0.protected void
hint_xml
(Location<XMLDocumentIdentifier> loc, String text, Object... args) Emits message iff traceLevel > 0.protected void
ignore_superfluous_end_tag
(String tag, boolean defIsEmpty, Definition def, Location<String> startLoc) Called after an empty tag, a parser or an enum have been consumed.static void
insertPlainChars
(ResultingStructure host, Definition tag, String chars) Aux function to insert synthesized character data in the output, which is NOT contained in the source, e.g.protected boolean
isXsltDef
(Definition def) Whether a definition is an xslt element.(package private) boolean
Whether to parse an XSLT source.protected Definition
loadMetaDefinition
(Map<String, Definition> defs, String name) Find the definition from the loaded metamodule.protected void
loadMetaModule
(ModuleRegistry moduleRegistry) Load the general d2d meta module and find the required definitions.protected void
Emits message iff traceLevel > 0.protected void
Emits message iff traceLevel > 0.protected void
makeGlobalSkipContainer
(ResultingStructure errormsg) static void
protected boolean
protected void
process_char_parser_error
(String tag, MemString<String> charsStart) Is called after a non-acceptance by a char parser or by an enumeration has been detected.protected void
Called from digest_nothing_open/look_for_tag/consume_chars, in case that closing parenthesis character has been recognized.protected void
process_close_tag
(String tag, boolean force) Called by digest_look_for_tag, process_open_tag (in case of empty element declarations), process_close_char.protected void
process_close_tag_inner
(@Opt String tag, boolean force, Location<String> closeTagDefLoc) Search the state space for "nexttag" as possible close tag.protected boolean
process_open_tag
(String tag, boolean isCharData) Only method for processing an "open" tag.protected void
report_assumed_xslt_output
(CheckedList<Expression> misslist) Is called when a target-language open/close tag has been found after an xslt template call, to report the necessary expansion.protected void
report_missing_elements
(boolean isOpen, String tag, boolean frameFound, CheckedList<Expression> misslist, List<SimpleMessage<String>> messlist, Location<String> closeTagDefLoc) This procedure can be called from process_open_tag(String, boolean) or process_close_tag(String, boolean).
Basically, there are two classes of error recovery:
- framefound = true ==> the tag is KNOWN, but tags before are missing, --- then continue with a shrunk stack state.
- framefound = false ==> the tag is NOT KNOWN, then discard all input up to the next tag and try again.protected void
Called to re-enter the input mode (verbatim or nothing_open) after an explicit close (tags had been opened and closed) or an implicit close (enum or character parser have consumed enough) has happened.
Assume the stack frame of the parser/enum/closed tag is already popped from the stack.
Assume that "inputmode verbatim" is only allowed with "content = (#PCDATA|..)*".protected static String[]
protected void
Emits warning iff traceLevel > 0.protected void
Emits warning iff traceLevel > 0.
-
Field Details
-
default_ErrorStrategy
Evident. See the default field values of this class. -
interactive_ErrorStrategy
Evident. Allows partial documents and prints the stack context of a parsing error to the console. -
non_interactive_ErrorStrategy
Used for situations when basic diagnosis data is required by the "programmer" of some using code, not by an interactive user. -
scanner
Single source of text input. -
msg
Target of most messages. -
messageCounter
Target of most messages. -
xml_msg
Target of messages for some called classes which generate XML-file-locations. -
errorStrategy
Applied error strategy, different for interactive and programmatic use, etc. -
charSetCalc
Instance to evaluate character expressions. Maintains a cache. -
traceLevel
protected final int traceLevel
Determines the verbosity:
- ==0 stands for complete silence
- ==1 stands for minimal output: few loggings and not all warnings
- ==2, 3 stands for more output: loggings and warnings
- ==10 shows some synthesized source texts
- ==20 stands for full debugging.
-
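The traceLevel thresholds listed above can be illustrated with a small stand-alone sketch. The class `TraceLevelSketch` and its methods are hypothetical stand-ins and not part of eu.bandm.tools. The rule that warnings are emitted iff traceLevel > 0 is taken from the `warning` entries of this page; treating "full debugging" as "level 20 or higher" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in (not part of eu.bandm.tools) showing verbosity gating. */
public class TraceLevelSketch {
    private final int traceLevel;
    // Collects emitted messages instead of sending them to a MessageReceiver.
    private final List<String> sink = new ArrayList<>();

    public TraceLevelSketch(int traceLevel) {
        this.traceLevel = traceLevel;
    }

    /** Emits a warning iff traceLevel > 0, as stated for warning(). */
    public void warning(String text) {
        if (traceLevel > 0) {
            sink.add("warning: " + text);
        }
    }

    /** ASSUMPTION: debugging output appears from level 20 upward. */
    public void debug(String text) {
        if (traceLevel >= 20) {
            sink.add("debug: " + text);
        }
    }

    public List<String> emitted() {
        return sink;
    }

    public static void main(String[] args) {
        TraceLevelSketch silent = new TraceLevelSketch(0);
        silent.warning("dropped");
        TraceLevelSketch verbose = new TraceLevelSketch(20);
        verbose.warning("kept");
        verbose.debug("kept as well");
        System.out.println(silent.emitted() + " " + verbose.emitted());
    }
}
```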
meta_warning
-
meta_location
-
meta_messageText
-
meta_kind
-
meta_tag
-
meta_parsingError
-
meta_expected
-
meta_skipped
-
meta_assumedXsltOutput
-
textFileHeader
The result of parsing the input header. Globally needed only for getXsltInputNamespaces() and "getXsltMode()" -
mode
Text2Udom.modes mode
Kind of the text to parse. -
top_result
Toplevel result of parsing the input text. Set once when opening the State for the first tag. Globally needed for error handling. -
xslt_alt_ubiquituous
Contains all xslt tags which can appear anywhere in the target elements. This field is != null iff xslt mode. -
xslt_alt_ubiquituous_repeated
Contains all xslt tags which can appear anywhere in the target elements. This field is != null iff xslt mode. -
xslt_module
The loaded xslt module. Only needed for isXsltDef(eu.bandm.tools.d2d2.model.Definition). This field is != null iff xslt mode. -
state
The currently growing stackframe, see State. -
leading_ws
Accumulator for ws, not yet clear whether to deliver or to ignore. -
currentState
Current micro-state, values from Text2Udom.parsingState. -
verbatimSuppress
protected int verbatimSuppress
How many command characters in verbatim mode will not generate a warning. -
interruptedState
Needed to treat setcommand, setcomment and numeric input (i.e. pseudo-tags, which look like a tag, but do not behave so) like whitespace = resuming the interrupted state. -
skipContainer
For reporting skipped input -
lastCloseWasXslt
protected boolean lastCloseWasXslt
Whether we come from an xslt element. Any explicitly closed xslt element is no longer in the stack, but a subsequent target-language element must nevertheless be treated as coming from "inxslt". -
allSortedKeys
-
MEMSTRING_ID_SYNTHETIC
-
synthLocation
-
bicLoc
-
unicodeResults
-
-
Constructor Details
-
Text2Udom
public Text2Udom(MessageReceiver<SimpleMessage<String>> msg, Text2Udom.ErrorStrategy errorStrategy, int traceLevel)
Only constructor.
-
-
Method Details
-
P
-
getXsltInputNamespaces
Get all xslt input name spaces, as defined in the header of the input file. Valid result can be obtained only after successful parsing. -
isXsltMode
boolean isXsltMode()
Whether to parse an XSLT source. -
isXsltDef
Whether a definition is an xslt element. -
fromFile
Parse the contents of the file found at the given location. The calling graph of the main scanner functions is:
fromFile(String filename,..)
-> fromFile(File,..)
-> fromMemString(locationText,MemString,..)
-> scanForHeader
-> Starter.install
-> digest()
// Starting in "nothing open" mode, no verbatim or charparser
// is supported as top-level element
-
fromFile
Parse the contents of the given file. -
fromFile
@Opt public @Opt ResultingStructure fromFile(String locationText, File f, ModuleRegistry moduleRegistry)
Parse the contents of the given file, using the given location text for all error messages. -
fromMemString
@Opt public @Opt ResultingStructure fromMemString(String locationText, MemString<String> text, ModuleRegistry moduleRegistry)
Parse the contents of the given MemString object, which includes a text type declaration header.
- Parameters:
locationText
- use this in error messages.
-
fromMemString
@Opt public @Opt ResultingStructure fromMemString(String locationText, MemString<String> text, ModuleRegistry moduleRegistry, @Opt XRegExp toplevelXRegExp, Text2Udom.modes mode)
Parse the contents of the given MemString object, containing just the text body. The text format etc. are already known and fixed and given as argument values.
- Parameters:
locationText
- use this in error messages.
text
- to parse
moduleRegistry
- used to load the xslt code and the meta module.
toplevelXRegExp
- the top regexp of the target, either directly or as target indication for the xslt source to parse. (== null iff mode == xsltText)
mode
- whether to parse XML or XSLT or XSLT for mere text
-
warning
Emits warning iff traceLevel > 0. -
warning
Emits warning iff traceLevel > 0. -
hint
Emits hint iff traceLevel > 0. -
logStart
Emits message iff traceLevel > 0. -
logEnd
Emits message iff traceLevel > 0. -
hint_xml
Emits message iff traceLevel > 0. -
error
Emits an error message and further context information depending on the values in errorStrategy. -
error
Calls error(eu.bandm.tools.location.Location<java.lang.String>,java.lang.String,java.lang.Object...) with no location. -
failure
Throws a corresponding message exception. -
loadMetaDefinition
Find the definition from the loaded metamodule. -
loadMetaModule
Load the general d2d meta module and find the required definitions. Store them into the global constant fields "meta_warning", "meta_location", "meta_warning", etc., for later use. Is required for xslt mode and when "partially correct documents" are allowed. -
digest
protected void digest()
Main loop, consumes input data according to the current "micro"-state currentState. Only called once, from fromMemString(String,MemString,ModuleRegistry). Calls the digest_<> delegate functions according to currentState. These are only called from here and could be in-lined. They mostly do not change the micro-state themselves, but call process_open_tag(String,boolean), process_close_tag(String,boolean) or process_close_char(). These can change the micro-state, which has consequences when re-entering this "digest" loop.
Each loop pass (and each delegate method) ends with calling scanner.accept(). Therefore, when each loop pass (and each delegate method) starts, the next-to-consume token is already lexically recognized and ready for consumption. -
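The dispatch structure described for digest() can be sketched as a loop that switches on a micro-state and delegates to one method per state. Everything in the following block is a simplified, hypothetical stand-in: the class `DigestLoopSketch`, the three-valued `MicroState` enum, the use of `#` as the only command character, and the rule of returning to NOTHING_OPEN after an open tag are assumptions made for illustration, not the actual Text2Udom implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a micro-state driven main loop; not the real parser. */
public class DigestLoopSketch {
    // ASSUMPTION: a reduced set of micro-states, named after the digest_* methods.
    enum MicroState { NOTHING_OPEN, LOOK_FOR_TAG, CONSUME_CHARACTERS }

    static final char COMMAND_CHAR = '#';

    private final String input;
    private int pos = 0;
    private MicroState currentState = MicroState.NOTHING_OPEN;
    private final List<String> trace = new ArrayList<>();

    public DigestLoopSketch(String input) {
        this.input = input;
    }

    public List<String> trace() {
        return trace;
    }

    /** Main loop: dispatches to one delegate per pass, according to currentState. */
    public void digest() {
        while (pos < input.length()) {
            switch (currentState) {
                case NOTHING_OPEN: digestNothingOpen(); break;
                case LOOK_FOR_TAG: digestLookForTag(); break;
                case CONSUME_CHARACTERS: digestConsumeCharacters(); break;
            }
        }
    }

    /** Stand-in for scanner.accept(): makes the next character current. */
    private void accept() {
        pos++;
    }

    private void digestNothingOpen() {
        char c = input.charAt(pos);
        if (c == COMMAND_CHAR) {
            currentState = MicroState.LOOK_FOR_TAG;
        } else if (!Character.isWhitespace(c)) {
            trace.add("char:" + c);
            currentState = MicroState.CONSUME_CHARACTERS;
        }
        accept(); // every delegate leaves the next token ready for the loop
    }

    private void digestLookForTag() {
        StringBuilder name = new StringBuilder();
        while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) {
            name.append(input.charAt(pos));
            accept();
        }
        trace.add("open:" + name);
        currentState = MicroState.NOTHING_OPEN; // simplification of this sketch
    }

    private void digestConsumeCharacters() {
        char c = input.charAt(pos);
        if (c == COMMAND_CHAR) {
            currentState = MicroState.LOOK_FOR_TAG;
        } else {
            trace.add("char:" + c);
        }
        accept();
    }

    public static void main(String[] args) {
        DigestLoopSketch sketch = new DigestLoopSketch("#a xy#b");
        sketch.digest();
        System.out.println(sketch.trace());
    }
}
```

The point of the sketch is that the delegates only switch `currentState` and advance the input; the decision which delegate runs next is taken exclusively at the top of the loop.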
digest_nothing_open
protected void digest_nothing_open()
Initial (micro-)state, or after an explicit close. Switches currentState on command-char and on non-whitespace character data, which implies the invisible "char-data-tag". If the current top expression does NOT accept characters, then non-whitespace is implicitly tagged as such, and that tag is searched up-ward, as usual. Whitespace however is accumulated and delayed for possible delivery, until this happens. If the current top expression DOES accept characters, then whitespace is immediately treated as valid character data and delivered. -
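The whitespace handling described for digest_nothing_open() (immediate delivery when the top expression accepts characters, accumulation otherwise) can be sketched as follows. The class `LeadingWsSketch` and its methods are hypothetical. In particular, the explicit `resolve` step and its boolean parameter are assumptions of this sketch, since the page does not state how the later deliver-or-ignore decision is taken.

```java
/** Hypothetical sketch of delayed whitespace delivery; not the real parser. */
public class LeadingWsSketch {
    // Whitespace whose fate (deliver or ignore) is not yet decided.
    private final StringBuilder leadingWs = new StringBuilder();
    // Stand-in for the top-most resulting structure.
    private final StringBuilder delivered = new StringBuilder();

    /** Called for each whitespace run seen in the "nothing open" state. */
    public void onWhitespace(String ws, boolean topAcceptsChars) {
        if (topAcceptsChars) {
            delivered.append(ws);   // valid character data, deliver at once
        } else {
            leadingWs.append(ws);   // decision is postponed
        }
    }

    /** ASSUMPTION: a later event decides whether the pending whitespace counts. */
    public void resolve(boolean deliver) {
        if (deliver) {
            delivered.append(leadingWs);
        }
        leadingWs.setLength(0);
    }

    public String delivered() {
        return delivered.toString();
    }

    public static void main(String[] args) {
        LeadingWsSketch sketch = new LeadingWsSketch();
        sketch.onWhitespace("  ", false);
        sketch.resolve(true);
        System.out.println("[" + sketch.delivered() + "]");
    }
}
```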
digest_look_for_tag
protected boolean digest_look_for_tag()
After a command char has been read. An open tag or (two kinds of) close tag may follow. Whitespace, comments and further command chars are discarded, other chars lead to an error.
- Returns:
- whether end-of-input token (=pseudo tag) has been accepted.
-
process_open_tag
Only method for processing an "open" tag. Is called from digest_nothing_open() for character data, from digest_look_for_tag() for an explicit tag, and from digest_verbatim().
ATTENTION: must be called in an "un-accepted" state of MemScanner, because next may come an "open tag modifier", which has to be decoded explicitly, not by MemScanner.accept(). This may be followed by characters which are consumed by a "chars parser" or an "enum" definition, again not by the MemScanner! In these cases the outer state machine is not altered, but returned to.
In case of a non-empty "tags parser", the state stack is extended accordingly by one level (or two levels in case of "#implicit".)
In case of xslt-mode, the state machine is
startInXslt=startXslt     // whether current frame is an xslt
  |
  | ascend into non-xslt, and NO xslt components are missing
  |   // if xslt itself is incomplete (CANNOT HAPPEN currently!),
  |   // then treat ALL missings as ERROR.
  V
found_weakMode=true       // use "weak_firsts" modified director map
  |                       // interpret all missing as "assumed"
  |
  | cross open element boundary
  V
all false, normal parsing // report "assumed" and collect all FURTHER
                          // missings as errors
-
process_char_parser_error
Is called after a non-acceptance by a char parser or by an enumeration has been detected. If no "partial doc output" is enabled: throw exception and abort parsing. Otherwise add diagnostic outputs with tags from "d2d-meta" into the result udom, set up skip container and switch parser state to Text2Udom.parsingState.skip_for_command. -
sortKeys
-
consume_enumeration
@Opt protected static @Opt ResultingStructure consume_enumeration(MemScanner<String> scan, Enumeration etype)
Used by tag parsers AND character parsers (with multiple parallel scanners!) -
process_close_char
protected void process_close_char()
Called from digest_nothing_open/look_for_tag/consume_chars, in case that a closing parenthesis character has been recognized. Resets the scanner and simulates the reaction to a "complete" close tag. -
process_close_tag
Called by digest_look_for_tag, process_open_tag (in case of empty element declarations), process_close_char. Tag may be null for an unspecified "close lowest element". -
process_close_tag_inner
protected void process_close_tag_inner(@Opt String tag, boolean force, Location<String> closeTagDefLoc)
Search the state space for "nexttag" as possible close tag. Then adjust the stack accordingly. Whenever obligate entries are missing in between, then store them to "found_missing" and execute the error diagnosis.
- Parameters:
tag
- the tag of the element to close, or null to close the last opened element.
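The stack adjustment described for process_close_tag_inner can be illustrated with a plain stack of tag names. The class `CloseTagSketch` and the method `closeTag` are hypothetical; the real method works on State frames, tracks missing obligatory entries, and reports diagnostics, none of which is modelled here. The boolean result loosely mirrors the "frame found" distinction: true means the stack was shrunk, false means the tag is unknown and the caller would have to skip input.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of closing a tag by searching an open-element stack. */
public class CloseTagSketch {

    /**
     * Closes the element named tag, or the last opened element if tag is null.
     * Returns true if a matching frame was found and the stack was shrunk,
     * false if nothing matched (stack left unchanged).
     */
    public static boolean closeTag(Deque<String> openElements, String tag) {
        if (tag == null) {
            if (openElements.isEmpty()) {
                return false;
            }
            openElements.pop();
            return true;
        }
        if (!openElements.contains(tag)) {
            return false; // unknown tag: caller would discard input up to the next tag
        }
        // Pop all frames opened after the matching one, then the match itself.
        while (!openElements.peek().equals(tag)) {
            openElements.pop();
        }
        openElements.pop();
        return true;
    }

    public static void main(String[] args) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("doc");
        stack.push("section");
        stack.push("para");
        System.out.println(closeTag(stack, "section") + " " + stack);
    }
}
```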
-
return_to_upper_input_mode
protected void return_to_upper_input_mode()
Called to re-enter the input mode (verbatim or nothing_open) after an explicit close (tags had been opened and closed) or an implicit close (enum or character parser have consumed enough) has happened.
Assume the stack frame of the parser/enum/closed tag is already popped from the stack.
Assume that "inputmode verbatim" is only allowed with "content = (#PCDATA|..)*". TYPECHECKER MISSING FIXME. -
insertPlainChars
Aux function to insert synthesized character data in the output, which is NOT contained in the source, e.g. error messages. DISLOC FIXME (better a general solution like "IncludingCharBuffer" etc.)
- Parameters:
host
- the toplevel element
tag
- the definition for the content
chars
- the content
-
makeGlobalSkipContainer
-
report_assumed_xslt_output
Is called when a target-language open/close tag has been found after an xslt template call, to report the necessary expansion. Or as soon as the possible range of the template's output (one content model) is left, so that all further missing elements will be treated as normal "missing errors". -
report_missing_elements
protected void report_missing_elements(boolean isOpen, String tag, boolean frameFound, CheckedList<Expression> misslist, List<SimpleMessage<String>> messlist, Location<String> closeTagDefLoc)
This procedure can be called from process_open_tag(String, boolean) or process_close_tag(String, boolean).
Basically, there are two classes of error recovery:
- framefound = true ==> the tag is KNOWN, but tags before are missing,
--- then continue with a shrunk stack state.
- framefound = false ==> the tag is NOT KNOWN, then discard all input up to the next tag and try again.
General issue: these "Errors" behave like "Warnings" in case of "partialDocs==true".
FIXME DocumentError should have TYPED fields like expression, etc.
FIXME parameter for controlling the kind of reaction MISSING FIXME
-
digest_consume_characters
protected void digest_consume_characters()
When an element may receive character data, and some non-ws character data has been recognized at the reading position. Discards comments, delivers text, whitespace and other chars, and switches state on a command char. -
ignore_superfluous_end_tag
protected void ignore_superfluous_end_tag(String tag, boolean defIsEmpty, Definition def, Location<String> startLoc)
Called after an empty tag, a parser or an enum have been consumed. Called with scanner in the "un-accepted" state, but leaves after calling "accept()" (so the next token to digest == scanner.current).
After a character parser there MAY appear an explicit closing tag:
#tag 12[345]!!! #/tag
or
#tag 12[345]!!! #///tag
or
#tag 12[345]!!! #/ <-ws!
or
#tag$ 12[345]!!! $
All these are to be IGNORED, since the application of the character parser must be finished (= it must have reached a final state OR an error state!) anyhow.
For empty content the situation is similar:
#tag continue text
do NOT consume anything.
#tag#/tag continue text
DO consume end-tag, but nothing more (esp. not the following blank!)
#tag# /tag continue text
this is ok, and treated the same way.
#tag #/tag continue text
this is NOT OK, and should be rejected
#tag-continue word
do NOT consume anything.
#tag!! continue
do consume the parentheses (has been done by lexer).
#tag#///tag
should possibly NOT be supported. If, then treat like simple end tag!
Operation starts with an accept(), meaning "decode at current point of reading".
On return, also an accept() has been performed, meaning "next lexer token to digest by the top level "digest" loop (=what follows after the ended element) is now reflected by the scanner output fields". -
accept_superfluous_end_tag
protected boolean accept_superfluous_end_tag(String tag, boolean defIsEmpty, Definition def, Location<String> startLoc)
Lets the scanner accept and discard an end tag, either normal or forced or by a parenthesis character. Only caller is ignore_superfluous_end_tag(String,boolean,Definition,Location)
-
digest_verbatim
protected void digest_verbatim()
Parses "verbatim input" mode, which (a) requires an explicit #/tag end tag, and (b) accepts only tags of sub-elements immediately contained in its definition. -
digest_skip_for_command
protected void digest_skip_for_command()
Micro-state for error recovery: skip until command char and then go to tag-reading mode again. Store the skipped input chars into skipContainer for meta-error-elements. -
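The recovery step described for digest_skip_for_command() (skip input up to the next command character, keeping the skipped text for reporting) can be sketched as below. The class `SkipForCommandSketch` and its method are hypothetical; using `#` as the command character follows the `#tag` examples on this page, and a `StringBuilder` stands in for the skipContainer structure.

```java
/** Hypothetical sketch of "skip until command char" error recovery. */
public class SkipForCommandSketch {
    // ASSUMPTION: '#' is the command character, as in the "#tag" examples.
    static final char COMMAND_CHAR = '#';

    /**
     * Skips from pos up to (not including) the next command character.
     * Skipped characters are appended to skipContainer for later reporting.
     * Returns the index of the command character, or input.length() if none.
     */
    public static int skipForCommand(String input, int pos, StringBuilder skipContainer) {
        int i = pos;
        while (i < input.length() && input.charAt(i) != COMMAND_CHAR) {
            skipContainer.append(input.charAt(i));
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        StringBuilder skipped = new StringBuilder();
        int next = skipForCommand("junk #tag text", 0, skipped);
        System.out.println(next + " [" + skipped + "]");
    }
}
```

After the call, a parser in this style would resume tag reading at the returned index, while the collected text could be attached to an error element in the output.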
bicERROR
-
builtInMetaCommands
-
parseVerbatimSuppres
protected boolean parseVerbatimSuppres() -
deliver_to_singletonstate
-
deliver_numeric
protected void deliver_numeric(int val)
Append the unicode character to the top-most resulting structure. A Java string needs two (2) "code units", i.e. 16-bit "chars", to represent a code point > 0xFFFF (int = 32 bit, char = 16 bit). -
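The remark on code units in deliver_numeric can be checked with the standard library alone. The following sketch is not the method's actual body; the class `DeliverNumericSketch` is hypothetical and a `StringBuilder` stands in for the resulting structure. It only shows that appending a code point above 0xFFFF yields two `char` values (a surrogate pair).

```java
/** Hypothetical sketch: appending a numeric code point to character output. */
public class DeliverNumericSketch {

    /** Appends the character with the given code point; rejects invalid values. */
    public static void deliverNumeric(StringBuilder target, int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a Unicode code point: " + codePoint);
        }
        // appendCodePoint emits one char for the BMP, two chars above 0xFFFF.
        target.appendCodePoint(codePoint);
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        deliverNumeric(out, 0x41);     // 'A', one code unit
        deliverNumeric(out, 0x1F600);  // supplementary code point, two code units
        System.out.println(out.length() + " code units, "
                + out.codePointCount(0, out.length()) + " code points");
    }
}
```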
deliver
Append the argument to the top-most resulting structure. -
deliver_spontanuous
Append the argument to the top-most resulting structure which is not encoded as xml attribute. -
deliver
-
deliver
-
deliver_last
protected void deliver_last()
Deliver the last scanned character data, and set the "is whitespace" flag according to the token type returned by scanner. -
addRestPerm
Add obligate members of a "perm" expression to the term which describes the missing input. -
find_top_singleton
Called from return_to_upper_input_mode(). and .... DELIVER()??? AND OPEN XSLTMODE ?? -
find_top_singleton
-