Package eu.bandm.tools.d2d2.base
Class CharacterParser
java.lang.Object
eu.bandm.tools.d2d2.model.MATCH_ONLY_00
eu.bandm.tools.d2d2.model.SinglePhase
eu.bandm.tools.d2d2.base.CharacterParser
Realizes the fine-grained d2d character-level parsing.
Usage: call the constructor, then
parse(eu.bandm.tools.d2d2.base.MemScanner<java.lang.String>,eu.bandm.tools.d2d2.model.CharsRegExp,eu.bandm.tools.d2d2.rt.ResultingStructure).
The parsing is non-deterministic (implemented as a parallel search)
and delivers one of the resulting interpretations that consume the most input
characters.
Character-based parsing conceptually takes place AFTER the elimination of comments, and AFTER the translation of closing parenthesis characters into close tags.
It does NOT take place after the translation of numeric input character entities.
The standard command character "#" in a parser definition is matched to the actual current command character in the input text.
FIXME: Comments are skipped BUT NOT removed from the results.
For these translation purposes there are dedicated consumer methods with the word "_filtered_"; see
MemScanner.accept_greedy_filtered_chars(CharSet),
MemScanner.accept_one_filtered_char(CharSet),
MemScanner.accept_blanks_filtered(),
MemScanner.accept_string_w_o_lineswitch(String).
EXCEPTION ONE: String constants (given by a parser definition like
... ~ "ABC" ~ ...
) are only accepted outside of comments, i.e. the characters of a string constant parser may not be interrupted by comment insertions.
EXCEPTION TWO: Character sets with a star or plus operator are interpreted in a GREEDY way, not non-deterministically. For example,
('a'..'z')~* ~ ('a'..'z')
will never match anything, because the starred set already consumes every character that the trailing set could accept.
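The greedy behaviour of EXCEPTION TWO can be illustrated by analogy with java.util.regex, which is not part of d2d: a possessive quantifier (`*+`) consumes as much as it can and never gives characters back, while an ordinary quantifier lets the engine back off. The class name GreedyCharSetDemo and its methods are made up for this sketch, and the regular expressions only approximate the d2d expression ('a'..'z')~* ~ ('a'..'z').

```java
import java.util.regex.Pattern;

final class GreedyCharSetDemo {

    // Possessive quantifier: takes every letter and never releases one,
    // so the trailing [a-z] finds nothing left to match.
    static final Pattern GREEDY = Pattern.compile("[a-z]*+[a-z]");

    // Ordinary quantifier: the engine may back off one letter for the trailing set.
    static final Pattern BACKTRACKING = Pattern.compile("[a-z]*[a-z]");

    static boolean matchesGreedy(String input) {
        return GREEDY.matcher(input).matches();
    }

    static boolean matchesBacktracking(String input) {
        return BACKTRACKING.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesGreedy("abc"));       // prints false
        System.out.println(matchesBacktracking("abc")); // prints true
    }
}
```

A parser definition that needs "letters followed by one more letter" therefore has to be written so that the repeated set does not cover the final character.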
-
Nested Class Summary
Modifier and Type | Class | Description
protected static class | CharacterParser.ParseResult | A simple wrapper for one (of the many parallel) parsing situations; wraps MemScanner as the next input situation and ResultingStructures as collected so far.
-
Field Summary
Modifier and Type | Field | Description
protected final Navigate.CharSetCalc | charSetCalc |
protected final boolean | doTrace |
protected Set<CharacterParser.ParseResult> | hypotheses | Set of alive parsing situations: is input before visiting a grammar sub-expression and output after parsing.
protected final MessageReceiver<SimpleMessage<XMLDocumentIdentifier>> | msg |
Fields inherited from class eu.bandm.tools.d2d2.model.MATCH_ONLY_00
_visitor_debug_stream, partial
-
Constructor Summary
Constructor | Description
CharacterParser(Navigate.CharSetCalc charSetCalc, MessageReceiver<SimpleMessage<XMLDocumentIdentifier>> msg, boolean doTrace) | Only constructor.
-
Method Summary
Modifier and Type | Method | Description
protected void | acceptCharRep(CharSet cset, boolean isstar, boolean istight) | Provides special treatment (implemented directly in the scanner), including different semantics (greedy, not non-deterministic!) for character set expressions.
protected void | acceptRep |
void | action |
void | action(CharBinary expr) |
void | action |
void | action(CharsRegExp def) |
void | action(Enumeration enumeration) | Treats an enumeration as a parser and creates a special resulting structure.
void | action | Only the longest matches for each incoming hypothesis are recognized.
void | action(ImportItem def) |
void | action | Special case of insertion, namely the reference to an enumeration, can survive the rewriting process: (@ ref).
void | action |
void | action(ParseParticle pp) |
void | action |
void | action |
void | action |
void | action |
void | action |
void | action(StringConst expr) |
void | action(TagsRegExp def) |
protected void | copyContentsFromTo(CharacterParser.ParseResult from, MemScanner<String> startpos, ResultingStructure to) | If "from" has structured contents, then add this into "to", as a sequence and per assoc.
protected void | flattened_consumption_of_enum(Enumeration enumeration) | When called from an insertion: treats an enumeration as a mere collection of string constants, i.e. consumes one of them and does NOT create any corresponding element.
@Opt ResultingStructure | parse(MemScanner<String> scanner, CharsRegExp parser, ResultingStructure result) | Central executive method.
protected void | subElement(Definition def, Expression rule) | Called when reaching a ParseParticle, a CharsRegExp or an Enumeration, which all lead to wrapping the parsing result into a Result object with this definition as its tag.
protected void | trace(@Opt Location<XMLDocumentIdentifier> loc, String s) |
protected void | trace |
protected void | typingError |
Methods inherited from class eu.bandm.tools.d2d2.model.SinglePhase
action, action, action, action, action, action, action, action, action, action, action, action, action, action, action, action, action, action, action, action, action, action, follow_definitions, follow_defInstances, follow_docu, follow_firsts, follow_globalSubsts, follow_imports, follow_itemDocu, follow_localdefs, follow_localSubsts, follow_modules, follow_namespaces, follow_obligates, follow_on, follow_rawModules, follow_text, follow_weakfirsts, follow_xattributes
Methods inherited from class eu.bandm.tools.d2d2.model.MATCH_ONLY_00
_visitor_trace, action, action, action, action, compile, followAll_definitions, followAll_defInstances, followAll_docu, followAll_firsts, followAll_globalSubsts, followAll_imports, followAll_itemDocu, followAll_localdefs, followAll_localSubsts, followAll_modules, followAll_namespaces, followAll_obligates, followAll_on, followAll_rawModules, followAll_text, followAll_weakfirsts, followAll_xattributes, foreignObject, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, match, nomatch
-
Field Details
-
msg
-
charSetCalc
-
doTrace
protected final boolean doTrace
-
hypotheses
Set of alive parsing situations: is input before visiting a grammar sub-expression and output after parsing.
-
-
Constructor Details
-
Method Details
-
parse
@Opt public ResultingStructure parse(MemScanner<String> scanner, CharsRegExp parser, ResultingStructure result)
Central executive method. It gets a particular CharsRegExp definition and tries to match the character input (given by the MemScanner) by a non-deterministic, parallel breadth-first execution.
Maintains a set of CharacterParser.ParseResult objects, each combining the parser state and the ResultingStructure as constructed so far.
In case of success, it returns one (randomly chosen) of the matches that consume the most input characters and adjusts "scanner" accordingly. (So "scanner" is both an input and an output argument.)
In case of failure, it returns null and does not advance the input pointer in "scanner".
-
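The contract described above (a set of parallel hypotheses, longest match wins, the scanner advances only on success) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the class's implementation: Cursor is a hypothetical stand-in for MemScanner, a "parser" is reduced to a sequence of steps with alternative string literals, and each hypothesis is only an input position, whereas the real ParseResult also carries a ResultingStructure.

```java
import java.util.List;
import java.util.TreeSet;

final class HypothesisSetSketch {

    /** Hypothetical stand-in for a scanner: the input text plus a mutable read position. */
    static final class Cursor {
        final String input;
        int pos = 0;
        Cursor(String input) { this.input = input; }
    }

    /**
     * Matches a sequence of steps, each a list of alternative literals.
     * On success returns the text of a longest match and advances the cursor;
     * on failure returns null and leaves the cursor untouched.
     */
    static String parse(Cursor cursor, List<List<String>> steps) {
        // All alive hypotheses, represented by the input position they have reached.
        TreeSet<Integer> hypotheses = new TreeSet<>();
        hypotheses.add(cursor.pos);
        for (List<String> alternatives : steps) {
            TreeSet<Integer> next = new TreeSet<>();
            // Breadth-first: extend every alive hypothesis by every alternative.
            for (int start : hypotheses) {
                for (String literal : alternatives) {
                    if (cursor.input.startsWith(literal, start)) {
                        next.add(start + literal.length());
                    }
                }
            }
            if (next.isEmpty()) {
                return null; // no hypothesis survived; cursor.pos was never modified
            }
            hypotheses = next;
        }
        int end = hypotheses.last(); // the hypothesis that consumed the most input
        String matched = cursor.input.substring(cursor.pos, end);
        cursor.pos = end;            // the cursor is an input and an output argument
        return matched;
    }

    public static void main(String[] args) {
        Cursor ok = new Cursor("abcX");
        System.out.println(parse(ok, List.of(List.of("a", "ab", "abc"))) + " " + ok.pos); // prints abc 3
        Cursor fail = new Cursor("abcX");
        System.out.println(parse(fail, List.of(List.of("a"), List.of("z"))) + " " + fail.pos); // prints null 0
    }
}
```

Because positions are collapsed into a set here, the choice among equally long matches is trivially deterministic; in the documented method several distinct results can reach the same position, which is where the random choice comes in.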
copyContentsFromTo
protected void copyContentsFromTo(CharacterParser.ParseResult from, MemScanner<String> startpos, ResultingStructure to)
If "from" has structured contents, then these are added to "to", as a sequence and per assoc. Otherwise the characters from "startpos" up to the current (= accepting) input pointer position in "from" are copied.
-
typingError
-
trace
-
trace
-
action
Special case of insertion, namely the reference to an enumeration, can survive the rewriting process: (@ ref). It means a flattened acceptance of one of the enumeration's string values.
Also INFINITE CYCLES of insertions of CharsRegExp may survive. FIXME: NOT YET SUPPORTED.
Overrides: action in class SinglePhase
-
action
Overrides: action in class SinglePhase
-
action
Overrides: action in class SinglePhase
-
flattened_consumption_of_enum
When called from an insertion: treats an enumeration as a mere collection of string constants, i.e. consumes one of them and does NOT create any corresponding element.
-
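The difference between flattened consumption and treating an enumeration as a parser in its own right can be sketched as follows. All names here (EnumConsumptionSketch, Element, consumeFlattened, consumeWrapped) are hypothetical and not part of the d2d API, and the sketch simply takes the first value that matches, which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

final class EnumConsumptionSketch {

    /** Hypothetical result node: a tag name plus the text it covers. */
    static final class Element {
        final String tag;
        final String text;
        Element(String tag, String text) { this.tag = tag; this.text = text; }
    }

    /** Flattened: consume one of the values and record nothing. Returns the new position or -1. */
    static int consumeFlattened(String input, int pos, List<String> values) {
        for (String value : values) {
            if (input.startsWith(value, pos)) {
                return pos + value.length();
            }
        }
        return -1;
    }

    /** Wrapped: consume one of the values and also append an element tagged with the enumeration's name. */
    static int consumeWrapped(String input, int pos, String enumName,
                              List<String> values, List<Element> results) {
        int end = consumeFlattened(input, pos, values);
        if (end >= 0) {
            results.add(new Element(enumName, input.substring(pos, end)));
        }
        return end;
    }

    public static void main(String[] args) {
        List<String> colors = List.of("red", "green");
        List<Element> results = new ArrayList<>();
        System.out.println(consumeFlattened("green!", 0, colors));                 // prints 5
        System.out.println(results.size());                                        // prints 0
        System.out.println(consumeWrapped("green!", 0, "color", colors, results)); // prints 5
        System.out.println(results.get(0).tag + "=" + results.get(0).text);        // prints color=green
    }
}
```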
action
Treats an enumeration as a parser and creates a special resulting structure.
Overrides: action in class SinglePhase
-
subElement
Called when reaching a ParseParticle, a CharsRegExp or an Enumeration, which all lead to wrapping the parsing result into a Result object with this definition as its tag. All cases can lead to character data only; the first two cases can also lead to sub-result-objects (structured contents) instead. All results must be wrapped explicitly. -
action
Overrides: action in class SinglePhase
-
action
Overrides: action in class SinglePhase
-
action
Overrides: action in class SinglePhase
-
action
Overrides: action in class SinglePhase
-
acceptCharRep
Provides special treatment (implemented directly in the scanner), including different semantics (greedy, not non-deterministic!) for character set expressions.
-
acceptRep
-
action
Overrides: action in class SinglePhase
-
action
Overrides: action in class SinglePhase
-
action
Only the longest matches for each incoming hypothesis are recognized.
Overrides: action in class SinglePhase
-
action
Overrides: action in class SinglePhase
-
action
Overrides: action in class SinglePhase
-
action
Overrides: action in class SinglePhase
-
action
Overrides: action in class SinglePhase
-
action
Overrides: action in class SinglePhase
-
action
Overrides: action in class SinglePhase
-