Class MemScanner<D>
- Type Parameters:
D
- the document type used in locations. Sensible choices are String, File, XMLDocumentIdentifier, URI, etc.
- All Implemented Interfaces:
Cloneable
MemString
objects.
During work, the "result" fields lastTokenType, lastContent, etc. describe the last recognized input token.
This token extends up to MemString.start, which is the "next-to-decode" reading position.
Each call to accept() means that the current result info has been consumed by the calling layer. It is therefore forgotten, the next token is decoded at the next-to-read address, that address is advanced, and all result fields are updated.
Secondly, there are character-wise "back-door entries" needed for character level parsing. These partly duplicate parts of the "accept()" code.
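The accept() protocol described above can be pictured with a minimal, self-contained sketch. This is not the actual MemScanner code: the class ToyScanner, its token categories and its blank-separated tokenization are assumptions made purely for illustration. Only the names accept, start, lastTokenType and lastContent mirror the text.

```java
/** Hypothetical stand-in for the accept() protocol; not the real MemScanner. */
class ToyScanner {
    /** Assumed token categories, for illustration only. */
    enum TokenType { WORD, NUMERIC, EOF }

    private final String data;
    int start = 0;             // "next-to-decode" reading position
    TokenType lastTokenType;   // result fields describing the last recognized token
    String lastContent;

    ToyScanner(String data) { this.data = data; }

    /** Forgets the old result, decodes the next token at start, and advances start. */
    void accept() {
        while (start < data.length() && data.charAt(start) == ' ') start++;
        if (start >= data.length()) {
            lastTokenType = TokenType.EOF;
            lastContent = "";
            return;
        }
        int begin = start;
        while (start < data.length() && data.charAt(start) != ' ') start++;
        lastContent = data.substring(begin, start);
        lastTokenType = lastContent.chars().allMatch(Character::isDigit)
                ? TokenType.NUMERIC : TokenType.WORD;
    }
}
```

In this sketch a caller inspects the result fields, then calls accept() again once it has consumed them; the previous result is overwritten each time.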
-
Nested Class Summary
Modifier and Type | Class | Description
static enum | | Categories of the character data immediately following an open tag.
static enum | | Non-overlapping categories of the recognized tokens.
Nested classes/interfaces inherited from class eu.bandm.tools.d2d2.infra.MemString
MemString.CharMem<D2>, MemString.ReadBehindBufferEndException, MemString.Reader
-
Field Summary
Modifier and Type | Field | Description
protected String | commentstring_ml | Cache of the multi-line comment lead-in.
static final CharSet | nonBlank | Convenience cache of all non-blank characters.
Fields inherited from class eu.bandm.tools.d2d2.infra.MemString
CHAR_EOTEXT, CHAR_NEWLINE, colnum, data, len, linenum, start
-
Constructor Summary
Constructor | Description
MemScanner | Constructor with a given starting point.
MemScanner(MemString<D> m) | Constructor with a given starting point.
-
Method Summary
Modifier and Type | Method | Description
void | accept() | Central scanning procedure: (1) throws away old results, (2) consumes more characters and (3) sets get_lastTokenType(), get_lastContent(), get_lastLocation() etc.
void | accept_blanks_filtered() | Accepts maximal sequence of whitespace characters, ignoring interspersed comments (one line or multi-line).
void | accept_blanks_noCommentsAllowed() | Accept all blank characters.
int | accept_greedy_filtered_chars | Accepts maximal sequence of characters from the given set, ignoring interspersed comments (one line or multi-line).
(package private) boolean | accept_one(char c) | Accept one occurrence of the given character.
boolean | accept_one(CharSet cs) | Accept one character from the given set and advance the input.
boolean | accept_one_filtered_char | Accepts one character from the given set, ignoring interspersed comments (one line or multi-line).
(package private) boolean | accept_string_w_o_lineswitch | Advances the input and returns true iff the current input starts with exactly the given string value.
 | copy() | Generate a copy.
 | endTagCharacter(String tag) | Returns the currently expected next "close parenthesis" character related to the given tag.
boolean | equals | Compares only the fields of superclass = compares as a MemString.
 | get_closedTag | Return the full-text tag in case of lastTokenType == closeChar.
 | get_lastContent() | Return the character contents of the most recently decoded token.
 | get_lastLocation() | Return the location area of the most recently decoded token.
 | get_lastTokenType() | Return the token type of the most recent decoding step.
int | get_numericValue() | Return the numeric value of the most recently decoded token.
char | getCommandChar() | Return the currently valid command character.
char | getCommentChar_1() | Return the currently valid first comment character.
 | getParenthesisDefinitionLocation | Return the location of the most recent opening parenthesis related to the given tag.
int | hashCode() | Compares only fields of superclass = compares as a MemString.
 | markParenthesis(String tag) | Consume an open parenthesis and memorize its counterpart.
 | popParenthesis | Remove the next expected parenthesis from all internal stacks.
 | read_ident_or_null | Read the maximal sequence starting with a character from the Chars.charset_ident_initial character set, followed by the maximal sequence of characters from Chars.charset_ident_subsequent. Thus newline is not accepted.
 | read_nonblank | Read a maximum sequence of non-blank characters.
char | scan_again_skipping_first() | Reset state of parser as it was before the last "accept", but skip the first character of the last recognized result.
void | setCommandChar(char c) | Interface procedure: change command character.
void | setCommentChar_1(char c) | Interface procedure: change first comment character.
(package private) void | skip_after_char(char c) | Accept all characters up to and including the given character.
void | skip_until_eoline() | Skips all input up to but not including the next line feed character.
Methods inherited from class eu.bandm.tools.d2d2.infra.MemString
adjustLineNumberX, advanceByX_sameLine, advanceX, appendAll, asString, backwardsX, backwardsX, calccolnum, clone, equals, equals, get_colnum, get_data, get_len, get_linenum, get_start, getChar, getChar, getFollowLocation, getLocation, getPositionIndication, initFrom, startsWith, toString, upTo, upToX
-
Field Details
-
commentstring_ml
Cache of the multi-line comment lead-in. -
nonBlank
Convenience cache of all non-blank characters.
-
-
Constructor Details
-
MemScanner
Constructor with a given starting point. ATTENTION: only the point-wise start coordinate of the given MemString is relevant; the effective end point for parsing is the end of the underlying MemString.CharMem. -
MemScanner
Constructor with a given starting point. Start and end point for parsing are the limits of the data of the given CharMem.
-
-
Method Details
-
copy
Generate a copy. -
equals
Compares only the fields of superclass = compares as a MemString. -
hashCode
public int hashCode() Compares only fields of superclass = compares as a MemString. -
markParenthesis
Consume an open parenthesis and memorize its counterpart. Must always be called whenever an open tag is recognized (by the code of the higher level), because it consumes the trailing blank char. (Whether an identifier token recognized by this parser is indeed a tag is known only to the caller.) It must be called before the corresponding call to accept().
This method consumes one (1) char if it is an open parenthesis, and memorizes the closing one. It consumes one (1) char if it is a whitespace. These are the allowed combinations:
  # tag CONTINUE TEXT
  # tag/CONTINUE TEXT
  # tag///CONTINUE TEXT
  # tag//comment recognized as such (includes the lead-in sequence)
  # tag CONTINUE TEXT
  # tag(continue text with parenthesis open
  # tag=continue text including leading special, non-paren char
  # tag()xx same as #tag/xx
-
popParenthesis
Remove the next expected parenthesis from all internal stacks. May only be called when a closing parenthesis has been recognized as the current token. -
endTagCharacter
Returns the currently expected next "close parenthesis" character related to the given tag. Returns null if no such parenthesis is currently open. -
getParenthesisDefinitionLocation
Return the location of the most recent opening parenthesis related to the given tag. Returns null if no such parenthesis is currently open. -
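The interplay of markParenthesis, popParenthesis and endTagCharacter can be pictured as stack bookkeeping. The following is only a sketch under assumptions: the class ParenBook, its method names mark and pop, the pairing table in counterpart, and the choice of one global stack plus one stack per tag are all hypothetical and are not taken from the MemScanner sources.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical parenthesis bookkeeping; not the real MemScanner internals. */
class ParenBook {
    private final Deque<String> openTags = new ArrayDeque<>();            // nesting order
    private final Map<String, Deque<Character>> byTag = new HashMap<>();  // expected closers per tag

    /** Assumed pairing of open and close characters. */
    static char counterpart(char open) {
        switch (open) {
            case '(': return ')';
            case '[': return ']';
            case '{': return '}';
            case '<': return '>';
            default:  return open;   // assumption: any other character closes itself
        }
    }

    /** Memorizes the closing counterpart of an open parenthesis for the given tag. */
    void mark(String tag, char open) {
        openTags.push(tag);
        byTag.computeIfAbsent(tag, t -> new ArrayDeque<>()).push(counterpart(open));
    }

    /** Removes the innermost expected parenthesis from both stacks. */
    void pop() {
        String tag = openTags.pop();   // throws NoSuchElementException if nothing is open
        byTag.get(tag).pop();
    }

    /** Returns the expected closer for the tag, or null if none is open. */
    Character endTagCharacter(String tag) {
        Deque<Character> closers = byTag.get(tag);
        return (closers == null || closers.isEmpty()) ? null : closers.peek();
    }
}
```

Returning null for "nothing open" follows the convention stated for endTagCharacter; everything else in the sketch is a design guess.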
setCommandChar
public void setCommandChar(char c) Interface procedure: change command character. Attention: the character value is not checked; the caller is fully responsible. -
setCommentChar_1
public void setCommentChar_1(char c) Interface procedure: change first comment character. Attention: the character value is not checked; the caller is fully responsible. -
getCommandChar
public char getCommandChar() Return the currently valid command character. -
getCommentChar_1
public char getCommentChar_1() Return the currently valid first comment character. The second character for multi-line comments is fixed to "*" in Chars.CHAR_COMMENT_2. -
get_lastTokenType
Return the token type of the most recent decoding step. -
get_lastContent
Return the character contents of the most recently decoded token. -
get_numericValue
public int get_numericValue() Return the numeric value of the most recently decoded token. Is sensible only if get_lastTokenType == numeric. -
get_closedTag
Return the full-text tag in case of lastTokenType == closeChar -
get_lastLocation
Return the location area of the most recently decoded token. Attention: while internally columns and lines are 0-based, the exported values in the locations are both 1-based, i.e. larger by one. -
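The 0-based versus 1-based convention can be made concrete with a small sketch. The class Coordinates and both of its methods are hypothetical helpers written for this illustration; they do not reproduce how MemString or the location classes compute positions.

```java
/** Hypothetical illustration of 0-based internal vs. 1-based exported coordinates. */
class Coordinates {
    /** Returns {line, column}, both 0-based, of the given offset in the text. */
    static int[] internal(String text, int offset) {
        int line = 0;
        int col = 0;
        for (int i = 0; i < offset; i++) {
            if (text.charAt(i) == '\n') {
                line++;
                col = 0;
            } else {
                col++;
            }
        }
        return new int[] { line, col };
    }

    /** Returns {line, column} as exported: both values are larger by one. */
    static int[] exported(String text, int offset) {
        int[] lc = internal(text, offset);
        return new int[] { lc[0] + 1, lc[1] + 1 };
    }
}
```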
accept
public void accept() Central scanning procedure: (1) throws away old results, (2) consumes more characters and (3) sets get_lastTokenType(), get_lastContent(), get_lastLocation() etc. accordingly. Never fails.
Crucial are these priorities:
  "///" --> forced end quote
  otherwise "//" --> one-line comment start (if comment char has not been altered)
  otherwise "/" --> end quote
The command char can never be set to "/" (nor to the current comment char).
The comment lead-in char can never be set to "*" (nor to the current command char).
(This must be preserved by the caller.) -
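The priority order among the slash sequences amounts to a longest-match test. A minimal sketch follows, under two assumptions that the text does not confirm: that the unaltered first comment character is "/", and that "//" falls back to a plain end quote once the comment character has been changed. The class SlashPriority and its Kind enum are hypothetical names.

```java
/** Hypothetical longest-match classification of slash sequences; not the real accept() code. */
class SlashPriority {
    enum Kind { FORCED_END_QUOTE, ONE_LINE_COMMENT, END_QUOTE, OTHER }

    /**
     * Classifies the input at position pos.
     * commentChar is the currently valid first comment character (assumed default: '/').
     */
    static Kind classify(String input, int pos, char commentChar) {
        if (input.startsWith("///", pos)) return Kind.FORCED_END_QUOTE;   // highest priority
        if (commentChar == '/' && input.startsWith("//", pos)) return Kind.ONE_LINE_COMMENT;
        if (input.startsWith("/", pos)) return Kind.END_QUOTE;            // lowest priority
        return Kind.OTHER;
    }
}
```

Testing the longer sequence first is what makes "///" win over "//", and "//" win over "/".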
scan_again_skipping_first
public char scan_again_skipping_first() Reset state of parser as it was before the last "accept", but skip the first character of the last recognized result. Only the input pointer is advanced; a new tokenization must be invoked after this explicitly, by invoking accept().
- Returns:
- the first "skipped" character of the input data.
-
skip_until_eoline
public void skip_until_eoline() Skips all input up to but not including the next line feed character. Only the input pointer is advanced; a new tokenization must be invoked after this explicitly. -
skip_after_char
void skip_after_char(char c) Accept all characters up to and including the given character. Only the input pointer is advanced; a new tokenization must be invoked after this explicitly. Assumes that the character is indeed present. Otherwise it throws a MemString.ReadBehindBufferEndException. -
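The difference between "up to but not including" (skip_until_eoline) and "up to and including" (skip_after_char) is easy to get wrong, so here is a small sketch. The class LineSkipper is hypothetical, and IndexOutOfBoundsException is used only as a stand-in for MemString.ReadBehindBufferEndException, whose constructors are not described in this page.

```java
/** Hypothetical pointer-only skipping; not the real MemScanner code. */
class LineSkipper {
    private final String data;
    int start = 0;   // input pointer

    LineSkipper(String data) { this.data = data; }

    /** Advances to the next line feed, which itself is NOT consumed. */
    void skipUntilEoline() {
        while (start < data.length() && data.charAt(start) != '\n') start++;
    }

    /** Advances past the next occurrence of c, which IS consumed. */
    void skipAfterChar(char c) {
        int idx = data.indexOf(c, start);
        if (idx < 0) {
            // stand-in for MemString.ReadBehindBufferEndException (assumption)
            throw new IndexOutOfBoundsException("character not found: " + c);
        }
        start = idx + 1;
    }
}
```

Neither method produces a token; in the terms of this page, a new tokenization would still have to be triggered afterwards.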
accept_greedy_filtered_chars
Accepts maximal sequence of characters from the given set, ignoring interspersed comments (one line or multi-line). The currently active command character is accepted iff the standard command character = # = Chars.CHAR_COMMAND_DEFAULT is contained in the given set. Then this character is additionally accepted. Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.
- Returns:
- number of accepted characters
-
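Greedy acceptance with interspersed comments can be sketched as a loop that alternates between skipping comments and consuming set members. Several points below are assumptions rather than facts from this page: comments are taken to be "//" up to the line end and "/*" up to "*/", comment characters are not counted in the result, the command character rule is left out, and a plain java.util.Set stands in for CharSet. The class GreedyFiltered is a hypothetical name.

```java
import java.util.Set;

/** Hypothetical greedy acceptance that skips comments; not the real MemScanner code. */
class GreedyFiltered {
    /** Returns {new position, number of accepted characters}. */
    static int[] acceptGreedy(String in, int pos, Set<Character> set) {
        int count = 0;
        while (pos < in.length()) {
            if (in.startsWith("//", pos)) {
                // one-line comment: skip up to, but not including, the line feed
                int eol = in.indexOf('\n', pos);
                pos = (eol < 0) ? in.length() : eol;
            } else if (in.startsWith("/*", pos)) {
                // multi-line comment: skip past the assumed terminator "*/"
                int end = in.indexOf("*/", pos + 2);
                pos = (end < 0) ? in.length() : end + 2;
            } else if (set.contains(in.charAt(pos))) {
                pos++;
                count++;
            } else {
                break;   // first character outside the set ends the sequence
            }
        }
        return new int[] { pos, count };
    }
}
```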
accept_one_filtered_char
Accepts one character from the given set, ignoring interspersed comments (one line or multi-line). The currently active command character is accepted iff the standard command character = # = Chars.CHAR_COMMAND_DEFAULT is contained in the given set. Then this character is additionally accepted. Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.
- Returns:
- whether one character has been accepted
-
accept_blanks_filtered
public void accept_blanks_filtered() Accepts maximal sequence of whitespace characters, ignoring interspersed comments (one line or multi-line). Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.
-
accept_blanks_noCommentsAllowed
public void accept_blanks_noCommentsAllowed() Accept all blank characters. The command char, comment char, EOF and parenthesis chars are not accepted because they are never contained in any definition of "whitespace characters". Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.
-
read_ident_or_null
Read the maximal sequence starting with a character from the Chars.charset_ident_initial character set, followed by the maximal sequence of characters from Chars.charset_ident_subsequent. Thus newline is not accepted. Only the end pointer is advanced; a new tokenization must be invoked after this explicitly.
-
read_nonblank
Read a maximum sequence of non-blank characters. (Thus newline is not accepted.) Only the end pointer is advanced; a new tokenization must be invoked after this explicitly.
-
accept_one
boolean accept_one(char c) Accept one occurrence of the given character. Newlines are not permitted. Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.
-
accept_one
Accept one character from the given set and advance the input. Newlines are permitted. Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.
-
accept_string_w_o_lineswitch
Advances the input and returns true iff the current input starts with exactly the given string value. Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.
Assume that the argument does not contain any line switching characters and not the end-of-text character ((char)0).
The string is accepted only if it is continuously contained outside of comment, and does not contain the current top-level close parenthesis character.
The currently valid command char is not rejected, i.e. may be contained in the argument and will not be translated.
-
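The core of accept_string_w_o_lineswitch, advancing only on an exact prefix match, can be sketched as follows. The class StringAcceptor is hypothetical. The explicit argument check is an addition of this sketch (the text only states the assumption), and the rules about comments and the top-level close parenthesis are not modeled.

```java
/** Hypothetical prefix acceptance; not the real MemScanner code. */
class StringAcceptor {
    private final String data;
    int start = 0;   // input pointer

    StringAcceptor(String data) { this.data = data; }

    /** Advances past s and returns true iff the input at start begins with s. */
    boolean acceptString(String s) {
        // sketch-only check of the stated precondition: no line switch, no end-of-text char
        if (s.indexOf('\n') >= 0 || s.indexOf('\r') >= 0 || s.indexOf('\0') >= 0) {
            throw new IllegalArgumentException("argument must stay on one line");
        }
        if (!data.startsWith(s, start)) return false;   // pointer stays where it is
        start += s.length();
        return true;
    }
}
```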