eu.bandm.tools.d2d2.base.MemScanner<D>

Type Parameters: D - the document type used in locations. Sensible choices are String, File, XMLDocumentIdentifier, URI, etc.

All Implemented Interfaces: Cloneable

public class MemScanner<D> extends MemString<D>

Basic scanner for d2d text recognition, working on and returning MemString objects.

During work, the "result" fields lastTokenType, lastContent, etc. describe the most recently recognized input token. This token extends up to MemString.start, which is the "next-to-decode" reading position.

Each call to accept() means that the current result has been consumed by the calling layer. The result is therefore forgotten, the next token is decoded at the next-to-read address, that address is advanced, and all result fields are updated.

In addition, there are character-wise "back-door entries" needed for character-level parsing. These partly duplicate parts of the accept() code.
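The accept() protocol described above can be illustrated with a self-contained toy scanner. This is a hypothetical sketch, not the d2d2 implementation: ToyScanner, ToyTokenType and its constants are invented for illustration, tokens are simplified to runs of blank or non-blank characters, and decoding a first token in the constructor is a choice of the sketch.

```java
import java.util.Objects;

/** Hypothetical token categories; the real MemScanner.TokenType constants are not listed here. */
enum ToyTokenType { WORD, BLANK, EOT }

/**
 * Hypothetical sketch of the accept() protocol: the "last*" fields describe the most
 * recently decoded token, and start is the next-to-decode reading position.
 */
final class ToyScanner {
  private final String data;
  private int start = 0;              // next-to-decode reading position
  private ToyTokenType lastTokenType; // result fields of the last decoded token
  private String lastContent;

  ToyScanner(String data) {
    this.data = Objects.requireNonNull(data);
    accept(); // sketch choice: decode a first token so the result fields are valid
  }

  /** Forgets the current result, decodes the next token and advances start past it. */
  void accept() {
    if (start >= data.length()) {
      lastTokenType = ToyTokenType.EOT;
      lastContent = "";
      return;
    }
    boolean blank = Character.isWhitespace(data.charAt(start));
    int end = start;
    // Take the maximal run of characters of the same kind (blank or non-blank).
    while (end < data.length() && Character.isWhitespace(data.charAt(end)) == blank) {
      end++;
    }
    lastTokenType = blank ? ToyTokenType.BLANK : ToyTokenType.WORD;
    lastContent = data.substring(start, end);
    start = end;
  }

  ToyTokenType get_lastTokenType() { return lastTokenType; }

  String get_lastContent() { return lastContent; }
}
```

Every accept() overwrites the result fields, so a caller that still needs an earlier token has to copy its content before calling accept() again.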

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static enum

MemScanner.OpenTagModifier

Categories of the character data immediately following an open tag.

static enum

MemScanner.TokenType

Non-overlapping categories of the recognized tokens.

Nested classes/interfaces inherited from class eu.bandm.tools.d2d2.infra.MemString
MemString.CharMem<D2>, MemString.ReadBehindBufferEndException, MemString.Reader
Field Summary

Fields

Modifier and Type

Field

Description

protected String

commentstring_ml

Cache of the multi-line comment lead-in.

static final CharSet

nonBlank

Convenience cache of all non-blank characters.

Fields inherited from class eu.bandm.tools.d2d2.infra.MemString
CHAR_EOTEXT, CHAR_NEWLINE, colnum, data, len, linenum, start
Constructor Summary

Constructors

Constructor

Description

MemScanner(MemString.CharMem<D> m)

Constructor with a given starting point.

MemScanner(MemString<D> m)

Constructor with a given starting point.
Method Summary

Modifier and Type

Method

Description

void

accept()

Central scanning procedure: (1) throws away old results, (2) consumes more characters and (3) sets get_lastTokenType(), get_lastContent(), get_lastLocation() etc.

void

accept_blanks_filtered()

Accepts a maximal sequence of whitespace characters, ignoring interspersed comments (one-line or multi-line).

void

accept_blanks_noCommentsAllowed()

Accept all blank characters.

int

accept_greedy_filtered_chars(CharSet set)

Accepts a maximal sequence of characters from the given set, ignoring interspersed comments (one-line or multi-line).

(package private) boolean

accept_one(char c)

Accept one occurrence of the given character.

boolean

accept_one(CharSet cs)

Accept one character from the given set and advance the input.

boolean

accept_one_filtered_char(CharSet set)

Accepts one character from the given set, ignoring interspersed comments (one-line or multi-line).

(package private) boolean

accept_string_w_o_lineswitch(String s)

Advances the input and returns true iff the current input starts with exactly the given string value.

MemScanner<D>

copy()

Generate a copy.

@Opt Character

endTagCharacter(String tag)

Returns the currently expected next "close parenthesis" character related to the given tag.

boolean

equals(Object o)

Compares only the fields of the superclass, i.e. compares as a MemString.

String

get_closedTag()

Return the full-text tag in case of lastTokenType == closeChar

MemString<D>

get_lastContent()

Return the character contents of the most recently decoded token.

Location<D>

get_lastLocation()

Return the location area of the most recently decoded token.

MemScanner.TokenType

get_lastTokenType()

Return the token type of the most recent decoding step.

int

get_numericValue()

Return the numeric value of the most recently decoded token.

char

getCommandChar()

Return the currently valid command character.

char

getCommentChar_1()

Return the currently valid first comment character.

@Opt Location<D>

getParenthesisDefinitionLocation(String tag)

Return the location of the most recent opening parenthesis related to the given tag.

int

hashCode()

Hashes only the fields of the superclass, i.e. hashes as a MemString.

MemScanner.OpenTagModifier

markParenthesis(String tag)

Consume an open parenthesis and memorize its counterpart.

Location<D>

popParenthesis()

Remove the next expected parenthesis from all internal stacks.

@Opt String

read_ident_or_null()

Read the maximal sequence starting with a character from the Chars.charset_ident_initial character set, followed by the maximal sequence of characters from Chars.charset_ident_subsequent. Thus newline is not accepted.

@Opt String

read_nonblank()

Read a maximum sequence of non-blank characters.

char

scan_again_skipping_first()

Reset the parser state to what it was before the last accept(), but skip the first character of the last recognized result.

void

setCommandChar(char c)

Interface procedure: change command character.

void

setCommentChar_1(char c)

Interface procedure: change first comment character.

(package private) void

skip_after_char(char c)

Accept all characters up to and including the given character.

void

skip_until_eoline()

Skips all input up to but not including the next line feed character.

Methods inherited from class eu.bandm.tools.d2d2.infra.MemString
adjustLineNumberX, advanceByX_sameLine, advanceX, appendAll, asString, backwardsX, backwardsX, calccolnum, clone, equals, equals, get_colnum, get_data, get_len, get_linenum, get_start, getChar, getChar, getFollowLocation, getLocation, getPositionIndication, initFrom, startsWith, toString, upTo, upToX

Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait

Field Details
- commentstring_ml
  
  protected String commentstring_ml
  
  Cache of the multi-line comment lead-in.
- nonBlank
  
  public static final CharSet nonBlank
  
  Convenience cache of all non-blank characters.
Constructor Details
- MemScanner
  
  public MemScanner(MemString<D> m)
  
  Constructor with a given starting point. ATTENTION: only the point-wise start coordinate of the given MemString is relevant; the effective end point for parsing is the end of the underlying MemString.CharMem.
- MemScanner
  
  public MemScanner(MemString.CharMem<D> m)
  
  Constructor with a given starting point. Start and end point for parsing are the limits of the data of the given CharMem.
Method Details
- copy
  
  public MemScanner<D> copy()
  
  Generate a copy.
- equals
  
  public boolean equals(Object o)
  
  Compares only the fields of the superclass, i.e. compares as a MemString.
  
  Overrides:
  
  equals in class MemString<D>
- hashCode
  
  public int hashCode()
  
  Hashes only the fields of the superclass, i.e. hashes as a MemString.
  
  Overrides:
  
  hashCode in class MemString<D>
- markParenthesis
  
  public MemScanner.OpenTagModifier markParenthesis(String tag)
  Consume an open parenthesis and memorize its counterpart. Must always be called whenever an open tag is recognized (by the code of the higher level), because it consumes the trailing blank character. (Whether an identifier token recognized by this parser is indeed a tag is known only to the caller.) It must be called before the corresponding call to accept().
  This method consumes one character if it is an open parenthesis, and memorizes the matching closing one. It also consumes one character if it is whitespace. These are the allowed combinations:
  - #tag CONTINUE TEXT
  - #tag/CONTINUE TEXT
  - #tag///CONTINUE TEXT
  - #tag//comment (recognized as such, including the lead-in sequence)
  - #tag CONTINUE TEXT
  - #tag(continue text with parenthesis open
  - #tag=continue text including the leading special, non-parenthesis character
  - #tag()xx (same as #tag/xx)
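  The consumption rule above can be sketched as follows. This is a hypothetical illustration, not the d2d2 code: ToyOpenTag, ToyOpenTagModifier and its constants are invented (the real MemScanner.OpenTagModifier constants are not listed on this page), and the parenthesis pairs other than "(" and ")" are assumptions.

  ```java
  import java.util.ArrayDeque;
  import java.util.Deque;
  import java.util.Map;

  /** Hypothetical categories; the real MemScanner.OpenTagModifier constants may differ. */
  enum ToyOpenTagModifier { PARENTHESIS, BLANK, OTHER }

  /** Hypothetical sketch of what markParenthesis() consumes after an open tag. */
  final class ToyOpenTag {
    // Assumed open/close pairs; only "(" is confirmed by the combinations listed above.
    private static final Map<Character, Character> PAIRS = Map.of('(', ')', '[', ']', '{', '}');

    private final String input;
    private int pos; // position directly after the tag identifier
    final Deque<Character> expectedClose = new ArrayDeque<>();

    ToyOpenTag(String input, int posAfterTag) {
      this.input = input;
      this.pos = posAfterTag;
    }

    int pos() { return pos; }

    /** Consumes at most one character following an open tag. */
    ToyOpenTagModifier mark() {
      if (pos >= input.length()) {
        return ToyOpenTagModifier.OTHER;
      }
      char c = input.charAt(pos);
      Character close = PAIRS.get(c);
      if (close != null) {
        // Open parenthesis: consume it and memorize the matching close character.
        expectedClose.push(close);
        pos++;
        return ToyOpenTagModifier.PARENTHESIS;
      }
      if (Character.isWhitespace(c)) {
        // Trailing blank: consume exactly one character.
        pos++;
        return ToyOpenTagModifier.BLANK;
      }
      // Anything else (e.g. '/' or '=') stays in the input for the next accept().
      return ToyOpenTagModifier.OTHER;
    }
  }
  ```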
- popParenthesis
  
  public Location<D> popParenthesis()
  
  Remove the next expected parenthesis from all internal stacks. May only be called when a closing parenthesis has been recognized as the current token.
- endTagCharacter
  
  @Opt public @Opt Character endTagCharacter(String tag)
  
  Returns the currently expected next "close parenthesis" character related to the given tag, or null if no such parenthesis is currently open.
- getParenthesisDefinitionLocation
  
  @Opt public @Opt Location<D> getParenthesisDefinitionLocation(String tag)
  
  Return the location of the most recent opening parenthesis related to the given tag, or null if no such parenthesis is currently open.
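  The interplay of markParenthesis(), endTagCharacter(), getParenthesisDefinitionLocation() and popParenthesis() can be pictured as a stack of open parentheses. This sketch is hypothetical: it uses a single stack instead of the several internal stacks mentioned for popParenthesis(), represents locations as plain "line:column" strings, and assumes that lookups by tag return the most recent matching entry.

  ```java
  import java.util.ArrayDeque;
  import java.util.Deque;

  /** Hypothetical single-stack model of the parenthesis bookkeeping. */
  final class ToyParenStack {
    /** One memorized open parenthesis; location is a hypothetical "line:column" string. */
    private static final class Entry {
      final String tag;
      final char close;
      final String location;

      Entry(String tag, char close, String location) {
        this.tag = tag;
        this.close = close;
        this.location = location;
      }
    }

    private final Deque<Entry> stack = new ArrayDeque<>();

    /** Roughly what markParenthesis() memorizes when it consumes an open parenthesis. */
    void push(String tag, char close, String location) {
      stack.push(new Entry(tag, close, location));
    }

    /** Like endTagCharacter(): expected close char for the most recent opening of tag, or null. */
    Character endTagCharacter(String tag) {
      for (Entry e : stack) { // ArrayDeque iterates from the top (most recent) down
        if (e.tag.equals(tag)) {
          return e.close;
        }
      }
      return null;
    }

    /** Like getParenthesisDefinitionLocation(): where that parenthesis was opened, or null. */
    String getParenthesisDefinitionLocation(String tag) {
      for (Entry e : stack) {
        if (e.tag.equals(tag)) {
          return e.location;
        }
      }
      return null;
    }

    /** Like popParenthesis(): removes the innermost entry; call only after a close was recognized. */
    String popParenthesis() {
      return stack.pop().location;
    }
  }
  ```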
- setCommandChar
  
  public void setCommandChar(char c)
  
  Interface procedure: change the command character. Attention: the character value is not checked for validity; the caller is fully responsible.
- setCommentChar_1
  
  public void setCommentChar_1(char c)
  
  Interface procedure: change the first comment character. Attention: the character value is not checked for validity; the caller is fully responsible.
- getCommandChar
  
  public char getCommandChar()
  
  Return the currently valid command character.
- getCommentChar_1
  
  public char getCommentChar_1()
  
  Return the currently valid first comment character. The second character for multi-line comments is fixed to "*" (Chars.CHAR_COMMENT_2).
- get_lastTokenType
  
  public MemScanner.TokenType get_lastTokenType()
  
  Return the token type of the most recent decoding step.
- get_lastContent
  
  public MemString<D> get_lastContent()
  
  Return the character contents of the most recently decoded token.
- get_numericValue
  
  public int get_numericValue()
  
  Return the numeric value of the most recently decoded token. Meaningful only if get_lastTokenType() == numeric.
- get_closedTag
  
  public String get_closedTag()
  
  Return the full-text tag in case of lastTokenType == closeChar
- get_lastLocation
  
  public Location<D> get_lastLocation()
  
  Return the location area of the most recently decoded token. Attention: while internally columns and lines are 0-based, the exported values in the locations are both 1-based, i.e. larger by one.
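  The off-by-one difference between internal and exported coordinates can be shown with a small hypothetical value class; ToyLocation is invented here and is not the d2d2 Location type.

  ```java
  /** Hypothetical exported location with 1-based line and column numbers. */
  final class ToyLocation {
    final int line;
    final int column;

    private ToyLocation(int line, int column) {
      this.line = line;
      this.column = column;
    }

    /** Converts 0-based internal coordinates (like MemString.linenum/colnum) to 1-based ones. */
    static ToyLocation fromInternal(int linenum, int colnum) {
      return new ToyLocation(linenum + 1, colnum + 1);
    }

    @Override
    public String toString() {
      return line + ":" + column;
    }
  }
  ```

  So the very first character of the input, at internal position (0, 0), is reported as 1:1.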
- accept
  
  public void accept()
  Central scanning procedure: (1) throws away old results, (2) consumes more characters and (3) sets get_lastTokenType(), get_lastContent(), get_lastLocation() etc. accordingly. Never fails.
  Crucial are these priorities:
  - "///" --> forced end quote
  - otherwise "//" --> one-line comment start (if the comment character has not been altered)
  - otherwise "/" --> end quote
  The caller must preserve the following constraints:
  - the command character can never be set to "/" (nor to the current comment character)
  - the comment lead-in character can never be set to "*" (nor to the current command character)
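  The priorities and constraints above can be sketched as a longest-match-first decision plus a caller-side check. ToySlashToken, ToySlashPriority and checkChars are hypothetical names; the sketch covers only the "/" cases listed here and takes the current comment character as a parameter.

  ```java
  /** Hypothetical outcome of looking at the input at a "/" position. */
  enum ToySlashToken { FORCED_END_QUOTE, ONE_LINE_COMMENT, END_QUOTE, NONE }

  final class ToySlashPriority {
    /** Longest match first: "///" beats "//", which beats "/". */
    static ToySlashToken classify(String input, int pos, char commentChar) {
      if (input.startsWith("///", pos)) {
        return ToySlashToken.FORCED_END_QUOTE;
      }
      // "//" starts a one-line comment only while the comment char is still the default '/'.
      if (commentChar == '/' && input.startsWith("//", pos)) {
        return ToySlashToken.ONE_LINE_COMMENT;
      }
      if (input.startsWith("/", pos)) {
        return ToySlashToken.END_QUOTE;
      }
      return ToySlashToken.NONE;
    }

    /** Caller-side check of the constraints the scanner itself does not verify. */
    static void checkChars(char command, char comment) {
      if (command == '/' || command == comment) {
        throw new IllegalArgumentException("invalid command character: " + command);
      }
      if (comment == '*') {
        throw new IllegalArgumentException("invalid comment character: " + comment);
      }
    }
  }
  ```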
- scan_again_skipping_first
  
  public char scan_again_skipping_first()
  
  Reset the parser state to what it was before the last accept(), but skip the first character of the last recognized result. Only the input pointer is advanced; a new tokenization must be invoked after this explicitly, by invoking accept().
  
  Returns:
  
  the first "skipped" character of the input data.
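  The rewind-and-skip behaviour can be sketched with a hypothetical mini scanner. ToyRescan and its simplified tokenization (a run of letters or digits, else a single character) are assumptions made for illustration only.

  ```java
  /** Hypothetical mini scanner used only to illustrate scan_again_skipping_first(). */
  final class ToyRescan {
    private final String data;
    private int start;          // next-to-decode position
    private int lastTokenStart; // where the most recently accepted token began
    private String lastContent = "";

    ToyRescan(String data) { this.data = data; }

    /** Simplified accept(): a maximal run of letters/digits, or else a single character. */
    void accept() {
      lastTokenStart = start;
      if (start >= data.length()) {
        lastContent = "";
        return;
      }
      int end = start + 1;
      if (Character.isLetterOrDigit(data.charAt(start))) {
        while (end < data.length() && Character.isLetterOrDigit(data.charAt(end))) {
          end++;
        }
      }
      lastContent = data.substring(start, end);
      start = end;
    }

    /** Rewinds to the last token, skips its first character and returns that character. */
    char scanAgainSkippingFirst() {
      // Assumes the last token was non-empty; otherwise charAt throws.
      char first = data.charAt(lastTokenStart);
      start = lastTokenStart + 1; // only the input pointer moves; call accept() next
      return first;
    }

    String lastContent() { return lastContent; }

    int start() { return start; }
  }
  ```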
- skip_until_eoline
  
  public void skip_until_eoline()
  
  Skips all input up to but not including the next line feed character. Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.
- skip_after_char
  
  void skip_after_char(char c)
  
  Accept all characters up to and including the given character. Only the input pointer is advanced; a new tokenization must be invoked after this explicitly. Assumes that the character is indeed present; otherwise a MemString.ReadBehindBufferEndException is thrown.
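  Both pointer-only skipping operations can be sketched on a plain String. ToySkip is hypothetical, and its nested ReadBehindEndException merely stands in for MemString.ReadBehindBufferEndException; stopping at the end of the data in skipUntilEoline() is an assumption of the sketch.

  ```java
  /** Hypothetical illustration of skip_until_eoline() and skip_after_char(). */
  final class ToySkip {
    /** Stand-in for MemString.ReadBehindBufferEndException. */
    static final class ReadBehindEndException extends RuntimeException {
      ReadBehindEndException(String msg) { super(msg); }
    }

    private final String data;
    private int start; // input pointer

    ToySkip(String data) { this.data = data; }

    /** Advances up to, but not including, the next '\n' (or to the end of the data). */
    void skipUntilEoline() {
      while (start < data.length() && data.charAt(start) != '\n') {
        start++;
      }
    }

    /** Advances just past the next occurrence of c; fails if c does not occur. */
    void skipAfterChar(char c) {
      int i = data.indexOf(c, start);
      if (i < 0) {
        throw new ReadBehindEndException("character '" + c + "' not found");
      }
      start = i + 1;
    }

    int start() { return start; }
  }
  ```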
- accept_greedy_filtered_chars
  
  public int accept_greedy_filtered_chars(CharSet set)
  
  Accepts a maximal sequence of characters from the given set, ignoring interspersed comments (one-line or multi-line). The currently active command character is accepted iff the default command character '#' (Chars.CHAR_COMMAND_DEFAULT) is contained in the given set; in that case it is accepted in addition to the characters of the set.
  Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.
  
  Returns:
  
  number of accepted characters
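  The filtering can be sketched as follows, under several assumptions: Set<Character> stands in for CharSet, one-line comments run from two comment characters up to (not including) the newline, multi-line comments are closed by "*" followed by the comment character, the "///" special case of accept() is ignored, and skipped comment characters are not counted. ToyFilteredScan is hypothetical.

  ```java
  import java.util.Set;

  /** Hypothetical sketch of accept_greedy_filtered_chars(); Set<Character> stands in for CharSet. */
  final class ToyFilteredScan {
    private final String data;
    private final char commentChar; // first comment character (default '/')
    private final char commandChar; // currently active command character
    private int start;              // input pointer

    ToyFilteredScan(String data, char commentChar, char commandChar) {
      this.data = data;
      this.commentChar = commentChar;
      this.commandChar = commandChar;
    }

    /** Accepts a maximal run of characters from set, skipping comments; returns the count. */
    int acceptGreedyFilteredChars(Set<Character> set) {
      // The active command char counts as a member iff the default '#' is in the set.
      boolean acceptCommand = set.contains('#');
      int count = 0;
      while (start < data.length()) {
        if (skipComment()) {
          continue;
        }
        char c = data.charAt(start);
        if (!set.contains(c) && !(acceptCommand && c == commandChar)) {
          break;
        }
        start++;
        count++;
      }
      return count;
    }

    /** Skips one comment starting at the input pointer, if there is one (assumed syntax). */
    private boolean skipComment() {
      if (start + 1 >= data.length() || data.charAt(start) != commentChar) {
        return false;
      }
      char next = data.charAt(start + 1);
      if (next == commentChar) { // one-line comment: stop before the newline
        int nl = data.indexOf('\n', start);
        start = nl < 0 ? data.length() : nl;
        return true;
      }
      if (next == '*') { // multi-line comment: skip past the assumed closing sequence
        int close = data.indexOf("*" + commentChar, start + 2);
        start = close < 0 ? data.length() : close + 2;
        return true;
      }
      return false;
    }

    int start() { return start; }
  }
  ```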
- accept_one_filtered_char
  
  public boolean accept_one_filtered_char(CharSet set)
  
  Accepts one character from the given set, ignoring interspersed comments (one-line or multi-line). The currently active command character is accepted iff the default command character '#' (Chars.CHAR_COMMAND_DEFAULT) is contained in the given set; in that case it is accepted in addition to the characters of the set.
  Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.
  
  Returns:
  
  whether one character has been accepted
- accept_blanks_filtered
  
  public void accept_blanks_filtered()
  
  Accepts a maximal sequence of whitespace characters, ignoring interspersed comments (one-line or multi-line).
  Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.
- accept_blanks_noCommentsAllowed
  
  public void accept_blanks_noCommentsAllowed()
  
  Accept all blank characters. The command character, the comment character, the end-of-file character and parenthesis characters are not accepted, because they are never contained in any definition of "whitespace characters".
  Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.
- read_ident_or_null
  
  @Opt public @Opt String read_ident_or_null()
  
  Read the maximal sequence starting with a character from the Chars.charset_ident_initial character set, followed by the maximal sequence of characters from Chars.charset_ident_subsequent. Thus newline is not accepted.
  Only the end pointer is advanced; a new tokenization must be invoked after this explicitly.
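  The identifier rule can be sketched with stand-in character classes. ToyIdentReader is hypothetical, and its isInitial/isSubsequent predicates are assumptions replacing Chars.charset_ident_initial and Chars.charset_ident_subsequent, whose exact contents are not given here.

  ```java
  /** Hypothetical sketch of read_ident_or_null(). */
  final class ToyIdentReader {
    private final String data;
    private int pos; // reading position

    ToyIdentReader(String data, int pos) {
      this.data = data;
      this.pos = pos;
    }

    /** Returns the identifier at pos and advances past it, or null (pos unchanged) if none. */
    String readIdentOrNull() {
      if (pos >= data.length() || !isInitial(data.charAt(pos))) {
        return null;
      }
      int end = pos + 1;
      while (end < data.length() && isSubsequent(data.charAt(end))) {
        end++;
      }
      String ident = data.substring(pos, end);
      pos = end;
      return ident;
    }

    // Assumed stand-ins for Chars.charset_ident_initial / charset_ident_subsequent.
    // Neither accepts '\n', matching "newline is not accepted".
    private static boolean isInitial(char c) {
      return Character.isLetter(c) || c == '_';
    }

    private static boolean isSubsequent(char c) {
      return Character.isLetterOrDigit(c) || c == '_' || c == '-';
    }

    int pos() { return pos; }
  }
  ```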
- read_nonblank
  
  @Opt public @Opt String read_nonblank()
  
  Read a maximum sequence of non-blank characters. (Thus newline is not accepted.)
  Only the end pointer is advanced; a new tokenization must be invoked after this explicitly.
- accept_one
  
  boolean accept_one(char c)
  
  Accept one occurrence of the given character. Newlines are not permitted.
  Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.
- accept_one
  
  public boolean accept_one(CharSet cs)
  
  Accept one character from the given set and advance the input. Newlines are permitted.
  Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.
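  The difference between the two accept_one overloads can be sketched as follows. ToyAcceptOne is hypothetical, Set<Character> stands in for CharSet, and refusing a newline argument (rather than, say, throwing) is an interpretation of "Newlines are not permitted" chosen for this sketch; the line counter hints at why consuming a newline needs extra bookkeeping.

  ```java
  import java.util.Set;

  /** Hypothetical contrast of the two accept_one overloads. */
  final class ToyAcceptOne {
    private final String data;
    private int pos;  // input pointer
    private int line; // 0-based line counter, advanced when a newline is consumed

    ToyAcceptOne(String data) { this.data = data; }

    /** Like accept_one(char): this sketch refuses newlines instead of consuming them. */
    boolean acceptOne(char c) {
      if (c == '\n' || pos >= data.length() || data.charAt(pos) != c) {
        return false;
      }
      pos++;
      return true;
    }

    /** Like accept_one(CharSet): any member of the set, newline included, may be consumed. */
    boolean acceptOne(Set<Character> cs) {
      if (pos >= data.length() || !cs.contains(data.charAt(pos))) {
        return false;
      }
      if (data.charAt(pos) == '\n') {
        line++; // consuming a newline must also update the line bookkeeping
      }
      pos++;
      return true;
    }

    int pos() { return pos; }

    int line() { return line; }
  }
  ```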
- accept_string_w_o_lineswitch
  
  boolean accept_string_w_o_lineswitch(String s)
  
  Advances the input and returns true iff the current input starts with exactly the given string value.
  Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.
  Assumes that the argument contains neither line-switching characters nor the end-of-text character ((char)0).
  The string is accepted only if it is contained contiguously outside of any comment, and does not contain the current top-level close parenthesis character.
  The currently valid command char is not rejected, i.e. may be contained in the argument and will not be translated.
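  The acceptance conditions can be sketched as below. ToyAcceptString is hypothetical; it checks the documented precondition eagerly, treats '\n' and '\r' as the line-switching characters, and interprets "outside of comment" as "the argument contains no comment lead-in sequence". As described above, the command character is matched literally and not translated.

  ```java
  /** Hypothetical sketch of accept_string_w_o_lineswitch(). */
  final class ToyAcceptString {
    private final String data;
    private final Character topClose; // current top-level close parenthesis, or null
    private final char commentChar;   // first comment character
    private int pos;                  // input pointer

    ToyAcceptString(String data, Character topClose, char commentChar) {
      this.data = data;
      this.topClose = topClose;
      this.commentChar = commentChar;
    }

    boolean acceptStringWithoutLineswitch(String s) {
      // Documented precondition: no line-switching characters and no end-of-text char (char) 0.
      if (s.indexOf('\n') >= 0 || s.indexOf('\r') >= 0 || s.indexOf((char) 0) >= 0) {
        throw new IllegalArgumentException("argument contains a line switch or end-of-text");
      }
      if (topClose != null && s.indexOf(topClose.charValue()) >= 0) {
        return false; // would run into the current top-level close parenthesis
      }
      if (containsCommentLeadIn(s)) {
        return false; // would start a comment instead of staying in plain text
      }
      if (!data.startsWith(s, pos)) {
        return false; // input does not continue with exactly s
      }
      pos += s.length();
      return true;
    }

    /** True if s contains the comment char followed by itself or by '*'. */
    private boolean containsCommentLeadIn(String s) {
      for (int i = 0; i + 1 < s.length(); i++) {
        char next = s.charAt(i + 1);
        if (s.charAt(i) == commentChar && (next == commentChar || next == '*')) {
          return true;
        }
      }
      return false;
    }

    int pos() { return pos; }
  }
  ```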
