Class MemScanner<D>

java.lang.Object
eu.bandm.tools.d2d2.infra.MemString<D>
eu.bandm.tools.d2d2.base.MemScanner<D>
Type Parameters:
D - the document type used in locations. Sensible choices are String, File, XMLDocumentIdentifier, URI, etc.
All Implemented Interfaces:
Cloneable

public class MemScanner<D> extends MemString<D>
Basic scanner for d2d text recognition, working on and returning MemString objects.

During work, the "result" fields lastTokenType, lastContent, etc. describe the most recently recognized input token. That token extends up to MemString.start, which is the "next-to-decode" reading position.

Each call to accept() signals that the calling layer has consumed the current result info. That info is discarded, the next token is decoded at the next-to-read address, that address is advanced, and all result fields are updated.

Additionally, there are character-wise "back-door entries" needed for character-level parsing. These partly duplicate the logic of accept().
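The one-token lookahead protocol described above can be illustrated with a toy scanner. The following Java code is a minimal sketch, not the MemScanner implementation: the class TokenLookaheadSketch, its two token types and the rule "a token is a maximal run of non-whitespace characters" are hypothetical simplifications. It shows that accept() overwrites the result fields and advances the reading position, whereas a character-level back door only moves the input pointer and leaves the result fields stale until the next accept().

```java
/**
 * Hypothetical toy scanner illustrating the "result fields + accept()" protocol.
 * Tokens are simply maximal runs of non-whitespace characters.
 */
final class TokenLookaheadSketch {
    enum TokenType { WORD, EOF }

    private final String input;
    private int start;               // "next-to-decode" reading position
    private TokenType lastTokenType; // result fields: describe the last decoded token
    private String lastContent;

    TokenLookaheadSketch(String input) {
        this.input = input;
    }

    /** Forget the current result, decode the next token at start, advance start. */
    void accept() {
        while (start < input.length() && Character.isWhitespace(input.charAt(start))) {
            start++;
        }
        int begin = start;
        while (start < input.length() && !Character.isWhitespace(input.charAt(start))) {
            start++;
        }
        // begin == start only happens when the input is exhausted
        lastTokenType = (begin == start) ? TokenType.EOF : TokenType.WORD;
        lastContent = input.substring(begin, start);
    }

    /** Character-level "back door": advances only the input pointer; result fields go stale. */
    char skipOneChar() {
        return input.charAt(start++);
    }

    TokenType lastTokenType() { return lastTokenType; }
    String lastContent() { return lastContent; }
}
```

For input "xy z", after accept() and skipOneChar() the result fields still describe "xy"; only a further accept() yields "z".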

  • Field Details

    • commentstring_ml

      protected String commentstring_ml
      Cache of the multi-line comment lead-in.
    • nonBlank

      public static final CharSet nonBlank
      Convenience cache of all non-blank characters.
  • Constructor Details

    • MemScanner

      public MemScanner(MemString<D> m)
      Constructor with a given starting point. ATTENTION: only the point-wise start coordinate of the given MemString is relevant; the effective end point for parsing is the end of the underlying MemString.CharMem.
    • MemScanner

      public MemScanner(MemString.CharMem<D> m)
      Constructor with a given starting point. Start and end point for parsing are the limits of the data of the given CharMem.
  • Method Details

    • copy

      public MemScanner<D> copy()
      Generate a copy.
    • equals

      public boolean equals(Object o)
      Compares only the fields of the superclass, i.e. compares as a MemString.
      Overrides:
      equals in class MemString<D>
    • hashCode

      public int hashCode()
      Hashes only the fields of the superclass, i.e. hashes as a MemString.
      Overrides:
      hashCode in class MemString<D>
    • markParenthesis

      public MemScanner.OpenTagModifier markParenthesis(String tag)
      Consume an open parenthesis and memorize its counterpart. Must always be called whenever an open tag is recognized (by the code of the higher level), because it consumes the trailing blank char. (Whether an identifier token recognized by this parser is indeed a tag is known only to the caller.) It must be called before the corresponding call to accept().

      This method consumes one(1) char if it is an open-parenthesis, and memorizes the closing one. It consumes one(1) char if it is a whitespace. These are the allowed combinations:

          #  tag CONTINUE TEXT
          #  tag/CONTINUE TEXT
          #  tag///CONTINUE TEXT
          #  tag//comment recognized as such (includes the lead-in sequence)
          #  tag
          CONTINUE TEXT
          #  tag(continue text with parenthesis open 
          #  tag=continue text including leading special, non-paren char 
          #  tag()xx  same as #tag/xx
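The one-character decision that markParenthesis makes directly behind a tag can be sketched as follows. This is a hypothetical illustration only: the class TagSuffixSketch and its Result type are invented for this example, and the set of opening parentheses "([{" with their counterparts ")]}" is an assumption, not the set d2d actually uses. An opening parenthesis is consumed and its counterpart memorized, a whitespace char (including a line feed) is consumed, and any other char such as "/" or "=" is left for the next accept().

```java
/** Hypothetical sketch of the one-character decision markParenthesis makes after a tag. */
final class TagSuffixSketch {
    // Assumed pairing of open/close parentheses; the real set is defined by d2d, not here.
    private static final String OPEN = "([{";
    private static final String CLOSE = ")]}";

    /** Outcome: number of characters consumed and the memorized closing char (or null). */
    static final class Result {
        final int consumed;
        final Character expectedClose;

        Result(int consumed, Character expectedClose) {
            this.consumed = consumed;
            this.expectedClose = expectedClose;
        }
    }

    /** Inspects the character at pos, i.e. directly behind the tag identifier. */
    static Result afterTag(String input, int pos) {
        if (pos >= input.length()) {
            return new Result(0, null);
        }
        char c = input.charAt(pos);
        int k = OPEN.indexOf(c);
        if (k >= 0) {
            return new Result(1, CLOSE.charAt(k)); // "#tag(": consume it, memorize ")"
        }
        if (Character.isWhitespace(c)) {
            return new Result(1, null);            // "#tag text" or "#tag" + newline
        }
        return new Result(0, null);                // "#tag/", "#tag=": left for accept()
    }
}
```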
              
    • popParenthesis

      public Location<D> popParenthesis()
      Remove the next expected parenthesis from all internal stacks. May only be called when a closing parenthesis has been recognized as the current token.
    • endTagCharacter

      @Opt public @Opt Character endTagCharacter(String tag)
      Returns the currently expected next "close parenthesis" character related to the given tag, or null if no such parenthesis is currently open.
    • getParenthesisDefinitionLocation

      @Opt public @Opt Location<D> getParenthesisDefinitionLocation(String tag)
      Return the location of the most recent opening parenthesis related to the given tag, or null if no such parenthesis is currently open.
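The bookkeeping behind markParenthesis, popParenthesis, endTagCharacter and getParenthesisDefinitionLocation can be pictured as a stack of open parentheses searched from the innermost entry outward. The following Java code is a simplified sketch under stated assumptions: the class ParenStackSketch is hypothetical, it keeps a single stack where the real class maintains several internal stacks, and it replaces Location<D> with a plain int offset.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stack of open parentheses; locations are simplified to int offsets. */
final class ParenStackSketch {
    private static final class Open {
        final String tag;
        final char close;
        final int offset;

        Open(String tag, char close, int offset) {
            this.tag = tag;
            this.close = close;
            this.offset = offset;
        }
    }

    private final Deque<Open> stack = new ArrayDeque<>();

    /** Called when an opening parenthesis after tag is recognized. */
    void push(String tag, char close, int offset) {
        stack.push(new Open(tag, close, offset));
    }

    /** Like popParenthesis(): drop the innermost entry, return where it was opened. */
    int pop() {
        return stack.pop().offset; // NoSuchElementException if nothing is open
    }

    /** Like endTagCharacter(tag): innermost expected close char for tag, or null. */
    Character endTagCharacter(String tag) {
        for (Open o : stack) { // ArrayDeque iterates from the top of the stack
            if (o.tag.equals(tag)) {
                return o.close;
            }
        }
        return null;
    }

    /** Like getParenthesisDefinitionLocation(tag): offset of the innermost open, or null. */
    Integer definitionOffset(String tag) {
        for (Open o : stack) {
            if (o.tag.equals(tag)) {
                return o.offset;
            }
        }
        return null;
    }
}
```

Searching from the top means that nested parentheses belonging to the same tag are resolved innermost first.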
    • setCommandChar

      public void setCommandChar(char c)
      Interface procedure: change the command character. Attention: the character value is not checked for validity; the caller is fully responsible.
    • setCommentChar_1

      public void setCommentChar_1(char c)
      Interface procedure: change the first comment character. Attention: the character value is not checked for validity; the caller is fully responsible.
    • getCommandChar

      public char getCommandChar()
      Return the currently valid command character.
    • getCommentChar_1

      public char getCommentChar_1()
      Return the currently valid first comment character. The second character for multi-line comments is fixed to "*" in Chars.CHAR_COMMENT_2.
    • get_lastTokenType

      public MemScanner.TokenType get_lastTokenType()
      Return the token type of the most recent decoding step.
    • get_lastContent

      public MemString<D> get_lastContent()
      Return the character contents of the most recently decoded token.
    • get_numericValue

      public int get_numericValue()
      Return the numeric value of the most recently decoded token. Meaningful only if get_lastTokenType() == numeric.
    • get_closedTag

      public String get_closedTag()
      Return the full-text tag in case of lastTokenType == closeChar.
    • get_lastLocation

      public Location<D> get_lastLocation()
      Return the location area of the most recently decoded token. Attention: while internally columns and lines are 0-based, the exported values in the locations are both 1-based, i.e. larger by one.
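The shift from internal 0-based to exported 1-based coordinates is a plain "+1" on both values. The Java helper below is a hypothetical sketch that derives line and column from a character offset, assuming that only the line feed '\n' switches lines; it does not reproduce how MemString actually tracks positions.

```java
/** Hypothetical helper: converts a 0-based character offset into 1-based line/column. */
final class LocationSketch {
    static int[] exportedLineColumn(String text, int offset) {
        int line = 0;   // internal, 0-based
        int column = 0; // internal, 0-based
        for (int i = 0; i < offset; i++) {
            if (text.charAt(i) == '\n') {
                line++;
                column = 0;
            } else {
                column++;
            }
        }
        return new int[] { line + 1, column + 1 }; // exported: both larger by one
    }
}
```

For the text "ab\ncd", offset 4 (the "d") is internally line 1, column 1, and is exported as line 2, column 2.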
    • accept

      public void accept()
      Central scanning procedure: (1) throws away old results, (2) consumes more characters and (3) sets get_lastTokenType(), get_lastContent(), get_lastLocation() etc. accordingly. Never fails.

      Crucial are these priorities:

                       "///" --> forced end quote      
         otherwise     "//" --> one-line comment start (if comment char has not been altered.)
         otherwise     "/" --> end quote
         
      The command char can never be set to "/" (nor to the current comment char).
      The comment lead-in char can never be set to "*" (nor to the current command char).
      (This invariant must be maintained by the caller, since the setters perform no checks.)
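The slash priorities listed above amount to trying the longest lead-in first. The Java code below is a minimal sketch of that ordering; the class SlashPrioritySketch and its Kind enum are hypothetical, and it assumes, as the priority list implies, that the unaltered comment char is "/". If the comment char has been changed, "//" no longer starts a comment and the first "/" is classified as an end quote.

```java
/** Hypothetical sketch of the "///" > "//" > "/" priority used by accept(). */
final class SlashPrioritySketch {
    enum Kind { FORCED_END_QUOTE, LINE_COMMENT, END_QUOTE, OTHER }

    /** Classifies the input at pos, given the currently active comment char. */
    static Kind classify(String in, int pos, char commentChar) {
        if (in.startsWith("///", pos)) {
            return Kind.FORCED_END_QUOTE;     // longest match wins
        }
        if (commentChar == '/' && in.startsWith("//", pos)) {
            return Kind.LINE_COMMENT;         // only if the comment char is unaltered
        }
        if (in.startsWith("/", pos)) {
            return Kind.END_QUOTE;
        }
        return Kind.OTHER;
    }
}
```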
    • scan_again_skipping_first

      public char scan_again_skipping_first()
      Reset the parser state to what it was before the last accept(), but skip the first character of the last recognized result. Only the input pointer is advanced; a new tokenization must be invoked explicitly afterwards, by invoking accept().
      Returns:
      the first "skipped" character of the input data.
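Resetting to the start of the last token plus one character lets a caller peel a multi-character token apart one char at a time. The Java code below is a hypothetical sketch: the class RescanSketch and its toy tokenization rule (a token is a maximal run of identical characters) are invented for illustration and stand in for the real accept().

```java
/** Hypothetical sketch of "rescan, skipping the first char of the last token". */
final class RescanSketch {
    private final String input;
    private int start;          // next-to-decode position
    private int lastTokenStart; // where the last recognized token began

    RescanSketch(String input) {
        this.input = input;
    }

    /** Toy accept(): a token is a maximal run of identical characters. */
    String accept() {
        lastTokenStart = start;
        if (start >= input.length()) {
            return "";
        }
        char c = input.charAt(start);
        while (start < input.length() && input.charAt(start) == c) {
            start++;
        }
        return input.substring(lastTokenStart, start);
    }

    /** Move the input pointer behind the first char of the last token; return that char. */
    char scanAgainSkippingFirst() {
        start = lastTokenStart + 1;
        return input.charAt(lastTokenStart);
    }
}
```

On "///x", the first accept() yields "///"; after scanAgainSkippingFirst() the next accept() yields "//", and repeating the step yields "/" and then "x".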
    • skip_until_eoline

      public void skip_until_eoline()
      Skips all input up to but not including the next line feed character. Only the input pointer is advanced; a new tokenization must be invoked explicitly afterwards.
    • skip_after_char

      void skip_after_char(char c)
      Accept all characters up to and including the given character. Only the input pointer is advanced; a new tokenization must be invoked after this explicitly. Assumes that that character is indeed present. Otherwise it throws a MemString.ReadBehindBufferEndException.
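The two skipping primitives differ in whether the delimiter is consumed and in how a missing delimiter is handled. The Java code below is a hypothetical sketch working on plain String positions: SkipSketch and ReadBehindEndException are invented stand-ins (the latter for MemString.ReadBehindBufferEndException), and stopping at the end of input when no line feed follows is an assumption about skip_until_eoline, not documented behavior.

```java
/** Hypothetical sketch of the two skipping primitives on plain String positions. */
final class SkipSketch {
    /** Stand-in for MemString.ReadBehindBufferEndException (hypothetical here). */
    static final class ReadBehindEndException extends RuntimeException {
        ReadBehindEndException(String msg) {
            super(msg);
        }
    }

    /** Position of the next '\n' (not consumed); assumed: end of input if there is none. */
    static int skipUntilEoline(String in, int pos) {
        int nl = in.indexOf('\n', pos);
        return nl < 0 ? in.length() : nl;
    }

    /** Position just behind the next occurrence of c; throws if c does not occur. */
    static int skipAfterChar(String in, int pos, char c) {
        int i = in.indexOf(c, pos);
        if (i < 0) {
            throw new ReadBehindEndException("'" + c + "' not found after position " + pos);
        }
        return i + 1;
    }
}
```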
    • accept_greedy_filtered_chars

      public int accept_greedy_filtered_chars(CharSet set)
      Accepts the maximal sequence of characters from the given set, ignoring interspersed comments (one-line or multi-line). The currently active command character is accepted iff the default command character # (Chars.CHAR_COMMAND_DEFAULT) is contained in the given set; in that case it is accepted in addition to the members of the set.

      Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.

      Returns:
      number of accepted characters
    • accept_one_filtered_char

      public boolean accept_one_filtered_char(CharSet set)
      Accepts one character from the given set, ignoring interspersed comments (one-line or multi-line). The currently active command character is accepted iff the default command character # (Chars.CHAR_COMMAND_DEFAULT) is contained in the given set; in that case it is accepted in addition to the members of the set.

      Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.

      Returns:
      whether one character has been accepted
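The comment-transparent filtering of accept_greedy_filtered_chars and accept_one_filtered_char, including the special treatment of the command character, can be sketched as follows. This Java code is a hypothetical illustration, not the library code: FilteredAcceptSketch is invented, a multi-line comment is assumed to run from the comment char followed by "*" up to "*" followed by the comment char, and restoring the pointer when no character is accepted (so that no trailing comment is swallowed) is a design choice of this sketch.

```java
import java.util.Set;

/**
 * Hypothetical sketch of comment-transparent, set-filtered acceptance.
 * Assumes "cc" starts a one-line comment and "c*" ... "*c" a multi-line one,
 * where c is the current comment char.
 */
final class FilteredAcceptSketch {
    static final char DEFAULT_COMMAND = '#'; // plays the role of Chars.CHAR_COMMAND_DEFAULT

    private final String in;
    private final char commandChar; // currently active command char
    private final char commentChar; // currently active comment lead-in char
    int pos;                        // input pointer

    FilteredAcceptSketch(String in, char commandChar, char commentChar) {
        this.in = in;
        this.commandChar = commandChar;
        this.commentChar = commentChar;
    }

    /** Moves pos over any comments starting at pos (a line comment stops before '\n'). */
    private void skipComments() {
        while (pos + 1 < in.length() && in.charAt(pos) == commentChar) {
            char next = in.charAt(pos + 1);
            if (next == commentChar) {
                int nl = in.indexOf('\n', pos);
                pos = nl < 0 ? in.length() : nl;
            } else if (next == '*') {
                int end = in.indexOf("*" + commentChar, pos + 2);
                pos = end < 0 ? in.length() : end + 2;
            } else {
                return;
            }
        }
    }

    /** The active command char counts as a member iff '#' is in the set. */
    private boolean member(char c, Set<Character> set) {
        return c == commandChar ? set.contains(DEFAULT_COMMAND) : set.contains(c);
    }

    /** Tries one character; on failure pos is restored, so no comment is consumed. */
    boolean acceptOne(Set<Character> set) {
        int saved = pos;
        skipComments();
        if (pos < in.length() && member(in.charAt(pos), set)) {
            pos++;
            return true;
        }
        pos = saved;
        return false;
    }

    /** Greedy variant: returns the number of accepted characters (comments not counted). */
    int acceptGreedy(Set<Character> set) {
        int count = 0;
        while (acceptOne(set)) {
            count++;
        }
        return count;
    }
}
```

With the command char set to "@", the input "@@#" is fully accepted by the set {#}, whereas the set {@} accepts nothing, mirroring the "iff" rule above.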
    • accept_blanks_filtered

      public void accept_blanks_filtered()
      Accepts the maximal sequence of whitespace characters, ignoring interspersed comments (one-line or multi-line).

      Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.

    • accept_blanks_noCommentsAllowed

      public void accept_blanks_noCommentsAllowed()
      Accept all blank characters. The command char, comment char, EOF and parenthesis chars are never accepted, because they are not contained in any definition of "whitespace characters".

      Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.

    • read_ident_or_null

      @Opt public @Opt String read_ident_or_null()
      Read an identifier: a character from the Chars.charset_ident_initial character set, followed by the maximal sequence of characters from Chars.charset_ident_subsequent. Thus newline is not accepted.

      Only the end pointer is advanced; a new tokenization must be invoked after this explicitly.
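Identifier reading is a classic "one initial char, then a maximal run of subsequent chars" loop. The Java code below is a sketch with assumed character sets: IdentSketch is hypothetical, and its isInitial/isSubsequent predicates (letters or "_", then letters, digits or "_") merely stand in for Chars.charset_ident_initial and Chars.charset_ident_subsequent, whose actual contents are not given here.

```java
/** Hypothetical sketch of reading an identifier or returning null. */
final class IdentSketch {
    private final String in;
    int pos; // input pointer

    IdentSketch(String in) {
        this.in = in;
    }

    // Stand-ins for Chars.charset_ident_initial / charset_ident_subsequent (assumed contents).
    private static boolean isInitial(char c) {
        return Character.isLetter(c) || c == '_';
    }

    private static boolean isSubsequent(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    /** Returns the identifier at pos and advances past it, or null if none starts there. */
    String readIdentOrNull() {
        if (pos >= in.length() || !isInitial(in.charAt(pos))) {
            return null;
        }
        int begin = pos++;
        while (pos < in.length() && isSubsequent(in.charAt(pos))) {
            pos++;
        }
        return in.substring(begin, pos);
    }
}
```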

    • read_nonblank

      @Opt public @Opt String read_nonblank()
      Read a maximal sequence of non-blank characters. (Thus newline is not accepted.)

      Only the end pointer is advanced; a new tokenization must be invoked after this explicitly.

    • accept_one

      boolean accept_one(char c)
      Accept one occurrence of the given character. Newlines are not permitted.

      Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.

    • accept_one

      public boolean accept_one(CharSet cs)
      Accept one character from the given set and advance the input. Newlines are permitted.

      Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.

    • accept_string_w_o_lineswitch

      boolean accept_string_w_o_lineswitch(String s)
      Advances the input and returns true iff the current input starts with exactly the given string value.

      Only the input pointer is advanced; a new tokenization must be invoked after this explicitly.

      Assumes that the argument contains neither line-switching characters nor the end-of-text character ((char)0).

      The string is accepted only if it is contained contiguously outside of any comment and does not contain the current top-level close parenthesis character.

      The currently valid command char is not rejected, i.e. may be contained in the argument and will not be translated.
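The acceptance conditions of accept_string_w_o_lineswitch can be summarized as: precondition on the argument, no top-level close parenthesis inside it, and a literal prefix match. The Java code below is a hypothetical sketch of exactly these checks; AcceptStringSketch is invented, it returns the new position or -1 instead of mutating a scanner, and it deliberately does not model the "outside of any comment" condition. Because the match is literal, a command char in the argument is compared as-is, not translated.

```java
/** Hypothetical sketch of the acceptance conditions of accept_string_w_o_lineswitch. */
final class AcceptStringSketch {
    /**
     * Returns the new input position if in starts with s at pos, or -1 otherwise.
     * topClose is the current top-level close parenthesis (null if none is open).
     * Comment detection inside the matched region is not modeled.
     */
    static int acceptString(String in, int pos, String s, Character topClose) {
        if (s.indexOf('\n') >= 0 || s.indexOf('\r') >= 0 || s.indexOf('\0') >= 0) {
            throw new IllegalArgumentException("argument must not contain line switches or (char)0");
        }
        if (topClose != null && s.indexOf(topClose.charValue()) >= 0) {
            return -1; // would swallow the close parenthesis that ends the current scope
        }
        return in.startsWith(s, pos) ? pos + s.length() : -1;
    }
}
```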