Class Text2Udom

java.lang.Object
eu.bandm.tools.d2d2.base.Text2Udom

public class Text2Udom extends Object
Parse a d2d text input into xml nodes.
  • Field Details

    • default_ErrorStrategy

      public static final Text2Udom.ErrorStrategy default_ErrorStrategy
      Evident. See the default field values of this class.
    • interactive_ErrorStrategy

      public static final Text2Udom.ErrorStrategy interactive_ErrorStrategy
      Evident. Allows partial documents and prints the stack context of a parsing error to the console.
    • non_interactive_ErrorStrategy

      public static final Text2Udom.ErrorStrategy non_interactive_ErrorStrategy
      Used for situations when basic diagnosis data is required by the "programmer" of some using code, not by an interactive user.
    • scanner

      protected MemScanner<String> scanner
      Single source of text input.
    • msg

      protected final MessageReceiver<SimpleMessage<String>> msg
      Target of most messages.
    • messageCounter

      protected final MessageCounter messageCounter
      Target of most messages.
    • xml_msg

      Target of messages for some called classes which generate XML-file-locations.
    • errorStrategy

      protected final Text2Udom.ErrorStrategy errorStrategy
      Applied error strategy, different for interactive and programmatic use, etc.
    • charSetCalc

      protected final Navigate.CharSetCalc charSetCalc
      Instance to evaluate character expressions. Maintains a cache.
    • traceLevel

      protected final int traceLevel
      Determines the verbosity:
        ==0 stands for complete silence
        ==1 stands for minimal output: few loggings and not all warnings
        ==2, 3 stands for more output: loggings and warnings
        ==10 shows some synthesized source texts
        ==20 stands for full debugging
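      The documented levels can be read as thresholds. The following is a minimal sketch under that assumption. The class TraceLevelSketch and its methods are hypothetical and not part of Text2Udom, and the handling of values between the documented ones (e.g. 4 to 9) is a guess.

```java
// Hypothetical helper, not part of eu.bandm.tools.d2d2.base.
final class TraceLevelSketch {

    /** Mirrors the documented rule "emits warning iff traceLevel > 0". */
    static boolean emitsWarnings(int traceLevel) {
        return traceLevel > 0;
    }

    /**
     * Maps a trace level to one of the documented verbosity classes.
     * ASSUMPTION: undocumented in-between values fall into the next lower class.
     */
    static String describe(int traceLevel) {
        if (traceLevel <= 0) return "silent";
        if (traceLevel == 1) return "minimal";
        if (traceLevel < 10) return "more output";
        if (traceLevel < 20) return "synthesized source texts";
        return "full debugging";
    }
}
```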
    • meta_warning

      protected Definition meta_warning
    • meta_location

      protected Definition meta_location
    • meta_messageText

      protected Definition meta_messageText
    • meta_kind

      protected Definition meta_kind
    • meta_tag

      protected Definition meta_tag
    • meta_parsingError

      protected Definition meta_parsingError
    • meta_expected

      protected Definition meta_expected
    • meta_skipped

      protected Definition meta_skipped
    • meta_assumedXsltOutput

      protected Definition meta_assumedXsltOutput
    • textFileHeader

      final TextFileHeader textFileHeader
      The result of parsing the input header. Globally needed only by getXsltInputNamespaces() and "getXsltMode()".
    • mode

      Kind of the text to parse.
    • top_result

      protected ResultingStructure top_result
      Toplevel result of parsing the input text. Set once when opening the State for the first tag. Globally needed for error handling.
    • xslt_alt_ubiquituous

      @Opt protected @Opt Expression xslt_alt_ubiquituous
      Contains all xslt tags which can appear anywhere in the target elements. This field is != null iff xslt mode.
    • xslt_alt_ubiquituous_repeated

      protected Expression xslt_alt_ubiquituous_repeated
      Contains all xslt tags which can appear anywhere in the target elements. This field is != null iff xslt mode.
    • xslt_module

      protected Module xslt_module
      The loaded xslt module. Only needed for isXsltDef(eu.bandm.tools.d2d2.model.Definition). This field is != null iff xslt mode.
    • state

      protected State state
      The currently growing stackframe, see State.
    • leading_ws

      protected List<ResultingChars> leading_ws
      Accumulator for ws, not yet clear whether to deliver or to ignore.
    • currentState

      protected Text2Udom.parsingState currentState
      Current micro-state, values from Text2Udom.parsingState.
    • verbatimSuppress

      protected int verbatimSuppress
      How many command characters in verbatim mode will not generate a warning.
    • interruptedState

      protected Text2Udom.parsingState interruptedState
      Needed to treat setcommand, setcomment and numeric input (i.e. pseudo-tags, which look like a tag, but do not behave so) like whitespace = resuming the interrupted state.
    • skipContainer

      protected ResultingStructure skipContainer
      For reporting skipped input.
    • lastCloseWasXslt

      protected boolean lastCloseWasXslt
      Whether we come from an xslt element. Any explicitly closed xslt element is no longer in the stack, but a subsequent target-language element must nevertheless be treated as coming from "inxslt".
    • allSortedKeys

      protected static final Map<Enumeration,String[]> allSortedKeys
    • MEMSTRING_ID_SYNTHETIC

      public static final String MEMSTRING_ID_SYNTHETIC
    • synthLocation

      public static final Location<String> synthLocation
    • bicLoc

      protected Location<String> bicLoc
    • unicodeResults

      final Map<Integer,ResultingChars> unicodeResults
  • Constructor Details

  • Method Details

    • P

      public static void P(String s)
    • getXsltInputNamespaces

      public Map<String,String> getXsltInputNamespaces()
      Get all xslt input name spaces, as defined in the header of the input file. Valid result can be obtained only after successful parsing.
    • isXsltMode

      boolean isXsltMode()
      Whether to parse an XSLT source.
    • isXsltDef

      protected boolean isXsltDef(Definition def)
      Whether a definition is an xslt element.
    • fromFile

      @Opt public @Opt ResultingStructure fromFile(String s, ModuleRegistry moduleRegistry)
      Parse the contents of the file found at the given location. The calling graph of the main scanner functions is:
        fromFile(String filename,..) 
          -> fromFile(File,..)
             -> fromMemString(locationText,MemString,..)
                -> scanForHeader
                -> Starter.install
                -> digest() 
                    // Starting in "nothing open" mode, no verbatim or charparser
                    // is supported as top-level element
        
    • fromFile

      @Opt public @Opt ResultingStructure fromFile(File f, ModuleRegistry moduleRegistry)
      Parse the contents of the given file.
    • fromFile

      @Opt public @Opt ResultingStructure fromFile(String locationText, File f, ModuleRegistry moduleRegistry)
      Parse the contents of the given file, using the given location text for all error messages.
    • fromReader

      @Opt public @Opt ResultingStructure fromReader(String locationText, Reader r, ModuleRegistry moduleRegistry)
      Parse the contents of the given reader, using the given location text for all error messages.
    • fromMemString

      @Opt public @Opt ResultingStructure fromMemString(String locationText, MemString<String> text, ModuleRegistry moduleRegistry)
      Parse the contents of the given MemString object, which includes a text type declaration header.
      Parameters:
      locationText - use this in error messages.
    • fromMemString

      @Opt public @Opt ResultingStructure fromMemString(String locationText, MemString<String> text, ModuleRegistry moduleRegistry, @Opt @Opt XRegExp toplevelXRegExp, Text2Udom.modes mode)
      Parse the contents of the given MemString object, containing just the text body. The text format etc. are already known and fixed, and are given as argument values.
      Parameters:
      locationText - use this in error messages.
      text - to parse
      moduleRegistry - used to load the xslt code and the meta module.
      toplevelXRegExp - the top regexp of the target, either directly or as target indication for the xslt source to parse. (== null iff mode == xsltText)
      mode - whether to parse XML or XSLT or XSLT for mere text
    • warning

      protected void warning(Location<String> loc, String text, Object... args)
      Emits warning iff traceLevel > 0.
    • warning

      protected void warning(String text, Object... args)
      Emits warning iff traceLevel > 0.
    • hint

      protected void hint(Location<String> loc, String text, Object... args)
      Emits hint iff traceLevel > 0.
    • logStart

      protected void logStart(String text, Object... args)
      Emits message iff traceLevel > 0.
    • logEnd

      protected void logEnd(String text, Object... args)
      Emits message iff traceLevel > 0.
    • hint_xml

      protected void hint_xml(Location<XMLDocumentIdentifier> loc, String text, Object... args)
      Emits message iff traceLevel > 0.
    • error

      protected void error(Location<String> loc, String text, Object... args)
      Emits an error message and further context information depending on the values in errorStrategy.
    • error

      protected void error(String text, Object... args)
    • failure

      protected void failure(Location<?> loc, String text, Object... args)
      Throws a corresponding message exception.
    • loadMetaDefinition

      protected Definition loadMetaDefinition(Map<String,Definition> defs, String name)
      Find the definition from the loaded metamodule.
    • loadMetaModule

      protected void loadMetaModule(ModuleRegistry moduleRegistry)
      Load the general d2d meta module and find the required definitions. Store them into the global constant fields "meta_warning", "meta_location", "meta_messageText", etc., for later use. This is required for xslt mode and when "partially correct documents" are allowed.
    • digest

      protected void digest()
      Main loop, consumes input data according to current "micro"-state currentState. Only called once, from fromMemString(String,MemString,ModuleRegistry). Calls the digest_<> delegate functions according to currentState. These are only called from here and could be in-lined. They mostly do not change micro-state themselves, but call process_open_tag(String,boolean), process_close_tag(String,boolean) or process_close_char(). These can change the micro-state, which has consequences when re-entering this "digest" loop.

      Each loop pass (and each delegate method) ends with calling scanner.accept(). Therefore, when each loop pass (and each delegate method) starts, the next-to-consume token is already lexically recognized and ready for consumption.
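      The dispatch-by-micro-state pattern described here can be illustrated with a toy model. This is only a sketch: the class DigestLoopSketch, its MicroState enum, the one-character "tokens" and the fixed command character '#' are all assumptions made for illustration, and the real delegates handle far more cases (close tags, verbatim mode, error recovery).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a "digest"-style main loop. All names are hypothetical.
final class DigestLoopSketch {
    enum MicroState { NOTHING_OPEN, LOOK_FOR_TAG, CONSUME_CHARACTERS, DONE }

    static final char COMMAND_CHAR = '#';   // ASSUMPTION: fixed command character

    private final String input;
    private int pos = 0;
    private int current;                    // next-to-consume character, -1 at end of input
    private MicroState state = MicroState.NOTHING_OPEN;
    final List<String> events = new ArrayList<>();

    DigestLoopSketch(String input) {
        this.input = input;
        accept();                           // recognize the first token before the loop starts
    }

    /** Advances the toy scanner by one character. */
    private void accept() {
        current = pos < input.length() ? input.charAt(pos++) : -1;
    }

    /** Main loop: dispatches to one delegate per micro-state until the input ends. */
    void digest() {
        while (state != MicroState.DONE) {
            switch (state) {
                case NOTHING_OPEN:       digestNothingOpen();       break;
                case LOOK_FOR_TAG:       digestLookForTag();        break;
                case CONSUME_CHARACTERS: digestConsumeCharacters(); break;
                default: throw new IllegalStateException(state.toString());
            }
        }
    }

    private void digestNothingOpen() {
        if (current == -1) {
            state = MicroState.DONE;
        } else if (current == COMMAND_CHAR) {
            state = MicroState.LOOK_FOR_TAG;
            accept();
        } else if (Character.isWhitespace(current)) {
            accept();                       // the toy simply drops whitespace between elements
        } else {
            state = MicroState.CONSUME_CHARACTERS;  // leave the character for the next delegate
        }
    }

    private void digestLookForTag() {
        StringBuilder name = new StringBuilder();
        while (current != -1 && Character.isLetter(current)) {
            name.append((char) current);
            accept();
        }
        events.add(name.length() == 0 ? "error" : "open:" + name);
        state = MicroState.NOTHING_OPEN;
    }

    private void digestConsumeCharacters() {
        StringBuilder text = new StringBuilder();
        while (current != -1 && current != COMMAND_CHAR) {
            text.append((char) current);
            accept();
        }
        events.add("text:" + text);
        state = MicroState.NOTHING_OPEN;
    }
}
```

      In this toy, every delegate returns with the next character already stored in the scanner field, so the loop can decide the next micro-state without looking ahead.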

    • digest_nothing_open

      protected void digest_nothing_open()
      Initial (micro-)state, or after an explicit close. Switches currentState on command-char and on non-whitespace character data, which implies the invisible "char-data-tag". If the current top expression does NOT accept characters, then non-whitespace is implicitly tagged as such, and that tag is searched upward, as usual. Whitespace, however, is accumulated and delayed for possible delivery, until this happens. If the current top expression DOES accept characters, then whitespace is immediately treated as valid character data and delivered.
    • digest_look_for_tag

      protected boolean digest_look_for_tag()
      After a command char has been read. An open tag or (two kinds of) close tag may follow. Whitespace, comments and further command chars are discarded, other chars lead to an error.
      Returns:
      whether end-of-input token (=pseudo tag) has been accepted.
    • process_open_tag

      protected boolean process_open_tag(String tag, boolean isCharData)
      Only method for processing an "open" tag. Is called from digest_nothing_open() for character data, from digest_look_for_tag() for an explicit tag, and from digest_verbatim().
      ATTENTION: must be called in an "un-accepted" state of MemScanner, because next may come an "open tag modifier", which has to be decoded explicitly, not by MemScanner.accept(). This may be followed by characters which are consumed by a "chars parser" or an "enum" definition, again not by the MemScanner! In these cases the outer state machine is not altered, but returned to.
      In case of a non-empty "tags parser", the state stack is extended accordingly by one level (or two levels in case of "#implicit".)
      In case of xslt-mode, the state machine is
            startInXslt=startXslt // whether current frame is an xslt
               | 
               | ascend into non-xslt, and NO xslt components are missing
               |    // if xslt itself is incomplete (CANNOT HAPPEN currently!), 
               |    //   then treat ALL missings as ERROR.
               V
            found_weakMode=true   // use "weak_firsts" modified director map
               |                  //   interpret all missing as "assumed"
               | 
               | cross open element boundary 
               V 
            all false, normal parsing // report "assumed" and collect all FURTHER
                                      //   missings as errors
         
    • process_char_parser_error

      protected void process_char_parser_error(String tag, MemString<String> charsStart)
      Is called after a non-acceptance by a char parser or by an enumeration has been detected. If no "partial doc output" is enabled: throw exception and abort parsing. Otherwise, add diagnostic outputs with tags from "d2d-meta" into the result udom, set up the skip container and switch the parser state to Text2Udom.parsingState.skip_for_command.
    • sortKeys

      protected static String[] sortKeys(Enumeration e)
    • consume_enumeration

      @Opt protected static @Opt ResultingStructure consume_enumeration(MemScanner<String> scan, Enumeration etype)
      Used by tag parsers AND character parsers (with multiple parallel scanners !)
    • process_close_char

      protected void process_close_char()
      Called from digest_nothing_open/look_for_tag/consume_chars, in case a closing parenthesis character has been recognized. Resets the scanner and simulates the reaction to a "complete" close tag.
    • process_close_tag

      protected void process_close_tag(String tag, boolean force)
      Called by digest_look_for_tag, process_open_tag (in case of empty element declarations), process_close_char. The tag may be null for an unspecified "close lowest element".
    • process_close_tag_inner

      protected void process_close_tag_inner(@Opt @Opt String tag, boolean force, Location<String> closeTagDefLoc)
      Search the state space for "nexttag" as possible close tag. Then adjust the stack accordingly. Whenever obligate entries are missing in between, then store them to "found_missing" and execute the error diagnosis.
      Parameters:
      tag - the tag of the element to close, or null to close the last opened element.
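      The search-and-pop behaviour described above can be sketched on a simplified stack. Everything here is an assumption for illustration: the class CloseTagSketch, its Frame type and the representation of obligatory content as a list of names are hypothetical stand-ins, not the real State classes.

```java
import java.util.Deque;
import java.util.List;

// Hypothetical stand-ins for the parser's stack frames.
final class CloseTagSketch {
    static final class Frame {
        final String tag;
        final List<String> stillObligatory;   // content this frame still requires
        Frame(String tag, List<String> stillObligatory) {
            this.tag = tag;
            this.stillObligatory = stillObligatory;
        }
    }

    /**
     * Closes the frame named tag, or the most recently opened frame if tag is null.
     * Every frame popped on the way adds its unfulfilled obligatory content to
     * foundMissing. Returns false and leaves the stack untouched if no frame matches.
     */
    static boolean close(Deque<Frame> stack, String tag, List<String> foundMissing) {
        if (stack.isEmpty()) return false;
        boolean found = (tag == null);
        if (!found) {
            for (Frame f : stack) {           // iterates from the top of the stack downwards
                if (f.tag.equals(tag)) { found = true; break; }
            }
        }
        if (!found) return false;
        while (true) {
            Frame top = stack.pop();
            foundMissing.addAll(top.stillObligatory);
            if (tag == null || top.tag.equals(tag)) return true;
        }
    }
}
```

      With a stack doc > section > p, closing "section" pops two frames and reports whatever "p" and "section" still required; closing an unknown tag changes nothing.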
    • return_to_upper_input_mode

      protected void return_to_upper_input_mode()
      Called to re-enter the input mode (verbatim or nothing_open) after an explicit close (tags had been opened and closed) or an implicit close (enum or character parser have consumed enough) has happened.
      Assume the stack frame of the parser/enum/closed tag is already popped from the stack.
      Assume that "inputmode verbatim" is only allowed with "content = (#PCDATA|..)*". TYPECHECKER MISSING FIXME.
    • insertPlainChars

      public static void insertPlainChars(ResultingStructure host, Definition tag, String chars)
      Aux function to insert synthesized character data in the output, which is NOT contained in the source, e.g. error messages.

      DISLOC FIXME (better a general solution like "IncludingCharBuffer" etc.)

      Parameters:
      host - the toplevel element
      tag - the definition for the content
      chars - the content
    • makeGlobalSkipContainer

      protected void makeGlobalSkipContainer(ResultingStructure errormsg)
    • report_assumed_xslt_output

      protected void report_assumed_xslt_output(CheckedList<Expression> misslist)
      Is called when a target-language open/close tag has been found after an xslt template call, to report the necessary expansion. Or as soon as the possible range of the template's output (one content model) is left, so that all further missing elements will be treated as normal "missing errors".
    • report_missing_elements

      protected void report_missing_elements(boolean isOpen, String tag, boolean frameFound, CheckedList<Expression> misslist, List<SimpleMessage<String>> messlist, Location<String> closeTagDefLoc)
      This method can be called from process_open_tag(String, boolean) or process_close_tag(String, boolean).
      Basically, there are two classes of error recovery:
      1. framefound = true ==> the tag is KNOWN, but tags before are missing, --- then continue with a shrunk stack state.
      2. framefound = false ==> the tag is NOT KNOWN, then discard all input up to the next tag and try again.

      General issue: these "Errors" behave like "Warnings" in case of "partialDocs==true".

      FIXME DocumentError should have TYPED fields like expression ,etc.
      FIXME parameter for controlling the kind of reaction MISSING FIXME
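      The two recovery classes and the "partialDocs" rule can be summarised as a small decision sketch. The class MissingElementsSketch and its enums are hypothetical names introduced only for illustration; the real method also builds messages and meta elements, which are omitted here.

```java
// Hypothetical decision helper, not part of Text2Udom.
final class MissingElementsSketch {
    enum Recovery { CONTINUE_WITH_SHRUNK_STACK, SKIP_TO_NEXT_TAG }
    enum Severity { ERROR, WARNING }

    /** Case 1: the tag is known (frame found). Case 2: the tag is not known. */
    static Recovery recovery(boolean frameFound) {
        return frameFound ? Recovery.CONTINUE_WITH_SHRUNK_STACK
                          : Recovery.SKIP_TO_NEXT_TAG;
    }

    /** With partial documents allowed, the "errors" behave like warnings. */
    static Severity severity(boolean partialDocs) {
        return partialDocs ? Severity.WARNING : Severity.ERROR;
    }
}
```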
    • digest_consume_characters

      protected void digest_consume_characters()
      When an element may receive character data, and some non-ws character data has been recognized at the reading position. Discards comments, delivers text, whitespace and other chars, and switches state on a command char.
    • ignore_superfluous_end_tag

      protected void ignore_superfluous_end_tag(String tag, boolean defIsEmpty, Definition def, Location<String> startLoc)
      Called after an empty tag, a parser or an enum have been consumed. Called with scanner in the "un-accepted" state, but leaves after calling "accept()" (so the next token to digest == scanner.current)
      After a character parser there MAY appear an explicit closing tag:
      #tag 12[345]!!! #/tag or
      #tag 12[345]!!! #///tag or
      #tag 12[345]!!! #/ <-ws! or
      #tag$ 12[345]!!! $
      All these are to be IGNORED, since the application of the character parser must be finished (= it must have reached a final state OR an error state!) anyhow.
      For empty content the situation is similar:
      #tag continue text          do NOT consume anything.
      #tag#/tag continue text     DO consume end-tag, but nothing more (esp. not the following blank!)
      #tag# /tag continue text    this is ok, and treated the same way.
      #tag #/tag continue text    this is NOT OK, and should be rejected
      #tag-continue word          do NOT consume anything.
      #tag!! continue             do consume the parentheses (has been done by lexer).
      #tag#///tag                 should possibly NOT be supported. If so, then treat like simple end tag!

      Operation starts with an accept(), meaning "decode at current point of reading".
      On return, also an accept() has been performed, meaning "next lexer token to digest by the top level "digest" loop (=what follows after the ended element) is now reflected by the scanner output fields".
    • accept_superfluous_end_tag

      protected boolean accept_superfluous_end_tag(String tag, boolean defIsEmpty, Definition def, Location<String> startLoc)
      Lets the scanner accept and discard an end tag, either normal or forced or by a parenthesis character. Only caller is ignore_superfluous_end_tag(String,boolean,Definition,Location)
    • digest_verbatim

      protected void digest_verbatim()
      Parses "verbatim input" mode, which (a) requires an explicit #/tag end tag, and (b) accepts only tags of sub-elements immediately contained in its definition.
    • digest_skip_for_command

      protected void digest_skip_for_command()
      Micro-state for error recovery: skip until command char and then goto tag-reading mode again. Store the skipped input chars into skipContainer for meta-error-elements.
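      The skipping step can be sketched on a plain string. This is an illustration only: the class SkipForCommandSketch is hypothetical, the command character is assumed to be '#', and the real parser works on a MemScanner and stores the skipped input in skipContainer rather than returning it.

```java
// Hypothetical sketch of "skip until command char".
final class SkipForCommandSketch {
    static final char COMMAND_CHAR = '#';   // ASSUMPTION: fixed command character

    static final class Skipped {
        final String text;      // the discarded input, kept for the error report
        final int resumeAt;     // index of the command char, or input length if none follows
        Skipped(String text, int resumeAt) {
            this.text = text;
            this.resumeAt = resumeAt;
        }
    }

    /** Skips from index 'from' up to (not including) the next command character. */
    static Skipped skipToCommand(String input, int from) {
        int idx = input.indexOf(COMMAND_CHAR, from);
        int end = (idx < 0) ? input.length() : idx;
        return new Skipped(input.substring(from, end), end);
    }
}
```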
    • bicERROR

      protected void bicERROR(String text, boolean isCommand)
    • builtInMetaCommands

      protected boolean builtInMetaCommands(String tag)
    • parseVerbatimSuppres

      protected boolean parseVerbatimSuppres()
    • deliver_to_singletonstate

      protected void deliver_to_singletonstate(ResultingStructure res, State_singleton tss)
    • deliver_numeric

      protected void deliver_numeric(int val)
      Append the unicode character to the top-most resulting structure. To represent a code point > 0xFFFF, a Java string must contain two (2) "code units", i.e. 16-bit "chars" (int = 32 bit, char = 16 bit).
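      The relation between an int code point and its UTF-16 code units can be checked with the standard library. The sketch below uses only java.lang.Character; the class name CodePointSketch is hypothetical, and this is not necessarily how deliver_numeric builds its result.

```java
// Illustration only; shows the code point to code unit conversion.
final class CodePointSketch {

    /**
     * Converts a Unicode code point to a Java string. Code points above 0xFFFF
     * yield a surrogate pair, i.e. a string of length 2.
     * Character.toChars throws IllegalArgumentException for invalid code points.
     */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }
}
```

      For example, fromCodePoint(0x41) has length 1, while fromCodePoint(0x1F600) has length 2 and consists of a high and a low surrogate.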
    • deliver

      protected void deliver(ResultingStructure res)
      Append the argument to the top-most resulting structure.
    • deliver_spontanuous

      protected void deliver_spontanuous(ResultingStructure res)
      Append the argument to the top-most resulting structure which is not encoded as xml attribute.
    • deliver

      protected void deliver(ResultingChars chars)
    • deliver

      protected void deliver(Udom res)
    • deliver_last

      protected void deliver_last()
      Deliver the last scanned character data, and set the "is whitespace" flag according to the token type returned by scanner.
    • addRestPerm

      protected void addRestPerm(State_perm permstate, CheckedList<Expression> misslist)
      Add obligate members of a "perm" expression to the term which describes the missing input.
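      What "obligate members" of a permutation still count as missing can be shown with a simplified model. The class RestPermSketch and the representation of a "perm" expression as a map from member name to an obligatory flag are assumptions for illustration; the real method works on State_perm and Expression objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a permutation expression and its unmatched members.
final class RestPermSketch {

    /**
     * members maps each member name of the permutation to whether it is obligatory;
     * seen holds the members already matched by the input.
     * Returns the obligatory members that have not been seen, in map iteration order.
     */
    static List<String> restObligatory(Map<String, Boolean> members, Set<String> seen) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : members.entrySet()) {
            if (e.getValue() && !seen.contains(e.getKey())) {
                missing.add(e.getKey());
            }
        }
        return missing;
    }
}
```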
    • find_top_singleton

      protected State_singleton find_top_singleton()
      Called from return_to_upper_input_mode(). and .... DELIVER()??? AND OPEN XSLTMODE ??
    • find_top_singleton

      protected State_singleton find_top_singleton(boolean mustBeNonAtt)