eu.bandm.tools.ulex
Class ULex<T>

java.lang.Object
  extended by eu.bandm.tools.ulex.ULex<T>
Direct Known Subclasses:
ULex.Alt, ULex.CharSet, ULex.Const, ULex.ConstMap, ULex.End, ULex.Fail, ULex.Length, ULex.Negate, ULex.Opt, ULex.Plus, ULex.SemanticPattern, ULex.Seq, ULex.SkipWhitespace, ULex.Star, ULex.ToLower

public abstract class ULex<T>
extends Object

"u-lex" stands for "micro lexer" and supports small-scale text analysis by a fully typed combinator library.
Each instance of a subclass of ULex is either a primitive lexical scanner, or a combinator.
Each subclass of ULex<T> is parametrized with "T", which is the type of the result returned by successful parsing.
Parsing operates on java.lang.String values, and is performed by calling the method parse(State).
Such a ULex.State object encapsulates the data to be parsed and the current position.
In most cases the user creates instances of subclasses of ULex and parametrizes them according to the text fragments that shall be accepted. A number of factory methods are provided for this, especially so that Java type inference can be used.
In other cases, a subclass must be derived to override one or more methods, e.g. as in ULex.Seq and ULex.SemanticPattern, in order to define the outcome of the parsing in the semantic domain <T>.
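The subclassing style can be illustrated with a minimal, self-contained sketch. All classes below (State, Fail, Lexer, Digits, Const, Seq) are hypothetical stand-ins written for this illustration; they only model the behaviour described above and are not the ULex sources. The sketch shows how overriding a combine(A, B) method fixes the semantic result of a sequence.

```java
class SeqCombineSketch {
    /** Hypothetical stand-in for ULex.State: input plus current read index. */
    static final class State {
        final String input;
        int position;
        State(String input) { this.input = input; }
    }

    /** Hypothetical stand-in for ULex.ExceptionFail. */
    static final class Fail extends RuntimeException {}

    abstract static class Lexer<T> {
        abstract T tryParse(State s);
    }

    /** Accepts one or more decimal digits and returns their value. */
    static final class Digits extends Lexer<Integer> {
        @Override Integer tryParse(State s) {
            int start = s.position;
            while (s.position < s.input.length()
                    && Character.isDigit(s.input.charAt(s.position))) {
                s.position++;
            }
            if (s.position == start) throw new Fail();
            return Integer.valueOf(s.input.substring(start, s.position));
        }
    }

    /** Accepts a constant text and returns it. */
    static final class Const extends Lexer<String> {
        final String text;
        Const(String text) { this.text = text; }
        @Override String tryParse(State s) {
            if (!s.input.startsWith(text, s.position)) throw new Fail();
            s.position += text.length();
            return text;
        }
    }

    /** Sequence of two sub-parsers; the subclass decides what the pair means. */
    abstract static class Seq<A, B, R> extends Lexer<R> {
        final Lexer<A> first;
        final Lexer<B> second;
        Seq(Lexer<A> first, Lexer<B> second) { this.first = first; this.second = second; }
        abstract R combine(A a, B b);
        @Override R tryParse(State s) {
            A a = first.tryParse(s);
            B b = second.tryParse(s);
            return combine(a, b);
        }
    }

    /** Parses "digits '+' digits" and returns the sum, or null on failure. */
    static Integer parseSum(String input) {
        Lexer<Integer> plusThenDigits =
            new Seq<String, Integer, Integer>(new Const("+"), new Digits()) {
                @Override Integer combine(String sign, Integer n) { return n; }
            };
        Lexer<Integer> sum =
            new Seq<Integer, Integer, Integer>(new Digits(), plusThenDigits) {
                @Override Integer combine(Integer a, Integer b) { return a + b; }
            };
        try {
            return sum.tryParse(new State(input));
        } catch (Fail f) {
            return null;
        }
    }
}
```

In this sketch, parseSum("12+30") yields the sum of both numbers, while an input such as "12-30" yields null because the constant "+" is not found.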


Technical implementation:
The method parse(State) is called only once, from outside, by the user; it initiates the parsing of a given String value. Internally, all ULex objects then perform parsing by calling tryParse(State), which throws a ULex.ExceptionFail when it can no longer succeed.
Parser combinators like "alternative" (Alt) and "star closure" (Star) catch this exception and then try their alternative solutions.
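The division of labour between parse, tryParse and the failure exception can be sketched as follows. The classes are hypothetical stand-ins for this illustration, not the ULex implementation; in particular, resetting the position before trying the second alternative is an assumption of the sketch.

```java
class AltSketch {
    /** Hypothetical stand-in for ULex.State: input plus current read index. */
    static final class State {
        final String input;
        int position;
        State(String input) { this.input = input; }
    }

    /** Hypothetical stand-in for ULex.ExceptionFail. */
    static final class Fail extends RuntimeException {}

    abstract static class Lexer<T> {
        /** Internal entry point: returns a result or throws Fail. */
        protected abstract T tryParse(State s);

        /** Top-level entry point: translates Fail into null. */
        public final T parse(State s) {
            try {
                return tryParse(s);
            } catch (Fail f) {
                return null;
            }
        }
    }

    /** Accepts a constant text and returns it. */
    static final class Const extends Lexer<String> {
        private final String text;
        Const(String text) { this.text = text; }
        @Override protected String tryParse(State s) {
            if (!s.input.startsWith(text, s.position)) throw new Fail();
            s.position += text.length();
            return text;
        }
    }

    /** Tries the first sub-parser; on Fail, tries the second one. */
    static final class Alt<T> extends Lexer<T> {
        private final Lexer<T> a;
        private final Lexer<T> b;
        Alt(Lexer<T> a, Lexer<T> b) { this.a = a; this.b = b; }
        @Override protected T tryParse(State s) {
            int start = s.position;
            try {
                return a.tryParse(s);
            } catch (Fail f) {
                s.position = start; // assumption: retry from the same position
                return b.tryParse(s);
            }
        }
    }

    static String run(String input) {
        Lexer<String> fooOrBar = new Alt<String>(new Const("foo"), new Const("bar"));
        return fooOrBar.parse(new State(input));
    }
}
```

Only the outermost call goes through parse; the nested parsers communicate failure exclusively through the exception.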
ATTENTION!
Whenever an ExceptionFail reaches the top-level parse(State) method, it is translated to null. This value hence indicates that parsing was not successful.
Independently, a ULex.Opt also returns null whenever its sub-parser fails, indicating the absence of a match for that sub-parser.
These two meanings of null are not consistent with each other. [[FIXME: do it better.]]
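The ambiguity can be made concrete with a small self-contained model. The classes are hypothetical stand-ins for this illustration, not the ULex sources; the sketch only reproduces the two rules stated above (failure becomes null at the top level, and an optional parser returns null when its sub-parser fails).

```java
class NullAmbiguitySketch {
    /** Hypothetical stand-in for ULex.State: input plus current read index. */
    static final class State {
        final String input;
        int position;
        State(String input) { this.input = input; }
    }

    /** Hypothetical stand-in for ULex.ExceptionFail. */
    static final class Fail extends RuntimeException {}

    abstract static class Lexer<T> {
        protected abstract T tryParse(State s);

        /** Top-level entry point: a failure is reported as null. */
        public final T parse(State s) {
            try {
                return tryParse(s);
            } catch (Fail f) {
                return null;
            }
        }
    }

    /** Accepts a constant text and returns it. */
    static final class Const extends Lexer<String> {
        private final String text;
        Const(String text) { this.text = text; }
        @Override protected String tryParse(State s) {
            if (!s.input.startsWith(text, s.position)) throw new Fail();
            s.position += text.length();
            return text;
        }
    }

    /** Succeeds always; returns null when the sub-parser fails. */
    static final class Opt<T> extends Lexer<T> {
        private final Lexer<T> sub;
        Opt(Lexer<T> sub) { this.sub = sub; }
        @Override protected T tryParse(State s) {
            int start = s.position;
            try {
                return sub.tryParse(s);
            } catch (Fail f) {
                s.position = start; // assumption: nothing is consumed
                return null;
            }
        }
    }

    /** Result of parsing an optional "x": null here means "absent, but accepted". */
    static String optionalResult(String input) {
        return new Opt<String>(new Const("x")).parse(new State(input));
    }

    /** Result of parsing a mandatory "x": null here means "parsing failed". */
    static String mandatoryResult(String input) {
        return new Const("x").parse(new State(input));
    }
}
```

For the input "y", both methods return null, although the optional parser accepted the input and the mandatory one rejected it; a caller cannot tell the two cases apart from the return value alone.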
ATTENTION!
Currently all subclasses of ULex are also static nested classes of ULex. This is because the author does not like many little source files. It is not good practice, because each subclass currently is (via inheritance) a nested class of itself.
This should be changed.

ATTENTION!
Currently no back-tracking is supported.
A combinator tests further alternatives only as long as none has succeeded. Once one alternative succeeds, the remaining ones are never tried, even if a following parser then fails. E.g.

    ("ab" | "a" ) "bc"
    
will never match "abc".
There is a "backtracking library" by bt, which could (or should) be used to realize a backtracking version where appropriate.
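The missing backtracking can be reproduced with a minimal model of ordered choice. The classes are hypothetical stand-ins for this illustration, not the ULex sources. The sketch builds the example ("ab" | "a") "bc" and, for comparison, the same expression with the alternatives swapped.

```java
class NoBacktrackSketch {
    /** Hypothetical stand-in for ULex.State: input plus current read index. */
    static final class State {
        final String input;
        int position;
        State(String input) { this.input = input; }
    }

    /** Hypothetical stand-in for ULex.ExceptionFail. */
    static final class Fail extends RuntimeException {}

    abstract static class Lexer<T> {
        protected abstract T tryParse(State s);
        public final T parse(State s) {
            try {
                return tryParse(s);
            } catch (Fail f) {
                return null;
            }
        }
    }

    /** Accepts a constant text and returns it. */
    static final class Const extends Lexer<String> {
        private final String text;
        Const(String text) { this.text = text; }
        @Override protected String tryParse(State s) {
            if (!s.input.startsWith(text, s.position)) throw new Fail();
            s.position += text.length();
            return text;
        }
    }

    /** Ordered choice: commits to the first alternative that succeeds. */
    static final class Alt extends Lexer<String> {
        private final Lexer<String> a;
        private final Lexer<String> b;
        Alt(Lexer<String> a, Lexer<String> b) { this.a = a; this.b = b; }
        @Override protected String tryParse(State s) {
            int start = s.position;
            try {
                return a.tryParse(s);
            } catch (Fail f) {
                s.position = start;
                return b.tryParse(s);
            }
        }
    }

    /** Sequence of two string parsers; concatenates their results. */
    static final class Concat extends Lexer<String> {
        private final Lexer<String> first;
        private final Lexer<String> second;
        Concat(Lexer<String> first, Lexer<String> second) {
            this.first = first;
            this.second = second;
        }
        @Override protected String tryParse(State s) {
            String x = first.tryParse(s);
            // If this call fails, nobody asks 'first' for another alternative.
            String y = second.tryParse(s);
            return x + y;
        }
    }

    /** Matches (firstAlt | secondAlt) "bc" against the input. */
    static String match(String firstAlt, String secondAlt, String input) {
        Lexer<String> choice = new Alt(new Const(firstAlt), new Const(secondAlt));
        return new Concat(choice, new Const("bc")).parse(new State(input));
    }
}
```

With the alternatives in the order "ab", "a", the choice consumes "ab", the following "bc" fails at "c", and the whole match fails. With the order "a", "ab", the same input "abc" is accepted, which shows that the order of alternatives matters in a parser without backtracking.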


Nested Class Summary
static class ULex.Alt<A>
          Parser which accepts one of two sub-parsers.
static class ULex.CharSet
          Accepts the next character iff it is contained/not contained in the given character set (encoded as a String).
static class ULex.Concatenate
          Convenience sub-class of ULex.Seq to concatenate two string results.
static class ULex.Const
          Accepts given constant String value and returns it, or throws ULex.ExceptionFail.
static class ULex.ConstMap<T>
          Accepts a longest prefix match from a set of constant String values and returns the values defined by the map argument; or throws ULex.ExceptionFail.
static class ULex.DecimalDigit
          Returns a parsed decimal digit 0..9
static class ULex.End
          Parser which accepts the end of the input string.
protected static class ULex.ExceptionFail
           
static class ULex.Fail<T>
          Parser which accepts nothing
static class ULex.Int
          Returns a parsed integer
static class ULex.Length<T>
          Parser which delivers the length of items accepted by its sub-parser
static class ULex.Natural
          Returns a parsed natural number > 0, iff possible
static class ULex.Natural_0
          Returns a parsed natural number >= 0, iff possible
static class ULex.Negate
          Returns the integer "0-r" whenever its sub-parser returns the integer "r".
static class ULex.Opt<S>
          Parser which returns null in case the sub-parser throws ULex.ExceptionFail
static class ULex.Pattern
          Convenience sub-class of ULex.SemanticPattern in which the accepted string itself is the result returned from parsing.
protected static class ULex.PatternInteger
          Returns a parsed integer
static class ULex.PatternLength
          Convenience sub-class of ULex.SemanticPattern in which the character count of the accepted string is the result returned from parsing.
static class ULex.Plus<S>
          Parser which accepts one or more instances of its sub-parser and returns them in one List<S> datum.
static class ULex.SemanticPattern<R>
          Accepts a regular expression pattern as defined by Pattern and returns what the user-defined method ULex.SemanticPattern.semantics(String) calculates from it.
static class ULex.Seq<A,B,R>
          Parser which accepts a sequence of two sub-parsers and returns, what is calculated by the user-defined method ULex.Seq.combine(A, B).
static class ULex.Seq_1<A,B>
          A predefined subclass of ULex.Seq which discards the result of the second sub-parser and returns the result of the first sub-parser.
static class ULex.Seq_2<A,B>
          A predefined subclass of ULex.Seq which discards the result of the first sub-parser and returns the result of the second sub-parser.
static class ULex.SkipWhitespace<S>
          Skips whitespace and then executes the sub-parser.
ATTENTION!
static class ULex.Star<S>
          Parser which accepts zero or more instances of its sub-parser and returns them in one List<S> datum.
static class ULex.State
          Encapsulates the input data, output channels and the current state of the parsing process; indeed, only the "current read index" ULex.State.position is dynamic.
static class ULex.ToLower<S>
          Executes the sub-parser on a lower-case version of the input string.
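The "longest prefix match" mentioned for ULex.ConstMap in the summary above can be sketched as follows. This is a hypothetical stand-in written for this illustration, assuming a simple linear scan over the keys; it is not the ULex implementation, and the names State, Fail, ConstMap and lookup are assumptions of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LongestPrefixSketch {
    /** Hypothetical stand-in for ULex.State: input plus current read index. */
    static final class State {
        final String input;
        int position;
        State(String input) { this.input = input; }
    }

    /** Hypothetical stand-in for ULex.ExceptionFail. */
    static final class Fail extends RuntimeException {}

    /** Maps the longest key found at the current position to its value. */
    static final class ConstMap<S> {
        private final Map<String, S> map;
        ConstMap(Map<String, S> map) { this.map = map; }

        S tryParse(State s) {
            String best = null;
            for (String key : map.keySet()) {
                boolean matches = s.input.startsWith(key, s.position);
                if (matches && (best == null || key.length() > best.length())) {
                    best = key;
                }
            }
            if (best == null) throw new Fail();
            s.position += best.length(); // consume only the matched key
            return map.get(best);
        }
    }

    /** Looks up a comparison operator at the start of the input; null on failure. */
    static Integer lookup(String input) {
        Map<String, Integer> operators = new LinkedHashMap<String, Integer>();
        operators.put("<", 1);
        operators.put("<=", 2);
        operators.put("=", 3);
        try {
            return new ConstMap<Integer>(operators).tryParse(new State(input));
        } catch (Fail f) {
            return null;
        }
    }
}
```

Because the longest key wins, the input "<= 1" is mapped to the value stored for "<=" and not to the value stored for its prefix "<".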
 
Field Summary
static ULex.ExceptionFail EXCEPTION_FAIL
           
protected  T result
           
 
Constructor Summary
ULex()
           
 
Method Summary
static
<S> ULex.Alt<S>
alt(ULex<S> a, ULex<S> b)
          Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.
static
<S> ULex.Alt<S>
alt(ULex<S> a, ULex<S> b, ULex<S> c)
          Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.
static
<S> ULex.Alt<S>
alt(ULex<S> a, ULex<S> b, ULex<S> c, ULex<S> d)
          Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.
static
<S> ULex.Alt<S>
alt(ULex<S> a, ULex<S> b, ULex<S> c, ULex<S> d, ULex<S> e)
          Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.
static ULex.CharSet charSet(String value, boolean positive)
          Convenience wrapper around constructor call.
static ULex.Concatenate concatenate(ULex<String> a, ULex<String> b)
          Convenience wrapper around constructor call.
static ULex.Concatenate concatenate(ULex<String> a, ULex<String> b, ULex<String> c)
          Convenience wrapper around constructor call.
static ULex.Concatenate concatenate(ULex<String> a, ULex<String> b, ULex<String> c, ULex<String> d)
          Convenience wrapper around constructor call.
static
<S> ULex.ConstMap<S>
constMap(Map<String,S> map)
          Convenience wrapper around constructor call.
static ULex.End end()
          Convenience wrapper around constructor call.
static
<T> ULex<T>
fail()
          Convenience wrapper around constructor call.
static ULex.Const konst(String s)
           
static
<S> ULex.Length<S>
length(ULex<List<S>> sub)
          Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.
static ULex.Negate negate(ULex<Integer> sub)
          Convenience wrapper around constructor call.
static
<S> ULex.Opt<S>
opt(ULex<S> sub)
          Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.
 T parse(ULex.State state)
          Toplevel entry point.
static ULex.Pattern pattern(String s)
          Convenience wrapper around constructor call.
static ULex.PatternLength patternLength(String s)
          Convenience wrapper around constructor call.
static
<S> ULex.Plus<S>
plus(ULex<S> sub)
          Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.
static
<A,B> ULex.Seq_1<A,B>
seq_1(ULex<A> a, ULex<B> b)
          Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.
static
<A,B> ULex.Seq_2<A,B>
seq_2(ULex<A> a, ULex<B> b)
          Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.
static
<S> ULex.SkipWhitespace<S>
skipWhitespace(ULex<S> sub)
          Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.
static
<S> ULex.Star<S>
star(ULex<S> sub)
          Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.
static ULex.State state(String s)
          Convenience wrapper around constructor call.
static ULex.State state(String s, int pos)
          Convenience wrapper around constructor call.
static
<S> ULex.ToLower<S>
toLower(ULex<S> sub)
          Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.
protected abstract  T tryParse(ULex.State state)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

result

protected T result

EXCEPTION_FAIL

public static final ULex.ExceptionFail EXCEPTION_FAIL
Constructor Detail

ULex

public ULex()
Method Detail

parse

public T parse(ULex.State state)
Toplevel entry point.

Parameters:
state - the running state. Only "position" is changed; the other fields are treated as read-only.
Returns:
null in case of failure, so null cannot be a member of the result type.

tryParse

protected abstract T tryParse(ULex.State state)

state

public static ULex.State state(String s)
Convenience wrapper around constructor call.


state

public static ULex.State state(String s,
                               int pos)
Convenience wrapper around constructor call.


fail

public static <T> ULex<T> fail()
Convenience wrapper around constructor call.


end

public static ULex.End end()
Convenience wrapper around constructor call.


opt

public static <S> ULex.Opt<S> opt(ULex<S> sub)
Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.
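The benefit of such a wrapper can be sketched with a self-contained model. Lexer, Opt and opt below are hypothetical stand-ins for this illustration, not the ULex sources. The sketch relies on the fact that Java infers the type arguments of a generic method call, whereas a constructor call without the diamond operator (introduced in Java 7) needs them spelled out.

```java
import java.util.List;
import java.util.Map;

class FactorySketch {
    /** Hypothetical stand-in for ULex<T>; carries only the type parameter. */
    static class Lexer<T> {}

    /** Hypothetical stand-in for ULex.Opt<S>. */
    static final class Opt<S> extends Lexer<S> {
        final Lexer<S> sub;
        Opt(Lexer<S> sub) { this.sub = sub; }
    }

    /** Factory wrapper: S is inferred from the argument. */
    static <S> Opt<S> opt(Lexer<S> sub) {
        return new Opt<S>(sub);
    }

    static boolean demo() {
        Lexer<List<Map<String, Integer>>> inner =
            new Lexer<List<Map<String, Integer>>>();

        // Plain constructor call: the type argument must be repeated.
        Opt<List<Map<String, Integer>>> viaConstructor =
            new Opt<List<Map<String, Integer>>>(inner);

        // Factory call: the compiler infers S = List<Map<String, Integer>>.
        Opt<List<Map<String, Integer>>> viaFactory = opt(inner);

        // Both objects wrap the same sub-parser.
        return viaConstructor.sub == viaFactory.sub;
    }
}
```

The saving grows with nesting, since combinator expressions are typically built from several wrapped sub-parsers in a single expression.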


star

public static <S> ULex.Star<S> star(ULex<S> sub)
Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.


plus

public static <S> ULex.Plus<S> plus(ULex<S> sub)
Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.


length

public static <S> ULex.Length<S> length(ULex<List<S>> sub)
Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.


seq_1

public static <A,B> ULex.Seq_1<A,B> seq_1(ULex<A> a,
                                          ULex<B> b)
Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.


seq_2

public static <A,B> ULex.Seq_2<A,B> seq_2(ULex<A> a,
                                          ULex<B> b)
Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.


concatenate

public static ULex.Concatenate concatenate(ULex<String> a,
                                           ULex<String> b)
Convenience wrapper around constructor call.


concatenate

public static ULex.Concatenate concatenate(ULex<String> a,
                                           ULex<String> b,
                                           ULex<String> c)
Convenience wrapper around constructor call.


concatenate

public static ULex.Concatenate concatenate(ULex<String> a,
                                           ULex<String> b,
                                           ULex<String> c,
                                           ULex<String> d)
Convenience wrapper around constructor call.


alt

public static <S> ULex.Alt<S> alt(ULex<S> a,
                                  ULex<S> b)
Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.


alt

public static <S> ULex.Alt<S> alt(ULex<S> a,
                                  ULex<S> b,
                                  ULex<S> c)
Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.


alt

public static <S> ULex.Alt<S> alt(ULex<S> a,
                                  ULex<S> b,
                                  ULex<S> c,
                                  ULex<S> d)
Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.


alt

public static <S> ULex.Alt<S> alt(ULex<S> a,
                                  ULex<S> b,
                                  ULex<S> c,
                                  ULex<S> d,
                                  ULex<S> e)
Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.


skipWhitespace

public static <S> ULex.SkipWhitespace<S> skipWhitespace(ULex<S> sub)
Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.


toLower

public static <S> ULex.ToLower<S> toLower(ULex<S> sub)
Convenience wrapper around constructor call; especially useful because it supports type inference, in contrast to the unwrapped constructor call.


konst

public static ULex.Const konst(String s)

charSet

public static ULex.CharSet charSet(String value,
                                   boolean positive)
Convenience wrapper around constructor call.


constMap

public static <S> ULex.ConstMap<S> constMap(Map<String,S> map)
Convenience wrapper around constructor call.


pattern

public static ULex.Pattern pattern(String s)
Convenience wrapper around constructor call.


patternLength

public static ULex.PatternLength patternLength(String s)
Convenience wrapper around constructor call.


negate

public static ULex.Negate negate(ULex<Integer> sub)
Convenience wrapper around constructor call.