java.lang.Object eu.bandm.tools.ulex.ULex<T>
public abstract class ULex<T>
"u-lex" stands for "micro lexer" and supports small-scale
text analysis by means of a fully typed combinator library.
Each instance of a subclass of ULex is either a primitive
lexical scanner, or a combinator.
Each subclass of ULex<T> is parametrized with "T", which is the
type of the result returned by successful parsing.
Parsing operates on java.lang.String values and is
performed by calling the method parse(State).
A ULex.State object encapsulates the data to be parsed
and the current position.
In most cases the user creates instances of subclasses of
ULex and parametrizes them according to the text fragments
which shall be accepted.
For this purpose a number of factory methods are provided, mainly so that
Java's type inference can determine the type parameters.
In other cases, a subclass must be derived to override one or
more methods, e.g. as in ULex.Seq and ULex.SemanticPattern,
for defining the outcome of the parsing in the semantic domain.
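The usage pattern described above can be sketched with a small stand-in. This is not the ULex source: MiniLex, digits, Fail and MiniLexDemo are hypothetical names, and only the general shape (a typed result, a State holding data and position, factory methods, and a Seq whose combine method is overridden) follows the description.

```java
// Hypothetical stand-in for illustration only; this is NOT the ULex source.
abstract class MiniLex<T> {

    /** Input text plus the current read index. */
    static final class State {
        final String data;
        int position;
        State(String data) { this.data = data; }
    }

    /** Signals that a parser cannot succeed (no stack trace needed). */
    static final class Fail extends RuntimeException {
        Fail() { super("no match", null, false, false); }
    }

    /** Internal entry point: returns a result or throws Fail. */
    protected abstract T tryParse(State state);

    /** Toplevel entry point: a Fail that escapes is translated to null. */
    public T parse(State state) {
        try {
            return tryParse(state);
        } catch (Fail f) {
            return null;
        }
    }

    /** Primitive scanner: accepts exactly the constant string s. */
    static MiniLex<String> konst(String s) {
        return new MiniLex<String>() {
            @Override protected String tryParse(State st) {
                if (!st.data.startsWith(s, st.position)) throw new Fail();
                st.position += s.length();
                return s;
            }
        };
    }

    /** Primitive scanner: accepts one or more ASCII digits. */
    static MiniLex<Integer> digits() {
        return new MiniLex<Integer>() {
            @Override protected Integer tryParse(State st) {
                int end = st.position;
                while (end < st.data.length()
                        && st.data.charAt(end) >= '0' && st.data.charAt(end) <= '9') {
                    end++;
                }
                if (end == st.position) throw new Fail();
                int value = Integer.parseInt(st.data.substring(st.position, end));
                st.position = end;
                return value;
            }
        };
    }

    /** Combinator: a subclass defines the semantic result in combine. */
    abstract static class Seq<A, B, R> extends MiniLex<R> {
        private final MiniLex<A> first;
        private final MiniLex<B> second;
        Seq(MiniLex<A> first, MiniLex<B> second) {
            this.first = first;
            this.second = second;
        }
        protected abstract R combine(A a, B b);
        @Override protected R tryParse(State st) {
            A a = first.tryParse(st);
            B b = second.tryParse(st);
            return combine(a, b);
        }
    }
}

final class MiniLexDemo {
    /** Parses "<digits>px" and returns the number, or null on failure. */
    static Integer parsePixels(String text) {
        MiniLex<Integer> px = new MiniLex.Seq<Integer, String, Integer>(
                MiniLex.digits(), MiniLex.konst("px")) {
            @Override protected Integer combine(Integer n, String unit) {
                return n; // keep the number, drop the unit
            }
        };
        return px.parse(new MiniLex.State(text));
    }

    public static void main(String[] args) {
        System.out.println(parsePixels("42px")); // 42
        System.out.println(parsePixels("42em")); // null
    }
}
```

The result type Integer is fixed at compile time by the type parameters, so no cast is needed at the call site.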
Technical implementation:
The method parse(State) is called only once, from outside, by the user,
initiating the parsing process of a given String value.
Then, internally, all ULex objects perform parsing
by calling tryParse(State). This can throw a
ULex.ExceptionFail in case it cannot succeed any more.
Parser combinators like "alternative" (Alt) and "star closure" (Star)
catch this exception and then try their alternative solutions.
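The exception-driven control flow can be illustrated as follows. This is a sketch under assumptions, not the library's code: FailLex, Fail and FailLexDemo are hypothetical names, and resetting the read index after a caught failure is an assumption that the description above does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; this is NOT the ULex source.
abstract class FailLex<T> {

    static final class State {
        final String data;
        int position;
        State(String data) { this.data = data; }
    }

    static final class Fail extends RuntimeException {
        Fail() { super("no match", null, false, false); }
    }

    /** One shared instance, in the spirit of the EXCEPTION_FAIL field. */
    static final Fail FAIL = new Fail();

    protected abstract T tryParse(State st);

    /** Called once from outside; internal calls go through tryParse. */
    public T parse(State st) {
        try {
            return tryParse(st);
        } catch (Fail f) {
            return null;
        }
    }

    static FailLex<String> konst(String s) {
        return new FailLex<String>() {
            @Override protected String tryParse(State st) {
                if (!st.data.startsWith(s, st.position)) throw FAIL;
                st.position += s.length();
                return s;
            }
        };
    }

    /** Alternative: catches the failure of a, then tries b. */
    static <S> FailLex<S> alt(FailLex<S> a, FailLex<S> b) {
        return new FailLex<S>() {
            @Override protected S tryParse(State st) {
                int start = st.position;
                try {
                    return a.tryParse(st);
                } catch (Fail f) {
                    st.position = start; // assumption: undo partial progress
                    return b.tryParse(st);
                }
            }
        };
    }

    /** Star closure: collects results until the sub-parser fails. */
    static <S> FailLex<List<S>> star(FailLex<S> sub) {
        return new FailLex<List<S>>() {
            @Override protected List<S> tryParse(State st) {
                List<S> results = new ArrayList<>();
                while (true) {
                    int start = st.position;
                    try {
                        results.add(sub.tryParse(st));
                    } catch (Fail f) {
                        st.position = start;
                        return results; // zero or more matches is a success
                    }
                    if (st.position == start) return results; // avoid looping on empty matches
                }
            }
        };
    }
}

final class FailLexDemo {
    static List<String> run(String text) {
        FailLex<List<String>> p =
                FailLex.star(FailLex.alt(FailLex.konst("ab"), FailLex.konst("c")));
        return p.parse(new FailLex.State(text));
    }

    public static void main(String[] args) {
        System.out.println(run("abcabx")); // [ab, c, ab]
    }
}
```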
ATTENTION!
Whenever an ExceptionFail reaches the toplevel parse(State) method, it is
translated to null. This value hence indicates that parsing was not
successful.
Independently, whenever its sub-parser fails, an Opt also returns
null, indicating the absence of a match of that sub-parser.
These two meanings of null are not consistent with each other.
[[FIXME ( ):DO IT BETTER !!!???]]
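The ambiguity can be reproduced in a few lines. The sketch below is a hypothetical stand-in (NullLex, optTyped and NullLexDemo are invented names); the variant returning java.util.Optional is only one conceivable way to separate the two cases and is not something the library is known to offer.

```java
import java.util.Optional;

// Hypothetical stand-in for illustration only; this is NOT the ULex source.
abstract class NullLex<T> {

    static final class State {
        final String data;
        int position;
        State(String data) { this.data = data; }
    }

    static final class Fail extends RuntimeException {
        Fail() { super("no match", null, false, false); }
    }

    protected abstract T tryParse(State st);

    /** Toplevel entry point: failure is reported as null. */
    public T parse(State st) {
        try {
            return tryParse(st);
        } catch (Fail f) {
            return null;
        }
    }

    static NullLex<String> konst(String s) {
        return new NullLex<String>() {
            @Override protected String tryParse(State st) {
                if (!st.data.startsWith(s, st.position)) throw new Fail();
                st.position += s.length();
                return s;
            }
        };
    }

    /** Mirrors the documented Opt: a failing sub-parser yields null. */
    static <S> NullLex<S> opt(NullLex<S> sub) {
        return new NullLex<S>() {
            @Override protected S tryParse(State st) {
                int start = st.position;
                try {
                    return sub.tryParse(st);
                } catch (Fail f) {
                    st.position = start;
                    return null;
                }
            }
        };
    }

    /** Hypothetical variant: absence becomes Optional.empty(), never null. */
    static <S> NullLex<Optional<S>> optTyped(NullLex<S> sub) {
        return new NullLex<Optional<S>>() {
            @Override protected Optional<S> tryParse(State st) {
                int start = st.position;
                try {
                    return Optional.of(sub.tryParse(st));
                } catch (Fail f) {
                    st.position = start;
                    return Optional.empty();
                }
            }
        };
    }
}

final class NullLexDemo {
    /** Both a failed parse and a successful, empty Opt produce null. */
    static boolean ambiguous() {
        Object failed = NullLex.konst("a").parse(new NullLex.State("b"));
        Object absent = NullLex.opt(NullLex.konst("a")).parse(new NullLex.State("b"));
        return failed == null && absent == null;
    }

    /** With the Optional variant, a successful empty match is not null. */
    static boolean distinguishable() {
        Optional<String> absent =
                NullLex.optTyped(NullLex.konst("a")).parse(new NullLex.State("b"));
        return absent != null && !absent.isPresent();
    }

    public static void main(String[] args) {
        System.out.println(ambiguous());       // true
        System.out.println(distinguishable()); // true
    }
}
```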
ATTENTION!
Currently all subclasses of ULex are also static inner classes of ULex.
This is because the author does not like many little source files.
This is not good practice,
because each sub-class currently is (via inheritance)
an inner class of itself !-)
We should change this!
ATTENTION!
Currently no back-tracking is supported.
Further alternatives (of some combinator) are only tested as long as
no alternative succeeds.
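E.g. ("ab" | "a") "bc" will never match "abc": the alternative "ab"
succeeds first, the following "bc" then fails on the remaining "c",
and the alternative "a" is never tried afterwards.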
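The example above can be traced with a small stand-in, which also shows one way to rewrite such a grammar so that it does not need back-tracking. GreedyLex, concat and GreedyLexDemo are hypothetical names, not library API, and the sketch assumes that an alternative resets the read index before trying its next branch.

```java
// Hypothetical stand-in for illustration only; this is NOT the ULex source.
abstract class GreedyLex<T> {

    static final class State {
        final String data;
        int position;
        State(String data) { this.data = data; }
    }

    static final class Fail extends RuntimeException {
        Fail() { super("no match", null, false, false); }
    }

    protected abstract T tryParse(State st);

    public T parse(State st) {
        try {
            return tryParse(st);
        } catch (Fail f) {
            return null;
        }
    }

    static GreedyLex<String> konst(String s) {
        return new GreedyLex<String>() {
            @Override protected String tryParse(State st) {
                if (!st.data.startsWith(s, st.position)) throw new Fail();
                st.position += s.length();
                return s;
            }
        };
    }

    /** Commits to the first branch that succeeds; never revisits the choice. */
    static <S> GreedyLex<S> alt(GreedyLex<S> a, GreedyLex<S> b) {
        return new GreedyLex<S>() {
            @Override protected S tryParse(State st) {
                int start = st.position;
                try {
                    return a.tryParse(st);
                } catch (Fail f) {
                    st.position = start; // assumption: undo partial progress
                    return b.tryParse(st);
                }
            }
        };
    }

    /** Sequence of two string parsers; a failure of b is not caught here. */
    static GreedyLex<String> concat(GreedyLex<String> a, GreedyLex<String> b) {
        return new GreedyLex<String>() {
            @Override protected String tryParse(State st) {
                String left = a.tryParse(st);
                return left + b.tryParse(st);
            }
        };
    }
}

final class GreedyLexDemo {
    /** ("ab" | "a") "bc": the choice is made before "bc" is attempted. */
    static String ordered(String text) {
        return GreedyLex.concat(
                GreedyLex.alt(GreedyLex.konst("ab"), GreedyLex.konst("a")),
                GreedyLex.konst("bc")).parse(new GreedyLex.State(text));
    }

    /** ("ab" "bc") | ("a" "bc"): each branch now covers the whole sequence. */
    static String factored(String text) {
        return GreedyLex.alt(
                GreedyLex.concat(GreedyLex.konst("ab"), GreedyLex.konst("bc")),
                GreedyLex.concat(GreedyLex.konst("a"), GreedyLex.konst("bc")))
                .parse(new GreedyLex.State(text));
    }

    public static void main(String[] args) {
        System.out.println(ordered("abc"));  // null
        System.out.println(factored("abc")); // abc
    }
}
```

Moving the common continuation into each branch lets the failure of "bc" surface inside the alternative, where it can still be caught.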
Nested Class Summary | |
---|---|
static class |
ULex.Alt<A>
Parser which accepts one of two sub-parsers. |
static class |
ULex.CharSet
Accepts the next character iff it is contained/not contained in the given character set (encoded as a String). |
static class |
ULex.Concatenate
Convenience sub-class of ULex.Seq to
concatenate two string results. |
static class |
ULex.Const
Accepts the given constant String value and returns it, or throws ULex.ExceptionFail. |
static class |
ULex.ConstMap<T>
Accepts a longest prefix match from a set of constant String values and returns the value defined by the map argument, or throws ULex.ExceptionFail. |
static class |
ULex.DecimalDigit
Returns a parsed decimal digit 0..9 |
static class |
ULex.End
Parser which accepts the end of the input string. |
protected static class |
ULex.ExceptionFail
|
static class |
ULex.Fail<T>
Parser which accepts nothing |
static class |
ULex.Int
Returns a parsed integer |
static class |
ULex.Length<T>
Parser which delivers the length of items accepted by its sub-parser |
static class |
ULex.Natural
Returns a parsed natural number > 0, iff possible |
static class |
ULex.Natural_0
Returns a parsed natural number >= 0, iff possible |
static class |
ULex.Negate
Returns the integer "0-r" whenever its sub-parser returns the integer "r". |
static class |
ULex.Opt<S>
Parser which returns null in case the sub-parser throws ULex.ExceptionFail |
static class |
ULex.Pattern
Convenience sub-class of ULex.SemanticPattern in
which the accepted string itself is the result returned from parsing. |
protected static class |
ULex.PatternInteger
Returns a parsed integer |
static class |
ULex.PatternLength
Convenience sub-class of ULex.SemanticPattern in
which the character count of the accepted string is the result returned from
parsing. |
static class |
ULex.Plus<S>
Parser which accepts one or more instances of its sub-parser and returns them in one List<S> datum. |
static class |
ULex.SemanticPattern<R>
Accepts a regular expression pattern as defined by java.util.regex.Pattern and returns what the user-defined
method ULex.SemanticPattern.semantics(String) calculates from it. |
static class |
ULex.Seq<A,B,R>
Parser which accepts a sequence of two sub-parsers and returns what is calculated by the user-defined method ULex.Seq.combine(A, B). |
static class |
ULex.Seq_1<A,B>
A predefined subclass of ULex.Seq which
discards the result of the second sub-parser and returns the result of the
first sub-parser. |
static class |
ULex.Seq_2<A,B>
A predefined subclass of ULex.Seq which
discards the result of the first sub-parser and returns the result of the
second sub-parser. |
static class |
ULex.SkipWhitespace<S>
Skips whitespace and then executes the sub-parser. ATTENTION! |
static class |
ULex.Star<S>
Parser which accepts zero or more instances of its sub-parser and returns them in one List<S> datum. |
static class |
ULex.State
Encapsulates the input data, output channels and the current state of the parsing process; indeed, only the "current read index" ULex.State.position
is dynamic. |
static class |
ULex.ToLower<S>
Executes the sub-parser on a lower-case version of the input string. |
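The "longest prefix match" mentioned for ULex.ConstMap in the table above can be illustrated independently of the library. The sketch below is an assumption about what such a lookup involves, not the actual implementation; LongestPrefixSketch, operators and longestPrefix are hypothetical names, and a failed lookup returns null here, whereas the documented class throws ULex.ExceptionFail.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT the ULex.ConstMap source.
final class LongestPrefixSketch {

    /** Sample table in which one key is a prefix of two others. */
    static Map<String, String> operators() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("<", "LT");
        m.put("<=", "LE");
        m.put("<<", "SHL");
        return m;
    }

    /**
     * Returns the value of the longest key that occurs in text at index pos,
     * or null if no key matches there.
     */
    static <T> T longestPrefix(Map<String, T> map, String text, int pos) {
        String best = null;
        for (String key : map.keySet()) {
            boolean matches = text.startsWith(key, pos);
            if (matches && (best == null || key.length() > best.length())) {
                best = key;
            }
        }
        return best == null ? null : map.get(best);
    }

    public static void main(String[] args) {
        System.out.println(longestPrefix(operators(), "a<=b", 1)); // LE
        System.out.println(longestPrefix(operators(), "a<b", 1));  // LT
        System.out.println(longestPrefix(operators(), "a=b", 1));  // null
    }
}
```

Preferring the longest key matters whenever one key is a prefix of another, as with "<" and "<=" in this table.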
Field Summary | |
---|---|
static ULex.ExceptionFail |
EXCEPTION_FAIL
|
protected T |
result
|
Constructor Summary | |
---|---|
ULex()
|
Method Summary | ||
---|---|---|
static
|
alt(ULex<S> a,
ULex<S> b)
Convenience wrapper around constructor call; esp. |
|
static
|
alt(ULex<S> a,
ULex<S> b,
ULex<S> c)
Convenience wrapper around constructor call; esp. |
|
static
|
alt(ULex<S> a,
ULex<S> b,
ULex<S> c,
ULex<S> d)
Convenience wrapper around constructor call; esp. |
|
static
|
alt(ULex<S> a,
ULex<S> b,
ULex<S> c,
ULex<S> d,
ULex<S> e)
Convenience wrapper around constructor call; esp. |
|
static ULex.CharSet |
charSet(String value,
boolean positive)
Convenience wrapper around constructor call. |
|
static ULex.Concatenate |
concatenate(ULex<String> a,
ULex<String> b)
Convenience wrapper around constructor call. |
|
static ULex.Concatenate |
concatenate(ULex<String> a,
ULex<String> b,
ULex<String> c)
Convenience wrapper around constructor call. |
|
static ULex.Concatenate |
concatenate(ULex<String> a,
ULex<String> b,
ULex<String> c,
ULex<String> d)
Convenience wrapper around constructor call. |
|
static
|
constMap(Map<String,S> map)
Convenience wrapper around constructor call. |
|
static ULex.End |
end()
Convenience wrapper around constructor call. |
|
static
|
fail()
Convenience wrapper around constructor call. |
|
static ULex.Const |
konst(String s)
|
|
static
|
length(ULex<List<S>> sub)
Convenience wrapper around constructor call; esp. |
|
static ULex.Negate |
negate(ULex<Integer> sub)
Convenience wrapper around constructor call. |
|
static
|
opt(ULex<S> sub)
Convenience wrapper around constructor call; esp. |
|
T |
parse(ULex.State state)
Toplevel entry point. |
|
static ULex.Pattern |
pattern(String s)
Convenience wrapper around constructor call. |
|
static ULex.PatternLength |
patternLength(String s)
Convenience wrapper around constructor call. |
|
static
|
plus(ULex<S> sub)
Convenience wrapper around constructor call; esp. |
|
static
|
seq_1(ULex<A> a,
ULex<B> b)
Convenience wrapper around constructor call; esp. |
|
static
|
seq_2(ULex<A> a,
ULex<B> b)
Convenience wrapper around constructor call; esp. |
|
static
|
skipWhitespace(ULex<S> sub)
Convenience wrapper around constructor call; esp. |
|
static
|
star(ULex<S> sub)
Convenience wrapper around constructor call; esp. |
|
static ULex.State |
state(String s)
Convenience wrapper around constructor call. |
|
static ULex.State |
state(String s,
int pos)
Convenience wrapper around constructor call. |
|
static
|
toLower(ULex<S> sub)
Convenience wrapper around constructor call; esp. |
|
protected abstract T |
tryParse(ULex.State state)
|
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected T result
public static final ULex.ExceptionFail EXCEPTION_FAIL
Constructor Detail |
---|
public ULex()
Method Detail |
---|
public T parse(ULex.State state)
state
- the running state. Only "position" is changed,
the other fields are treated as read-only.
protected abstract T tryParse(ULex.State state)
public static ULex.State state(String s)
public static ULex.State state(String s, int pos)
public static <T> ULex<T> fail()
public static ULex.End end()
public static <S> ULex.Opt<S> opt(ULex<S> sub)
public static <S> ULex.Star<S> star(ULex<S> sub)
public static <S> ULex.Plus<S> plus(ULex<S> sub)
public static <S> ULex.Length<S> length(ULex<List<S>> sub)
public static <A,B> ULex.Seq_1<A,B> seq_1(ULex<A> a, ULex<B> b)
public static <A,B> ULex.Seq_2<A,B> seq_2(ULex<A> a, ULex<B> b)
public static ULex.Concatenate concatenate(ULex<String> a, ULex<String> b)
public static ULex.Concatenate concatenate(ULex<String> a, ULex<String> b, ULex<String> c)
public static ULex.Concatenate concatenate(ULex<String> a, ULex<String> b, ULex<String> c, ULex<String> d)
public static <S> ULex.Alt<S> alt(ULex<S> a, ULex<S> b)
public static <S> ULex.Alt<S> alt(ULex<S> a, ULex<S> b, ULex<S> c)
public static <S> ULex.Alt<S> alt(ULex<S> a, ULex<S> b, ULex<S> c, ULex<S> d)
public static <S> ULex.Alt<S> alt(ULex<S> a, ULex<S> b, ULex<S> c, ULex<S> d, ULex<S> e)
public static <S> ULex.SkipWhitespace<S> skipWhitespace(ULex<S> sub)
public static <S> ULex.ToLower<S> toLower(ULex<S> sub)
public static ULex.Const konst(String s)
public static ULex.CharSet charSet(String value, boolean positive)
public static <S> ULex.ConstMap<S> constMap(Map<String,S> map)
public static ULex.Pattern pattern(String s)
public static ULex.PatternLength patternLength(String s)
public static ULex.Negate negate(ULex<Integer> sub)