Class TokenFragment
- All Implemented Interfaces:
FormatClient
Each fragment contains an accepting nondeterministic automaton as its implementation. The fragment is said to match a sequence of input code points if and only if the sequence is accepted by its implementation automaton.
The constructor of this class is hidden from applications. The creation of fragments is effected by factory methods.
Fragments are immutable. Factory methods are provided for non-destructive updates that derive new fragments from old ones.
Fragments are compositional: any fragment can serve directly in a token rule, no matter whether it is primitive or complex.
The language of fragments shares many constructs with the language of regular expressions. Others correspond more directly to set theory. The constructs are chosen such that there is an effective implementation in terms of nondeterministic automata.
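The correspondence with nondeterministic automata can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the implementation of TokenFragment. The interface Frag, the records Eps, Sym, Seq, Alt and Star, and the class Nfa are hypothetical names. Only epsilon, of, andThen, orElse and star are modelled, and Java 16 or later is assumed for records and pattern matching. Fragments are kept as immutable syntax trees and compiled with a Thompson-style construction only when an input is tested. This keeps non-destructive updates cheap and makes it safe to reuse one fragment twice, as in a.andThen(a).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical immutable fragment description; combinators build new trees, never mutate old ones.
interface Frag {
    static Frag epsilon() { return new Eps(); }
    static Frag of(int codePoint) { return new Sym(codePoint); }
    static Frag of(String text) {
        Frag result = epsilon();
        for (int cp : text.codePoints().toArray()) result = result.andThen(of(cp));
        return result;
    }
    default Frag andThen(Frag other) { return new Seq(this, other); }
    default Frag orElse(Frag other) { return new Alt(this, other); }
    default Frag star() { return new Star(this); }
    // Builds a fresh automaton per query, so a fragment used twice is never aliased.
    default boolean matches(String input) { return new Nfa(this).accepts(input); }
}

record Eps() implements Frag {}
record Sym(int codePoint) implements Frag {}
record Seq(Frag first, Frag second) implements Frag {}
record Alt(Frag left, Frag right) implements Frag {}
record Star(Frag body) implements Frag {}

// Thompson-style epsilon-NFA: each state has at most one labelled edge plus any epsilon edges.
final class Nfa {
    private final List<Integer> symbol = new ArrayList<>(); // code point on the labelled edge, or -1
    private final List<Integer> target = new ArrayList<>(); // target of the labelled edge
    private final List<List<Integer>> eps = new ArrayList<>();
    private final int start;
    private final int accept;

    Nfa(Frag fragment) {
        start = newState();
        accept = newState();
        build(fragment, start, accept);
    }

    private int newState() {
        symbol.add(-1);
        target.add(-1);
        eps.add(new ArrayList<>());
        return symbol.size() - 1;
    }

    // Adds fresh states and edges so that the paths from s to t spell exactly the inputs of f.
    private void build(Frag f, int s, int t) {
        if (f instanceof Eps) {
            eps.get(s).add(t);
        } else if (f instanceof Sym sym) {
            int u = newState();
            eps.get(s).add(u);
            symbol.set(u, sym.codePoint());
            target.set(u, t);
        } else if (f instanceof Seq seq) {
            int mid = newState();
            build(seq.first(), s, mid);
            build(seq.second(), mid, t);
        } else if (f instanceof Alt alt) {
            build(alt.left(), s, t);
            build(alt.right(), s, t);
        } else if (f instanceof Star star) {
            int loop = newState(); // fresh state, so the repetition cannot leak into s or t
            eps.get(s).add(loop);
            eps.get(loop).add(t);
            build(star.body(), loop, loop);
        }
    }

    // Simulates all runs at once: the set of states reachable after each code point.
    boolean accepts(String input) {
        BitSet current = new BitSet();
        close(start, current);
        for (int cp : input.codePoints().toArray()) {
            BitSet next = new BitSet();
            for (int q = current.nextSetBit(0); q >= 0; q = current.nextSetBit(q + 1)) {
                if (symbol.get(q) == cp) close(target.get(q), next);
            }
            current = next;
        }
        return current.get(accept);
    }

    // Adds q and every state reachable from it through epsilon edges alone.
    private void close(int q, BitSet into) {
        if (into.get(q)) return;
        into.set(q);
        for (int r : eps.get(q)) close(r, into);
    }
}
```

For instance, Frag.of('a').andThen(Frag.of('b')).star() accepts "" and "abab" but rejects "aba", mirroring (ab)* in regular-expression notation.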
Nested Class Summary
- static class TokenFragment.Success - Singleton type indicating successful matching.
Field Summary
- static final TokenFragment.Success success - Singleton value indicating successful matching.
Method Summary
- TokenFragment andThen(TokenFragment other) - Returns a token fragment that matches input matched by this fragment followed by input matched by the given other fragment.
- TokenFragment andThenUntil(TokenFragment delimiter) - Returns a token fragment that matches input matched by this fragment, followed by arbitrary input, and finally by a single match of the given delimiter fragment.
- TokenFragment andThenUntil(TokenFragment body, TokenFragment delimiter) - Returns a token fragment that matches input matched by this fragment, followed by input matched by the given body fragment, and finally by a single match of the given delimiter fragment.
- static TokenFragment anyOf(int... codePoints) - Returns a token fragment that matches any one of the given input code points.
- TokenFragment butNot(TokenFragment other) - Returns a token fragment that matches input matched by this fragment but not by the given other fragment.
- TokenFragment butOnly(TokenFragment other) - Returns a token fragment that matches input matched by both this fragment and also the given other fragment.
- TokenFragment contained() - Returns a token fragment that matches any input which contains a contiguous section matched by this fragment.
- static TokenFragment epsilon() - Returns a token fragment that matches zero code points of input.
- static TokenFragment except(int... codePoints) - Returns a token fragment that matches any input code point except for the given ones.
- static TokenFragment fail() - Returns a token fragment that does not match any input.
- <F> F format(FormatServer<F> server) - Represent this or the underlying object in a human-readable, pretty-printable way.
- getImplementation() - Returns the accepting automaton that implements this fragment.
- TokenFragment normalize() - Returns a token fragment that matches the same inputs as this fragment, but with a simple implementation.
- static TokenFragment of(int codePoint) - Returns a token fragment that matches the given input code point.
- static TokenFragment of(String text) - Returns a token fragment that matches the input code point sequence specified by the given string.
- TokenFragment optional() - Returns a token fragment that matches input matched by this fragment, or alternatively zero code points.
- TokenFragment orElse(TokenFragment other) - Returns a token fragment that matches input either matched by this fragment or by the given other fragment, or both.
- TokenFragment plus() - Returns a token fragment that matches input matched by one or more repetitions of this fragment.
- static TokenFragment range(int from, int to) - Returns a token fragment that matches any input code point in the given interval.
- TokenFragment star() - Returns a token fragment that matches input matched by zero or more repetitions of this fragment.
- String toString()
Field Details
-
success
Singleton value indicating successful matching.
Method Details
-
toString
-
format
Description copied from interface: FormatClient
Represent this or the underlying object in a human-readable, pretty-printable way.
- Specified by:
format in interface FormatClient
- Type Parameters:
F - the type of format objects to produce
- Parameters:
server - a factory object that can produce format objects
- Returns:
a format object produced by the server
-
getImplementation
Returns the accepting automaton that implements this fragment.
A sequence of input code points is accepted by the automaton if the automaton can consume the whole sequence and ends up in a state labeled with Collections.singleton(success). Otherwise, i.e., if the automaton fails to consume all of the sequence or ends up in a state labeled with Collections.emptySet(), the input sequence is rejected.
- Returns:
the accepting automaton that implements this fragment
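The acceptance condition can be illustrated with a hand-written deterministic automaton for the single input ab. This is only a sketch. The class LabelledDfa and the object SUCCESS, which stands in for TokenFragment.success, are hypothetical, and the actual automaton type returned by getImplementation is not modelled.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical labelled automaton; the real return type of getImplementation() is not modelled.
final class LabelledDfa {
    static final Object SUCCESS = new Object(); // stands in for TokenFragment.success

    // Transitions: state 0 --a--> state 1 --b--> state 2.
    private static final List<Map<Integer, Integer>> DELTA = List.of(
            Map.of((int) 'a', 1),
            Map.of((int) 'b', 2),
            Map.<Integer, Integer>of());

    // Only state 2 carries the success label; the others carry the empty label set.
    private static final List<Set<Object>> LABELS = List.of(
            Collections.<Object>emptySet(),
            Collections.<Object>emptySet(),
            Collections.singleton(SUCCESS));

    static boolean accepts(String input) {
        int state = 0;
        for (int cp : input.codePoints().toArray()) {
            Integer next = DELTA.get(state).get(cp);
            if (next == null) return false; // cannot consume the whole sequence: reject
            state = next;
        }
        // Accept only if the final state is labelled {SUCCESS}; an empty label means reject.
        return LABELS.get(state).equals(Collections.singleton(SUCCESS));
    }
}
```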
-
epsilon
Returns a token fragment that matches zero code points of input.
- Returns:
a token fragment that matches zero code points of input
-
fail
Returns a token fragment that does not match any input.
- Returns:
a token fragment that does not match any input
-
of(int codePoint)
Returns a token fragment that matches the given input code point.
This construct corresponds to a single ordinary or quoted character in a regular expression.
- Parameters:
codePoint - the code point to match
- Returns:
a token fragment that matches the given input code point
-
of(String text)
Returns a token fragment that matches the input code point sequence specified by the given string.
This construct corresponds to a substring of ordinary or quoted characters in a regular expression.
- Parameters:
text - the string to match
- Returns:
a token fragment that matches the input code point sequence specified by the given string
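Because fragments are specified in terms of code points rather than UTF-16 char values, a string containing a supplementary character contributes fewer elements than its length() suggests. The hypothetical helper below uses only String.codePoints() from the JDK to show the sequence that of(String) would presumably match element by element.

```java
import java.util.Arrays;

// Hypothetical helper: the code point sequence that a string specifies.
final class CodePointDemo {
    static int[] codePointsOf(String text) {
        return text.codePoints().toArray();
    }

    public static void main(String[] args) {
        String text = "a\uD83D\uDE00"; // 'a' followed by U+1F600, stored as a surrogate pair
        System.out.println(text.length());                       // 3 UTF-16 chars
        System.out.println(Arrays.toString(codePointsOf(text))); // [97, 128512]
    }
}
```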
-
anyOf
Returns a token fragment that matches any one of the given input code points.
This construct corresponds to a simple character class in a regular expression.
- Parameters:
codePoints - the code points to match
- Returns:
a token fragment that matches any one of the given input code points
-
range
Returns a token fragment that matches any input code point in the given interval.
Both given end points of the interval are inclusive; i.e., a code point c is matched if from <= c && c <= to.
This construct corresponds to a character range in a regular expression.
- Parameters:
from - the lower end of the interval of code points to match
to - the upper end of the interval of code points to match
- Returns:
a token fragment that matches any input code point in the given interval
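A minimal sketch of the interval semantics, assuming single-code-point fragments can be modelled as java.util.function.IntPredicate values; RangeDemo and its methods are hypothetical. It checks that range('a', 'z') agrees with the regular-expression character range [a-z] on all ASCII code points (Java 11 or later, for Character.toString(int)).

```java
import java.util.function.IntPredicate;
import java.util.regex.Pattern;

// Hypothetical model of range(from, to) as a predicate on a single code point.
final class RangeDemo {
    static IntPredicate range(int from, int to) {
        return c -> from <= c && c <= to; // both end points are inclusive
    }

    // Compares range('a', 'z') with java.util.regex's [a-z] for every ASCII code point.
    static boolean agreesWithRegexOnAscii() {
        IntPredicate lower = range('a', 'z');
        Pattern regex = Pattern.compile("[a-z]");
        for (int c = 0; c < 128; c++) {
            if (lower.test(c) != regex.matcher(Character.toString(c)).matches()) return false;
        }
        return true;
    }
}
```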
-
except
Returns a token fragment that matches any input code point except for the given ones.
This construct corresponds to a negated character class in a regular expression.
- Parameters:
codePoints - the code points not to match
- Returns:
a token fragment that matches any input code point except for the given ones
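Modelling single-code-point fragments as IntPredicate values (a hypothetical model, not the library's API), anyOf and except are complements of each other: except(cs) matches a code point exactly when anyOf(cs) does not. The sketch compares except('"', '\\') with the negated class [^"\\], a typical building block for the body of a string literal. CharClassDemo is an illustrative name.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;
import java.util.regex.Pattern;

// Hypothetical predicate models of anyOf(...) and except(...) on a single code point.
final class CharClassDemo {
    static IntPredicate anyOf(int... codePoints) {
        int[] members = codePoints.clone(); // defensive copy keeps the predicate immutable
        return c -> Arrays.stream(members).anyMatch(m -> m == c);
    }

    static IntPredicate except(int... codePoints) {
        return anyOf(codePoints).negate(); // complement with respect to all code points
    }

    // Compares except('"', '\\') with the regex [^"\\] on every ASCII code point.
    static boolean agreesWithRegexOnAscii() {
        IntPredicate bodyChar = except('"', '\\');
        Pattern regex = Pattern.compile("[^\"\\\\]");
        for (int c = 0; c < 128; c++) {
            if (bodyChar.test(c) != regex.matcher(Character.toString(c)).matches()) return false;
        }
        return true;
    }
}
```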
-
andThen
Returns a token fragment that matches input matched by this fragment followed by input matched by the given other fragment.
This construct corresponds to concatenation (one subexpression followed by another) in a regular expression.
- Parameters:
other - the other fragment
- Returns:
a token fragment that matches input matched by this fragment followed by input matched by the given other fragment
-
orElse
Returns a token fragment that matches input either matched by this fragment or by the given other fragment, or both.
An input sequence matched by both parts of the combined fragment simultaneously is not considered ambiguous.
This construct corresponds to an either-or choice in a regular expression.
- Parameters:
other - the other fragment
- Returns:
a token fragment that matches input either matched by this fragment or by the given other fragment, or both
-
optional
Returns a token fragment that matches input matched by this fragment, or alternatively zero code points.
This construct corresponds to a ? operator in a regular expression.
- Returns:
a token fragment that matches input matched by this fragment, or alternatively zero code points
-
plus
Returns a token fragment that matches input matched by one or more repetitions of this fragment.
This construct corresponds to a + operator in a regular expression.
- Returns:
a token fragment that matches input matched by one or more repetitions of this fragment
-
star
Returns a token fragment that matches input matched by zero or more repetitions of this fragment.
This construct corresponds to a * operator in a regular expression.
- Returns:
a token fragment that matches input matched by zero or more repetitions of this fragment
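Since all three constructs are defined by repetition, one would expect x.optional() to match the same inputs as x.orElse(TokenFragment.epsilon()), and x.plus() the same inputs as x.andThen(x.star()). The sketch below does not use TokenFragment. It checks the corresponding regular-expression identities with java.util.regex on every input over {a, b} up to a bounded length, which is evidence rather than a proof. RepetitionDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical bounded check that two regular expressions accept the same inputs over {a, b}.
final class RepetitionDemo {
    // Every string over {a, b} with length at most maxLength, including the empty string.
    static List<String> inputs(int maxLength) {
        List<String> result = new ArrayList<>(List.of(""));
        List<String> frontier = List.of("");
        for (int n = 0; n < maxLength; n++) {
            List<String> next = new ArrayList<>();
            for (String s : frontier) {
                next.add(s + "a");
                next.add(s + "b");
            }
            result.addAll(next);
            frontier = next;
        }
        return result;
    }

    // True if both patterns agree, as whole-input matches, on every input up to maxLength.
    static boolean sameLanguageUpTo(int maxLength, String regex1, String regex2) {
        Pattern p1 = Pattern.compile(regex1);
        Pattern p2 = Pattern.compile(regex2);
        for (String s : inputs(maxLength)) {
            if (p1.matcher(s).matches() != p2.matcher(s).matches()) return false;
        }
        return true;
    }
}
```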
-
butNot
Returns a token fragment that matches input matched by this fragment but not by the given other fragment.
This construct does not correspond to negative lookahead, or any other typical feature, in a regular expression. It does, however, correspond to the set difference of the respective sublanguages, a concept that is sometimes used semiformally in definitions of the lexical structure of a language.
- Parameters:
other - the other fragment
- Returns:
a token fragment that matches input matched by this fragment but not by the given other fragment
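A standard example of such a semiformal definition is "an identifier is a letter or underscore followed by letters, digits or underscores, but not a keyword". The sketch below only evaluates that membership condition for a whole input with java.util.regex; DifferenceDemo and the three-keyword list are hypothetical. With fragments, the same rule would presumably be written as identifier.butNot(keyword) for suitable fragments identifier and keyword.

```java
import java.util.regex.Pattern;

// Hypothetical set-difference check: identifiers that are not reserved words.
final class DifferenceDemo {
    static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
    static final Pattern KEYWORD = Pattern.compile("if|else|while");

    // Membership in L(IDENTIFIER) minus L(KEYWORD): matched by the first, not by the second.
    static boolean isPlainIdentifier(String s) {
        return IDENTIFIER.matcher(s).matches() && !KEYWORD.matcher(s).matches();
    }
}
```

Note that the difference is taken on whole inputs: iffy is an identifier even though it starts with the keyword if, which is exactly what a lookahead-based regular expression tends to get wrong.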
-
butOnly
Returns a token fragment that matches input matched by both this fragment and also the given other fragment.
This construct does not correspond to positive lookahead, or any other typical feature, in a regular expression. It does, however, correspond to the set intersection of the respective sublanguages.
- Parameters:
other - the other fragment
- Returns:
a token fragment that matches input matched by both this fragment and also the given other fragment
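Intersection is convenient for imposing an extra constraint on an existing rule. In the hypothetical sketch below, a string of hexadecimal digits is additionally required to have even length, i.e., to denote whole bytes; membership in the intersection simply means being matched by both patterns. IntersectionDemo is an illustrative name, and java.util.regex stands in for fragments.

```java
import java.util.regex.Pattern;

// Hypothetical set-intersection check: hexadecimal digit strings of even length (whole bytes).
final class IntersectionDemo {
    static final Pattern HEX_DIGITS = Pattern.compile("[0-9a-f]+");
    static final Pattern EVEN_LENGTH = Pattern.compile("(?s)(..)+");

    // Membership in the intersection: matched by both patterns as whole inputs.
    static boolean isHexBytes(String s) {
        return HEX_DIGITS.matcher(s).matches() && EVEN_LENGTH.matcher(s).matches();
    }
}
```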
-
contained
Returns a token fragment that matches any input which contains a contiguous section matched by this fragment.
While it is common to have a regular expression match only some substring of a given input string, no regular-expression construct corresponds directly to this one.
- Returns:
a token fragment that matches any input which contains a contiguous section matched by this fragment
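In java.util.regex the same condition can be emulated in two ways: by searching with Matcher.find(), or by wrapping the pattern as (?s).*(?:...).* and requiring a whole-input match. The hypothetical ContainedDemo below implements both so that they can be compared; it illustrates the condition only and makes no claim about how contained() is implemented.

```java
import java.util.regex.Pattern;

// Hypothetical model of contained(): some contiguous section of the input matches inner.
final class ContainedDemo {
    static boolean viaFind(Pattern inner, String input) {
        return inner.matcher(input).find(); // searches for a matching section anywhere
    }

    static boolean viaWholeMatch(Pattern inner, String input) {
        // Whole-input match of: anything, then the inner pattern, then anything.
        Pattern wrapped = Pattern.compile("(?s).*(?:" + inner.pattern() + ").*");
        return wrapped.matcher(input).matches();
    }
}
```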
-
andThenUntil(TokenFragment delimiter)
Returns a token fragment that matches input matched by this fragment, followed by arbitrary input, and finally by a single match of the given delimiter fragment.
Matches of the delimiter in the middle section are forbidden. Hence this construct is useful to counteract a longest-match strategy and to prevent variable-length token rules from matching too much input.
- Parameters:
delimiter - the delimiter fragment
- Returns:
a fragment that matches variable-length input with the given beginning and end
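As an illustration, a C-style block comment could presumably be written as TokenFragment.of("/*").andThenUntil(TokenFragment.of("*/")). The hypothetical UntilDemo below models that case for literal strings only, assuming that "matched once" means the delimiter occurs in the remainder exactly at its end and nowhere earlier. It contrasts the result with a greedy regular expression, which accepts two comments and the code between them as a single match.

```java
import java.util.regex.Pattern;

// Hypothetical model of of(opening).andThenUntil(of(delimiter)) for literal strings.
final class UntilDemo {
    static boolean matchesUntil(String opening, String delimiter, String input) {
        if (!input.startsWith(opening)) return false;
        String rest = input.substring(opening.length());
        // Assumption: the first occurrence of the delimiter in the rest is the final one.
        return rest.endsWith(delimiter)
                && rest.indexOf(delimiter) == rest.length() - delimiter.length();
    }

    // Greedy regex for comparison: .* may run on to the last closing delimiter.
    static final Pattern GREEDY_COMMENT = Pattern.compile("(?s)/\\*.*\\*/");
}
```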
-
andThenUntil(TokenFragment body, TokenFragment delimiter)
Returns a token fragment that matches input matched by this fragment, followed by input matched by the given body fragment, and finally by a single match of the given delimiter fragment.
Matches of the delimiter in the middle section are forbidden. Hence this construct is useful to counteract a longest-match strategy and to prevent variable-length token rules from matching too much input.
- Parameters:
body - the body fragment
delimiter - the delimiter fragment
- Returns:
a fragment that matches variable-length input with the given beginning and end
-
normalize
Returns a token fragment that matches the same inputs as this fragment, but with a simple implementation.
The implementing automaton of the result shall have no branching transitions, and no redundant or dead states. This does not imply that the number of states is non-increasing.
- Returns:
a token fragment that matches the same inputs as this fragment, but with a simple implementation
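Removing branching transitions is classically done with the subset construction. The sketch below is not the library's algorithm, and the class SubsetDemo is hypothetical. It determinizes a three-state automaton for the inputs over {a, b} whose second-to-last symbol is a, i.e., (a|b)*a(a|b), and collects the reachable subsets. There are four, one more than the nondeterministic automaton has, which shows why the number of states need not decrease. Minimization is omitted; for this language the four states are already minimal.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Subset construction over a tiny automaton without epsilon edges (a sketch, not normalize()).
final class SubsetDemo {
    // (a|b)*a(a|b): state 0 loops on a and b, 0 -a-> 1, 1 -a,b-> 2, state 2 accepts.
    static final Map<Integer, Map<Character, Set<Integer>>> NFA = Map.of(
            0, Map.of('a', Set.of(0, 1), 'b', Set.of(0)),
            1, Map.of('a', Set.of(2), 'b', Set.of(2)),
            2, Map.<Character, Set<Integer>>of());

    // Returns the reachable deterministic states, each one a set of nondeterministic states.
    static Set<Set<Integer>> determinize() {
        Set<Set<Integer>> seen = new HashSet<>();
        Deque<Set<Integer>> work = new ArrayDeque<>();
        Set<Integer> start = Set.of(0);
        seen.add(start);
        work.add(start);
        while (!work.isEmpty()) {
            Set<Integer> current = work.remove();
            for (char symbol : new char[] {'a', 'b'}) {
                Set<Integer> next = new TreeSet<>();
                for (int q : current) {
                    next.addAll(NFA.get(q).getOrDefault(symbol, Set.of()));
                }
                if (seen.add(next)) work.add(next); // a new subset becomes a new state
            }
        }
        return seen;
    }
}
```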