Class TokenFragment
- All Implemented Interfaces:
FormatClient, Serializable
Each fragment contains an accepting nondeterministic automaton as its implementation. The fragment is said to match a sequence of input code points if and only if the sequence is accepted by its implementation automaton.
The constructor of this class is hidden from applications; fragments are created by factory methods instead.
Fragments are immutable. Factory methods are provided for non-destructive updates that derive new fragments from old ones.
Fragments are compositional: any fragment can serve directly in a token rule, no matter whether it is primitive or complex.
The language of fragments shares many constructs with the language of regular expressions; others correspond more directly to set theory. The constructs are chosen so that each has an effective implementation in terms of nondeterministic automata.
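To make the compositional reading concrete, here is a minimal sketch in Java. It assumes, purely for illustration, that a fragment can be modelled as a membership test on code point sequences; the interface `Frag` and its methods are hypothetical stand-ins, not the TokenFragment API, and the brute-force search is exponential rather than automaton-based. Each combinator returns a new object and leaves its operands untouched, mirroring the immutability described above.

```java
// Hypothetical model: a fragment is a membership test on code point sequences.
// It only illustrates the meaning of a few combinators; it is not TokenFragment.
interface Frag {
    // Does the slice cps[from..to) belong to the language of this fragment?
    boolean matches(int[] cps, int from, int to);

    default boolean matches(String s) {
        int[] cps = s.codePoints().toArray();
        return matches(cps, 0, cps.length);
    }

    // Counterpart of of(int): exactly one given code point
    static Frag of(int codePoint) {
        return (cps, from, to) -> to - from == 1 && cps[from] == codePoint;
    }

    // Counterpart of epsilon(): only the empty sequence
    static Frag epsilon() {
        return (cps, from, to) -> from == to;
    }

    // Counterpart of andThen: try every split point of the slice
    default Frag andThen(Frag other) {
        return (cps, from, to) -> {
            for (int mid = from; mid <= to; mid++) {
                if (this.matches(cps, from, mid) && other.matches(cps, mid, to)) return true;
            }
            return false;
        };
    }

    // Counterpart of orElse: union of the two languages
    default Frag orElse(Frag other) {
        return (cps, from, to) -> this.matches(cps, from, to) || other.matches(cps, from, to);
    }

    // Counterpart of star: empty, or a non-empty piece followed by more repetitions.
    // Requiring a non-empty first piece guarantees that the recursion terminates.
    default Frag star() {
        Frag self = this;
        return new Frag() {
            @Override
            public boolean matches(int[] cps, int from, int to) {
                if (from == to) return true;
                for (int mid = from + 1; mid <= to; mid++) {
                    if (self.matches(cps, from, mid) && this.matches(cps, mid, to)) return true;
                }
                return false;
            }
        };
    }
}

class FragDemo {
    public static void main(String[] args) {
        // a b* : one 'a' followed by zero or more 'b'
        Frag abStar = Frag.of('a').andThen(Frag.of('b').star());
        System.out.println(abStar.matches("abbb")); // true
        System.out.println(abStar.matches("ba"));   // false
    }
}
```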
Nested Class Summary
- static class TokenFragment.Success: Singleton type indicating successful matching.
Field Summary
- static final TokenFragment.Success success: Singleton value indicating successful matching.
Method Summary
- TokenFragment andThen(TokenFragment other): Returns a token fragment that matches input matched by this fragment followed by input matched by the given other fragment.
- TokenFragment andThenUntil(TokenFragment delimiter): Returns a token fragment that matches input matched by this fragment, followed by any input that ends with a single match of the given delimiter fragment.
- TokenFragment andThenWithout(TokenFragment delimiter): Returns a token fragment that matches input matched by this fragment, followed by any input not containing a match of the given delimiter fragment.
- static TokenFragment any(): Returns a token fragment that matches any input code point.
- static TokenFragment anyOf(int... codePoints): Returns a token fragment that matches any one of the given input code points.
- static TokenFragment anyOf(String... text): Returns a token fragment that matches any of the input code point sequences specified by the given strings.
- static TokenFragment anyOf(Collection<Integer> codePoints): Returns a token fragment that matches any one of the given input code points.
- static TokenFragment anyOfRange(int first, int last): Returns a token fragment that matches any one input code point in the given interval.
- static TokenFragment anyOfRange(int first, int last, IntPredicate cond): Returns a token fragment that matches those input code points in the given interval that satisfy the given condition.
- TokenFragment butNot(TokenFragment other): Returns a token fragment that matches input matched by this fragment but not by the given other fragment.
- TokenFragment butOnly(TokenFragment other): Returns a token fragment that matches input matched by both this fragment and the given other fragment.
- TokenFragment contained(): Returns a token fragment that matches any input which contains a contiguous section matched by this fragment.
- static TokenFragment epsilon(): Returns a token fragment that matches only zero code points of input.
- static TokenFragment except(int... codePoints): Returns a token fragment that matches any input code point except for the given ones.
- static TokenFragment fail(): Returns a token fragment that does not match any input.
- <F> F format(FormatServer<F> server): Represents this object, or the underlying object, in a human-readable, pretty-printable way.
- getImplementation(): Returns the accepting automaton that implements this fragment.
- boolean matchesEpsilon(): Checks whether this fragment matches zero code points of input.
- TokenFragment normalize(): Returns a token fragment that matches the same inputs as this fragment, but with a simple implementation.
- static TokenFragment of(int codePoint): Returns a token fragment that matches the given input code point.
- static TokenFragment of(String text): Returns a token fragment that matches the input code point sequence specified by the given string.
- TokenFragment optional(): Returns a token fragment that matches input matched by this fragment, or alternatively zero code points.
- TokenFragment orElse(TokenFragment other): Returns a token fragment that matches input matched either by this fragment or by the given other fragment, or by both.
- TokenFragment plus(): Returns a token fragment that matches input matched by one or more repetitions of this fragment.
- static TokenFragment range(int from, int to): Returns a token fragment that matches any input code point in the given interval.
- TokenFragment repeat(int times): Returns a token fragment that matches the given number of input subsequences matched by this token fragment.
- TokenFragment repeat(int minTimes, int maxTimes): Returns a token fragment that matches at least minTimes and at most maxTimes input subsequences matched by this token fragment.
- TokenFragment repeat(int minTimes, int maxTimes, TokenFragment separator): Returns a token fragment that matches at least minTimes and at most maxTimes input subsequences matched by this token fragment, interspersed with the given separator.
- TokenFragment repeat(int times, TokenFragment separator): Returns a token fragment that matches the given number of input subsequences matched by this token fragment, interspersed with the given separator.
- TokenFragment reverse(): Returns a token fragment that matches the reverse of each input code point sequence matched by this token fragment.
- TokenFragment star(): Returns a token fragment that matches input matched by zero or more repetitions of this fragment.
- String toString()
- TokenFragment translate(trans): Returns a token fragment that matches the code-point-wise translation, according to a relational image, of the sequences matched by this fragment.
- TokenFragment until(TokenFragment delimiter): Returns a token fragment that matches input matched by this fragment, followed by input matched once by the given delimiter fragment.
- TokenFragment without(TokenFragment delimiter): Returns a token fragment that matches input matched by this fragment, but not containing input that matches the given delimiter.
-
Field Details
-
success
Singleton value indicating successful matching.
-
-
Method Details
-
toString
-
format
Represents this object, or the underlying object, in a human-readable, pretty-printable way.
- Specified by:
format in interface FormatClient
- Type Parameters:
F - the type of format objects to produce
- Parameters:
server - a factory object that can produce format objects
- Returns:
- a format object produced by the server
-
getImplementation
Returns the accepting automaton that implements this fragment. A sequence of input code points is accepted by the automaton if the automaton can consume the whole sequence and end up in a state labeled with Collections.singleton(success). Otherwise, i.e., if the automaton fails to consume all of the sequence or ends up in a state labeled with Collections.emptySet(), the input sequence is rejected.
- Returns:
- the accepting automaton that implements this fragment
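The acceptance rule can be illustrated with a small hypothetical automaton. The class `TinyNfa` and the enum `Success` below are assumptions made for this sketch (the type returned by getImplementation is not shown in this extract). The simulation tracks the set of states reachable after each code point, rejects as soon as a code point cannot be consumed, and accepts if some final state carries the success label.

```java
import java.util.*;

// Hypothetical NFA, only to illustrate the acceptance rule stated for getImplementation();
// the actual automaton type of TokenFragment is not shown here.
class TinyNfa {
    // Stands in for TokenFragment.Success and its singleton value
    enum Success { SUCCESS }

    private final int initial;
    // state -> code point -> set of successor states (nondeterministic)
    private final Map<Integer, Map<Integer, Set<Integer>>> delta = new HashMap<>();
    // state -> label; unlabeled states count as Collections.emptySet()
    private final Map<Integer, Set<Success>> labels = new HashMap<>();

    TinyNfa(int initial) { this.initial = initial; }

    TinyNfa edge(int from, int codePoint, int to) {
        delta.computeIfAbsent(from, k -> new HashMap<>())
             .computeIfAbsent(codePoint, k -> new HashSet<>())
             .add(to);
        return this;
    }

    TinyNfa accepting(int state) {
        labels.put(state, Collections.singleton(Success.SUCCESS));
        return this;
    }

    boolean accepts(String input) {
        Set<Integer> current = new HashSet<>(Set.of(initial));
        for (int cp : input.codePoints().toArray()) {
            Set<Integer> next = new HashSet<>();
            for (int s : current) {
                next.addAll(delta.getOrDefault(s, Map.of()).getOrDefault(cp, Set.of()));
            }
            if (next.isEmpty()) return false; // cannot consume the whole sequence: reject
            current = next;
        }
        // Accept iff some reached state is labeled with the success singleton
        for (int s : current) {
            if (labels.getOrDefault(s, Collections.emptySet()).contains(Success.SUCCESS)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // a(b|bc): after "ab" the automaton may be in state 2 (accepting) or state 3
        TinyNfa nfa = new TinyNfa(0)
                .edge(0, 'a', 1)
                .edge(1, 'b', 2).edge(1, 'b', 3)
                .edge(3, 'c', 4)
                .accepting(2).accepting(4);
        System.out.println(nfa.accepts("ab"));  // true
        System.out.println(nfa.accepts("abc")); // true
        System.out.println(nfa.accepts("a"));   // false
    }
}
```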
-
matchesEpsilon
public boolean matchesEpsilon()
Checks whether this fragment matches zero code points of input.
- Returns:
true if this fragment matches zero code points of input; false if all matched sequences are nonempty
-
epsilon
Returns a token fragment that matches only zero code points of input.
- Returns:
- a token fragment that matches only zero code points of input
-
fail
Returns a token fragment that does not match any input.
- Returns:
- a token fragment that does not match any input
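For comparison, the regular-expression counterparts of epsilon(), fail() and matchesEpsilon() can be written with java.util.regex. This is only an analogy, assuming that a pattern is matched against the whole input; the pattern `[^\s\S]` is one common way to write a pattern that matches nothing, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

class EpsilonFailDemo {
    // epsilon() ~ the empty pattern: only the empty input matches as a whole
    static final Pattern EPSILON = Pattern.compile("");
    // fail() ~ a character class excluding everything, so nothing matches
    static final Pattern FAIL = Pattern.compile("[^\\s\\S]");

    // matchesEpsilon() ~ does the pattern accept zero code points of input?
    static boolean matchesEpsilon(Pattern p) {
        return p.matcher("").matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesEpsilon(EPSILON));               // true
        System.out.println(matchesEpsilon(FAIL));                  // false
        System.out.println(matchesEpsilon(Pattern.compile("a*"))); // true
        System.out.println(matchesEpsilon(Pattern.compile("a+"))); // false
    }
}
```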
-
of
Returns a token fragment that matches the given input code point. This construct corresponds to a single ordinary or quoted character in a regular expression.
- Parameters:
codePoint - the code point to match
- Returns:
- a token fragment that matches the given input code point
- Throws:
IllegalArgumentException - if codePoint is not a valid code point
-
of
Returns a token fragment that matches the input code point sequence specified by the given string. This construct corresponds to a substring of ordinary or quoted characters in a regular expression.
- Parameters:
text - the string to match
- Returns:
- a token fragment that matches the input code point sequence specified by the given string
-
anyOf
Returns a token fragment that matches any of the input code point sequences specified by the given strings. This construct corresponds to a choice of substrings of ordinary or quoted characters in a regular expression.
- Parameters:
text - the array of strings to match
- Returns:
- a token fragment that matches the input code point sequence specified by one of the given strings
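The regular-expression correspondences stated for of and anyOf can be sketched with java.util.regex, where Pattern.quote produces a quoted literal. The class and helper names below are hypothetical; they build pattern strings and do not call TokenFragment.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class LiteralDemo {
    // of(int codePoint) ~ one quoted character
    static String ofCodePoint(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a valid code point: " + codePoint);
        }
        return Pattern.quote(new String(Character.toChars(codePoint)));
    }

    // of(String text) ~ a quoted substring
    static String ofText(String text) {
        return Pattern.quote(text);
    }

    // anyOf(String... text) ~ a choice of quoted substrings
    static String anyOfTexts(String... texts) {
        if (texts.length == 0) return "[^\\s\\S]"; // empty choice: matches nothing
        return Arrays.stream(texts).map(Pattern::quote).collect(Collectors.joining("|", "(?:", ")"));
    }

    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(ofText("a+b"), "a+b"));                // true: '+' is quoted
        System.out.println(matches(anyOfTexts("if", "else"), "else"));    // true
        System.out.println(matches(ofCodePoint(0x1F600), "\uD83D\uDE00")); // true
    }
}
```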
-
anyOf
Returns a token fragment that matches any one of the given input code points. This construct corresponds to a simple character class in a regular expression.
- Parameters:
codePoints - the code points to match
- Returns:
- a token fragment that matches any one of the given input code points
-
anyOf
Returns a token fragment that matches any one of the given input code points. This construct corresponds to a simple character class in a regular expression.
- Parameters:
codePoints - the code points to match
- Returns:
- a token fragment that matches any one of the given input code points
-
anyOfRange
Returns a token fragment that matches any one input code point in the given interval. This construct corresponds to a simple character range in a regular expression.
- Parameters:
first - the lowest code point to match
last - the highest code point to match
- Returns:
- a token fragment that matches any input code point greater than or equal to first and less than or equal to last
-
anyOfRange
Returns a token fragment that matches those input code points in the given interval that satisfy an additional condition. This construct corresponds to a simple character range in a regular expression, restricted by a predicate.
- Parameters:
first - the lowest code point to match
last - the highest code point to match
cond - an additional predicate that code points must satisfy
- Returns:
- a token fragment that matches any input code point greater than or equal to first and less than or equal to last, and for which cond returns true
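A single-code-point fragment such as those returned by anyOf(int...) and anyOfRange behaves like a predicate on one code point. The following sketch models that with java.util.function.IntPredicate; the class and helper names are illustrative assumptions, and `LOWER_CONSONANT` shows how the additional cond argument narrows a range.

```java
import java.util.Set;
import java.util.function.IntPredicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class CharClassDemo {
    // anyOf(int...) ~ membership in a finite set of code points
    static IntPredicate anyOf(int... codePoints) {
        Set<Integer> set = IntStream.of(codePoints).boxed().collect(Collectors.toSet());
        return cp -> set.contains(cp);
    }

    // anyOfRange(first, last) ~ both bounds inclusive
    static IntPredicate anyOfRange(int first, int last) {
        return cp -> first <= cp && cp <= last;
    }

    // anyOfRange(first, last, cond) ~ the range, further restricted by cond
    static IntPredicate anyOfRange(int first, int last, IntPredicate cond) {
        return anyOfRange(first, last).and(cond);
    }

    // Lowercase ASCII letters that are not vowels
    static final IntPredicate LOWER_CONSONANT = anyOfRange('a', 'z', c -> "aeiou".indexOf(c) < 0);

    // A single-code-point fragment accepts exactly one code point satisfying the predicate
    static boolean matchesOne(IntPredicate p, String input) {
        int[] cps = input.codePoints().toArray();
        return cps.length == 1 && p.test(cps[0]);
    }

    public static void main(String[] args) {
        System.out.println(matchesOne(LOWER_CONSONANT, "b")); // true
        System.out.println(matchesOne(LOWER_CONSONANT, "e")); // false
        System.out.println(matchesOne(anyOf('x', 'y'), "y")); // true
    }
}
```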
-
range
Returns a token fragment that matches any input code point in the given interval. Both given end points of the interval are inclusive; i.e., a code point c is matched if and only if from <= c && c <= to. This construct corresponds to a character range in a regular expression.
- Parameters:
from - the lower end of the interval of code points to match
to - the upper end of the interval of code points to match
- Returns:
- a token fragment that matches any input code point in the given interval
- Throws:
NegativeArraySizeException - if from > to
-
except
Returns a token fragment that matches any input code point except for the given ones. This construct corresponds to a negated character class in a regular expression.
- Parameters:
codePoints - the code points not to match
- Returns:
- a token fragment that matches any input code point except for the given ones
-
any
Returns a token fragment that matches any input code point. This construct corresponds to a wildcard character in a regular expression.
- Returns:
- a token fragment that matches any input code point
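The correspondences for range, except and any can be checked against java.util.regex. Note the assumption behind the any() counterpart: in Java regular expressions `.` excludes line terminators unless Pattern.DOTALL is set, whereas any() is described as matching any input code point. Java patterns treat a supplementary character as a single code point. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

class RangeExceptAnyDemo {
    // any() ~ '.'; DOTALL is needed so that line terminators are matched too
    static final Pattern ANY = Pattern.compile(".", Pattern.DOTALL);

    // range(from, to) ~ [from-to], written with \x{...} code point escapes
    static Pattern range(int from, int to) {
        return Pattern.compile(String.format("[\\x{%x}-\\x{%x}]", from, to));
    }

    // except(cp...) ~ negated character class [^...]
    static Pattern except(int... codePoints) {
        if (codePoints.length == 0) return ANY; // excluding nothing leaves every code point
        StringBuilder sb = new StringBuilder("[^");
        for (int cp : codePoints) sb.append(String.format("\\x{%x}", cp));
        return Pattern.compile(sb.append(']').toString());
    }

    public static void main(String[] args) {
        System.out.println(range('a', 'z').matcher("q").matches());   // true
        System.out.println(except('a', 'b').matcher("\n").matches()); // true
        System.out.println(ANY.matcher("\uD83D\uDE00").matches());    // true: one code point
    }
}
```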
-
andThen
Returns a token fragment that matches input matched by this fragment followed by input matched by the given other fragment. This construct corresponds to a followed-by sequence (concatenation) in a regular expression.
- Parameters:
other - the other fragment
- Returns:
- a token fragment that matches input matched by this fragment followed by input matched by the given other fragment
-
orElse
Returns a token fragment that matches input matched either by this fragment or by the given other fragment, or by both. An input sequence matched by both parts of the combined fragment simultaneously is not considered ambiguous.
This construct corresponds to an either-or choice in a regular expression.
- Parameters:
other - the other fragment
- Returns:
- a token fragment that matches input matched by this fragment or by the given other fragment
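Sequencing and choice map onto concatenation and alternation of regular expressions. The hypothetical helpers below wrap each operand in a non-capturing group so that results can be nested safely; the example also shows an input (`if`) matched by both alternatives, which is simply accepted.

```java
import java.util.regex.Pattern;

class SeqChoiceDemo {
    // andThen ~ concatenation
    static String andThen(String a, String b) {
        return "(?:" + a + ")(?:" + b + ")";
    }

    // orElse ~ alternation, wrapped so that it can be nested
    static String orElse(String a, String b) {
        return "(?:(?:" + a + ")|(?:" + b + "))";
    }

    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String keywordOrIdent = orElse("if", "[a-z]+");
        System.out.println(matches(keywordOrIdent, "if"));            // true: matched by both
        System.out.println(matches(keywordOrIdent, "iffy"));          // true
        System.out.println(matches(andThen("[a-z]", "[0-9]"), "a1")); // true
    }
}
```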
-
optional
Returns a token fragment that matches input matched by this fragment, or alternatively zero code points. This construct corresponds to a ? operator in a regular expression.
- Returns:
- a token fragment that matches input matched by this fragment, or alternatively zero code points
-
plus
Returns a token fragment that matches input matched by one or more repetitions of this fragment. This construct corresponds to a + operator in a regular expression.
- Returns:
- a token fragment that matches input matched by one or more repetitions of this fragment
-
star
Returns a token fragment that matches input matched by zero or more repetitions of this fragment. This construct corresponds to a * operator in a regular expression.
- Returns:
- a token fragment that matches input matched by zero or more repetitions of this fragment
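The three repetition operators translate directly into the regular-expression quantifiers `?`, `+` and `*`. The sketch below (hypothetical helper names, java.util.regex semantics) also checks on a few inputs that `plus` agrees with the operand followed by its own `star`.

```java
import java.util.regex.Pattern;

class RepetitionOpsDemo {
    static String optional(String x) { return "(?:" + x + ")?"; }
    static String plus(String x)     { return "(?:" + x + ")+"; }
    static String star(String x)     { return "(?:" + x + ")*"; }

    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String ab = "ab";
        // plus(x) should accept exactly what x followed by star(x) accepts
        for (String s : new String[] {"", "ab", "abab", "aba"}) {
            boolean viaPlus = matches(plus(ab), s);
            boolean viaStar = matches("(?:" + ab + ")" + star(ab), s);
            System.out.println("'" + s + "': " + viaPlus + " " + viaStar);
        }
        System.out.println(matches(optional(ab), "")); // true
    }
}
```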
-
butNot
Returns a token fragment that matches input matched by this fragment but not by the given other fragment. This construct does not correspond to negative lookahead, or to any other typical feature of regular expressions. It does, however, correspond to the set difference of the respective sublanguages, a concept that is sometimes used semiformally in definitions of the lexical structure of a language.
- Parameters:
other - the other fragment
- Returns:
- a token fragment that matches input matched by this fragment but not by the given other fragment
-
butOnly
Returns a token fragment that matches input matched by both this fragment and the given other fragment. This construct does not correspond to positive lookahead, or to any other typical feature of regular expressions. It does, however, correspond to the set intersection of the respective sublanguages.
- Parameters:
other - the other fragment
- Returns:
- a token fragment that matches input matched by both this fragment and the given other fragment
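Set difference and intersection have no regular-expression operator, but they are easy to express on membership predicates. A sketch with java.util.function.Predicate and Pattern.asMatchPredicate (Java 11 and later): identifiers that are not keywords, and identifiers that happen to consist only of lowercase hexadecimal digits. The patterns and helper names are illustrative assumptions.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

class SetOpsDemo {
    static final Predicate<String> IDENTIFIER = Pattern.compile("[a-z][a-z0-9]*").asMatchPredicate();
    static final Predicate<String> KEYWORD = Pattern.compile("if|else|while").asMatchPredicate();
    static final Predicate<String> HEX_DIGITS = Pattern.compile("[0-9a-f]+").asMatchPredicate();

    // butNot ~ set difference: in a, not in b
    static <T> Predicate<T> butNot(Predicate<T> a, Predicate<T> b) {
        return a.and(b.negate());
    }

    // butOnly ~ set intersection: in both a and b
    static <T> Predicate<T> butOnly(Predicate<T> a, Predicate<T> b) {
        return a.and(b);
    }

    public static void main(String[] args) {
        Predicate<String> plainIdentifier = butNot(IDENTIFIER, KEYWORD);
        System.out.println(plainIdentifier.test("count"));                // true
        System.out.println(plainIdentifier.test("while"));                // false
        System.out.println(butOnly(IDENTIFIER, HEX_DIGITS).test("cafe")); // true
    }
}
```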
-
contained
Returns a token fragment that matches any input which contains a contiguous section matched by this fragment. Although a regular expression is commonly used to match only some substring of a given input string, regular expressions have no direct counterpart for this construct.
- Returns:
- a token fragment that matches any input which contains a contiguous section matched by this fragment
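Semantically, contained() describes the same language as: any input, then a section matched by this fragment, then any input. With java.util.regex that corresponds to wrapping a pattern X as `.*(?:X).*` under Pattern.DOTALL and matching the whole input, which gives the same verdict as searching for X with Matcher.find. The helper below is a hypothetical illustration.

```java
import java.util.regex.Pattern;

class ContainedDemo {
    // contained() ~ any input, a section matched by x, any input;
    // DOTALL lets '.' cover line terminators as well
    static Pattern contained(String x) {
        return Pattern.compile(".*(?:" + x + ").*", Pattern.DOTALL);
    }

    public static void main(String[] args) {
        Pattern p = contained("b+");
        System.out.println(p.matcher("abba").matches());                  // true
        System.out.println(p.matcher("aca").matches());                   // false
        // Same verdict as searching for the pattern with Matcher.find()
        System.out.println(Pattern.compile("b+").matcher("abba").find()); // true
    }
}
```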
-
andThenUntil
Returns a token fragment that matches input matched by this fragment, followed by any input that ends with a single match of the given delimiter fragment. Matches of the delimiter in the middle section are forbidden. Hence this construct is useful to counteract a longest-match strategy and to prevent variable-length token rules from matching too much input.
On the other hand, the resulting fragment will try to match all input until a match of the delimiter is found. If the delimiter is missing from the input, all input is consumed tentatively, and the match fails at the end.
- Parameters:
delimiter - the delimiter fragment
- Returns:
- a fragment that matches variable-length input with the given beginning (this) and end (delimiter), both inclusive
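A typical use of andThenUntil is a block comment that begins with `/*` and ends at the first `*/`. In java.util.regex a similar effect is commonly obtained with a tempered token `(?:(?!\*/).)*`; the sketch below assumes that idiom as a stand-in for the fragment and is not derived from the library. Matching with lookingAt shows that the match stops after the first delimiter instead of running on to a later one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlockCommentDemo {
    // "/*", then code points at which no "*/" starts, then one "*/" (tempered-token idiom)
    static final Pattern BLOCK_COMMENT = Pattern.compile("/\\*(?:(?!\\*/).)*\\*/", Pattern.DOTALL);

    // Length of the comment at the start of the input, or -1 if there is none
    static int commentLength(String input) {
        Matcher m = BLOCK_COMMENT.matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        System.out.println(commentLength("/* a */ x = 1; /* b */")); // 7
        System.out.println(commentLength("/* unterminated"));        // -1
    }
}
```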
-
andThenWithout
Returns a token fragment that matches input matched by this fragment, followed by any input not containing a match of the given delimiter fragment. Under a longest-match strategy, the match therefore stops at the first occurrence of input matching the delimiter, or at the end of input.
- Parameters:
delimiter - the delimiter fragment
- Returns:
- a fragment that matches variable-length input with the given beginning (this, inclusive) and end (delimiter, exclusive)
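The following sketch computes, under a longest-match strategy, where a match of a literal prefix followed by input not containing a literal delimiter ends. Literal strings stand in for fragments, and `longestMatchEnd` is a hypothetical helper. With a multi-code-point delimiter such as `*/`, a reading based purely on "not containing" lets the match include a proper prefix of the delimiter; the second call in main shows this.

```java
class LongestWithoutDemo {
    // End index of the longest match of: prefix, then any text not containing delim.
    // Returns -1 if the input does not start with the prefix.
    static int longestMatchEnd(String input, String prefix, String delim) {
        if (!input.startsWith(prefix)) return -1;
        int best = prefix.length(); // the middle part may be empty
        for (int end = prefix.length() + 1; end <= input.length(); end++) {
            // once the middle part contains the delimiter, every longer one does too
            if (input.substring(prefix.length(), end).contains(delim)) break;
            best = end;
        }
        return best;
    }

    public static void main(String[] args) {
        // Line comment: "//" followed by anything up to (not including) the newline
        System.out.println(longestMatchEnd("// note\nx = 1", "//", "\n")); // 7
        // "/*x*" does not contain "*/", so the match ends just before the final '/'
        System.out.println(longestMatchEnd("/*x*/y", "/*", "*/"));         // 4
    }
}
```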
-
without
Returns a token fragment that matches input matched by this fragment, but not containing any section that matches the given delimiter. This construct is useful to counteract a longest-match strategy and to prevent variable-length token rules from matching too much input.
- Parameters:
delimiter - the delimiter fragment
- Returns:
- a fragment that matches input matched by this, with the given end (delimiter, exclusive)
-
until
Returns a token fragment that matches input matched by this fragment, followed by input matched once by the given delimiter fragment. Matches of the delimiter before the end are forbidden. This construct is useful to counteract a longest-match strategy and to prevent variable-length token rules from matching too much input.
- Parameters:
delimiter - the delimiter fragment
- Returns:
- a fragment that matches input matched by this, with the given end (delimiter, inclusive)
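For literal delimiters, one plausible reading of without and until can be expressed as predicates over the whole input. The sketch assumes that the delimiter is a fixed non-empty string and that `base` stands for the language of this fragment; both helpers are hypothetical and only illustrate the stated conditions (no delimiter anywhere, respectively no delimiter before the end).

```java
import java.util.function.Predicate;

class WithoutUntilDemo {
    static final Predicate<String> ANYTHING = s -> true;

    // without: accepted by base, and the delimiter occurs nowhere in the input
    static Predicate<String> without(Predicate<String> base, String delim) {
        return s -> base.test(s) && !s.contains(delim);
    }

    // until: input = m + delim with m accepted by base, and the first occurrence
    // of delim is the one at the very end (no match of the delimiter before the end)
    static Predicate<String> until(Predicate<String> base, String delim) {
        return s -> s.endsWith(delim)
                && s.indexOf(delim) == s.length() - delim.length()
                && base.test(s.substring(0, s.length() - delim.length()));
    }

    public static void main(String[] args) {
        System.out.println(until(ANYTHING, "*/").test("abc*/"));   // true
        System.out.println(until(ANYTHING, "*/").test("a*/b*/"));  // false: delimiter before the end
        System.out.println(without(ANYTHING, "--").test("a--b"));  // false
    }
}
```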
-
normalize
Returns a token fragment that matches the same inputs as this fragment, but with a simple implementation. The implementing automaton of the result shall have no branching transitions and no redundant or dead states. This does not guarantee that the result has no more states than the original.
- Returns:
- a token fragment that matches the same inputs as this fragment, but with a simple implementation
-
reverse
Returns a token fragment that matches the reverse of each input code point sequence matched by this token fragment.
- Returns:
- a token fragment that matches the reverse of each input code point sequence matched by this token fragment
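A sketch of the meaning of reverse(), modelling fragments as predicates over strings: the reversed fragment accepts an input exactly when the original accepts its reversal. The reversal is done per code point, so that a supplementary character (a surrogate pair in a Java String) stays intact. The class, constant and helper names are hypothetical.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

class ReverseDemo {
    static final Predicate<String> AB_PLUS = Pattern.compile("ab+").asMatchPredicate();

    // Reverse by code points, not by UTF-16 chars
    static String reverseCodePoints(String s) {
        int[] cps = s.codePoints().toArray();
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = cps.length - 1; i >= 0; i--) sb.appendCodePoint(cps[i]);
        return sb.toString();
    }

    // reverse(): accepts y iff the original accepts the reversal of y
    static Predicate<String> reverse(Predicate<String> f) {
        return s -> f.test(reverseCodePoints(s));
    }

    public static void main(String[] args) {
        System.out.println(reverse(AB_PLUS).test("bba")); // true
        System.out.println(reverse(AB_PLUS).test("abb")); // false
    }
}
```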
-
repeat
Returns a token fragment that matches the given number of consecutive input subsequences, each matched by this token fragment. The underlying automaton is unfolded the given number of times; this method should therefore only be called with fairly small numbers.
- Parameters:
times - the number of repetitions
- Returns:
- a token fragment that matches the given number of input subsequences matched by this token fragment
- Throws:
IllegalArgumentException - if times is negative
- Since:
- 1.1
-
repeat
Returns a token fragment that matches the given number of consecutive input subsequences, each matched by this token fragment, interspersed with the given separator. The underlying automaton is unfolded the given number of times; this method should therefore only be called with fairly small numbers.
- Parameters:
times - the number of repetitions
separator - the fragment to intersperse between repetitions
- Returns:
- a token fragment that matches the given number of input subsequences matched by this token fragment, interspersed with the given separator
- Throws:
IllegalArgumentException - if times is negative
- Since:
- 1.1
-
repeat
Returns a token fragment that matches at least minTimes and at most maxTimes consecutive input subsequences, each matched by this token fragment. The underlying automaton is unfolded the given number of times; this method should therefore only be called with fairly small numbers.
- Parameters:
minTimes - the minimal number of repetitions
maxTimes - the maximal number of repetitions
- Returns:
- a token fragment that matches at least minTimes and at most maxTimes input subsequences matched by this token fragment
- Throws:
IllegalArgumentException - if minTimes or maxTimes is negative, or maxTimes is less than minTimes
- Since:
- 1.1
-
repeat
Returns a token fragment that matches at least minTimes and at most maxTimes consecutive input subsequences, each matched by this token fragment, interspersed with the given separator. The underlying automaton is unfolded the given number of times; this method should therefore only be called with fairly small numbers.
- Parameters:
minTimes - the minimal number of repetitions
maxTimes - the maximal number of repetitions
separator - the fragment to intersperse between repetitions
- Returns:
- a token fragment that matches at least minTimes and at most maxTimes input subsequences matched by this token fragment, interspersed with the separator
- Throws:
IllegalArgumentException - if minTimes or maxTimes is negative, or maxTimes is less than minTimes
NullPointerException - if separator is null
- Since:
- 1.1
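The repeat variants correspond to bounded regex quantifiers, and the separator forms to one occurrence followed by separator-occurrence pairs; repeat(times) is the special case minTimes = maxTimes = times. The hypothetical builder below produces such patterns for java.util.regex, checks the argument conditions listed above, and handles minTimes = 0 by making the whole separated sequence optional.

```java
import java.util.regex.Pattern;

class RepeatDemo {
    static void check(int min, int max) {
        if (min < 0 || max < min) throw new IllegalArgumentException("bad bounds: " + min + ".." + max);
    }

    // repeat(minTimes, maxTimes) ~ (?:x){min,max}
    static String repeat(String x, int min, int max) {
        check(min, max);
        return "(?:" + x + "){" + min + "," + max + "}";
    }

    // repeat(minTimes, maxTimes, separator) ~ x, then (separator x) between min-1 and max-1 times
    static String repeat(String x, int min, int max, String sep) {
        check(min, max);
        if (max == 0) return ""; // only the empty input
        String body = "(?:" + x + ")(?:(?:" + sep + ")(?:" + x + ")){"
                + Math.max(min - 1, 0) + "," + (max - 1) + "}";
        return min == 0 ? "(?:" + body + ")?" : body;
    }

    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Four groups of one to three digits, separated by dots
        String quad = repeat("[0-9]{1,3}", 4, 4, "\\.");
        System.out.println(matches(quad, "192.168.0.1")); // true
        System.out.println(matches(quad, "192.168.0"));   // false
    }
}
```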
-
translate
Returns a token fragment that matches the code-point-wise translation, according to a relational image, of the sequences matched by this fragment. The resulting fragment matches a sequence of code points y1...yn if and only if there is a corresponding sequence of code points x1...xn matched by this fragment such that trans(xi, yi) returns true for all i.
- Parameters:
trans - the translating relation
- Returns:
- a token fragment in which all transition keys of the underlying automaton have been translated according to trans
- Throws:
NullPointerException - if trans is null
- Since:
- 1.1
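A brute-force sketch of the translation semantics, restricted to a finite set of strings standing in for this fragment's language. The relation is modelled as BiPredicate<Integer, Integer>, an assumption since the parameter type of translate is not shown in this extract; case-insensitive equality turns a keyword set into its case-insensitive variant. All names are hypothetical.

```java
import java.util.Set;
import java.util.function.BiPredicate;

class TranslateDemo {
    static final Set<String> KEYWORDS = Set.of("if", "else");

    // Example relation: code points equal after (simple) lower-casing
    static final BiPredicate<Integer, Integer> IGNORE_CASE =
            (x, y) -> Character.toLowerCase(x) == Character.toLowerCase(y);

    // y is accepted iff some x in the language has the same length and
    // trans(x_i, y_i) holds at every position i
    static boolean translatedMatches(Set<String> language, BiPredicate<Integer, Integer> trans, String y) {
        int[] ys = y.codePoints().toArray();
        for (String x : language) {
            int[] xs = x.codePoints().toArray();
            if (xs.length != ys.length) continue; // translation is code-point-wise
            boolean related = true;
            for (int i = 0; i < xs.length && related; i++) {
                related = trans.test(xs[i], ys[i]);
            }
            if (related) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(translatedMatches(KEYWORDS, IGNORE_CASE, "IF"));   // true
        System.out.println(translatedMatches(KEYWORDS, IGNORE_CASE, "Else")); // true
        System.out.println(translatedMatches(KEYWORDS, IGNORE_CASE, "iff"));  // false
    }
}
```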
-