The alea Language Reference
Draft (2025-01-26)
1
Conventions
1.1
Unicode
1.2
Code Examples
1.3
Type Signatures
2
Files
3
Comments
3.1
Line Comments
3.2
Block Comments
4
Variables
4.1
Assignment
5
Functions
5.1
Overloading
6
Numbers
6.1
Subtypes
6.2
Numerical Constants
6.2.1
Types of Numbers
6.3
Arithmetic Operators
6.3.1
Powers
6.3.2
Multiplication and Division
6.3.3
Addition and Subtraction
6.3.4
Comparisons
6.3.5
Conjunction and Disjunction
6.4
Mathematical Functions
6.4.1
Minimum and Maximum
6.4.2
Rounding
6.4.3
Absolute Value and Sign
6.5
Ranges
6.6
Numerical Case Distinction
7
Collections
7.1
Shapes, Optionality and Element Types
7.2
Building a Collection
7.3
Transforming a Collection
7.3.1
Comprehension Syntax
7.3.2
Comprehension Semantics
7.4
Evaluating a Collection
7.4.1
Reduction Semantics
7.5
Collection Operators
7.6
Collection Functions
7.6.1
Counting Elements
7.6.2
Changing the Collection Shape
7.6.3
Multiplicities
8
Records
9
Tags
10
Distributions
A wide variety of Unicode characters may occur in alea source code. The mathematical notation uses the dedicated unicode characters preferentially, but for each character outside the ASCII range, an alternative in terms of combinations of ASCII characters is provided. Thus alea code can be written with any editor and keyboard.
Where this document discusses the characters for a notation, ASCII characters
are simply displayed as such, e.g. *
. Other Unicode characters are
referenced explicitly by code point and name, e.g. ¬
(U+00AC not
sign).
Valid examples of code are displayed in boxes.
valid example code
When discussion alternatives of notation, the meta-separator code!⤏! is used to indicate that the more specific left-hand side notation can be reduced, by rewriting or evaluation, to the right-hand side.
expression ⤏ alternative or result
Invalid examples of code, demonstrating common errors, are displayed in boxes of a distinct appearance.
invalid example code
Type signatures of functions, operators and probability distributions are displayed in boxes and use a standard mathematical notation.
f : A → B |
alea source code can be stored in arbitrarily named files. The content should be a single (commented) expression, and be encoded in UTF-8.
Comments are sections of the source code that are ignored for program execution, and exist only for the purpose of the human reader.
The character sequence --
or the single character —
(U+2014 em dash) starts a comment that extends to the end of the same
line.
1 + 1 -- a very large number, beware!
A line comment must always be terminated by an explicit line break; it is an error if the file ends before that.
A subexpression can be assigned to a variable and the result referred to
subsequently in another subexpression, the body — a so-called let
construct. Assignment uses the operator :=
or ≝
(U+225D
equal to by definition). The subexpressions are separated by ;
.
This construct is right-associative, so that several assignments can be chained.
two := 1 + 1; four := two * two; two * four ⤏ 8
For more complex expressions, it is recommended to start a new line after
each ;
.
If the body is omitted, the overall result is the (last) one assigned.
a := 42 ⤏ 42
All other operators bind more strongly than :=
and ;
. Hence a
let construct nested inside an expression must be enclosed in parentheses
(
and )
.
The binding of a value to a variable by assignment is only effective for the following body; it expires at the end of the let construct. Variables can be reused in nested subexpressions; in that case the previous value is unaffected outside the body.
a := 1; (a := 2; a + a) + a ⤏ 5
alea comes with a library of predefined functions. Every function is identified uniquely by a name. Predefined functions shall be introduced below, in the respective section dealing with the kind of data that they operate on. Operators built into the language are merely functions with special names and application syntax.
In the current version of alea, there is no mechanism for the user to define their own functions.
Each named function is conceived as having a single implementation that operates partially on the universe of untyped values. In order for a function application to be valid, it must be ensured that the function is defined for that particular argument value. This check is done statically, such that invalid function applications are ruled out without having to (try and fail to) evaluate them.
Every expression is assigned a type that approximates the set of its possible values. Functions can be given one or more type signatures of the form f : A → B, meaning the following contract: "If the value of the argument x is in the set denoted by type A, then the result of function application f(x) is defined and in the set denoted by type B." If a function is given more than one type signature, they all apply simultaneously.
Static checking of a function application starts with determining the type T of the argument expression x. Then the type U of the function application f(x) is the greatest lower bound of the B in all type signatures f : A → B where A is a supertype of T.
Numerical computations operate exactly on the rational numbers, the set ℚ. On the one hand, there is no limit to the precision; the number of significant digits can be as large as necessary. On the other hand, operations that conceptually return irrational values, e.g., the square root or the natural logarithm, are not supported. There are no approximate computations on floating-point numbers.
Besides the numbers proper, there is the special value NaN. Arithmetically undefined operations do not fail, but return NaN instead.
Four different sets of numbers are recognized as types:
These form a linear subtype hierarchy: A boolean is also a natural; a natural is also an integer; an integer is also a rational.
Numerical constants are written as decimal numbers. Numbers that are not
integers can be written either as decimal fractions or as true fractions with
the separator /
or ⁄
(U+2044 fraction slash).
1024, −1, 3.14142, 22/7
The notation of numbers follows traditional mathematical conventions:
123. + .456
The special value NaN can be written as an undefined fraction.
0/0
Since the number types overlap, a particular number may be an instance of more than one type. But for every number there is a unique principal type that is subtype of all types the number is an instance of:
The notation of arithmetic expressions largely follows traditional
mathematical conventions. The available operators are listed here in decreasing
order of precedence. Parentheses (
...)
can be used for
grouping.
A number is raised to a power with the operator ^
.
2^32, 10^(−12)
Nested powers are right-associative.
a^b^c ⤏ a^(b^c)
The power operation admits several combinations of argument types.
(^) : ℚ × ℤ → ℚ |
(^) : ℤ × ℕ → ℤ |
(^) : ℕ × ℕ → ℕ |
Negative exponents must be parenthesized.
n^−1
Note that fractional exponents are not supported semantically, even though the syntax may be well-formed.
2^12^(-1)
Numbers are multiplied with the operator *
. For division,
there are three distinct operators, namely /
or ∕
(U+2215 division slash) for exact rational division, //
or ⫽
(U+2AFD double solidus operator) for
truncating integer division, and \\
for modulo.
Note that the use of specialized Unicode slash symbols can disambiguate between a fractional constant and the division of two integers.
All four operators have equal precedence, can be mixed freely, and are jointly left-associative.
a * b / c \\ d ⤏ ((a * b) / c) \\ d
Multiplicative operators bind less strongly than powers.
6.032 * 10^23
Negative factors must be parenthesized.
3 * −5
The multiplicative operators admit several, distinct combinations of argument types.
(*) : ℚ × ℚ → ℚ |
(*) : ℤ × ℤ → ℤ |
(*) : ℕ × ℕ → ℕ |
Note that multiplying two booleans does not yield a boolean. This is for symmetry with addition; see Section 6.3.3 ("Addition and Subtraction"). For the proper conjunction operator, see Section 6.3.5 ("Conjunction and Disjunction").
(/) : ℚ × ℚ → ℚ |
(//) : ℚ × ℕ → ℤ |
(//) : ℕ × ℕ → ℕ |
(\\) : ℚ × ℕ → ℚ |
(\\) : ℤ × ℕ → ℕ |
The truncating division always yields the largest integer less than or equal to the result of exact division; compare Section 6.4.2 ("Rounding"). The modulo operation yields the positive remainder of division.
x / n = n * (x // n) + x \\ n
Note that a negative or fractional modulus is not supported semantically, even though the syntax may be well-formed.
17 // (-4)
Numbers are added with the operator +
. For subtraction, there are two
distinct operators, namely -
or −
(U+2212 minus sign)
for signed numbers, and \
or ∖
(U+2216 set minus) for
natural numbers.
All three operators have equal precedence, can be mixed freely, and are jointly left-associative.
a − b + c ⤏ (a − b) + c
Additive operators bind less strongly than multiplicative operators.
a + b * c ⤏ a + (b * c)
The multiplicative operators admit several, distinct combinations
of argument types. Note that +
and \
are also used for
collections; see Section 7.5 ("Collection Operators").
(+) : ℚ × ℚ → ℚ |
(+) : ℤ × ℤ → ℤ |
(+) : ℕ × ℕ → ℕ |
Note that adding two booleans does not yield a boolean. This is useful for counting. For the proper disjunction operator, see Section 6.3.5 ("Conjunction and Disjunction").
(−) : ℚ × ℚ → ℚ |
(−) : ℤ × ℤ → ℤ |
Unary sign operators are understood as having an implicit zero left operand.
−a ⤏ 0 − a
Note that it follows from this reading that −0
is not a
boolean, but principally an integer.
The nonnegative subtraction operation a \ b
is the same as
a − b
if positive, otherwise zero.
(\) : ℕ × ℕ → ℕ |
Numbers can be compared with the relational operators =
for equality,
and /=
or ≠
(U+2260 not equal to) for inequality.
They can be compared for order with <
, >
, <=
or ≤
(U+2264 less-than or equal to), and >=
or ≥
(U+2265
greater-than or equal to).
Comparisons are not associative, but can be chained. The meaning is the conjunction of all comparisons of pairwise adjacent operands, with each operand evaluated only once; see Section 4.1 ("Assignment") and Section 6.3.5 ("Conjunction and Disjunction").
a ≤ b < c ⤏ tmp := b; a ≤ tmp ∧ tmp < c
Note that no transitive relation between non-adjacent operands is implied.
a < b > c ≠ d
Comparison operators bind less strongly than arithmetic operators.
a^2 + b^2 ≤ c^2
The result of any comparison is boolean.
(=) : ℚ × ℚ → 𝔹 |
(≠) : ℚ × ℚ → 𝔹 |
(<) : ℚ × ℚ → 𝔹 |
(>) : ℚ × ℚ → 𝔹 |
(≤) : ℚ × ℚ → 𝔹 |
(≥) : ℚ × ℚ → 𝔹 |
The operators =
and ≠
should be used for the logical
operators if-and-only-if and exclusive-or, respectively.
Boolean conjunction and disjunction are expressed with the operators
&&
or ∧
(U+2227 logical and), and ||
or
∨
(U+2228 logical or), respectively.
Each of these operators is left-associative, but does not mix with the other.
a ∧ (b ∨ c) ∧ d
Logical operators bind less strongly than comparison operators.
0 ≤ i ∧ i < n
Logical operators operate on boolean operands only.
(∧) : 𝔹 × 𝔹 → 𝔹 |
(∨) : 𝔹 × 𝔹 → 𝔹 |
The minimum and maximum functions return the lesser and the greater of two operands, respectively. The minimum function operates uniformly on each number type.
min : ℚ × ℚ → ℚ |
min : ℤ × ℤ → ℤ |
min : ℕ × ℕ → ℕ |
min : 𝔹 × 𝔹 → 𝔹 |
Because the subset relation between integers and naturals is not order-symmetric, the maximum function admits more refined combinations of argument types.
max : ℚ × ℚ → ℚ |
max : ℤ × ℤ → ℤ |
max : ℤ × ℕ → ℕ |
max : ℕ × ℤ → ℕ |
max : 𝔹 × 𝔹 → 𝔹 |
Note that, by contrast, the minimum of a boolean and a natural is not generally a boolean, because of NaN.
As a special case on 𝔹, min
and max
are the same as
∧
and ∨
, respectively.
floor : ℚ → ℤ |
floor : ℕ → ℕ |
ceiling : ℚ → ℤ |
ceiling : ℕ → ℕ |
abs : ℚ → ℚ |
abs : ℤ → ℕ |
sign : ℚ → ℤ |
sign : ℕ → ℕ |
Note that sign
applied to a natural number does not yield a boolean,
since the sign of NaN is NaN.
Where multiple integer values are to given, they can be enumerated explicitly
separated by ,
. Contiguous intervals can be abbreviated by giving just
the limits (both inclusive) with the separator ..
.
2 .. 5 ⤏ 2, 3, 4, 5
Note that what a multitude of integers actually means depends on the context.
An expression of a number type can be used to choose between alternative
cases, very much like the switch statement of the C language
family. The operator ?
takes a numerical condition expression and a set of cases.
{
and }
.
_
instead of a number.
->
or →
(U+2192 rightwards arrow) separates the indices and the body expression of each case.
;
.
5 ? {
1 -> a;
2, 3 -> b;
4 .. 6 -> c;
_ -> d }
⤏ c
a ? b : c ⤏ a ? { 1 -> b; 0 -> c }
Collections are composite data aggregated from an arbitrary finite number of heterogeneous elements.
Collections come in three different shapes:
Some operations on a collection only make sense if it is nonempty. The type system recognizes expressions that are certain, without any context, to yield nonempty collections.
The elements of a collection are not processed individually, but by bulk jobs that apply the same basic operation uniformly to all elements. Hence these operations can only assume a type of argument of which all collection elements are instances. Such a type is referred to as the element type of a collection.
The collection shapes are notated each with a distinctive shape of brackets.
[
and ]
.
{
and }
.
⟨
(U+27E8 mathematical
left angle bracket) and ⟩
(U+27E9 mathematical right
angle bracket), or baggy brackets ⟅
(U+27C5 left
s-shaped bag delimiter) and ⟆
(U+27C6 right s-shaped bag
delimiter).
Alternatively, "poor man's" angled brackets <
and >
can be
used, but this is discouraged due to poor typography and confusion with
comparison operators. Opening and closing bracket must be of the same
style.
A collection can be built by simply enumerating expressions for its values,
individually or as ranges of integers. The separators ,
and ..
bind less strongly than all operators.
[ 1, 2, 1 + 2 ], { 1 .. n }
For ranges, both limits must evaluate to integers, and are understood as inclusive. The resulting collection is nonempty if and only if the upper limit is greater than or equal to the lower limit. It follows that a range denotes an empty collection if one of the limits is NaN.
The element type of a collection-building expression is the least upper bound of the types of the given element expressions.
Collections are transformed to other collections in terms of map and filter operations. These are written in the form of comprehensions.
Comprehensions are written with a bracket indicating the shape of the
resulting collection, containing a vertical bar |
. On the left hand side
of the bar, there is one or more element expressions as in
Section 7.2 ("Building a Collection")
above. On the right hand side, there is a sequence of zero or
more generator and filter clauses, separated by ;
.
_
, and a
collection expression, separated by an arrow <-
or ←
(U+2190
leftwards arrow).
[ n^2 | n ← [1..10] ]
{ a+b, a*b | a ← S; b ← T; a ≤ b }
The scope of the variable bound by a generator clause covers all following clauses, and the left hand side of the comprehension.
⟨ x | b ← B; x ← b ⟩
The expression in a generator clause must yield a collection of the same shape as the resulting collection, or a naturally convertible one.
{ n | n ← [1..3] }
[ n | n ← {1..3} ]
As a shortcut that is common in set theory notation, a generator can take the place on the left hand side of the bar. Then it functions both as the leftmost clause and as the element expression.
{ x ← S | x > 0 } ⤏ { x | x ← S; x > 0 }
The meaning of a comprehension can be defined inductively, by successively simplifying or eliminating the leftmost clause, immediately right of the bar.
If the leftmost clause is a generator clause that draws from an empty collection, then the result is empty.
[ es | x ← []; cs ] ⤏ []
Otherwise, if the generator clause draws from a singleton collection, then it is equivalent to a let assignment followed by the rest of the comprehension.
[ es | x ← [v]; cs ] ⤏ let x := v; [ es | cs ]
Otherwise, if the generator clause draws from a collection that can be decomposed into two parts, then it is equivalent to the combination of two analogous comprehensions that draw from the parts.
[ es | x ← l + r; cs ] ⤏ [ es | x ← l; cs ] + [ e | x ← r; cs ]
Otherwise, if the leftmost clause is a filter clause, then it means the same as a conditional that chooses between the rest of the comprehension and the empty collection.
[ es | b; cs ] ⤏ b ? [ es | cs ] : []
When there are no clauses left, the comprehension is equivalent to a plain collection-building expression.
[ es | ] ⤏ [es]
f[] ⤏ n
f[v] ⤏ v
f(l + r) ⤏ f(f(l), f(r))
(∧) : S(T, *) × S(T, *) → S(T, *) |
(∨) : S(T, *) × S(T, *) → S(T, *) |
(∨) : S(T, *) × S(T, +) → S(T, +) |
(∨) : S(T, +) × S(T, *) → S(T, +) |
(+) : L(T, *) × L(T, *) → L(T, *) |
(+) : L(T, +) × L(T, *) → L(T, +) |
(+) : L(T, *) × L(T, +) → L(T, +) |
(+) : B(T, *) × B(T, *) → B(T, *) |
(+) : B(T, +) × B(T, *) → B(T, +) |
(+) : B(T, *) × B(T, +) → B(T, +) |
(\) : B(T, *) × B(T, *) → B(T, *) |
(\) : S(T, *) × S(T, *) → S(T, *) |
min : B(T, *) × B(T, *) → B(T, *) |
max : B(T, *) × B(T, *) → B(T, *) |
max : B(T, +) × B(T, *) → B(T, +) |
max : B(T, *) × B(T, +) → B(T, +) |
(=) : B(T, *) × B(T, *) → 𝔹 |
... |
size : L(T, *) → ℕ |
size : B(T, *) → ℕ |
size : S(T, *) → ℕ |
supp : L(T, *) → S(T, *) |
supp : L(T, +) → S(T, +) |
supp : B(T, *) → S(T, *) |
supp : B(T, +) → S(T, +) |
shuffle : L(T, *) → B(T, *) |
shuffle : L(T, +) → B(T, +) |
once : S(T, *) → B(T, *) |
once : S(T, +) → B(T, +) |
The function mults
maps a bag (with arbitrary element type) to the bag
of multiplicities of its elements.
mults⟨a, b, c, b, a⟩ ⤏ ⟨1, 2, 2⟩
Note that each occurrence of NaN is not equal to others, and hence counts with multiplicity one.
The bag of multiplicities of a nonempty bag is nonempty.
mults : B(T, *) → B(ℕ, *) |
mults : B(T, +) → B(ℕ, +) |
produced with
eu.bandm.metatools.d2d
and
XSLT 1.0