Introduction

The “Language Specifications” section specifies how a compiled Ergol program MUST behave when it runs. Some parts of this section also describe a way to implement these requirements. A compiler MAY ignore those implementation parts, but ONLY IF the compiled program behaves exactly as the specifications require.

For example, the specifications define an algorithm to infer the type of a literal. A compiler MAY implement a more optimized algorithm instead, but that algorithm MUST infer the same types as the one defined in the specifications.

Prerequisites

Keywords used for requirement levels

The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this section are to be interpreted as defined in RFC 2119.

Syntax definition language

This section uses a modified version of the Augmented Backus-Naur Form (ABNF) defined by RFC 5234 and updated by RFC 7405.

Modification #1

Unicode code points and Unicode code point ranges can be written using the %u prefix:

%u0000         ; NULL character
%u0370-03FF    ; Greek and Coptic range
%u10000-1005D  ; Linear B syllabary range

Since UTF-8 is the only encoding allowed for source files (see Source files character encoding), a Unicode code point, or each bound of a Unicode code point range, maps directly to its UTF-8 byte sequence.
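As an illustration only, the following Python sketch converts a %u terminal into the UTF-8 encodings of its bounds. The names `parse_u_terminal` and `utf8_bounds` are hypothetical helpers, not part of any Ergol tooling; the only encoding step is the standard `chr()` plus `str.encode("utf-8")`.

```python
import re

# Hypothetical pattern for a %u terminal: one code point or a "first-last" range.
U_TERMINAL = re.compile(r"%u([0-9A-Fa-f]+)(?:-([0-9A-Fa-f]+))?")


def parse_u_terminal(terminal: str) -> tuple[int, int]:
    """Return the inclusive (first, last) code points of a %u terminal."""
    match = U_TERMINAL.fullmatch(terminal)
    if match is None:
        raise ValueError(f"not a %u terminal: {terminal!r}")
    first = int(match.group(1), 16)
    # A single code point is treated as a one-element range.
    last = int(match.group(2), 16) if match.group(2) else first
    if first > last or last > 0x10FFFF:
        raise ValueError(f"invalid code point range: {terminal!r}")
    return first, last


def utf8_bounds(terminal: str) -> tuple[bytes, bytes]:
    """Return the UTF-8 encodings of the first and last code points.

    str.encode raises UnicodeEncodeError for surrogates (U+D800..U+DFFF),
    which UTF-8 cannot represent.
    """
    first, last = parse_u_terminal(terminal)
    return chr(first).encode("utf-8"), chr(last).encode("utf-8")
```

Because UTF-8 preserves code point order under bytewise comparison, the encoding of every code point inside a range compares between the two encoded bounds.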

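Modification #2

Unlike in RFC 5234, where quoted strings are case-insensitive, strings are case-sensitive by default; the %s prefix can be used to make this explicit. To specify that a string is case-insensitive, the %i prefix MUST be used: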

"abc"   ; matches only abc
%s"abc" ; matches only abc
%i"abc" ; matches abc, Abc, aBc, abC, ABc, AbC, aBC and ABC
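The matching behavior above can be sketched in Python. This assumes, as in RFC 5234, that case-insensitivity applies only to the ASCII letters A-Z and a-z; this section does not say how %i treats non-ASCII letters. The functions `matches_literal` and `case_variants` are hypothetical names used for illustration.

```python
import string
from itertools import product

# Assumption: %i folds only ASCII letters, as RFC 5234 does.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def matches_literal(text: str, literal: str, case_insensitive: bool = False) -> bool:
    """Does `text` match the quoted string `literal`?

    "abc" and %s"abc" use the default (case-sensitive) comparison;
    %i"abc" corresponds to case_insensitive=True.
    """
    if not case_insensitive:
        return text == literal
    # translate() maps one character to one character, so equal folded
    # strings also have equal lengths.
    return text.translate(_ASCII_FOLD) == literal.translate(_ASCII_FOLD)


def case_variants(literal: str) -> set[str]:
    """Enumerate every string matched by %i"literal" (illustration only)."""
    choices = [
        (ch.lower(), ch.upper()) if ch.isascii() and ch.isalpha() else (ch,)
        for ch in literal
    ]
    return {"".join(combo) for combo in product(*choices)}
```

Enumerating variants grows exponentially with the number of letters, so a real matcher would compare folded strings as `matches_literal` does.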

Modification #3

Any number of blank characters can appear between two terminals. If the left terminal ends with a normal character and the right terminal begins with a normal character, at least one blank character MUST separate them:

"a" "b"
; matches “a b”, “a  b”, “a   b”, etc.
; does not match “ab”

"a" "+"
; matches “a+”, “a +”, “a  +”, “a   +”, etc.
; no blank is required because “+” is not a normal character
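The separation rule can be sketched as a small Python matcher for a two-terminal sequence. This is a minimal illustration, not a required tokenizing strategy: `match_pair` and its helpers are hypothetical names, both terminals are assumed to be non-empty literal strings, and the character sets are copied from the Blank characters and Normal characters definitions of this section.

```python
import unicodedata

# Code points of blank-character (Unicode White_Space property).
BLANKS = frozenset(
    [*range(0x0009, 0x000D + 1), 0x0020, 0x0085, 0x00A0, 0x1680,
     *range(0x2000, 0x200A + 1), 0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)
# General categories of normal characters, plus $, ZWNJ and ZWJ.
NORMAL_CATEGORIES = frozenset({"Ll", "Lm", "Lo", "Lt", "Lu", "Mc", "Mn", "Nd", "Nl", "Pc"})
NORMAL_EXTRAS = frozenset({"\u0024", "\u200c", "\u200d"})


def is_blank(ch: str) -> bool:
    return ord(ch) in BLANKS


def is_normal(ch: str) -> bool:
    return ch in NORMAL_EXTRAS or unicodedata.category(ch) in NORMAL_CATEGORIES


def match_pair(text: str, left: str, right: str) -> bool:
    """Hypothetical matcher: does `text` match the sequence `left right`?"""
    if not text.startswith(left):
        return False
    # Skip any number of blank characters after the left terminal.
    end = len(left)
    while end < len(text) and is_blank(text[end]):
        end += 1
    if text[end:] != right:
        return False
    # A blank is mandatory only between two normal characters.
    needs_blank = is_normal(left[-1]) and is_normal(right[0])
    return end > len(left) or not needs_blank
```

For example, `match_pair("a b", "a", "b")` and `match_pair("a+", "a", "+")` both succeed, while `match_pair("ab", "a", "b")` fails.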

Blank characters

A blank character is defined as any character having the White_Space property in the Unicode Character Database:

blank-character  = %u0009-000D / %u0020 / %u0085 / %u00A0 / %u1680 / %u2000-200A
blank-character /= %u2028-2029 / %u202F / %u205F / %u3000
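A direct Python transcription of the blank-character rule is shown below as a sketch; `BLANK_CODE_POINTS` and `is_blank` are hypothetical names.

```python
# Code points of the blank-character rule (Unicode White_Space property).
BLANK_CODE_POINTS = frozenset(
    [*range(0x0009, 0x000D + 1),   # TAB, LF, VT, FF, CR
     0x0020, 0x0085, 0x00A0, 0x1680,
     *range(0x2000, 0x200A + 1),   # EN QUAD .. HAIR SPACE
     0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def is_blank(ch: str) -> bool:
    """True if the single character `ch` matches blank-character."""
    return ord(ch) in BLANK_CODE_POINTS
```

An explicit set is used because Python's built-in `str.isspace()` is not equivalent: it also returns `True` for U+001C..U+001F, which do not have the White_Space property.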

Normal characters

A normal character is any character in one of the Unicode general categories Ll, Lm, Lo, Lt, Lu, Mc, Mn, Nd, Nl and Pc, as well as the characters U+0024 (DOLLAR SIGN), U+200C (ZERO WIDTH NON-JOINER) and U+200D (ZERO WIDTH JOINER).
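A sketch of this definition in Python, using the standard `unicodedata.category()` function; `is_normal` is a hypothetical name. The result depends on the Unicode version bundled with Python (`unicodedata.unidata_version`), and this section does not pin a Unicode version.

```python
import unicodedata

# General categories whose characters are normal characters.
NORMAL_CATEGORIES = frozenset({"Ll", "Lm", "Lo", "Lt", "Lu", "Mc", "Mn", "Nd", "Nl", "Pc"})
# Extra normal characters: DOLLAR SIGN, ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER.
NORMAL_EXTRAS = frozenset({"\u0024", "\u200c", "\u200d"})


def is_normal(ch: str) -> bool:
    """True if the single character `ch` is a normal character."""
    return ch in NORMAL_EXTRAS or unicodedata.category(ch) in NORMAL_CATEGORIES
```

The three extra characters are checked first because their categories (Sc for "$", Cf for ZWNJ and ZWJ) are not in the list.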

Explanation of some mathematical symbols

Some symbols used in this section could be misinterpreted, so their meanings are explained below (inspired by ISO 31-11:1992):

Symbol	Example	Meaning and verbal equivalent
ℕ		the set of natural numbers; the set of positive integers and zero
ℝ		the set of real numbers
[ .. ]	[a .. b]	closed interval in ℕ from a (included) to b (included)
[ , ]	[a, b]	closed interval in ℝ from a (included) to b (included)
{ }	{x₁, x₂, ..., xₙ}	set with elements x₁, x₂, ..., xₙ
Ø		the empty set
∈	x ∈ A	x belongs to A; x is an element of the set A
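The difference between the two interval notations can be illustrated with a short worked example. It assumes both intervals are read as sets defined by the inequality a ≤ x ≤ b, restricted to ℕ or ℝ respectively:

```latex
[a \mathrel{..} b] = \{\, x \in \mathbb{N} \mid a \le x \le b \,\}
\qquad
[a, b] = \{\, x \in \mathbb{R} \mid a \le x \le b \,\}

[1 \mathrel{..} 4] = \{1, 2, 3, 4\}
\qquad
2.5 \in [1, 4]
\qquad
2.5 \notin [1 \mathrel{..} 4]
```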

Source files character encoding

Source files MUST be encoded in UTF-8, as defined by the Unicode Standard. Since ASCII is a subset of UTF-8, ASCII-encoded source files are also valid.
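A minimal Python sketch of this requirement, with a hypothetical `is_valid_source` function, relies on strict UTF-8 decoding, which rejects malformed sequences such as overlong forms and encoded surrogates. It does not decide whether a leading byte order mark is allowed, since this section does not say.

```python
def is_valid_source(data: bytes) -> bool:
    """Hypothetical check: True if `data` is well-formed UTF-8."""
    try:
        # The default "strict" error handler raises on any malformed sequence.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Any pure-ASCII file passes this check, since every ASCII byte sequence is also valid UTF-8.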

Last update: June 15, 2021