Primitive Types
Definitions
The following definitions describe what a “value”, a “notation”, and a “literal” are.
Value
A value is the data itself. In Ergol, primitive types can store three kinds of values:
- booleans
- integers
- floating point numbers (also called “floats”)
Info
Float values follow the IEEE 754 standard, so they can be:
- a rational number, possibly rounded to the nearest representable value
- not a number
- positive infinity
- negative infinity
Warning
Not every rational number can be represented as a float, so floats are imprecise by nature.
Info
The floats that have no fractional part are called “full floating point numbers” or “full floats”.
Notation
A notation is a set of rules that defines how a literal MUST be interpreted to determine the value it represents. Ergol notations for primitive types are:
- nBoolean
- nBinary
- nOctal
- nDecimal
- nDecimalDP
- nScientific
- nHexadecimal
- nCharacter
- nExtreme
Literal
A literal is a string of characters, written in the source code, that represents a value. A literal is written using a specific notation and is evaluated at compile time. Several literals can represent the same value.
Example
These three literals represent the same value:
0.0
0.
.0
Types
Boolean type
The `bool` type stores a truth value (a boolean).
It has only two possible values, `true` and `false`, as in classical logic.
Integer types
There are nine integer types. Four of them are signed (they can hold negative integers):
type | size (in bits) | min value (included) | max value (included) |
---|---|---|---|
i8 | 8 | $-2^{7}$ | $2^{7} - 1$ |
i16 | 16 | $-2^{15}$ | $2^{15} - 1$ |
i32 | 32 | $-2^{31}$ | $2^{31} - 1$ |
i64 | 64 | $-2^{63}$ | $2^{63} - 1$ |
The other five are unsigned (they cannot hold negative integers):
type | size (in bits) | min value (included) | max value (included) |
---|---|---|---|
u8 | 8 | 0 | $2^{8} - 1$ |
u16 | 16 | 0 | $2^{16} - 1$ |
u32 | 32 | 0 | $2^{32} - 1$ |
char | 32 | 0 | $2^{32} - 1$ |
u64 | 64 | 0 | $2^{64} - 1$ |
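The bounds in these tables follow directly from the bit sizes: an $x$-bit signed integer spans $[-2^{x-1} .. 2^{x-1} - 1]$ and an $x$-bit unsigned integer spans $[0 .. 2^{x} - 1]$. The short Python sketch below is only an illustration (it is not Ergol code, and `integer_bounds` is a hypothetical helper) that recomputes them from the type names.

```python
# Bit sizes of the Ergol integer types, taken from the tables above.
SIGNED_BITS = {"i8": 8, "i16": 16, "i32": 32, "i64": 64}
UNSIGNED_BITS = {"u8": 8, "u16": 16, "u32": 32, "char": 32, "u64": 64}


def integer_bounds(type_name: str) -> tuple[int, int]:
    """Return the (min, max) values, both included, of an Ergol integer type."""
    if type_name in SIGNED_BITS:
        bits = SIGNED_BITS[type_name]
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    bits = UNSIGNED_BITS[type_name]
    return 0, 2 ** bits - 1


for name in [*SIGNED_BITS, *UNSIGNED_BITS]:
    low, high = integer_bounds(name)
    print(f"{name:>4}: [{low} .. {high}]")
```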
In Ergol, characters are stored as integers, using their Unicode code point.
An integer type named `char` exists for this purpose; it is equivalent to `u32`.
It is good practice to use the `char` type instead of the `u32` type to store a character, because it makes the intent explicit.
Warning
Even though `char` and `u32` have the same size, min value and max value, they are not the same type.
Therefore, it is legal to overload a function by changing a parameter type from `char` to `u32`.
If `char` and `u32` were the same type, this would not be possible, because both functions would have the same signature.
```
fn printValue(char x) {
    printl("The value of the character is:", x);
}

fn printValue(u32 x) {
    printl("The value of the integer is:", x);
}

printValue('a');
printValue(42);
```
Output:
```
The value of the character is: a⮐
The value of the integer is: 42⮐
```
Since characters are stored as integers, nothing prevents you from using smaller integer types, such as `u8` or `u16`, to store characters:
- `u8` can store the characters from U+0000 to U+00FF, that is:
  - the characters of the C0 Controls and Basic Latin block (also known as the ASCII characters)
  - the characters of the C1 Controls and Latin-1 Supplement block
- `u16` can store the characters from U+0000 to U+FFFF (the Basic Multilingual Plane).
In general, if space is not a major concern, use the `char` type to store characters, because it can store every Unicode character (from U+0000 to U+10FFFF).
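Which of these types is large enough depends only on the character's code point. The Python sketch below is an illustration (the function name `smallest_char_storage` is hypothetical, not part of Ergol) that returns the smallest of `u8`, `u16` and `char` able to store a given character.

```python
def smallest_char_storage(character: str) -> str:
    """Return the smallest Ergol type among u8, u16 and char that can hold the character's code point."""
    code_point = ord(character)  # Unicode code point, in [0 .. 0x10FFFF]
    if code_point <= 0xFF:       # C0 Controls and Basic Latin, C1 Controls and Latin-1 Supplement
        return "u8"
    if code_point <= 0xFFFF:     # Basic Multilingual Plane
        return "u16"
    return "char"                # any other code point, up to U+10FFFF


for ch in ["a", "\u00e9", "\u4e2d", "\U0001F600"]:
    print(f"U+{ord(ch):04X} -> {smallest_char_storage(ch)}")
```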
Float types
There are two float types, both based on the IEEE 754 standard.
type | size (in bits) | format | smallest finite value | greatest finite value |
---|---|---|---|---|
f32 | 32 | IEEE 754 binary32 floating point number | $(2^{-23} - 2) \times 2^{127}$ | $(2 - 2^{-23}) \times 2^{127}$ |
f64 | 64 | IEEE 754 binary64 floating point number | $(2^{-52} - 2) \times 2^{1023}$ | $(2 - 2^{-52}) \times 2^{1023}$ |
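As a sanity check, these bounds can be compared with the actual IEEE 754 extremes. This Python sketch is for illustration only: it relies on the host's binary64 `float` (`sys.float_info.max`) and uses the standard `struct` module to decode the binary32 bit pattern `0x7F7FFFFF`, which encodes the largest finite binary32 value.

```python
import struct
import sys

# Greatest finite values, as written in the table.
F32_MAX = (2 - 2 ** -23) * 2 ** 127
F64_MAX = (2 - 2 ** -52) * 2 ** 1023

# Largest finite binary32 value: sign bit 0, biased exponent 0xFE, all 23 fraction bits set.
f32_max_from_bits = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]

print(F32_MAX == f32_max_from_bits)            # True
print(F64_MAX == sys.float_info.max)           # True
# The smallest finite value is the opposite of the greatest one.
print((2 ** -23 - 2) * 2 ** 127 == -F32_MAX)   # True
print((2 ** -52 - 2) * 2 ** 1023 == -F64_MAX)  # True
```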
Notations
nBoolean
syntax
boolean-literal = "true" / "false"
A boolean value can be written as a literal using the nBoolean notation: `false` or `true`.
interpretation
To get the value from a literal:
- The `false` literal corresponds to the `false` value.
- The `true` literal corresponds to the `true` value.
nBinary
syntax
binary-digit = "0" / "1"
binary-literal = "0b" 1*(*"_" binary-digit) *"_"
A non-negative integer, or a non-negative full float other than positive infinity, can be written as a literal using the nBinary notation (digits `0` and `1`).
Leading zeros are allowed but OPTIONAL.
If the digits consist only of zeros, the right-most digit is not considered a leading zero.
There MUST be at least one digit.
The digits MUST be preceded by the binary prefix: `0b`.
Underscore characters MAY appear anywhere after the prefix.
Examples
0b00101010
0b0
0b1
0b_11111111_11111111
0b___0_
interpretation
To get the value from a literal:
- Drop:
  - the prefix
  - all underscores
  - the leading zeros
- Interpret the result using the base-2 numeral system.
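The nBinary syntax and interpretation rules can be sketched in Python as follows. This is an illustration only, not the Ergol compiler: the regular expression is a hand transcription of the `binary-literal` rule, and `interpret_binary` is a hypothetical name.

```python
import re

# Transcription of: binary-literal = "0b" 1*(*"_" binary-digit) *"_"
BINARY_LITERAL = re.compile(r"0b(?:_*[01])+_*")


def interpret_binary(literal: str) -> int:
    """Return the value represented by an nBinary literal."""
    if BINARY_LITERAL.fullmatch(literal) is None:
        raise ValueError(f"not an nBinary literal: {literal!r}")
    digits = literal[2:]                # drop the prefix
    digits = digits.replace("_", "")    # drop all underscores
    digits = digits.lstrip("0") or "0"  # drop the leading zeros, but keep the right-most digit
    return int(digits, 2)               # interpret in base 2


for lit in ["0b00101010", "0b0", "0b1", "0b_11111111_11111111", "0b___0_"]:
    print(lit, "->", interpret_binary(lit))
```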
nOctal
syntax
octal-digit = %u0030-0037
octal-literal = "0o" 1*(*"_" octal-digit) *"_"
A non-negative integer, or a non-negative full float other than positive infinity, can be written as a literal using the nOctal notation (digits from `0` to `7`).
Leading zeros are allowed but OPTIONAL.
If the digits consist only of zeros, the right-most digit is not considered a leading zero.
There MUST be at least one digit.
The digits MUST be preceded by the octal prefix: `0o`.
Underscore characters MAY appear anywhere after the prefix.
Examples
0o777
0o750
0o0
0o01_234_567
0o___0_
interpretation
To get the value from a literal:
- Drop:
  - the prefix
  - all underscores
  - the leading zeros
- Interpret the result using the base-8 numeral system.
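The same steps apply to nOctal, with the `0o` prefix and base 8. In this illustrative Python sketch, the regular expression transcribes the `octal-literal` rule and `interpret_octal` is a hypothetical name.

```python
import re

# Transcription of: octal-literal = "0o" 1*(*"_" octal-digit) *"_"
OCTAL_LITERAL = re.compile(r"0o(?:_*[0-7])+_*")


def interpret_octal(literal: str) -> int:
    """Return the value represented by an nOctal literal."""
    if OCTAL_LITERAL.fullmatch(literal) is None:
        raise ValueError(f"not an nOctal literal: {literal!r}")
    digits = literal[2:]                # drop the prefix
    digits = digits.replace("_", "")    # drop all underscores
    digits = digits.lstrip("0") or "0"  # drop the leading zeros, but keep the right-most digit
    return int(digits, 8)               # interpret in base 8


for lit in ["0o777", "0o750", "0o0", "0o01_234_567", "0o___0_"]:
    print(lit, "->", interpret_octal(lit))
```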
nDecimal
syntax
decimal-digit = %u0030-0039
decimal-literal = 1*(*"_" decimal-digit) *"_"
A non-negative integer, or a non-negative full float other than positive infinity, can be written as a literal using the nDecimal notation (digits from `0` to `9`).
Leading zeros are allowed but OPTIONAL.
If the digits consist only of zeros, the right-most digit is not considered a leading zero.
There MUST be at least one digit.
Underscore characters MAY appear anywhere in the literal.
Examples
42
0
123_456_789
0000
01
__0_0_
interpretation
To get the value from a literal:
- Drop:
  - all underscores
  - the leading zeros
- Interpret the result using the base-10 numeral system.
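nDecimal follows the same pattern without a prefix. This Python sketch is illustrative only; the regular expression transcribes the `decimal-literal` rule and `interpret_decimal` is a hypothetical name.

```python
import re

# Transcription of: decimal-literal = 1*(*"_" decimal-digit) *"_"
DECIMAL_LITERAL = re.compile(r"(?:_*[0-9])+_*")


def interpret_decimal(literal: str) -> int:
    """Return the value represented by an nDecimal literal."""
    if DECIMAL_LITERAL.fullmatch(literal) is None:
        raise ValueError(f"not an nDecimal literal: {literal!r}")
    digits = literal.replace("_", "")   # drop all underscores
    digits = digits.lstrip("0") or "0"  # drop the leading zeros, but keep the right-most digit
    return int(digits, 10)              # interpret in base 10


for lit in ["42", "0", "123_456_789", "0000", "01", "__0_0_"]:
    print(lit, "->", interpret_decimal(lit))
```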
nDecimalDP
syntax
decimal-dp-literal = ([decimal-literal] "." decimal-literal) / (decimal-literal "." [decimal-literal])
A non-negative float other than positive infinity can be written as a literal using the nDecimalDP notation (digits from `0` to `9` and a decimal point).
It consists of a decimal point (`.`) preceded by an integral part and followed by a fractional part.
The integral part is written using one or more digits from `0` to `9`.
Leading zeros are allowed but OPTIONAL.
If the digits of the integral part consist only of zeros, the right-most one is not considered a leading zero.
The integral part can be omitted if the fractional part is not.
The fractional part is written using one or more digits from `0` to `9`.
Trailing zeros are allowed but OPTIONAL.
If the digits of the fractional part consist only of zeros, the left-most one is not considered a trailing zero.
The fractional part can be omitted if the integral part is not.
Underscore characters MAY appear anywhere in the literal.
Examples
3.14
.42
50.
1_000_000.500
03.
_.132_000_000
interpretation
To get the value from a literal:
- Drop:
  - all underscores
  - the leading zeros in the integral part
  - the trailing zeros in the fractional part
- If the string starts with `.`, add a zero at the start.
- If the string ends with `.`, add a zero at the end.
- Interpret the result using the base-10 numeral system.
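The nDecimalDP steps can be sketched in Python as below, for illustration only. Two assumptions: underscores are removed before the shape is checked, following the prose rule that they MAY appear anywhere (which the `_.132_000_000` example relies on); and the result is kept as an exact `Fraction`, because rounding to `f32` or `f64` is a separate step not described here. `interpret_decimal_dp` is a hypothetical name.

```python
import re
from fractions import Fraction

# Shape of an nDecimalDP literal once its underscores are removed:
# optional integral digits, a decimal point, optional fractional digits.
DECIMAL_DP_SHAPE = re.compile(r"[0-9]*\.[0-9]*")


def interpret_decimal_dp(literal: str) -> Fraction:
    """Return the exact value of an nDecimalDP literal (before any rounding to a float type)."""
    text = literal.replace("_", "")  # drop all underscores
    if DECIMAL_DP_SHAPE.fullmatch(text) is None or text == ".":
        raise ValueError(f"not an nDecimalDP literal: {literal!r}")
    integral, fractional = text.split(".")
    if integral:
        integral = integral.lstrip("0") or "0"      # drop leading zeros, keep the right-most digit
    if fractional:
        fractional = fractional.rstrip("0") or "0"  # drop trailing zeros, keep the left-most digit
    text = integral + "." + fractional
    if text.startswith("."):
        text = "0" + text                           # add a zero at the start
    if text.endswith("."):
        text = text + "0"                           # add a zero at the end
    return Fraction(text)                           # exact base-10 interpretation


for lit in ["3.14", ".42", "50.", "1_000_000.500", "03.", "_.132_000_000"]:
    print(lit, "->", interpret_decimal_dp(lit))
```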
nScientific
syntax
scientific-literal = decimal-dp-literal "e" ["-"] decimal-literal
A non-negative integer or a non-negative float, except positive infinity, can be written as a literal using the nScientific notation.
It consists of an integral part, followed by a decimal point, followed by a fractional part, followed by a lowercase letter `e`, followed by an exponent part.
The integral part is written using one or more digits from `0` to `9`.
Leading zeros are allowed but OPTIONAL.
If the digits of the integral part consist only of zeros, the right-most one is not considered a leading zero.
The integral part can be omitted if the fractional part is not.
The fractional part is written using one or more digits from `0` to `9`.
Trailing zeros are allowed but OPTIONAL.
If the digits of the fractional part consist only of zeros, the left-most one is not considered a trailing zero.
The fractional part can be omitted if the integral part is not.
The exponent part is written using an optional `-` character followed by one or more digits from `0` to `9`.
Leading zeros are allowed but OPTIONAL.
If the digits of the exponent part consist only of zeros, the right-most one is not considered a leading zero.
Underscore characters MAY appear anywhere in the literal.
Examples
1.14e0
4.2e2
.05e-10
1_500e3
03.e__-_1
_.132_000_000e0_
interpretation
To get the value from a literal:
- Drop:
  - all underscores
  - the leading zeros in the integral part
  - the trailing zeros in the fractional part
  - the leading zeros in the exponent part
- Let `M` be the characters before the lowercase letter `e`.
- Let `E` be the characters after the lowercase letter `e`.
- If `M` starts with `.`, add a zero at the start of `M`.
- If `M` ends with `.`, add a zero at the end of `M`.
- Let `MV` be the interpretation of `M` using the base-10 numeral system.
- Let `EV` be the interpretation of `E` using the base-10 numeral system.
- The value is $\mathit{MV} \times 10^{\mathit{EV}}$.
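The nScientific steps can be sketched in Python as follows, for illustration only. Assumptions: underscores are removed first (the prose rule, which `03.e__-_1` relies on); the decimal point of the mantissa is treated as optional so that the `1_500e3` example is accepted, although the `scientific-literal` rule above requires a `decimal-dp-literal`; and the value is kept as an exact `Fraction`, before any rounding to a float type. `interpret_scientific` is a hypothetical name.

```python
import re
from fractions import Fraction

# Shape of an nScientific literal once its underscores are removed.
# Assumption: the decimal point of the mantissa is optional.
SCIENTIFIC_SHAPE = re.compile(r"([0-9]*)(?:\.([0-9]*))?e(-?)([0-9]+)")


def interpret_scientific(literal: str) -> Fraction:
    """Return the exact value of an nScientific literal (before any rounding to a float type)."""
    text = literal.replace("_", "")  # drop all underscores
    match = SCIENTIFIC_SHAPE.fullmatch(text)
    if match is None or not (match.group(1) or match.group(2)):
        raise ValueError(f"not an nScientific literal: {literal!r}")
    integral, fractional, minus, exponent = match.groups()
    integral = integral.lstrip("0") or "0"              # leading zeros dropped; an omitted part becomes 0
    fractional = (fractional or "").rstrip("0") or "0"  # trailing zeros dropped; an omitted part becomes 0
    mv = Fraction(f"{integral}.{fractional}")           # MV: the mantissa M in base 10
    ev = int(exponent, 10) * (-1 if minus else 1)       # EV: the exponent E in base 10
    return mv * Fraction(10) ** ev                      # MV times ten to the power of EV


for lit in ["1.14e0", "4.2e2", ".05e-10", "1_500e3", "03.e__-_1", "_.132_000_000e0_"]:
    print(lit, "->", interpret_scientific(lit))
```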
nHexadecimal
syntax
hexadecimal-digit = %u0030-0039 / %u0041-0046 / %u0061-0066
hexadecimal-literal = "0x" 1*(*"_" hexadecimal-digit) *"_"
A non-negative integer, or a non-negative full float other than positive infinity, can be written as a literal using the nHexadecimal notation (digits from `0` to `9`, then from `A` to `F`, case insensitive).
Leading zeros are allowed but OPTIONAL.
If the digits consist only of zeros, the right-most digit is not considered a leading zero.
There MUST be at least one digit.
The digits MUST be preceded by the hexadecimal prefix: `0x`.
Underscore characters MAY appear anywhere after the prefix.
Examples
0x2A
0x2a
0x6aFD4
0x__6a_FD4_
0xB6_AF_D4
0x0
interpretation
To get the value from a literal:
- Drop:
  - the prefix
  - all underscores
  - the leading zeros
- Transform the lowercase letters to their uppercase equivalent.
- Interpret the result using the base-16 numeral system (`0123456789ABCDEF` alphabet).
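The nHexadecimal steps, including the case normalization, can be sketched in Python as below. This is an illustration only; the regular expression transcribes the `hexadecimal-literal` rule and `interpret_hexadecimal` is a hypothetical name.

```python
import re

# Transcription of: hexadecimal-literal = "0x" 1*(*"_" hexadecimal-digit) *"_"
HEXADECIMAL_LITERAL = re.compile(r"0x(?:_*[0-9A-Fa-f])+_*")


def interpret_hexadecimal(literal: str) -> int:
    """Return the value represented by an nHexadecimal literal."""
    if HEXADECIMAL_LITERAL.fullmatch(literal) is None:
        raise ValueError(f"not an nHexadecimal literal: {literal!r}")
    digits = literal[2:]                # drop the prefix
    digits = digits.replace("_", "")    # drop all underscores
    digits = digits.lstrip("0") or "0"  # drop the leading zeros, but keep the right-most digit
    digits = digits.upper()             # lowercase letters to uppercase
    return int(digits, 16)              # interpret with the 0123456789ABCDEF alphabet


for lit in ["0x2A", "0x2a", "0x6aFD4", "0x__6a_FD4_", "0xB6_AF_D4", "0x0"]:
    print(lit, "->", interpret_hexadecimal(lit))
```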
nCharacter
syntax
; all Unicode characters except the null character, the single quote and the backslash.
single-quotable-character = %u0001-0026 / %u0028-005B / %u005D-10FFFF
; 6-digit hexadecimal number in the [000000 .. 10FFFF] range
unicode-code-point = (("0" hexadecimal-digit) / "10") 4hexadecimal-digit
; all escape sequences
escape-sequence = "\" ("0" / "a" / "b" / "t" / "n" / "v" / "f" / "r" / "e" / DQUOTE / "'" / "\" / ("u" unicode-code-point))
character-literal = "'" (single-quotable-character / escape-sequence) "'"
An integer or a full float in the [0 .. 1114111] range (that is, from U+0000 to U+10FFFF) can be written as a literal using the nCharacter notation.
Info
The “character” value does not exist in Ergol. The nCharacter notation describes an integer value: the Unicode code point of the character to store.
See more here.
interpretation
The nCharacter notation has several forms, all described below. For each form, the table gives the integer value it represents and the corresponding Unicode character.
literal | represented integer value | corresponding Unicode character |
---|---|---|
`'\0'` | 0 | NULL |
`'\a'` | 7 | BELL |
`'\b'` | 8 | BACKSPACE |
`'\t'` | 9 | CHARACTER TABULATION |
`'\n'` | 10 | LINE FEED |
`'\v'` | 11 | LINE TABULATION |
`'\f'` | 12 | FORM FEED |
`'\r'` | 13 | CARRIAGE RETURN |
`'\e'` | 27 | ESCAPE |
`'\"'` | 34 | QUOTATION MARK |
`'\''` | 39 | APOSTROPHE |
`'\\'` | 92 | REVERSE SOLIDUS |
`'X'`, with X any Unicode character except the null character, `'` and `\` | Unicode code point of X | X |
`'\uHHHHHH'`, with HHHHHH a six-digit hexadecimal number (case insensitive) in the [000000 .. 10FFFF] range | value of HHHHHH | the Unicode character whose code point is HHHHHH |
Examples
'a'
'0'
'Z'
'_'
'é'
'丕'
'\n'
'\\'
'\u0000e9'
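The table above can be turned into a small interpreter. The Python sketch below is illustrative only: `interpret_character` is a hypothetical name, literals are passed as Python strings containing the Ergol source text (hence the raw strings in the usage lines), and the rules follow the `character-literal` grammar (no null character, `'` or `\` unescaped; `\u` takes exactly six hexadecimal digits up to `10FFFF`).

```python
# Values of the escape sequences listed in the table above.
ESCAPES = {"0": 0, "a": 7, "b": 8, "t": 9, "n": 10, "v": 11, "f": 12,
           "r": 13, "e": 27, '"': 34, "'": 39, "\\": 92}
HEX_DIGITS = set("0123456789abcdefABCDEF")


def interpret_character(literal: str) -> int:
    """Return the integer value (Unicode code point) represented by an nCharacter literal."""
    if len(literal) < 3 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError(f"not an nCharacter literal: {literal!r}")
    body = literal[1:-1]
    if body.startswith("\\"):
        sequence = body[1:]
        if sequence in ESCAPES:             # single-character escape, e.g. \n
            return ESCAPES[sequence]
        if len(sequence) == 7 and sequence[0] == "u" and set(sequence[1:]) <= HEX_DIGITS:
            value = int(sequence[1:], 16)   # \uHHHHHH
            if value <= 0x10FFFF:
                return value
        raise ValueError(f"invalid escape sequence: {literal!r}")
    if len(body) == 1 and body not in ("\0", "'", "\\"):
        return ord(body)                    # code point of the character itself
    raise ValueError(f"not an nCharacter literal: {literal!r}")


for lit in ["'a'", "'\u00e9'", r"'\n'", r"'\\'", r"'\u0000e9'"]:
    print(lit, "->", interpret_character(lit))
```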
nExtreme
syntax
extreme-literal = "NaN" / "NegInfinity" / "PosInfinity"
A “not a number” float, a “negative infinity” float, or a “positive infinity” float can be written as a literal using the nExtreme notation, that is, one of the following literals:
NaN
NegInfinity
PosInfinity
interpretation
literal | corresponding value |
---|---|
`NaN` | not a number |
`NegInfinity` | negative infinity |
`PosInfinity` | positive infinity |
Other notations
nBoolean, nBinary, nOctal, nDecimal, nDecimalDP, nScientific, nHexadecimal, nCharacter and nExtreme are the only notations that can be used in a primitive type literal. To use another notation, such as base 64, see the base converter library.
Type inference
A literal represents a value, and this value MUST have a type associated with it. On its own, a literal gives little to no information about the type that MUST be used.
The type to use is chosen by the type inference algorithm and depends on:
- the value to store
- the notation used to represent the value
- the context
Defining candidates
Not every type is suitable for every value. A compatibility table MUST be used to determine whether a value is compatible with a given type.
Compatibility table
type | nBoolean | nBinary, nOctal, nDecimal, nHexadecimal | nScientific | nDecimalDP | nCharacter | nExtreme |
---|---|---|---|---|---|---|
bool | yes | no | no | no | no | no |
i8 | no | $v \in I_{8}$ | $v \in I_{8}$ | no | $v \in I_{8}$ | no |
i16 | no | $v \in I_{16}$ | $v \in I_{16}$ | no | $v \in I_{16}$ | no |
i32 | no | $v \in I_{32}$ | $v \in I_{32}$ | no | yes | no |
i64 | no | $v \in I_{64}$ | $v \in I_{64}$ | no | yes | no |
u8 | no | $v \in U_{8}$ | $v \in U_{8}$ | no | $v \in U_{8}$ | no |
u16 | no | $v \in U_{16}$ | $v \in U_{16}$ | no | $v \in U_{16}$ | no |
u32 | no | $v \in U_{32}$ | $v \in U_{32}$ | no | yes | no |
char | no | $v \in U_{32}$ | $v \in U_{32}$ | no | yes | no |
u64 | no | $v \in U_{64}$ | $v \in U_{64}$ | no | yes | no |
f32 | no | $v \in FF_{32}$ | $v \in F_{32}$ | $v \in F_{32}$ | yes | yes |
f64 | no | $v \in FF_{64}$ | $v \in F_{64}$ | $v \in F_{64}$ | yes | yes |
Understanding the table
The table above indicates whether a given type (row) can be used when a given notation (column) is used:
- yes: the type can be used.
- no: the type cannot be used.
- predicate: the type can be used IF AND ONLY IF the predicate is fulfilled.
The parameters and constants used in the predicates are:
- The value represented by the literal: $v$
- The integer interval of all numbers representable by an $x$-bit signed integer: $I_x = [-2^{x-1} .. 2^{x-1} - 1]$
- The integer interval of all numbers representable by an $x$-bit unsigned integer: $U_x = [0 .. 2^{x} - 1]$
- The smallest interval containing all finite numbers representable by an `f32`: $F_{32} = [(2^{-23} - 2) \times 2^{127}, (2 - 2^{-23}) \times 2^{127}]$
- The smallest interval containing all finite numbers representable by an `f64`: $F_{64} = [(2^{-52} - 2) \times 2^{1023}, (2 - 2^{-52}) \times 2^{1023}]$
- The biggest interval in which every full float is representable by an `f32`: $FF_{32} = [-2^{24} .. 2^{24}]$
- The biggest interval in which every full float is representable by an `f64`: $FF_{64} = [-2^{53} .. 2^{53}]$
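The table and its predicates can be encoded directly. The Python sketch below is an illustration, not the compiler's implementation: `is_compatible` is a hypothetical helper, notations are identified by their names as strings, and $v$ is passed as an exact integer or `Fraction` (for example, the output of a literal interpreter).

```python
from fractions import Fraction

# Integer intervals I_x and U_x, as (min, max) pairs, both included.
INTEGER_INTERVALS = {
    "i8": (-2 ** 7, 2 ** 7 - 1), "i16": (-2 ** 15, 2 ** 15 - 1),
    "i32": (-2 ** 31, 2 ** 31 - 1), "i64": (-2 ** 63, 2 ** 63 - 1),
    "u8": (0, 2 ** 8 - 1), "u16": (0, 2 ** 16 - 1), "u32": (0, 2 ** 32 - 1),
    "char": (0, 2 ** 32 - 1), "u64": (0, 2 ** 64 - 1),
}
# Upper bounds of F_32 / F_64 (exact) and of FF_32 / FF_64; all four intervals are symmetric.
FLOAT_MAX = {"f32": (2 - Fraction(1, 2 ** 23)) * 2 ** 127,
             "f64": (2 - Fraction(1, 2 ** 52)) * 2 ** 1023}
FULL_FLOAT_MAX = {"f32": 2 ** 24, "f64": 2 ** 53}
INTEGER_NOTATIONS = {"nBinary", "nOctal", "nDecimal", "nHexadecimal"}


def is_compatible(type_name: str, notation: str, v=None) -> bool:
    """Apply the compatibility table to a type (row), a notation (column) and the value v."""
    if type_name == "bool" or notation == "nBoolean":
        return type_name == "bool" and notation == "nBoolean"
    if type_name in INTEGER_INTERVALS:
        if notation in ("nDecimalDP", "nExtreme"):
            return False
        low, high = INTEGER_INTERVALS[type_name]
        value = Fraction(v)
        # v must be an integer inside I_x or U_x. For nCharacter this always holds
        # for the 32-bit and 64-bit types, which matches their "yes" cells.
        return value.denominator == 1 and low <= value <= high
    # f32 and f64
    if notation in ("nCharacter", "nExtreme"):
        return True
    limit = FULL_FLOAT_MAX[type_name] if notation in INTEGER_NOTATIONS else FLOAT_MAX[type_name]
    return -limit <= Fraction(v) <= limit


print(is_compatible("i8", "nDecimal", 127))             # True
print(is_compatible("i8", "nDecimal", 128))             # False
print(is_compatible("f32", "nHexadecimal", 2 ** 24 + 1))  # False
```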
Type priority
When the context is not sufficient to choose one candidate type over another, a priority table MUST be used. The type with the highest priority (highest number in the table) is chosen.
type | priority |
---|---|
bool | 12 if the notation used is nBoolean, 1 otherwise |
char | 12 if the notation used is nCharacter, 0 otherwise |
i32 | 11 |
i64 | 10 |
i16 | 9 |
i8 | 8 |
u32 | 7 |
u64 | 6 |
u16 | 5 |
u8 | 4 |
f64 | 3 |
f32 | 2 |
The priorities are defined using these simple rules, sorted from the most important to the least important:
- The `bool` type (respectively the `char` type) is preferred if and only if the nBoolean notation (respectively the nCharacter notation) was used; otherwise it is the last choice, with `bool` preferred over `char`.
- Integer types are preferred over float types.
- Signed integer types are preferred over unsigned integer types.
- 32-bit formats are preferred over other formats, except for float types, where 64-bit formats are preferred.
- Bigger formats are preferred over smaller formats.
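Choosing among candidates then amounts to taking the maximum priority. This Python sketch is illustrative only; `priority` and `highest_priority` are hypothetical helper names, and notations are passed as strings.

```python
# Priorities from the table above, for the types whose priority does not depend on the notation.
FIXED_PRIORITY = {"i32": 11, "i64": 10, "i16": 9, "i8": 8, "u32": 7,
                  "u64": 6, "u16": 5, "u8": 4, "f64": 3, "f32": 2}


def priority(type_name, notation=None):
    """Priority of a type, given the notation of the literal (None when not relevant)."""
    if type_name == "bool":
        return 12 if notation == "nBoolean" else 1
    if type_name == "char":
        return 12 if notation == "nCharacter" else 0
    return FIXED_PRIORITY[type_name]


def highest_priority(candidates, notation=None):
    """Pick the candidate type with the highest priority."""
    return max(candidates, key=lambda t: priority(t, notation))


print(highest_priority({"u8", "i16", "f32", "char"}, "nDecimal"))  # i16
print(highest_priority({"i32", "u32", "char"}, "nCharacter"))      # char
```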
Type inference algorithm
In short, the type inference algorithm works as follows:
- Define all the type candidates.
- Keep the types that best fit the context (the types that minimize the number of implicit type casts).
- If there are still multiple candidates, choose the candidate with the highest priority.
Example
```
i32 a = 42;
i64 b = 42;
var c = a + 100; // the i32 type is associated with the 100 literal
var d = b + 100; // the i64 type is associated with the 100 literal
// The f64 type is associated with the 100.0 literal.
// The value of a is converted to f64 so that the addition can be done.
var e = a + 100.0;
```
More rigorously, the inference algorithm is defined as follows:
In the AST (Abstract Syntax Tree) of the syntactically valid code, consider the node L that contains the literal whose type must be determined.
Starting from node L, go up from parent to parent until reaching a node that is a child of an “assignment” node or of a “function call” node. Name this node F and its parent R.
Now, consider the subtree A rooted at R, made of R, F and the descendants of F.
Give each node of A a property named “possible types”: a set of types, initialized to the empty set Ø.
For each node N of A other than R whose “possible types” property equals Ø, starting from the nodes farthest from R, do:
- If N is a “variable” node:
  - Let T be the type of the variable; assign {T} to the “possible types” property of N.
- If N is a “function call” node:
  - For each child P of N (that is, for each function parameter), let T be the highest priority type (see the “Type priority” table) among the “possible types” of P, and assign {T} to the “possible types” property of P.
  - The function signature is now known, because the type of each parameter is now defined without ambiguity.
  - Let T be the return type of the function; assign {T} to the “possible types” property of N.
- If N is an “assignment” node:
  - If the type of the assigned variable is known, let T be that type and assign {T} to the “possible types” property of N.
  - Otherwise, let T be the “possible types” property of the child of N that represents the assigned value, and assign T to the “possible types” property of N.
- If N is a “literal” node:
  - Let S be the set of types compatible with the literal (see the “Compatibility table”); assign S to the “possible types” property of N.
- If N is an “operation” node other than an “assignment” node:
  - For each possible tuple X, where $X_n$ is one of the possible types of the n-th operand of N (according to the “possible types” property of that operand), let T be the type resulting from the operation when $X_n$ is the type of the n-th operand, and add T to the “possible types” property of N.
If R is an “assignment” node and the type of the assigned variable is known, let T be the type of the assigned variable and assign {T} to the “possible types” property of R.
Otherwise, let T be the highest priority type (see the “Type priority” table) among the “possible types” property of F, and assign {T} to the “possible types” property of R.
For each node N of A having more than one possible type, starting from the nodes nearest to R, do:
- Let T be the unique type in the “possible types” property of the parent of N.
- If T is in the “possible types” property of N, assign {T} to the “possible types” property of N.
- Otherwise, let T be the highest priority type (see the “Type priority” table) among the “possible types” property of N, and assign {T} to the “possible types” property of N.
Let T be the unique type in the “possible types” property of L: T is the type that must be associated with the value represented by the literal.
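To make the two passes (bottom-up filling of the “possible types”, then top-down narrowing) concrete, here is a heavily simplified Python sketch. It is an illustration under stated assumptions, not the Ergol compiler. The AST only has three hypothetical node kinds, `Variable`, `Literal` and a binary `Add` operation. Each literal's compatible types are supplied by the caller. Because this section does not say which type an operation produces, `add_result_type` uses a hypothetical rule: equal types give that type, and an integer mixed with a float gives the float type. `infer` plays the role of R: it receives F and, when known, the type of the assigned variable.

```python
from dataclasses import dataclass, field

# Priorities from the "Type priority" table.
FIXED_PRIORITY = {"i32": 11, "i64": 10, "i16": 9, "i8": 8, "u32": 7,
                  "u64": 6, "u16": 5, "u8": 4, "f64": 3, "f32": 2}
FLOATS = {"f32", "f64"}


def priority(type_name, notation=None):
    if type_name == "bool":
        return 12 if notation == "nBoolean" else 1
    if type_name == "char":
        return 12 if notation == "nCharacter" else 0
    return FIXED_PRIORITY[type_name]


def highest_priority(types, notation=None):
    return max(types, key=lambda t: priority(t, notation))


@dataclass
class Variable:
    var_type: str
    possible: set = field(default_factory=set)


@dataclass
class Literal:
    notation: str
    compatible: set  # the types allowed by the compatibility table for this literal
    possible: set = field(default_factory=set)


@dataclass
class Add:  # an "operation" node with two operands
    left: object
    right: object
    possible: set = field(default_factory=set)


def add_result_type(a, b):
    """Hypothetical typing rule for "+"; None means the combination is rejected."""
    if a == b:
        return a
    if (a in FLOATS) != (b in FLOATS):
        return a if a in FLOATS else b
    return None


def fill_possible_types(node):
    """Bottom-up pass: the nodes farthest from R are handled first."""
    if isinstance(node, Variable):
        node.possible = {node.var_type}
    elif isinstance(node, Literal):
        node.possible = set(node.compatible)
    elif isinstance(node, Add):
        fill_possible_types(node.left)
        fill_possible_types(node.right)
        node.possible = {add_result_type(x, y)
                         for x in node.left.possible
                         for y in node.right.possible} - {None}


def narrow(node, parent_type):
    """Top-down pass: the nodes nearest to R are handled first."""
    if len(node.possible) > 1:
        if parent_type in node.possible:
            node.possible = {parent_type}
        else:
            node.possible = {highest_priority(node.possible, getattr(node, "notation", None))}
    if isinstance(node, Add):
        (own_type,) = node.possible
        narrow(node.left, own_type)
        narrow(node.right, own_type)


def infer(value, declared_type=None):
    """Run both passes below R; value is F, declared_type the assigned variable's type if known."""
    fill_possible_types(value)
    root_type = declared_type or highest_priority(value.possible, getattr(value, "notation", None))
    narrow(value, root_type)
    return root_type


NUMERIC = {"i8", "i16", "i32", "i64", "u8", "u16", "u32", "char", "u64", "f32", "f64"}
hundred = Literal("nDecimal", NUMERIC)   # var c = a + 100;  with i32 a
infer(Add(Variable("i32"), hundred))
print(hundred.possible)                  # {'i32'}
```

With this assumed `+` rule, the sketch reproduces the types given in the example above for `c`, `d` and `e`.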