# Primitive Types

## Definitions

This section defines three terms used throughout this page: “value”, “notation”, and “literal”.

### Value

A value is the data itself. In Ergol, primitive types can store three different kinds of values:

- booleans
- integers
- floating point numbers (also called “floats”)

Info

Float values follow the IEEE 754 standard, so they can be:

- an imprecise rational number
- not a number
- positive infinity
- negative infinity

Warning

Not every rational number can be represented as a float, so floats are imprecise by nature.

Info

The floats that have no fractional part are called “full floating point numbers” or “full floats”.

### Notation

A notation is a set of rules used to define how a literal MUST be interpreted to know which value it represents. Ergol notations for primitive types are:

- nBoolean
- nBinary
- nOctal
- nDecimal
- nDecimalDP
- nScientific
- nHexadecimal
- nCharacter
- nExtreme

### Literal

A literal is a string of characters written in the source code that represents a value. A literal is written using a specific notation and is evaluated at compile time. Many literals can represent the same value.

Example

These three literals represent the same value:

- `0.0`
- `0.`
- `.0`

## Types

### Boolean type

The `bool` type is used to store the truthiness of something (a boolean).
It has only 2 possible values: the `true` and `false` of classical logic.

### Integer types

There are 9 different integer types. Four of them are signed (can contain a negative integer):

type | size (in bits) | min value (included) | max value (included) |
---|---|---|---|
i8 | 8 | - 2^{7} | 2^{7} - 1 |
i16 | 16 | - 2^{15} | 2^{15} - 1 |
i32 | 32 | - 2^{31} | 2^{31} - 1 |
i64 | 64 | - 2^{63} | 2^{63} - 1 |

The five others are unsigned (cannot contain a negative integer):

type | size (in bits) | min value (included) | max value (included) |
---|---|---|---|
u8 | 8 | 0 | 2^{8} - 1 |
u16 | 16 | 0 | 2^{16} - 1 |
u32 | 32 | 0 | 2^{32} - 1 |
char | 32 | 0 | 2^{32} - 1 |
u64 | 64 | 0 | 2^{64} - 1 |
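The bounds in both tables follow directly from the size in bits. As an illustration, here is a minimal sketch written in Python (not Ergol); the `integer_range` helper is a hypothetical name introduced only for this example:

```python
def integer_range(bits: int, signed: bool) -> tuple[int, int]:
    """Return the inclusive (min, max) bounds of an integer type of the given size."""
    if signed:
        # Signed types: [-2^(bits-1) .. 2^(bits-1) - 1]
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    # Unsigned types: [0 .. 2^bits - 1]
    return 0, 2 ** bits - 1


# Print the bounds of a few of the types listed in the tables.
for name, bits, signed in [("i8", 8, True), ("i64", 64, True), ("u8", 8, False), ("char", 32, False)]:
    low, high = integer_range(bits, signed)
    print(f"{name}: {low} .. {high}")
```

The same function covers `char`, since the table gives it the same size and bounds as `u32`.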

In Ergol, characters are stored as integers using their Unicode code point.
There is an integer type made for this purpose, named `char`, which is equivalent to `u32`.
It's good practice to use the `char` type instead of the `u32` type when storing a character, because it's more explicit.

Warning

Even though `char` and `u32` are equivalent in size, min value and max value, they are not the same type.

Therefore, it's legal to overload a function by changing a parameter type from `char` to `u32`.
If `char` and `u32` were the same type, this would not be possible because both functions would have the same signature.

```
fn printValue(char x) {
    printl("The value of the character is:", x);
}
fn printValue(u32 x) {
    printl("The value of the integer is:", x);
}
printValue('a');
printValue(42);
```

Output:

```
The value of the character is: a⮐
The value of the integer is: 42⮐
```

Since characters are stored as integers, nothing stops you from using smaller integer types to store characters, such as `u8` or `u16`:

- `u8` is fine for storing the first characters of the Unicode set:
  - the characters of the “C0 Controls and Basic Latin” block (also known as ASCII characters)
  - the characters of the “C1 Controls and Latin-1 Supplement” block
- `u16` is fine for storing characters from U+0000 to U+FFFF.

Still, if space is not a major concern, use the `char` type to store characters because it can store every Unicode character (from U+0000 to U+10FFFF).

### Float types

There are 2 float types. They are based on the IEEE 754 standard.

type | size (in bits) | format | smallest representable number | greatest representable number |
---|---|---|---|---|
f32 | 32 | binary32 IEEE 754 floating point number | (2^{−23} - 2) × 2^{127} | (2 − 2^{−23}) × 2^{127} |
f64 | 64 | binary64 IEEE 754 floating point number | (2^{−52} - 2) × 2^{1023} | (2 − 2^{−52}) × 2^{1023} |
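The two “greatest representable number” formulas can be checked numerically. The sketch below is Python, not Ergol, and assumes that Python floats are IEEE 754 binary64 values (true for CPython on common platforms); `f32_max` is a hypothetical helper that decodes the bit pattern of the largest finite binary32 value:

```python
import struct
import sys


def f32_max() -> float:
    # 0x7F7FFFFF is the bit pattern of the largest finite binary32 value.
    return struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]


# Formulas from the table, evaluated exactly in binary64 arithmetic.
formula_f32 = (2 - 2.0 ** -23) * 2.0 ** 127
formula_f64 = (2 - 2.0 ** -52) * 2.0 ** 1023

print(formula_f32 == f32_max())
print(formula_f64 == sys.float_info.max)
```

Under the stated assumption, both comparisons are expected to hold. The “smallest representable number” column is simply the negation of these values.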

## Notations

### nBoolean

#### syntax

```
boolean-literal = "true" / "false"
```

The literal of a boolean value can be written using the nBoolean notation: `false` or `true`.

#### interpretation

To get the value from a literal:

- The `false` literal corresponds to the `false` value.
- The `true` literal corresponds to the `true` value.

### nBinary

#### syntax

```
binary-digit = "0" / "1"
binary-literal = "0b" 1*(*"_" binary-digit) *"_"
```

The literal of an integer or a positive full float, except positive infinity, can be written using the nBinary notation (`0` and `1` digits).
Leading zeros are allowed but OPTIONAL.
If the digits consist only of zeros, the right-most digit is not considered a leading zero.

There MUST be at least 1 digit.

The digits MUST be preceded by the binary prefix: `0b`.
Underscore characters MAY appear anywhere after the prefix.

Examples

`0b00101010`

`0b0`

`0b1`

`0b_11111111_11111111`

`0b___0_`

`0b0`

#### interpretation

To get the value from a literal:

- Drop:
  - the prefix
  - all underscores
  - the leading zeros
- Interpret the result using the base-2 numeral system.
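The steps above can be sketched as follows. This is an illustration in Python, not Ergol and not the compiler's implementation; `interpret_binary` is a hypothetical name:

```python
def interpret_binary(literal: str) -> int:
    """Interpret an nBinary literal such as '0b_1010' by following the steps above."""
    if not literal.startswith("0b"):
        raise ValueError("missing 0b prefix")
    digits = literal[2:].replace("_", "")   # drop the prefix and all underscores
    if not digits or any(d not in "01" for d in digits):
        raise ValueError("expected at least one binary digit")
    digits = digits.lstrip("0") or "0"      # drop leading zeros, keep the right-most zero
    return int(digits, 2)                   # base-2 interpretation
```

For instance, `interpret_binary("0b00101010")` yields 42 and `interpret_binary("0b___0_")` yields 0.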

### nOctal

#### syntax

```
octal-digit = %u0030-0037
octal-literal = "0o" 1*(*"_" octal-digit) *"_"
```

The literal of a positive integer or a positive full float, except positive infinity, can be written using the nOctal notation (digits from `0` to `7`).
Leading zeros are allowed but OPTIONAL.
If the digits consist only of zeros, the right-most digit is not considered a leading zero.

There MUST be at least 1 digit.

The digits MUST be preceded by the octal prefix: `0o`.
Underscore characters MAY appear anywhere after the prefix.

Examples

`0o777`

`0o750`

`0o0`

`0o01_234_567`

`0o___0_`

`0o0`

#### interpretation

To get the value from a literal:

- Drop:
  - the prefix
  - all underscores
  - the leading zeros
- Interpret the result using the base-8 numeral system.

### nDecimal

#### syntax

```
decimal-digit = %u0030-0039
decimal-literal = 1*(*"_" decimal-digit) *"_"
```

The literal of a positive integer or a positive full float, except positive infinity, can be written using the nDecimal notation (digits from `0` to `9`).
Leading zeros are allowed but OPTIONAL.
If the digits consist only of zeros, the right-most digit is not considered a leading zero.

There MUST be at least 1 digit.

Underscore characters MAY appear anywhere in the literal.

Examples

`42`

`0`

`123_456_789`

`0000`

`01`

`__0_0_`

#### interpretation

To get the value from a literal:

- Drop:
  - all underscores
  - the leading zeros
- Interpret the result using the base-10 numeral system.
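As an illustration, the `decimal-literal` grammar rule can be transcribed into a regular expression and combined with the interpretation steps above. The sketch is in Python, not Ergol; `DECIMAL_LITERAL` and `interpret_decimal` are hypothetical names:

```python
import re

# Regex transcription of: decimal-literal = 1*(*"_" decimal-digit) *"_"
DECIMAL_LITERAL = re.compile(r"(?:_*[0-9])+_*")


def interpret_decimal(literal: str) -> int:
    """Validate an nDecimal literal, then interpret it by following the steps above."""
    if DECIMAL_LITERAL.fullmatch(literal) is None:
        raise ValueError(f"not an nDecimal literal: {literal!r}")
    digits = literal.replace("_", "")    # drop all underscores
    digits = digits.lstrip("0") or "0"   # drop leading zeros, keep the right-most zero
    return int(digits, 10)               # base-10 interpretation
```

With this sketch, `123_456_789` is interpreted as 123456789, while `0000` and `__0_0_` are both interpreted as 0.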

### nDecimalDP

#### syntax

```
decimal-dp-literal = ([decimal-literal] "." decimal-literal) / (decimal-literal "." [decimal-literal])
```

The literal of a positive float, except positive infinity, can be written using the nDecimalDP notation (digits from `0` to `9` and a decimal point).

It consists of a decimal point (`.`) preceded by an integral part and followed by a fractional part.

The integral part is written using one or more digits from `0` to `9`.
Leading zeros are allowed but OPTIONAL.
If the digits of the integral part consist only of zeros, the right-most one is not considered a leading zero.
The integral part can be omitted if the fractional part is not.

The fractional part is written using one or more digits from `0` to `9`.
Trailing zeros are allowed but OPTIONAL.
If the digits of the fractional part consist only of zeros, the left-most one is not considered a trailing zero.
The fractional part can be omitted if the integral part is not.

Underscore characters MAY appear anywhere in the literal.

Examples

`3.14`

`.42`

`50.`

`1_000_000.500`

`03.`

`_.132_000_000`

#### interpretation

To get the value from a literal:

- Drop:
  - all underscores
  - the leading zeros in the integral part
  - the trailing zeros in the fractional part
- If the string starts with `.`, add a zero at the start.
- If the string ends with `.`, add a zero at the end.
- Interpret the result using the base-10 numeral system.
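The steps above can be sketched as follows, in Python rather than Ergol. The hypothetical `interpret_decimal_dp` helper assumes the literal is already syntactically valid, and it returns the exact rational value; the float actually stored may then be an approximation of it:

```python
from fractions import Fraction


def interpret_decimal_dp(literal: str) -> Fraction:
    """Interpret a valid nDecimalDP literal as an exact rational number."""
    text = literal.replace("_", "")                  # drop all underscores
    integral, _, fractional = text.partition(".")    # split around the decimal point
    integral = integral.lstrip("0")                  # drop leading zeros
    fractional = fractional.rstrip("0")              # drop trailing zeros
    # An empty part was either omitted or made only of zeros: put a single zero back.
    return Fraction(f"{integral or '0'}.{fractional or '0'}")
```

For instance, `3.14` is interpreted as 157/50 and `50.` as 50.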

### nScientific

#### syntax

```
scientific-literal = decimal-dp-literal "e" ["-"] decimal-literal
```

The literal of a positive integer or a positive float, except positive infinity, can be written using the nScientific notation.

It consists of an integral part, followed by a decimal point, followed by a fractional part, followed by a lowercase letter `e`, followed by an exponent part.

The integral part is written using one or more digits from `0` to `9`.
Leading zeros are allowed but OPTIONAL.
If the digits of the integral part consist only of zeros, the right-most one is not considered a leading zero.
The integral part can be omitted if the fractional part is not.

The fractional part is written using one or more digits from `0` to `9`.
Trailing zeros are allowed but OPTIONAL.
If the digits of the fractional part consist only of zeros, the left-most one is not considered a trailing zero.
The fractional part can be omitted if the integral part is not.

The exponent part is written using an optional `-` character followed by digits from `0` to `9`.
Leading zeros are allowed but OPTIONAL.
If the digits of the exponent part consist only of zeros, the right-most one is not considered a leading zero.

Underscore characters MAY appear anywhere in the literal.

Examples

`1.14e0`

`4.2e2`

`.05e-10`

`1_500e3`

`03.e__-_1`

`_.132_000_000e0_`

#### interpretation

To get the value from a literal:

- Drop:
  - all underscores
  - the leading zeros in the integral part
  - the trailing zeros in the fractional part
  - the leading zeros in the exponent part
- Let `M` be the characters before the lowercase letter `e`.
- Let `E` be the characters after the lowercase letter `e`.
- If `M` starts with `.`, add a zero at the start of `M`.
- If `M` ends with `.`, add a zero at the end of `M`.
- Let `MV` be the interpretation of `M` using the base-10 numeral system.
- Let `EV` be the interpretation of `E` using the base-10 numeral system.
- The value is `MV` times ten to the power of `EV`.
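The steps above can be sketched as follows, in Python rather than Ergol. The hypothetical `interpret_scientific` helper assumes the literal is already syntactically valid and returns the exact rational value `MV` × 10^`EV`:

```python
from fractions import Fraction


def interpret_scientific(literal: str) -> Fraction:
    """Interpret a valid nScientific literal as an exact rational number."""
    text = literal.replace("_", "")   # drop all underscores
    m, _, e = text.partition("e")     # M and E: the parts before and after the "e"
    if m.startswith("."):
        m = "0" + m                   # add a zero at the start of M
    if m.endswith("."):
        m = m + "0"                   # add a zero at the end of M
    # Fraction() and int() ignore redundant zeros, so they are not stripped explicitly.
    mv = Fraction(m)                  # MV: base-10 value of M
    ev = int(e, 10)                   # EV: base-10 value of E
    return mv * Fraction(10) ** ev
```

For instance, `4.2e2` is interpreted as 420 and `03.e__-_1` as 3/10.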

### nHexadecimal

#### syntax

```
hexadecimal-digit = %u0030-0039 / %u0041-0046 / %u0061-0066
hexadecimal-literal = "0x" 1*(*"_" hexadecimal-digit) *"_"
```

The literal of an integer or a positive full float, except positive infinity, can be written using the nHexadecimal notation (digits from `0` to `9`, then from `A` to `F`, case insensitive).
Leading zeros are allowed but OPTIONAL.
If the digits consist only of zeros, the right-most digit is not considered a leading zero.

There MUST be at least 1 digit.

The digits MUST be preceded by the hexadecimal prefix: `0x`.
Underscore characters MAY appear anywhere after the prefix.

Examples

`0x2A`

`0x2a`

`0x6aFD4`

`0x__6a_FD4_`

`0xB6_AF_D4`

`0x0`

#### interpretation

To get the value from a literal:

- Drop:
  - the prefix
  - all underscores
  - the leading zeros
- Transform the lowercase letters to their uppercase equivalent.
- Interpret the result using the base-16 numeral system (`0123456789ABCDEF` alphabet).
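The steps above can be sketched as follows, in Python rather than Ergol. `HEX_ALPHABET` and `interpret_hexadecimal` are hypothetical names; the base-16 interpretation is spelled out by hand to show the role of the alphabet:

```python
HEX_ALPHABET = "0123456789ABCDEF"


def interpret_hexadecimal(literal: str) -> int:
    """Interpret an nHexadecimal literal such as '0x2a' by following the steps above."""
    if not literal.startswith("0x"):
        raise ValueError("missing 0x prefix")
    digits = literal[2:].replace("_", "")          # drop the prefix and all underscores
    if not digits:
        raise ValueError("expected at least one hexadecimal digit")
    digits = (digits.lstrip("0") or "0").upper()   # drop leading zeros, then uppercase
    value = 0
    for digit in digits:
        # The position of a digit in the alphabet is its value in base 16.
        # str.index raises ValueError for a character outside the alphabet.
        value = value * 16 + HEX_ALPHABET.index(digit)
    return value
```

Both `0x2A` and `0x2a` are interpreted as 42 by this sketch.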

### nCharacter

#### syntax

```
; all Unicode characters except the null character, the single quote and the backslash.
single-quotable-character = %u0001-0026 / %u0028-005B / %u005D-10FFFF
; 6-digit hexadecimal number in the [000000 .. 10FFFF] range
unicode-code-point = (("0" hexadecimal-digit) / "10") 4hexadecimal-digit
; all escape sequences
escape-sequence = "\" ("0" / "a" / "b" / "t" / "n" / "v" / "f" / "r" / "e" / DQUOTE / "'" / "\" / ("u" unicode-code-point))
character-literal = "'" (single-quotable-character / escape-sequence) "'"
```

The literal of an integer, or full float, in the [0 .. 1114111] range can be written using the nCharacter notation.

Info

The “character” value does not exist in Ergol. The nCharacter notation describes an integer value corresponding to the Unicode code point of the character to store.

See more here.

#### interpretation

The nCharacter notation can take multiple forms. All of them are described below. Each form is followed by the integer value that it represents and the corresponding Unicode character.

literal | represented integer value | corresponding Unicode character |
---|---|---|
`'\0'` | 0 | NULL |
`'\a'` | 7 | BELL |
`'\b'` | 8 | BACKSPACE |
`'\t'` | 9 | CHARACTER TABULATION |
`'\n'` | 10 | LINE FEED |
`'\v'` | 11 | LINE TABULATION |
`'\f'` | 12 | FORM FEED |
`'\r'` | 13 | CARRIAGE RETURN |
`'\e'` | 27 | ESCAPE |
`'\"'` | 34 | QUOTATION MARK |
`'\''` | 39 | APOSTROPHE |
`'\\'` | 92 | REVERSE SOLIDUS |
`'X'` with `X` any Unicode character except `'` and `\` | Unicode code point of `X` | `X` |
`'\uHHHHHH'` with `HHHHHH` a six-digit hexadecimal number (case insensitive) in the [000000 .. 10FFFF] range | value of `HHHHHH` | the Unicode character corresponding to the code point `HHHHHH` |

Examples

`'a'`

`'0'`

`'Z'`

`'_'`

`'é'`

`'丕'`

`'\n'`

`'\\'`

`'\u0000e9'`
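The interpretation table can be sketched as a small lookup, written here in Python rather than Ergol. `SIMPLE_ESCAPES` and `interpret_character` are hypothetical names, and the literal is passed as a Python string that includes its surrounding single quotes:

```python
# Escape sequences from the table, keyed by the character that follows the backslash.
SIMPLE_ESCAPES = {
    "0": 0, "a": 7, "b": 8, "t": 9, "n": 10, "v": 11,
    "f": 12, "r": 13, "e": 27, '"': 34, "'": 39, "\\": 92,
}
HEX_DIGITS = "0123456789abcdefABCDEF"


def interpret_character(literal: str) -> int:
    """Return the integer value represented by an nCharacter literal."""
    if len(literal) < 3 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("a character literal is enclosed in single quotes")
    body = literal[1:-1]
    # '\uHHHHHH' form: exactly six hexadecimal digits, at most 10FFFF.
    if body.startswith("\\u") and len(body) == 8 and all(c in HEX_DIGITS for c in body[2:]):
        value = int(body[2:], 16)
        if value > 0x10FFFF:
            raise ValueError("code point out of range")
        return value
    # Simple escape sequences such as '\n' or '\\'.
    if body.startswith("\\") and body[1:] in SIMPLE_ESCAPES:
        return SIMPLE_ESCAPES[body[1:]]
    # 'X' form: any single character except the quote, the backslash and null.
    if len(body) == 1 and body not in ("'", "\\", "\0"):
        return ord(body)
    raise ValueError(f"not an nCharacter literal: {literal!r}")
```

With this sketch, `'é'` and `'\u0000e9'` are both interpreted as 233.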

### nExtreme

#### syntax

```
extreme-literal = "NaN" / "NegInfinity" / "PosInfinity"
```

The literal of a “not a number” float, “negative infinity” float, or “positive infinity” float can be written using the nExtreme notation, that is, as one of the following literals:

`NaN`

`NegInfinity`

`PosInfinity`

#### interpretation

literal | corresponding value |
---|---|
`NaN` | not a number |
`NegInfinity` | negative infinity |
`PosInfinity` | positive infinity |

### Other notations

nBoolean, nBinary, nOctal, nDecimal, nDecimalDP, nScientific, nHexadecimal, nCharacter and nExtreme are the only notations that can be used in a primitive type literal. To use another notation, such as base 64, see the base converter library.

## Type inference

A literal represents a value, and this value MUST have a type associated with it. On its own, a literal gives little to no information about the type that MUST be used.

The type to use is chosen by the type inference algorithm and depends on:

- the value to store
- the notation used to represent the value
- the context

### Defining candidates

Not every type is suitable for every value. A compatibility table MUST be used to know whether the value is compatible with a given type.

#### Compatibility table

type | nBoolean | nBinary, nOctal, nDecimal, nHexadecimal | nScientific | nDecimalDP | nCharacter | nExtreme |
---|---|---|---|---|---|---|
bool | yes | no | no | no | no | no |
i8 | no | v ∈ I_{8} | v ∈ I_{8} | no | v ∈ I_{8} | no |
i16 | no | v ∈ I_{16} | v ∈ I_{16} | no | v ∈ I_{16} | no |
i32 | no | v ∈ I_{32} | v ∈ I_{32} | no | yes | no |
i64 | no | v ∈ I_{64} | v ∈ I_{64} | no | yes | no |
u8 | no | v ∈ U_{8} | v ∈ U_{8} | no | v ∈ U_{8} | no |
u16 | no | v ∈ U_{16} | v ∈ U_{16} | no | v ∈ U_{16} | no |
u32 | no | v ∈ U_{32} | v ∈ U_{32} | no | yes | no |
char | no | v ∈ U_{32} | v ∈ U_{32} | no | yes | no |
u64 | no | v ∈ U_{64} | v ∈ U_{64} | no | yes | no |
f32 | no | v ∈ FF_{32} | v ∈ F_{32} | v ∈ F_{32} | yes | yes |
f64 | no | v ∈ FF_{64} | v ∈ F_{64} | v ∈ F_{64} | yes | yes |

#### Understanding the table

The table above indicates whether a given type (row) can be used when a given notation (column) is used:

- yes: The type can be used.
- no: The type cannot be used.
- predicate: The type can be used IF AND ONLY IF the predicate is fulfilled.

Below are the parameters and constants used in the predicates:

- The value represented by the literal: `v`
- The integer interval of all numbers representable in an `x`-bit signed integer: I_{x} = [- 2^{x-1} .. 2^{x-1} - 1]
- The integer interval of all numbers representable in an `x`-bit unsigned integer: U_{x} = [0 .. 2^{x} - 1]
- The smallest interval containing all numbers representable by an `f32`: F_{32} = [(2^{−23} - 2) × 2^{127}, (2 − 2^{−23}) × 2^{127}]
- The smallest interval containing all numbers representable by an `f64`: F_{64} = [(2^{−52} - 2) × 2^{1023}, (2 − 2^{−52}) × 2^{1023}]
- The biggest interval in which every full float is representable by an `f32`: FF_{32} = [- 2^{24} .. 2^{24}]
- The biggest interval in which every full float is representable by an `f64`: FF_{64} = [- 2^{53} .. 2^{53}]
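To make the predicates concrete, here is a sketch in Python (not Ergol) that lists the compatible types for a numeric value. It covers only three columns of the table; the `candidates` function, its `column` argument and the constant names are assumptions made for this example:

```python
from fractions import Fraction

# Largest finite values of binary32 and binary64, as exact rationals (bounds of F_32 and F_64).
F32_MAX = (2 - Fraction(1, 2 ** 23)) * 2 ** 127
F64_MAX = (2 - Fraction(1, 2 ** 52)) * 2 ** 1023

SIGNED_BITS = {"i8": 8, "i16": 16, "i32": 32, "i64": 64}
UNSIGNED_BITS = {"u8": 8, "u16": 16, "u32": 32, "char": 32, "u64": 64}
# For each float type: the exponent bounding FF_x and the bound of F_x.
FLOAT_LIMITS = {"f32": (24, F32_MAX), "f64": (53, F64_MAX)}


def candidates(column: str, v: Fraction) -> list[str]:
    """Return the types compatible with value v.

    `column` is "integer" (nBinary, nOctal, nDecimal, nHexadecimal),
    "nScientific" or "nDecimalDP"; the other columns are left out of this sketch.
    """
    if column not in ("integer", "nScientific", "nDecimalDP"):
        raise ValueError(f"column not covered by this sketch: {column}")
    is_integer = v.denominator == 1
    result = []
    if column != "nDecimalDP" and is_integer:
        # v ∈ I_x for signed types, v ∈ U_x for unsigned types
        result += [t for t, x in SIGNED_BITS.items() if -(2 ** (x - 1)) <= v <= 2 ** (x - 1) - 1]
        result += [t for t, x in UNSIGNED_BITS.items() if 0 <= v <= 2 ** x - 1]
    for t, (full_bits, limit) in FLOAT_LIMITS.items():
        if column == "integer":
            compatible = is_integer and abs(v) <= 2 ** full_bits   # v ∈ FF_x
        else:
            compatible = -limit <= v <= limit                      # v ∈ F_x
        if compatible:
            result.append(t)
    return result
```

For example, `candidates("integer", Fraction(300))` excludes `i8` and `u8`, and `candidates("nDecimalDP", Fraction("3.14"))` keeps only the two float types.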

### Type priority

When the context is not sufficient to choose one type candidate over another, a priority table MUST be used. The type with the highest priority (highest number in the table) is chosen.

type | priority |
---|---|
bool | 12 if the notation used is nBoolean, 1 otherwise |
char | 12 if the notation used is nCharacter, 0 otherwise |
i32 | 11 |
i64 | 10 |
i16 | 9 |
i8 | 8 |
u32 | 7 |
u64 | 6 |
u16 | 5 |
u8 | 4 |
f64 | 3 |
f32 | 2 |

The priorities are defined using these simple rules, sorted from the most important to the least important:

- The `bool` type (respectively the `char` type) is preferred if and only if the nBoolean notation (respectively the nCharacter notation) was used; otherwise it is the last choice, with a preference for the `bool` type over the `char` type.
- Integer types are preferred over float types.
- Signed integer types are preferred over unsigned integer types.
- 32-bit formats are preferred over other formats, except for float types where 64-bit formats are preferred.
- Bigger formats are preferred over smaller formats.
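The priority table can be sketched as a lookup plus a selection function. This is Python, not Ergol; `BASE_PRIORITY`, `priority` and `highest_priority` are hypothetical names:

```python
# Priorities that do not depend on the notation.
BASE_PRIORITY = {
    "i32": 11, "i64": 10, "i16": 9, "i8": 8,
    "u32": 7, "u64": 6, "u16": 5, "u8": 4,
    "f64": 3, "f32": 2,
}


def priority(type_name: str, notation: str) -> int:
    """Return the priority of a type for a literal written in the given notation."""
    if type_name == "bool":
        return 12 if notation == "nBoolean" else 1
    if type_name == "char":
        return 12 if notation == "nCharacter" else 0
    return BASE_PRIORITY[type_name]


def highest_priority(candidates: list[str], notation: str) -> str:
    """Pick the candidate with the highest priority."""
    return max(candidates, key=lambda t: priority(t, notation))
```

For an nDecimal literal whose candidates include `i32`, the result is `i32`; for an nCharacter literal, `char` wins over every other candidate.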

### Type inference algorithm

Basically, the type inference algorithm works as follows:

- Define all the type candidates.
- Keep the types that best fit the context (the types that minimize the number of implicit type casts).
- If there are still multiple candidates, the candidate with the highest priority is chosen.

Example

```
i32 a = 42;
i64 b = 42;
var c = a + 100; // the i32 type is associated to the 100 literal
var d = b + 100; // the i64 type is associated to the 100 literal
// The f64 type is associated to the 100.0 literal.
// A conversion will occur to transform the value of “a” to an f64 so the addition can be done.
var e = a + 100.0;
```
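The three inferences in this example can be reproduced with a deliberately simplified sketch, written in Python rather than Ergol. It assumes that the only context is the type of the other operand and that a candidate equal to that type needs no implicit cast; `infer_operand_type` and the candidate lists are hypothetical:

```python
# Priorities from the “Type priority” table, for literals that use neither
# the nBoolean nor the nCharacter notation.
PRIORITY = {
    "i32": 11, "i64": 10, "i16": 9, "i8": 8, "u32": 7, "u64": 6,
    "u16": 5, "u8": 4, "f64": 3, "f32": 2, "char": 0,
}


def infer_operand_type(candidates: list[str], other_operand_type: str) -> str:
    """Pick the type of the literal in `<variable> + <literal>`."""
    # Step 2: keep the candidates that fit the context without any cast.
    matching = [t for t in candidates if t == other_operand_type]
    # Step 3: among what is left, take the highest priority.
    return max(matching or candidates, key=PRIORITY.__getitem__)


# Step 1 (candidates), written out by hand for the literals 100 and 100.0.
CANDIDATES_100 = ["i8", "i16", "i32", "i64", "u8", "u16", "u32", "char", "u64", "f32", "f64"]
CANDIDATES_100_0 = ["f32", "f64"]

print(infer_operand_type(CANDIDATES_100, "i32"))
print(infer_operand_type(CANDIDATES_100, "i64"))
print(infer_operand_type(CANDIDATES_100_0, "i32"))
```

The sketch selects `i32`, `i64` and `f64` respectively, matching the comments in the example. The rigorous algorithm handles whole expression trees rather than a single operand.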

In a more rigorous way, the inference algorithm is defined as follows:

```
Consider, in the AST (Abstract Syntax Tree) of the syntactically valid code, the node L that contains the literal whose type we want to determine.
Start from node L and go up from parent to parent until reaching a node that is a child of an “assignment” node or of a “function call” node. We name this node F and its parent R.
Now, consider the sub-tree A, of root R, excluding the nodes (except R and F) that are not descendants of F.
```

```
Consider, for each node of A, a property named “possible types”, which is a set of types, initialized to the empty set Ø.
For each node N of A excluding R whose “possible types” property equals Ø, starting from the nodes farthest from R, do:
    If N is a “variable” node, then:
        Let T be the type of the variable, assign {T} to the “possible types” property of N.
    If N is a “function call” node, then:
        For each child P of N (for each function parameter), do:
            Let T be the highest priority type (see “Type priority” table) among the “possible types” of P, assign {T} to the “possible types” property of P.
        The function signature is now known, because the type of each parameter is now defined without ambiguity.
        Let T be the return type of the function, assign {T} to the “possible types” property of N.
    If N is an “assignment” node, then:
        If the type of the assigned variable is known, then:
            Let T be the type of the assigned variable, assign {T} to the “possible types” property of N.
        Else:
            Let T be the “possible types” property of the child node of N that represents the assigned value, assign T to the “possible types” property of N.
    If N is a “literal” node, then:
        Let S be the set of types which are compatible with the literal (see the “Compatibility” table), assign S to the “possible types” property of N.
    If N is an “operation” node and not an “assignment” node, then:
        For each possible tuple X, with X_n being one of the possible types of the nth operand of N (according to the “possible types” property of the operand), do:
            Let T be the type resulting from the operation with X_n being the type of the nth operand, add T to the “possible types” property of N.
If R is an “assignment” node and the type of the assigned variable is known, then:
    Let T be the type of the assigned variable, assign {T} to the “possible types” of R.
Else:
    Let T be the highest priority type (see “Type priority” table) among the “possible types” property of F, assign {T} to the “possible types” property of R.
```

```
For each node N of A having more than 1 possible type, starting from the nodes nearest to R, do:
    Consider T, the unique type in the “possible types” property of the parent of N.
    If T is in the “possible types” of N, then:
        Assign {T} to the “possible types” property of N.
    Else:
        Let T be the highest priority type (see “Type priority” table) among the “possible types” property of N, assign {T} to the “possible types” property of N.
Let T be the unique type in the “possible types” property of L; T is the type that must be associated with the value represented by the literal.
```