
2. Primitive types

2.1. Definitions

The following definitions describe what a “value”, a “type”, a “notation”, and a “literal” are.

2.1.1. Value

A value is the data itself. In Ergol there are three different kinds of values:

  • booleans
  • integers
  • floating point numbers (also called “floats”)

Info

Float values follow the IEEE 754 standard, so they can be:

  • an imprecise rational number
  • not a number
  • positive infinity
  • negative infinity

Warning

Not every rational number can be represented as a float, so floats are imprecise by nature.
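The imprecision can be observed in any language whose floats follow IEEE 754. The short Python sketch below is only an illustration and assumes that Python's float is a binary64 value (the same format as Ergol's f64), which holds on mainstream platforms. It shows that the value stored for 0.1 is not exactly one tenth.

```python
from fractions import Fraction

# Fraction(float) converts the stored binary value exactly, without rounding.
stored = Fraction(0.1)
exact = Fraction(1, 10)

print(stored == exact)    # prints False: the stored value is only close to 1/10
print(0.1 + 0.2 == 0.3)   # prints False: rounding errors show up in arithmetic
```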

Info

The floats that have no fractional part are called “full floating point numbers” or “full floats”.

2.1.2. Type

A type is an attribute of data which tells the compiler how the data will be used. The type also defines the operations that can be done on the data, the meaning of the data, and the way values of that type can be stored.

A primitive type is a type that is implemented at the language level. It does not require a library (even the standard one) to work. The primitive types can be classified using these simple rules:

  • The primitive type storing a boolean is called a boolean type.
  • Primitive types storing an integer are called integer types.
  • Integer types storing a signed integer are called signed integer types.
  • Integer types storing an unsigned integer are called unsigned integer types.
  • Primitive types storing a float are called floating point types or float types.

Ergol has exactly the following 12 primitive types (grouped by category):

  • boolean type:
    • bool
  • integer types:
    • signed integer types:
      • i8
      • i16
      • i32
      • i64
    • unsigned integer types:
      • u8
      • u16
      • u32
      • u64
      • char
  • float types:
    • f32
    • f64

2.1.3. Notation

A notation is a set of rules used to define how a literal MUST be interpreted to know which value it represents. Ergol notations for primitive types are:

  • nBoolean
  • nBinary
  • nOctal
  • nDecimal
  • nDecimalDP
  • nScientific
  • nHexadecimal
  • nCharacter
  • nExtreme

2.1.4. Literal

A literal is a string of characters written in the source code that represents a value. A literal is written using a specific notation and is evaluated at compile time. Several literals can represent the same value.

Example

These three literals represent the same value:

  • 0.0
  • 0.
  • .0

2.2. Types

2.2.1. Boolean type

In Ergol, the only boolean type is bool. It is used to store the truthiness of something (a boolean). It has only 2 possible values: the true and false of classical logic.

2.2.2. Integer types

There are 9 different integer types. Four of them are signed (can contain a negative integer):

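| type | size (in bits) | min value (included) | max value (included) |
|------|----------------|----------------------|----------------------|
| i8   | 8              | -2^7                 | 2^7 - 1              |
| i16  | 16             | -2^{15}              | 2^{15} - 1           |
| i32  | 32             | -2^{31}              | 2^{31} - 1           |
| i64  | 64             | -2^{63}              | 2^{63} - 1           |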

The five others are unsigned (cannot contain a negative integer):

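| type | size (in bits) | min value (included) | max value (included) |
|------|----------------|----------------------|----------------------|
| u8   | 8              | 0                    | 2^8 - 1              |
| u16  | 16             | 0                    | 2^{16} - 1           |
| u32  | 32             | 0                    | 2^{32} - 1           |
| char | 32             | 0                    | 2^{32} - 1           |
| u64  | 64             | 0                    | 2^{64} - 1           |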
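The bounds in the two tables above follow directly from the size in bits. As a quick illustration, here is a small Python sketch that computes them; the helper name integer_range is hypothetical and not part of Ergol.

```python
def integer_range(bits, signed):
    """Return (min, max), both included, for an integer type of the given size."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

print(integer_range(8, signed=True))    # prints (-128, 127), the i8 bounds
print(integer_range(16, signed=False))  # prints (0, 65535), the u16 bounds
```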

In Ergol, characters are stored as integers using their Unicode code point. An integer type named char exists for this purpose; it is equivalent to u32 in size and range. It is good practice to use the char type instead of the u32 type when storing a character, because it is more explicit.

Warning

Even though char and u32 have the same size, min value, and max value, they are not the same type.

Therefore, it's legal to overload a function by changing a parameter type from char to u32. If char and u32 were the same type, it would not be possible because both functions would have the same signature.

void printValue(char x) {
    printl("The value of the character is:", x)
}

void printValue(u32 x) {
    printl("The value of the integer is:", x)
}

printValue('a')
printValue(42)

Output:

The value of the character is: a⮐
The value of the integer is: 42⮐

Since characters are stored as integers, nothing stops you from using smaller integer types, such as u8 or u16, to store characters:

  • u8 is fine for storing the first characters of the Unicode set (from U+0000 to U+00FF).
  • u16 is fine for storing characters from U+0000 to U+FFFF.

However, if space is not a major concern, use the char type to store characters because it can store every Unicode character (from U+0000 to U+10FFFF).

2.2.3. Float types

There are 2 float types. They are based on the IEEE 754 standard.

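| type | size (in bits) | format                                  | smallest representable number | greatest representable number |
|------|----------------|-----------------------------------------|-------------------------------|-------------------------------|
| f32  | 32             | binary32 IEEE 754 floating point number | (2^{-23} - 2) × 2^{127}       | (2 - 2^{-23}) × 2^{127}       |
| f64  | 64             | binary64 IEEE 754 floating point number | (2^{-52} - 2) × 2^{1023}      | (2 - 2^{-52}) × 2^{1023}      |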
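The greatest representable numbers in this table can be cross-checked numerically. The Python sketch below is an illustration only; it assumes that Python's float is a binary64 value and uses the standard struct module to decode the bit pattern of the largest finite binary32 value.

```python
import struct
import sys

# Formulas from the table; both products are exact in binary64 arithmetic.
f32_max = (2 - 2 ** -23) * 2 ** 127
f64_max = (2 - 2 ** -52) * 2 ** 1023

# 0x7F7FFFFF is the bit pattern of the largest finite binary32 value.
largest_binary32 = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]

print(f32_max == largest_binary32)    # prints True
print(f64_max == sys.float_info.max)  # prints True
```

The smallest representable numbers are the negations of these two values.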

2.3. Notations

2.3.1. nBoolean

2.3.1.1. syntax

boolean-literal = %s"true" / %s"false"

The literal of a boolean value can be written using the nBoolean notation: false or true.

2.3.1.2. interpretation

To get the value from a literal:

  • The false literal corresponds to the false value.
  • The true literal corresponds to the true value.

2.3.2. nBinary

2.3.2.1. syntax

binary-digit   = "0" / "1"
binary-literal = %s"0b" 1*(*"_" binary-digit) *"_"

The literal of an integer or a positive full float, except positive infinity, can be written using the nBinary notation (0 and 1 digits). Leading zeros are allowed but OPTIONAL. If the digits consist only of zeros, the right-most digit is not considered as a leading zero.

There MUST be at least one digit.

The digits MUST be preceded by the binary prefix: 0b. Underscore characters MAY appear anywhere after the prefix.

Examples

0b00101010
0b0
0b1
0b_11111111_11111111
0b___0_
0b0

2.3.2.2. interpretation

To get the value from a literal:

  1. Drop:
    • the prefix
    • all underscores
    • the leading zeros
  2. Interpret the result using the base-2 numeral system.
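The syntax and the interpretation steps above can be sketched in Python as follows. This is a minimal illustration, not the reference implementation: the function name parse_binary is hypothetical, and the regular expression is a hand translation of the binary-literal rule.

```python
import re

# Regex translation of: binary-literal = %s"0b" 1*(*"_" binary-digit) *"_"
BINARY_LITERAL = re.compile(r"0b(?:_*[01])+_*")

def parse_binary(literal):
    if not BINARY_LITERAL.fullmatch(literal):
        raise ValueError(f"not an nBinary literal: {literal!r}")
    digits = literal[2:]                  # drop the prefix
    digits = digits.replace("_", "")      # drop all underscores
    digits = digits.lstrip("0") or "0"    # drop leading zeros, keep the last zero
    return int(digits, 2)                 # interpret in base 2

print(parse_binary("0b00101010"))  # prints 42
print(parse_binary("0b___0_"))     # prints 0
```

The same structure applies to the other prefixed integer notations by changing the prefix, the digit set, and the base.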

2.3.3. nOctal

2.3.3.1. syntax

octal-digit   = %u0030-0037
octal-literal = %s"0o" 1*(*"_" octal-digit) *"_"

The literal of a positive integer or a positive full float, except positive infinity, can be written using the nOctal notation (digits from 0 to 7). Leading zeros are allowed but OPTIONAL. If the digits consist only of zeros, the right-most digit is not considered as a leading zero.

There MUST be at least one digit.

The digits MUST be preceded by the octal prefix: 0o. Underscore characters MAY appear anywhere after the prefix.

Examples

0o777
0o750
0o0
0o01_234_567
0o___0_
0o0

2.3.3.2. interpretation

To get the value from a literal:

  1. Drop:
    • the prefix
    • all underscores
    • the leading zeros
  2. Interpret the result using the base-8 numeral system.

2.3.4. nDecimal

2.3.4.1. syntax

decimal-literal = 1*(*"_" decimal-digit) *"_"

The literal of a positive integer or a positive full float, except positive infinity, can be written using the nDecimal notation (digits from 0 to 9). Leading zeros are allowed but OPTIONAL. If the digits consist only of zeros, the right-most digit is not considered as a leading zero.

There MUST be at least one digit.

Underscore characters MAY appear anywhere in the literal.

Examples

42
0
123_456_789
0000
01
__0_0_

2.3.4.2. interpretation

To get the value from a literal:

  1. Drop:
    • all underscores
    • the leading zeros
  2. Interpret the result using the base-10 numeral system.

2.3.5. nDecimalDP

2.3.5.1. syntax

decimal-dp-literal = ([decimal-literal] "." decimal-literal) / (decimal-literal "." [decimal-literal])

The literal of a positive float, except positive infinity, can be written using the nDecimalDP notation (digits from 0 to 9 and a decimal point).

It consists of a decimal point (.) preceded by an integral part and followed by a fractional part.

The integral part is written using one or more digits from 0 to 9. Leading zeros are allowed but OPTIONAL. If the digits of the integral part consist only of zeros, the right-most one is not considered as a leading zero. The integral part can be omitted if the fractional part is not.

The fractional part is written using one or more digits from 0 to 9. Trailing zeros are allowed but OPTIONAL. If the digits of the fractional part consist only of zeros, the left-most one is not considered as a trailing zero. The fractional part can be omitted if the integral part is not.

Underscore characters MAY appear anywhere in the literal.

Examples

3.14
.42
50.
1_000_000.500
03.
_.132_000_000

2.3.5.2. interpretation

To get the value from a literal:

  1. Drop:
    • all underscores
    • the leading zeros in the integral part
    • the trailing zeros in the fractional part
  2. If the string starts with ., add a zero at the start.
  3. If the string ends with ., add a zero at the end.
  4. Interpret the result using the base-10 numeral system.
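The steps above can be sketched in Python as follows. This is an illustration under stated assumptions: the function name parse_decimal_dp is hypothetical, the input is assumed to be an already validated nDecimalDP literal, and the result is returned as an exact Fraction so that no float rounding hides the interpreted value.

```python
from fractions import Fraction

def parse_decimal_dp(literal):
    text = literal.replace("_", "")              # step 1: drop all underscores
    integral, fractional = text.split(".")
    integral = integral.lstrip("0")              # step 1: drop leading zeros
    fractional = fractional.rstrip("0")          # step 1: drop trailing zeros
    integral = integral or "0"                   # step 2: add a zero at the start
    fractional = fractional or "0"               # step 3: add a zero at the end
    return Fraction(f"{integral}.{fractional}")  # step 4: interpret in base 10

print(parse_decimal_dp("50."))            # prints 50
print(parse_decimal_dp("_.132_000_000"))  # prints 33/250
```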

2.3.6. nScientific

2.3.6.1. syntax

scientific-literal = decimal-dp-literal %s"e" ["-"] decimal-literal

The literal of a positive integer or a positive float, except positive infinity, can be written using the nScientific notation.

It consists of an integral part, followed by a decimal point, followed by a fractional part, followed by a lowercase letter e, followed by an exponent part.

The integral part is written using one or more digits from 0 to 9. Leading zeros are allowed but OPTIONAL. If the digits of the integral part consist only of zeros, the right-most one is not considered as a leading zero. The integral part can be omitted if the fractional part is not.

The fractional part is written using one or more digits from 0 to 9. Trailing zeros are allowed but OPTIONAL. If the digits of the fractional part consist only of zeros, the left-most one is not considered as a trailing zero. The fractional part can be omitted if the integral part is not.

The exponent part is written using an optional - character followed by digits from 0 to 9. Leading zeros are allowed but OPTIONAL. If the digits of the exponent part consist only of zeros, the right-most one is not considered as a leading zero.

Underscore characters MAY appear anywhere in the literal.

Examples

1.14e0
4.2e2
.05e-10
1_500e3
03.e__-_1
_.132_000_000e0_

2.3.6.2. interpretation

To get the value from a literal:

  1. Drop:
    • all underscores
    • the leading zeros in the integral part
    • the trailing zeros in the fractional part
    • the leading zeros in the exponent part
  2. Let M be the characters before the lowercase letter e.
  3. Let E be the characters after the lowercase letter e.
  4. If M starts with ., add a zero at the start of M.
  5. If M ends with ., add a zero at the end of M.
  6. Let MV be the interpretation of M using the base-10 numeral system.
  7. Let EV be the interpretation of E using the base-10 numeral system.
  8. The value is MV times ten to the power of EV (MV × 10^{EV}).
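As an illustration, the steps above can be sketched in Python. The function name parse_scientific is hypothetical and the input is assumed to be an already validated literal. Leading and trailing zeros are not stripped explicitly because they do not change the value computed by Fraction and int.

```python
from fractions import Fraction

def parse_scientific(literal):
    text = literal.replace("_", "")       # step 1: drop all underscores
    mantissa, exponent = text.split("e")  # steps 2 and 3: M and E
    if mantissa.startswith("."):          # step 4
        mantissa = "0" + mantissa
    if mantissa.endswith("."):            # step 5
        mantissa = mantissa + "0"
    mv = Fraction(mantissa)               # step 6: MV, exact base-10 value
    ev = int(exponent)                    # step 7: EV, may be negative
    return mv * Fraction(10) ** ev        # step 8: MV times ten to the power EV

print(parse_scientific("4.2e2"))      # prints 420
print(parse_scientific("03.e__-_1"))  # prints 3/10
```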

2.3.7. nHexadecimal

2.3.7.1. syntax

hexadecimal-digit   = %u0030-0039 / %u0041-0046 / %u0061-0066
hexadecimal-literal = %s"0x" 1*(*"_" hexadecimal-digit) *"_"

The literal of an integer or a positive full float, except positive infinity, can be written using the nHexadecimal notation (digits from 0 to 9, then from A to F, case insensitive). Leading zeros are allowed but OPTIONAL. If the digits consist only of zeros, the right-most digit is not considered as a leading zero.

There MUST be at least one digit.

The digits MUST be preceded by the hexadecimal prefix: 0x. Underscore characters MAY appear anywhere after the prefix.

Examples

0x2A
0x2a
0x6aFD4
0x__6a_FD4_
0xB6_AF_D4
0x0

2.3.7.2. interpretation

To get the value from a literal:

  1. Drop:
    • the prefix
    • all underscores
    • the leading zeros
  2. Transform the lowercase letters to their uppercase equivalent.
  3. Interpret the result using the base-16 numeral system (0123456789ABCDEF alphabet).

2.3.8. nCharacter

2.3.8.1. syntax

; all Unicode characters except the single quote, the backslash, and control characters, but CHARACTER TABULATION (U+0009) is allowed
single-quotable-character = %u0009 / %u0020-0026 / %u0028-005B / %u005D-007E / %u00A0-10FFFF
; 6-digit hexadecimal number in the [000000 .. 10FFFF] range
unicode-code-point = (("0" hexadecimal-digit) / "10") 4hexadecimal-digit
; all escape sequences
escape-sequence = "\" ( "0" / %s"a" / %s"b" / %s"t" / %s"n" / %s"v" / %s"f" / %s"r" / %s"e" / DQUOTE / "'" / "\" / (%s"u" unicode-code-point))

character-literal = "'" (single-quotable-character / escape-sequence) "'"

The literal of an integer, or full float, in the [0 .. 1114111] range can be written using the nCharacter notation.

Info

The “character” value does not exist in Ergol. The nCharacter notation describes an integer value corresponding to the Unicode code point of the character to store.

See 2.2.2. Integer types for more details.

2.3.8.2. interpretation

The nCharacter notation can take multiple forms, all described below. Each form is followed by the integer value it represents and the corresponding Unicode character.

| literal | represented integer value | corresponding Unicode character |
|---------|---------------------------|---------------------------------|
| '\0' | 0 | NULL |
| '\a' | 7 | BELL |
| '\b' | 8 | BACKSPACE |
| '\t' | 9 | CHARACTER TABULATION |
| '\n' | 10 | LINE FEED |
| '\v' | 11 | LINE TABULATION |
| '\f' | 12 | FORM FEED |
| '\r' | 13 | CARRIAGE RETURN |
| '\e' | 27 | ESCAPE |
| '\"' | 34 | QUOTATION MARK |
| '\'' | 39 | APOSTROPHE |
| '\\' | 92 | REVERSE SOLIDUS |
| 'X' with X any Unicode character except ' and \ | Unicode code point of X | X |
| '\uHHHHHH' with HHHHHH a six-digit hexadecimal number (case insensitive) in the [000000 .. 10FFFF] range | value of HHHHHH | the Unicode character corresponding to the code point of HHHHHH |

Examples

'a'
'0'
'Z'
'_'
'é'
'丕'
'\n'
'\\'
'\u0000e9'
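The interpretation table can be sketched in Python as follows. This is a minimal illustration with hypothetical names (parse_character, is_single_quotable). It follows the grammar comment in rejecting control characters other than CHARACTER TABULATION when they are written directly between the quotes.

```python
import re

# Escape sequences and their code points, taken from the table above.
ESCAPES = {"0": 0, "a": 7, "b": 8, "t": 9, "n": 10, "v": 11, "f": 12,
           "r": 13, "e": 27, '"': 34, "'": 39, "\\": 92}
UNICODE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{6})")

def is_single_quotable(ch):
    if ch in ("'", "\\"):
        return False
    cp = ord(ch)
    # Control characters are rejected, except CHARACTER TABULATION (U+0009).
    return cp == 0x09 or 0x20 <= cp <= 0x7E or cp >= 0xA0

def parse_character(literal):
    if len(literal) < 3 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError(f"not an nCharacter literal: {literal!r}")
    body = literal[1:-1]
    if len(body) == 1 and is_single_quotable(body):
        return ord(body)                      # 'X' form
    if len(body) == 2 and body[0] == "\\" and body[1] in ESCAPES:
        return ESCAPES[body[1]]               # short escape sequence
    match = UNICODE_ESCAPE.fullmatch(body)
    if match and int(match.group(1), 16) <= 0x10FFFF:
        return int(match.group(1), 16)        # '\uHHHHHH' form
    raise ValueError(f"not an nCharacter literal: {literal!r}")

print(parse_character("'a'"))           # prints 97
print(parse_character("'\\u0000e9'"))   # prints 233
```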

2.3.9. nExtreme

2.3.9.1. syntax

extreme-literal = %s"NaN" / %s"NegInfinity" / %s"PosInfinity"

The literal of a “not a number” float, a “negative infinity” float, or a “positive infinity” float can be written using the nExtreme notation, that is, one of the following literals:

  • NaN
  • NegInfinity
  • PosInfinity

2.3.9.2. interpretation

| literal     | corresponding value |
|-------------|---------------------|
| NaN         | not a number        |
| NegInfinity | negative infinity   |
| PosInfinity | positive infinity   |

2.3.10. Other notations

nBoolean, nBinary, nOctal, nDecimal, nDecimalDP, nScientific, nHexadecimal, nCharacter and nExtreme are the only notations that can be used in a primitive type literal. To use another notation, such as base 64, see the base converter library.

2.4. Type inference

A literal represents a value, and this value MUST have a type associated with it. On its own, a literal gives little to no information about the type that MUST be used.

The type to use is chosen by the type inference algorithm and depends on:

  • the value to store
  • the notation used to represent the value
  • the context

2.4.1. Defining candidates

Not every type is suitable for every value. A compatibility table MUST be used to know whether the value is compatible with a given type.

2.4.1.1. Compatibility table

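| type | nBoolean | nBinary, nOctal, nDecimal, nHexadecimal | nScientific | nDecimalDP | nCharacter | nExtreme |
|------|----------|------------------------------------------|-------------|------------|------------|----------|
| bool | yes | no | no | no | no | no |
| i8   | no | v ∈ I8 | v ∈ I8 | no | v ∈ I8 | no |
| i16  | no | v ∈ I16 | v ∈ I16 | no | v ∈ I16 | no |
| i32  | no | v ∈ I32 | v ∈ I32 | no | yes | no |
| i64  | no | v ∈ I64 | v ∈ I64 | no | yes | no |
| u8   | no | v ∈ U8 | v ∈ U8 | no | v ∈ U8 | no |
| u16  | no | v ∈ U16 | v ∈ U16 | no | v ∈ U16 | no |
| u32  | no | v ∈ U32 | v ∈ U32 | no | yes | no |
| char | no | v ∈ U32 | v ∈ U32 | no | yes | no |
| u64  | no | v ∈ U64 | v ∈ U64 | no | yes | no |
| f32  | no | v ∈ FF32 | v ∈ F32 | v ∈ F32 | yes | yes |
| f64  | no | v ∈ FF64 | v ∈ F64 | v ∈ F64 | yes | yes |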

2.4.1.2. Understanding the table

The table above indicates whether a given type (row) can be used when a given notation (column) is used:

  • yes: The type can be used.
  • no: The type cannot be used.
  • predicate: The type can be used IF AND ONLY IF the predicate is fulfilled.

Below are the parameters and constants used in the predicates:

  • The value represented by the literal: v
  • The integer interval of all numbers representable in an x-bit signed integer:
    Ix = [-2^{x-1} .. 2^{x-1} - 1]
  • The integer interval of all numbers representable in an x-bit unsigned integer:
    Ux = [0 .. 2^x - 1]
  • The smallest interval containing all numbers representable by an f32:
    F32 = [(2^{-23} - 2) × 2^{127}, (2 - 2^{-23}) × 2^{127}]
  • The smallest interval containing all numbers representable by an f64:
    F64 = [(2^{-52} - 2) × 2^{1023}, (2 - 2^{-52}) × 2^{1023}]
  • The largest integer interval in which every full float is representable by an f32:
    FF32 = [-2^{24} .. 2^{24}]
  • The largest integer interval in which every full float is representable by an f64:
    FF64 = [-2^{53} .. 2^{53}]
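To make the table concrete, here is a Python sketch that computes the set of candidate types for a value and a notation. It is an illustration only: the function name candidates and the data layout are hypothetical, values are passed as integers or exact Fractions, and a non-integer value is assumed never to belong to an integer interval.

```python
from fractions import Fraction

def signed(bits):
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def unsigned(bits):
    return 0, 2 ** bits - 1

# Integer intervals Ix and Ux for each integer type (char uses U32).
INT_RANGE = {
    "i8": signed(8), "i16": signed(16), "i32": signed(32), "i64": signed(64),
    "u8": unsigned(8), "u16": unsigned(16), "u32": unsigned(32),
    "char": unsigned(32), "u64": unsigned(64),
}
# Upper bounds of F32/F64 and FF32/FF64; each interval is symmetric around zero.
F_MAX = {"f32": (2 - Fraction(1, 2 ** 23)) * 2 ** 127,
         "f64": (2 - Fraction(1, 2 ** 52)) * 2 ** 1023}
FF_MAX = {"f32": 2 ** 24, "f64": 2 ** 53}

INTEGER_NOTATIONS = {"nBinary", "nOctal", "nDecimal", "nHexadecimal"}
WIDE_INTEGER_TYPES = {"i32", "i64", "u32", "char", "u64"}  # "yes" for nCharacter

def candidates(notation, v=None):
    """Return the set of types compatible with value v written in `notation`."""
    if notation == "nBoolean":
        return {"bool"}
    if notation == "nExtreme":
        return {"f32", "f64"}
    v = Fraction(v)
    is_integer = v.denominator == 1
    result = set()
    for name, (low, high) in INT_RANGE.items():
        in_range = is_integer and low <= v <= high
        if notation in INTEGER_NOTATIONS or notation == "nScientific":
            compatible = in_range
        elif notation == "nCharacter":
            compatible = name in WIDE_INTEGER_TYPES or in_range
        else:  # nDecimalDP is never compatible with an integer type
            compatible = False
        if compatible:
            result.add(name)
    for name in ("f32", "f64"):
        if notation in INTEGER_NOTATIONS:
            compatible = is_integer and abs(v) <= FF_MAX[name]
        elif notation == "nCharacter":
            compatible = True
        else:  # nScientific and nDecimalDP
            compatible = abs(v) <= F_MAX[name]
        if compatible:
            result.add(name)
    return result

print(sorted(candidates("nDecimal", 2 ** 24 + 1)))
# prints ['char', 'f64', 'i32', 'i64', 'u32', 'u64']
```

In the last call, f32 is excluded because 2^{24} + 1 lies outside FF32, while f64 remains a candidate.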

2.4.2. Type priority

When the context is not sufficient to choose one type candidate over another, a priority table MUST be used. The type with the highest priority (highest number in the table) is chosen.

| type | priority |
|------|----------|
| bool | 12 if the notation used is nBoolean, 1 otherwise |
| char | 12 if the notation used is nCharacter, 0 otherwise |
| i32  | 11 |
| i64  | 10 |
| i16  | 9 |
| i8   | 8 |
| u32  | 7 |
| u64  | 6 |
| u16  | 5 |
| u8   | 4 |
| f64  | 3 |
| f32  | 2 |

The priorities are defined using these simple rules, sorted from the most important to the least important:

  1. The bool type (respectively the char type) is preferred if and only if the nBoolean notation (respectively the nCharacter notation) was used; otherwise it is the last choice, with the bool type preferred over the char type.
  2. Integer types are preferred over float types.
  3. Signed integer types are preferred over unsigned integer types.
  4. 32-bit formats are preferred over other formats, except for float types, where 64-bit formats are preferred.
  5. Bigger formats are preferred over smaller formats.
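Rules 2 to 5 can be checked against the priority table with a short Python sketch: sorting the ten numeric types with a key built from the rules, in order of importance, reproduces the table's order. The names TYPES and rule_key are hypothetical. bool and char are left out because their priority depends on the notation (rule 1).

```python
# (name, is_integer, is_signed, bits) for the ten numeric types.
# Floats are given the same "signed" flag so that rule 3 does not affect them.
TYPES = [
    ("i8", True, True, 8), ("i16", True, True, 16),
    ("i32", True, True, 32), ("i64", True, True, 64),
    ("u8", True, False, 8), ("u16", True, False, 16),
    ("u32", True, False, 32), ("u64", True, False, 64),
    ("f32", False, False, 32), ("f64", False, False, 64),
]

def rule_key(entry):
    name, is_integer, is_signed, bits = entry
    preferred_bits = 32 if is_integer else 64  # rule 4
    # Tuple order mirrors the importance of rules 2, 3, 4 and 5.
    return (is_integer, is_signed, bits == preferred_bits, bits)

ordered = [entry[0] for entry in sorted(TYPES, key=rule_key, reverse=True)]
# Priorities 11 down to 2, as in the table.
priority = {name: 11 - index for index, name in enumerate(ordered)}

print(ordered)
# prints ['i32', 'i64', 'i16', 'i8', 'u32', 'u64', 'u16', 'u8', 'f64', 'f32']
```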

2.4.3. Type inference algorithm

Basically, the type inference algorithm works as follows:

  1. Define all the type candidates.
  2. Keep the types that are best-fitting the context (the types that minimize the number of conversions).
  3. If there are still multiple candidates, the candidate with the highest priority is chosen.

Example

i32 a = 42
i64 b = 42

var c = a + 100 // the i32 type is associated with the 100 literal
var d = b + 100 // the i64 type is associated with the 100 literal

// The f64 type is associated with the 100.0 literal.
// A conversion will occur to transform the value of “a” to an f64 so the addition can be done.
var e = a + 100.0
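The three basic steps can be modelled in Python for the example above. This is a deliberately simplified sketch, not the full algorithm: the function name infer is hypothetical, the candidate sets are written by hand from the compatibility table, and the context is reduced to the type of the other operand, so "minimizing conversions" becomes "pick the context type if it is a candidate".

```python
# Priority order from the "Type priority" table, highest first. bool and char
# are omitted: they only rank first for nBoolean and nCharacter literals.
PRIORITY = ["i32", "i64", "i16", "i8", "u32", "u64", "u16", "u8", "f64", "f32"]

def infer(candidates, context_type=None):
    # Step 2: a candidate equal to the context type needs no conversion.
    if context_type in candidates:
        return context_type
    # Step 3: otherwise the highest-priority candidate is chosen.
    return min(candidates, key=PRIORITY.index)

# Step 1: candidates for the literals 100 (nDecimal) and 100.0 (nDecimalDP).
# char is left out of the first set for brevity; it has the lowest priority here.
candidates_100 = {"i8", "i16", "i32", "i64", "u8", "u16", "u32", "u64", "f32", "f64"}
candidates_100_0 = {"f32", "f64"}

print(infer(candidates_100, "i32"))    # prints i32, as in "a + 100"
print(infer(candidates_100, "i64"))    # prints i64, as in "b + 100"
print(infer(candidates_100_0, "i32"))  # prints f64, as in "a + 100.0"
```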

In a more rigorous way, the inference algorithm is defined as follows:

Consider, in the AST (Abstract Syntax Tree) of the syntactically valid code, the node L that contains the literal whose type we want to determine.

Start from node L and go up from parent to parent, until finding a child node of an “assignment” node or of a “function call” node. We will name this node F and its parent R.

Now, consider the sub-tree A, of root R, excluding the nodes (except R and F) that are not descendants of F.
Consider, for each node of A, a property named “possible types”, which is a set of types, initialized to the empty set Ø.

For each node N of A excluding R whose “possible types” property equals Ø, starting from the nodes farthest from R, do:
    If N is a “variable” node, then:
        Let T be the type of the variable, assign {T} to the “possible types” property of N.

    If N is a “function call” node, then:
        For each child P of N (for each function parameter), do:
            Let T be the highest priority type (see “Type priority” table) among the “possible types” of P, assign {T} to the “possible types” property of P.
        The function signature is now known, because the type of each parameter is now defined without ambiguity.
        Let T be the return type of the function, assign {T} to the “possible types” property of N.

    If N is an “assignment” node, then:
        If the type of the assigned variable is known, then:
            Let T be the type of the assigned variable, assign {T} to the “possible types” property of N.
        Else:
            Let T be the “possible types” property of the child node of N that represents the assigned value, assign T to the “possible types” property of N.

    If N is a “literal” node, then:
        Let S be the set of types which are compatible with the literal (see the “Compatibility” table), assign S to the “possible types” property of N.

    If N is an “operation” node and not an “assignment” node, then:
        For each possible tuple X, with X_n being one of the possible types of the nth operand of N (according to the “possible types” property of the operand), do:
            Let T be the type resulting from the operation with X_n being the type of the nth operand, add T to the “possible types” property of N.


If R is an “assignment” node and the type of the assigned variable is known, then:
    Let T be the type of the assigned variable, assign {T} to the “possible types” of R.
Else:
    Let T be the highest priority type (see “Type priority” table) among the “possible types” property of F, assign {T} to the “possible types” property of R.
For each node N of A having more than one possible type, starting from the nodes nearest to R, do:
    Consider T, the unique type in the “possible types” property of the parent of N.

    If T is in the “possible types” of N, then:
        Assign {T} to the “possible types” property of N.
    Else:
        Let T be the highest priority type (see “Type priority” table) among the “possible types” property of N, assign {T} to the “possible types” property of N.

Let T be the unique type in the “possible types” property of L. T is the type that must be associated with the value represented by the literal.

Last update: March 29, 2021