Zig aims to be a simple language. It is not easy to define exactly what simple means, but Zig is also a low-level programming language that aims for C compatibility. To reach this goal, it needs good semantics in its type system so that developers have a complete toolbox to manipulate data.
So types in Zig are composable, but this can quickly become overwhelming. Look at these examples. Can you understand them at a glance, as soon as you read them?
*const ?u8
?*const u8
*const [2]u8
[]?u8
?[]u8
[*]u8
[*:0]u8
*[]const u8
*[]*const ?u8
They seemed complex to me, even though Loris Cro helped a lot in the following video.
https://www.youtube.com/watch?v=VgjRyaRTH6E
So when I don't understand something, I like to draw representations to illustrate the concepts. For example, this is what I do to understand how commits relate to each other in git.
So here is my take on the zig type system. Feel free to comment if anything is wrong or unclear.
It all starts with boxes.
┌───┐
│   │
└───┘
u8 and u16 are examples of box sizes and are called types.
┌───┐    ┌─────┐     ┌─┐
│   │ u8 │     │ u16 │ │ u2
└───┘    └─────┘     └─┘
There are many other types in Zig, but we will use u8 in this document to illustrate the concepts. In the end, types are just representations of different things of different sizes.
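To check these box sizes in code, here is a minimal sketch, assuming a recent Zig release (0.11 or later). It uses the builtins @bitSizeOf, which counts the bits of the value, and @sizeOf, which counts the bytes the type occupies in memory. A u2 needs only 2 bits but still takes a whole byte when stored on its own.

```zig
const std = @import("std");

// Print the "box size" of a few integer types.
// @bitSizeOf counts the bits of the value; @sizeOf counts the bytes it occupies.
pub fn main() void {
    std.debug.print("u8:  {} bits, {} byte(s)\n", .{ @bitSizeOf(u8), @sizeOf(u8) });
    std.debug.print("u16: {} bits, {} byte(s)\n", .{ @bitSizeOf(u16), @sizeOf(u16) });
    std.debug.print("u2:  {} bits, {} byte(s)\n", .{ @bitSizeOf(u2), @sizeOf(u2) });
}
```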
A variable is a named box having a type and a value.
 ┌───┐
a┤ 1 │   var a: u8 = 1;
 └───┘
Here, the box named a of type u8 holds the value 1.
A constant is a variable that cannot change over time. Imagine a variable box that looks like a jail, because it cannot be opened or changed after initialization.
 ┌┰─┰┐
a┤┃1┃│   const a: u8 = 1;
 └┸─┸┘
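A small sketch of the difference between the two kinds of boxes; the helper name `refill` is hypothetical. The commented-out line is the assignment the compiler rejects.

```zig
const std = @import("std");

// Hypothetical helper contrasting a variable box and a constant box.
fn refill() u8 {
    var a: u8 = 1;
    a = 2; // allowed: a is a variable, its box can be refilled
    const b: u8 = 1;
    // b = 2; // rejected at compile time: b is a constant
    return a + b;
}

pub fn main() void {
    std.debug.print("a + b = {}\n", .{refill()});
}
```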
An optional variable can hold either a value or null. The box accommodates both and is represented here split in two: the top part represents the variable when it is filled, and the bottom part when it is null.
 ┌───┐                         ┌───┐
 │ 1 │                         │   │
a┼─?─┤   var a: ?u8 = 1;      a┼─?─┤   var a: ?u8 = null;
 │   │                         │ ∅ │
 └───┘                         └───┘
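Reading an optional box means checking which part is in use. In this minimal sketch (`valueOrZero` is a hypothetical helper name), `if` with a capture reads the top part, and `orelse` supplies a fallback for the bottom one.

```zig
const std = @import("std");

// Read an optional box: the top part if filled, a fallback if null.
fn valueOrZero(a: ?u8) u8 {
    // `orelse` evaluates to the fallback when a is null.
    return a orelse 0;
}

pub fn main() void {
    var a: ?u8 = 1;
    if (a) |value| {
        // The capture `value` is a plain u8, already unwrapped.
        std.debug.print("filled with {}\n", .{value});
    }
    a = null;
    std.debug.print("null, fallback is {}\n", .{valueOrZero(a)});
}
```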
An array holds multiple values of the same type. The boxes are drawn attached to each other.
 ┌───┐┌───┐┌───┐
a┤ 1 ├┤ 2 ├┤ 3 │   var a: [3]u8 = [_]u8{ 1, 2, 3 };
 └───┘└───┘└───┘
Values in the array can be accessed using indexes starting at 0. For example, a[0] has the value 1 and can be changed with a[0] = 2.
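The same operations as a runnable sketch, with `updatedArray` as a hypothetical name. Since arrays are values in Zig, returning one copies the whole row of boxes.

```zig
const std = @import("std");

// Hypothetical helper: build the array, then overwrite its first element.
fn updatedArray() [3]u8 {
    var a: [3]u8 = [_]u8{ 1, 2, 3 };
    a[0] = 2; // indexes start at 0
    return a; // arrays are values, so the caller receives a copy
}

pub fn main() void {
    const a = updatedArray();
    std.debug.print("len={} a[0]={} a[2]={}\n", .{ a.len, a[0], a[2] });
}
```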
Here is an array of optional u8 values.
 ┌───┐┌───┐┌───┐
 │ 1 ││   ││ 3 │
a┼─?─┼┼─?─┼┼─?─┤   var a: [3]?u8 = [_]?u8{ 1, null, 3 };
 │   ││ ∅ ││   │
 └───┘└───┘└───┘
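Each element of such an array has to be unwrapped on its own. A sketch with a hypothetical `sumPresent` helper that skips the null elements:

```zig
const std = @import("std");

// Hypothetical helper: sum only the filled boxes of an array of optionals.
fn sumPresent(values: [3]?u8) u32 {
    var total: u32 = 0;
    for (values) |maybe| {
        if (maybe) |v| total += v; // null elements are skipped
    }
    return total;
}

pub fn main() void {
    const a: [3]?u8 = [_]?u8{ 1, null, 3 };
    std.debug.print("sum of non-null values: {}\n", .{sumPresent(a)});
}
```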
An array can also be constant, in which case its elements cannot be changed.
 ┌┰─┰┐┌┰─┰┐┌┰─┰┐
a┤┃1┃├┤┃2┃├┤┃3┃│   const a: [3]u8 = [_]u8{ 1, 2, 3 };
 └┸─┸┘└┸─┸┘└┸─┸┘
An array can be zero-terminated. This means there is an additional value 0 at the index equal to its length, and the compiler lets you access that element instead of reporting an out-of-bounds error.
 ┌───┐┌───┐┌───┐┌───┐
a┤ 1 ├┤ 2 ├┤ 3 ├┤ 0 │   var a: [3:0]u8 = [_:0]u8{ 1, 2, 3 };
 └───┘└───┘└───┘└───┘
std.debug.print("{}\n", .{a[3]}); // correct and prints 0
This zero value can be changed to any other sentinel value ([3:2]u8 or even [3:'f']u8, for example).
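A quick way to observe the sentinel, assuming a recent Zig release: `.len` excludes it, but reading index `len` returns it, whatever value was chosen as the sentinel.

```zig
const std = @import("std");

pub fn main() void {
    const a: [3:0]u8 = [_:0]u8{ 1, 2, 3 };
    // .len counts only the regular elements; index len holds the sentinel.
    std.debug.print("len={} a[len]={}\n", .{ a.len, a[a.len] });

    // Any value of the element type can serve as the sentinel.
    const f: [3:'f']u8 = [_:'f']u8{ 1, 2, 3 };
    std.debug.print("sentinel as char: {c}\n", .{f[f.len]});
}
```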
A pointer holds the address of another variable. In the diagrams here, & stands for that address for simplicity, but in practice it is a real memory address. The arrow means that the address stored in the pointer is the address of the variable it points to.
       ┌───┐
      a┤ 1 │   var a: u8 = 1;
       └─▲─┘
 ┌───┐   │
p┤ & ├───┘     var p: *u8 = &a; // &a is the address of a
 └───┘
p.* = 2; // the value of a can be changed through p by dereferencing it with .*
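A runnable version of the diagram, with `writeThroughPointer` as a hypothetical wrapper. One assumption worth flagging: recent Zig releases (0.12 and later) reject a `var` that is never reassigned, so `p` is declared `const` here, which is exactly the constant pointer described next.

```zig
const std = @import("std");

// Hypothetical helper: write to a variable through a pointer to it.
fn writeThroughPointer() u8 {
    var a: u8 = 1;
    const p: *u8 = &a; // p holds the address of a; p itself never changes
    p.* = 2; // .* follows the address, so this assigns to a
    return a;
}

pub fn main() void {
    std.debug.print("a = {}\n", .{writeThroughPointer()});
}
```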
If a pointer is constant, the pointer itself cannot change, but the variable it points to can.
       ┌───┐
      a┤ 1 │   var a: u8 = 1;
       └─▲─┘
 ┌┰─┰┐   │
p┤┃&┃├───┘     const p: *u8 = &a;
 └┸─┸┘
p.* = 2; // changing a through p is correct
p = &c; // changing p directly is incorrect
A pointer can also point to a constant instead of a variable.
       ┌┰─┰┐
      a┤┃1┃│   const a: u8 = 1;
       └┸▲┸┘
 ┌───┐   │
p┤ & ├───┘     var p: *const u8 = &a;
 └───┘
p.* = 2; // incorrect
p = &c; // correct
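Putting both rules together in a sketch (the `repoint` helper is hypothetical): the pointer is a `var`, so it can move from one constant to another, while writing through it stays forbidden.

```zig
const std = @import("std");

// Hypothetical helper: a var pointer to constants can move, but not write.
fn repoint() [2]u8 {
    const a: u8 = 1;
    const c: u8 = 3;
    var p: *const u8 = &a;
    const before = p.*; // reading through p is fine
    // p.* = 2; // rejected at compile time: p points to a constant
    p = &c; // re-pointing is fine because p itself is a var
    return .{ before, p.* };
}

pub fn main() void {
    const r = repoint();
    std.debug.print("before={} after={}\n", .{ r[0], r[1] });
}
```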
A pointer to a variable can be coerced to a pointer to a constant, but not the opposite.
       ┌───┐                                 ┌┰─┰┐
      a┤ 1 │   var a: u8 = 1;               a┤┃1┃│   const a: u8 = 1;
       └─▲─┘                                 └┸▲┸┘
 ┌───┐   │                             ┌───┐   │
p┤ & ├───┘     var p: *u8 = &a;       p┤ & ├───┘     var p: *const u8 = &a;
 └───┘                                 └───┘
var p2: *const u8 = p; // correct     var p2: *u8 = p; // incorrect
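This one-way coercion is what lets a function that only reads ask for `*const u8` and still accept pointers to variables. A sketch, with hypothetical `readOnly` and `coerceDemo` names:

```zig
const std = @import("std");

// Hypothetical reader: it only needs read access, so it asks for *const u8.
fn readOnly(p: *const u8) u8 {
    return p.*;
}

fn coerceDemo() u8 {
    var a: u8 = 1;
    const p: *u8 = &a;
    p.* += 1; // a is now 2
    const p2: *const u8 = p; // *u8 -> *const u8 gives up write access: allowed
    // const back: *u8 = p2; // rejected: it would regain write access
    return readOnly(p2);
}

pub fn main() void {
    std.debug.print("{}\n", .{coerceDemo()});
}
```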
A pointer can point to an optional variable.
       ┌───┐
       │   │
      a┼─?─┤   var a: ?u8 = null;
       │ ∅ │
       └─▲─┘
 ┌───┐   │
p┤ & ├───┘     var p: *?u8 = &a;
 └───┘
Or a pointer can itself be optional.
       ┌───┐
      a┤ 1 │   var a: u8 = 1;
       └─▲─┘
 ┌───┐   │
 │ & ├───┘     var p: ?*u8 = &a;
p┼─?─┤
 │   │         p = null; // correct
 └───┘
A pointer can point to a constant value that is optional.
       ┌┰─┰┐
       │┃2┃│
      a┼╂?╂┤   const a: ?u8 = 2;
       │┃ ┃│
       └┸▲┸┘
 ┌───┐   │
p┤ & ├───┘     var p: *const ?u8 = &a;
 └───┘
An optional pointer can also point to a constant.
       ┌┰─┰┐
      a┤┃1┃│   const a: u8 = 1;
       └┸▲┸┘
 ┌───┐   │
 │ & ├───┘     var p: ?*const u8 = &a;
p┼─?─┤
 │   │
 └───┘
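The position of `?` decides what can be null: in `*?u8` it is the pointed-to value, in `?*u8` the pointer itself. A sketch with hypothetical helper names. Zig documents that an optional pointer is the same size as a plain pointer, its null state being address 0, so the extra `?` costs no memory here.

```zig
const std = @import("std");

// *?u8: always points somewhere, but the pointed-to value may be null.
fn fillThroughPointer() ?u8 {
    var a: ?u8 = null;
    const p: *?u8 = &a;
    p.* = 7; // fills the optional box through the pointer
    return a;
}

// ?*const u8: the pointer itself may be null and must be unwrapped first.
fn readOptionalPointer(p: ?*const u8) u8 {
    if (p) |ptr| return ptr.*;
    return 0; // fallback when there is no pointer at all
}

pub fn main() void {
    const b: u8 = 5;
    std.debug.print("{} {} {}\n", .{ fillThroughPointer().?, readOptionalPointer(&b), readOptionalPointer(null) });
}
```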
A pointer can point to an array. This one points to an array of u8.
       ┌───┐┌───┐┌───┐
      a┤ 1 ├┤ 2 ├┤ 3 │   var a: [3]u8 = [_]u8{ 1, 2, 3 };
       └─▲─┘└───┘└───┘
 ┌───┐   │
p┤ & ├───┘               var p: *[3]u8 = &a;
 └───┘
This one points to a constant array of u8.
       ┌┰─┰┐┌┰─┰┐┌┰─┰┐
      a┤┃1┃├┤┃2┃├┤┃3┃│   const a: [3]u8 = [_]u8{ 1, 2, 3 };
       └┸▲┸┘└┸─┸┘└┸─┸┘
 ┌───┐   │
p┤ & ├───┘               var p: *const [3]u8 = &a;
 └───┘
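Indexing through a pointer to an array needs no explicit `.*`, and the length stays available because it is part of the type. A sketch with a hypothetical `pointerToArray` helper:

```zig
const std = @import("std");

// Hypothetical helper: write through a pointer to an array.
fn pointerToArray() [3]u8 {
    var a: [3]u8 = [_]u8{ 1, 2, 3 };
    const p: *[3]u8 = &a;
    p[0] = 9; // indexing goes through the pointer, same as p.*[0] = 9
    std.debug.assert(p.len == 3); // the length comes from the type [3]u8
    // With `const b: [3]u8` instead, &b would be a *const [3]u8 and
    // writing through it would not compile.
    return a;
}

pub fn main() void {
    const r = pointerToArray();
    std.debug.print("{} {} {}\n", .{ r[0], r[1], r[2] });
}
```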
A pointer can point to an unknown number of u8 values.
       ┌───┐┌───┐┌───┐┌───┐
      a┤ 1 ├┤ 2 ├┤ … ├┤ 5 │   var a: [5]u8 = [_]u8{ 1, 2, 3, 4, 5 };
       └─▲─┘└───┘└───┘└───┘
 ┌───┐   │
p┤ & ├───┘                    var p: [*]u8 = &a;
 └───┘
The advantage over a regular pointer to u8 (*u8) is that it says there can be many u8 values at this address. The type just does not record how many, so the program has to keep track of the count itself.
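A sketch of that trade-off (the `manyItemPointer` helper is hypothetical): indexing works, but there is no bounds check and no `.len`, so the count must come from elsewhere, for example to build a slice.

```zig
const std = @import("std");

// Hypothetical helper: index through a many-item pointer.
fn manyItemPointer() u8 {
    var a: [5]u8 = [_]u8{ 1, 2, 3, 4, 5 };
    const p: [*]u8 = &a; // *[5]u8 coerces to [*]u8, dropping the length
    p[2] += 10; // indexing works, without any bounds check
    // p has no .len: the count must come from elsewhere, here a.len.
    const s = p[0..a.len]; // slicing with a known count gives back a length
    return s[2];
}

pub fn main() void {
    std.debug.print("a[2] = {}\n", .{manyItemPointer()});
}
```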
A pointer can also point to an unknown number of u8 values terminated by a zero.
       ┌───┐┌───┐┌───┐┌───┐┌───┐
      a┤ 1 ├┤ 2 ├┤ … ├┤ 5 ├┤ 0 │   var a: [5:0]u8 = [_:0]u8{ 1, 2, 3, 4, 5 };
       └─▲─┘└───┘└───┘└───┘└───┘
 ┌───┐   │
p┤ & ├───┘                         var p: [*:0]u8 = &a;
 └───┘
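This is the shape of C strings, which is where it matters for C compatibility: the length can be recovered by scanning for the sentinel. A sketch using `std.mem.span` from the standard library, wrapped in a hypothetical `lengthFromSentinel` helper:

```zig
const std = @import("std");

// Hypothetical helper: recover the length of a zero-terminated many-item pointer.
fn lengthFromSentinel() usize {
    var a: [5:0]u8 = [_:0]u8{ 1, 2, 3, 4, 5 };
    const p: [*:0]u8 = &a;
    // std.mem.span scans from p up to the 0 sentinel and returns a [:0]u8 slice.
    const s = std.mem.span(p);
    return s.len;
}

pub fn main() void {
    std.debug.print("length found by scanning: {}\n", .{lengthFromSentinel()});
}
```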
Conversely, here is an array of pointers to u8 values.
 ┌───┐  ┌───┐  ┌───┐   var a: u8 = 1;
a┤ 1 │ b┤ 2 │ c┤ 3 │   var b: u8 = 2;
 └─▲─┘  └─▲─┘  └─▲─┘   var c: u8 = 3;
   │    ┌─┘      │
   │    │    ┌───┘
 ┌─┴─┐┌─┴─┐┌─┴─┐
p┤ & ├┤ & ├┤ & │       var p: [3]*u8 = [_]*u8{ &a, &b, &c };
 └───┘└───┘└───┘
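Iterating over such an array and following each address with `.*` updates the original variables. A sketch with a hypothetical `bumpAll` helper:

```zig
const std = @import("std");

// Hypothetical helper: update three variables through an array of pointers.
fn bumpAll() [3]u8 {
    var a: u8 = 1;
    var b: u8 = 2;
    var c: u8 = 3;
    const p: [3]*u8 = [_]*u8{ &a, &b, &c };
    for (p) |ptr| ptr.* += 10; // follow each stored address
    return .{ a, b, c };
}

pub fn main() void {
    const r = bumpAll();
    std.debug.print("{} {} {}\n", .{ r[0], r[1], r[2] });
}
```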
And to finish with pointers, they can also point to other pointers.
              ┌───┐
             a┤ 1 │   var a: u8 = 1;
              └─▲─┘
        ┌───┐   │
      p1┤ & ├───┘     var p1: *u8 = &a;
        └─▲─┘
  ┌───┐   │
p2┤ & ├───┘           var p2: **u8 = &p1;
  └───┘
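With `**u8`, one `.*` reaches the inner pointer and two reach the value, so the outer pointer can even re-point the inner one. A sketch with a hypothetical `throughTwoLevels` helper:

```zig
const std = @import("std");

// Hypothetical helper: follow and modify two levels of pointers.
fn throughTwoLevels() [2]u8 {
    var a: u8 = 1;
    var b: u8 = 2;
    var p1: *u8 = &a;
    const p2: **u8 = &p1;
    p2.*.* = 5; // p2.* is p1, so p2.*.* is a: a becomes 5
    p2.* = &b; // re-point p1 itself through p2
    p1.* = 7; // p1 now targets b
    return .{ a, b };
}

pub fn main() void {
    const r = throughTwoLevels();
    std.debug.print("a={} b={}\n", .{ r[0], r[1] });
}
```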
A slice is a pointer to an array together with a length known at runtime. In the slice box, & represents the address of the first element of the array, and the number after the colon is the length of the slice.
A slice can be initialized from a pointer to the backing array (the compiler knows how to coerce one into the other), in which case the slice's length is set to the length of the array. This means we can always coerce a pointer to an array into a slice, but not the opposite: a slice's length is only known at runtime, so the compiler cannot know the array length from the slice at compile time.
         ┌───┐┌───┐┌───┐
        a┤ a ├┤ b ├┤ c │   var a: [3]u8 = [_]u8{ 'a', 'b', 'c' };
         └─[─┘└───┘└─]─┘
 ┌───┐     │
p┤ & ├─────┤               var p: *[3]u8 = &a;
 └───┘     │
 ┌─────┐   │
s┤ &:3 ├───┘               var s: []u8 = &a; // directly pointing to a
 └─────┘                   var s: []u8 = p; // or assigned from p
                           var s: []u8 = a[0..3]; // or from a range of the array
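A sketch of a slice sharing memory with its backing array, plus one case where only a slice fits because the length is a runtime value. `prefix`, `sliceDemo` and the `Result` struct are hypothetical names.

```zig
const std = @import("std");

const Result = struct { len: usize, short_len: usize, first: u8 };

// Hypothetical helper: n is a runtime value, so the result can only be a slice.
fn prefix(buf: []u8, n: usize) []u8 {
    return buf[0..n];
}

fn sliceDemo() Result {
    var a: [3]u8 = [_]u8{ 'a', 'b', 'c' };
    const s: []u8 = &a; // *[3]u8 coerces to []u8; s stores &a[0] and len 3
    s[0] = 'z'; // writes go to the backing array
    std.debug.assert(&s[0] == &a[0]); // same memory, no copy
    const short = prefix(s, 2);
    return .{ .len = s.len, .short_len = short.len, .first = a[0] };
}

pub fn main() void {
    const r = sliceDemo();
    std.debug.print("s.len={} short.len={} a[0]={c}\n", .{ r.len, r.short_len, r.first });
}
```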
A zero-terminated slice guarantees that a zero value exists at the element indexed by the length.
         ┌───┐┌───┐┌───┐┌───┐
        a┤ a ├┤ b ├┤ c ├┤ 0 │   var a: [3:0]u8 = [_:0]u8{ 'a', 'b', 'c' };
         └─[─┘└───┘└─]─┘└───┘
 ┌─────┐   │
s┤ &:3 ├───┘                    var s: [:0]u8 = &a;
 └─────┘
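As with sentinel arrays, reading index `len` of a `[:0]u8` returns the sentinel. A minimal sketch with a hypothetical `sentinelSlice` helper:

```zig
const std = @import("std");

// Hypothetical helper: read the sentinel through a zero-terminated slice.
fn sentinelSlice() u8 {
    var a: [3:0]u8 = [_:0]u8{ 'a', 'b', 'c' };
    const s: [:0]u8 = &a;
    std.debug.assert(s.len == 3); // the sentinel is not counted in len
    return s[s.len]; // index len is allowed and holds the sentinel
}

pub fn main() void {
    std.debug.print("s[s.len] = {}\n", .{sentinelSlice()});
}
```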
A slice can be optional.
         ┌───┐┌───┐┌───┐┌───┐
        a┤ a ├┤ b ├┤ c ├┤ d │   var a: [4]u8 = [_]u8{ 'a', 'b', 'c', 'd' };
         └───┘└─[─┘└───┘└─]─┘
 ┌─────┐        │
 │ &:3 ├────────┘               var s: ?[]u8 = a[1..4];
s┼──?──┤
 │     │                        s = null; // correct
 └─────┘
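A sketch of both states of the optional slice, covering b, c and d as in the diagram (the `optionalSlice` helper is hypothetical):

```zig
const std = @import("std");

// Hypothetical helper: use an optional slice in both of its states.
fn optionalSlice() [2]usize {
    var a: [4]u8 = [_]u8{ 'a', 'b', 'c', 'd' };
    const all: []u8 = &a;
    const tail: []u8 = all[1..4]; // covers 'b', 'c', 'd'
    var s: ?[]u8 = tail;
    const filled: usize = if (s) |slice| slice.len else 0;
    s = null; // allowed: the slice itself is optional
    const empty: usize = if (s) |slice| slice.len else 0;
    return .{ filled, empty };
}

pub fn main() void {
    const r = optionalSlice();
    std.debug.print("filled={} after null={}\n", .{ r[0], r[1] });
}
```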
Now, here is a slice of constant u8 values.
         ┌┰─┰┐┌┰─┰┐┌┰─┰┐
        a┤┃1┃├┤┃2┃├┤┃3┃│   const a: [3]u8 = [_]u8{ 1, 2, 3 };
         └┸[┸┘└┸─┸┘└┸]┸┘
 ┌─────┐   │
s┤ &:3 ├───┘               var s: []const u8 = &a;
 └─────┘
A string literal is a pointer to a zero-terminated constant array of bytes that is known at comptime and stored in the binary.
         ┌┰─┰┐┌┰─┰┐┌┰─┰┐┌┰─┰┐┌┰─┰┐┌┰─┰┐
         │┃h┃├┤┃e┃├┤┃l┃├┤┃l┃├┤┃o┃├┤┃0┃│
         └┸[┸┘└┸─┸┘└┸─┸┘└┸─┸┘└┸]┸┘└┸─┸┘
 ┌───┐     │
p┤ & ├─────┤   var p: *const [5:0]u8 = "hello";
 └───┘     │
 ┌─────┐   │
s┤ &:5 ├───┘   var s: []const u8 = p; // correct
 └─────┘
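A sketch checking those properties (the `literalDemo` helper is hypothetical): the literal's type carries both the length and the sentinel, and coercing it to `[]const u8` keeps the length.

```zig
const std = @import("std");

// Hypothetical helper: inspect a string literal and its slice view.
fn literalDemo() bool {
    const p: *const [5:0]u8 = "hello";
    const s: []const u8 = p; // keeps the length 5, drops the sentinel from the type
    // s[0] = 'j'; // rejected at compile time: the bytes are constant
    return p.len == 5 and p[5] == 0 and std.mem.eql(u8, s, "hello");
}

pub fn main() void {
    std.debug.print("checks pass: {}\n", .{literalDemo()});
}
```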
And to add a level of indirection, here is a slice of pointers pointing to constant u8 values.
         ┌┰─┰┐  ┌┰─┰┐  ┌┰─┰┐   const a: u8 = 1;
        a┤┃1┃│ b┤┃2┃│ c┤┃3┃│   const b: u8 = 2;
         └┸▲┸┘  └┸▲┸┘  └┸▲┸┘   const c: u8 = 3;
           │    ┌─┘      │
           │    │    ┌───┘
         ┌─┴─┐┌─┴─┐┌─┴─┐
        p┤ & ├┤ & ├┤ & │       var p: [3]*const u8 = [_]*const u8{ &a, &b, &c };
         └─[─┘└───┘└─]─┘
 ┌─────┐   │
s┤ &:3 ├───┘                   var s: []*const u8 = p[0..p.len]; // s.len is 3
 └─────┘
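Summing through the slice shows the extra level of indirection: each element must be dereferenced. A sketch with a hypothetical `sumThroughPointers` helper; it takes the slice with `&p` rather than `p[0..p.len]`, and both give the same three-element slice.

```zig
const std = @import("std");

// Hypothetical helper: sum constants reached through a slice of pointers.
fn sumThroughPointers() u32 {
    const a: u8 = 1;
    const b: u8 = 2;
    const c: u8 = 3;
    var p: [3]*const u8 = [_]*const u8{ &a, &b, &c };
    const s: []*const u8 = &p; // same three elements as p[0..p.len]
    var total: u32 = 0;
    for (s) |ptr| total += ptr.*; // one extra .* per element
    return total;
}

pub fn main() void {
    std.debug.print("sum = {}\n", .{sumThroughPointers()});
}
```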
In the end, the Zig type system just composes the previous concepts, and drawing the representation can help in understanding what is going on behind the scenes. I cannot draw all the possible combinations, as there are a lot of them, but here is one last, more complex example for fun.
               ┌┰─┰┐  ┌┰─┰┐  ┌┰─┰┐
               │┃1┃│  │┃ ┃│  │┃3┃│   const a: ?u8 = 1;
              a┼╂?╂┤ b┼╂?╂┤ c┼╂?╂┤   const b: ?u8 = null;
               │┃ ┃│  │┃∅┃│  │┃ ┃│   const c: ?u8 = 3;
               └┸▲┸┘  └┸▲┸┘  └┸▲┸┘
                 │    ┌─┘  ┌───┘
               ┌─┴─┐┌─┴─┐┌─┴─┐
              d┤ & ├┤ & ├┤ & │       var d: [3]*const ?u8 = [_]*const ?u8{ &a, &b, &c };
               └─[─┘└───┘└─]─┘
       ┌─────┐   │
      s┤ &:3 ├───┘                   var s: []*const ?u8 = d[0..3];
       └──▲──┘
 ┌───┐    │
p┤ & ├────┘                          var p: *[]*const ?u8 = &s;
 └───┘
In case you did not guess, this is a pointer to a slice of pointers to constant optional u8 values.
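Reading the values back means peeling the layers in order: `p.*` is the slice, `p.*[i]` a pointer, `p.*[i].*` an optional u8. A sketch, with `walk` as a hypothetical name and null values counted as 0:

```zig
const std = @import("std");

// Hypothetical helper: walk every layer of *[]*const ?u8, counting null as 0.
fn walk() u8 {
    const a: ?u8 = 1;
    const b: ?u8 = null;
    const c: ?u8 = 3;
    var d: [3]*const ?u8 = [_]*const ?u8{ &a, &b, &c };
    var s: []*const ?u8 = &d;
    const p: *[]*const ?u8 = &s;
    var total: u8 = 0;
    // p.* is the slice, p.*[i] a pointer, p.*[i].* an optional u8.
    for (p.*) |item| total += item.* orelse 0;
    return total;
}

pub fn main() void {
    std.debug.print("total = {}\n", .{walk()});
}
```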
Feel free to find other combinations and try to draw them to improve your knowledge about the zig type system!