Mix.install([
{:kino, "~> 0.6.2"}
])
If you want to know more about the book:
- You can download two free chapters.
- Or, you can buy the complete book together with its accompanying Livebook notebooks.
In this Livebook document, we will be looking at the lifecycle of supervision trees to
better understand how processes are started, restarted, and stopped. Before jumping into
the implementation, let's see a visual diagram of what we will be building (later we'll
compare this diagram with what Kino.Process.sup_tree/1 reports):
graph TD;
classDef root fill:#c4b5fd, stroke:#374151, stroke-width:4px;
classDef supervisor fill:#c4b5fd, stroke:#374151, stroke-width:1px;
classDef worker fill:#93c5fd, stroke:#374151, stroke-width:1px;
classDef notstarted color:#777, fill:#d9d9d9, stroke:#777, stroke-width:1px;
x(ParentSupervisor):::root
a(GenServer 1):::worker
y(ChildSupervisor):::supervisor
c(GenServer 4):::worker
d(GenServer 2):::worker
e(GenServer 3):::worker
x --> a
x --> y
x --> c
y --> d
y --> e
As you can see, we have a top-level supervisor that starts two worker processes as well as another supervisor. The child supervisor also has two children, which brings the total number of worker processes in this supervision tree to four. The supervision tree will also start in the order specified in the diagram, which shows how the order in which we define its components affects how things are started.
Let's start the implementation of the GenServer processes and then move on to our supervisor implementations.
Our GenServer for this example will be minimalistic, implementing only the init/1 and
terminate/2 callbacks and logging its name when it starts/terminates. By logging the
name of the GenServer instance you will be able to see when processes are started relative
to their position in the supervision tree definition.
One important thing to note here is the addition of the child_spec/1 function. You may not
have seen this function before, as the use GenServer macro automatically generates it
for you, taking into account the options that you pass to the macro (Elixir source code).
The child_spec/1 function is used to configure how a process runs under a supervisor and
is required to instruct the supervisor how to treat that process in the case of a restart or
shutdown. The reason that we override the default child_spec/1 function is that if we
attempt to add more than one SimpleGenServer to the same Supervisor, the processes will have
the same :id value in their child specifications, and we will get an error along the lines of
Evaluation process terminated - bad child specification, more than one child specification has the id: SimpleGenServer
since the child_spec/1 generated by use GenServer uses the module name as the :id.
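To make the :id collision concrete, here is a minimal sketch that only inspects child specifications and does not start any processes. DemoWorker is a hypothetical module introduced purely for illustration; Supervisor.child_spec/2 is the standard helper for overriding fields of a child specification.

```elixir
# DemoWorker is a hypothetical module used only for this illustration.
defmodule DemoWorker do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

# The child_spec/1 generated by `use GenServer` uses the module name as the :id,
# so two children built from it would collide under one supervisor.
default_one = DemoWorker.child_spec(:one)
default_two = DemoWorker.child_spec(:two)
IO.inspect({default_one.id, default_two.id}, label: "default ids")

# Supervisor.child_spec/2 overrides fields of a spec, giving each child its own :id.
unique_one = Supervisor.child_spec({DemoWorker, :one}, id: :one)
unique_two = Supervisor.child_spec({DemoWorker, :two}, id: :two)
IO.inspect({unique_one.id, unique_two.id}, label: "overridden ids")
```

Overriding child_spec/1 inside the module, as the notebook does next, achieves the same result without having to repeat the override at every call site.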
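defmodule SimpleGenServer do
  use GenServer

  # Register the process under the given name so it can be addressed later.
  def start_link(name) do
    GenServer.start_link(__MODULE__, name, name: name)
  end

  # Use the process name as the child :id so that several instances can
  # live under the same supervisor.
  def child_spec(init_arg) do
    Supervisor.child_spec(
      %{
        id: init_arg,
        start: {__MODULE__, :start_link, [init_arg]}
      },
      []
    )
  end

  @impl true
  def init(name) do
    # Trap exits so terminate/2 runs when the supervisor shuts this process down.
    Process.flag(:trap_exit, true)
    IO.puts("Starting GenServer: #{inspect(name)}")
    {:ok, name}
  end

  @impl true
  def terminate(_reason, name) do
    IO.puts("Shutting down GenServer: #{inspect(name)}")
    :ok
  end
end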
With our minimalist GenServer implementation in place, we can now get started on the implementation of our top-level supervisor module.
Our top-level supervisor module (ParentSupervisor in the diagram) will start two instances of
our GenServer process as well as another supervision tree (ChildSupervisor in the diagram).
Let's see what this looks like and then break down what is happening:
defmodule ParentSupervisor do
  use Supervisor

  def start_link(init_arg) do
    Supervisor.start_link(__MODULE__, init_arg, name: __MODULE__)
  end

  @impl true
  def init(_init_arg) do
    IO.puts("Starting Supervisor: #{inspect(__MODULE__)}")

    children = [
      {SimpleGenServer, :gen_server_one},
      {ChildSupervisor, []},
      {SimpleGenServer, :gen_server_four}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end
Our parent supervisor starts by calling use Supervisor, which automatically defines
the child_spec/1 function for our supervisor. It also ensures that the module conforms to the
Supervisor behaviour by injecting @behaviour Supervisor into the module. We then go on
to implement start_link/1, similarly to how we do in our GenServer modules. Given that
supervisors are also processes, this gives us a convenient function for starting
the supervisor process.
Lastly, we implement the init/1 callback (similarly to how we do in our GenServer modules) and
instruct the supervisor what child processes it needs to start. Just as GenServer.start_link/3
blocks until the init/1 callback returns, Supervisor.start_link/3 blocks until all of the child
processes have been started and have gone through their own init/1 callbacks.
In this particular Supervisor module, we start SimpleGenServer and give it the name
:gen_server_one, then we start our child supervisor (we'll implement this in the next
section), and lastly, we start another instance of SimpleGenServer with the name
:gen_server_four, as we laid out in our initial diagram. The ParentSupervisor supervision
tree will finish initialization once all of the aforementioned processes have started up. If any of the children fails to initialize properly, the supervisor terminates the children it has already started, shuts itself down, and start_link returns an error.
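As a rough illustration of that failure path, the sketch below starts a supervisor whose second child refuses to initialize. FailingWorker and the :demo_healthy_child name are hypothetical and exist only for this example; an Agent stands in for a healthy worker. Exits are trapped temporarily so that the failed start comes back as a return value instead of taking down the calling process.

```elixir
# FailingWorker is a hypothetical worker whose init/1 always refuses to start.
defmodule FailingWorker do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(_arg), do: {:stop, :boom}
end

# Trap exits for the duration of the experiment; Process.flag/2 returns the old value.
previous = Process.flag(:trap_exit, true)

children = [
  # A healthy child, registered under a hypothetical name so we can look it up later.
  %{
    id: :demo_healthy_child,
    start: {Agent, :start_link, [fn -> :ok end, [name: :demo_healthy_child]]}
  },
  {FailingWorker, []}
]

result = Supervisor.start_link(children, strategy: :one_for_one)
IO.inspect(result, label: "start_link result")

# The child that had already started is shut down again by the supervisor.
healthy_pid = Process.whereis(:demo_healthy_child)
IO.inspect(healthy_pid, label: "healthy child after failed start")

# Drop any exit notification left in the mailbox and restore the previous flag.
receive do
  {:EXIT, _pid, _reason} -> :ok
after
  100 -> :ok
end

Process.flag(:trap_exit, previous)
```

The supervisor logs an error report when this runs, which is expected here.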
The last thing to note is that the restart policy, or :strategy, used for the supervisor is
:one_for_one. When the strategy is :one_for_one, only the failed process is restarted by the
supervisor and the remaining child processes are left to run as they were. Our ChildSupervisor
will instead leverage the :one_for_all strategy so we can see how that differs.
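The difference between the two strategies can be sketched in isolation. The example below is not part of the supervision tree built in this notebook: StrategyDemo is a hypothetical helper, and two Agents stand in for worker processes. It kills the first child and reports which children came back with a new pid.

```elixir
# StrategyDemo is a hypothetical helper written only for this comparison.
defmodule StrategyDemo do
  # Starts two Agents under a supervisor with the given strategy, kills the
  # first one, and reports which children were restarted (got a new pid).
  def run(strategy) do
    children = [
      Supervisor.child_spec({Agent, fn -> :first end}, id: :first),
      Supervisor.child_spec({Agent, fn -> :second end}, id: :second)
    ]

    {:ok, sup} = Supervisor.start_link(children, strategy: strategy)
    before = pids(sup)

    Process.exit(before.first, :kill)
    later = await_restart(sup, :first, before.first)

    Supervisor.stop(sup)

    %{
      first_restarted: later.first != before.first,
      second_restarted: later.second != before.second
    }
  end

  # Maps each child id to its current pid.
  defp pids(sup) do
    for {id, pid, _type, _modules} <- Supervisor.which_children(sup), into: %{}, do: {id, pid}
  end

  # Polls the supervisor until the given child is running under a new pid.
  defp await_restart(sup, id, old_pid) do
    current = pids(sup)

    case current do
      %{^id => pid} when is_pid(pid) and pid != old_pid ->
        current

      _ ->
        Process.sleep(10)
        await_restart(sup, id, old_pid)
    end
  end
end

IO.inspect(StrategyDemo.run(:one_for_one), label: ":one_for_one")
IO.inspect(StrategyDemo.run(:one_for_all), label: ":one_for_all")
```

With :one_for_one only the killed child should report a restart, while with :one_for_all its sibling is restarted as well.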
With the ParentSupervisor in place, let's take a look at the ChildSupervisor and see what
that looks like.
Our ChildSupervisor module is very similar to our ParentSupervisor module in that
it is responsible for starting a couple of processes and then rolling them up under its
supervision tree. The major difference in this supervisor module is that the restart
strategy for the child processes is :one_for_all. With this strategy, whenever a child process
terminates, all of the other children under that supervisor are terminated as well, and
all of them are started again. Aside from that difference,
the two supervisors are more or less the same:
defmodule ChildSupervisor do
  use Supervisor

  def start_link(init_arg) do
    Supervisor.start_link(__MODULE__, init_arg, name: __MODULE__)
  end

  @impl true
  def init(_init_arg) do
    IO.puts("Starting Supervisor: #{inspect(__MODULE__)}")

    children = [
      {SimpleGenServer, :gen_server_two},
      {SimpleGenServer, :gen_server_three}
    ]

    Supervisor.init(children, strategy: :one_for_all)
  end
end
With our child supervisor all set, it is time to start up the entire supervision tree and experiment with it to see how it starts up and how it reacts to various simulated errors.
With all of the necessary components in place, we are ready to fire up our supervision tree!
You'll notice that when the ParentSupervisor starts, the log statements are printed in a
depth-first traversal order as the ParentSupervisor starts up all of the processes and the
ChildSupervisor. Also note that no matter how many times you run the following snippet,
the order of the print statements remains the same (the start order is deterministic). To
reinforce that the supervision tree that we created mirrors the structure that we initially
described at the top of this document, we can lean on the Kino.Process module to render the
supervision tree for us (the order of the processes in this diagram may differ, as the tools
in Kino.Process do not take into account start order for processes in the tree).
# Kill the supervision tree if it is already running so that you don't encounter
# any errors starting the tree.
if is_pid(Process.whereis(ParentSupervisor)) do
  Supervisor.stop(ParentSupervisor)
end
# Start the supervision tree
ParentSupervisor.start_link([])
# Output the supervision tree
Kino.Process.sup_tree(ParentSupervisor)
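If you prefer a textual check of the tree's shape alongside the rendered diagram, Supervisor.count_children/1 and Supervisor.which_children/1 can be used. The sketch below is self-contained and does not touch the modules defined above: it rebuilds the same shape with Agents standing in for the GenServers, and all of the ids (:agent_one, :inner_supervisor, and so on) are hypothetical.

```elixir
# Two workers under an inner supervisor, mirroring the ChildSupervisor shape.
inner_children = [
  Supervisor.child_spec({Agent, fn -> :two end}, id: :agent_two),
  Supervisor.child_spec({Agent, fn -> :three end}, id: :agent_three)
]

# A child spec that starts a nested, module-less supervisor.
inner_supervisor = %{
  id: :inner_supervisor,
  start: {Supervisor, :start_link, [inner_children, [strategy: :one_for_all]]},
  type: :supervisor
}

# Worker, supervisor, worker: the same order as in the diagram.
outer_children = [
  Supervisor.child_spec({Agent, fn -> :one end}, id: :agent_one),
  inner_supervisor,
  Supervisor.child_spec({Agent, fn -> :four end}, id: :agent_four)
]

{:ok, outer} = Supervisor.start_link(outer_children, strategy: :one_for_one)

# Count the direct children of the outer supervisor.
outer_counts = Supervisor.count_children(outer)
IO.inspect(outer_counts, label: "outer supervisor")

# Look up the nested supervisor's pid and count its children too.
{:inner_supervisor, inner, :supervisor, _modules} =
  List.keyfind(Supervisor.which_children(outer), :inner_supervisor, 0)

inner_counts = Supervisor.count_children(inner)
IO.inspect(inner_counts, label: "inner supervisor")

Supervisor.stop(outer)
```

The same two calls work against the running ParentSupervisor and ChildSupervisor by name once the cell above has been evaluated.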
With our supervision tree up and running, it's time to experiment with it a little and
see what happens under different failure scenarios. If you recall, our supervisors had
different restart strategies for when the processes fail. With the ParentSupervisor, only
the processes that crash are restarted, whereas with the ChildSupervisor, all of the
processes under the supervisor are restarted if any of them crash. Let's artificially induce
some failures and see this in action.
GenServer.stop(:gen_server_two, :brutal_kill)
As you can see, both :gen_server_two and :gen_server_three are restarted and their startup
messages are printed (also note that their start order is just as we expect). Now, let's try
stopping one of the GenServers under the ParentSupervisor and see what happens:
GenServer.stop(:gen_server_one, :brutal_kill)
As you can see, the only process that is restarted is the :gen_server_one process, as the
restart strategy for the ParentSupervisor supervisor is :one_for_one. Let's also
brutally kill the ChildSupervisor to see that the ParentSupervisor is able to bring it
back up without any issues:
GenServer.stop(ChildSupervisor, :brutal_kill)
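One detail worth keeping in mind about the calls above: GenServer.stop/2 performs a synchronous stop and passes its second argument along as the exit reason, so terminate/2 still runs even when that reason is :brutal_kill. An untrappable kill is sent with Process.exit(pid, :kill) and skips terminate/2 entirely. The sketch below contrasts the two; TerminateProbe is a hypothetical module, and the processes are started unlinked with GenServer.start/2 so that the calling process is not affected.

```elixir
# TerminateProbe is a hypothetical GenServer that reports when terminate/2 runs.
defmodule TerminateProbe do
  use GenServer

  @impl true
  def init(owner) do
    Process.flag(:trap_exit, true)
    {:ok, owner}
  end

  @impl true
  def terminate(reason, owner) do
    send(owner, {:terminate_called, reason})
    :ok
  end
end

# Case 1: GenServer.stop/2 with :brutal_kill as the exit reason.
{:ok, stopped} = GenServer.start(TerminateProbe, self())
:ok = GenServer.stop(stopped, :brutal_kill)

stop_ran_terminate =
  receive do
    {:terminate_called, :brutal_kill} -> true
  after
    500 -> false
  end

# Case 2: an untrappable kill signal.
{:ok, killed} = GenServer.start(TerminateProbe, self())
ref = Process.monitor(killed)
Process.exit(killed, :kill)

# Wait until the process is confirmed dead before checking for the message.
receive do
  {:DOWN, ^ref, :process, _pid, _reason} -> :ok
end

kill_ran_terminate =
  receive do
    {:terminate_called, _reason} -> true
  after
    100 -> false
  end

IO.inspect({stop_ran_terminate, kill_ran_terminate}, label: "terminate/2 ran? {stop, kill}")
```

This is why the "Shutting down GenServer" messages appear in the experiments above. Either way, the supervisor restarts the process, because the exit reason is abnormal in both cases.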