In this gist, I will show you how I use the podman container system to run R and RStudio on my laptop running Fedora 42.
I mainly cover local usage of podman here, but remote usage is very similar, so I will say a few words about that too.
Combining podman and R is relatively straightforward thanks to the rocker project which provides the so-called images one needs.
I am using Fedora as my main operating system, so this is the "use case" environment covered here.
I expect that what I describe below works for other Linux distros as well; but I did not test it.
Also, on Windows and macOS, there may be some additional steps to get podman running, which I did not look into (I read that one must first use `podman machine` to create a virtual machine...).
What I show should work with docker instead of podman but I did not test that either.
Before we start, I want to mention 3 good reasons for not switching to podman or docker to run R on your local machine:
- it increases internet traffic (using containers implies downloading an operating system and not just R).
- it decreases your free storage space (downloading an operating system and not just R uses more of your disk).
- it adds computational overhead (using R within a container seems to slow things down by a factor of about 1.5 for me).
In short, using podman to run R consumes more electricity than using R on your computer directly. If it allows you to switch from a source installation of R packages to a binary installation, you will probably recoup all the losses... Yet, I cannot claim that for myself since on Fedora I use cran2copr to install binary versions of packages (when I am not running R from podman).
podman is very similar to the better known software called docker, and both are used to create containers. I think of those as virtual boxes in which an operating system and software are installed, where you can install things and work without messing up your system. It is useful for testing things, keeping things separate, and enhancing reproducibility and distribution for complex projects.
For example, thanks to containers you can have several versions of R in parallel; you can have different projects that use different versions of R packages; you can try random packages without fearing to clutter your system with many new system libraries that will extend your update times. And much more...
If you are wondering why podman and not docker, the reason is that at work my admin prefers the former because it handles privileges differently. I trust him and don't really care since it works with podman, but if you do care, I bet that there are many web pages out there on podman vs docker. In any case, for my basic usage, it probably makes no difference as a user.
In my case, I first heard of docker years ago when I was investigating practices around reproducibility. I gave it a go back then, but it was slow and huge, and it was a little intimidating. Internet download speeds got better and hard drives got much better, and I guess I have become somewhat less easily intimidated by software. Truth be told, I started looking into this topic again a year or so ago to see if one could use podman/docker as a hack: I wanted to see if one could use RStudio server on a remote Linux computer embedded in a larger Windows-based network. Indeed, the free version of RStudio server cannot use Microsoft Windows Active Directory as an authentication system. Many "hacks" are possible (e.g. using a VNC connection with RStudio desktop, or using VS Code instead of RStudio), and podman/docker is one such "hack", if not the best one. During this exploration with questionable motivations, I got to understand podman/docker a little better and ended up seduced enough by all the doors it can open that I am now using podman locally on my laptop too.
If I want to test something in R using the latest R-devel I can just open a terminal on a computer where podman is installed and run this:
podman run --rm -ti rocker/drd
If it is the first time you run this, the whole system gets ready within around one minute in total (more or less, depending on your download speed). Unlike a more classic installation of R-devel, it does not interfere with your system: it will not replace or compete with whatever other R version you may have installed. Once the download is over, you should simply see R-devel opened in your terminal.
If I run the R command `sessionInfo()`, it tells me that R-devel is installed on top of an Ubuntu system:
That's right, while my laptop runs Fedora, images can be based on other linux distributions, and here we have Ubuntu running in a container inside my Fedora installation (not a full Ubuntu though, since containers use the kernel of the host machine).
Imagine you want to quickly know how many total dependencies (direct + indirect) a package --say, tidyverse-- comes with. For this we can simply install a fresh version of base R and install the package:
podman run --rm -ti rocker/r-ver Rscript -e "
before <- length(dir(.libPaths()[1]))
install.packages('tidyverse')
after <- length(dir(.libPaths()[1]))
after - before"
And again, in a few seconds (greatly sped up by the binary installation of packages allowed by `r-ver`), we learn that, at the time of writing, tidyverse brings a total of 99 packages to your installation (and 114 if one runs the same code with `install.packages('tidyverse', dependencies = TRUE)`).
Of course, there are smarter ways to explore dependencies, but the point here is not to solve a specific R problem but to illustrate podman/R capabilities.
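The one-off command above can be wrapped in a small shell function so that the package name becomes a parameter. This is only a sketch: the names `count_deps_rcode` and `count_deps` are made up for this example, and it assumes that `rocker/r-ver` behaves as in the tidyverse example above.

```shell
# Print the R code that counts how many packages one install adds to the library.
# (Hypothetical helper name, not part of podman or rocker.)
count_deps_rcode() {
  pkg="$1"
  printf '%s\n' \
    "before <- length(dir(.libPaths()[1]))" \
    "install.packages('${pkg}')" \
    "after <- length(dir(.libPaths()[1]))" \
    "cat(after - before, '\n')"
}

# Run that code in a throwaway container; needs podman and network access.
count_deps() {
  podman run --rm rocker/r-ver Rscript -e "$(count_deps_rcode "$1")"
}

# Example call (commented out because it downloads the image and the packages):
# count_deps tidyverse
```

Keeping the R code in its own function makes it easy to inspect what will be run before anything gets downloaded.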
OK, now I will unpack things a little, explaining things step-by-step.
The first step is to install podman on the computer where the work will happen (so here my laptop, but otherwise on your remote computer where you do your serious computations).
To install podman on Fedora (look here for other systems), I just did this:
sudo dnf install podman
I did not change any settings and thus did nothing more.
One setting that one may want to change is where the images and containers get stored on the drive.
Under default settings it all goes to `~/.local/share/containers`, which is fine for me.
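If you want to check where podman actually stores things on your machine, something like the following should work. It is a minimal sketch: the function name `storage_root` is my own, and it assumes that `podman info` exposes the storage location under the `Store.GraphRoot` template field, which is the case on the versions I know of.

```shell
# Print where podman keeps images and containers.
# Falls back to the default location mentioned above when podman is
# not installed or cannot report its storage settings.
storage_root() {
  if command -v podman >/dev/null 2>&1; then
    podman info --format '{{.Store.GraphRoot}}' 2>/dev/null ||
      printf '%s\n' "$HOME/.local/share/containers"
  else
    printf '%s\n' "$HOME/.local/share/containers"
  fi
}

storage_root
```

As far as I understand, the location itself is controlled by the `graphroot` entry in `~/.config/containers/storage.conf`, but I have not changed it myself.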
In the two examples above, some of the complexity is hidden: the distinction between an image and a container. The commands above created containers but, thanks to podman "magic", the images needed to do so were downloaded automatically. Containers are the things you want to use, but containers are derived from images, which are the things you download. In practice, you will probably derive several containers from a single image. For clarity, I thus prefer to deal with images and containers separately.
To download an image I do:
podman pull [IMAGE]
`[IMAGE]` refers to the image you want to download. Those provided by rocker are listed here.
I tend to use `rocker/geospatial` as `[IMAGE]` for much of what I do since it contains R, RStudio, the tidyverse and many geospatial packages (`sf`, `stars`, `terra` and many more). It works out-of-the-box. In the examples below, if you don't need these packages you can replace `rocker/geospatial` with `rocker/tidyverse`, which is a smaller image, or even with `rocker/rstudio` if you don't need the tidyverse (I do assume here you want RStudio, but I will return to the simpler case of an R-only install at the end).
So, in practice, I simply do:
podman pull rocker/geospatial
This results in this:
As you can see, some "blobs" were skipped, this is because parts of the stuff we need were already available on the system.
Images are well thought through; they follow standards set by docker and the Open Container Initiative: images are made of several parts, each part has its own signature, and when pulling a new image from the registry podman checks which parts do not need to be downloaded because they are already part of another image present on your system.
Note that to update an image, you just need to rerun `podman pull [IMAGE]`.
It will check if it needs to be updated and will do so if it does.
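If you have several rocker images, rerunning the pull for each of them can be scripted. The sketch below is one way to do it, not something I rely on daily: `only_rocker` and `update_rocker_images` are names I made up, and the filter simply assumes that the image reference contains `rocker/`.

```shell
# Keep only image references from the rocker project, skipping untagged ones.
only_rocker() {
  grep 'rocker/' | grep -v '<none>'
}

# Re-pull every rocker image that is already on the system.
update_rocker_images() {
  podman images --format '{{.Repository}}:{{.Tag}}' | only_rocker |
    while read -r image; do
      podman pull "$image"
    done
}

# Example call (commented out because it can download several GB):
# update_rocker_images
```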
As long as you are using images made by others (and not yet building your own custom ones), the only commands I need when it comes to managing images are these:
podman images ## list images, their storage footprint, and provide their IMAGE name (under REPOSITORY)
podman image rm [IMAGE] ## remove the image of a particular IMAGE
For example, the following shows the images installed on my system:
You can see that the geospatial image is quite large (ca. 5GB) but the other two are smaller.
If I want to delete the image of R-devel I used above, I would do:
podman image rm rocker/drd
Checking once in a while which images you have installed on your system and removing those not needed is a good habit, especially if you actually work on a remote computer sharing space with others.
Ok, this is the slightly less trivial step and also the most important one.
The basic command to create (and run) a container is simple enough:
podman run [IMAGE]
You can alternatively use `podman create` instead of `podman run` if you don't want to run the container immediately, but I prefer to run the container straight away to make sure I did not mess up the call.
Indeed, that sounds simple enough not to mess up, but, unfortunately, without adding the right options this won't be very useful.
Calling `podman run --help` will list a ton of options, but the ones I need are restricted to this list:
- `-t` & `-i`, or `-ti` for short: I, and probably most others, use these options very often as they allow for good communication between you and the container inside a terminal; they handle the input/output (`t` is for teletypewriter or terminal, and `i` for interactive). I had found a lovely blog post about that, but alas I cannot find it again right now...
- `--name [CONTAINER NAME]`: I use this almost all the time; it lets you give a name to your container, which otherwise gets a random one. The only time I do not use it is when I use the option `--rm`...
- `--rm`: I use this when I want to create and run a container without keeping it after a single use. I do this for small tests such as the examples shown above.
- `-d`: I use this when I want to run the container in the background and not open anything in the terminal. I do this typically when I want to use RStudio.
- `-w`: I use this to define the working directory. It works for bash and R within bash, but not for RStudio, which I handle differently (see below).
- `-e [something=something]`: I use this to define the environment variable which we need for RStudio.
- `-v [something complicated, see below]`: I use this to define which folder of my computer will be available from within the container. If you do not use it, you will not be able to read or write files and other data outside the container. So I use this all the time.
- `-p [local port:port within the container]`: I use this to specify the ports used by RStudio server (see below).
OK, that sounds overly complicated, and I should also mention that the order of the options matters for some of them; but in practice it is not that bad as I simply reuse the templates below and adapt them if needed.
Let us start with a simple example that is useful for you to master the `-v` option:
podman run -ti --rm -w /myfiles -v $HOME/somefolder/:/myfiles/:z rocker/geospatial ls
Here I am assuming that the folder I want to access from within the container is called `somefolder` and that it is located at the root of my home folder (i.e. `~`, but use `$HOME`). I am defining that, within the container, that folder will be called `myfiles` and located at the root of the container (that is the `/myfiles/` part). If that works, the terminal should show the file(s) contained in the folder (here `foo.R`).
The `:z` is needed to make things work when SELinux is on (a security mechanism on Linux, the default on my distro). One can use `:Z` instead (capital) if the folder won't be accessed by another container (`:z` allows sharing across containers, while `:Z` does not). If SELinux is not on, you may need to drop that `:z`/`:Z`.
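If you move between machines with and without SELinux, the suffix can be chosen automatically. The following is a sketch under assumptions: `selinux_suffix` and `mount_arg` are hypothetical helper names, and it relies on the `getenforce` command being present on systems that use SELinux (it is on my Fedora).

```shell
# Print ":z" when SELinux is enabled on the host, and nothing otherwise.
selinux_suffix() {
  if command -v getenforce >/dev/null 2>&1; then
    case "$(getenforce 2>/dev/null)" in
      Enforcing|Permissive) printf ':z' ;;
    esac
  fi
}

# Build the value passed to -v from a host folder and a container folder.
mount_arg() {
  printf '%s:%s%s\n' "$1" "$2" "$(selinux_suffix)"
}

# Show what would be passed to -v for the example folder used here.
mount_arg "$HOME/somefolder" /myfiles
```

The printed value can then be used as `-v "$(mount_arg "$HOME/somefolder" /myfiles)"` in the `podman run` call.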
Importantly, and I am not certain why, many online sources suggest using `-v $HOME:$HOME` to make your whole home available. I noticed that on my system having `$HOME:` makes things fail:
I did waste hours trying to figure out why things were not working and to solve this problem...
In practice, the restriction does not bother me since my files are relatively well organised, so I never need to have access to my entire home folder to work within R.
The `-w /myfiles` sets the working directory to the folder. This way we do not need to tell the command `ls` we run inside the container (to display the files) where to look for the files.
I used `--rm` because until you find the right settings to make this work, you don't want to keep creating new containers. So here, after the test, the container is removed.
Assuming that what we did above works, we can now test if we have the right settings for RStudio to work.
podman run -ti --rm -w /myfiles -e PASSWORD=root -v $HOME/somefolder/:/myfiles/:z -p 8788:8787 rocker/geospatial
So the new options used here are `-e PASSWORD=root`, which simply sets the password for accessing RStudio server (you don't have to use `root` and can use anything sensible), and `-p 8788:8787` to set the ports. In the latter option, `8787` should not be changed since this is the default port used by RStudio server from within the container. In contrast, `8788` is the port I selected to access RStudio in the web browser (see below), so you can choose another one. Be careful though: some ports are not available as they are already used by your computer, and which ones are free depends on what you use (see this list for examples).
In practice, `8XXX` seems to be a good choice (replacing XXX by a 3-digit number).
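Before settling on a port, a quick sanity check can save some head-scratching. This is a rough sketch with made-up function names. The 1024 lower bound reflects my understanding that rootless podman cannot bind to lower ports by default, and the second helper assumes the `ss` tool is installed.

```shell
# Return 0 if $1 is a port number a rootless container may bind to.
valid_host_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$1" -ge 1024 ] && [ "$1" -le 65535 ]
}

# Return 0 if something already listens on TCP port $1 (2 if ss is missing).
port_in_use() {
  command -v ss >/dev/null 2>&1 || return 2
  ss -ltn | awk '{print $4}' | grep -q ":${1}\$"
}

port=8788
if valid_host_port "$port"; then
  echo "port $port is in the usable range"
fi
# port_in_use "$port" && echo "port $port is already taken"
```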
In the terminal you will see a complex output:
Just ignore it. Instead go to your favorite web browser, open a new tab and type `http://localhost:8788/` (adjusting the number if you did not use 8788). It should open the RStudio server login window:
The username is `root` (the default when using podman) and the password is also `root` (since we set it up that way using `-e PASSWORD=root`).
Note for remote users: if you are not running the container on your local machine but on a remote server to which you are connected via `ssh`, you will have to modify your ssh configuration (by adding e.g. `LocalForward 8788 localhost:8788`) or your ssh call (`ssh -L 8788:localhost:8788 [username]@[remote address]`) to forward the connection for this to work. I repeated 8788 twice here because I am assuming that you want to use locally the same port as the one you chose on the remote machine; otherwise change the first number.
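To make the role of the two numbers explicit, here is a tiny sketch that assembles the ssh call from its parts. The function name `tunnel_cmd` and the target `alice@example.org` are placeholders of mine, and the function only prints the command rather than running it.

```shell
# Print the ssh call that forwards a local port to a port on the remote machine.
# $1: port to use in your local browser
# $2: port chosen with -p on the remote machine
# $3: [username]@[remote address]
tunnel_cmd() {
  printf 'ssh -L %s:localhost:%s %s\n' "$1" "$2" "$3"
}

# Use 9000 locally while the container was started with -p 8788:8787 remotely.
tunnel_cmd 9000 8788 alice@example.org
```

With this particular call, RStudio would be reached at `http://localhost:9000/` on the local machine.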
As mentioned above, you will not yet easily see your files within RStudio since the working directory is "/". We will solve this below.
If that worked out so far, close the browser tab, then close the active container in the terminal using Ctrl-C, and continue.
Now that I am sure of the configuration for `podman run`, I can create a container that is there to stay. I thus remove the option `--rm`, add a name (here `Rgeo`) and use the option `-d` to free my terminal after the creation:
podman run -tid --name Rgeo -w /myfiles -e PASSWORD=root -v $HOME/somefolder/:/myfiles/:z -p 8788:8787 rocker/geospatial
This time, I used `-d` to detach the container as there is no need to interact with the terminal.
For the same reason you could also remove `-ti`, but leaving it does no harm.
Again, go to your favorite web browser, open a new tab and type `http://localhost:8788/`.
Once you are logged into RStudio server run the following in the R console:
file.edit("/etc/rstudio/rsession.conf")
This should open a file. Append the following line and save the file:
session-default-working-dir=/myfiles/
Just to be clear, the path `/etc/rstudio/rsession.conf` refers to the system that is inside the container and not your local machine (remember that containers have a whole operating system inside them, typically Debian or Ubuntu depending on the rocker image).
Then restart the RStudio session: in the top menu, click on Session and then on Quit Session... (do not try using the rstudio API package for that, somehow it does not do the job).
Now you should see your files directly in the Files pane of RStudio:
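If you create containers often, the edit to `rsession.conf` can be scripted instead of done by hand. The sketch below shows the idea on a temporary file: `add_conf_line` is a name I made up, and it appends the line only if it is not already present, so running it twice is harmless.

```shell
# Append a line to a file only if that exact line is not there yet.
# (Hypothetical helper name; plain grep and printf, nothing RStudio-specific.)
add_conf_line() {
  file="$1"
  line="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a temporary file standing in for rsession.conf.
conf="$(mktemp)"
add_conf_line "$conf" "session-default-working-dir=/myfiles/"
add_conf_line "$conf" "session-default-working-dir=/myfiles/"  # second call changes nothing
cat "$conf"
rm -f "$conf"
```

To apply this to the real file, the same grep-or-append command would have to run inside the container, for instance through `podman exec Rgeo sh -c '...'`. I have not checked this on every rocker image, and the RStudio session still needs to be restarted afterwards.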
To manage my containers, I use the following commands frequently:
podman ps -a ## list all containers and show their ID and NAME
podman start [CONTAINER ID or NAME] ## start container
podman stop [CONTAINER ID or NAME] ## stop container (RStudio session will still be active with all its objects, when starting the container again)
podman rm [CONTAINER ID or NAME] ## remove container
podman stop --all ## stop all containers
podman rm --all ## remove all containers
The first command tells you what is currently active or not on your system:
Here I only have the container I just created above.
Since the status is UP, if I don't want to use it and free the memory (RAM) associated with it, I can stop it (after closing the RStudio tab) using:
podman stop Rgeo
Now you can see that the output of `podman ps -a` has changed:
The container still exists but the status is "Exited".
If I want to use RStudio again, via this container, I would then do:
podman start Rgeo
`podman rm` is useful to reduce the clutter by deleting the containers you no longer need.
The only thing to know is that to remove a container it must first be stopped.
Removing a container will not delete the image it has been derived from.
It will delete the files and data you have created with the container if you did not save them in the folder you mounted with `-v`.
In other words, if the data and files you created are accessible outside the container, you should be fine.
Those will stay.
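Since a container must be stopped before it can be removed, the two steps can be combined in a small helper. This is a sketch rather than a recommendation: `remove_container` is a hypothetical name, and it relies on `podman container exists`, which returns a non-zero status when no container has the given name.

```shell
# Stop a container, then remove it.
remove_container() {
  name="$1"
  if ! podman container exists "$name"; then
    echo "no container named $name" >&2
    return 1
  fi
  # rm only runs if stop succeeded
  podman stop "$name" && podman rm "$name"
}

# Example call (commented out because it deletes the container for good):
# remove_container Rgeo
```

Remember that anything saved outside the folder mounted with `-v` is gone once this has run.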
You can install programs within the container as you would do within your linux system. For this you can start bash within the container as follows:
podman start Rgeo ## the container must be running
podman exec -ti Rgeo bash
This is what it looks like to install `htop`:
Note that before you can install anything, you need to call `apt update` (or equivalent) once as I did.
I am using `apt` because the image I am using here (`rocker/geospatial`) is built on top of Ubuntu.
As you can see, there is no need for `sudo`; this is because you are logged in as root within your container.
That sounds scary but it is not because the worst thing that can happen is that you mess up your container (and not your computer).
If you want to circumvent `podman exec`, you can also directly use the terminal window inside RStudio server.
Or, you can call system commands from within the R console itself (e.g. `system("apt install -y htop")`).
I mostly encounter the need to install stuff outside R when installing R packages that require system libraries, and in those cases as I am already in RStudio server, I use the embedded terminal.
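For completeness, the same installation can be done from the host in one go, without opening bash in the container first. This is a sketch: `container_apt_install` is a name I made up, the container must already be running, and it assumes a Debian or Ubuntu based image where `apt` is available.

```shell
# Install system packages inside a running container.
# $1: container name; remaining arguments: package names.
container_apt_install() {
  name="$1"
  shift
  # The package names are passed as arguments to sh rather than pasted
  # into the command string, so unusual characters cannot break the call.
  podman exec "$name" sh -c 'apt update && apt install -y "$@"' sh "$@"
}

# Example call (commented out because it needs the Rgeo container to be running):
# container_apt_install Rgeo htop
```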
You can directly install R packages as you would from your usual RStudio desktop, using the menu or `install.packages()`.
You can solve troubles as you would do in your normal linux.
For example, let us try running `rjags`:
So clearly, we are missing a system library...
Since the Linux distro in the container may not be one you are used to, we can use `pak` to the rescue:
Then simply copy and paste the "Install scripts" in the RStudio terminal:
And go back to your R console:
It worked. Easy.
If you don't need RStudio, to create your final container you can do this instead:
podman run -tid --name justR -w /myfiles -v $HOME/somefolder/:/myfiles/:z rocker/r-ver
and every time you want to use it, do this:
podman exec -ti justR R
This is something I will cover in the future, probably in another gist.