# πŸš€ SLURM on Ubuntu 24.04 with GPU (GRES) Support
> ✍️ Author: Aatiz Ghimire
> πŸ“… Updated: June 2025
> 🧠 Description: Minimal working example to install and configure SLURM with GPU support on a single-node Ubuntu Server 24.04 setup.
## 🧹 Step 0 – Clean Previous SLURM/MUNGE Setup
```bash
sudo systemctl stop slurmctld slurmd munge
sudo apt purge -y 'slurm*' 'munge*' libmunge-dev libmunge2
sudo rm -rf /etc/slurm /var/spool/slurm* /var/log/slurm* /etc/munge /var/lib/munge
# These two fail harmlessly on a machine that never had SLURM installed
sudo userdel -r slurm
sudo userdel -r munge
```
## πŸ“¦ Step 1 – Install SLURM & MUNGE
```bash
sudo apt update
sudo apt install -y slurm-wlm slurm-wlm-basic-plugins slurmctld slurmd munge libmunge-dev libmunge2
```
## πŸ” Step 2 – Setup MUNGE
```bash
sudo /usr/sbin/create-munge-key -f
sudo chown -R munge:munge /etc/munge /var/lib/munge /var/log/munge
sudo chmod 0700 /etc/munge /var/lib/munge /var/log/munge
sudo systemctl enable --now munge
```
## πŸ“ Step 3 – Create SLURM Directories
```bash
sudo mkdir -p /etc/slurm /var/spool/slurm/ctld /var/spool/slurm/d /var/log/slurm
sudo chown -R slurm:slurm /etc/slurm /var/spool/slurm /var/log/slurm
```
## βš™οΈ Step 4 – Configuration
Edit the file `/etc/slurm/slurm.conf` and add:

```
ClusterName=cluster
ControlMachine=localhost
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
CryptoType=crypto/munge
StateSaveLocation=/var/spool/slurm/ctld
SlurmdSpoolDir=/var/spool/slurm/d
SlurmctldPidFile=/run/slurmctld.pid
SlurmdPidFile=/run/slurmd.pid
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log

# Disable SLURMDBD-based accounting
AccountingStorageType=accounting_storage/none
#AccountingStorageEnforce=none

# Optional: disable the job accounting plugin too
JobAcctGatherType=jobacct_gather/none

ReturnToService=1
ProctrackType=proctrack/pgid
TaskPlugin=task/affinity
SchedulerType=sched/backfill

GresTypes=gpu
NodeName=localhost CPUs=64 RealMemory=500000 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 Gres=gpu:1 State=UNKNOWN
PartitionName=debug Nodes=localhost Default=YES MaxTime=INFINITE State=UP
```
---
> Adjust CPU/memory specs based on `slurmd -C` output.
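`slurmd -C` prints a ready-made `NodeName=` line for the local hardware, so the easiest workflow is to take that line and append the GRES and state fields before pasting it into `slurm.conf`. A minimal sketch (the `sample` value below is hypothetical; on a real node capture it with `sample="$(slurmd -C | head -n 1)"`):

```shell
# Hypothetical first line of `slurmd -C` output for the node above
sample='NodeName=localhost CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=500000'

# Append the GPU GRES and initial state fields expected in slurm.conf
echo "${sample} Gres=gpu:1 State=UNKNOWN"
```

`slurmd -C` also prints an `UpTime=` line; only the `NodeName=` line belongs in `slurm.conf`.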
### Edit the file `/etc/slurm/gres.conf`
Add:

```
Name=gpu File=/dev/nvidia0
```
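For a node with several GPUs, each device file needs to be covered, either with one line per device or a bracketed range. A hypothetical four-GPU layout might look like:

```
Name=gpu File=/dev/nvidia[0-3]
```

The GPU count in the `NodeName` line of `slurm.conf` must match the number of devices listed here (e.g. `Gres=gpu:4`).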
---
## ▢️ Step 5 – Enable SLURM Services
```bash
sudo systemctl enable --now slurmctld slurmd
```
## βœ… Step 6 – Validate
```bash
sudo scontrol reconfigure
scontrol show node localhost
srun --gres=gpu:1 nvidia-smi
```
## 🧠 Notes
- Make sure `nvidia-smi` is installed and functional.
- Ensure `/dev/nvidia0` exists and driver is correctly loaded.
- Use `nvidia-smi -L` to list available GPUs.
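
Once `srun` works, batch submission can be checked the same way. A minimal sketch of a GPU batch script (the job name and output path here are arbitrary):

```bash
#!/bin/bash
#SBATCH --job-name=gpu-test        # arbitrary job name
#SBATCH --partition=debug          # the partition defined in slurm.conf
#SBATCH --gres=gpu:1               # request one GPU via GRES
#SBATCH --output=gpu-test-%j.out   # %j expands to the job ID

# Print the GPUs visible inside the allocation
nvidia-smi
```

Submit it with `sbatch gpu-test.sh` and inspect the generated output file.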
Feel free to fork this Gist and adapt it for multi-node, MPI, or GPU-cluster setups.