Last active
June 7, 2025 16:24
# SLURM on Ubuntu 24.04 with GPU (GRES) Support

> Author: Aatiz Ghimire
>
> Updated: June 2025
>
> Description: Minimal working example to install and configure SLURM with GPU support on a single-node Ubuntu Server 24.04 setup.

## Step 0: Clean Previous SLURM/MUNGE Setup

```bash
sudo systemctl stop slurmctld slurmd munge
sudo apt purge -y 'slurm*' 'munge*' libmunge-dev libmunge2
sudo rm -rf /etc/slurm /var/spool/slurm* /var/log/slurm* /etc/munge /var/lib/munge
sudo userdel -r slurm
sudo userdel -r munge
```

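
On a machine where SLURM or MUNGE was never installed, `userdel` reports an error for the missing account. A minimal sketch of a more tolerant cleanup, assuming a hypothetical helper named `remove_user_if_present` (it is not part of SLURM or MUNGE):

```bash
# Hypothetical helper: delete an account only when it exists,
# so Step 0 can be re-run on a clean machine without errors.
remove_user_if_present() {
    user="$1"
    if id -u "$user" >/dev/null 2>&1; then
        sudo userdel -r "$user"
    else
        echo "user '$user' not present, skipping"
    fi
}

# Usage (commented out so that sourcing this snippet changes nothing):
# remove_user_if_present slurm
# remove_user_if_present munge
```

The helper only wraps the two `userdel` calls; the `systemctl stop` and `apt purge` lines are left as they are.
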
## Step 1: Install SLURM & MUNGE

```bash
sudo apt update
sudo apt install -y slurm-wlm slurm-wlm-basic-plugins slurmctld slurmd munge libmunge-dev libmunge2
```

## Step 2: Setup MUNGE

```bash
sudo /usr/sbin/create-munge-key -f
sudo chown -R munge:munge /etc/munge /var/lib/munge /var/log/munge
sudo chmod 0700 /etc/munge /var/lib/munge /var/log/munge
sudo systemctl enable --now munge
```

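
Before moving on, it can help to confirm that the directory modes set above took effect and that MUNGE can encode and decode a credential. The sketch below assumes GNU `stat` (as shipped on Ubuntu) and uses two hypothetical helper names, `has_mode` and `munge_roundtrip`:

```bash
# Hypothetical helper: succeed only if the path has exactly the given octal mode.
# Relies on GNU stat's -c option.
has_mode() {
    [ "$(stat -c '%a' "$1")" = "$2" ]
}

# Hypothetical helper: encode a credential with munge and decode it with unmunge.
# A zero exit status suggests munged is running and the key is usable.
munge_roundtrip() {
    munge -n | unmunge >/dev/null
}

# Usage:
# has_mode /etc/munge 700 && munge_roundtrip && echo "MUNGE OK"
```

If the round trip fails, `journalctl -u munge` is a reasonable first place to look.
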
## Step 3: Create SLURM Directories

```bash
sudo mkdir -p /etc/slurm /var/spool/slurm/ctld /var/spool/slurm/d /var/log/slurm
sudo chown -R slurm:slurm /etc/slurm /var/spool/slurm /var/log/slurm
```

## Step 4: Configuration

Edit the file `/etc/slurm/slurm.conf`

Add:

```ini
ClusterName=cluster
ControlMachine=localhost
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
CryptoType=crypto/munge
StateSaveLocation=/var/spool/slurm/ctld
SlurmdSpoolDir=/var/spool/slurm/d
SlurmctldPidFile=/run/slurmctld.pid
SlurmdPidFile=/run/slurmd.pid
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
# Disable SLURMDBD-based accounting
AccountingStorageType=accounting_storage/none
#AccountingStorageEnforce=none
# Optional: Disable job accounting plugin too
JobAcctGatherType=jobacct_gather/none
ReturnToService=1
ProctrackType=proctrack/pgid
TaskPlugin=task/affinity
SchedulerType=sched/backfill
GresTypes=gpu
NodeName=localhost CPUs=64 RealMemory=500000 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 Gres=gpu:1 State=UNKNOWN
PartitionName=debug Nodes=localhost Default=YES MaxTime=INFINITE State=UP
```

---

> Adjust CPU/memory specs based on `slurmd -C` output.

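
One way to apply that advice without retyping the numbers is to turn the `slurmd -C` output into a `NodeName` line directly. This is a sketch under the assumption that `slurmd -C` prints the node description on a line starting with `NodeName=<hostname>`; the helper name `node_line_from_slurmd` is hypothetical:

```bash
# Hypothetical helper: read `slurmd -C` output on stdin and print a NodeName
# line for slurm.conf, renamed to "localhost" with the Gres and State fields
# used in this guide appended. Lines not starting with NodeName= are dropped.
node_line_from_slurmd() {
    sed -n 's/^NodeName=[^ ]* \(.*\)$/NodeName=localhost \1 Gres=gpu:1 State=UNKNOWN/p'
}

# Usage:
# slurmd -C | node_line_from_slurmd
```

The printed line is meant to be reviewed and pasted over the `NodeName=localhost ...` entry by hand. You may still want to lower `RealMemory` slightly, as the example above does.
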
### Edit the file `/etc/slurm/gres.conf`

Add:

```ini
Name=gpu File=/dev/nvidia0
```

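
On a host with more than one GPU, each device node needs its own entry, and the `Gres=gpu:1` count in `slurm.conf` would need to be raised to match. The sketch below prints one line per `/dev/nvidiaN` device. The helper name `gres_lines` is hypothetical, and the output is meant to be reviewed before it is written to `/etc/slurm/gres.conf`, for example with `gres_lines | sudo tee /etc/slurm/gres.conf`, which overwrites the file.

```bash
# Hypothetical helper: print one gres.conf line per NVIDIA device node
# found in the given directory (default: /dev). Entries such as nvidiactl
# or nvidia-uvm are ignored because the pattern requires a digit.
gres_lines() {
    dir="${1:-/dev}"
    for dev in "$dir"/nvidia[0-9]*; do
        # If nothing matches, the unexpanded pattern fails this test and is skipped.
        [ -e "$dev" ] || continue
        echo "Name=gpu File=$dev"
    done
}
```
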

---

## Step 5: Enable SLURM Services

```bash
sudo systemctl enable --now slurmctld slurmd
```

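
`systemctl enable --now` can return before a daemon has exited on a configuration error, so a quick status check is worthwhile. A minimal sketch, assuming the unit names used in this guide; the helper name `check_services` and the `SYSTEMCTL` override variable are hypothetical:

```bash
# Hypothetical helper: report which of the three daemons are active.
# SYSTEMCTL can be overridden, e.g. with a stub for a dry run.
check_services() {
    systemctl_cmd="${SYSTEMCTL:-systemctl}"
    failed=0
    for svc in munge slurmctld slurmd; do
        if "$systemctl_cmd" is-active --quiet "$svc"; then
            echo "$svc: active"
        else
            echo "$svc: NOT active (see: journalctl -u $svc)"
            failed=1
        fi
    done
    return "$failed"
}

# Usage:
# check_services || echo "at least one service failed to start"
```

The log files configured in Step 4 (`/var/log/slurm/slurmctld.log` and `/var/log/slurm/slurmd.log`) usually name the offending line when a daemon refuses to start.
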
## Step 6: Validate

```bash
sudo scontrol reconfigure
scontrol show node localhost
srun --gres=gpu:1 nvidia-smi
```

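
The `scontrol show node` output is long, and the two fields that matter here are `State` and `Gres`. The sketch below pulls a single field out of that output, assuming the usual space-separated `Key=value` layout; the helper name `node_field` is hypothetical, and it is not suitable for fields whose values contain spaces:

```bash
# Hypothetical helper: read `scontrol show node` output on stdin and print
# the value of the first Key=value field whose key matches the argument.
node_field() {
    key="$1"
    tr -s ' ' '\n' | sed -n "s/^${key}=//p" | head -n 1
}

# Usage:
# scontrol show node localhost | node_field State
# scontrol show node localhost | node_field Gres
```

With the configuration from Step 4, `Gres` should read `gpu:1`. If `State` shows the node as down or drained after a configuration fix, `sudo scontrol update NodeName=localhost State=RESUME` asks SLURM to return it to service.
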
## Notes

- Make sure `nvidia-smi` is installed and functional.
- Ensure `/dev/nvidia0` exists and the NVIDIA driver is loaded.
- Use `nvidia-smi -L` to list available GPUs.

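
The checks in these notes can be scripted. The sketch below counts the GPUs reported by `nvidia-smi -L`, assuming each GPU appears on its own line starting with `GPU <index>:`; the helper name `count_gpus` is hypothetical:

```bash
# Hypothetical helper: count GPUs in `nvidia-smi -L` output read from stdin.
# grep -c exits non-zero when the count is 0, so the status is normalized.
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:' || true
}

# Usage:
# nvidia-smi -L | count_gpus
# [ -e /dev/nvidia0 ] && echo "/dev/nvidia0 present"
```

The resulting number is the one to compare against the `Gres=gpu:N` count in `slurm.conf`.
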

Feel free to fork this Gist and adapt it for multi-node, MPI, or GPU clusters.