coltonbh/docker-swarm-gpu.md

Last active September 30, 2025 14:44

Star (34) You must be signed in to star a gist
Fork (4) You must be signed in to fork a gist

Learn more about clone URLs
Clone this repository at <script src="https://gist.github.com/coltonbh/374c415517dbeb4a6aa92f462b9eb287.js"></script>
Save coltonbh/374c415517dbeb4a6aa92f462b9eb287 to your computer and use it in GitHub Desktop.

Download ZIP

Docker Swarm GPU Support

Raw

docker-swarm-gpu.md

GPU Support For Docker Swarm

Docker compose has nice support for GPUs, K8s has moved their cluster-wide GPU scheduler from experimental to stable status. Docker swarm has yet to support the device option used in docker compose so the mechanisms for supporting GPUs on swarm are a bit more open-ended.

Basic documentation

NVIDIA container runtime for docker. The runtime is no longer required to run GPU support with the docker cli or compose; however, it appears necessary so that one can set Default Runtime: nvidia for swarm mode.
docker compose GPU support
Good GitHub Gist Reference for an overview on Swarm with GPUs. It is a bit dated, but has good links and conversation.
Miscellaneous Options for docker configuration. Go down to "Node Generic Resources" for an explanation of how this is intended to support NVIDIA GPUs. The main idea is one has to change the /etc/docker/daemon.json file to advertise the node-generic-resources (NVIDIA GPUs) on each node. GPUs have to be added by hand the the daemon.json file, swarm does not detect and advertise them automatically.
How to create a service with generic resources. This shows how to create stacks/services requesting the generic resources advertised in the /etc/docker/daemon.json file.
Quick blog overview confirming these basic approaches.
Really good overview on Generic Resources in swarm.

Solutions to Enable Swarm GPU Support

Both solutions need to follow these steps first:

Install nvidia-container-runtime. Follow the steps here. Takes <5 minutes.
Update /etc/docker/daemon.json to use nvidia as the default runtime.

{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

Restart the docker daemon on each node sudo service docker restart. Confirm the default runtime is nvidia with docker info.

Solution 1

You're done. When you deploy a service to a node, it will by default see all the GPUs on that node. Generally this means you are deploying global services (one per node) or assigning services to specific nodes so that there aren't accidental collisions between services accessing the same GPU resources simultaneously.

If you want to expose only certain GPUs to a given service (e.g., multiple services on one node with each having access only to its own GPU(s)) use the NVIDIA_VISIBLE_DEVICES environment variable for each service. To do this dynamically so that each service gets access to its own GPU using docker service templates looks like this:


services:
  my-service-node-001:
    image: blah blah
    environment:
      - NVIDIA_VISIBLE_DEVICES={{.Task.Slot}}
      deploy:
        replicas: 15
        placement:
          constraints:
            - node.hostname==some-node-001

Because {{.Task.Slot}} starts counting at 1, you may want to include a global service in the template to make use of GPU 0.

Solution 2

Advertise NVIDA GPUs using Node Generic Resources. This is the most general purpose approach and will enable services to simply declare the required GPU resources and swarm will schedule them accordingly.

The /etc/docker/daemon.json file on each node needs to be updated to advertise its GPU resources. You can find the UUID for each GPU by running nvidia-smi -a | grep UUID. You only need to include GPU plus the first 8 digits of the UUID, it seems, i.e., GPU-ba74caf3 for the UUID. The following needs to be added to the daemon.json file already declaring nvidia as the default runtime.

{
  "node-generic-resources": [
    "NVIDIA-GPU=GPU-ba74caf3",
    "NVIDIA-GPU=GPU-dl23cdb4"
  ]
}

Enable GPU resource advertising by uncommenting the swarm-resource = "DOCKER_RESOURCE_GPU" line (line 2) in /etc/nvidia-container-runtime/config.toml.

The docker daemon must be restarted after updating these files by running sudo service docker restart on each node. Services can now request GPUs using the generic-resource flag.

docker service create \
    --name cuda \
    --generic-resource "NVIDIA-GPU=2" \
    --generic-resource "SSD=1" \
    nvidia/cuda

The names for node-generic-resources in /etc/docker/daemon.json could be anything you want. So if you want to declare NVIDIA-H100 and NVIDIA-4090 you could and then request specific GPU types with --generic-resource "NVIDIA-H100".

To request GPU resources in a docker-compose.yaml file for the stack use the following under the deploy key.

services:
  my-gpu-service:
    ...
    deploy:
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
              kind: "NVIDIA-GPU"
              value: 2

kokei commented Mar 15, 2023

Hello sir:

My system had 2 GPUs. I add node-generic-resources" to /etc/docker/daemon.json
{
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
},
"default-runtime": "nvidia",
"node-generic-resources": [
"NVIDIA-GPU=GPU-7af7a99b",
"NVIDIA-GPU=GPU-e934c858"
]
}
also uncomment swarm-resource = "DOCKER_RESOURCE_GPU"

When I run docker service create --replicas 1 --name swarm-test --generic-resource "NVIDIA-GPU=1" swarmimage nvidia-smi
=>The container always use all GPU devices.
But if I run docker run -it --gpus 1 swarmimage nvidia-smi
=>The container only use GPU #0 device.

Would you please help this case?

SuFIND commented Mar 20, 2023

I try to manger my gpu stack in portainer (stack method). I got the 2 ERROR.

discrete_resource_spec not mapping. It's a yaml error. Using the follow config:

generic_resources:
  - discrete_resource_spec:
        kind: "NVIDIA-GPU"
        value: 2

replace the config:

generic_resources:
  - discrete_resource_spec:
    kind: "NVIDIA-GPU"
    value: 2

It's work for me!
2. no suitable node (insufficient resources on 1 node)
I have no idea to solve it. I have make sure GPU-UUID is corrent. The /etc/docker/daemon.json and /etc/nvidia-container-runtime/config.toml edit has done.

kokei commented Mar 22, 2023 •

edited

Loading

I try to manger my gpu stack in portainer (stack method). I got the 2 ERROR.

discrete_resource_spec not mapping. It's a yaml error. Using the follow config:
generic_resources:
  - discrete_resource_spec:
        kind: "NVIDIA-GPU"
        value: 2
replace the config:
generic_resources:
  - discrete_resource_spec:
    kind: "NVIDIA-GPU"
    value: 2
It's work for me! 2. no suitable node (insufficient resources on 1 node) I have no idea to solve it. I have make sure GPU-UUID is corrent. The /etc/docker/daemon.json and /etc/nvidia-container-runtime/config.toml edit has done.

Would you please help to check environment in container:
NVIDIA_VISIBLE_DEVICES=?

I think nvidia-container-runtime do not refer generic_resources config.
Always refer NVIDIA_VISIBLE_DEVICES value.

Author

coltonbh commented Mar 22, 2023

Hi @kokei. nvidia-container-runtime will expose all GPUs on a node by default. Uncommenting swarm-resource = "DOCKER_RESOURCE_GPU in /etc/nvidia-container-runtime/config.toml should tell it to now look for the docker environment variable to know what resources to use. Did you restart docker sudo service docker restart after uncommenting the swarm-resource = "DOCKER_RESOURCE_GPU line so that the docker daemon picks up this new config?

SuFIND commented Mar 23, 2023

你好@kokei. nvidia-container-runtime默认情况下将公开节点上的所有 GPU。取消注释swarm-resource = "DOCKER_RESOURCE_GPU应该/etc/nvidia-container-runtime/config.toml告诉它现在查找 docker 环境变量以了解要使用的资源。您是否sudo service docker restart在取消swarm-resource = "DOCKER_RESOURCE_GPU注释该行后重新启动 docker，以便 docker 守护进程获取这个新配置？

Yes, I have tried restarting the docker service and reboot my machine, but it still doesn't work. My docker version is 23.0.1.

SuFIND commented Mar 23, 2023 •

edited

Loading

I try to manger my gpu stack in portainer (stack method). I got the 2 ERROR.

discrete_resource_spec not mapping. It's a yaml error. Using the follow config:
generic_resources:
  - discrete_resource_spec:
        kind: "NVIDIA-GPU"
        value: 2
replace the config:
generic_resources:
  - discrete_resource_spec:
    kind: "NVIDIA-GPU"
    value: 2
It's work for me! 2. no suitable node (insufficient resources on 1 node) I have no idea to solve it. I have make sure GPU-UUID is corrent. The /etc/docker/daemon.json and /etc/nvidia-container-runtime/config.toml edit has done.
Would you please help to check environment in container: NVIDIA_VISIBLE_DEVICES=?

I think nvidia-container-runtime do not refer generic_resources config. Always refer NVIDIA_VISIBLE_DEVICES value.

The containers can use the gpu when I try to use docker compose to deploy. NVIDIA_VISIBLE_DEVICES like:

No any container can be created. Because the swarm not found a suitable node can be deploy when I try to use docker stack. So I can't inspect no created container.

kokei commented Mar 23, 2023 •

edited

Loading

Hi @kokei. nvidia-container-runtime will expose all GPUs on a node by default. Uncommenting swarm-resource = "DOCKER_RESOURCE_GPU in /etc/nvidia-container-runtime/config.toml should tell it to now look for the docker environment variable to know what resources to use. Did you restart docker sudo service docker restart after uncommenting the swarm-resource = "DOCKER_RESOURCE_GPU line so that the docker daemon picks up this new config?

@coltonbh
Yes, I had run "systemctl restart docker.service" to reload docker.
And use "docker node inspect node", I can see GPUID at Resources.GenericResources.NamedResourceSpec.Kind: NVIDIA-GPU, Value: GPU-e934c858
Run docker with --generic-resourece "NVIDIA-GPU=1", the GPU ID can pass to container environment.
But when I run nvidia-smi is failed. (Due to without NVIDIA_VISIBLE_DEVICES)
=>If I set NVIDIA_VISIBLE_DEVICES, run nvidia-smi ok, no matter with --generic-resourece "NVIDIA-GPU=1"

docker version: 23.0.1
nvidia-docker version: 2.12.0
Which docker and nvidia-docker version you uesd?

kokei commented Mar 23, 2023

I try to manger my gpu stack in portainer (stack method). I got the 2 ERROR.

discrete_resource_spec not mapping. It's a yaml error. Using the follow config:
generic_resources:
  - discrete_resource_spec:
        kind: "NVIDIA-GPU"
        value: 2
replace the config:
generic_resources:
  - discrete_resource_spec:
    kind: "NVIDIA-GPU"
    value: 2
It's work for me! 2. no suitable node (insufficient resources on 1 node) I have no idea to solve it. I have make sure GPU-UUID is corrent. The /etc/docker/daemon.json and /etc/nvidia-container-runtime/config.toml edit has done.
Would you please help to check environment in container: NVIDIA_VISIBLE_DEVICES=?
I think nvidia-container-runtime do not refer generic_resources config. Always refer NVIDIA_VISIBLE_DEVICES value.
The containers can use the gpu when I try to use docker compose to deploy. NVIDIA_VISIBLE_DEVICES like:

No any container can be created. Because the swarm not found a suitable node can be deploy when I try to use docker stack. So I can't inspect no created container.

If you set NVIDIA_VISIBLE_DEVICES, nvidia-docker will always refer it, no matter --generic-resourece.

fangq commented Mar 25, 2023 •

edited

Loading

thanks for this documentation. We have a docker-swarm set up running GPU based simulations from our website (http://mcx.space/cloud/).

over the past year running this service, I found some robustness issues related to how docker swarm matches GPU resources. Below is a question I posted on stackoverflow - I wonder if anyone has similar findings, and knows if there is a workaround?

How to make docker swarm more robust by allowing random resource-matching?

I run a docker-swarm made of 4-5 Linux hosts, each providing a list of GPUs. I allow users to submit jobs to this swarm, upon receiving a request, the following command will create a new job

docker service create --user ... \
   --restart-condition=none \
   --mount ... --name '...' \
   --generic-resource "NVIDIA_GPU=1" "dockerimage" "commands"

the hope is that docker swarm will automatically find an idle GPU and launch the job to it.

it works mostly fine, but I found some robustness issue. Whenever one of the nodes has an issue - such as the GPU driver was accidentally removed or updated, the node still shows as "active" in the docker node ls list and accept jobs, but any job thrown to it will fail, but this does not stop docker swarm from keeping giving jobs to it. It also seems that docker swarm match GPUs in a sequential order from my docker node ls output. So, it kept get stuck on a node with GPU failure.

I would like to know if there is a way to make this more robust - for example, can I ask docker swarm to randomly pick a GPU instead of sequentially match the resources?

any other suggestions will be appreciated!

thanks

Author

coltonbh commented Mar 25, 2023 •

edited

Loading

@fangq I don't have a complete answer for you but I think part of the answer is that docker knows nothing about the health of your device--in this case a GPU. It knows it is registered with the system due to the entries in /etc/docker/daemon.json and from these entries it assumes the device is healthy and available. It knows nothing else about removed drivers or failing devices--similar to how it would know nothing about a CPU that is having trouble. Docker simply assumes that declared hardware resources--like CPUs, memory, or a GPU--are functioning correctly. Perhaps someone else could chime in with a possible solution?

I think your solution most likely will have to live at the application level--maybe perform some health check with application code to make sure that the correct resources are available (an nvidia-smi call?) and then have the job fail and reschedule if the node is unhealthy.

fangq commented Mar 25, 2023

@coltonbh, thanks for the comment. I did not expect docker to know whether a GPU is functioning or not, or intend to tell docker about this information. Instead, I just want docker randomly assign available GPUs when looking for available resources. This way, even a GPU is failing, when a user try again (or setting the --restart-condition to do automatic relaunch), at least the following attempt will have a chance to find a GPU that is working.

Author

coltonbh commented Mar 25, 2023

Ah got it. Not sure what's available for that. Do post here if you figure something out :)

NiklasWilson commented Jan 31, 2024 •

edited

Loading

When I run docker service create --replicas 1 --name swarm-test --generic-resource "NVIDIA-GPU=1" swarmimage nvidia-smi =>The container always use all GPU devices. But if I run docker run -it --gpus 1 swarmimage nvidia-smi =>The container only use GPU #0 device.

@kokei did you find a solution for the docker images only using the first GPU on each machine?

I am having the exact same issue. I can see that each docker image in the swarm is told use a specific GPU while also exposing all the GPUs at a same which is resulting in each docker image trying to use only the first GPU, instead of the GPU id it was told to use.

Author

coltonbh commented Feb 1, 2024

@NiklasWilson I believe this is the expected behavior! Docker swarm does not have support for individual GPU allocation for stacks/services. So you either need each service to pick up an environment variable like NVIDIA_VISIBLE_DEVICE=X using the {{.Task.Slot}} environment variables so that it uses only one device, or you'll need to explicitly declare the GPUs in your device configuration in the /etc/docker/daemon.json. When running docker not in swarm mode (i.e., docker run --it --gpus 1 ... then docker DOES know how to grab only certain GPUs for the container. The key distinction here is running the docker daemon in single node mode vs. swarm mode.

NiklasWilson commented Feb 1, 2024

@NiklasWilson I believe this is the expected behavior! Docker swarm does not have support for individual GPU allocation for stacks/services. So you either need each service to pick up an environment variable like NVIDIA_VISIBLE_DEVICE=X using the {{.Task.Slot}} environment variables so that it uses only one device, or you'll need to explicitly declare the GPUs in your device configuration in the /etc/docker/daemon.json. When running docker not in swarm mode (i.e., docker run --it --gpus 1 ... then docker DOES know how to grab only certain GPUs for the container. The key distinction here is running the docker daemon in single node mode vs. swarm mode.

@coltonbh If this is the case what exactly is the purpose of setting up node-generic-resources with individual gpu uuids if the swarm can't use them to assign an image to a specific gpu?

NiklasWilson commented Feb 1, 2024 •

edited

Loading

After 3 days I have finally found the solution.

Use complete UUIDs
In /etc/nvidia-container-runtime/config.toml change "DOCKER_RESOURCE_GPU" to "DOCKER_RESOURCE_NVIDIA-GPU"

After making those changes the docker swarm will be able to select the correct GPU on machines with multiple GPUs.

Author

coltonbh commented Feb 1, 2024 •

edited

Loading

@NiklasWilson the purpose of setting up the node-generic-resources is so that swarm mode CAN now assign the specific GPUs to services. But swarm mode cannot do it out-of-the-box without the manual configuration. docker run CAN assign only certain GPUs to a container out-of-the-box without manual configuration. Sounds like you got it working! Congrats :)

pangyuteng commented Jun 16, 2024 •

edited

Loading

@NiklasWilson your patches 1 and 2 did the trick! services deployed via swarm now are using seperate gpus. woohoo!
@coltonbh Thanks for sharing this detailed gist!!

to those wanting to quickly test your swarm, providing below handy command to spin up a service (coltonbh already provided the docker service create command in the main post, i just added --entrypoint to let it hang:

docker service create --name hola-gpu --generic-resource "NVIDIA-GPU=1"  --entrypoint "tail -f /dev/null" nvidia/cuda:12.5.0-devel-ubuntu22.04
# docker service rm hola-gpu

also if the service in swarm loose gpu access after a few hours, try setting no-cgroups = false in /etc/nvidia-container-runtime/config.toml (more info here: https://stackoverflow.com/a/78137688/868736 )

kidroca commented Sep 10, 2024 •

edited

Loading

Enable GPU resource advertising by uncommenting the swarm-resource = "DOCKER_RESOURCE_GPU" line (line 2) in /etc/nvidia-container-runtime/config.toml.

You have to make sure that DOCKER_RESOURCE_{NAME} is the same as what you declare in /etc/docker/daemon.json

{
  "node-generic-resources": [
    "NVIDIA-GPU=GPU-ba74caf3",
    "NVIDIA-GPU=GPU-dl23cdb4"
  ]
}

If you're using the key NVIDIA-GPU then in /etc/nvidia-container-runtime/config.toml you have to have

swarm-resource = "DOCKER_RESOURCE_NVIDIA-GPU"

And not

swarm-resource = "DOCKER_RESOURCE_GPU"

Also at some point I switched to using full GUIDs - I'm not sure whether shorter versions from the example still work

54ckj commented Feb 7, 2025 •

edited

Loading

I added an 8-bit GPU UUID by modifying /etc/docker/daemon.json. Uncomment swarm-resource = "DOCKER_RESOURCE_GPU" in /etc/nvidia-container-runtime/config.toml or add swarm-resource = "DOCKER_RESOURCE_NVIDIA-GPU". After restarting daemon and docker, I checked with the command: docker node inspect self --pretty. Resources did not output the GPU resources. What is the problem?

54ckj commented Feb 7, 2025

I added an 8-bit GPU UUID by modifying /etc/docker/daemon.json. Uncomment swarm-resource = "DOCKER_RESOURCE_GPU" in /etc/nvidia-container-runtime/config.toml or add swarm-resource = "DOCKER_RESOURCE_NVIDIA-GPU". After restarting daemon and docker, I checked with the command: docker node inspect self --pretty. Resources did not output the GPU resources. What is the problem?
This is a screenshot of the relevant configuration

alexdelorenzo commented Mar 16, 2025

Anyone get this working with Intel and AMD iGPU/dGPUs?

erfan-zekri commented May 17, 2025

@coltonbh, thanks for the comment. I did not expect docker to know whether a GPU is functioning or not, or intend to tell docker about this information. Instead, I just want docker randomly assign available GPUs when looking for available resources. This way, even a GPU is failing, when a user try again (or setting the --restart-condition to do automatic relaunch), at least the following attempt will have a chance to find a GPU that is working.

I have the same issue. Did you ever find any solution for this?

coltonbh/docker-swarm-gpu.md

GPU Support For Docker Swarm

Basic documentation

Solutions to Enable Swarm GPU Support

Solution 1

Solution 2

kokei commented Mar 15, 2023

Uh oh!

SuFIND commented Mar 20, 2023

Uh oh!

kokei commented Mar 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coltonbh commented Mar 22, 2023

Uh oh!

SuFIND commented Mar 23, 2023

Uh oh!

SuFIND commented Mar 23, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kokei commented Mar 23, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kokei commented Mar 23, 2023

Uh oh!

fangq commented Mar 25, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coltonbh commented Mar 25, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fangq commented Mar 25, 2023

Uh oh!

coltonbh commented Mar 25, 2023

Uh oh!

NiklasWilson commented Jan 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coltonbh commented Feb 1, 2024

Uh oh!

NiklasWilson commented Feb 1, 2024

Uh oh!

NiklasWilson commented Feb 1, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coltonbh commented Feb 1, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pangyuteng commented Jun 16, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kidroca commented Sep 10, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

54ckj commented Feb 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

54ckj commented Feb 7, 2025

Uh oh!

alexdelorenzo commented Mar 16, 2025

Uh oh!

erfan-zekri commented May 17, 2025

Uh oh!

kokei commented Mar 22, 2023 •

edited

Loading

SuFIND commented Mar 23, 2023 •

edited

Loading

kokei commented Mar 23, 2023 •

edited

Loading

fangq commented Mar 25, 2023 •

edited

Loading

coltonbh commented Mar 25, 2023 •

edited

Loading

NiklasWilson commented Jan 31, 2024 •

edited

Loading

NiklasWilson commented Feb 1, 2024 •

edited

Loading

coltonbh commented Feb 1, 2024 •

edited

Loading

pangyuteng commented Jun 16, 2024 •

edited

Loading

kidroca commented Sep 10, 2024 •

edited

Loading

54ckj commented Feb 7, 2025 •

edited

Loading