@1eedaegon
Last active June 18, 2025 00:38
Problem: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Situation

Even after properly installing the NVIDIA driver, nvidia-smi keeps failing with:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

1. On Mainboard

Set the following three items in the BIOS.

If these settings are not available in your BIOS, download a newer BIOS release and update it via USB.

  • Secure Boot: off
  • Above 4G Decoding: enabled
  • Resizable BAR: enabled

I updated the BIOS of an Asus WS X299 SAGE from the 2024.08 release to 2025.03, after which the PCIe settings above appeared.
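After rebooting, you can check from Linux whether the GPU advertises the Resizable BAR capability. This is a sketch: the bus ID 68:00.0 is the one from this machine's nvidia-smi output below, so substitute your own (find it with lspci | grep -i nvidia).

```shell
# Hypothetical check: does the GPU advertise the Resizable BAR capability?
# 68:00.0 is this machine's bus ID; replace it with yours.
lspci -vv -s 68:00.0 2>/dev/null | awk '/Resizable BAR/{found=1} END{exit !found}' \
  && echo "Resizable BAR capability present" \
  || echo "Resizable BAR capability not reported"
```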

2. On Linux Bootloader

Add the pci=realloc=off option to the end of GRUB_CMDLINE_LINUX_DEFAULT in GRUB. This is required for the Resizable BAR setting configured above to take effect.

  • sudo vi /etc/default/grub

  • e.g., modify the line to: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc=off"

  • sudo update-grub <- This is the most important step.

  • sudo reboot

If nvidia-smi reports No devices were found after the reboot, that is progress: the communication error is gone, and the remaining problem is the driver itself, which the next step fixes.
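The bullet steps above can be scripted. This is a minimal sketch that assumes the stock Ubuntu default line GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" — check your /etc/default/grub first, since the substitution only matches that exact line.

```shell
# Back up, then append pci=realloc=off to the default kernel command line.
# Assumes the line is exactly: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
sudo cp /etc/default/grub /etc/default/grub.bak
sudo sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"$/GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc=off"/' /etc/default/grub
sudo update-grub   # regenerate grub.cfg - without this the option never takes effect
sudo reboot
```

After the reboot, cat /proc/cmdline should show pci=realloc=off at the end.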

3. Nvidia driver

Purge all NVIDIA drivers and reboot.

Then reinstall; for small-scale machines such as workstations, install the nvidia-[version]-open driver.

Note: Do NOT install using ubuntu-drivers autoinstall.

  • sudo apt-get purge '*nvidia*' && sudo apt autoremove

  • sudo ubuntu-drivers list

$ sudo ubuntu-drivers list
nvidia-driver-570-server, (kernel modules provided by linux-modules-nvidia-570-server-generic-hwe-22.04)
nvidia-driver-570, (kernel modules provided by linux-modules-nvidia-570-generic-hwe-22.04)
nvidia-driver-570-open, (kernel modules provided by linux-modules-nvidia-570-open-generic-hwe-22.04)
nvidia-driver-570-server-open, (kernel modules provided by linux-modules-nvidia-570-server-open-generic-hwe-22.04)
  • sudo apt install nvidia-driver-570-open
  • sudo reboot or sudo shutdown -r now
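After the reboot, it's worth confirming that the NVIDIA kernel module actually loaded and that the in-tree nouveau driver did not grab the GPU instead. A sketch reading /proc/modules directly (same information lsmod prints):

```shell
# Check which kernel modules hold the GPU after reboot.
grep -q '^nvidia' /proc/modules && echo "nvidia module loaded" || echo "no nvidia module loaded"
grep -q '^nouveau' /proc/modules && echo "nouveau loaded - blacklist it" || echo "nouveau not loaded (good)"
```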

Then nvidia-smi works:

(Since I launched a container immediately after checking, it is already using VRAM.)

 $ nvidia-smi
Fri Jun 13 11:30:14 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.153.02             Driver Version: 570.153.02     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5090        Off |   00000000:68:00.0 Off |                  N/A |
|  0%   43C    P8             11W /  600W |    9087MiB /  32607MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1071      G   /usr/lib/xorg/Xorg                       60MiB |
|    0   N/A  N/A            1136      G   /usr/bin/gnome-shell                     11MiB |
|    0   N/A  N/A          340887      C   python3                                7352MiB |
|    0   N/A  N/A          383678      C   python3                                1620MiB |
+-----------------------------------------------------------------------------------------+

Optional. Nvidia container toolkit

Since I use Docker containers that require GPU access, I installed the NVIDIA Container Toolkit so that containers can see the GPU.

  • sudo apt install nvidia-container-toolkit -y
  • After installing, run which nvidia-container-runtime to verify the binary exists:
$ which nvidia-container-runtime
/usr/bin/nvidia-container-runtime
  • Edit the Docker daemon configuration. Note that JSON does not allow comments, so keep the file strictly JSON:

$ sudo vi /etc/docker/daemon.json

{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    },
    "exec-opts": ["native.cgroupdriver=cgroupfs"]
}

If "exec-opts" is not configured, the CUDA driver will not be recognized inside the container after rebooting.
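To verify the toolkit end to end, restart Docker and run a throwaway CUDA container. The image tag here is just an example; any CUDA base image compatible with your driver works.

```shell
# Restart Docker so it picks up daemon.json, then test GPU access in a container.
sudo systemctl restart docker
sudo docker run --rm --runtime=nvidia --gpus all \
  nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
```

Alternatively, sudo nvidia-ctk runtime configure --runtime=docker (shipped with the toolkit) writes the "runtimes" entry into daemon.json for you, though to my knowledge it does not add the "exec-opts" setting, so that part still has to be done by hand.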