NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Even after properly installing the NVIDIA driver, the following error occurs:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Set the following three items in the BIOS.
If these settings are not available in your BIOS, download a newer BIOS version and update it via USB.
- Secure Boot: off
- Above 4G Decoding: on (or Enabled)
- Resizable BAR: on
I updated the BIOS of Asus WS X299 SAGE from 2024.08 to 2025.03, after which the PCIe settings appeared.
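As an optional sanity check after rebooting with the new BIOS settings, you can confirm from Linux that the GPU now exposes a Resizable BAR capability. A minimal sketch, assuming the GPU sits at bus ID 68:00.0 (the address shown in the nvidia-smi output further below; substitute your own, and note the capability wording can vary between lspci versions):
# Find the GPU's PCI address
lspci | grep -i nvidia
# The capabilities should include a "Physical Resizable BAR" section
sudo lspci -vv -s 68:00.0 | grep -iA4 "resizable bar"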
Add the pci=realloc=off option to the end of GRUB_CMDLINE_LINUX_DEFAULT in GRUB. This is necessary for the Resizable BAR setting configured above to be usable.
- sudo vi /etc/default/grub
- e.g.) Modify to: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc=off"
- sudo update-grub <- This is the most important step.
- sudo reboot
If nvidia-smi shows "No devices were found" after the reboot, it means success.
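To confirm the kernel really booted with the new option, you can also check the running kernel command line (a simple extra verification step, not part of the original procedure):
# pci=realloc=off should appear here after the reboot
cat /proc/cmdline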
Purge all NVIDIA drivers and reboot. Then reinstall; for small-scale machines such as workstations, install the nvidia-[version]-open driver. Note: Do NOT install using ubuntu-drivers autoinstall.
- sudo apt-get purge '*nvidia*' && sudo apt autoremove
- sudo ubuntu-drivers list
$ sudo ubuntu-drivers list
nvidia-driver-570-server, (kernel modules provided by linux-modules-nvidia-570-server-generic-hwe-22.04)
nvidia-driver-570, (kernel modules provided by linux-modules-nvidia-570-generic-hwe-22.04)
nvidia-driver-570-open, (kernel modules provided by linux-modules-nvidia-570-open-generic-hwe-22.04)
nvidia-driver-570-server-open, (kernel modules provided by linux-modules-nvidia-570-server-open-generic-hwe-22.04)
sudo apt install nvidia-driver-570-open
sudo reboot (or sudo shutdown -r now)
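After the reboot, one way to confirm that the open kernel module variant is what actually loaded (an extra check I'm adding here; the exact version string wording depends on the driver release):
# The nvidia kernel modules should now be loaded
lsmod | grep nvidia
# With the -open packages this should report the NVIDIA Open Kernel Module
cat /proc/driver/nvidia/version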
Then check:
(Since I launched the container immediately after checking, it is already using VRAM in the output below.)
$ nvidia-smi
Fri Jun 13 11:30:14 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.153.02 Driver Version: 570.153.02 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 5090 Off | 00000000:68:00.0 Off | N/A |
| 0% 43C P8 11W / 600W | 9087MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1071 G /usr/lib/xorg/Xorg 60MiB |
| 0 N/A N/A 1136 G /usr/bin/gnome-shell 11MiB |
| 0 N/A N/A 340887 C python3 7352MiB |
| 0 N/A N/A 383678 C python3 1620MiB |
+-----------------------------------------------------------------------------------------+
Since I use Docker containers that require GPU access, I installed the NVIDIA Container Toolkit so that the GPU can be recognized inside containers.
sudo apt install nvidia-container-toolkit -y
- After installing, run which nvidia-container-runtime to verify:
$ which nvidia-container-runtime
/usr/bin/nvidia-container-runtime
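As an additional check that the container tooling can talk to the driver, the toolkit's CLI can be queried directly (a hedged suggestion; nvidia-container-cli is pulled in as a dependency of the toolkit):
# Prints driver/CUDA versions and the GPUs visible to the container runtime
sudo nvidia-container-cli info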
- Edit the Docker daemon configuration as follows. Without the "exec-opts" entry, the CUDA driver is not recognized inside containers after a reboot (JSON does not allow comments, so the note is kept outside the file):
$ sudo vi /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    },
    "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
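For the daemon.json change to take effect, restart the Docker daemon; a throwaway container can then confirm the GPU is reachable. The test command below follows the standard NVIDIA Container Toolkit usage rather than the original write-up:
# Apply the new daemon configuration
sudo systemctl restart docker
# nvidia-smi inside a disposable container should list the RTX 5090 as well
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi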