cf. CUDA C++ Programming Guide / §16. Compute Capabilities / §16.2. Features and Technical Specifications
Processor | deviceQuery output |
---|---|
NVIDIA T4 | t4.txt |
NVIDIA H100 | h100.txt |
NVIDIA H200 | h200.txt |
cf. CUDA C++ Programming Guide / §16. Compute Capabilities / §16.2. Features and Technical Specifications
Processor | deviceQuery output |
---|---|
NVIDIA T4 | t4.txt |
NVIDIA H100 | h100.txt |
NVIDIA H200 | h200.txt |
./deviceQuery Starting... | |
CUDA Device Query (Runtime API) version (CUDART static linking) | |
Detected 1 CUDA Capable device(s) | |
Device 0: "NVIDIA H100 PCIe" | |
CUDA Driver Version / Runtime Version 12.8 / 12.8 | |
CUDA Capability Major/Minor version number: 9.0 | |
Total amount of global memory: 81090 MBytes (85029158912 bytes) | |
(114) Multiprocessors, (128) CUDA Cores/MP: 14592 CUDA Cores | |
GPU Max Clock rate: 1755 MHz (1.75 GHz) | |
Memory Clock rate: 1593 Mhz | |
Memory Bus Width: 5120-bit | |
L2 Cache Size: 52428800 bytes | |
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384) | |
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers | |
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers | |
Total amount of constant memory: 65536 bytes | |
Total amount of shared memory per block: 49152 bytes | |
Total shared memory per multiprocessor: 233472 bytes | |
Total number of registers available per block: 65536 | |
Warp size: 32 | |
Maximum number of threads per multiprocessor: 2048 | |
Maximum number of threads per block: 1024 | |
Max dimension size of a thread block (x,y,z): (1024, 1024, 64) | |
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) | |
Maximum memory pitch: 2147483647 bytes | |
Texture alignment: 512 bytes | |
Concurrent copy and kernel execution: Yes with 3 copy engine(s) | |
Run time limit on kernels: No | |
Integrated GPU sharing Host Memory: No | |
Support host page-locked memory mapping: Yes | |
Alignment requirement for Surfaces: Yes | |
Device has ECC support: Enabled | |
Device supports Unified Addressing (UVA): Yes | |
Device supports Managed Memory: Yes | |
Device supports Compute Preemption: Yes | |
Supports Cooperative Kernel Launch: Yes | |
Supports MultiDevice Co-op Kernel Launch: Yes | |
Device PCI Domain ID / Bus ID / location ID: 0 / 22 / 0 | |
Compute Mode: | |
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > | |
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.8, CUDA Runtime Version = 12.8, NumDevs = 1 | |
Result = PASS |
./deviceQuery Starting... | |
CUDA Device Query (Runtime API) version (CUDART static linking) | |
Detected 1 CUDA Capable device(s) | |
Device 0: "NVIDIA GH200 120GB" | |
CUDA Driver Version / Runtime Version 12.4 / 12.6 | |
CUDA Capability Major/Minor version number: 9.0 | |
Total amount of global memory: 96768 MBytes (101468602368 bytes) | |
(132) Multiprocessors, (128) CUDA Cores/MP: 16896 CUDA Cores | |
GPU Max Clock rate: 1980 MHz (1.98 GHz) | |
Memory Clock rate: 2619 Mhz | |
Memory Bus Width: 6144-bit | |
L2 Cache Size: 62914560 bytes | |
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384) | |
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers | |
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers | |
Total amount of constant memory: 65536 bytes | |
Total amount of shared memory per block: 49152 bytes | |
Total shared memory per multiprocessor: 233472 bytes | |
Total number of registers available per block: 65536 | |
Warp size: 32 | |
Maximum number of threads per multiprocessor: 2048 | |
Maximum number of threads per block: 1024 | |
Max dimension size of a thread block (x,y,z): (1024, 1024, 64) | |
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) | |
Maximum memory pitch: 2147483647 bytes | |
Texture alignment: 512 bytes | |
Concurrent copy and kernel execution: Yes with 2 copy engine(s) | |
Run time limit on kernels: No | |
Integrated GPU sharing Host Memory: No | |
Support host page-locked memory mapping: Yes | |
Alignment requirement for Surfaces: Yes | |
Device has ECC support: Enabled | |
Device supports Unified Addressing (UVA): Yes | |
Device supports Managed Memory: Yes | |
Device supports Compute Preemption: Yes | |
Supports Cooperative Kernel Launch: Yes | |
Supports MultiDevice Co-op Kernel Launch: Yes | |
Device PCI Domain ID / Bus ID / location ID: 9 / 1 / 0 | |
Compute Mode: | |
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > | |
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.4, CUDA Runtime Version = 12.6, NumDevs = 1 | |
Result = PASS |
./deviceQuery Starting... | |
CUDA Device Query (Runtime API) version (CUDART static linking) | |
Detected 1 CUDA Capable device(s) | |
Device 0: "Tesla T4" | |
CUDA Driver Version / Runtime Version 12.2 / 11.8 | |
CUDA Capability Major/Minor version number: 7.5 | |
Total amount of global memory: 15102 MBytes (15835660288 bytes) | |
(040) Multiprocessors, (064) CUDA Cores/MP: 2560 CUDA Cores | |
GPU Max Clock rate: 1590 MHz (1.59 GHz) | |
Memory Clock rate: 5001 Mhz | |
Memory Bus Width: 256-bit | |
L2 Cache Size: 4194304 bytes | |
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384) | |
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers | |
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers | |
Total amount of constant memory: 65536 bytes | |
Total amount of shared memory per block: 49152 bytes | |
Total shared memory per multiprocessor: 65536 bytes | |
Total number of registers available per block: 65536 | |
Warp size: 32 | |
Maximum number of threads per multiprocessor: 1024 | |
Maximum number of threads per block: 1024 | |
Max dimension size of a thread block (x,y,z): (1024, 1024, 64) | |
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) | |
Maximum memory pitch: 2147483647 bytes | |
Texture alignment: 512 bytes | |
Concurrent copy and kernel execution: Yes with 3 copy engine(s) | |
Run time limit on kernels: No | |
Integrated GPU sharing Host Memory: No | |
Support host page-locked memory mapping: Yes | |
Alignment requirement for Surfaces: Yes | |
Device has ECC support: Enabled | |
Device supports Unified Addressing (UVA): Yes | |
Device supports Managed Memory: Yes | |
Device supports Compute Preemption: Yes | |
Supports Cooperative Kernel Launch: Yes | |
Supports MultiDevice Co-op Kernel Launch: Yes | |
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 4 | |
Compute Mode: | |
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) > | |
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, CUDA Runtime Version = 11.8, NumDevs = 1 | |
Result = PASS |