@awdemos
Last active February 1, 2025 18:53
How many GPUs needed for GPT-3?

GPT-3's computational requirements depend heavily on model size, precision, and hardware optimization. For the full 175B-parameter model, running inference requires at least 5 NVIDIA A100 GPUs (80GB each) to meet the 350GB memory requirement when using FP16 precision[4][9][22]. However, practical implementations often use 8 GPUs for improved parallelism and throughput[3][22].
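To make the memory arithmetic explicit, here is a minimal back-of-envelope sketch in Python. It counts weight storage only (activations, KV cache, and framework overhead add more on top); the 80GB-per-A100 capacity and the bytes-per-parameter figures are the ones cited here, and the GPU counts are simple ceilings, not measured deployments.

```python
import math

# Back-of-envelope VRAM estimate for serving GPT-3 175B (weights only).
PARAMS = 175e9          # GPT-3 parameter count
GPU_VRAM_GB = 80        # NVIDIA A100 80GB

BYTES_PER_PARAM = {
    "FP32": 4,
    "FP16": 2,
    "INT8": 1,   # 8-bit quantization
}

for precision, bytes_per_param in BYTES_PER_PARAM.items():
    weights_gb = PARAMS * bytes_per_param / 1e9
    min_gpus = math.ceil(weights_gb / GPU_VRAM_GB)
    print(f"{precision}: ~{weights_gb:,.0f} GB of weights -> "
          f"at least {min_gpus} x A100 80GB")

# FP16: ~350 GB -> 5 GPUs (the figure above); INT8: ~175 GB -> 3 GPUs,
# before any allowance for activations, batching, or runtime overhead.
```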

Key Technical Requirements:

  • Memory:

    • 175B parameters require 350GB of VRAM at FP16 precision (2 bytes/parameter)[4][9][14].
    • Consumer GPUs like RTX 3090s (24GB VRAM) can technically work in multi-GPU setups (e.g., 8x24GB = 192GB), but require aggressive memory optimizations like 8-bit quantization[2][28].
  • Hardware Recommendations:

    • Data center GPUs: 5–8 NVIDIA A100/A800 (80GB) GPUs for stable deployment[3][4][13].
    • Consumer GPUs: 8x RTX 3090/4090, relying on fast PCIe links (and NVLink bridges on the 3090; the 4090 drops NVLink) to reduce communication bottlenecks[2][28].

Performance Considerations:

  • Inference speed:
    • A single A100 generates ~1 word every 350ms for GPT-3[3].
    • An 8-GPU cluster achieves 15–20 words/sec with batch size 1[3][22].
  • Cost (a rough per-word estimate follows this list):
    • Cloud deployment costs ~$6–7/hour for 8xA100 instances[2].
    • On-prem setups with 8xRTX 3090s cost ~$10K for hardware[2].
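
As a rough illustration of what those throughput and cost figures imply, the sketch below turns the cited ballpark numbers (~$6–7/hour for 8xA100, ~15–20 words/sec at batch size 1) into a cost per 1,000 generated words. The midpoint values are assumptions made for the calculation, and the result is an estimate, not a benchmark.

```python
# Rough serving-cost estimate for the cloud 8xA100 scenario cited above.
HOURLY_RATE_USD = 6.5    # midpoint of the ~$6-7/hour figure
WORDS_PER_SEC = 17.5     # midpoint of the ~15-20 words/sec figure

words_per_hour = WORDS_PER_SEC * 3600
cost_per_1k_words = HOURLY_RATE_USD / words_per_hour * 1000

print(f"~{words_per_hour:,.0f} words/hour")
print(f"~${cost_per_1k_words:.2f} per 1,000 generated words at batch size 1")
# Roughly $0.10 per 1,000 words; larger batches amortize the same hardware
# over more concurrent requests and push the per-word cost down.
```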

Optimization Techniques:

  1. Model Parallelism: Split layers across GPUs to overcome memory limits[5][6].
  2. Quantization: 8-bit weights reduce memory usage to ~1 byte/parameter[2][9].
  3. KV Caching: Cache the attention keys and values of previously processed tokens so each new token reuses them instead of recomputing attention from scratch[9] (toy sketch below).
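
To illustrate the KV-caching idea, here is a toy single-head attention loop in NumPy. It is a didactic sketch, not GPT-3's actual implementation: the dimensions and weights are made up, and the point is only that each decoding step appends one new key/value row and reuses everything already in the cache.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16

# Random projections standing in for trained attention weights.
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def decode_step(x_t, cache):
    """One autoregressive step: append K/V for the new token, reuse the rest."""
    cache["K"] = np.vstack([cache["K"], (x_t @ W_k)[None, :]])
    cache["V"] = np.vstack([cache["V"], (x_t @ W_v)[None, :]])
    q_t = x_t @ W_q
    scores = cache["K"] @ q_t / np.sqrt(d_model)   # attention over all cached positions
    return softmax(scores) @ cache["V"]            # (d_model,) attention output

cache = {"K": np.empty((0, d_model)), "V": np.empty((0, d_model))}
for _ in range(5):
    x_t = rng.normal(size=d_model)   # stand-in for the next token's embedding
    out = decode_step(x_t, cache)

print("cached positions:", cache["K"].shape[0])  # 5: one K/V row per decoded token
```

Without the cache, every step would recompute keys and values for the entire prefix, so per-token work and memory traffic would grow much faster with sequence length.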

While 10 GPUs exceed the minimum requirement, the extra capacity provides headroom for larger batch sizes or hybrid training/inference workloads. Oracle has demonstrated fine-tuning of GPT-3-sized models on NVIDIA A100 GPUs[13], confirming that such high-end deployments are feasible.

Citations:

[1] https://ai.stackexchange.com/questions/22877/how-much-computing-power-does-it-cost-to-run-gpt-3

[2] https://news.ycombinator.com/item?id=33881504

[3] https://hackernoon.com/a-deep-dive-into-how-many-gpus-it-takes-to-run-chatgpt

[4] https://www.reddit.com/r/OpenAI/comments/10aocxc/does_anyone_have_any_hard_numbers_on_the_gpu/

[5] https://llmgpuhelper.com/en/blog/optimizing-gpt3-multi-gpu-training

[6] https://developer.nvidia.com/blog/scaling-language-model-training-to-a-trillion-parameters-using-megatron/

[7] https://lambdalabs.com/blog/demystifying-gpt-3

[8] https://ai.gopubby.com/multi-gpu-model-training-made-easy-with-distributed-data-parallel-ddp-453ba9f6846e?gi=a737dc56a3e4

[9] https://blog.spheron.network/how-much-gpu-memory-is-required-to-run-a-large-language-model-find-out-here

[10] https://developer.nvidia.com/blog/openai-presents-gpt-3-a-175-billion-parameters-language-model/

[11] https://developer.nvidia.com/blog/deploying-a-1-3b-gpt-3-model-with-nvidia-nemo-megatron/

[12] https://news.ycombinator.com/item?id=37674913

[13] https://blogs.oracle.com/research/post/oracle-first-to-finetune-gpt3-sized-ai-models-with-nvidia-a100-gpu

[14] https://en.wikipedia.org/wiki/GPT-3

[15] https://www.reddit.com/r/nvidia/comments/113euip/openai_trained_chat_gpt_on_10k_a100s/

[16] https://www.fierceelectronics.com/sensors/chatgpt-runs-10k-nvidia-training-gpus-potential-thousands-more

[17] https://www.lesswrong.com/posts/HBisQEDajGwhWirky/how-feasible-costly-would-it-be-to-train-a-very-large-ai

[18] https://developer.nvidia.com/blog/efficiently-scale-llm-training-across-a-large-gpu-cluster-with-alpa-and-ray/

[19] https://lambdalabs.com/blog/demystifying-gpt-3

[20] https://www.reddit.com/r/singularity/comments/inp025/if_you_want_to_run_your_own_full_gpt3_instance/

[21] https://techcommunity.microsoft.com/blog/machinelearningblog/unlocking-the-power-of-large-scale-training-in-ai/4303390

[22] https://hackernoon.com/a-deep-dive-into-how-many-gpus-it-takes-to-run-chatgpt

[23] https://rethinkpriorities.org/research-area/gpt-3-like-models-are-now-much-easier-to-access-and-deploy-than-to-develop/

[24] https://company.hpc-ai.com/blog/train-18-billion-parameter-gpt-models-with-a-single-gpu-on-your-personal-computer

[25] https://arstechnica.com/civis/threads/you-can-now-run-a-gpt-3-level-ai-model-on-your-laptop-phone-and-raspberrypi.1490659/page-2

[26] https://www.weka.io/blog/gpu/gpu-for-ai/

[27] https://www.reddit.com/r/MachineLearning/comments/gzb5uv/d_what_would_it_take_to_run_openais_gpt3_on/

[28] https://www.reddit.com/r/OpenAI/comments/11fwfjg/what_kind_of_pc_would_you_need_to_have_a_model/

[29] https://www.reddit.com/r/GPT3/comments/zufeg9/how_long_before_we_can_run_gpt3_locally/

[30] https://rethinkpriorities.org/research-area/the-replication-and-emulation-of-gpt-3/

[31] https://arxiv.org/pdf/2104.04473.pdf
