is llm compute cost going down? show timeline from 2022 until now

Is LLM compute cost going down? Timeline (2022–Aug 2025)

Short answer

  • Yes: the cost to run LLMs (inference) has fallen dramatically since 2022—by orders of magnitude for equivalent quality levels—while training costs for cutting-edge frontier models have generally increased.[1][2][3][4][5][6]

Timeline of key shifts (2022–Aug 2025)

2022

  • Early public benchmarks and pricing baselines: models achieving mid-tier MMLU performance (e.g., GPT‑3.5-level) were priced at roughly $20 per million tokens in late 2022, setting an initial reference point for subsequent price declines.[2]

2023

  • Frontier training costs surge: estimates for training GPT‑4 and Google’s Gemini rose into the tens to hundreds of millions of dollars, reflecting the growing scale and complexity of top-tier models.[3][5]
  • Inference begins rapid decline for same-quality targets: comparing consistent quality levels (e.g., MMLU), prices start falling sharply across providers.[6][2]

2024

  • Massive drop in inference price for GPT‑3.5‑equivalent quality: the cost to query a model scoring about GPT‑3.5 on MMLU fell from $20.00/million tokens (Nov 2022) to $0.07/million tokens (Oct 2024), a >280× reduction in roughly two years.[2]
  • Training costs remain high across major releases (e.g., Llama 3.1‑405B $170M; Grok‑2 $107M; Mistral Large $41M), underscoring divergence between training and inference economics at the frontier.[4]
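
A back-of-the-envelope check of the headline decline cited above ([2]) also yields an implied annualized rate. The 23-month span is inferred from the Nov 2022 and Oct 2024 dates in the source; this is a sketch, not sourced analysis:

```python
# Check the cited decline: $20.00/M tokens (Nov 2022) -> $0.07/M tokens (Oct 2024)
nov_2022_price = 20.00   # $/million tokens, GPT-3.5-level MMLU quality
oct_2024_price = 0.07    # $/million tokens, same quality target

decline_factor = nov_2022_price / oct_2024_price  # ~285.7x, i.e. ">280x"

# Annualize over the ~23 months between the two data points (inferred from dates)
months = 23
annualized = decline_factor ** (12 / months)  # ~19x per year

print(f"total decline: {decline_factor:.0f}x, annualized: {annualized:.1f}x/year")
```

The ~19×/year implied rate sits comfortably inside the 9× to 900× per-year range reported for different tasks.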

2025 (through August)

  • “Inference price declines are rapid but uneven”: analysis shows 9× to 900× per year declines depending on the task/benchmark, with some of the fastest drops occurring in the past year; whether the fastest rates will persist remains uncertain.[1][2]
  • Cross‑provider dispersion persists: open‑weight models (e.g., Llama‑3.1‑70B, 405B) show wide price spreads across serving providers—from ~$0.20 to ~$2.90 per million tokens for 70B, and ~$0.90 to ~$9.50 for 405B—indicating a non‑commodity market despite overall downtrend.[7]
  • Overall “LLMflation” pattern: multiple analyses find roughly an order‑of‑magnitude (10×) per year decrease in inference cost for constant quality, with ~1,000× over three years at lower MMLU targets and ~62× since 2023 at GPT‑4‑class MMLU levels.[6][1]
  • Economic shift toward inference/test‑time compute: improvements in smaller, efficient models and reasoning‑time strategies shift cost structures toward inference—even as per‑token prices fall, total inference spend can rise with higher usage and longer “thinking” traces.[8]

What this means

  • Inference costs: strongly down since 2022 for equivalent performance, with 10× per year as a useful rule‑of‑thumb across many benchmarks; exact declines vary widely by task and provider.[7][1][2][6]
  • Training costs: at the frontier, still trending high (often $50M–$200M+) even as some efficient/open models are cheaper to train—so “compute cost” depends on whether discussing training or inference.[5][3][4]
  • Market dynamics: significant price dispersion by provider and model size persists despite overall declines, suggesting room for further competition and optimization.[7]

Data points at a glance

  • GPT‑3.5‑equivalent inference price: $20.00/million tokens (Nov 2022) → $0.07/million (Oct 2024), >280× decline.[2]
  • Inference price declines across tasks: 9× to 900× per year; some fastest drops in the last year.[1][2]
  • Three‑year view: ~1,000× decline at lower MMLU target (42) from $60 → $0.06 per million tokens; ~62× decline since GPT‑4 era at higher MMLU target (83).[6]
  • Provider spread (example, open‑weight): Llama‑3.1‑70B Instruct ranges ~$0.20–$2.90 per million tokens; Llama‑3.1‑405B ~$0.90–$9.50.[7]
  • Frontier training costs (examples): GPT‑4 ~$79M (estimate); Gemini 1.0 Ultra ~$192M; Llama 3.1‑405B ~$170M; Grok‑2 ~$107M; Mistral Large ~$41M.[3][4][5]
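
The ~10×-per-year “LLMflation” rule of thumb ([6][1]) can be applied directly to the MMLU‑42 baseline above; a constant annual decline factor reproduces the ~1,000× three‑year figure. A minimal sketch, assuming a constant rate (real declines are uneven across tasks and years):

```python
# Sketch of the ~10x/year rule of thumb applied to a constant-quality price series.

def projected_price(start_price, years, annual_decline=10.0):
    """Price after `years`, assuming a constant annual decline factor."""
    return start_price / annual_decline ** years

start = 60.0  # $/million tokens, the MMLU-42 baseline cited above
for year in range(4):
    print(year, projected_price(start, year))
# Three years at 10x/year: 60 / 10**3 = 0.06, matching the ~1,000x figure.
```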

Bottom line

  • For inference: compute cost is decisively going down since 2022, with step‑changes in 2023–2025; pace varies by task and provider.[1][2][6][7]
  • For training: leading‑edge training costs are not falling; they remain very high or rising for the most capable frontier systems.[4][5][3]

[1] https://epoch.ai/data-insights/llm-inference-price-trends
[2] https://hai.stanford.edu/ai-index/2025-ai-index-report/research-and-development
[3] https://www.statista.com/chart/33114/estimated-cost-of-training-selected-ai-models/
[4] https://www.visualcapitalist.com/the-surging-cost-of-training-ai-models/
[5] https://www.forbes.com/sites/katharinabuchholz/2024/08/23/the-extreme-cost-of-training-ai-models/
[6] https://a16z.com/llmflation-llm-inference-cost/
[7] https://techgov.intelligence.org/blog/observations-about-llm-inference-pricing
[8] https://www.bruegel.org/policy-brief/how-deepseek-has-changed-artificial-intelligence-and-what-it-means-europe
[9] https://skywork.ai/skypage/en/Analysis%20of%20the%20Evolution%20Path%20of%20%22Inference%20Cost%22%20of%20Large%20Models%20in%202025:%20The%20API%20Price%20War%20Erupts/1948243097032671232
[10] https://hai-production.s3.amazonaws.com/files/hai_ai_index_report_2025.pdf
[11] https://epoch.ai/trends
[12] https://www.reddit.com/r/LocalLLaMA/comments/1gpr2p4/llms_cost_is_decreasing_by_10x_each_year_for/
