is llm compute cost going down? show timeline from 2022 until now

Is LLM compute cost going down? Timeline (2022–Aug 2025)

Short answer

  • Yes: the cost to run LLMs (inference) has fallen dramatically since 2022—by orders of magnitude for equivalent quality levels—while training costs for cutting-edge frontier models have generally increased.[1][2][3][4][5][6]

Timeline of key shifts (2022–Aug 2025)

2022

  • Early public benchmarks and pricing baselines: models achieving mid-tier MMLU performance (e.g., GPT‑3.5-level) were priced at roughly $20 per million tokens in late 2022, setting an initial reference point for subsequent price declines.[2]

2023

  • Frontier training costs surge: estimates for training GPT‑4 and Google’s Gemini rose into the tens to hundreds of millions of dollars, reflecting the growing scale and complexity of top-tier models.[3][5]
  • Inference begins rapid decline for same-quality targets: comparing consistent quality levels (e.g., MMLU), prices start falling sharply across providers.[6][2]

2024

  • Massive drop in inference price for GPT‑3.5‑equivalent quality: the cost to query a model scoring about GPT‑3.5 on MMLU fell from $20.00/million tokens (Nov 2022) to $0.07/million tokens (Oct 2024), a >280× reduction in roughly two years.[2]
  • Training costs remain high across major releases (e.g., Llama 3.1‑405B $170M; Grok‑2 $107M; Mistral Large $41M), underscoring divergence between training and inference economics at the frontier.[4]
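
A back-of-the-envelope check of the headline decline cited above ([2]) also yields an implied annualized rate. The 23-month span is inferred from the Nov 2022 and Oct 2024 dates in the source; this is a sketch, not sourced analysis:

```python
# Check the cited decline: $20.00/M tokens (Nov 2022) -> $0.07/M tokens (Oct 2024)
nov_2022_price = 20.00   # $/million tokens, GPT-3.5-level MMLU quality
oct_2024_price = 0.07    # $/million tokens, same quality target

decline_factor = nov_2022_price / oct_2024_price  # ~285.7x, i.e. ">280x"

# Annualize over the ~23 months between the two data points (inferred from dates)
months = 23
annualized = decline_factor ** (12 / months)  # ~19x per year

print(f"total decline: {decline_factor:.0f}x, annualized: {annualized:.1f}x/year")
```

The ~19×/year implied rate sits comfortably inside the 9× to 900× per-year range reported for different tasks.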

2025 (through August)

  • “Inference price declines are rapid but uneven”: analysis shows 9× to 900× per year declines depending on the task/benchmark, with some of the fastest drops occurring in the past year; whether the fastest rates will persist remains uncertain.[1][2]
  • Cross‑provider dispersion persists: open‑weight models (e.g., Llama‑3.1‑70B, 405B) show wide price spreads across serving providers—from ~$0.20 to ~$2.90 per million tokens for 70B, and ~$0.90 to ~$9.50 for 405B—indicating a non‑commodity market despite overall downtrend.[7]
  • Overall “LLMflation” pattern: multiple analyses find roughly an order‑of‑magnitude (10×) per year decrease in inference cost for constant quality, with ~1,000× over three years at lower MMLU targets and ~62× since 2023 at GPT‑4‑class MMLU levels.[6][1]
  • Economic shift toward inference/test‑time compute: improvements in smaller, efficient models and reasoning‑time strategies shift cost structures toward inference—even as per‑token prices fall, total inference spend can rise with higher usage and longer “thinking” traces.[8]

What this means

  • Inference costs: strongly down since 2022 for equivalent performance, with 10× per year as a useful rule‑of‑thumb across many benchmarks; exact declines vary widely by task and provider.[7][1][2][6]
  • Training costs: at the frontier, still trending high (often $50M–$200M+) even as some efficient/open models are cheaper to train—so “compute cost” depends on whether discussing training or inference.[5][3][4]
  • Market dynamics: significant price dispersion by provider and model size persists despite overall declines, suggesting room for further competition and optimization.[7]

Data points at a glance

  • GPT‑3.5‑equivalent inference price: $20.00/million tokens (Nov 2022) → $0.07/million (Oct 2024), >280× decline.[2]
  • Inference price declines across tasks: 9× to 900× per year; some fastest drops in the last year.[1][2]
  • Three‑year view: ~1,000× decline at lower MMLU target (42) from $60 → $0.06 per million tokens; ~62× decline since GPT‑4 era at higher MMLU target (83).[6]
  • Provider spread (example, open‑weight): Llama‑3.1‑70B Instruct ranges ~$0.20–$2.90 per million tokens; Llama‑3.1‑405B ~$0.90–$9.50.[7]
  • Frontier training costs (examples): GPT‑4 ~$79M (estimate); Gemini 1.0 Ultra ~$192M; Llama 3.1‑405B ~$170M; Grok‑2 ~$107M; Mistral Large ~$41M.[3][4][5]
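
The ~10×-per-year “LLMflation” rule of thumb ([6][1]) can be applied directly to the MMLU‑42 baseline above; a constant annual decline factor reproduces the ~1,000× three‑year figure. A minimal sketch, assuming a constant rate (real declines are uneven across tasks and years):

```python
# Sketch of the ~10x/year rule of thumb applied to a constant-quality price series.

def projected_price(start_price, years, annual_decline=10.0):
    """Price after `years`, assuming a constant annual decline factor."""
    return start_price / annual_decline ** years

start = 60.0  # $/million tokens, the MMLU-42 baseline cited above
for year in range(4):
    print(year, projected_price(start, year))
# Three years at 10x/year: 60 / 10**3 = 0.06, matching the ~1,000x figure.
```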

Bottom line

  • For inference: compute cost is decisively going down since 2022, with step‑changes in 2023–2025; pace varies by task and provider.[1][2][6][7]
  • For training: leading‑edge training costs are not falling; they remain very high or rising for the most capable frontier systems.[4][5][3]

[1] https://epoch.ai/data-insights/llm-inference-price-trends
[2] https://hai.stanford.edu/ai-index/2025-ai-index-report/research-and-development
[3] https://www.statista.com/chart/33114/estimated-cost-of-training-selected-ai-models/
[4] https://www.visualcapitalist.com/the-surging-cost-of-training-ai-models/
[5] https://www.forbes.com/sites/katharinabuchholz/2024/08/23/the-extreme-cost-of-training-ai-models/
[6] https://a16z.com/llmflation-llm-inference-cost/
[7] https://techgov.intelligence.org/blog/observations-about-llm-inference-pricing
[8] https://www.bruegel.org/policy-brief/how-deepseek-has-changed-artificial-intelligence-and-what-it-means-europe
[9] https://skywork.ai/skypage/en/Analysis%20of%20the%20Evolution%20Path%20of%20%22Inference%20Cost%22%20of%20Large%20Models%20in%202025:%20The%20API%20Price%20War%20Erupts/1948243097032671232
[10] https://hai-production.s3.amazonaws.com/files/hai_ai_index_report_2025.pdf
[11] https://epoch.ai/trends
[12] https://www.reddit.com/r/LocalLLaMA/comments/1gpr2p4/llms_cost_is_decreasing_by_10x_each_year_for/
