Deerfield Green
Enterprise AI Economics

Token Economics for the C-Suite


The economics of enterprise AI are defined by a central paradox: per-token costs have collapsed 90–280x since 2022, yet 85% of organizations still misestimate their AI budgets by more than 10%. This disconnect stems from hidden multipliers—data preprocessing, maintenance, integration, and the token-hungry nature of agentic systems—that dwarf raw inference costs.


The 50x cost decline claim is actually conservative

The claim that GPT-4-equivalent inference costs have declined ~50x since late 2022 understates reality. GPT-4 launched in March 2023 at $30/million input tokens and $60/million output tokens, yielding a blended rate of approximately $36/million tokens. Today, multiple models matching or exceeding GPT-4's capabilities are available at dramatically lower prices, as the pricing tables below show.

This translates to a 90–140x decline from GPT-4's launch pricing to the cheapest GPT-4-class models today. Andrew Ng noted in August 2024 that GPT-4 pricing was already declining at ~79% per year. Epoch AI's research finds LLM inference prices declining between 9x and 900x per year, with a median of 50x per year. The Stanford HAI AI Index Report 2025 provides the most authoritative benchmark: the cost of achieving GPT-3.5-level performance dropped from $20/million tokens to $0.07/million tokens between November 2022 and October 2024—a 280-fold reduction in roughly two years.

The user’s claim references “GPT-4-equivalent from $20/million tokens to ~$0.40 today.” The $20 figure likely references GPT-4 Turbo pricing (November 2023, ~$10/$30 per MTok, blended ~$15) or a simplified average. The ~$0.40 endpoint is accurate for models like DeepSeek V3.2 and Gemini 2.0 Flash.
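The arithmetic behind these blended rates and decline multiples is easy to reproduce. A minimal sketch, assuming an 80/20 input/output token mix (an assumption, chosen because it reproduces the ~$36/MTok blended figure for GPT-4's $30/$60 launch pricing):

```python
def blended_price(input_per_mtok: float, output_per_mtok: float,
                  input_share: float = 0.8) -> float:
    """Blend input/output prices by an assumed token-mix ratio."""
    return input_per_mtok * input_share + output_per_mtok * (1 - input_share)

def decline_multiple(old_blended: float, new_blended: float) -> float:
    """How many times cheaper the new blended price is."""
    return old_blended / new_blended

gpt4_launch = blended_price(30.00, 60.00)   # GPT-4, March 2023
deepseek_v32 = blended_price(0.28, 0.42)    # DeepSeek V3.2 standard rates
print(gpt4_launch)                          # 36.0
print(round(decline_multiple(gpt4_launch, deepseek_v32), 1))  # ~117x
```

At this assumed mix, the decline lands near the middle of the 90–140x range cited above; a heavier output mix pushes the multiple higher.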

Current pricing across major LLM providers (March 2026)

The pricing spread across major models as of early 2026 spans over 1,000x from cheapest to most expensive:

Frontier/flagship models (per million tokens, input/output):

| Provider | Model | Input | Output | Context |
|---|---|---|---|---|
| OpenAI | GPT-5.4 | $2.50 | $15.00 | 270K |
| OpenAI | GPT-5 | $1.25 | $10.00 | 128K |
| OpenAI | GPT-5.2 Pro | $21.00 | $168.00 | 200K |
| OpenAI | o1-pro | $150.00 | $600.00 | 200K |
| Anthropic | Claude Opus 4/4.1 | $15.00 | $75.00 | 200K |
| Anthropic | Claude Opus 4.5/4.6 | $5.00 | $25.00 | 200K |
| Anthropic | Claude Sonnet 4/4.5/4.6 | $3.00 | $15.00 | 200K (1M beta) |
| Google | Gemini 3.1 Pro | $2.00 | $12.00 | 1M |
| Google | Gemini 2.5 Pro | $1.25 | $10.00 | 2M |
| DeepSeek | V4 | $0.30 | $0.50 | 128K |
| DeepSeek | V3.2 | $0.28 | $0.42 | 128K |
| xAI | Grok 4 | $3.00 | $15.00 | 2M |
| Mistral | Large 3 | $2.00 | $6.00 | 128K |

Budget/efficiency models:

| Provider | Model | Input | Output | Context |
|---|---|---|---|---|
| OpenAI | GPT-5 Nano | $0.05 | $0.40 | 128K |
| OpenAI | GPT-4o-mini | $0.15 | $0.60 | 128K |
| Google | Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M |
| Google | Gemini 2.0 Flash | $0.10 | $0.40 | 1M |
| Anthropic | Claude Haiku 3.5 | $0.80 | $4.00 | 200K |
| DeepSeek | V3 (original) | $0.14 | $0.28 | 128K |
| Mistral | Nemo | $0.02 | $0.02 | 128K |
| xAI | Grok 4.1 Fast | $0.20 | $0.50 | 2M |

The user’s specific claim about “DeepSeek V3.2 at $0.028/million tokens” requires clarification: the $0.028 figure is the cache-hit input price (90% discount on the $0.28 base input price). The standard DeepSeek V3.2 pricing is $0.28/$0.42 per million tokens for input/output respectively. Claude Opus 4 at $75/million output tokens is verified exactly ($15 input / $75 output).
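The cache-hit discount can be folded into an effective input price as a function of the cache hit rate. A sketch using DeepSeek V3.2's published rates ($0.28 base input, $0.028 on cache hits); the hit rates shown are illustrative:

```python
def effective_input_price(base: float, cache_hit: float,
                          hit_rate: float) -> float:
    """Expected per-MTok input price given a cache hit rate in [0, 1]."""
    return cache_hit * hit_rate + base * (1 - hit_rate)

# DeepSeek V3.2: $0.28 base input, $0.028 on cache hits (90% discount)
for rate in (0.0, 0.5, 0.9):
    print(rate, round(effective_input_price(0.28, 0.028, rate), 4))
```

Workloads with highly repetitive prompts (long shared system prompts, RAG templates) sit toward the high-hit-rate end and approach the headline $0.028 figure; ad-hoc traffic pays close to the base rate.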

Key cost-saving mechanisms now standard across providers include prompt caching (discounts of roughly 50–90% on cached input tokens, such as DeepSeek's cache-hit pricing above) and asynchronous batch processing (typically ~50% off standard rates).

Benchmarkit 2025 survey: verified with caveats

Status: VERIFIED. The “2025 State of AI Cost Management” report was published September 10, 2025 by Benchmarkit in partnership with Mavvrik (an AI cost governance platform), surveying 372 enterprise organizations. Its headline finding: 85% of organizations misestimate their AI budgets by more than 10%, the figure cited in the opening.

Ray Rike, CEO of Benchmarkit, stated: “These numbers should rattle every finance leader. AI is no longer just experimental—it’s hitting gross margins, and most companies can’t even predict the impact.” The report was widely cited by CIO, CFO Dive, and Yahoo Finance. Caveat: Mavvrik is a cost governance vendor with commercial interest in emphasizing the problem, though the findings align with other independent data from FinOps Foundation, Flexera, and CloudZero.

Supporting data from other sources: the FinOps Foundation State of FinOps 2026 Report (1,192 respondents representing $83B+ in annual cloud spend) found 98% of respondents now manage AI spend, up from 63% in 2025 and 31% in 2024. Flexera’s 2026 State of the Cloud found 85% of organizations have adopted some form of FinOps practices, yet 32% of cloud spend remains wasted. CloudZero reports average monthly AI spending hit $85,521 in 2025, up 36% from $62,964 in 2024, with 45% of organizations now planning to invest >$100K/month on AI.

The 380% cost overrun claim: partially verified

Status: PARTIALLY VERIFIED. The 380% figure appears in a Pertama Partners article titled “AI Project Failure Statistics 2026” (published February 8, 2026), attributed to “MIT Sloan, 2025.” The article states: “Cost overruns average 380% at production scale versus pilot projections.” However, Pertama Partners is an AI consulting firm (not independent research), and the direct MIT Sloan publication containing this specific 380% figure could not be independently located.

Corroborating data from multiple sources documents 3x–10x cost escalation from pilot to production, which makes the 380% figure plausible. It should nonetheless be cited with the Pertama Partners attribution rather than directly to MIT Sloan.

Hidden cost multipliers are well-documented

Data preprocessing adding 30–50% to costs: VERIFIED. Multiple independent sources confirm this range.

Annual maintenance at 17–30% of initial development: VERIFIED. The consensus range across sources is 15–30%.

Additional hidden multipliers (integration premiums, compliance overhead, and specialized talent) are broken out in the TCO section below.

Open-source economics: strong savings, but market share claim overstated

The claim that open-source serves 45% of total tokens is NOT VERIFIED. The best empirical data comes from the a16z/OpenRouter State of AI study (analyzing 100+ trillion tokens served in 2025), which found open-source models have reached equilibrium at roughly 30% of total token usage—not 45%. MIT Sloan research (January 2026, using OpenRouter data from May–September 2025) found open-source models serving approximately 20% of tokens and capturing only ~4% of revenue. Gartner forecasts 60%+ of businesses will adopt open-source LLMs for at least one application, but adoption ≠ token share.

The 60–70% cost savings claim is VERIFIED and possibly understated:

Self-hosting economics: becomes cost-effective at approximately 1 million queries/month, with initial hardware investment ($10,000–$50,000) offset by eliminating API fees within 6–12 months. Medium-scale open models (Llama 3.3 70B) run on 2x A100-80GB GPUs (~$30K hardware) with accuracy within 10% of closed frontier models. Early adopters report 50–70% GPU cost reductions through optimized self-hosting.
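The break-even claim can be expressed as a simple payback calculation. A sketch under stated assumptions: the query volume, tokens-per-query, API price, and opex figures below are illustrative, chosen to land inside the 6–12 month window described above; only the ~1M queries/month threshold and $10K–$50K hardware range come from the source.

```python
def breakeven_months(hardware_cost: float,
                     monthly_queries: float,
                     tokens_per_query: float,
                     api_price_per_mtok: float,
                     monthly_opex: float) -> float:
    """Months until cumulative API savings repay the hardware outlay.

    Assumes self-hosting opex (power, cooling, ops time) is the only
    recurring cost and the API alternative is billed purely per token.
    """
    monthly_api_cost = monthly_queries * tokens_per_query / 1e6 * api_price_per_mtok
    monthly_savings = monthly_api_cost - monthly_opex
    if monthly_savings <= 0:
        return float("inf")  # self-hosting never pays off at this volume
    return hardware_cost / monthly_savings

# Illustrative: $30K hardware (2x A100-80GB), 1M queries/month,
# ~2K tokens/query, API blended at ~$3/MTok, $2K/month opex.
print(round(breakeven_months(30_000, 1_000_000, 2_000, 3.00, 2_000), 1))  # 7.5
```

The sensitivity is mostly to volume: halving the query count doubles the payback period, which is why the 1M queries/month threshold matters.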

The “average price paid per token” paradox is real

VERIFIED by a December 2025 academic paper by Andrey Fradkin (“The Emerging Market for Intelligence: Pricing, Supply, and Demand for LLMs”), analyzing OpenRouter data covering 100+ trillion tokens. The key finding: “The average price paid per token has remained relatively constant, consistent with demand for superior intelligence.”

The mechanism is straightforward: when a new state-of-the-art model is released, 99% of demand immediately shifts to it (per ikangai.com’s “The LLM Cost Paradox” analysis). Reasoning models like o1, o3, and GPT-5 generate thousands of internal “thinking” tokens, making total cost per task higher despite lower per-token prices. The pricing spread between the cheapest model (Mistral Nemo at $0.02/MTok) and the most expensive (o1-pro at $375/MTok, averaging input and output rates) now exceeds 18,000x—organizations that want frontier capabilities pay frontier prices.

The Fradkin paper also finds that short-run demand elasticities in the API market are “unlikely to justify the Jevons Paradox”—price drops don’t cause proportional increases in total token demand. The a16z team confirms this with “LLMflation” analysis: while inference costs decline ~10x per year for equivalent performance, the newest frontier models (like o1) have the same cost per output token ($60/M) as GPT-3 at launch.

Token budgeting frameworks for enterprise CFOs

Enterprise token budgeting is an emerging but rapidly maturing discipline. Key frameworks and data points:

Model routing is the most impactful single strategy: routing simple tasks to budget models and complex tasks to frontier models can cut costs by 60–90%. Approximately 85% of enterprise queries can be handled by budget-tier models. An allocation of 85% budget / 10% balanced / 5% frontier models yields ~92% savings versus frontier-only usage.
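The ~92% figure can be sanity-checked with representative blended prices. A sketch assuming an 80/20 input/output mix and, purely for illustration, Gemini 2.5 Flash-Lite as the budget tier, Claude Sonnet as balanced, and Claude Opus 4 as frontier (blended from the tables above):

```python
# Blended $/MTok at an assumed 80/20 input/output mix (illustrative)
BUDGET, BALANCED, FRONTIER = 0.16, 5.40, 27.00  # Flash-Lite, Sonnet, Opus 4

def routed_cost(budget_share: float, balanced_share: float,
                frontier_share: float) -> float:
    """Expected per-MTok cost under a tiered routing allocation."""
    return (budget_share * BUDGET
            + balanced_share * BALANCED
            + frontier_share * FRONTIER)

mixed = routed_cost(0.85, 0.10, 0.05)
savings = 1 - mixed / FRONTIER
print(round(mixed, 3), f"{savings:.0%}")  # prints "2.026 92%"
```

The result depends heavily on which model anchors the frontier tier: against a cheaper flagship like GPT-5, the same allocation yields smaller (though still large) savings.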

Additional optimization strategies and their savings:

The TALE framework (Token-Budget-Aware LLM Reasoning, ACL 2025) achieves 62–70% output token reduction with minimal accuracy loss. Enterprises are implementing graduated budget enforcement: alerts at 50% utilization, throttling at 80%, model downgrades at 90%, blocking at 100%. Companies with usage-based dashboards reduce unplanned costs by 40% within six months. Gartner data shows enterprises with centralized AI token management programs report 23–30% lower overall costs.
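The graduated enforcement ladder described above maps naturally onto a small policy function. The thresholds come from the text; the action names and function shape are illustrative:

```python
def budget_action(spend: float, budget: float) -> str:
    """Graduated budget enforcement: alert at 50% utilization,
    throttle at 80%, downgrade models at 90%, block at 100%."""
    utilization = spend / budget
    if utilization >= 1.00:
        return "block"
    if utilization >= 0.90:
        return "downgrade"
    if utilization >= 0.80:
        return "throttle"
    if utilization >= 0.50:
        return "alert"
    return "ok"

assert budget_action(4_000, 10_000) == "ok"
assert budget_action(5_500, 10_000) == "alert"
assert budget_action(9_500, 10_000) == "downgrade"
assert budget_action(10_000, 10_000) == "block"
```

In practice this runs per team or per API key, with the "downgrade" tier implemented as forced routing to a budget-class model rather than a hard stop.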

Budget allocation guidance: Development ~15–20% of total; Production ~60–70%. ICONIQ’s research finds inference spend averages 23% of revenue at the scaling stage for AI-native companies.

Total cost of ownership extends far beyond tokens

The six core TCO components identified by enterprise AI cost analyses:

  1. Infrastructure: GPU clusters, auto-scaling, multi-cloud ($200K–$2M+ annually). NVIDIA H100 cloud pricing runs $0.58–$8.54/hour, and a $3M hardware outlay for 100 H100 GPUs represents only 35% of the actual 5-year TCO ($8.6M total including power, cooling, and staff).
  2. Data engineering: Pipeline processing, quality monitoring (25–40% of total spend)
  3. Talent: Specialized AI engineers ($200K–$500K+ compensation per person)
  4. Model maintenance: Drift detection, retraining (15–30% overhead annually)
  5. Compliance and governance: Up to 7% revenue penalty risk; 233 AI-related security incidents in 2024, up 56.4% year-over-year
  6. Integration complexity: 2–3x implementation premium over base model costs
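The infrastructure line item illustrates why hardware alone understates TCO. A sketch reproducing the 100-GPU example; only the $3M hardware figure and $8.6M 5-year total appear in the source, so the ~$1.12M/year opex for power, cooling, and staff is a back-solved assumption:

```python
def five_year_tco(hardware: float, annual_opex: float,
                  years: int = 5) -> dict:
    """Total cost of ownership: one-time capex plus recurring opex."""
    total = hardware + annual_opex * years
    return {"total": total, "hardware_share": hardware / total}

# 100 H100 GPUs: $3M hardware; ~$1.12M/year opex (assumed) brings the
# 5-year total to $8.6M, leaving hardware at ~35% of TCO.
tco = five_year_tco(3_000_000, 1_120_000)
print(f"${tco['total']:,.0f}", f"{tco['hardware_share']:.0%}")  # $8,600,000 35%
```

The same structure applies to the other five components: each adds recurring opex that compounds over the planning horizon rather than a one-time cost.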

LLM API spending hit $8.4 billion by mid-2025, doubling year-over-year. 37% of enterprises spend >$250K/year on LLM APIs alone. Google reportedly spends 10–20x more on inference than training. Average monthly AI infrastructure spending across enterprises reached $85,521 in 2025 (CloudZero), with 45% planning to invest >$100K/month.

Enterprise AI spending and skeptical perspectives

Macro spending data:

Skeptical and contrarian perspectives are mounting: