Enterprise AI cost intelligence: Token economics for the C-Suite
The economics of enterprise AI are defined by a central paradox: per-token costs have collapsed 90–280x since 2022, yet 85% of organizations still misestimate their AI budgets by more than 10%. This disconnect stems from hidden multipliers—data preprocessing, maintenance, integration, and the token-hungry nature of agentic systems—that dwarf raw inference costs.
The 50x cost decline claim is actually conservative
The claim that GPT-4-equivalent inference costs have declined ~50x since late 2022 understates reality. GPT-4 launched in March 2023 at $30/million input tokens and $60/million output tokens, yielding a blended rate of approximately $36/million tokens. Today, multiple models matching or exceeding GPT-4’s capabilities are available at dramatically lower prices:
- DeepSeek V3.2: $0.28 input / $0.42 output per million tokens (blended ~$0.35)
- GPT-4o-mini: $0.15 input / $0.60 output (blended ~$0.26)
- GPT-5 Nano: $0.05 input / $0.40 output (blended ~$0.14)
- Gemini 2.0 Flash: $0.10 input / $0.40 output (blended ~$0.18)
This translates to a 90–140x decline from GPT-4’s launch pricing to the cheapest GPT-4-class models today. Andrew Ng noted in August 2024 that GPT-4 pricing was already declining at ~79% per year. Epoch AI’s research finds LLM inference prices declining between 9x and 900x per year, with a median of 50x per year. The Stanford HAI AI Index Report 2025 provides the most authoritative benchmark: the cost of achieving GPT-3.5-level performance dropped from $20/million tokens to $0.07/million tokens between November 2022 and October 2024, a 280-fold reduction in under two years.
A common formulation of the claim cites “GPT-4-equivalent from $20/million tokens to ~$0.40 today.” The $20 figure likely reflects GPT-4 Turbo pricing (November 2023, ~$10/$30 per MTok, blended ~$15) or a simplified average. The ~$0.40 endpoint is accurate for models such as DeepSeek V3.2 and Gemini 2.0 Flash.
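The blended rates quoted above can be reproduced with a simple weighted average. The sketch below assumes a roughly 3:1 input-to-output token mix, an illustrative workload assumption (not a provider figure) that lands close to the blended numbers cited here.

```python
def blended_rate(input_price, output_price, input_share=0.75):
    """Blended $/MTok for a workload mix.

    The default ~3:1 input:output token ratio is an illustrative
    assumption typical of chat workloads, not a provider figure.
    """
    return input_share * input_price + (1 - input_share) * output_price

# GPT-5 Nano ($0.05 in / $0.40 out) -> ~$0.14 blended
print(round(blended_rate(0.05, 0.40), 2))   # 0.14

# Decline from GPT-4 launch ($30/$60) to GPT-4o-mini ($0.15/$0.60)
decline = blended_rate(30, 60) / blended_rate(0.15, 0.60)
print(round(decline))   # ~143x, consistent with the 90-140x range
```

Shifting `input_share` toward 0.8 reproduces the lower blended figures quoted for output-light workloads; the decline multiple is insensitive to the exact ratio.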
Current pricing across major LLM providers (March 2026)
The pricing spread across major models as of early 2026 spans more than four orders of magnitude from cheapest to most expensive:
Frontier/flagship models (per million tokens, input/output):
| Provider | Model | Input | Output | Context |
|---|---|---|---|---|
| OpenAI | GPT-5.4 | $2.50 | $15.00 | 270K |
| OpenAI | GPT-5 | $1.25 | $10.00 | 128K |
| OpenAI | GPT-5.2 Pro | $21.00 | $168.00 | 200K |
| OpenAI | o1-pro | $150.00 | $600.00 | 200K |
| Anthropic | Claude Opus 4/4.1 | $15.00 | $75.00 | 200K |
| Anthropic | Claude Opus 4.5/4.6 | $5.00 | $25.00 | 200K |
| Anthropic | Claude Sonnet 4/4.5/4.6 | $3.00 | $15.00 | 200K (1M beta) |
| Google | Gemini 3.1 Pro | $2.00 | $12.00 | 1M |
| Google | Gemini 2.5 Pro | $1.25 | $10.00 | 2M |
| DeepSeek | V4 | $0.30 | $0.50 | 128K |
| DeepSeek | V3.2 | $0.28 | $0.42 | 128K |
| xAI | Grok 4 | $3.00 | $15.00 | 2M |
| Mistral | Large 3 | $2.00 | $6.00 | 128K |
Budget/efficiency models:
| Provider | Model | Input | Output | Context |
|---|---|---|---|---|
| OpenAI | GPT-5 Nano | $0.05 | $0.40 | 128K |
| OpenAI | GPT-4o-mini | $0.15 | $0.60 | 128K |
| Google | Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M |
| Google | Gemini 2.0 Flash | $0.10 | $0.40 | 1M |
| Anthropic | Claude Haiku 3.5 | $0.80 | $4.00 | 200K |
| DeepSeek | V3 (original) | $0.14 | $0.28 | 128K |
| Mistral | Nemo | $0.02 | $0.02 | 128K |
| xAI | Grok 4.1 Fast | $0.20 | $0.50 | 2M |
The frequently cited figure of “DeepSeek V3.2 at $0.028/million tokens” requires clarification: $0.028 is the cache-hit input price (a 90% discount on the $0.28 base input price). Standard DeepSeek V3.2 pricing is $0.28/$0.42 per million tokens for input/output respectively. Claude Opus 4 at $75/million output tokens is verified exactly ($15 input / $75 output).
Key cost-saving mechanisms now standard across providers:
- Batch API processing: 50% discount (OpenAI, Anthropic, Google)
- Prompt caching: ~90% savings on cached input tokens
- Combined batch + caching: Up to 95% total savings
- Off-peak pricing (DeepSeek): 50-75% discount during off-peak hours
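The “up to 95%” figure is just multiplicative discount stacking. A minimal sketch, assuming the batch and caching discounts compose independently (actual provider billing rules should be checked):

```python
def effective_price(base, cache_discount=0.90, batch_discount=0.50):
    """Effective $/MTok after prompt caching and batch processing.

    Assumes the two discounts compose multiplicatively; verify
    against each provider's billing rules before relying on this.
    """
    return base * (1 - cache_discount) * (1 - batch_discount)

base = 0.28  # DeepSeek V3.2 base input price, $/MTok
eff = effective_price(base)
print(round(eff, 3))              # 0.014
print(round(1 - eff / base, 2))   # 0.95 -> the "up to 95%" figure
```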
Benchmarkit 2025 survey: verified with caveats
Status: VERIFIED. The “2025 State of AI Cost Management” report was published September 10, 2025 by Benchmarkit in partnership with Mavvrik (an AI cost governance platform), surveying 372 enterprise organizations. Exact findings:
- 85% of organizations misestimate AI costs by more than 10%
- Nearly 24% (one in four) miss forecasts by 50% or more
- 80% of enterprises miss AI infrastructure forecasts by more than 25%
- 84% report significant gross margin erosion tied to AI workloads (>6%)
- More than 25% of companies see gross margin drops of 16% or more
- Half of companies with AI-core products do not track LLM API costs
- Only 35% include on-premises AI costs in reporting
Ray Rike, CEO of Benchmarkit, stated: “These numbers should rattle every finance leader. AI is no longer just experimental—it’s hitting gross margins, and most companies can’t even predict the impact.” The report was widely cited by CIO, CFO Dive, and Yahoo Finance. Caveat: Mavvrik is a cost governance vendor with commercial interest in emphasizing the problem, though the findings align with other independent data from FinOps Foundation, Flexera, and CloudZero.
Supporting data from other sources: the FinOps Foundation State of FinOps 2026 Report (1,192 respondents representing $83B+ in annual cloud spend) found 98% of respondents now manage AI spend, up from 63% in 2025 and 31% in 2024. Flexera’s 2026 State of the Cloud found 85% of organizations have adopted some form of FinOps practices, yet 32% of cloud spend remains wasted. CloudZero reports average monthly AI spending hit $85,521 in 2025, up 36% from $62,964 in 2024, with 45% of organizations now planning to invest >$100K/month on AI.
The 380% cost overrun claim: partially verified
Status: PARTIALLY VERIFIED. The 380% figure appears in a Pertama Partners article titled “AI Project Failure Statistics 2026” (published February 8, 2026), attributed to “MIT Sloan, 2025.” The article states: “Cost overruns average 380% at production scale versus pilot projections.” However, Pertama Partners is an AI consulting firm (not independent research), and the direct MIT Sloan publication containing this specific 380% figure could not be independently located.
Alternative and corroborating cost overrun data:
- SmartDev (2025): Businesses routinely underestimate AI costs by 500–1,000% when scaling from pilot to production
- Gartner (2027 prediction): By 2027, 40% of enterprises using consumption-priced AI coding tools will face unplanned costs exceeding twice expected budgets
- IDC FutureScape 2026: G1000 organizations face up to 30% rise in underestimated AI infrastructure costs by 2027
- Galileo AI: 96% of enterprises report AI costs exceeding initial estimates
- FinOps Foundation: 63% of enterprises exceed AI budgets by at least 30% within year one
- McKinsey: 62% of organizations using token-based AI services experienced at least one month of unexpected cost overruns in first year
- PYMNTS/SearchQ.AI founder: “For every dollar spent on AI models, businesses are spending $5 to $10 to make models production-ready and enterprise-compliant”
The 380% figure is plausible given multiple sources documenting 3x–10x cost escalation from pilot to production, but should be cited with the Pertama Partners attribution rather than directly to MIT Sloan.
Hidden cost multipliers are well-documented
Data preprocessing adding 30–50% to costs: VERIFIED. Multiple independent sources confirm this range:
- Riseup Labs (2026): “Data preparation frequently takes 30–50% of the total AI budget”
- Codewave (2026): 25–35% of project budget for data preparation/labeling
- Enterprise estimates: $100,000–$380,000 typical enterprise data preparation costs
- 96% of businesses begin AI projects without sufficient high-quality training data (USM Systems)
- Data labeling runs $20–50/hour, with enterprises spending $50K–$200K before first model deployment
Annual maintenance at 17–30% of initial development: VERIFIED. Consensus range across sources is 15–30%:
- SumatoSoft (2026): 17–30% with up to 50% worst-case
- Xenoss, Prismetric, Durapid, Glean: 15–25% annually
- 91% of ML models experience degradation over time (MIT research); 75% of businesses observe performance declines without proper monitoring
- For smaller AI applications, maintenance can reach 30–50% of original development cost annually
Additional hidden multipliers identified:
- Safety and governance requirements add 20–35% to total costs
- Compliance retrofitting adds 20–30% budget increases
- Fine-tuning costs: $500–$100,000+ per project depending on scope
- Model development and training: 20–40% of project budget
- Testing and QA: 10–15% of project budget
- Integration and deployment: 10–20% of project budget
- Enterprise implementations typically cost 3–5x the advertised subscription price
- Visible costs represent only 15–20% of total AI expenditures
Open-source economics: strong savings, but market share claim overstated
The claim that open-source serves 45% of total tokens is NOT VERIFIED. The best empirical data comes from the a16z/OpenRouter State of AI study (analyzing 100+ trillion tokens served in 2025), which found open-source models have reached equilibrium at roughly 30% of total token usage—not 45%. MIT Sloan research (January 2026, using OpenRouter data from May–September 2025) found open-source models serving approximately 20% of tokens and capturing only ~4% of revenue. Gartner forecasts 60%+ of businesses will adopt open-source LLMs for at least one application, but adoption ≠ token share.
The 60–70% cost savings claim is VERIFIED and possibly understated:
- MIT Sloan/Nagle & Yue (January 2026): Closed models average $1.86/MTok vs. $0.23/MTok for open models, i.e., open models cost roughly 87% less
- Deloitte: Companies using open-source LLMs save 40% in costs with similar performance
- Market.us: 35% reduction in TCO for firms using open-source vs. full proprietary
- Optimal reallocation could save >70%, worth ~$25 billion annually to the global AI economy
- Open-weight models now close the performance gap within 13 weeks of a frontier model release (down from 27 weeks a year prior), achieving 89.6% of closed-model performance at release
Self-hosting economics: becomes cost-effective at approximately 1 million queries/month, with initial hardware investment ($10,000–$50,000) offset by eliminating API fees within 6–12 months. Medium-scale open models (Llama 3.3 70B) run on 2x A100-80GB GPUs (~$30K hardware) with accuracy within 10% of closed frontier models. Early adopters report 50–70% GPU cost reductions through optimized self-hosting.
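The break-even claim above reduces to a payback calculation. The sketch below uses hypothetical per-query token counts and API prices (not figures from this report); real break-even depends on utilization, power, and staffing.

```python
def payback_months(hardware_cost, queries_per_month,
                   tokens_per_query, api_price_per_mtok,
                   monthly_hosting_opex):
    """Months to recoup self-hosting hardware vs. paying API fees.

    All inputs are illustrative assumptions, not benchmark data.
    """
    api_cost = queries_per_month * tokens_per_query / 1e6 * api_price_per_mtok
    monthly_savings = api_cost - monthly_hosting_opex
    if monthly_savings <= 0:
        return float("inf")  # self-hosting never pays back at this volume
    return hardware_cost / monthly_savings

# 2x A100 box (~$30K), 1M queries/mo, ~2K tokens/query, $3/MTok API,
# $2K/mo hosting opex -> payback inside the 6-12 month window
print(payback_months(30_000, 1_000_000, 2_000, 3.00, 2_000))   # 7.5
```

At an order of magnitude fewer queries the same inputs never pay back, which is why the ~1 million queries/month threshold matters.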
The “average price paid per token” paradox is real
VERIFIED by a December 2025 academic paper by Andrey Fradkin (“The Emerging Market for Intelligence: Pricing, Supply, and Demand for LLMs”), analyzing OpenRouter data covering 100+ trillion tokens. The key finding: “The average price paid per token has remained relatively constant, consistent with demand for superior intelligence.”
The mechanism is straightforward: when a new state-of-the-art model is released, 99% of demand immediately shifts to it (per ikangai.com’s “The LLM Cost Paradox” analysis). Reasoning models like o1, o3, and GPT-5 generate thousands of internal “thinking” tokens, making total cost per task higher despite lower per-token prices. The pricing spread between the cheapest model (Mistral Nemo at $0.02/MTok) and the most expensive (o1-pro at $375/MTok blended) now exceeds 18,000x—organizations that want frontier capabilities pay frontier prices.
The Fradkin paper also finds that short-run demand elasticities in the API market are “unlikely to justify the Jevons Paradox”—price drops don’t cause proportional increases in total token demand. The a16z team confirms this with “LLMflation” analysis: while inference costs decline ~10x per year for equivalent performance, the newest frontier models (like o1) have the same cost per output token ($60/M) as GPT-3 at launch.
Token budgeting frameworks for enterprise CFOs
Enterprise token budgeting is an emerging but rapidly maturing discipline. Key frameworks and data points:
Model routing is the most impactful single strategy: routing simple tasks to budget models and complex tasks to frontier models can cut costs by 60–90%. Approximately 85% of enterprise queries can be handled by budget-tier models. An allocation of 85% budget / 10% balanced / 5% frontier models yields ~92% savings versus frontier-only usage.
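The routing arithmetic can be checked directly. This sketch uses illustrative blended tier prices of its own choosing (not quotes from the tables above) to show how an 85/10/5 mix lands near 90%+ savings:

```python
def mix_cost(shares, prices):
    """Expected $/MTok for a traffic mix across model tiers."""
    assert abs(sum(shares) - 1.0) < 1e-9, "shares must sum to 1"
    return sum(s * p for s, p in zip(shares, prices))

# 85% budget / 10% balanced / 5% frontier; hypothetical blended prices
shares = [0.85, 0.10, 0.05]
prices = [0.30, 2.00, 12.00]   # $/MTok, illustrative tier prices
routed = mix_cost(shares, prices)
savings = 1 - routed / prices[-1]   # vs. sending everything to frontier
print(f"{savings:.0%}")   # 91%
```

The exact savings figure moves with the assumed tier prices; the structural point is that the frontier tier dominates cost, so shrinking its share drives nearly all of the reduction.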
Additional optimization strategies and their savings:
- Semantic caching: 40–60% reduction in redundant API calls
- Prompt optimization: 30–50% token reduction without quality loss
- Prompt compression (LLMLingua): Up to 20x compression with 1.5% performance loss
- Model distillation: 50–85% cost reduction for specific tasks
- Combined stacking (caching + routing + compression): 70–90% total cost reduction
The TALE framework (Token-Budget-Aware LLM Reasoning, ACL 2025) achieves 62–70% output token reduction with minimal accuracy loss. Enterprises are implementing graduated budget enforcement: alerts at 50% utilization, throttling at 80%, model downgrades at 90%, blocking at 100%. Companies with usage-based dashboards reduce unplanned costs by 40% within six months. Gartner data shows enterprises with centralized AI token management programs report 23–30% lower overall costs.
Budget allocation guidance: Development ~15–20% of total; Production ~60–70%. ICONIQ’s research finds inference spend averages 23% of revenue at the scaling stage for AI-native companies.
Total cost of ownership extends far beyond tokens
The six core TCO components identified by enterprise AI cost analyses:
- Infrastructure: GPU clusters, auto-scaling, multi-cloud ($200K–$2M+ annually). NVIDIA H100 cloud pricing: $0.58–$8.54/hour. 100 H100 GPUs at $3M hardware is only 35% of actual 5-year TCO ($8.6M total with power, cooling, staff).
- Data engineering: Pipeline processing, quality monitoring (25–40% of total spend)
- Talent: Specialized AI engineers ($200K–$500K+ compensation per person)
- Model maintenance: Drift detection, retraining (15–30% overhead annually)
- Compliance and governance: Up to 7% revenue penalty risk; 233 AI-related security incidents in 2024, up 56.4% year-over-year
- Integration complexity: 2–3x implementation premium over base model costs
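The hardware-share arithmetic in the infrastructure line above generalizes: if acquisition is a known fraction of TCO, total TCO is hardware cost divided by that fraction. A minimal sketch, with the 35% share taken from the H100 example (actual shares vary with power prices, staffing, and utilization):

```python
def five_year_tco(hardware_cost, hardware_share=0.35):
    """Estimate total 5-year TCO when hardware is a known share of it.

    The 0.35 default mirrors the 100x H100 example above; it is an
    assumption to adjust, not a universal constant.
    """
    return hardware_cost / hardware_share

# $3M of GPUs at a 35% hardware share -> ~$8.6M total 5-year TCO
print(round(five_year_tco(3_000_000) / 1e6, 1))   # 8.6
```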
LLM API spending hit $8.4 billion by mid-2025, doubling year-over-year. 37% of enterprises spend >$250K/year on LLM APIs alone. Google reportedly spends 10–20x more on inference than training. Average monthly AI infrastructure spending across enterprises reached $85,521 in 2025 (CloudZero), with 45% planning to invest >$100K/month.
Enterprise AI spending and skeptical perspectives
Macro spending data:
- Gartner: Worldwide AI spending will total $2.52 trillion in 2026 (44% YoY increase)
- GenAI spending reached $644 billion in 2025 (76.4% YoY growth), with 80% going to hardware
- Menlo Ventures: Companies spent $37 billion on generative AI in 2025 (3.2x YoY from $11.5B)
- Stanford HAI: Total corporate AI investment hit $252.3 billion in 2024 (26% YoY increase)
Skeptical and contrarian perspectives are mounting:
- Goldman Sachs Chief Economist Jan Hatzius (February 2026): AI’s impact on the US economy was “basically zero” in 2025, with “no meaningful relationship between productivity and AI adoption at the economy-wide level”—though noting a median 30% productivity gain in coding and customer service specifically
- Goldman Sachs (November 2025): $19 trillion in market value “running well ahead of actual economic impact”
- Sequoia Capital’s David Cahn: The AI industry needs $600 billion in annual revenue to justify current infrastructure spending—a gap that has tripled from $200B to $600B in 12 months
- J.P. Morgan: AI needs to generate over $600 billion in annual revenue just to achieve a 10% return on infrastructure expenditures
- McKinsey: Only 6% of organizations qualify as “AI high performers” (>5% EBIT impact); only 23% see AI delivering any favorable change in costs
- RAND Corporation: 80.3% overall AI project failure rate
- MIT NANDA (August 2025): ~95% of GenAI pilot programs fail to deliver measurable business value or reach production
- Over 80% of companies report no productivity gains from AI despite billions in investment