Deerfield Green
Enterprise AI Economics

The Jevons Paradox of AI

Article 3: The Jevons paradox of AI—how 280x cheaper tokens are bankrupting enterprise budgets

Core thesis: LLM inference costs have plummeted 280x since November 2022, yet enterprise generative AI spending exploded from $11.5 billion to $37 billion (3.2x) in a single year. The culprit is a structural Jevons paradox: cheaper tokens don’t reduce bills—they unleash entirely new consumption patterns. Agentic AI workflows consume 5–50x more tokens per task, reasoning models generate thousands of invisible “thinking tokens,” and 96% of enterprises report AI costs exceeding projections. This article uses the calculator’s volume slider to show how token consumption at scale obliterates per-unit savings, and why the input:output ratio is the most underappreciated cost variable in AI.

The 280x cost reduction is real and accelerating. Stanford HAI’s AI Index 2025 documents the price of GPT-3.5-equivalent performance dropping from $20.00/M tokens (November 2022) to $0.07/M tokens (October 2024). Epoch AI found price decline rates of 9x to 900x per year depending on the benchmark, with the median accelerating to 200x/year after January 2024. GPT-4’s input pricing fell 92% from launch ($30/M) to today’s GPT-4o ($2.50/M). Gartner forecasts that by 2030, inference on a 1-trillion-parameter model will cost 90%+ less than in 2025.
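
To ground the annualized rates, here is a minimal Python sketch of the arithmetic, assuming a constant exponential decline between the two Stanford HAI price points (the 23-month window is an approximation):

```python
from math import log, exp

# Stanford HAI data points cited above: price per million tokens for
# GPT-3.5-equivalent performance.
p_start, p_end = 20.00, 0.07        # $/M tokens
months = 23                          # Nov 2022 -> Oct 2024 (approx.)

total_drop = p_start / p_end         # ~286x cumulative reduction
annual_rate = exp(log(total_drop) * 12 / months)

print(f"cumulative: {total_drop:.0f}x, annualized: ~{annual_rate:.0f}x/year")
# -> cumulative: 286x, annualized: ~19x/year. That sits inside Epoch AI's
#    9x-900x range, which is why a median accelerating to 200x/year after
#    January 2024 marks such a shift.
```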

But enterprise bills are exploding, not shrinking. Enterprise generative AI spending surged from $11.5 billion (2024) to $37 billion (2025), a 3.2x increase over the very period in which per-token costs were collapsing. CloudZero reports average monthly AI budgets rose 36% to $85,521, with the share of organizations spending over $100K/month doubling to 45% of the market. Andreessen Horowitz’s survey of 100 CIOs found they expect LLM budgets to grow roughly 75% over the next year. One enterprise leader told a16z: “What I spent in 2023 I now spend in a week.” Gartner’s own analysts warn: “Enterprises won’t see a direct benefit in passed-down savings, particularly as demand increases for frontier capabilities such as agentic AI, which require more tokens per task.”

Three forces drive the paradox. First, application proliferation: falling costs removed financial gatekeeping, pushing the median enterprise from 1–2 AI apps to dozens of use cases across marketing, sales, legal, HR, and finance. Second, volume per application: one customer service AI grew from 500 to 15,000 interactions/day and from 800 to 4,500 tokens per interaction, plus 3–5 follow-up inference calls each. Third, model complexity escalation: when a new frontier model launches, “99% of demand immediately shifts to it.” A simple query on a budget model costs 7 tokens; the same query with aggressive reasoning (Grok-4) costs 603 tokens, an 86x increase.
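
A back-of-envelope sketch of the second force, using the customer service figures above. The source gives no per-follow-up token count, so assuming each follow-up call costs as much as the primary one is our hypothesis:

```python
# Force #2, volume per application: the customer service AI cited above.
DAYS = 30

# Before: 500 interactions/day at 800 tokens each, no follow-up calls.
before = 500 * 800 * DAYS                        # 12M tokens/month

# After: 15,000 interactions/day at 4,500 tokens, plus ~4 follow-up
# inference calls each. ASSUMPTION: each follow-up costs as much as the
# primary call (the source gives no per-follow-up figure).
followups = 4
after = 15_000 * 4_500 * (1 + followups) * DAYS  # ~10.1B tokens/month

print(f"before: {before/1e6:.0f}M/mo, after: {after/1e9:.2f}B/mo, "
      f"growth: {after/before:,.0f}x")
# -> before: 12M/mo, after: 10.12B/mo, growth: 844x. Per-token prices
#    would have to fall ~844x just to keep this one app's bill flat.
```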

Agentic AI is the token multiplication bomb. An OpenReview empirical study of coding agents found up to 10x variance in token consumption across runs of the same task, with millions of tokens consumed per agentic coding session. A Reflexion loop running 10 cycles consumes 50x the tokens of a single linear pass. Claude Code’s “Max Unlimited” tier saw users consume 10 billion tokens in a single month, equivalent to 12,500 copies of War and Peace. Projected 24-hour autonomous agent runs could cost $4,320/day per user. Perhaps most revealing: JSON structural characters alone account for 40% of total token spend in agentic API calls; one logistics firm cut per-query costs from $0.12 to $0.008 (15x) through data distillation.
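
Why does a 10-cycle loop cost on the order of 50x a single pass? Each cycle re-submits the accumulated transcript as input, so per-cycle input grows linearly and the cumulative total grows quadratically. A minimal sketch; the prompt, draft, and critique sizes are illustrative assumptions, not figures from the study:

```python
# Why a 10-cycle Reflexion-style loop costs ~50x a single pass: every
# cycle re-reads the growing transcript. All token sizes here are
# ILLUSTRATIVE assumptions, not figures from the cited study.
PROMPT, DRAFT, CRITIQUE = 2_000, 1_000, 1_000   # tokens (assumed)

single_pass = PROMPT + DRAFT                     # 3,000 tokens

total, context = 0, PROMPT
for cycle in range(10):
    generated = DRAFT + CRITIQUE    # new attempt + self-reflection
    total += context + generated    # input (re-sent context) + output
    context += generated            # transcript grows every cycle

print(f"single pass: {single_pass:,} tokens, 10-cycle loop: {total:,} "
      f"tokens ({total/single_pass:.0f}x)")
# -> 3,000 vs 130,000 tokens, ~43x here; add tool outputs and retrieved
#    context to the transcript and 50x is easy to hit.
```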

The budget unpredictability problem is structural. IDC reports 96% of enterprises say AI costs exceed initial projections. Only 44% have financial guardrails for AI. An estimated $400 million in unbudgeted AI cloud spend leaks annually across the Fortune 500. Startups report bills exploding from $10,000 to $100,000 in just three months. As one analyst put it: “A single product manager deciding to add ‘AI-powered insights’ to a dashboard can commit the organization to millions in inference costs.” The FinOps Foundation now recommends weekly or monthly rolling forecasts—not quarterly—because AI cost variance breaks traditional planning cycles.
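
The FinOps recommendation lends itself to a small sketch: a weekly rolling forecast that refits on recent actuals instead of locking in a quarterly plan. The spend series and smoothing factor below are hypothetical, for illustration only:

```python
# A minimal weekly rolling forecast, in the spirit of the FinOps
# recommendation above. Spend series and alpha are illustrative.
def rolling_forecast(weekly_spend, alpha=0.5):
    """Exponentially weighted forecast of next week's AI spend."""
    level = weekly_spend[0]
    for actual in weekly_spend[1:]:
        level = alpha * actual + (1 - alpha) * level
    return level

spend = [10_000, 14_000, 21_000, 34_000, 52_000]   # $/week, hypothetical
print(f"next-week forecast: ${rolling_forecast(spend):,.0f}")
# -> $38,625. A quarterly plan locked in at week 1 would still assume
#    ~$10K/week; the rolling forecast tracks the blow-up within weeks.
```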

Satya Nadella himself invoked the Jevons paradox after DeepSeek’s disruption: “As AI gets more efficient and accessible, we will see its use skyrocket, turning it into a commodity we just can’t get enough of.” Researchers Zhang & Zhang formalized a “Structural Jevons Paradox” in a January 2026 paper: falling API prices don’t just increase volume—they induce developers to “endogenously redesign their agent architectures to consume dramatically more compute,” adopting deeper reasoning loops and larger context windows that were previously uneconomical.

Contrarian finding: Despite the Jevons paradox narrative, the subscription squeeze suggests providers themselves can’t absorb the consumption that cheap tokens unleash. Claude Code’s $200/month unlimited tier “got obliterated” by user consumption. Every AI company faces a prisoner’s dilemma between sustainable usage-based pricing and competitive flat-rate pricing. And a Northeastern University economist argues the 160-year-old Jevons analogy may be imperfect for AI because DeepSeek’s efficiency gains emerged from chip restrictions (legislative supply constraints), not natural market dynamics. Additionally, while inference costs plunge, training costs continue to double every ~8 months; current inference pricing may be subsidized by market-share competition, and a reckoning could come.

Calculator integration angle: The volume slider from 1M to 1B tokens is this article’s star feature. Show how a seemingly modest agentic workflow (100 tasks/day × 200,000 tokens/task × 30 days = 600M tokens/month) lands in the calculator’s upper range. Then toggle the input:output ratio: at 1:1 on GPT-5 ($1.25/$10.00), the blended rate is $5.63/M; at 1:5 (output-heavy agentic work), it jumps to $8.54/M, a 52% increase from ratio alone. The tier comparison shows how the same 600M tokens cost $3,378 on GPT-4.1 Nano, $5,118 on GPT-5, or $56,700 on GPT-5.2 Pro, a 17x spread that one product decision can trigger.
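
The ratio math above reduces to a weighted average of input and output prices. A sketch that reproduces the GPT-5 figures (model prices as listed above; the helper functions are ours, not the calculator’s code):

```python
# Blended $/M-token rate for a given input:output ratio, reproducing the
# GPT-5 numbers above ($1.25/M input, $10.00/M output).
def blended_rate(input_price, output_price, in_parts, out_parts):
    total = in_parts + out_parts
    return (input_price * in_parts + output_price * out_parts) / total

def monthly_cost(tasks_per_day, tokens_per_task, days, rate_per_m):
    return tasks_per_day * tokens_per_task * days / 1e6 * rate_per_m

even  = blended_rate(1.25, 10.00, 1, 1)   # 1:1 ratio
heavy = blended_rate(1.25, 10.00, 1, 5)   # 1:5, output-heavy agentic work

print(f"1:1 -> ${even:.3f}/M   1:5 -> ${heavy:.2f}/M "
      f"(+{heavy/even - 1:.0%} from ratio alone)")
print(f"600M tokens/month at 1:5 on GPT-5: "
      f"${monthly_cost(100, 200_000, 30, heavy):,.0f}")
# -> $5.625/M (the calculator rounds to $5.63) vs $8.54/M (+52%);
#    ~$5,125/month, in line with the $5,118 the tier comparison shows
#    (the small gap is presumably rounding in the calculator UI).
```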

Key data points to feature: