Why your AI bills are going up (even as tokens get cheaper) 📉💸

The End of the AI Experiment: Surviving the CFO’s New ROI Demands

Why This Has Become an Executive Issue

Why is AI spend no longer just an IT budget problem? AI has crossed a threshold where aggregate spend across every department requires capital allocation discipline, not just software procurement review. Every function now has a case for AI investment, and someone has to decide which requests deserve ongoing funding. That decision has landed with the CFO, which means technology leaders who frame AI proposals as feature requests will lose funding to peers who can demonstrate measurable business outcomes.

What do “tokenomics” and “tokenmaxxing” actually mean in practice? Tokenomics is simply the practical economics of AI usage: how prompts, automated workflows, and background agents translate into real spending, and whether that spending is producing value. Tokenmaxxing is the emerging habit of pushing more work through AI because tokens feel cheap, or because high-consumption workflows appear more productive. The instinct can be rational, but it creates a governance problem because organizations need a way to distinguish productive consumption from wasteful consumption, and most have not built that capability yet.

Why are AI bills climbing even though token prices keep falling? Lower unit prices are encouraging more consumption, not less. As tokens get cheaper, teams build more ambitious systems: more automation, heavier context, and always-on agents running in the background. The marginal cost of any single query feels negligible, so consumption expands to fill whatever budget exists. Organizations focused purely on negotiating lower unit prices while ignoring how their systems are designed will find their total bills climbing regardless.

Why is CFO scrutiny intensifying right now? The broad experimentation phase is ending. Many organizations have deployed AI in some form, but far fewer believe those deployments have produced tangible value. Once that gap becomes visible, finance teams stop treating AI as a learning exercise and start demanding evidence for continued investment. The funding logic shifts from supporting a large portfolio of loosely defined experiments to concentrating resources on fewer workflows with a clear payback case.

What Leaders Should Actually Govern

What is the right unit of control: seats, teams, vendors, or workflows? The most useful unit of governance is the individual application or workflow, not the software seat or department budget. AI costs are generated by usage patterns, not by who holds a license. A single automated workflow can quietly consume more tokens than dozens of human users combined. Budgeting at the workflow level makes it possible to see which use cases are scaling, which are overrunning, and which should be redesigned or shut down.

When do spending caps help, and when do they backfire? Caps help when they prevent undisciplined growth in low-value usage, particularly when nobody can explain where the spending is coming from. They backfire when they suppress the most productive work. If your highest-consuming teams are also your highest-performing ones, a blanket ceiling is a tax on performance dressed up as financial discipline. The right sequence is to instrument outcomes first, then decide where controls belong.

What should leaders actually ask when a vendor proposes outcome-based pricing? Outcome-based pricing sounds appealing because it appears to align vendor incentives with business results. That alignment is not automatic. It depends entirely on how the outcome is defined, how success is verified, and what happens when the system produces something that technically triggers a charge but does not create real value. Leaders should ask who defines what counts as a valid outcome, how disputes are handled, and whether the vendor has any incentive to maximize billable events in ways that diverge from the customer’s actual objective.

Why do different AI pricing models need different governance approaches? Not all AI spend behaves the same way. Subscription pricing buys predictability but can conceal waste inside a flat fee. Usage-based pricing makes activity visible but creates volatile invoices. Outcome-based pricing sounds more business-friendly, but it can obscure the operational work required to verify whether the billed result was correct, complete, and valuable. The shift toward seats-plus-consumption adds another complication: buyers may renew a familiar per-seat contract while also taking on usage charges, credits, agent actions, or outcome fees that behave very differently. Leaders need governance that matches how value is claimed, how cost is incurred, and how performance can fail. Otherwise, they risk optimizing the old pricing model while their real exposure has already moved somewhere else.

The seat is no longer the product. Increasingly, it is just the wrapper around prepaid consumption.

Visibility: The Prerequisite for Everything Else

What is the single most important governance gap right now? Attribution. Most organizations cannot answer the basic question of which team, workflow, or agent is consuming how many tokens, and what business outcome that consumption supports. Without that visibility, every other governance mechanism, whether caps, chargebacks, or ROI thresholds, operates on incomplete information. Solving attribution is the prerequisite for everything else.
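To make attribution concrete, here is a minimal sketch of the idea: every model call is recorded against a team, workflow, and agent at the moment the tokens are spent. The ledger structure, identifiers, and numbers are illustrative assumptions, not any vendor’s API.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class UsageLedger:
    """Accumulates token usage keyed by (team, workflow, agent)."""
    totals: dict = field(default_factory=lambda: defaultdict(int))

    def record(self, team: str, workflow: str, agent: str,
               input_tokens: int, output_tokens: int) -> None:
        # Attribute every call to the unit of governance (the workflow),
        # not the person holding the license.
        self.totals[(team, workflow, agent)] += input_tokens + output_tokens

    def report(self):
        # Heaviest consumers first, so anomalies surface immediately.
        return sorted(self.totals.items(), key=lambda kv: -kv[1])

ledger = UsageLedger()
ledger.record("support", "ticket-triage", "summarizer", 1_200, 300)
ledger.record("eng", "code-review-bot", "reviewer", 45_000, 9_000)
for (team, workflow, agent), tokens in ledger.report():
    print(f"{team}/{workflow}/{agent}: {tokens:,} tokens")
```

Once every call carries these tags, caps, chargebacks, and ROI thresholds all have something real to operate on.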

What does good visibility infrastructure actually look like? It means purpose-built dashboards that surface per-workflow and per-agent consumption in near real time, not month-end invoices that arrive with no way to trace costs back to specific decisions or teams. Salesforce, for example, expanded its internal Engineering 360 dashboards to track AI usage at the workflow and team level; it is a good illustration of how companies often need custom visibility tooling when standard reporting does not give leaders a clear view of token consumption, agent activity, and adoption patterns. This is an area where early investment in custom observability pays off rather than waiting for the vendor ecosystem to catch up.

How does token consumption become a productivity signal rather than just a cost metric? High token consumption and high-quality output often correlate. Before setting any controls, connect token spend to actual business outcomes: deals closed, issues resolved, code shipped, churn prevented. Once you have that picture, invest more in the high-correlation workflows and scrutinize the rest. Organizations that skip this step and go straight to spending ceilings risk penalizing their most productive teams first.
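As a rough sketch of that step, with invented workflows and numbers: once spend and outcomes are joined at the workflow level, cost-per-outcome makes the outliers obvious.

```python
# Hypothetical monthly figures joining token spend (USD) to the business
# outcome each workflow is meant to drive. All numbers are invented.
workflows = {
    "sales-email-agent":  {"spend": 1_800, "outcome": 42,  "unit": "deals assisted"},
    "support-triage":     {"spend": 5_200, "outcome": 310, "unit": "tickets resolved"},
    "code-review-bot":    {"spend": 9_400, "outcome": 128, "unit": "PRs shipped"},
    "meeting-summarizer": {"spend": 3_100, "outcome": 12,  "unit": "decisions logged"},
}

# Cost per outcome is the signal: the biggest spender can still be the
# cheapest per unit of value, and a modest line item can be the real waste.
for name, w in sorted(workflows.items(),
                      key=lambda kv: kv[1]["spend"] / kv[1]["outcome"]):
    per_unit = w["spend"] / w["outcome"]
    print(f"{name}: ${w['spend']:,} for {w['outcome']} {w['unit']} "
          f"(${per_unit:,.0f} each)")
```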

Practical Governance Mechanisms That Work

What is the most actionable governance step we can take right now? Set per-application token budgets with automated alerting thresholds, and require cost-impact assessments for any new AI feature before it ships. Build that review into sprint planning rather than treating it as a finance team afterthought. This embeds financial discipline into the development process rather than bolting it on after costs have already run up.
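A minimal sketch of such a budget check, with illustrative budgets and alert thresholds rather than recommended values:

```python
def budget_alerts(app: str, tokens_used: int, monthly_budget: int,
                  thresholds: tuple = (0.5, 0.8, 1.0)) -> list:
    """Return the alert messages this app has crossed so far this month.

    Budgets and thresholds are illustrative; route the output to whatever
    paging or chat tool the team already uses.
    """
    ratio = tokens_used / monthly_budget
    return [f"[{app}] at {ratio:.0%} of monthly token budget "
            f"(crossed {t:.0%} threshold)"
            for t in thresholds if ratio >= t]

# Example: a workflow budgeted at 50M tokens/month has burned 44M by the 20th.
for alert in budget_alerts("ticket-triage", 44_000_000, 50_000_000):
    print(alert)
```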

What are FinOps practices and why do they matter for AI? FinOps is the discipline of bringing financial accountability to technology spend through collaboration between engineering, finance, and business teams. Applied to AI, it means forecasting token demand before projects launch, setting ROI approval gates for competing use cases, and implementing chargebacks so business units bear the actual cost of their own consumption. The chargeback mechanism in particular creates real incentives for teams to ask whether their usage is justified.
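A chargeback can start as simply as metering each unit’s tokens against a blended rate; the figures below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical metered usage for one month: (business_unit, tokens consumed).
usage = [
    ("sales",       12_000_000),
    ("support",     48_000_000),
    ("engineering", 90_000_000),
]

BLENDED_RATE = 3.00  # assumed blended $ per 1M tokens, for illustration only

chargeback = defaultdict(float)
for unit, tokens in usage:
    chargeback[unit] += tokens / 1_000_000 * BLENDED_RATE

# Each business unit sees, and pays, its own number.
for unit, amount in sorted(chargeback.items(), key=lambda kv: -kv[1]):
    print(f"{unit}: ${amount:,.2f} charged back this month")
```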

If your highest-consuming teams are also your highest-performing ones, a blanket spending cap is just a tax on performance dressed up as financial discipline.

How should infrastructure choices factor into AI cost governance? Stop treating all AI workloads as equivalent from a cost perspective. Public cloud is the right choice for experimentation and burst capacity where flexibility justifies the premium. Predictable, high-volume inference workloads are better suited to private or on-premises infrastructure where fixed costs outperform consumption pricing over time. Defaulting everything to public cloud absorbs a premium that compounds significantly as workloads scale.
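The tradeoff reduces to a breakeven calculation. The sketch below uses placeholder figures, not real quotes; the point is the structure of the comparison, not the numbers.

```python
def breakeven_tokens_per_month(cloud_rate_per_million: float,
                               onprem_fixed_monthly: float,
                               onprem_marginal_per_million: float) -> float:
    """Monthly token volume above which owned inference capacity is cheaper.

    All figures are placeholders: plug in your actual cloud quotes and the
    amortized cost of your own hardware.
    """
    saving_per_million = cloud_rate_per_million - onprem_marginal_per_million
    return onprem_fixed_monthly / saving_per_million * 1_000_000

# E.g. $3.00/1M tokens in the cloud vs. $40k/month of amortized on-prem
# capacity with a $0.50/1M marginal serving cost:
volume = breakeven_tokens_per_month(3.00, 40_000, 0.50)
print(f"breakeven: {volume / 1e9:,.0f}B tokens/month")
```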

Procurement and Organizational Risk

Our vendor contracts are still per-seat. Is that a problem? Yes. Per-seat pricing no longer maps cleanly to how AI systems generate costs. In many AI-heavy products, the seat is becoming a wrapper around a base level of included usage rather than a reliable proxy for total cost. Every prompt, automated workflow, and background agent can burn tokens regardless of how many people are licensed, creating invoice volatility that per-seat budgeting cannot predict. Push for hybrid models that combine a predictable baseline fee with usage-based pricing above agreed thresholds, with explicit price caps, volume commitments, reporting rights, and overage terms built in.
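To see how such a hybrid contract behaves, here is a worked sketch; the function and every parameter are invented contract terms, not any vendor’s rate card.

```python
def monthly_invoice(seats: int, seat_price: float, tokens_used: int,
                    included_per_seat: int, overage_per_million: float,
                    price_cap: float | None = None) -> float:
    """Seats-plus-consumption bill: baseline fee plus metered overage.

    Every parameter here is an illustrative contract term, not any
    vendor's actual pricing.
    """
    base = seats * seat_price
    overage_tokens = max(0, tokens_used - seats * included_per_seat)
    total = base + overage_tokens / 1_000_000 * overage_per_million
    # A negotiated price cap converts unbounded usage risk into a known maximum.
    return min(total, price_cap) if price_cap is not None else total

# 100 seats at $30, 5M tokens included per seat, $4/1M overage, $15k cap:
# $3,000 base plus 400M overage tokens at $4/1M comes to $4,600.
print(f"${monthly_invoice(100, 30, 900_000_000, 5_000_000, 4.00, 15_000):,.2f}")
```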

What changes when a seat becomes a consumption bundle? The license still matters because it controls access, but it no longer tells you enough about cost. Two teams with the same number of seats can generate very different bills if one uses AI for occasional drafting and the other runs context-heavy agents across customer support, software development, or security workflows. Procurement teams therefore need to negotiate included usage, overage rates, usage reporting, and contractual limits on unexpected consumption. The buying question shifts from “how many people need access?” to “how much machine work are we authorizing?”

What is the governance maturity gap for agentic AI? Agentic AI refers to systems that take sequences of actions autonomously rather than responding to a single prompt. That matters economically because an agent is not naturally a seat-based user. It performs tasks, calls tools, consumes tokens, and may keep working after the human has stepped away. Research suggests only about one in five organizations planning to deploy agentic AI has a mature governance model in place. Without clear accountability structures and performance metrics, organizations accumulate what practitioners call “content debt,” meaning AI-generated outputs requiring human remediation that erode the ROI case for further investment. Building governance before you scale is significantly cheaper than retrofitting it after problems surface.

How should we frame AI cost governance to get board-level attention? Frame it as a competitive risk, not a budget management problem. Unmanaged AI consumption erodes margins in a way that compounds over time, and organizations that govern their AI economics well will have a structural cost advantage over those that do not. Tokens are becoming a real operational input, and treating them with the same rigor applied to energy procurement or capital expenditure is not optional for organizations that intend to scale AI seriously.


๐ŸŽ—๏ธCerebras IPO๐ŸŽ—๏ธ

Cerebras is going public this week, a milestone for an AI infrastructure company I have followed since its early days. I first met CEO Andrew Feldman in early 2018, before Cerebras had released its first processors and when the company was still focused mainly on AI training. After its first-generation chip came out, one of the first talks the team gave was at a conference I co-chaired in 2019. What makes this IPO especially interesting now is Cerebras’s growing focus on inference, the work of running trained AI models to produce answers, code, images, or other outputs. That shift matters as more enterprises move AI into production and as reasoning models use more compute while generating responses, not just during training. For those of us who build, buy, or use AI applications, another strong, speed-focused alternative to Nvidia is welcome news.
