When AI Power Moves to Inference
When I first flagged inference scaling (the strategic surge of computational muscle during AI's operational phase rather than its training phase), it was clear we were witnessing a pivotal shift. Unlike traditional methods focused solely on training ever-larger models, inference scaling allocates compute dynamically at runtime, empowering AI to reason more deeply, evaluate multiple candidate answers, and produce outputs of far greater sophistication.
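To make the mechanism concrete, here is a minimal sketch of one common inference-scaling pattern, best-of-N sampling against a scoring model. The generate and score functions are hypothetical stand-ins for whatever model and verifier an organization actually runs, and this is only one pattern among several (longer reasoning chains, tree search, and majority voting are others).

```python
# Minimal sketch of best-of-N sampling: spend more compute at inference time
# by drawing several candidate answers and keeping the best-scored one.
# `generate` and `score` are hypothetical stand-ins for a model call and a
# verifier/reward model; the point is that quality (and cost) rises with N.

from typing import Callable

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int = 8) -> str:
    """Draw n candidate completions and return the highest-scoring one."""
    candidates = [generate(prompt) for _ in range(n)]        # n model calls
    return max(candidates, key=lambda c: score(prompt, c))   # one verifier pass each

# Doubling n roughly doubles inference cost for the same trained model --
# capability is bought at deployment time rather than at training time.
```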
But this leap forward comes at a steep cost. The substantial performance gains in complex reasoning tasks mean significantly higher operational expenses, forcing organizations into careful strategic decisions about resource allocation. Those who navigate this trade-off successfully won’t just gain superior AI capabilities—they’ll secure a decisive competitive advantage.
Inference scaling is fast emerging as the optimization method of choice, reshaping not only technological approaches but entire business models. Ignoring this shift is no longer viable for anyone committed to leading in the AI-driven future.
Inference Scaling: The Governance Blind Spot
Inference scaling fundamentally challenges existing AI governance frameworks, which focus primarily on training-based compute thresholds. Current regulations, such as the EU AI Act (whose systemic-risk obligations are triggered at 10²⁵ FLOP of training compute) and the recent US Executive Order on AI (which sets a reporting threshold at 10²⁶ FLOP), rely heavily on monitoring the computational power used during the initial training phase. However, inference scaling allows models that were initially trained below these regulatory thresholds to later achieve advanced capabilities by allocating substantial computational resources during their operational use, a gap not addressed by current oversight mechanisms.
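A rough back-of-the-envelope illustration of that gap, using the common approximations of about 6 × parameters × tokens FLOP for training and about 2 × parameters FLOP per generated token at inference; every concrete figure below (model size, token counts, query volume) is invented purely for illustration.

```python
# Back-of-the-envelope: a model can sit below the training-compute thresholds
# yet accumulate enormous compute during deployment. Training FLOP uses the
# common ~6 * params * tokens rule of thumb; a forward pass costs roughly
# 2 * params FLOP per token. All concrete numbers here are hypothetical.

EU_THRESHOLD = 1e25   # EU AI Act systemic-risk trigger (training FLOP)
US_THRESHOLD = 1e26   # US Executive Order reporting threshold (training FLOP)

params = 70e9                      # hypothetical 70B-parameter model
train_tokens = 10e12               # hypothetical 10T training tokens
train_flop = 6 * params * train_tokens          # ~4.2e24 FLOP: below both thresholds

tokens_per_query = 50_000          # a long, inference-scaled reasoning trace
queries_per_day = 10_000_000
inference_flop_per_year = 2 * params * tokens_per_query * queries_per_day * 365

print(f"training FLOP:         {train_flop:.1e} (below 1e25? {train_flop < EU_THRESHOLD})")
print(f"yearly inference FLOP: {inference_flop_per_year:.1e}")
# The deployment-side compute here dwarfs the training run, yet never touches
# a training-based trigger.
```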

This shift toward greater computational intensity during inference also complicates traditional monitoring methods and risk assessments. As Toby Ord recently noted, AI developers can use “inference-during-training,” a practice where powerful inference-scaled models generate synthetic data or iteratively refine simpler models through methods such as distillation and amplification. This technique enables labs to quietly develop sophisticated systems without crossing regulatory training thresholds, making AI development less transparent and harder to track until deployment.
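A schematic of the loop Ord describes might look like the sketch below; the teacher and student calls are placeholders, and this is a cartoon of the practice rather than a description of any lab's actual pipeline.

```python
# Cartoon of "inference-during-training": an inference-scaled teacher produces
# synthetic training data, which is then used to fine-tune a cheaper student.
# The student's training run stays small because much of the capability was
# purchased as teacher *inference* compute. All names are hypothetical placeholders.

def teacher_answer(prompt: str) -> str:
    """Stand-in for an expensive inference-scaled call (e.g. best-of-N or a
    long chain-of-thought) on a large model."""
    return f"high-quality answer to: {prompt}"

def fine_tune(student: dict, dataset: list[tuple[str, str]]) -> dict:
    """Stand-in for an ordinary, comparatively cheap fine-tuning run."""
    student = dict(student)
    student["seen_examples"] = student.get("seen_examples", 0) + len(dataset)
    return student

def distillation_round(student: dict, prompts: list[str]) -> dict:
    synthetic = [(p, teacher_answer(p)) for p in prompts]  # compute spent at inference
    return fine_tune(student, synthetic)                   # small training footprint

# Repeating rounds (optionally promoting the improved student to teacher) can
# raise capability without any single training run crossing a FLOP threshold.
student = {"name": "small-model"}
for _ in range(3):
    student = distillation_round(student, ["prompt A", "prompt B"])
print(student)
```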
Moreover, because inference computations can be distributed across many locations rather than concentrated in a single data center, traditional governance methods—such as monitoring centralized data centers or energy consumption—become far less effective. While inference scaling might reduce certain risks, such as the unauthorized proliferation of powerful AI (since stolen model parameters alone become less valuable without the necessary compute resources), it introduces serious new concerns. Chief among these is fairness: advanced AI capabilities increasingly depend on significant inference budgets, potentially concentrating powerful AI tools in the hands of wealthy organizations and deepening inequalities.
Regulating AI After Pretraining
Current regulatory frameworks for AI are predominantly constructed around the assumption that the scale of pretraining drives capability. This “pretraining paradigm,” defined by predictable scaling laws and centralized compute-intensive training runs, provided clear governance leverage points. However, accumulating evidence suggests that we are approaching a “pretraining frontier,” beyond which merely scaling resources yields diminishing returns. This emerging constraint significantly undermines regulatory frameworks recently enacted or under consideration across the EU, US, UK, and China, just as they begin to take effect.
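To see why a pretraining frontier bites, consider a Chinchilla-style parametric loss curve; the coefficients below are in the ballpark of published scaling-law fits, but the calculation is purely illustrative rather than a claim about any particular model family.

```python
# Illustrative Chinchilla-style loss curve L(N, D) = E + A/N**alpha + B/D**beta,
# showing diminishing returns from pretraining scale alone. Coefficients are of
# the magnitude seen in published fits and are used here only for illustration.

E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(params: float, tokens: float) -> float:
    return E + A / params**alpha + B / tokens**beta

for params, tokens in [(7e9, 1.4e12), (70e9, 1.4e13), (700e9, 1.4e14)]:
    print(f"{params:.0e} params, {tokens:.0e} tokens -> loss {loss(params, tokens):.3f}")

# Each 10x jump in model and data size (roughly 100x in training compute) shaves
# off a smaller slice of loss than the previous one -- the remaining gains must
# increasingly be bought elsewhere, for example at inference time.
```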
As companies increasingly turn to inference optimization, synthetic data generation, and specialized architectures to overcome these pretraining limits, regulatory oversight faces an unprecedented challenge. Traditional compute-based governance triggers may fail to capture these diverse and less predictable avenues of AI advancement, complicating monitoring and enforcement. Future regulatory frameworks will need a broader approach: accounting for total compute usage across both development and deployment, tracking specialized datasets, applying capability-based rather than purely resource-based thresholds, and employing new evaluation mechanisms tailored to emerging technologies. Successfully navigating this landscape demands that regulators significantly bolster their technical expertise and establish flexible oversight mechanisms capable of adapting rapidly to evolving AI development methodologies.
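One way to picture "total compute usage" is a simple lifecycle ledger that sums development and deployment compute before comparing against a threshold; the record structure, stages, numbers, and the threshold itself are all hypothetical, sketched only to show what such accounting would have to track.

```python
# Hypothetical lifecycle compute ledger: instead of checking a single training
# run against a FLOP threshold, sum every stage that contributed capability --
# pretraining, synthetic-data generation, fine-tuning, and deployment inference.
# Field names and figures are invented for illustration.

from dataclasses import dataclass

@dataclass
class ComputeRecord:
    stage: str    # e.g. "pretraining", "distillation", "deployment"
    flop: float   # compute spent in that stage

ledger = [
    ComputeRecord("pretraining",            4.0e24),
    ComputeRecord("synthetic-data teacher", 3.0e24),  # inference-during-training
    ComputeRecord("fine-tuning",            2.0e23),
    ComputeRecord("deployment (year 1)",    2.5e25),
]

TOTAL_THRESHOLD = 1e25  # hypothetical lifecycle threshold, not any current law

total = sum(r.flop for r in ledger)
print(f"lifecycle compute: {total:.1e} FLOP -> "
      f"{'above' if total > TOTAL_THRESHOLD else 'below'} threshold")
```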
Bridging the Governance Gap
Inference scaling and the transition beyond the pretraining frontier mark a transformative shift in how advanced AI systems are developed, optimized, and deployed. This transformation necessitates an equally fundamental evolution in governance—from static, compute-focused thresholds to sophisticated frameworks capable of capturing the diversity and complexity of modern AI development paths. Effective governance must strike a delicate balance: supporting innovation and competitiveness, while safeguarding society from emergent risks associated with increasingly powerful AI technologies.
Help shape the future of AI governance—take our brief AI Governance Survey today.
Support our work by subscribing to our newsletter 📩

