Your Enterprise Data Deserves Better Than a Chatbot

Large language models and their multimodal variants remain the foundation models most people encounter first. That makes sense. Text, images, audio, and video cover a huge range of knowledge-work tasks, and today’s chatbots are far more capable than the text-only systems many people first tried. But enterprise AI does not run on chat alone. It runs on tables, time series, transactions, telemetry, product catalogs, customer histories, service graphs, and messy operational data that rarely fits neatly into a prompt. Using current coding agents for data science makes this gap concrete. Generating Python and SQL is easy for an agent or an LLM, but data science requires a uniquely human skepticism to contextualize messy data and recognize when a result is simply too good to be true. We may finally be moving past the TINA (“there is no alternative”) phase of foundation models for enterprise AI, where the answer to every problem has been “just use an LLM.” A new wave of frontier models looks more specialized and more useful for the prediction and decision problems that actually run businesses.

Structured and Semi-structured Data Get Their Foundation Model Moment

Kumo’s foundation model is aimed at one of the most valuable categories of enterprise data: structured relational data. Think customer records, orders, transactions, risk signals, and event histories spread across warehouses and lakehouses. The argument is straightforward: businesses still rely on slow, hand-built ML pipelines, while LLMs lose too much structure when tables are flattened into text. AutoML tried to automate this work, but mostly by tuning the old workflow of joins, features, and model selection. Kumo’s relational foundation model takes a different route: it treats a database as a graph, where rows become nodes and the key relationships between tables become the edges that connect them. Instead of manually building features, the model attends over raw relational context and generates predictions such as churn risk, recommendations, fraud likelihood, or customer lifetime value. The appeal is workflow compression: fewer bespoke pipelines, faster iteration, and a prediction layer that agents or applications can call on demand.
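To make the "database as a graph" idea concrete, here is a minimal sketch in plain Python. It is not Kumo's implementation or API. The table names, columns, and the `build_graph` helper are all hypothetical, and I am assuming the simplest possible mapping: one node per row, one undirected edge per foreign-key reference.

```python
from collections import defaultdict

# Toy tables with a hypothetical schema: each row is a dict keyed by column name.
customers = [
    {"customer_id": 1, "region": "EU"},
    {"customer_id": 2, "region": "US"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 40.0},
    {"order_id": 11, "customer_id": 1, "amount": 15.5},
    {"order_id": 12, "customer_id": 2, "amount": 99.0},
]


def build_graph(tables, primary_keys, foreign_keys):
    """Turn rows into nodes and foreign-key references into undirected edges.

    tables:       {table_name: [row, ...]}
    primary_keys: {table_name: primary_key_column}
    foreign_keys: [(child_table, fk_column, parent_table), ...]
    """
    nodes = {}
    edges = defaultdict(set)
    # One node per row, identified by (table name, primary key value).
    for name, rows in tables.items():
        for row in rows:
            nodes[(name, row[primary_keys[name]])] = row
    # One edge per foreign-key reference that points at an existing row.
    for child, fk_column, parent in foreign_keys:
        for row in tables[child]:
            child_node = (child, row[primary_keys[child]])
            parent_node = (parent, row[fk_column])
            if parent_node in nodes:  # skip dangling references
                edges[child_node].add(parent_node)
                edges[parent_node].add(child_node)
    return nodes, edges


nodes, edges = build_graph(
    {"customers": customers, "orders": orders},
    {"customers": "customer_id", "orders": "order_id"},
    [("orders", "customer_id", "customers")],
)

# The relational context of customer 1 is simply its neighboring rows.
neighbors = sorted(edges[("customers", 1)])
print(neighbors)
```

The point of the sketch is that no join or hand-built aggregate is needed to find a customer's context: the orders are already reachable as neighbors, and a model that attends over this neighborhood can decide for itself which rows matter.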

The newer KumoRFM-2 release sharpens that pitch. It is described as a pre-trained model that can make predictions directly over multi-table databases using only a small number of labeled examples at query time. In other words, teams can ask predictive questions across connected tables without first flattening the data, building features, training a task-specific model, and maintaining a separate production pipeline. The reported benchmark results are notable: Kumo claims that KumoRFM-2 beats both supervised baselines and other foundation-model approaches on relational tasks while using a tiny fraction of the available labels.

That said, I would not treat this class of foundation models as a blanket replacement for every predictive system. For lower-stakes routing, forecasts, personalization, fraud triage, and operational scoring, the value could be immediate. For quantitative finance, regulated credit decisions, medical risk scoring, or underwriting, I would expect a more cautious path: careful validation, calibration checks, monitoring, challenger models, explainability review, and human governance before deploying to production. 
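As one illustration of what a calibration check can look like, here is a small sketch using only the standard library. The probabilities and labels are made-up toy values, not results from any model discussed here, and the helper names are my own. It computes a Brier score and a simple reliability table that compares the average predicted probability in each bin with the observed positive rate.

```python
def brier_score(probs, labels):
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def reliability_table(probs, labels, n_bins=5):
    """Group predictions into equal-width probability bins.

    Returns one (count, mean_predicted, observed_rate) tuple per non-empty bin.
    For a well-calibrated model the last two values are close in every bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_predicted = sum(p for p, _ in members) / len(members)
            observed_rate = sum(y for _, y in members) / len(members)
            rows.append((len(members), mean_predicted, observed_rate))
    return rows


# Toy values for illustration only.
probs = [0.1, 0.2, 0.8, 0.9]
labels = [0, 0, 1, 1]

score = brier_score(probs, labels)
table = reliability_table(probs, labels)
for count, mean_predicted, observed_rate in table:
    print(count, round(mean_predicted, 2), round(observed_rate, 2))
```

A check like this is only one item on the list above. It says nothing about drift, fairness, or explainability, which is why the higher-stakes settings need the fuller review.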

Kumo isn’t the only active effort here. Prior Labs is pursuing foundation models for tabular data with TabPFN, a system aimed at row-and-column prediction problems such as churn, fraud, pricing, demand forecasting, credit scoring, predictive maintenance, and clinical risk. Its pitch is similar in spirit: reduce the amount of preprocessing, feature engineering, tuning, and model selection needed before teams get useful predictions. This momentum extends to temporal data, where an expanding roster of time-series foundation models can now generate reliable forecasts without prior, domain-specific fine-tuning. The important lesson is that structured enterprise data is becoming a first-class target for foundation-model development, with architectures that target tabular, relational, and temporal data.

Datadog is moving in a related direction, though it uses different language. At the AI Agent Conference, Ameet Talwalkar described work on what Datadog calls a “world model” for observability. I think it belongs in this discussion because the inputs are largely structured and semi-structured operational data: metrics, logs, traces, service topology, code, events, alerts, and incident history. The goal appears to be a foundation model that learns how production software systems behave over time, not just a dashboard that summarizes what already happened.

The Toto 2.0 release strengthens that story. Toto began as a time-series foundation model for observability data, and the new version turns it into a family of open-weights models that scale from 4M to 2.5B parameters. Datadog’s results suggest that larger Toto models keep getting better, run much faster than the first generation, and generalize beyond observability despite being trained largely on observability and synthetic data. For enterprise teams, the practical implication is that telemetry may become a richer prediction layer for incident detection, root-cause analysis, simulation, and agentic remediation. The world model effort is the next step in that direction, moving beyond forecasting individual streams toward a learned model of how distributed systems fail, recover, and respond to change. 

AI Moves Closer to the Point of Work

The same push toward specialization applies to how models interact with people, not just what they predict. Thinking Machines is training what it calls an “interaction model” for continuous, two-way exchange across audio, video, and text, rather than the familiar turn-taking pattern where a user speaks, waits, and receives a finished response. The key design choice is that the model works in real time, listening, responding, and tracking visual cues while deeper reasoning and tool use happen in the background. The enterprise relevance is direct: customer support, field service, sales coaching, clinical workflows, industrial operations, and design reviews all involve interruption, demonstration, and real-time correction. These are not clean prompt-and-response tasks. In many settings, the best interface is less like filing a ticket and more like working alongside a capable colleague. This is still a research preview with real constraints around connectivity and session length, but the direction is one that product and platform teams should be tracking.

There is also a related wave of specialized interface models that expand where AI can be embedded in workflows. Google DeepMind’s AI-enabled pointer imagines assistance that follows users across applications, understands what is on screen, and accepts natural instructions like “fix this” or “move that” without requiring a carefully written prompt. OpenAI’s real-time voice models point in a similar direction, with separate models for live reasoning, translation, and streaming transcription. The common thread is that these are less about building a general chatbot and more about making AI useful at the point of work. Put those alongside the structured data and observability models covered earlier and a clearer picture emerges: enterprise AI is developing a stack of specialized models, from prediction engines on relational and telemetry data to real-time interaction layers, all coordinated around the workflows where value is actually created.

Foundation Models Start to Specialize

World models are an emerging category I have not treated fully here. There is a lot happening, and some of it is genuinely interesting. But the world-model work I know best is not yet centered on the enterprise workflows I spend my time thinking about. My recent conversations with the founders of Rhoda AI and Odyssey made that clear. Rhoda is applying video-native foundation models to robotics tasks such as decanting, return processing, and container breakdown. Odyssey is building interactive world simulations, with early relevance for gaming, robotics, media, and other visual environments. These are important directions, but they sit closer to physical systems and simulated worlds than to the messy operational data, prediction problems, and knowledge-work interfaces that dominate most enterprise AI roadmaps.

The bigger takeaway is that foundation models are starting to specialize in useful ways. LLMs and multimodal chatbots will remain central, but they will not carry the enterprise AI stack by themselves. Enterprise AI will likely depend on a portfolio of more targeted models: relational models for structured data, time-series models for forecasting, observability models for production systems, interaction models for real-time work, and eventually world models where simulation and physical reasoning matter. The better end state is not one giant model that handles everything, but a routing layer that reads the task and directs it to the right model for the job.
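A routing layer of this kind can be sketched in a few lines. Everything below is hypothetical: the `Task` and `Router` names, the task kinds, and the handler functions are placeholders for calls to real specialized models. I am also assuming the task has already been labeled with a kind; in practice that classification step would itself need a model or a set of rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Task:
    kind: str      # label assigned upstream, e.g. "relational" or "time_series"
    payload: str   # the question or data reference to hand to the model


Handler = Callable[[Task], str]


class Router:
    """Dispatch each task to the handler registered for its kind."""

    def __init__(self, fallback: Handler) -> None:
        self._routes: Dict[str, Handler] = {}
        self._fallback = fallback

    def register(self, kind: str, handler: Handler) -> None:
        self._routes[kind] = handler

    def dispatch(self, task: Task) -> str:
        # Unknown kinds fall through to the general-purpose model.
        handler = self._routes.get(task.kind, self._fallback)
        return handler(task)


# Hypothetical stand-ins for calls to specialized models.
def relational_model(task: Task) -> str:
    return f"relational: {task.payload}"


def time_series_model(task: Task) -> str:
    return f"time_series: {task.payload}"


def general_llm(task: Task) -> str:
    return f"llm: {task.payload}"


router = Router(fallback=general_llm)
router.register("relational", relational_model)
router.register("time_series", time_series_model)

churn = router.dispatch(Task("relational", "churn risk for customer 42"))
summary = router.dispatch(Task("summarize", "Q3 incident report"))
print(churn)
print(summary)
```

The fallback is the design choice worth noting: the general-purpose LLM stays in the stack as the default, and specialized models take over only for the task kinds they are registered for.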
