Gradient Flow

Deep Dive into OpenAI’s Agent Ecosystem


In recent weeks, I have been examining the rapid evolution of AI agents, a field where OpenAI’s latest offerings represent just one approach in an increasingly transformative and globally competitive landscape. As my analysis of Manus (the “general AI agent” from Chinese startup Monica.ai) revealed, significant innovation is emerging from diverse sources, with Manus even outperforming OpenAI’s offerings on the General AI Assistants (GAIA) benchmark. This isn’t a winner-take-all market; it’s a rapidly developing global ecosystem where both established AI labs and nimble startups are driving progress. OpenAI itself has identified agents as a major growth area, underscoring the strategic importance of this technology.

OpenAI’s new agent-building tools and Deep Research warrant analysis not because they necessarily represent the best solutions in their class, but because they crystallize key trends shaping the broader agent landscape. Their approach to web search, file search, and, critically, computer use tools (enabling GUI-based interaction) reflects a wider industry shift towards layered agent architectures and modular agent design. This move towards GUI interaction is particularly significant because it enables AI to interact with virtually any software through graphical interfaces, dramatically expanding the scope of automation beyond systems with specialized APIs. Manus exemplifies this modularity, reportedly leveraging a multi-agent system that incorporates models like Anthropic’s Claude and fine-tuned Qwen models. The emphasis is shifting from monolithic models to the effective orchestration of specialized agents.

While healthy skepticism towards any single vendor is warranted, the current moment is defined by the rapid translation of theoretical concepts into practical applications. We’re witnessing a convergence of GUI-based interaction, layered and modular architectures, and the maturation of Planner-Actor-Validator and Tool-Use design patterns. Furthermore, as Manus demonstrates, competitive advantage increasingly stems from effective product engineering and integration of existing models, rather than solely from foundational research breakthroughs. This intensifies competition and highlights the importance of execution speed. This is an exceptionally productive period for those building agentic applications – one where understanding the evolving landscape, including the critical challenges of accountability, safety, and real-world evaluation, has become essential for technologists and business leaders alike. The technical capabilities to create useful autonomous agents already exist; now the race is on to deliver reliable, safe, and truly effective implementations that address these challenges.


Table of Contents

I. OpenAI’s New Agent-Building Tools

II. Deep Research


I. OpenAI’s New Agent-Building Tools

OpenAI’s Core Building Blocks

OpenAI defines an agent as a system capable of independent action to perform tasks on a user’s behalf. They announced three core, built-in tools to facilitate agent development:

  1. Web Search Tool: Provides models with access to up-to-date information from the internet. It’s powered by a fine-tuned GPT-4o model (or a smaller variant) optimized for information retrieval and source citation. This is the same technology powering search functionality within ChatGPT.
  2. File Search Tool: Enables developers to upload and perform semantic searches over their own private documents. Crucially, it includes metadata filtering for precise queries and a direct search endpoint that bypasses model filtering, offering greater control and accuracy, especially for Retrieval-Augmented Generation (RAG).
  3. Computer Use Tool: Brings the capabilities of ChatGPT’s “Operator” feature to the API. It allows agents to control computers (including virtual machines and legacy applications) via their graphical user interfaces (GUIs). This enables automation of tasks without requiring direct API access. It uses the same model as Operator and has demonstrated strong performance on benchmarks like OS-World, WebArena, and WebVoyager.

These tools are designed to address the common challenge of integrating disparate, low-level APIs when building agent applications.
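As a sketch of how the three built-in tools might be declared in a single request, here is a minimal example. The tool type strings (`web_search_preview`, `file_search`, `computer_use_preview`) and parameters follow OpenAI’s documentation at announcement time and should be checked against the current API reference; the query and vector store ID are placeholders.

```python
# Sketch: assembling a Responses API payload that enables all three
# built-in tools in one request. Tool type names are assumptions based
# on OpenAI's announced conventions and may change.

def build_agent_request(query: str, vector_store_id: str) -> dict:
    """Build a request payload enabling web search, file search, and computer use."""
    return {
        "model": "gpt-4o",
        "input": query,
        "tools": [
            # Live web results with source citations
            {"type": "web_search_preview"},
            # Semantic search over previously uploaded private documents
            {"type": "file_search", "vector_store_ids": [vector_store_id]},
            # GUI control (the Operator model); display and environment
            # parameters describe the virtual screen the agent drives
            {
                "type": "computer_use_preview",
                "display_width": 1024,
                "display_height": 768,
                "environment": "browser",
            },
        ],
    }

request = build_agent_request("Summarize this week's AI agent news", "vs_example_id")
```

In practice this dictionary would be passed to the Responses API endpoint; the point here is that all three tools ride along in one `tools` array rather than requiring separate integrations.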


Responses API: The Evolution Beyond Chat Completions

The Responses API is a new, more flexible API designed as the eventual successor to the Chat Completions API. It’s built to support the complex, multi-turn interactions and tool use that are essential for sophisticated agents.

The Chat Completions API will continue to be supported with new models and capabilities. However, some new features and models, particularly those related to advanced agent functionality, will be exclusive to the Responses API. Migration from Chat Completions to Responses is intended to be straightforward.
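To make the migration surface concrete, here is a rough side-by-side of the same question expressed against both APIs. The payload shapes follow OpenAI’s published examples (messages array vs. a flattened `input` field), but the exact parameters should be verified against the current API reference.

```python
# Sketch: the same call against the Chat Completions API and the
# Responses API. Field names follow OpenAI's published examples;
# treat them as assumptions to verify against current docs.

chat_completions_payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "What changed in the agent tools launch?"}
    ],
}

# The Responses API flattens `messages` into `input` and accepts
# built-in tools directly, so multi-step tool use can happen inside
# a single API call rather than across many round trips.
responses_payload = {
    "model": "gpt-4o",
    "input": "What changed in the agent tools launch?",
    "tools": [{"type": "web_search_preview"}],
}
```

The structural similarity is why OpenAI describes the migration as straightforward: the conversational content carries over, while tool use moves from custom orchestration into the request itself.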


Web Search Tool: Enhancing AI with Real-Time Internet Access

The Web Search Tool allows models to retrieve and analyze current information from the internet, enhancing the factual accuracy and timeliness of responses. It leverages the same technology as ChatGPT’s search feature.

This ensures that AI applications can access real-time information beyond their training data.


Enhanced File Search: New Metadata and Direct Query Capabilities

The File Search Tool, previously part of the Assistants API, has been significantly enhanced with metadata filtering for precise queries and a direct search endpoint that bypasses model filtering.

These enhancements make RAG implementations more flexible and efficient for applications leveraging private knowledge bases.
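As an illustration of the metadata-filtering enhancement, here is a sketch of a `file_search` tool entry with an attribute filter attached. The filter grammar shown (comparison and compound operators like `eq`, `gte`, and `and`) follows OpenAI’s documented attribute-filter format, but the exact keys should be verified; the vector store ID and attributes are hypothetical.

```python
# Sketch: a file_search tool entry restricted by document metadata,
# so retrieval only considers finance documents from 2024 onward.
# Filter operators (eq/gte/and) follow OpenAI's documented format;
# the store ID and attribute names are made up for illustration.

file_search_tool = {
    "type": "file_search",
    "vector_store_ids": ["vs_quarterly_reports"],  # hypothetical store ID
    "filters": {
        "type": "and",
        "filters": [
            {"type": "eq", "key": "department", "value": "finance"},
            {"type": "gte", "key": "year", "value": 2024},
        ],
    },
    "max_num_results": 5,  # cap retrieved chunks for tighter context
}
```

Filtering at retrieval time like this is what makes RAG over large private corpora practical: the model never sees chunks that fail the metadata predicate, which improves both precision and token efficiency.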


Computer Use Tool: Bringing GUI Automation to AI Agents

The Computer Use Tool brings the functionality of ChatGPT’s “Operator” to the API. It allows AI agents to control computers by interacting with graphical user interfaces (GUIs), enabling automation of tasks in systems that lack direct API access.

The tool utilizes the same model powering Operator in ChatGPT, with strong performance on benchmarks such as OS-World, WebArena, and WebVoyager.
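The control pattern this tool implies can be sketched as a loop: the model sees a screenshot, proposes a GUI action, a harness executes it, and the resulting screenshot is fed back. The action names below (`click`, `done`) mirror the kinds of action types OpenAI documents, but this is an illustrative skeleton with stubbed functions, not the real integration.

```python
# Sketch of an Operator-style control loop: alternate model-proposed
# GUI actions with execution until the model signals completion.
# Both callbacks are stubs; a real harness would call the API for
# actions and drive a browser or VM to execute them.

def run_computer_use_loop(get_model_action, execute_action, max_steps=10):
    """Run the propose/execute/observe loop, returning the action transcript."""
    transcript = []
    for _ in range(max_steps):
        # Model proposes the next GUI action given what has happened so far,
        # e.g. {"type": "click", "x": 100, "y": 200}
        action = get_model_action(transcript)
        if action["type"] == "done":
            break
        # Harness performs the action and captures the new screen state
        screenshot = execute_action(action)
        transcript.append((action, screenshot))
    return transcript

# Stub demo: the "model" clicks once, then declares the task finished.
actions = iter([{"type": "click", "x": 10, "y": 20}, {"type": "done"}])
log = run_computer_use_loop(lambda t: next(actions), lambda a: "screenshot-bytes")
```

The `max_steps` cap is worth noting: because GUI agents can wander, production harnesses typically bound the loop and surface the transcript for human review.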



Agents SDK: Streamlining Multi-Agent Application Development

The Agents SDK (formerly “Swarm”) is an open-source framework (installable via pip install openai-agents, with JavaScript support coming soon) designed to simplify the orchestration of multiple agents within a single application.
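To show the orchestration idea the SDK formalizes, here is a minimal plain-Python sketch of the handoff pattern, where a triage agent routes work to specialists. This is deliberately not the real Agents SDK API; the class and method names are invented for illustration only.

```python
# Not the real Agents SDK: a minimal plain-Python illustration of the
# handoff pattern, where a triage agent delegates to specialist agents.

class Agent:
    def __init__(self, name, handles, handoffs=None):
        self.name = name
        self.handles = handles          # predicate: can this agent take the task?
        self.handoffs = handoffs or []  # specialist agents it may delegate to

    def run(self, task):
        # Hand off to the first specialist willing to take the task
        for specialist in self.handoffs:
            if specialist.handles(task):
                return specialist.run(task)
        # No handoff applies: handle the task directly
        return f"{self.name} handled: {task}"

billing = Agent("billing", lambda t: "invoice" in t)
support = Agent("support", lambda t: True)
triage = Agent("triage", lambda t: True, handoffs=[billing, support])

result = triage.run("invoice question")
```

The real SDK layers tracing, guardrails, and tool integration on top of this routing idea, but the core abstraction is the same: agents as composable units that can pass control to one another.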


Assistants API Migration: Transition Timeline and Strategy

OpenAI plans to sunset the Assistants API in 2026. Before this occurs:

  1. Feature Parity: All Assistants API functionality will be incorporated into the Responses API.
  2. Migration Guide: A comprehensive migration guide will be provided to assist developers in transitioning their applications smoothly, without data or functionality loss.
  3. Ample Time: Developers will have sufficient time to migrate their applications.

This consolidation aims for a unified and streamlined developer experience. The Responses API will maintain support for multimodal inputs and all agent-building blocks currently in the Assistants API.


Combining Agent Tools: Practical Use Cases and Solutions

In their announcement video, OpenAI shared several examples illustrating the combined power of these tools.

The Responses API allows these tools to be called within a single API response, streamlining development.


Advantages for Development Teams: Key Benefits Overview

These tools and the Responses API offer significant advantages for development teams.

These advancements enable teams to build more sophisticated and effective AI applications more quickly and efficiently, shifting the focus from simply answering questions to performing tasks autonomously.


Deep Research as a Model: Understanding the Practical Application of New Tools

Deep Research is an agent OpenAI has already built using the kinds of capabilities now being made available to developers through the API. Specifically:

  1. Deep Research is an existing agent that can condense a week’s worth of research into 15 minutes.
  2. The tools being announced (Web Search Tool, File Search Tool, Computer Use Tool) and the new Responses API are positioned as enabling developers to build similar agent capabilities to what’s already in Deep Research.
  3. Deep Research is described as a product that uses “multiple model turns and multiple tool calls behind the scenes” – which is precisely what the new Responses API is designed to facilitate for developers.

In essence, Deep Research serves as a concrete example of what developers can now build themselves using the newly announced tools and APIs.  See below for more on Deep Research.





II. Deep Research

Introduction to Deep Research

Deep Research is an AI agent developed by OpenAI, integrated within ChatGPT, that automates comprehensive online research. It’s designed for complex research tasks that usually take hours, delivering detailed reports with sources and citations in 5-30 minutes. It’s significantly more thorough than standard ChatGPT responses because it’s specifically optimized for tasks needing extensive web research and external context, going beyond the model’s pre-trained knowledge.


Deep Research’s Role in OpenAI’s Agent Vision

Deep Research and Operator are currently separate products, but they represent steps toward a unified AI agent that can seamlessly handle various tasks (web search, computer operation, etc.)—a “fusion agent” that combines web, API, and desktop interactions. Deep Research exemplifies this direction.


The Deep Research Engine: A Six-Stage Iterative Research Process

Deep Research uses a fine-tuned version of OpenAI’s o3 reasoning model, trained end-to-end with reinforcement learning on browsing and reasoning tasks. It has access to a browsing tool and a Python tool. The core process is iterative:

  1. Query Understanding: The model analyzes the user’s request.
  2. Search: It formulates and executes web searches.
  3. Information Extraction: It reads and extracts relevant information from web pages.
  4. Synthesis: It synthesizes the gathered information.
  5. Decision: It decides whether to continue searching or generate a report.
  6. Report Generation: If sufficient information is gathered, it creates a structured report with citations.

This iterative, end-to-end approach allows the model to learn complex research strategies that might not be apparent to human designers.
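The six stages above can be sketched as a control loop. In the real system the loop is learned by the model rather than hand-written, so this is only a structural illustration with stubbed search and extraction steps; the function names are invented.

```python
# Sketch of the six-stage research loop as explicit control flow.
# The real agent learns this behavior end-to-end; here every step
# is a stub so the iteration structure is visible.

def deep_research(query, search, extract, enough, max_rounds=5):
    """Iteratively search and extract until findings suffice, then report."""
    findings = []
    plan = query                                   # 1. query understanding (stubbed)
    for _ in range(max_rounds):
        results = search(plan)                     # 2. search
        findings += [extract(r) for r in results]  # 3. information extraction
        if enough(findings):                       # 4-5. synthesize and decide
            break
    return {"query": query, "citations": findings}  # 6. report with citations

report = deep_research(
    "agent benchmarks",
    search=lambda q: ["source-a", "source-b"],
    extract=lambda r: r,
    enough=lambda f: len(f) >= 2,
)
```

The interesting contrast with hand-coded pipelines is precisely that the `enough` decision and the reformulation of `plan` are learned behaviors in Deep Research, not fixed predicates as sketched here.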


User Profiles and Applications

Deep Research targets anyone doing “knowledge work,” both professionally and personally.

Surprising applications include coding (finding documentation, writing scripts) and medical research (finding literature, identifying clinical trials).


Accuracy Mechanisms: Citations, Verification, and Limitations

The primary mechanism used to evaluate accuracy is citation: the reports include references to the sources used. The training process emphasizes correct citation. The clarification flow also helps ensure the model understands the user’s needs. However, the model can still make mistakes or rely on unreliable sources. Users should always verify critical information using the provided citations. This is an ongoing area of improvement.
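A first-pass sanity check on citation quality can be automated before any manual verification. The helper below is illustrative and not part of Deep Research: it flags citations in a generated report that do not correspond to any source actually retrieved during the run.

```python
# Illustrative helper (not part of Deep Research): flag report citations
# that don't match any source retrieved during the research run, as a
# cheap precursor to manually verifying the claims themselves.

def uncited_sources(report_citations, retrieved_urls):
    """Return citations that don't correspond to any retrieved source."""
    retrieved = set(retrieved_urls)
    return [c for c in report_citations if c not in retrieved]

bad = uncited_sources(
    ["https://a.example/paper", "https://b.example/post"],
    ["https://a.example/paper"],
)
```

A check like this catches only fabricated or mismatched references; whether a genuinely retrieved source is reliable still requires human judgment, which is why the article stresses verifying critical information through the citations.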


Deep Research Usage Guide: Five Strategies for Effective Implementation


End-to-End Training vs. Modular Systems

The key differentiator behind Deep Research is end-to-end training using reinforcement learning. Most other approaches use a modular design, with language models acting as decision-making nodes within a pre-defined graph of operations. Deep Research, however, is trained holistically on complete research tasks. This gives it more flexibility and adaptability to handle edge cases, unexpected information, and complex queries. It learns to adjust its strategy based on the information it finds.

By optimizing directly for research outcomes through reinforcement learning, Deep Research can develop more sophisticated strategies than hand-coded systems. This approach of taking a state-of-the-art reasoning model, giving it access to tools, and optimizing it directly for outcomes is what makes Deep Research particularly powerful.


Architectural Decisions: Selecting Between End-to-End and Modular Approaches

The choice between end-to-end training and a modular design depends on the task’s characteristics.


Reinforcement Learning’s Critical Role

Reinforcement learning (RL) allows the agent to adapt its approach in real-time, unlike fixed scripts. It can pivot based on the information it finds, making it more flexible and effective for tasks with unpredictable search paths. RL is now viable because powerful pre-trained language models (the “cake” and “frosting” in the analogy) provide a strong foundation for RL (the “cherry on top”) to optimize for specific tasks.


Technical Obstacles: Training Data, Accuracy, and User Interaction Design

In building Deep Research, creating high-quality training datasets was a major challenge.

Ensuring factual accuracy and proper source attribution (citations) was another key challenge. The design of an effective “clarification flow” (where the model asks clarifying questions) was also crucial.


Next-Generation Features: The Future of AI Research Agents

Future developments include:


How Deep Research Will Transform Work and Education

These agents are tools to enhance human capabilities, not replace jobs. They automate time-consuming tasks, freeing people for higher-level work and enabling tasks that were previously impractical. In education, they offer personalized and efficient learning experiences, adapting to individual needs and providing a more engaging alternative to traditional methods.


Staying Current: A Practitioner’s Guide to AI Agent Developments


