This file is a merged representation of a subset of the codebase, containing specifically included files, combined into a single document by Repomix. The content has been processed, and the security check has been disabled.
File Summary
Purpose
This file contains a packed representation of a subset of the repository's contents that is considered the most important context. It is designed to be easily consumable by AI systems for analysis, code review, or other automated processes.
File Format
The content is organized as follows:
- This summary section
- Repository information
- Directory structure
- Repository files (if enabled)
- Multiple file entries, each consisting of:
  - A header with the file path (## File: path/to/file)
  - The full contents of the file in a code block
Usage Guidelines
- This file should be treated as read-only. Any changes should be made to the original repository files, not this packed version.
- When processing this file, use the file path to distinguish between different files in the repository.
- Be aware that this file may contain sensitive information. Handle it with the same level of security as you would the original repository.
Notes
- Some files may have been excluded based on .gitignore rules and Repomix's configuration
- Binary files are not included in this packed representation. Please refer to the Repository Structure section for a complete list of file paths, including binary files
- Only files matching these patterns are included: docs/*.md
- Files matching patterns in .gitignore are excluded
- Files matching default ignore patterns are excluded
- Security check has been disabled - content may contain sensitive information
- Files are sorted by Git change count (files with more changes are at the bottom)
Directory Structure
docs/
agents.md
config.md
context.md
examples.md
guardrails.md
handoffs.md
index.md
mcp.md
multi_agent.md
quickstart.md
release.md
repl.md
results.md
running_agents.md
streaming.md
tools.md
tracing.md
usage.md
visualization.md
Files
File: docs/agents.md
# Agents
Agents are the core building block in your apps. An agent is a large language model (LLM), configured with instructions and tools.
## Basic configuration
The most common properties of an agent you'll configure are:
- `name`: A required string that identifies your agent.
- `instructions`: Also known as a developer message or system prompt.
- `model`: Which LLM to use, and optional `model_settings` to configure model tuning parameters like temperature, top_p, etc.
- `tools`: Tools that the agent can use to achieve its tasks.
```python
from agents import Agent, ModelSettings, function_tool

@function_tool
def get_weather(city: str) -> str:
    """returns weather info for the specified city."""
    return f"The weather in {city} is sunny"

agent = Agent(
    name="Haiku agent",
    instructions="Always respond in haiku form",
    model="gpt-5-nano",
    tools=[get_weather],
)
```
## Context
Agents are generic on their `context` type. Context is a dependency-injection tool: it's an object you create and pass to `Runner.run()`, that is passed to every agent, tool, handoff, etc., and it serves as a grab bag of dependencies and state for the agent run. You can provide any Python object as the context.
```python
from dataclasses import dataclass

from agents import Agent

@dataclass
class UserContext:
    name: str
    uid: str
    is_pro_user: bool

    async def fetch_purchases(self) -> list[Purchase]:
        return ...

agent = Agent[UserContext](
    ...,
)
```
## Output types
By default, agents produce plain text (i.e. `str`) outputs. If you want the agent to produce a particular type of output, you can use the `output_type` parameter. A common choice is to use [Pydantic](https://docs.pydantic.dev/) objects, but we support any type that can be wrapped in a Pydantic [TypeAdapter](https://docs.pydantic.dev/latest/api/type_adapter/) - dataclasses, lists, TypedDict, etc.
```python
from pydantic import BaseModel

from agents import Agent

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

agent = Agent(
    name="Calendar extractor",
    instructions="Extract calendar events from text",
    output_type=CalendarEvent,
)
```
!!! note

    When you pass an `output_type`, that tells the model to use [structured outputs](https://platform.openai.com/docs/guides/structured-outputs) instead of regular plain text responses.
## Multi-agent system design patterns
There are many ways to design multi‑agent systems, but we commonly see two broadly applicable patterns:
1. Manager (agents as tools): A central manager/orchestrator invokes specialized sub‑agents as tools and retains control of the conversation.
2. Handoffs: Peer agents hand off control to a specialized agent that takes over the conversation. This is decentralized.
See [our practical guide to building agents](https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf) for more details.
### Manager (agents as tools)
The `customer_facing_agent` handles all user interaction and invokes specialized sub‑agents exposed as tools. Read more in the [tools](tools.md#agents-as-tools) documentation.
```python
from agents import Agent

booking_agent = Agent(...)
refund_agent = Agent(...)

customer_facing_agent = Agent(
    name="Customer-facing agent",
    instructions=(
        "Handle all direct user communication. "
        "Call the relevant tools when specialized expertise is needed."
    ),
    tools=[
        booking_agent.as_tool(
            tool_name="booking_expert",
            tool_description="Handles booking questions and requests.",
        ),
        refund_agent.as_tool(
            tool_name="refund_expert",
            tool_description="Handles refund questions and requests.",
        ),
    ],
)
```
### Handoffs
Handoffs are sub‑agents the agent can delegate to. When a handoff occurs, the delegated agent receives the conversation history and takes over the conversation. This pattern enables modular, specialized agents that excel at a single task. Read more in the [handoffs](handoffs.md) documentation.
```python
from agents import Agent

booking_agent = Agent(...)
refund_agent = Agent(...)

triage_agent = Agent(
    name="Triage agent",
    instructions=(
        "Help the user with their questions. "
        "If they ask about booking, hand off to the booking agent. "
        "If they ask about refunds, hand off to the refund agent."
    ),
    handoffs=[booking_agent, refund_agent],
)
```
## Dynamic instructions
In most cases, you can provide instructions when you create the agent. However, you can also provide dynamic instructions via a function. The function will receive the agent and context, and must return the prompt. Both regular and `async` functions are accepted.
```python
from agents import Agent, RunContextWrapper

def dynamic_instructions(
    context: RunContextWrapper[UserContext], agent: Agent[UserContext]
) -> str:
    return f"The user's name is {context.context.name}. Help them with their questions."

agent = Agent[UserContext](
    name="Triage agent",
    instructions=dynamic_instructions,
)
```
## Lifecycle events (hooks)
Sometimes, you want to observe the lifecycle of an agent. For example, you may want to log events, or pre-fetch data when certain events occur. You can hook into the agent lifecycle with the `hooks` property. Subclass the [`AgentHooks`][agents.lifecycle.AgentHooks] class, and override the methods you're interested in.
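As a minimal sketch (assuming the `on_start` and `on_end` hook names; check the [`AgentHooks`][agents.lifecycle.AgentHooks] reference for the full set of methods):

```python
from typing import Any

from agents import Agent, AgentHooks, RunContextWrapper

class LoggingHooks(AgentHooks):
    async def on_start(self, context: RunContextWrapper, agent: Agent) -> None:
        # Called before the agent is invoked; a good place to pre-fetch data.
        print(f"{agent.name} started")

    async def on_end(self, context: RunContextWrapper, agent: Agent, output: Any) -> None:
        # Called when the agent produces its final output.
        print(f"{agent.name} finished with: {output}")

agent = Agent(
    name="Assistant",
    instructions="Answer questions concisely.",
    hooks=LoggingHooks(),
)
```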
## Guardrails
Guardrails allow you to run checks/validations on user input in parallel to the agent running, and on the agent's output once it is produced. For example, you could screen the user's input and agent's output for relevance. Read more in the [guardrails](guardrails.md) documentation.
## Cloning/copying agents
By using the `clone()` method on an agent, you can duplicate an Agent, and optionally change any properties you like.
```python
from agents import Agent

pirate_agent = Agent(
    name="Pirate",
    instructions="Write like a pirate",
    model="gpt-4.1",
)

robot_agent = pirate_agent.clone(
    name="Robot",
    instructions="Write like a robot",
)
```
## Forcing tool use
Supplying a list of tools doesn't always mean the LLM will use a tool. You can force tool use by setting [`ModelSettings.tool_choice`][agents.model_settings.ModelSettings.tool_choice]. Valid values are:
1. `auto`, which allows the LLM to decide whether or not to use a tool.
2. `required`, which requires the LLM to use a tool (but it can intelligently decide which tool).
3. `none`, which requires the LLM to _not_ use a tool.
4. Setting a specific string e.g. `my_tool`, which requires the LLM to use that specific tool.
```python
from agents import Agent, ModelSettings, function_tool

@function_tool
def get_weather(city: str) -> str:
    """Returns weather info for the specified city."""
    return f"The weather in {city} is sunny"

agent = Agent(
    name="Weather Agent",
    instructions="Retrieve weather details.",
    tools=[get_weather],
    model_settings=ModelSettings(tool_choice="get_weather"),
)
```
## Tool use behavior
The `tool_use_behavior` parameter in the `Agent` configuration controls how tool outputs are handled:
- `"run_llm_again"`: The default. Tools are run, and the LLM processes the results to produce a final response.
- `"stop_on_first_tool"`: The output of the first tool call is used as the final response, without further LLM processing.
```python
from agents import Agent, function_tool

@function_tool
def get_weather(city: str) -> str:
    """Returns weather info for the specified city."""
    return f"The weather in {city} is sunny"

agent = Agent(
    name="Weather Agent",
    instructions="Retrieve weather details.",
    tools=[get_weather],
    tool_use_behavior="stop_on_first_tool",
)
```
- `StopAtTools(stop_at_tool_names=[...])`: Stops if any specified tool is called, using its output as the final response.
```python
from agents import Agent, function_tool
from agents.agent import StopAtTools

@function_tool
def get_weather(city: str) -> str:
    """Returns weather info for the specified city."""
    return f"The weather in {city} is sunny"

@function_tool
def sum_numbers(a: int, b: int) -> int:
    """Adds two numbers."""
    return a + b

agent = Agent(
    name="Stop At Tools Agent",
    instructions="Get weather or sum numbers.",
    tools=[get_weather, sum_numbers],
    tool_use_behavior=StopAtTools(stop_at_tool_names=["get_weather"]),
)
```
- `ToolsToFinalOutputFunction`: A custom function that processes tool results and decides whether to stop or continue with the LLM.
```python
from typing import Any

from agents import Agent, FunctionToolResult, RunContextWrapper, function_tool
from agents.agent import ToolsToFinalOutputResult

@function_tool
def get_weather(city: str) -> str:
    """Returns weather info for the specified city."""
    return f"The weather in {city} is sunny"

def custom_tool_handler(
    context: RunContextWrapper[Any],
    tool_results: list[FunctionToolResult],
) -> ToolsToFinalOutputResult:
    """Processes tool results to decide final output."""
    for result in tool_results:
        if result.output and "sunny" in result.output:
            return ToolsToFinalOutputResult(
                is_final_output=True,
                final_output=f"Final weather: {result.output}",
            )
    return ToolsToFinalOutputResult(
        is_final_output=False,
        final_output=None,
    )

agent = Agent(
    name="Weather Agent",
    instructions="Retrieve weather details.",
    tools=[get_weather],
    tool_use_behavior=custom_tool_handler,
)
```
!!! note

    To prevent infinite loops, the framework automatically resets `tool_choice` to "auto" after a tool call. This behavior is configurable via [`agent.reset_tool_choice`][agents.agent.Agent.reset_tool_choice]. The loop happens because tool results are sent to the LLM, which, due to `tool_choice`, generates another tool call, ad infinitum.
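If you do want `tool_choice` to persist across turns, a minimal sketch (reusing the `get_weather` tool from above; be aware of the loop risk described in the note):

```python
from agents import Agent, ModelSettings

agent = Agent(
    name="Weather Agent",
    instructions="Retrieve weather details.",
    tools=[get_weather],
    model_settings=ModelSettings(tool_choice="required"),
    reset_tool_choice=False,  # keep tool_choice="required" after each tool call
)
```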
File: docs/config.md
# Configuring the SDK
## API keys and clients
By default, the SDK looks for the `OPENAI_API_KEY` environment variable for LLM requests and tracing, as soon as it is imported. If you are unable to set that environment variable before your app starts, you can use the [set_default_openai_key()][agents.set_default_openai_key] function to set the key.
```python
from agents import set_default_openai_key
set_default_openai_key("sk-...")
```
Alternatively, you can also configure an OpenAI client to be used. By default, the SDK creates an `AsyncOpenAI` instance, using the API key from the environment variable or the default key set above. You can change this by using the [set_default_openai_client()][agents.set_default_openai_client] function.
```python
from openai import AsyncOpenAI
from agents import set_default_openai_client
custom_client = AsyncOpenAI(base_url="...", api_key="...")
set_default_openai_client(custom_client)
```
Finally, you can also customize the OpenAI API that is used. By default, we use the OpenAI Responses API. You can override this to use the Chat Completions API by using the [set_default_openai_api()][agents.set_default_openai_api] function.
```python
from agents import set_default_openai_api
set_default_openai_api("chat_completions")
```
## Tracing
Tracing is enabled by default. It uses the OpenAI API keys from the section above by default (i.e. the environment variable or the default key you set). You can specifically set the API key used for tracing by using the [`set_tracing_export_api_key`][agents.set_tracing_export_api_key] function.
```python
from agents import set_tracing_export_api_key
set_tracing_export_api_key("sk-...")
```
You can also disable tracing entirely by using the [`set_tracing_disabled()`][agents.set_tracing_disabled] function.
```python
from agents import set_tracing_disabled
set_tracing_disabled(True)
```
## Debug logging
The SDK has two Python loggers without any handlers set. By default, this means that warnings and errors are sent to `stderr`, but other logs are suppressed.
To enable verbose logging, use the [`enable_verbose_stdout_logging()`][agents.enable_verbose_stdout_logging] function.
```python
from agents import enable_verbose_stdout_logging
enable_verbose_stdout_logging()
```
Alternatively, you can customize the logs by adding handlers, filters, formatters, etc. You can read more in the [Python logging guide](https://docs.python.org/3/howto/logging.html).
```python
import logging
logger = logging.getLogger("openai.agents") # or openai.agents.tracing for the Tracing logger
# To make all logs show up
logger.setLevel(logging.DEBUG)
# To make info and above show up
logger.setLevel(logging.INFO)
# To make warning and above show up
logger.setLevel(logging.WARNING)
# etc
# You can customize this as needed, but this will output to `stderr` by default
logger.addHandler(logging.StreamHandler())
```
### Sensitive data in logs
Certain logs may contain sensitive data (for example, user data). If you don't want this data to be logged, set the following environment variables.
To disable logging LLM inputs and outputs:
```bash
export OPENAI_AGENTS_DONT_LOG_MODEL_DATA=1
```
To disable logging tool inputs and outputs:
```bash
export OPENAI_AGENTS_DONT_LOG_TOOL_DATA=1
```
File: docs/context.md
# Context management
Context is an overloaded term. There are two main classes of context you might care about:
1. Context available locally to your code: this is data and dependencies you might need when tool functions run, during callbacks like `on_handoff`, in lifecycle hooks, etc.
2. Context available to LLMs: this is data the LLM sees when generating a response.
## Local context
This is represented via the [`RunContextWrapper`][agents.run_context.RunContextWrapper] class and the [`context`][agents.run_context.RunContextWrapper.context] property within it. The way this works is:
1. You create any Python object you want. A common pattern is to use a dataclass or a Pydantic object.
2. You pass that object to the various run methods (e.g. `Runner.run(..., context=whatever)`).
3. All your tool calls, lifecycle hooks etc will be passed a wrapper object, `RunContextWrapper[T]`, where `T` represents your context object type which you can access via `wrapper.context`.
The **most important** thing to be aware of: every agent, tool function, lifecycle etc for a given agent run must use the same _type_ of context.
You can use the context for things like:
- Contextual data for your run (e.g. things like a username/uid or other information about the user)
- Dependencies (e.g. logger objects, data fetchers, etc)
- Helper functions
!!! danger "Note"
The context object is **not** sent to the LLM. It is purely a local object that you can read from, write to and call methods on it.
```python
import asyncio
from dataclasses import dataclass

from agents import Agent, RunContextWrapper, Runner, function_tool

@dataclass
class UserInfo:  # (1)!
    name: str
    uid: int

@function_tool
async def fetch_user_age(wrapper: RunContextWrapper[UserInfo]) -> str:  # (2)!
    """Fetch the age of the user. Call this function to get user's age information."""
    return f"The user {wrapper.context.name} is 47 years old"

async def main():
    user_info = UserInfo(name="John", uid=123)

    agent = Agent[UserInfo](  # (3)!
        name="Assistant",
        tools=[fetch_user_age],
    )

    result = await Runner.run(  # (4)!
        starting_agent=agent,
        input="What is the age of the user?",
        context=user_info,
    )

    print(result.final_output)  # (5)!
    # The user John is 47 years old.

if __name__ == "__main__":
    asyncio.run(main())
```
1. This is the context object. We've used a dataclass here, but you can use any type.
2. This is a tool. You can see it takes a `RunContextWrapper[UserInfo]`. The tool implementation reads from the context.
3. We mark the agent with the generic `UserInfo`, so that the typechecker can catch errors (for example, if we tried to pass a tool that took a different context type).
4. The context is passed to the `run` function.
5. The agent correctly calls the tool and gets the age.
---
### Advanced: `ToolContext`
In some cases, you might want to access extra metadata about the tool being executed — such as its name, call ID, or raw argument string.
For this, you can use the [`ToolContext`][agents.tool_context.ToolContext] class, which extends `RunContextWrapper`.
```python
from typing import Annotated

from pydantic import BaseModel, Field

from agents import Agent, function_tool
from agents.tool_context import ToolContext

class WeatherContext(BaseModel):
    user_id: str

class Weather(BaseModel):
    city: str = Field(description="The city name")
    temperature_range: str = Field(description="The temperature range in Celsius")
    conditions: str = Field(description="The weather conditions")

@function_tool
def get_weather(ctx: ToolContext[WeatherContext], city: Annotated[str, "The city to get the weather for"]) -> Weather:
    print(f"[debug] Tool context: (name: {ctx.tool_name}, call_id: {ctx.tool_call_id}, args: {ctx.tool_arguments})")
    return Weather(city=city, temperature_range="14-20C", conditions="Sunny with wind.")

agent = Agent(
    name="Weather Agent",
    instructions="You are a helpful agent that can tell the weather of a given city.",
    tools=[get_weather],
)
```
`ToolContext` provides the same `.context` property as `RunContextWrapper`,
plus additional fields specific to the current tool call:
- `tool_name` – the name of the tool being invoked
- `tool_call_id` – a unique identifier for this tool call
- `tool_arguments` – the raw argument string passed to the tool
Use `ToolContext` when you need tool-level metadata during execution.
For general context sharing between agents and tools, `RunContextWrapper` remains sufficient.
---
## Agent/LLM context
When an LLM is called, the **only** data it can see is from the conversation history. This means that if you want to make some new data available to the LLM, you must do it in a way that makes it available in that history. There are a few ways to do this:
1. You can add it to the Agent `instructions`. This is also known as a "system prompt" or "developer message". System prompts can be static strings, or they can be dynamic functions that receive the context and output a string. This is a common tactic for information that is always useful (for example, the user's name or the current date); see the sketch after this list.
2. Add it to the `input` when calling the `Runner.run` functions. This is similar to the `instructions` tactic, but allows you to have messages that are lower in the [chain of command](https://cdn.openai.com/spec/model-spec-2024-05-08.html#follow-the-chain-of-command).
3. Expose it via function tools. This is useful for _on-demand_ context - the LLM decides when it needs some data, and can call the tool to fetch that data.
4. Use retrieval or web search. These are special tools that are able to fetch relevant data from files or databases (retrieval), or from the web (web search). This is useful for "grounding" the response in relevant contextual data.
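For example, a minimal sketch of approach 1, reusing the `UserInfo` dataclass from the local-context example above to put always-useful data into the system prompt:

```python
from datetime import date

from agents import Agent, RunContextWrapper

def instructions_with_context(ctx: RunContextWrapper[UserInfo], agent: Agent[UserInfo]) -> str:
    # The user's name and today's date are always useful, so they go in the
    # system prompt rather than behind a tool call.
    return (
        f"You are a helpful assistant. The user's name is {ctx.context.name}. "
        f"Today's date is {date.today().isoformat()}."
    )

agent = Agent[UserInfo](
    name="Assistant",
    instructions=instructions_with_context,
)
```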
File: docs/examples.md
# Examples
Check out a variety of sample implementations of the SDK in the examples section of the [repo](https://github.com/openai/openai-agents-python/tree/main/examples). The examples are organized into several categories that demonstrate different patterns and capabilities.
## Categories
- **[agent_patterns](https://github.com/openai/openai-agents-python/tree/main/examples/agent_patterns):**
Examples in this category illustrate common agent design patterns, such as
- Deterministic workflows
- Agents as tools
- Parallel agent execution
- Conditional tool usage
- Input/output guardrails
- LLM as a judge
- Routing
- Streaming guardrails
- **[basic](https://github.com/openai/openai-agents-python/tree/main/examples/basic):**
These examples showcase foundational capabilities of the SDK, such as
- Hello world examples (Default model, GPT-5, open-weight model)
- Agent lifecycle management
- Dynamic system prompts
- Streaming outputs (text, items, function call args)
- Prompt templates
- File handling (local and remote, images and PDFs)
- Usage tracking
- Non-strict output types
- Previous response ID usage
- **[customer_service](https://github.com/openai/openai-agents-python/tree/main/examples/customer_service):**
Example customer service system for an airline.
- **[financial_research_agent](https://github.com/openai/openai-agents-python/tree/main/examples/financial_research_agent):**
A financial research agent that demonstrates structured research workflows with agents and tools for financial data analysis.
- **[handoffs](https://github.com/openai/openai-agents-python/tree/main/examples/handoffs):**
See practical examples of agent handoffs with message filtering.
- **[hosted_mcp](https://github.com/openai/openai-agents-python/tree/main/examples/hosted_mcp):**
Examples demonstrating how to use hosted MCP (Model Context Protocol) connectors and approvals.
- **[mcp](https://github.com/openai/openai-agents-python/tree/main/examples/mcp):**
Learn how to build agents with MCP (Model Context Protocol), including:
- Filesystem examples
- Git examples
- MCP prompt server examples
- SSE (Server-Sent Events) examples
- Streamable HTTP examples
- **[memory](https://github.com/openai/openai-agents-python/tree/main/examples/memory):**
Examples of different memory implementations for agents, including:
- SQLite session storage
- Advanced SQLite session storage
- Redis session storage
- SQLAlchemy session storage
- Encrypted session storage
- OpenAI session storage
- **[model_providers](https://github.com/openai/openai-agents-python/tree/main/examples/model_providers):**
Explore how to use non-OpenAI models with the SDK, including custom providers and LiteLLM integration.
- **[realtime](https://github.com/openai/openai-agents-python/tree/main/examples/realtime):**
Examples showing how to build real-time experiences using the SDK, including:
- Web applications
- Command-line interfaces
- Twilio integration
- **[reasoning_content](https://github.com/openai/openai-agents-python/tree/main/examples/reasoning_content):**
Examples demonstrating how to work with reasoning content and structured outputs.
- **[research_bot](https://github.com/openai/openai-agents-python/tree/main/examples/research_bot):**
Simple deep research clone that demonstrates complex multi-agent research workflows.
- **[tools](https://github.com/openai/openai-agents-python/tree/main/examples/tools):**
Learn how to implement OAI hosted tools such as:
- Web search and web search with filters
- File search
- Code interpreter
- Computer use
- Image generation
- **[voice](https://github.com/openai/openai-agents-python/tree/main/examples/voice):**
See examples of voice agents, using our TTS and STT models, including streamed voice examples.
File: docs/guardrails.md
# Guardrails
Guardrails run _in parallel_ to your agents, enabling you to do checks and validations of user input. For example, imagine you have an agent that uses a very smart (and hence slow/expensive) model to help with customer requests. You wouldn't want malicious users to ask the model to help them with their math homework. So, you can run a guardrail with a fast/cheap model. If the guardrail detects malicious usage, it can immediately raise an error, which stops the expensive model from running and saves you time/money.
There are two kinds of guardrails:
1. Input guardrails run on the initial user input
2. Output guardrails run on the final agent output
## Input guardrails
Input guardrails run in 3 steps:
1. First, the guardrail receives the same input passed to the agent.
2. Next, the guardrail function runs to produce a [`GuardrailFunctionOutput`][agents.guardrail.GuardrailFunctionOutput], which is then wrapped in an [`InputGuardrailResult`][agents.guardrail.InputGuardrailResult]
3. Finally, we check if [`.tripwire_triggered`][agents.guardrail.GuardrailFunctionOutput.tripwire_triggered] is true. If true, an [`InputGuardrailTripwireTriggered`][agents.exceptions.InputGuardrailTripwireTriggered] exception is raised, so you can appropriately respond to the user or handle the exception.
!!! Note

    Input guardrails are intended to run on user input, so an agent's guardrails only run if the agent is the *first* agent. You might wonder, why is the `guardrails` property on the agent instead of passed to `Runner.run`? It's because guardrails tend to be related to the actual Agent - you'd run different guardrails for different agents, so colocating the code is useful for readability.
## Output guardrails
Output guardrails run in 3 steps:
1. First, the guardrail receives the output produced by the agent.
2. Next, the guardrail function runs to produce a [`GuardrailFunctionOutput`][agents.guardrail.GuardrailFunctionOutput], which is then wrapped in an [`OutputGuardrailResult`][agents.guardrail.OutputGuardrailResult]
3. Finally, we check if [`.tripwire_triggered`][agents.guardrail.GuardrailFunctionOutput.tripwire_triggered] is true. If true, an [`OutputGuardrailTripwireTriggered`][agents.exceptions.OutputGuardrailTripwireTriggered] exception is raised, so you can appropriately respond to the user or handle the exception.
!!! Note

    Output guardrails are intended to run on the final agent output, so an agent's guardrails only run if the agent is the *last* agent. Similar to the input guardrails, we do this because guardrails tend to be related to the actual Agent - you'd run different guardrails for different agents, so colocating the code is useful for readability.
## Tripwires
If the input or output fails the guardrail, the guardrail can signal this with a tripwire. As soon as we see a guardrail that has triggered a tripwire, we immediately raise a `{Input,Output}GuardrailTripwireTriggered` exception and halt agent execution.
## Implementing a guardrail
You need to provide a function that receives input, and returns a [`GuardrailFunctionOutput`][agents.guardrail.GuardrailFunctionOutput]. In this example, we'll do this by running an Agent under the hood.
```python
from pydantic import BaseModel

from agents import (
    Agent,
    GuardrailFunctionOutput,
    InputGuardrailTripwireTriggered,
    RunContextWrapper,
    Runner,
    TResponseInputItem,
    input_guardrail,
)

class MathHomeworkOutput(BaseModel):
    is_math_homework: bool
    reasoning: str

guardrail_agent = Agent(  # (1)!
    name="Guardrail check",
    instructions="Check if the user is asking you to do their math homework.",
    output_type=MathHomeworkOutput,
)

@input_guardrail
async def math_guardrail(  # (2)!
    ctx: RunContextWrapper[None], agent: Agent, input: str | list[TResponseInputItem]
) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent, input, context=ctx.context)

    return GuardrailFunctionOutput(
        output_info=result.final_output,  # (3)!
        tripwire_triggered=result.final_output.is_math_homework,
    )

agent = Agent(  # (4)!
    name="Customer support agent",
    instructions="You are a customer support agent. You help customers with their questions.",
    input_guardrails=[math_guardrail],
)

async def main():
    # This should trip the guardrail
    try:
        await Runner.run(agent, "Hello, can you help me solve for x: 2x + 3 = 11?")
        print("Guardrail didn't trip - this is unexpected")
    except InputGuardrailTripwireTriggered:
        print("Math homework guardrail tripped")
```
1. We'll use this agent in our guardrail function.
2. This is the guardrail function that receives the agent's input/context, and returns the result.
3. We can include extra information in the guardrail result.
4. This is the actual agent that defines the workflow.
Output guardrails are similar.
```python
from pydantic import BaseModel

from agents import (
    Agent,
    GuardrailFunctionOutput,
    OutputGuardrailTripwireTriggered,
    RunContextWrapper,
    Runner,
    output_guardrail,
)

class MessageOutput(BaseModel):  # (1)!
    response: str

class MathOutput(BaseModel):  # (2)!
    reasoning: str
    is_math: bool

guardrail_agent = Agent(
    name="Guardrail check",
    instructions="Check if the output includes any math.",
    output_type=MathOutput,
)

@output_guardrail
async def math_guardrail(  # (3)!
    ctx: RunContextWrapper, agent: Agent, output: MessageOutput
) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent, output.response, context=ctx.context)

    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.is_math,
    )

agent = Agent(  # (4)!
    name="Customer support agent",
    instructions="You are a customer support agent. You help customers with their questions.",
    output_guardrails=[math_guardrail],
    output_type=MessageOutput,
)

async def main():
    # This should trip the guardrail
    try:
        await Runner.run(agent, "Hello, can you help me solve for x: 2x + 3 = 11?")
        print("Guardrail didn't trip - this is unexpected")
    except OutputGuardrailTripwireTriggered:
        print("Math output guardrail tripped")
```
1. This is the actual agent's output type.
2. This is the guardrail's output type.
3. This is the guardrail function that receives the agent's output, and returns the result.
4. This is the actual agent that defines the workflow.
File: docs/handoffs.md
# Handoffs
Handoffs allow an agent to delegate tasks to another agent. This is particularly useful in scenarios where different agents specialize in distinct areas. For example, a customer support app might have agents that each specifically handle tasks like order status, refunds, FAQs, etc.
Handoffs are represented as tools to the LLM. So if there's a handoff to an agent named `Refund Agent`, the tool would be called `transfer_to_refund_agent`.
## Creating a handoff
All agents have a [`handoffs`][agents.agent.Agent.handoffs] param, which can either take an `Agent` directly, or a `Handoff` object that customizes the Handoff.
You can create a handoff using the [`handoff()`][agents.handoffs.handoff] function provided by the Agents SDK. This function allows you to specify the agent to hand off to, along with optional overrides and input filters.
### Basic Usage
Here's how you can create a simple handoff:
```python
from agents import Agent, handoff
billing_agent = Agent(name="Billing agent")
refund_agent = Agent(name="Refund agent")
# (1)!
triage_agent = Agent(name="Triage agent", handoffs=[billing_agent, handoff(refund_agent)])
```
1. You can use the agent directly (as in `billing_agent`), or you can use the `handoff()` function.
### Customizing handoffs via the `handoff()` function
The [`handoff()`][agents.handoffs.handoff] function lets you customize things.
- `agent`: This is the agent to which things will be handed off.
- `tool_name_override`: By default, the `Handoff.default_tool_name()` function is used, which resolves to `transfer_to_<agent_name>`. You can override this.
- `tool_description_override`: Override the default tool description from `Handoff.default_tool_description()`
- `on_handoff`: A callback function executed when the handoff is invoked. This is useful for things like kicking off some data fetching as soon as you know a handoff is being invoked. This function receives the agent context, and can optionally also receive LLM generated input. The input data is controlled by the `input_type` param.
- `input_type`: The type of input expected by the handoff (optional).
- `input_filter`: This lets you filter the input received by the next agent. See below for more.
- `is_enabled`: Whether the handoff is enabled. This can be a boolean or a function that returns a boolean, allowing you to dynamically enable or disable the handoff at runtime; see the sketch after the code block below.
```python
from agents import Agent, handoff, RunContextWrapper

def on_handoff(ctx: RunContextWrapper[None]):
    print("Handoff called")

agent = Agent(name="My agent")

handoff_obj = handoff(
    agent=agent,
    on_handoff=on_handoff,
    tool_name_override="custom_handoff_tool",
    tool_description_override="Custom description",
)
```
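A minimal sketch of a dynamic `is_enabled` callable (assuming it receives the run context and the candidate agent; the `refunds_enabled` flag on the context object is illustrative):

```python
from agents import Agent, RunContextWrapper, handoff

def refunds_enabled(ctx: RunContextWrapper, agent: Agent) -> bool:
    # Hypothetical flag read from your local context object.
    return getattr(ctx.context, "refunds_enabled", False)

refund_agent = Agent(name="Refund agent")

handoff_obj = handoff(
    agent=refund_agent,
    is_enabled=refunds_enabled,
)
```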
## Handoff inputs
In certain situations, you want the LLM to provide some data when it calls a handoff. For example, imagine a handoff to an "Escalation agent". You might want a reason to be provided, so you can log it.
```python
from pydantic import BaseModel

from agents import Agent, handoff, RunContextWrapper

class EscalationData(BaseModel):
    reason: str

async def on_handoff(ctx: RunContextWrapper[None], input_data: EscalationData):
    print(f"Escalation agent called with reason: {input_data.reason}")

agent = Agent(name="Escalation agent")

handoff_obj = handoff(
    agent=agent,
    on_handoff=on_handoff,
    input_type=EscalationData,
)
```
## Input filters
When a handoff occurs, it's as though the new agent takes over the conversation, and gets to see the entire previous conversation history. If you want to change this, you can set an [`input_filter`][agents.handoffs.Handoff.input_filter]. An input filter is a function that receives the existing input via a [`HandoffInputData`][agents.handoffs.HandoffInputData], and must return a new `HandoffInputData`.
There are some common patterns (for example removing all tool calls from the history), which are implemented for you in [`agents.extensions.handoff_filters`][].
```python
from agents import Agent, handoff
from agents.extensions import handoff_filters

agent = Agent(name="FAQ agent")

handoff_obj = handoff(
    agent=agent,
    input_filter=handoff_filters.remove_all_tools,  # (1)!
)
```
1. This will automatically remove all tools from the history when `FAQ agent` is called.
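You can also write your own filter. A sketch that composes the built-in filter with custom trimming (assuming `HandoffInputData` is a frozen dataclass whose `pre_handoff_items` is a tuple, so `dataclasses.replace` works):

```python
from dataclasses import replace

from agents import Agent, HandoffInputData, handoff
from agents.extensions import handoff_filters

def trim_history(data: HandoffInputData) -> HandoffInputData:
    # Drop tool calls first, then keep only the two most recent pre-handoff items.
    data = handoff_filters.remove_all_tools(data)
    return replace(data, pre_handoff_items=data.pre_handoff_items[-2:])

agent = Agent(name="FAQ agent")

handoff_obj = handoff(
    agent=agent,
    input_filter=trim_history,
)
```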
## Recommended prompts
To make sure that LLMs understand handoffs properly, we recommend including information about handoffs in your agents. We have a suggested prefix in [`agents.extensions.handoff_prompt.RECOMMENDED_PROMPT_PREFIX`][], or you can call [`agents.extensions.handoff_prompt.prompt_with_handoff_instructions`][] to automatically add recommended data to your prompts.
```python
from agents import Agent
from agents.extensions.handoff_prompt import RECOMMENDED_PROMPT_PREFIX

billing_agent = Agent(
    name="Billing agent",
    instructions=f"""{RECOMMENDED_PROMPT_PREFIX}
    <Fill in the rest of your prompt here>.""",
)
```
File: docs/index.md
# OpenAI Agents SDK
The [OpenAI Agents SDK](https://github.com/openai/openai-agents-python) enables you to build agentic AI apps in a lightweight, easy-to-use package with very few abstractions. It's a production-ready upgrade of our previous experimentation for agents, [Swarm](https://github.com/openai/swarm/tree/main). The Agents SDK has a very small set of primitives:
- **Agents**, which are LLMs equipped with instructions and tools
- **Handoffs**, which allow agents to delegate to other agents for specific tasks
- **Guardrails**, which enable validation of agent inputs and outputs
- **Sessions**, which automatically maintain conversation history across agent runs
In combination with Python, these primitives are powerful enough to express complex relationships between tools and agents, and allow you to build real-world applications without a steep learning curve. In addition, the SDK comes with built-in **tracing** that lets you visualize and debug your agentic flows, as well as evaluate them and even fine-tune models for your application.
## Why use the Agents SDK
The SDK has two driving design principles:
1. Enough features to be worth using, but few enough primitives to make it quick to learn.
2. Works great out of the box, but you can customize exactly what happens.
Here are the main features of the SDK:
- Agent loop: Built-in agent loop that handles calling tools, sending results to the LLM, and looping until the LLM is done.
- Python-first: Use built-in language features to orchestrate and chain agents, rather than needing to learn new abstractions.
- Handoffs: A powerful feature to coordinate and delegate between multiple agents.
- Guardrails: Run input validations and checks in parallel to your agents, breaking early if the checks fail.
- Sessions: Automatic conversation history management across agent runs, eliminating manual state handling; see the sketch after this list.
- Function tools: Turn any Python function into a tool, with automatic schema generation and Pydantic-powered validation.
- Tracing: Built-in tracing that lets you visualize, debug and monitor your workflows, as well as use the OpenAI suite of evaluation, fine-tuning and distillation tools.
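For example, Sessions keeps history flowing between runs with a single object (a minimal sketch using `SQLiteSession`, which stores history in memory by default):

```python
from agents import Agent, Runner, SQLiteSession

agent = Agent(name="Assistant", instructions="Reply very concisely.")
session = SQLiteSession("conversation_123")  # in-memory by default

# First turn.
result = Runner.run_sync(agent, "What city is the Golden Gate Bridge in?", session=session)
print(result.final_output)  # e.g. "San Francisco"

# The second turn automatically sees the first turn's history.
result = Runner.run_sync(agent, "What state is it in?", session=session)
print(result.final_output)  # e.g. "California"
```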
## Installation
```bash
pip install openai-agents
```
## Hello world example
```python
from agents import Agent, Runner
agent = Agent(name="Assistant", instructions="You are a helpful assistant")
result = Runner.run_sync(agent, "Write a haiku about recursion in programming.")
print(result.final_output)
# Code within the code,
# Functions calling themselves,
# Infinite loop's dance.
```
(_If running this, ensure you set the `OPENAI_API_KEY` environment variable_)
```bash
export OPENAI_API_KEY=sk-...
```
File: docs/mcp.md
# Model context protocol (MCP)
The [Model context protocol](https://modelcontextprotocol.io/introduction) (MCP) standardises how applications expose tools and
context to language models. From the official documentation:
> MCP is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI
> applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP
> provides a standardized way to connect AI models to different data sources and tools.
The Agents Python SDK understands multiple MCP transports. This lets you reuse existing MCP servers or build your own to expose
filesystem, HTTP, or connector backed tools to an agent.
## Choosing an MCP integration
Before wiring an MCP server into an agent, decide where the tool calls should execute and which transports you can reach. The matrix below summarises the options that the Python SDK supports.
| What you need | Recommended option |
| ------------------------------------------------------------------------------------ | ----------------------------------------------------- |
| Let OpenAI's Responses API call a publicly reachable MCP server on the model's behalf| **Hosted MCP server tools** via [`HostedMCPTool`][agents.tool.HostedMCPTool] |
| Connect to Streamable HTTP servers that you run locally or remotely | **Streamable HTTP MCP servers** via [`MCPServerStreamableHttp`][agents.mcp.server.MCPServerStreamableHttp] |
| Talk to servers that implement HTTP with Server-Sent Events | **HTTP with SSE MCP servers** via [`MCPServerSse`][agents.mcp.server.MCPServerSse] |
| Launch a local process and communicate over stdin/stdout | **stdio MCP servers** via [`MCPServerStdio`][agents.mcp.server.MCPServerStdio] |
The sections below walk through each option, how to configure it, and when to prefer one transport over another.
## 1. Hosted MCP server tools
Hosted tools push the entire tool round-trip into OpenAI's infrastructure. Instead of your code listing and calling tools, the
[`HostedMCPTool`][agents.tool.HostedMCPTool] forwards a server label (and optional connector metadata) to the Responses API. The
model lists the remote server's tools and invokes them without an extra callback to your Python process. Hosted tools currently
work with OpenAI models that support the Responses API's hosted MCP integration.
### Basic hosted MCP tool
Create a hosted tool by adding a [`HostedMCPTool`][agents.tool.HostedMCPTool] to the agent's `tools` list. The `tool_config`
dict mirrors the JSON you would send to the REST API:
```python
import asyncio

from agents import Agent, HostedMCPTool, Runner

async def main() -> None:
    agent = Agent(
        name="Assistant",
        tools=[
            HostedMCPTool(
                tool_config={
                    "type": "mcp",
                    "server_label": "gitmcp",
                    "server_url": "https://gitmcp.io/openai/codex",
                    "require_approval": "never",
                }
            )
        ],
    )

    result = await Runner.run(agent, "Which language is this repository written in?")
    print(result.final_output)

asyncio.run(main())
```
The hosted server exposes its tools automatically; you do not add it to `mcp_servers`.
### Streaming hosted MCP results
Hosted tools support streaming results in exactly the same way as function tools. Call `Runner.run_streamed` and iterate over the result's `stream_events()` to consume incremental MCP output while the model is still working:
```python
result = Runner.run_streamed(agent, "Summarise this repository's top languages")
async for event in result.stream_events():
    if event.type == "run_item_stream_event":
        print(f"Received: {event.item}")
print(result.final_output)
```
### Optional approval flows
If a server can perform sensitive operations you can require human or programmatic approval before each tool execution. Configure
`require_approval` in the `tool_config` with either a single policy (`"always"`, `"never"`) or a dict mapping tool names to
policies. To make the decision inside Python, provide an `on_approval_request` callback.
```python
from agents import MCPToolApprovalFunctionResult, MCPToolApprovalRequest

SAFE_TOOLS = {"read_project_metadata"}

def approve_tool(request: MCPToolApprovalRequest) -> MCPToolApprovalFunctionResult:
    if request.data.name in SAFE_TOOLS:
        return {"approve": True}
    return {"approve": False, "reason": "Escalate to a human reviewer"}

agent = Agent(
    name="Assistant",
    tools=[
        HostedMCPTool(
            tool_config={
                "type": "mcp",
                "server_label": "gitmcp",
                "server_url": "https://gitmcp.io/openai/codex",
                "require_approval": "always",
            },
            on_approval_request=approve_tool,
        )
    ],
)
```
The callback can be synchronous or asynchronous and is invoked whenever the model needs approval data to keep running.
### Connector-backed hosted servers
Hosted MCP also supports OpenAI connectors. Instead of specifying a `server_url`, supply a `connector_id` and an access token. The
Responses API handles authentication and the hosted server exposes the connector's tools.
```python
import os

HostedMCPTool(
    tool_config={
        "type": "mcp",
        "server_label": "google_calendar",
        "connector_id": "connector_googlecalendar",
        "authorization": os.environ["GOOGLE_CALENDAR_AUTHORIZATION"],
        "require_approval": "never",
    }
)
```
Fully working hosted tool samples—including streaming, approvals, and connectors—live in
[`examples/hosted_mcp`](https://github.com/openai/openai-agents-python/tree/main/examples/hosted_mcp).
## 2. Streamable HTTP MCP servers
When you want to manage the network connection yourself, use
[`MCPServerStreamableHttp`][agents.mcp.server.MCPServerStreamableHttp]. Streamable HTTP servers are ideal when you control the
transport or want to run the server inside your own infrastructure while keeping latency low.
```python
import asyncio
import os

from agents import Agent, Runner
from agents.mcp import MCPServerStreamableHttp
from agents.model_settings import ModelSettings

async def main() -> None:
    token = os.environ["MCP_SERVER_TOKEN"]
    async with MCPServerStreamableHttp(
        name="Streamable HTTP Python Server",
        params={
            "url": "http://localhost:8000/mcp",
            "headers": {"Authorization": f"Bearer {token}"},
            "timeout": 10,
        },
        cache_tools_list=True,
        max_retry_attempts=3,
    ) as server:
        agent = Agent(
            name="Assistant",
            instructions="Use the MCP tools to answer the questions.",
            mcp_servers=[server],
            model_settings=ModelSettings(tool_choice="required"),
        )
        result = await Runner.run(agent, "Add 7 and 22.")
        print(result.final_output)

asyncio.run(main())
```
The constructor accepts additional options:
- `client_session_timeout_seconds` controls HTTP read timeouts.
- `use_structured_content` toggles whether `tool_result.structured_content` is preferred over textual output.
- `max_retry_attempts` and `retry_backoff_seconds_base` add automatic retries for `list_tools()` and `call_tool()`.
- `tool_filter` lets you expose only a subset of tools (see [Tool filtering](#tool-filtering)).
## 3. HTTP with SSE MCP servers
If the MCP server implements the HTTP with SSE transport, instantiate
[`MCPServerSse`][agents.mcp.server.MCPServerSse]. Apart from the transport, the API is identical to the Streamable HTTP server.
```python
from agents import Agent, Runner
from agents.model_settings import ModelSettings
from agents.mcp import MCPServerSse

workspace_id = "demo-workspace"

async with MCPServerSse(
    name="SSE Python Server",
    params={
        "url": "http://localhost:8000/sse",
        "headers": {"X-Workspace": workspace_id},
    },
    cache_tools_list=True,
) as server:
    agent = Agent(
        name="Assistant",
        mcp_servers=[server],
        model_settings=ModelSettings(tool_choice="required"),
    )
    result = await Runner.run(agent, "What's the weather in Tokyo?")
    print(result.final_output)
```
## 4. stdio MCP servers
For MCP servers that run as local subprocesses, use [`MCPServerStdio`][agents.mcp.server.MCPServerStdio]. The SDK spawns the
process, keeps the pipes open, and closes them automatically when the context manager exits. This option is helpful for quick
proofs of concept or when the server only exposes a command line entry point.
```python
from pathlib import Path

from agents import Agent, Runner
from agents.mcp import MCPServerStdio

current_dir = Path(__file__).parent
samples_dir = current_dir / "sample_files"

async with MCPServerStdio(
    name="Filesystem Server via npx",
    params={
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", str(samples_dir)],
    },
) as server:
    agent = Agent(
        name="Assistant",
        instructions="Use the files in the sample directory to answer questions.",
        mcp_servers=[server],
    )
    result = await Runner.run(agent, "List the files available to you.")
    print(result.final_output)
```
## Tool filtering
Each MCP server supports tool filters so that you can expose only the functions that your agent needs. Filtering can happen at
construction time or dynamically per run.
### Static tool filtering
Use [`create_static_tool_filter`][agents.mcp.create_static_tool_filter] to configure simple allow/block lists:
```python
from pathlib import Path

from agents.mcp import MCPServerStdio, create_static_tool_filter

samples_dir = Path("/path/to/files")

filesystem_server = MCPServerStdio(
    params={
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", str(samples_dir)],
    },
    tool_filter=create_static_tool_filter(allowed_tool_names=["read_file", "write_file"]),
)
```
When both `allowed_tool_names` and `blocked_tool_names` are supplied, the SDK applies the allow-list first and then removes any blocked tools from the remaining set.
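For instance, a short sketch combining both lists (the tool names are illustrative):

```python
from agents.mcp import create_static_tool_filter

# Allow-list applied first, then the block-list removes write_file from the
# remaining set, so only read_file and list_directory are exposed.
tool_filter = create_static_tool_filter(
    allowed_tool_names=["read_file", "write_file", "list_directory"],
    blocked_tool_names=["write_file"],
)
```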
### Dynamic tool filtering
For more elaborate logic pass a callable that receives a [`ToolFilterContext`][agents.mcp.ToolFilterContext]. The callable can be
synchronous or asynchronous and returns `True` when the tool should be exposed.
```python
from pathlib import Path

from agents.mcp import MCPServerStdio, ToolFilterContext

samples_dir = Path("/path/to/files")

async def context_aware_filter(context: ToolFilterContext, tool) -> bool:
    if context.agent.name == "Code Reviewer" and tool.name.startswith("danger_"):
        return False
    return True

async with MCPServerStdio(
    params={
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", str(samples_dir)],
    },
    tool_filter=context_aware_filter,
) as server:
    ...
```
The filter context exposes the active `run_context`, the `agent` requesting the tools, and the `server_name`.
## Prompts
MCP servers can also provide prompts that dynamically generate agent instructions. Servers that support prompts expose two
methods:
- `list_prompts()` enumerates the available prompt templates.
- `get_prompt(name, arguments)` fetches a concrete prompt, optionally with parameters.
```python
from agents import Agent

prompt_result = await server.get_prompt(
    "generate_code_review_instructions",
    {"focus": "security vulnerabilities", "language": "python"},
)
instructions = prompt_result.messages[0].content.text

agent = Agent(
    name="Code Reviewer",
    instructions=instructions,
    mcp_servers=[server],
)
```
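To see what a server offers before fetching, a sketch of `list_prompts()` (assuming the result follows the MCP `ListPromptsResult` shape, with `name` and `description` on each entry):

```python
# Enumerate the templates the server offers before fetching a concrete prompt.
prompts_result = await server.list_prompts()
for prompt in prompts_result.prompts:
    print(f"{prompt.name}: {prompt.description}")
```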
## Caching
Every agent run calls `list_tools()` on each MCP server. Remote servers can introduce noticeable latency, so all of the MCP
server classes expose a `cache_tools_list` option. Set it to `True` only if you are confident that the tool definitions do not
change frequently. To force a fresh list later, call `invalidate_tools_cache()` on the server instance.
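A sketch of the cache lifecycle (reusing the Streamable HTTP server from above; `invalidate_tools_cache()` is assumed to be a synchronous call on the server instance):

```python
from agents.mcp import MCPServerStreamableHttp

async with MCPServerStreamableHttp(
    name="Cached server",
    params={"url": "http://localhost:8000/mcp"},
    cache_tools_list=True,  # list_tools() results are reused across runs
) as server:
    # ... run agents against the server ...

    # If you know the server's tool definitions changed, force a fresh list:
    server.invalidate_tools_cache()
```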
## Tracing
[Tracing](./tracing.md) automatically captures MCP activity, including:
1. Calls to the MCP server to list tools.
2. MCP-related information on tool calls.

## Further reading
- [Model Context Protocol](https://modelcontextprotocol.io/) – the specification and design guides.
- [examples/mcp](https://github.com/openai/openai-agents-python/tree/main/examples/mcp) – runnable stdio, SSE, and Streamable HTTP samples.
- [examples/hosted_mcp](https://github.com/openai/openai-agents-python/tree/main/examples/hosted_mcp) – complete hosted MCP demonstrations including approvals and connectors.
File: docs/multi_agent.md
# Orchestrating multiple agents
Orchestration refers to the flow of agents in your app. Which agents run, in what order, and how do they decide what happens next? There are two main ways to orchestrate agents:
1. Allowing the LLM to make decisions: this uses the intelligence of an LLM to plan, reason, and decide on what steps to take based on that.
2. Orchestrating via code: determining the flow of agents via your code.
You can mix and match these patterns. Each has their own tradeoffs, described below.
## Orchestrating via LLM
An agent is an LLM equipped with instructions, tools and handoffs. This means that given an open-ended task, the LLM can autonomously plan how it will tackle the task, using tools to take actions and acquire data, and using handoffs to delegate tasks to sub-agents. For example, a research agent could be equipped with tools like:
- Web search to find information online
- File search and retrieval to search through proprietary data and connections
- Computer use to take actions on a computer
- Code execution to do data analysis
- Handoffs to specialized agents that are great at planning, report writing and more.
This pattern is great when the task is open-ended and you want to rely on the intelligence of an LLM. The most important tactics here are:
1. Invest in good prompts. Make it clear what tools are available, how to use them, and what parameters it must operate within.
2. Monitor your app and iterate on it. See where things go wrong, and iterate on your prompts.
3. Allow the agent to introspect and improve. For example, run it in a loop, and let it critique itself; or, provide error messages and let it improve.
4. Have specialized agents that excel in one task, rather than having a general purpose agent that is expected to be good at anything.
5. Invest in [evals](https://platform.openai.com/docs/guides/evals). This lets you train your agents to improve and get better at tasks.
## Orchestrating via code
While orchestrating via LLM is powerful, orchestrating via code makes tasks more deterministic and predictable, in terms of speed, cost and performance. Common patterns here are:
- Using [structured outputs](https://platform.openai.com/docs/guides/structured-outputs) to generate well formed data that you can inspect with your code. For example, you might ask an agent to classify the task into a few categories, and then pick the next agent based on the category.
- Chaining multiple agents by transforming the output of one into the input of the next. You can decompose a task like writing a blog post into a series of steps - do research, write an outline, write the blog post, critique it, and then improve it.
- Running the agent that performs the task in a `while` loop with an agent that evaluates and provides feedback, until the evaluator says the output passes certain criteria.
- Running multiple agents in parallel, e.g. via Python primitives like `asyncio.gather`. This is useful for speed when you have multiple tasks that don't depend on each other; see the sketch after this list.
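For example, a minimal sketch of the parallelization pattern (the agents and input here are illustrative):

```python
import asyncio

from agents import Agent, Runner

summarizer = Agent(name="Summarizer", instructions="Summarize the text in one sentence.")
translator = Agent(name="Translator", instructions="Translate the text into French.")

async def main() -> None:
    text = "The Agents SDK has a very small set of primitives."
    # The two runs are independent, so they execute concurrently.
    summary, translation = await asyncio.gather(
        Runner.run(summarizer, text),
        Runner.run(translator, text),
    )
    print(summary.final_output)
    print(translation.final_output)

asyncio.run(main())
```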
We have a number of examples in [`examples/agent_patterns`](https://github.com/openai/openai-agents-python/tree/main/examples/agent_patterns).
File: docs/quickstart.md
# Quickstart
## Create a project and virtual environment
You'll only need to do this once.
```bash
mkdir my_project
cd my_project
python -m venv .venv
```
### Activate the virtual environment
Do this every time you start a new terminal session.
```bash
source .venv/bin/activate
```
### Install the Agents SDK
```bash
pip install openai-agents # or `uv add openai-agents`, etc
```
### Set an OpenAI API key
If you don't have one, follow [these instructions](https://platform.openai.com/docs/quickstart#create-and-export-an-api-key) to create an OpenAI API key.
```bash
export OPENAI_API_KEY=sk-...
```
## Create your first agent
Agents are defined with instructions, a name, and optional config (such as `model`):
```python
from agents import Agent

agent = Agent(
    name="Math Tutor",
    instructions="You provide help with math problems. Explain your reasoning at each step and include examples",
)
```
## Add a few more agents
Additional agents can be defined in the same way. The `handoff_description` provides additional context for determining handoff routing:
```python
from agents import Agent

history_tutor_agent = Agent(
    name="History Tutor",
    handoff_description="Specialist agent for historical questions",
    instructions="You provide assistance with historical queries. Explain important events and context clearly.",
)

math_tutor_agent = Agent(
    name="Math Tutor",
    handoff_description="Specialist agent for math questions",
    instructions="You provide help with math problems. Explain your reasoning at each step and include examples",
)
```
## Define your handoffs
On each agent, you can define an inventory of outgoing handoff options that the agent can choose from to decide how to make progress on their task.
```python
triage_agent = Agent(
    name="Triage Agent",
    instructions="You determine which agent to use based on the user's homework question",
    handoffs=[history_tutor_agent, math_tutor_agent],
)
```
## Run the agent orchestration
Let's check that the workflow runs and the triage agent correctly routes between the two specialist agents.
```python
from agents import Runner

async def main():
    result = await Runner.run(triage_agent, "What is the capital of France?")
    print(result.final_output)
```
## Add a guardrail
You can define custom guardrails to run on the input or output.
```python
from pydantic import BaseModel

from agents import GuardrailFunctionOutput, Agent, Runner

class HomeworkOutput(BaseModel):
    is_homework: bool
    reasoning: str

guardrail_agent = Agent(
    name="Guardrail check",
    instructions="Check if the user is asking about homework.",
    output_type=HomeworkOutput,
)

async def homework_guardrail(ctx, agent, input_data):
    result = await Runner.run(guardrail_agent, input_data, context=ctx.context)
    final_output = result.final_output_as(HomeworkOutput)
    return GuardrailFunctionOutput(
        output_info=final_output,
        tripwire_triggered=not final_output.is_homework,
    )
```
## Put it all together
Let's put it all together and run the entire workflow, using handoffs and the input guardrail.
```python
import asyncio

from pydantic import BaseModel

from agents import Agent, InputGuardrail, GuardrailFunctionOutput, Runner
from agents.exceptions import InputGuardrailTripwireTriggered

class HomeworkOutput(BaseModel):
    is_homework: bool
    reasoning: str

guardrail_agent = Agent(
    name="Guardrail check",
    instructions="Check if the user is asking about homework.",
    output_type=HomeworkOutput,
)

math_tutor_agent = Agent(
    name="Math Tutor",
    handoff_description="Specialist agent for math questions",
    instructions="You provide help with math problems. Explain your reasoning at each step and include examples",
)

history_tutor_agent = Agent(
    name="History Tutor",
    handoff_description="Specialist agent for historical questions",
    instructions="You provide assistance with historical queries. Explain important events and context clearly.",
)

async def homework_guardrail(ctx, agent, input_data):
    result = await Runner.run(guardrail_agent, input_data, context=ctx.context)
    final_output = result.final_output_as(HomeworkOutput)
    return GuardrailFunctionOutput(
        output_info=final_output,
        tripwire_triggered=not final_output.is_homework,
    )

triage_agent = Agent(
    name="Triage Agent",
    instructions="You determine which agent to use based on the user's homework question",
    handoffs=[history_tutor_agent, math_tutor_agent],
    input_guardrails=[
        InputGuardrail(guardrail_function=homework_guardrail),
    ],
)

async def main():
    # Example 1: History question
    try:
        result = await Runner.run(triage_agent, "who was the first president of the united states?")
        print(result.final_output)
    except InputGuardrailTripwireTriggered as e:
        print("Guardrail blocked this input:", e)

    # Example 2: General/philosophical question
    try:
        result = await Runner.run(triage_agent, "What is the meaning of life?")
        print(result.final_output)
    except InputGuardrailTripwireTriggered as e:
        print("Guardrail blocked this input:", e)

if __name__ == "__main__":
    asyncio.run(main())
```
## View your traces
To review what happened during your agent run, open the [Trace viewer in the OpenAI Dashboard](https://platform.openai.com/traces).
## Next steps
Learn how to build more complex agentic flows:
- Learn about how to configure [Agents](agents.md).
- Learn about [running agents](running_agents.md).
- Learn about [tools](tools.md), [guardrails](guardrails.md) and [models](models/index.md).
File: docs/release.md
# Release process/changelog
The project follows a slightly modified version of semantic versioning using the form `0.Y.Z`. The leading `0` indicates the SDK is still evolving rapidly. Increment the components as follows:
## Minor (`Y`) versions
We will increase minor versions `Y` for **breaking changes** to any public interfaces that are not marked as beta. For example, going from `0.0.x` to `0.1.x` might include breaking changes.
If you don't want breaking changes, we recommend pinning to `0.0.x` versions in your project.
## Patch (`Z`) versions
We will increment `Z` for non-breaking changes:
- Bug fixes
- New features
- Changes to private interfaces
- Updates to beta features
## Breaking change changelog
### 0.2.0
In this version, a few places that used to take `Agent` as an arg now take `AgentBase` as an arg instead. For example, the `list_tools()` call in MCP servers. This is purely a typing change; you will still receive `Agent` objects. To update, just fix type errors by replacing `Agent` with `AgentBase`.
### 0.1.0
In this version, [`MCPServer.list_tools()`][agents.mcp.server.MCPServer] has two new params: `run_context` and `agent`. You'll need to add these params to any classes that subclass `MCPServer`.
File: docs/repl.md
# REPL utility
The SDK provides `run_demo_loop` for quick, interactive testing of an agent's behavior directly in your terminal.
```python
import asyncio
from agents import Agent, run_demo_loop
async def main() -> None:
agent = Agent(name="Assistant", instructions="You are a helpful assistant.")
await run_demo_loop(agent)
if __name__ == "__main__":
asyncio.run(main())
```
`run_demo_loop` prompts for user input in a loop, keeping the full conversation history between turns so your agent knows what has been discussed, and by default it streams the agent's responses in real time as they are generated.
To end this chat session, simply type `quit` or `exit` (and press Enter) or use the `Ctrl-D` keyboard shortcut.
File: docs/results.md
# Results
When you call the `Runner.run` methods, you get either a:
- [`RunResult`][agents.result.RunResult] if you call `run` or `run_sync`
- [`RunResultStreaming`][agents.result.RunResultStreaming] if you call `run_streamed`
Both of these inherit from [`RunResultBase`][agents.result.RunResultBase], which is where most useful information is present.
## Final output
The [`final_output`][agents.result.RunResultBase.final_output] property contains the final output of the last agent that ran. This is either:
- a `str`, if the last agent didn't have an `output_type` defined
- an object of type `last_agent.output_type`, if the agent had an output type defined.
!!! note
`final_output` is of type `Any`. We can't statically type this, because of handoffs. If handoffs occur, that means any Agent might be the last agent, so we don't statically know the set of possible output types.
## Inputs for the next turn
You can use [`result.to_input_list()`][agents.result.RunResultBase.to_input_list] to turn the result into an input list that concatenates the original input you provided with the items generated during the agent run. This makes it convenient to take the outputs of one agent run and pass them into another run, or to run the agent in a loop and append new user inputs each time.
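As a minimal sketch (assuming an `agent` defined elsewhere and an async context), chaining one run into the next looks like this:
```python
result = await Runner.run(agent, "Suggest a book about space.")

# Append a follow-up user message to everything generated so far.
next_input = result.to_input_list() + [{"role": "user", "content": "Why that one?"}]
followup = await Runner.run(agent, next_input)
```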
## Last agent
The [`last_agent`][agents.result.RunResultBase.last_agent] property contains the last agent that ran. Depending on your application, this is often useful for the next time the user inputs something. For example, if you have a frontline triage agent that hands off to a language-specific agent, you can store the last agent, and re-use it the next time the user messages the agent.
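For example, a minimal sketch of that pattern (assuming a `triage_agent` as described and an async context):
```python
result = await Runner.run(triage_agent, "Hola, necesito ayuda con mi pedido.")

# Persist result.last_agent (e.g. keyed by chat thread) so the next user
# message can skip triage and go straight to the language-specific agent.
next_result = await Runner.run(result.last_agent, "¿Puedes darme más detalles?")
```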
## New items
The [`new_items`][agents.result.RunResultBase.new_items] property contains the new items generated during the run. The items are [`RunItem`][agents.items.RunItem]s. A run item wraps the raw item generated by the LLM. A short sketch of filtering these items follows the list.
- [`MessageOutputItem`][agents.items.MessageOutputItem] indicates a message from the LLM. The raw item is the message generated.
- [`HandoffCallItem`][agents.items.HandoffCallItem] indicates that the LLM called the handoff tool. The raw item is the tool call item from the LLM.
- [`HandoffOutputItem`][agents.items.HandoffOutputItem] indicates that a handoff occurred. The raw item is the tool response to the handoff tool call. You can also access the source/target agents from the item.
- [`ToolCallItem`][agents.items.ToolCallItem] indicates that the LLM invoked a tool.
- [`ToolCallOutputItem`][agents.items.ToolCallOutputItem] indicates that a tool was called and returned output. The raw item is the tool response. You can also access the tool output from the item.
- [`ReasoningItem`][agents.items.ReasoningItem] indicates a reasoning item from the LLM. The raw item is the reasoning generated.
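As a rough sketch, you can filter `new_items` by type (assuming a completed `result` from `Runner.run`):
```python
from agents import ItemHelpers
from agents.items import MessageOutputItem, ToolCallOutputItem

for item in result.new_items:
    if isinstance(item, MessageOutputItem):
        # Extract the plain text of an LLM message.
        print("Message:", ItemHelpers.text_message_output(item))
    elif isinstance(item, ToolCallOutputItem):
        print("Tool output:", item.output)
```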
## Other information
### Guardrail results
The [`input_guardrail_results`][agents.result.RunResultBase.input_guardrail_results] and [`output_guardrail_results`][agents.result.RunResultBase.output_guardrail_results] properties contain the results of the guardrails, if any. Guardrail results can sometimes contain useful information you want to log or store, so we make these available to you.
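For example, a hedged sketch of logging them (this assumes each result object exposes the underlying `GuardrailFunctionOutput` as `.output`; check the result types in your installed version):
```python
for gr_result in result.input_guardrail_results:
    # GuardrailFunctionOutput fields per the quickstart: output_info, tripwire_triggered.
    print("Tripwire:", gr_result.output.tripwire_triggered)
    print("Info:", gr_result.output.output_info)
```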
### Raw responses
The [`raw_responses`][agents.result.RunResultBase.raw_responses] property contains the [`ModelResponse`][agents.items.ModelResponse]s generated by the LLM.
### Original input
The [`input`][agents.result.RunResultBase.input] property contains the original input you provided to the `run` method. In most cases you won't need this, but it's available in case you do.
File: docs/running_agents.md
# Running agents
You can run agents via the [`Runner`][agents.run.Runner] class. You have 3 options:
1. [`Runner.run()`][agents.run.Runner.run], which runs async and returns a [`RunResult`][agents.result.RunResult].
2. [`Runner.run_sync()`][agents.run.Runner.run_sync], which is a sync method and just runs `.run()` under the hood.
3. [`Runner.run_streamed()`][agents.run.Runner.run_streamed], which runs async and returns a [`RunResultStreaming`][agents.result.RunResultStreaming]. It calls the LLM in streaming mode, and streams those events to you as they are received.
```python
from agents import Agent, Runner
async def main():
agent = Agent(name="Assistant", instructions="You are a helpful assistant")
result = await Runner.run(agent, "Write a haiku about recursion in programming.")
print(result.final_output)
# Code within the code,
# Functions calling themselves,
# Infinite loop's dance
```
Read more in the [results guide](results.md).
## The agent loop
When you use the run method in `Runner`, you pass in a starting agent and input. The input can either be a string (which is considered a user message), or a list of input items, which are the items in the OpenAI Responses API.
The runner then runs a loop:
1. We call the LLM for the current agent, with the current input.
2. The LLM produces its output.
1. If the LLM returns a `final_output`, the loop ends and we return the result.
2. If the LLM does a handoff, we update the current agent and input, and re-run the loop.
3. If the LLM produces tool calls, we run those tool calls, append the results, and re-run the loop.
3. If we exceed the `max_turns` passed, we raise a [`MaxTurnsExceeded`][agents.exceptions.MaxTurnsExceeded] exception (a short sketch follows below).
!!! note
    The rule for whether the LLM output is considered a "final output" is that it produces text output of the desired type, and there are no tool calls.
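For example, a minimal sketch of capping the loop with `max_turns` (assuming an async context):
```python
from agents import Agent, Runner
from agents.exceptions import MaxTurnsExceeded

agent = Agent(name="Assistant", instructions="You are a helpful assistant")

try:
    result = await Runner.run(agent, "Plan a weekend trip.", max_turns=3)
except MaxTurnsExceeded:
    print("The run did not finish within 3 turns.")
```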
## Streaming
Streaming allows you to additionally receive streaming events as the LLM runs. Once the stream is done, the [`RunResultStreaming`][agents.result.RunResultStreaming] will contain the complete information about the run, including all the new outputs produced. You can call `.stream_events()` for the streaming events. Read more in the [streaming guide](streaming.md).
## Run config
The `run_config` parameter lets you configure some global settings for the agent run (a short sketch follows this list):
- [`model`][agents.run.RunConfig.model]: Allows setting a global LLM model to use, irrespective of what `model` each Agent has.
- [`model_provider`][agents.run.RunConfig.model_provider]: A model provider for looking up model names, which defaults to OpenAI.
- [`model_settings`][agents.run.RunConfig.model_settings]: Overrides agent-specific settings. For example, you can set a global `temperature` or `top_p`.
- [`input_guardrails`][agents.run.RunConfig.input_guardrails], [`output_guardrails`][agents.run.RunConfig.output_guardrails]: A list of input or output guardrails to include on all runs.
- [`handoff_input_filter`][agents.run.RunConfig.handoff_input_filter]: A global input filter to apply to all handoffs, if the handoff doesn't already have one. The input filter allows you to edit the inputs that are sent to the new agent. See the documentation in [`Handoff.input_filter`][agents.handoffs.Handoff.input_filter] for more details.
- [`tracing_disabled`][agents.run.RunConfig.tracing_disabled]: Allows you to disable [tracing](tracing.md) for the entire run.
- [`trace_include_sensitive_data`][agents.run.RunConfig.trace_include_sensitive_data]: Configures whether traces will include potentially sensitive data, such as LLM and tool call inputs/outputs.
- [`workflow_name`][agents.run.RunConfig.workflow_name], [`trace_id`][agents.run.RunConfig.trace_id], [`group_id`][agents.run.RunConfig.group_id]: Sets the tracing workflow name, trace ID and trace group ID for the run. We recommend at least setting `workflow_name`. The group ID is an optional field that lets you link traces across multiple runs.
- [`trace_metadata`][agents.run.RunConfig.trace_metadata]: Metadata to include on all traces.
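A minimal sketch combining a few of these settings (assuming `RunConfig` is importable from the package root and an async context; the model name is a placeholder):
```python
from agents import Agent, Runner, RunConfig

agent = Agent(name="Assistant", instructions="You are a helpful assistant")

result = await Runner.run(
    agent,
    "Hello!",
    run_config=RunConfig(
        workflow_name="Support flow",  # recommended: name your traces
        group_id="thread_123",         # link traces from the same conversation
        trace_include_sensitive_data=False,
        model="some-model-name",       # placeholder global model override
    ),
)
```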
## Conversations/chat threads
Calling any of the run methods can result in one or more agents running (and hence one or more LLM calls), but it represents a single logical turn in a chat conversation. For example:
1. User turn: the user enters text
2. Runner run: first agent calls LLM, runs tools, does a handoff to a second agent, second agent runs more tools, and then produces an output.
At the end of the agent run, you can choose what to show to the user. For example, you might show the user every new item generated by the agents, or just the final output. Either way, the user might then ask a follow-up question, in which case you can call the run method again.
### Manual conversation management
You can manually manage conversation history using the [`RunResultBase.to_input_list()`][agents.result.RunResultBase.to_input_list] method to get the inputs for the next turn:
```python
from agents import Agent, Runner, trace

async def main():
agent = Agent(name="Assistant", instructions="Reply very concisely.")
thread_id = "thread_123" # Example thread ID
with trace(workflow_name="Conversation", group_id=thread_id):
# First turn
result = await Runner.run(agent, "What city is the Golden Gate Bridge in?")
print(result.final_output)
# San Francisco
# Second turn
new_input = result.to_input_list() + [{"role": "user", "content": "What state is it in?"}]
result = await Runner.run(agent, new_input)
print(result.final_output)
# California
```
### Automatic conversation management with Sessions
For a simpler approach, you can use [Sessions](sessions/index.md) to automatically handle conversation history without manually calling `.to_input_list()`:
```python
from agents import Agent, Runner, SQLiteSession, trace
async def main():
agent = Agent(name="Assistant", instructions="Reply very concisely.")
# Create session instance
session = SQLiteSession("conversation_123")
thread_id = "thread_123" # Example thread ID
with trace(workflow_name="Conversation", group_id=thread_id):
# First turn
result = await Runner.run(agent, "What city is the Golden Gate Bridge in?", session=session)
print(result.final_output)
# San Francisco
# Second turn - agent automatically remembers previous context
result = await Runner.run(agent, "What state is it in?", session=session)
print(result.final_output)
# California
```
Sessions automatically:
- Retrieve conversation history before each run
- Store new messages after each run
- Maintain separate conversations for different session IDs
See the [Sessions documentation](sessions/index.md) for more details.
### Server-managed conversations
You can also let the OpenAI conversation state feature manage conversation state on the server side, instead of handling it locally with `to_input_list()` or `Sessions`. This allows you to preserve conversation history without manually resending all past messages. See the [OpenAI Conversation state guide](https://platform.openai.com/docs/guides/conversation-state?api-mode=responses) for more details.
OpenAI provides two ways to track state across turns:
#### 1. Using `conversation_id`
You first create a conversation using the OpenAI Conversations API and then reuse its ID for every subsequent call:
```python
from agents import Agent, Runner
from openai import AsyncOpenAI
client = AsyncOpenAI()
async def main():
# Create a server-managed conversation
conversation = await client.conversations.create()
conv_id = conversation.id
agent = Agent(name="Assistant", instructions="Reply very concisely.")
# First turn
result1 = await Runner.run(agent, "What city is the Golden Gate Bridge in?", conversation_id=conv_id)
print(result1.final_output)
# San Francisco
# Second turn reuses the same conversation_id
result2 = await Runner.run(
agent,
"What state is it in?",
conversation_id=conv_id,
)
print(result2.final_output)
# California
```
#### 2. Using `previous_response_id`
Another option is **response chaining**, where each turn links explicitly to the response ID from the previous turn.
```python
from agents import Agent, Runner
async def main():
agent = Agent(name="Assistant", instructions="Reply very concisely.")
# First turn
result1 = await Runner.run(agent, "What city is the Golden Gate Bridge in?")
print(result1.final_output)
# San Francisco
# Second turn, chained to the previous response
result2 = await Runner.run(
agent,
"What state is it in?",
previous_response_id=result1.last_response_id,
)
print(result2.final_output)
# California
```
## Long running agents & human-in-the-loop
You can use the Agents SDK [Temporal](https://temporal.io/) integration to run durable, long-running workflows, including human-in-the-loop tasks. View a demo of Temporal and the Agents SDK working in action to complete long-running tasks [in this video](https://www.youtube.com/watch?v=fFBZqzT4DD8), and [view docs here](https://github.com/temporalio/sdk-python/tree/main/temporalio/contrib/openai_agents).
## Exceptions
The SDK raises exceptions in certain cases. The full list is in [`agents.exceptions`][]. As an overview (a handling sketch follows the list):
- [`AgentsException`][agents.exceptions.AgentsException]: This is the base class for all exceptions raised within the SDK. It serves as a generic type from which all other specific exceptions are derived.
- [`MaxTurnsExceeded`][agents.exceptions.MaxTurnsExceeded]: This exception is raised when the agent's run exceeds the `max_turns` limit passed to the `Runner.run`, `Runner.run_sync`, or `Runner.run_streamed` methods. It indicates that the agent could not complete its task within the specified number of interaction turns.
- [`ModelBehaviorError`][agents.exceptions.ModelBehaviorError]: This exception occurs when the underlying model (LLM) produces unexpected or invalid outputs. This can include:
- Malformed JSON: When the model provides a malformed JSON structure for tool calls or in its direct output, especially if a specific `output_type` is defined.
    - Unexpected tool-related failures: When the model fails to use tools in an expected manner.
- [`UserError`][agents.exceptions.UserError]: This exception is raised when you (the person writing code using the SDK) make an error while using the SDK. This typically results from incorrect code implementation, invalid configuration, or misuse of the SDK's API.
- [`InputGuardrailTripwireTriggered`][agents.exceptions.InputGuardrailTripwireTriggered], [`OutputGuardrailTripwireTriggered`][agents.exceptions.OutputGuardrailTripwireTriggered]: This exception is raised when the conditions of an input guardrail or output guardrail are met, respectively. Input guardrails check incoming messages before processing, while output guardrails check the agent's final response before delivery.
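As a minimal handling sketch (assuming the `triage_agent` setup from the quickstart and an async context):
```python
from agents import Runner
from agents.exceptions import (
    AgentsException,
    InputGuardrailTripwireTriggered,
    MaxTurnsExceeded,
)

try:
    result = await Runner.run(triage_agent, "Help me with my homework.", max_turns=10)
    print(result.final_output)
except InputGuardrailTripwireTriggered:
    print("A guardrail blocked this input.")
except MaxTurnsExceeded:
    print("The run hit the turn limit.")
except AgentsException as e:
    # Base class: catches any other SDK-raised error, e.g. ModelBehaviorError.
    print(f"Run failed: {e}")
```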
File: docs/streaming.md
# Streaming
Streaming lets you subscribe to updates of the agent run as it proceeds. This can be useful for showing the end-user progress updates and partial responses.
To stream, you can call [`Runner.run_streamed()`][agents.run.Runner.run_streamed], which will give you a [`RunResultStreaming`][agents.result.RunResultStreaming]. Calling `result.stream_events()` gives you an async stream of [`StreamEvent`][agents.stream_events.StreamEvent] objects, which are described below.
## Raw response events
[`RawResponsesStreamEvent`][agents.stream_events.RawResponsesStreamEvent] are raw events passed directly from the LLM. They are in OpenAI Responses API format, which means each event has a type (like `response.created`, `response.output_text.delta`, etc) and data. These events are useful if you want to stream response messages to the user as soon as they are generated.
For example, this will output the text generated by the LLM token-by-token.
```python
import asyncio
from openai.types.responses import ResponseTextDeltaEvent
from agents import Agent, Runner
async def main():
agent = Agent(
name="Joker",
instructions="You are a helpful assistant.",
)
result = Runner.run_streamed(agent, input="Please tell me 5 jokes.")
async for event in result.stream_events():
if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
print(event.data.delta, end="", flush=True)
if __name__ == "__main__":
asyncio.run(main())
```
## Run item events and agent events
[`RunItemStreamEvent`][agents.stream_events.RunItemStreamEvent]s are higher level events. They inform you when an item has been fully generated. This allows you to push progress updates at the level of "message generated", "tool ran", etc, instead of each token. Similarly, [`AgentUpdatedStreamEvent`][agents.stream_events.AgentUpdatedStreamEvent] gives you updates when the current agent changes (e.g. as the result of a handoff).
For example, this will ignore raw events and stream updates to the user.
```python
import asyncio
import random
from agents import Agent, ItemHelpers, Runner, function_tool
@function_tool
def how_many_jokes() -> int:
return random.randint(1, 10)
async def main():
agent = Agent(
name="Joker",
instructions="First call the `how_many_jokes` tool, then tell that many jokes.",
tools=[how_many_jokes],
)
result = Runner.run_streamed(
agent,
input="Hello",
)
print("=== Run starting ===")
async for event in result.stream_events():
# We'll ignore the raw responses event deltas
if event.type == "raw_response_event":
continue
# When the agent updates, print that
elif event.type == "agent_updated_stream_event":
print(f"Agent updated: {event.new_agent.name}")
continue
# When items are generated, print them
elif event.type == "run_item_stream_event":
if event.item.type == "tool_call_item":
print("-- Tool was called")
elif event.item.type == "tool_call_output_item":
print(f"-- Tool output: {event.item.output}")
elif event.item.type == "message_output_item":
print(f"-- Message output:\n {ItemHelpers.text_message_output(event.item)}")
else:
pass # Ignore other event types
print("=== Run complete ===")
if __name__ == "__main__":
asyncio.run(main())
```
File: docs/tools.md
# Tools
Tools let agents take actions: things like fetching data, running code, calling external APIs, and even using a computer. There are three classes of tools in the Agents SDK:
- Hosted tools: these run on LLM servers alongside the AI models. OpenAI offers retrieval, web search and computer use as hosted tools.
- Function calling: these allow you to use any Python function as a tool.
- Agents as tools: this allows you to use an agent as a tool, allowing Agents to call other agents without handing off to them.
## Hosted tools
OpenAI offers a few built-in tools when using the [`OpenAIResponsesModel`][agents.models.openai_responses.OpenAIResponsesModel]:
- The [`WebSearchTool`][agents.tool.WebSearchTool] lets an agent search the web.
- The [`FileSearchTool`][agents.tool.FileSearchTool] allows retrieving information from your OpenAI Vector Stores.
- The [`ComputerTool`][agents.tool.ComputerTool] allows automating computer use tasks.
- The [`CodeInterpreterTool`][agents.tool.CodeInterpreterTool] lets the LLM execute code in a sandboxed environment.
- The [`HostedMCPTool`][agents.tool.HostedMCPTool] exposes a remote MCP server's tools to the model.
- The [`ImageGenerationTool`][agents.tool.ImageGenerationTool] generates images from a prompt.
- The [`LocalShellTool`][agents.tool.LocalShellTool] runs shell commands on your machine.
```python
from agents import Agent, FileSearchTool, Runner, WebSearchTool
agent = Agent(
name="Assistant",
tools=[
WebSearchTool(),
FileSearchTool(
max_num_results=3,
vector_store_ids=["VECTOR_STORE_ID"],
),
],
)
async def main():
result = await Runner.run(agent, "Which coffee shop should I go to, taking into account my preferences and the weather today in SF?")
print(result.final_output)
```
## Function tools
You can use any Python function as a tool. The Agents SDK will set up the tool automatically:
- The name of the tool will be the name of the Python function (or you can provide a name)
- Tool description will be taken from the docstring of the function (or you can provide a description)
- The schema for the function inputs is automatically created from the function's arguments
- Descriptions for each input are taken from the docstring of the function, unless disabled
We use Python's `inspect` module to extract the function signature, along with [`griffe`](https://mkdocstrings.github.io/griffe/) to parse docstrings and `pydantic` for schema creation.
```python
import json
from typing_extensions import TypedDict, Any
from agents import Agent, FunctionTool, RunContextWrapper, function_tool
class Location(TypedDict):
lat: float
long: float
@function_tool # (1)!
async def fetch_weather(location: Location) -> str:
# (2)!
"""Fetch the weather for a given location.
Args:
location: The location to fetch the weather for.
"""
# In real life, we'd fetch the weather from a weather API
return "sunny"
@function_tool(name_override="fetch_data") # (3)!
def read_file(ctx: RunContextWrapper[Any], path: str, directory: str | None = None) -> str:
"""Read the contents of a file.
Args:
path: The path to the file to read.
directory: The directory to read the file from.
"""
# In real life, we'd read the file from the file system
return "<file contents>"
agent = Agent(
name="Assistant",
tools=[fetch_weather, read_file], # (4)!
)
for tool in agent.tools:
if isinstance(tool, FunctionTool):
print(tool.name)
print(tool.description)
print(json.dumps(tool.params_json_schema, indent=2))
print()
```
1. You can use any Python types as arguments to your functions, and the function can be sync or async.
2. Docstrings, if present, are used to capture descriptions and argument descriptions
3. Functions can optionally take the `context` (must be the first argument). You can also set overrides, like the name of the tool, description, which docstring style to use, etc.
4. You can pass the decorated functions to the list of tools.
??? note "Expand to see output"
```
fetch_weather
Fetch the weather for a given location.
{
"$defs": {
"Location": {
"properties": {
"lat": {
"title": "Lat",
"type": "number"
},
"long": {
"title": "Long",
"type": "number"
}
},
"required": [
"lat",
"long"
],
"title": "Location",
"type": "object"
}
},
"properties": {
"location": {
"$ref": "#/$defs/Location",
"description": "The location to fetch the weather for."
}
},
"required": [
"location"
],
"title": "fetch_weather_args",
"type": "object"
}
fetch_data
Read the contents of a file.
{
"properties": {
"path": {
"description": "The path to the file to read.",
"title": "Path",
"type": "string"
},
"directory": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "The directory to read the file from.",
"title": "Directory"
}
},
"required": [
"path"
],
"title": "fetch_data_args",
"type": "object"
}
```
### Returning images or files from function tools
In addition to returning text outputs, you can return one or more images or files as the output of a function tool. To do so, you can return any of:
- Images: [`ToolOutputImage`][agents.tool.ToolOutputImage] (or the TypedDict version, [`ToolOutputImageDict`][agents.tool.ToolOutputImageDict])
- Files: [`ToolOutputFileContent`][agents.tool.ToolOutputFileContent] (or the TypedDict version, [`ToolOutputFileContentDict`][agents.tool.ToolOutputFileContentDict])
- Text: either a string or stringable objects, or [`ToolOutputText`][agents.tool.ToolOutputText] (or the TypedDict version, [`ToolOutputTextDict`][agents.tool.ToolOutputTextDict])
### Custom function tools
Sometimes, you don't want to use a Python function as a tool. You can directly create a [`FunctionTool`][agents.tool.FunctionTool] if you prefer. You'll need to provide:
- `name`
- `description`
- `params_json_schema`, which is the JSON schema for the arguments
- `on_invoke_tool`, which is an async function that receives a [`ToolContext`][agents.tool_context.ToolContext] and the arguments as a JSON string, and must return the tool output as a string.
```python
from typing import Any
from pydantic import BaseModel
from agents import RunContextWrapper, FunctionTool
def do_some_work(data: str) -> str:
return "done"
class FunctionArgs(BaseModel):
username: str
age: int
async def run_function(ctx: RunContextWrapper[Any], args: str) -> str:
parsed = FunctionArgs.model_validate_json(args)
return do_some_work(data=f"{parsed.username} is {parsed.age} years old")
tool = FunctionTool(
name="process_user",
description="Processes extracted user data",
params_json_schema=FunctionArgs.model_json_schema(),
on_invoke_tool=run_function,
)
```
### Automatic argument and docstring parsing
As mentioned before, we automatically parse the function signature to extract the schema for the tool, and we parse the docstring to extract descriptions for the tool and for individual arguments. Some notes on that:
1. The signature parsing is done via the `inspect` module. We use type annotations to understand the types for the arguments, and dynamically build a Pydantic model to represent the overall schema. It supports most types, including Python primitives, Pydantic models, TypedDicts, and more.
2. We use `griffe` to parse docstrings. Supported docstring formats are `google`, `sphinx` and `numpy`. We attempt to automatically detect the docstring format, but this is best-effort and you can explicitly set it when calling `function_tool`. You can also disable docstring parsing by setting `use_docstring_info` to `False` (a short sketch follows below).
The code for the schema extraction lives in [`agents.function_schema`][].
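A minimal sketch of both overrides (assuming the format keyword is `docstring_style`; check the `function_tool` signature in your installed version):
```python
from agents import function_tool

@function_tool(docstring_style="sphinx")  # assumed keyword for the format override
def add(a: int, b: int) -> int:
    """Add two numbers.

    :param a: The first number.
    :param b: The second number.
    """
    return a + b

@function_tool(use_docstring_info=False)  # skip docstring parsing entirely
def multiply(a: int, b: int) -> int:
    """This text will not be used for the tool description."""
    return a * b
```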
## Agents as tools
In some workflows, you may want a central agent to orchestrate a network of specialized agents, instead of handing off control. You can do this by modeling agents as tools.
```python
from agents import Agent, Runner
import asyncio
spanish_agent = Agent(
name="Spanish agent",
instructions="You translate the user's message to Spanish",
)
french_agent = Agent(
name="French agent",
instructions="You translate the user's message to French",
)
orchestrator_agent = Agent(
name="orchestrator_agent",
instructions=(
"You are a translation agent. You use the tools given to you to translate."
"If asked for multiple translations, you call the relevant tools."
),
tools=[
spanish_agent.as_tool(
tool_name="translate_to_spanish",
tool_description="Translate the user's message to Spanish",
),
french_agent.as_tool(
tool_name="translate_to_french",
tool_description="Translate the user's message to French",
),
],
)
async def main():
result = await Runner.run(orchestrator_agent, input="Say 'Hello, how are you?' in Spanish.")
print(result.final_output)
```
### Customizing tool-agents
The `agent.as_tool` function is a convenience method to make it easy to turn an agent into a tool. It doesn't support all configuration though; for example, you can't set `max_turns`. For advanced use cases, use `Runner.run` directly in your tool implementation:
```python
@function_tool
async def run_my_agent() -> str:
"""A tool that runs the agent with custom configs"""
agent = Agent(name="My agent", instructions="...")
result = await Runner.run(
agent,
input="...",
max_turns=5,
run_config=...
)
return str(result.final_output)
```
### Custom output extraction
In certain cases, you might want to modify the output of the tool-agents before returning it to the central agent. This may be useful if you want to:
- Extract a specific piece of information (e.g., a JSON payload) from the sub-agent's chat history.
- Convert or reformat the agent’s final answer (e.g., transform Markdown into plain text or CSV).
- Validate the output or provide a fallback value when the agent’s response is missing or malformed.
You can do this by supplying the `custom_output_extractor` argument to the `as_tool` method:
```python
async def extract_json_payload(run_result: RunResult) -> str:
# Scan the agent’s outputs in reverse order until we find a JSON-like message from a tool call.
for item in reversed(run_result.new_items):
if isinstance(item, ToolCallOutputItem) and item.output.strip().startswith("{"):
return item.output.strip()
# Fallback to an empty JSON object if nothing was found
return "{}"
json_tool = data_agent.as_tool(
tool_name="get_data_json",
tool_description="Run the data agent and return only its JSON payload",
custom_output_extractor=extract_json_payload,
)
```
### Conditional tool enabling
You can conditionally enable or disable agent tools at runtime using the `is_enabled` parameter. This allows you to dynamically filter which tools are available to the LLM based on context, user preferences, or runtime conditions.
```python
import asyncio
from agents import Agent, AgentBase, Runner, RunContextWrapper
from pydantic import BaseModel
class LanguageContext(BaseModel):
language_preference: str = "french_spanish"
def french_enabled(ctx: RunContextWrapper[LanguageContext], agent: AgentBase) -> bool:
"""Enable French for French+Spanish preference."""
return ctx.context.language_preference == "french_spanish"
# Create specialized agents
spanish_agent = Agent(
name="spanish_agent",
instructions="You respond in Spanish. Always reply to the user's question in Spanish.",
)
french_agent = Agent(
name="french_agent",
instructions="You respond in French. Always reply to the user's question in French.",
)
# Create orchestrator with conditional tools
orchestrator = Agent(
name="orchestrator",
instructions=(
"You are a multilingual assistant. You use the tools given to you to respond to users. "
"You must call ALL available tools to provide responses in different languages. "
"You never respond in languages yourself, you always use the provided tools."
),
tools=[
spanish_agent.as_tool(
tool_name="respond_spanish",
tool_description="Respond to the user's question in Spanish",
is_enabled=True, # Always enabled
),
french_agent.as_tool(
tool_name="respond_french",
tool_description="Respond to the user's question in French",
is_enabled=french_enabled,
),
],
)
async def main():
context = RunContextWrapper(LanguageContext(language_preference="french_spanish"))
result = await Runner.run(orchestrator, "How are you?", context=context.context)
print(result.final_output)
asyncio.run(main())
```
The `is_enabled` parameter accepts:
- **Boolean values**: `True` (always enabled) or `False` (always disabled)
- **Callable functions**: Functions that take `(context, agent)` and return a boolean
- **Async functions**: Async functions for complex conditional logic (a sketch appears at the end of this section)
Disabled tools are completely hidden from the LLM at runtime, making this useful for:
- Feature gating based on user permissions
- Environment-specific tool availability (dev vs prod)
- A/B testing different tool configurations
- Dynamic tool filtering based on runtime state
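As a rough sketch of an async gate (the feature-flag lookup here is hypothetical; any awaitable returning a boolean works):
```python
import asyncio

from agents import AgentBase, RunContextWrapper

async def beta_tools_enabled(ctx: RunContextWrapper, agent: AgentBase) -> bool:
    """Hypothetical async gate: pretend to consult a feature-flag service."""
    await asyncio.sleep(0)  # stand-in for an async lookup (DB, HTTP, etc.)
    return getattr(ctx.context, "beta_opt_in", False)

# Passed the same way as a boolean or sync callable:
# some_agent.as_tool(tool_name="beta_tool", tool_description="...", is_enabled=beta_tools_enabled)
```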
## Handling errors in function tools
When you create a function tool via `@function_tool`, you can pass a `failure_error_function`. This is a function that provides an error response to the LLM in case the tool call crashes.
- By default (i.e. if you don't pass anything), it runs a `default_tool_error_function` which tells the LLM an error occurred.
- If you pass your own error function, it runs that instead, and sends the response to the LLM.
- If you explicitly pass `None`, then any tool call errors will be re-raised for you to handle. This could be a `ModelBehaviorError` if the model produced invalid JSON, or a `UserError` if your code crashed, etc.
```python
from agents import function_tool, RunContextWrapper
from typing import Any
def my_custom_error_function(context: RunContextWrapper[Any], error: Exception) -> str:
"""A custom function to provide a user-friendly error message."""
print(f"A tool call failed with the following error: {error}")
return "An internal server error occurred. Please try again later."
@function_tool(failure_error_function=my_custom_error_function)
def get_user_profile(user_id: str) -> str:
"""Fetches a user profile from a mock API.
This function demonstrates a 'flaky' or failing API call.
"""
if user_id == "user_123":
return "User profile for user_123 successfully retrieved."
else:
raise ValueError(f"Could not retrieve profile for user_id: {user_id}. API returned an error.")
```
If you are manually creating a `FunctionTool` object, then you must handle errors inside the `on_invoke_tool` function.
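A rough sketch of that pattern, reusing the helpers from the `FunctionTool` example above:
```python
from typing import Any

from agents import RunContextWrapper

async def run_function_safely(ctx: RunContextWrapper[Any], args: str) -> str:
    try:
        parsed = FunctionArgs.model_validate_json(args)
        return do_some_work(data=f"{parsed.username} is {parsed.age} years old")
    except Exception as e:
        # Manually created FunctionTools have no failure_error_function, so
        # catch here and return a string the LLM can act on.
        return f"process_user failed: {e}"
```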
File: docs/tracing.md
# Tracing
The Agents SDK includes built-in tracing, collecting a comprehensive record of events during an agent run: LLM generations, tool calls, handoffs, guardrails, and even custom events that occur. Using the [Traces dashboard](https://platform.openai.com/traces), you can debug, visualize, and monitor your workflows during development and in production.
!!!note
Tracing is enabled by default. There are two ways to disable tracing:
1. You can globally disable tracing by setting the env var `OPENAI_AGENTS_DISABLE_TRACING=1`
2. You can disable tracing for a single run by setting [`agents.run.RunConfig.tracing_disabled`][] to `True`
***For organizations operating under a Zero Data Retention (ZDR) policy using OpenAI's APIs, tracing is unavailable.***
## Traces and spans
- **Traces** represent a single end-to-end operation of a "workflow". They're composed of Spans. Traces have the following properties:
- `workflow_name`: This is the logical workflow or app. For example "Code generation" or "Customer service".
- `trace_id`: A unique ID for the trace. Automatically generated if you don't pass one. Must have the format `trace_<32_alphanumeric>`.
- `group_id`: Optional group ID, to link multiple traces from the same conversation. For example, you might use a chat thread ID.
- `disabled`: If True, the trace will not be recorded.
- `metadata`: Optional metadata for the trace.
- **Spans** represent operations that have a start and end time. Spans have:
- `started_at` and `ended_at` timestamps.
- `trace_id`, to represent the trace they belong to
- `parent_id`, which points to the parent Span of this Span (if any)
- `span_data`, which is information about the Span. For example, `AgentSpanData` contains information about the Agent, `GenerationSpanData` contains information about the LLM generation, etc.
## Default tracing
By default, the SDK traces the following:
- The entire `Runner.{run, run_sync, run_streamed}()` is wrapped in a `trace()`.
- Each time an agent runs, it is wrapped in `agent_span()`
- LLM generations are wrapped in `generation_span()`
- Function tool calls are each wrapped in `function_span()`
- Guardrails are wrapped in `guardrail_span()`
- Handoffs are wrapped in `handoff_span()`
- Audio inputs (speech-to-text) are wrapped in a `transcription_span()`
- Audio outputs (text-to-speech) are wrapped in a `speech_span()`
- Related audio spans may be parented under a `speech_group_span()`
By default, the trace is named "Agent workflow". You can set this name if you use `trace`, or you can configure the name and other properties with the [`RunConfig`][agents.run.RunConfig].
In addition, you can set up [custom trace processors](#custom-tracing-processors) to push traces to other destinations (as a replacement, or secondary destination).
## Higher level traces
Sometimes, you might want multiple calls to `run()` to be part of a single trace. You can do this by wrapping the entire code in a `trace()`.
```python
from agents import Agent, Runner, trace
async def main():
agent = Agent(name="Joke generator", instructions="Tell funny jokes.")
with trace("Joke workflow"): # (1)!
first_result = await Runner.run(agent, "Tell me a joke")
second_result = await Runner.run(agent, f"Rate this joke: {first_result.final_output}")
print(f"Joke: {first_result.final_output}")
print(f"Rating: {second_result.final_output}")
```
1. Because the two calls to `Runner.run` are wrapped in a `with trace()`, the individual runs will be part of the overall trace rather than creating two traces.
## Creating traces
You can use the [`trace()`][agents.tracing.trace] function to create a trace. Traces need to be started and finished. You have two options to do so:
1. **Recommended**: use the trace as a context manager, i.e. `with trace(...) as my_trace`. This will automatically start and end the trace at the right time.
2. You can also manually call [`trace.start()`][agents.tracing.Trace.start] and [`trace.finish()`][agents.tracing.Trace.finish].
The current trace is tracked via a Python [`contextvar`](https://docs.python.org/3/library/contextvars.html). This means that it works with concurrency automatically. If you manually start/end a trace, you'll need to pass `mark_as_current` and `reset_current` to `start()`/`finish()` to update the current trace.
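A minimal sketch of the manual variant, per the note above:
```python
from agents import trace

t = trace("My workflow")
t.start(mark_as_current=True)
try:
    ...  # run agents, create spans, etc.
finally:
    t.finish(reset_current=True)
```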
## Creating spans
You can use the various [`*_span()`][agents.tracing.create] methods to create a span. In general, you don't need to manually create spans. A [`custom_span()`][agents.tracing.custom_span] function is available for tracking custom span information.
Spans are automatically part of the current trace, and are nested under the nearest current span, which is tracked via a Python [`contextvar`](https://docs.python.org/3/library/contextvars.html).
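For instance, a minimal sketch (assuming `custom_span` follows the same context-manager pattern as the built-in `*_span()` helpers):
```python
from agents.tracing import custom_span

def load_records() -> list[str]:
    return ["r1", "r2"]  # stand-in for real work

with custom_span("fetch_user_records"):
    records = load_records()
```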
## Sensitive data
Certain spans may capture potentially sensitive data.
The `generation_span()` stores the inputs/outputs of the LLM generation, and `function_span()` stores the inputs/outputs of function calls. These may contain sensitive data, so you can disable capturing that data via [`RunConfig.trace_include_sensitive_data`][agents.run.RunConfig.trace_include_sensitive_data].
Similarly, Audio spans include base64-encoded PCM data for input and output audio by default. You can disable capturing this audio data by configuring [`VoicePipelineConfig.trace_include_sensitive_audio_data`][agents.voice.pipeline_config.VoicePipelineConfig.trace_include_sensitive_audio_data].
## Custom tracing processors
The high level architecture for tracing is:
- At initialization, we create a global [`TraceProvider`][agents.tracing.setup.TraceProvider], which is responsible for creating traces.
- We configure the `TraceProvider` with a [`BatchTraceProcessor`][agents.tracing.processors.BatchTraceProcessor] that sends traces/spans in batches to a [`BackendSpanExporter`][agents.tracing.processors.BackendSpanExporter], which exports the spans and traces to the OpenAI backend in batches.
To customize this default setup, whether to send traces to alternative or additional backends or to modify exporter behavior, you have two options (a sketch of option 1 follows the list):
1. [`add_trace_processor()`][agents.tracing.add_trace_processor] lets you add an **additional** trace processor that will receive traces and spans as they are ready. This lets you do your own processing in addition to sending traces to OpenAI's backend.
2. [`set_trace_processors()`][agents.tracing.set_trace_processors] lets you **replace** the default processors with your own trace processors. This means traces will not be sent to the OpenAI backend unless you include a `TracingProcessor` that does so.
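As a rough sketch of option 1 (the method names assume the `TracingProcessor` interface exposes per-trace and per-span hooks; check `agents.tracing` in your installed version):
```python
from agents.tracing import TracingProcessor, add_trace_processor

class PrintProcessor(TracingProcessor):
    """Toy processor that logs trace lifecycle events to stdout."""

    def on_trace_start(self, trace) -> None:
        print("trace started:", trace.trace_id)

    def on_trace_end(self, trace) -> None:
        print("trace finished:", trace.trace_id)

    def on_span_start(self, span) -> None:
        pass

    def on_span_end(self, span) -> None:
        pass

    def shutdown(self) -> None:
        pass

    def force_flush(self) -> None:
        pass

add_trace_processor(PrintProcessor())  # runs in addition to the default OpenAI exporter
```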
## Tracing with Non-OpenAI Models
You can use an OpenAI API key with non-OpenAI models to enable free tracing in the OpenAI Traces dashboard without needing to disable tracing.
```python
import os
from agents import set_tracing_export_api_key, Agent, Runner
from agents.extensions.models.litellm_model import LitellmModel
tracing_api_key = os.environ["OPENAI_API_KEY"]
set_tracing_export_api_key(tracing_api_key)
model = LitellmModel(
model="your-model-name",
api_key="your-api-key",
)
agent = Agent(
name="Assistant",
model=model,
)
```
## Notes
- View free traces at the [OpenAI Traces dashboard](https://platform.openai.com/traces).
## External tracing processors list
- [Weights & Biases](https://weave-docs.wandb.ai/guides/integrations/openai_agents)
- [Arize-Phoenix](https://docs.arize.com/phoenix/tracing/integrations-tracing/openai-agents-sdk)
- [Future AGI](https://docs.futureagi.com/future-agi/products/observability/auto-instrumentation/openai_agents)
- [MLflow (self-hosted/OSS)](https://mlflow.org/docs/latest/tracing/integrations/openai-agent)
- [MLflow (Databricks hosted)](https://docs.databricks.com/aws/en/mlflow/mlflow-tracing#-automatic-tracing)
- [Braintrust](https://braintrust.dev/docs/guides/traces/integrations#openai-agents-sdk)
- [Pydantic Logfire](https://logfire.pydantic.dev/docs/integrations/llms/openai/#openai-agents)
- [AgentOps](https://docs.agentops.ai/v1/integrations/agentssdk)
- [Scorecard](https://docs.scorecard.io/docs/documentation/features/tracing#openai-agents-sdk-integration)
- [Keywords AI](https://docs.keywordsai.co/integration/development-frameworks/openai-agent)
- [LangSmith](https://docs.smith.langchain.com/observability/how_to_guides/trace_with_openai_agents_sdk)
- [Maxim AI](https://www.getmaxim.ai/docs/observe/integrations/openai-agents-sdk)
- [Comet Opik](https://www.comet.com/docs/opik/tracing/integrations/openai_agents)
- [Langfuse](https://langfuse.com/docs/integrations/openaiagentssdk/openai-agents)
- [Langtrace](https://docs.langtrace.ai/supported-integrations/llm-frameworks/openai-agents-sdk)
- [Okahu-Monocle](https://github.com/monocle2ai/monocle)
- [Galileo](https://v2docs.galileo.ai/integrations/openai-agent-integration#openai-agent-integration)
- [Portkey AI](https://portkey.ai/docs/integrations/agents/openai-agents)
- [LangDB AI](https://docs.langdb.ai/getting-started/working-with-agent-frameworks/working-with-openai-agents-sdk)
- [Agenta](https://docs.agenta.ai/observability/integrations/openai-agents)
File: docs/usage.md
# Usage
The Agents SDK automatically tracks token usage for every run. You can access it from the run context and use it to monitor costs, enforce limits, or record analytics.
## What is tracked
- **requests**: number of LLM API calls made
- **input_tokens**: total input tokens sent
- **output_tokens**: total output tokens received
- **total_tokens**: input + output
- **details**:
- `input_tokens_details.cached_tokens`
- `output_tokens_details.reasoning_tokens`
## Accessing usage from a run
After `Runner.run(...)`, access usage via `result.context_wrapper.usage`.
```python
from agents import Agent, Runner

agent = Agent(name="Assistant")
result = await Runner.run(agent, "What's the weather in Tokyo?")
usage = result.context_wrapper.usage
print("Requests:", usage.requests)
print("Input tokens:", usage.input_tokens)
print("Output tokens:", usage.output_tokens)
print("Total tokens:", usage.total_tokens)
```
Usage is aggregated across all model calls during the run (including tool calls and handoffs).
### Enabling usage with LiteLLM models
LiteLLM providers do not report usage metrics by default. When you are using [`LitellmModel`](models/litellm.md), pass `ModelSettings(include_usage=True)` to your agent so that LiteLLM responses populate `result.context_wrapper.usage`.
```python
from agents import Agent, ModelSettings, Runner
from agents.extensions.models.litellm_model import LitellmModel
agent = Agent(
name="Assistant",
model=LitellmModel(model="your/model", api_key="..."),
model_settings=ModelSettings(include_usage=True),
)
result = await Runner.run(agent, "What's the weather in Tokyo?")
print(result.context_wrapper.usage.total_tokens)
```
## Accessing usage with sessions
When you use a `Session` (e.g., `SQLiteSession`), each call to `Runner.run(...)` returns usage for that specific run. Sessions maintain conversation history for context, but each run's usage is independent.
```python
from agents import Agent, Runner, SQLiteSession

agent = Agent(name="Assistant")
session = SQLiteSession("my_conversation")
first = await Runner.run(agent, "Hi!", session=session)
print(first.context_wrapper.usage.total_tokens) # Usage for first run
second = await Runner.run(agent, "Can you elaborate?", session=session)
print(second.context_wrapper.usage.total_tokens) # Usage for second run
```
Note that while sessions preserve conversation context between runs, the usage metrics returned by each `Runner.run()` call represent only that particular execution. In sessions, previous messages may be re-fed as input to each run, which affects the input token count in subsequent turns.
## Using usage in hooks
If you're using `RunHooks`, the `context` object passed to each hook contains `usage`. This lets you log usage at key lifecycle moments.
```python
from typing import Any

from agents import Agent, RunContextWrapper, RunHooks

class MyHooks(RunHooks):
async def on_agent_end(self, context: RunContextWrapper, agent: Agent, output: Any) -> None:
u = context.usage
print(f"{agent.name} → {u.requests} requests, {u.total_tokens} total tokens")
```
## API Reference
For detailed API documentation, see:
- [`Usage`][agents.usage.Usage] - Usage tracking data structure
- [`RunContextWrapper`][agents.run.RunContextWrapper] - Access usage from run context
- [`RunHooks`][agents.run.RunHooks] - Hook into usage tracking lifecycle
File: docs/visualization.md
# Agent Visualization
Agent visualization allows you to generate a structured graphical representation of agents and their relationships using **Graphviz**. This is useful for understanding how agents, tools, and handoffs interact within an application.
## Installation
Install the optional `viz` dependency group:
```bash
pip install "openai-agents[viz]"
```
## Generating a Graph
You can generate an agent visualization using the `draw_graph` function. This function creates a directed graph where:
- **Agents** are represented as yellow boxes.
- **MCP Servers** are represented as grey boxes.
- **Tools** are represented as green ellipses.
- **Handoffs** are directed edges from one agent to another.
### Example Usage
```python
import os
from agents import Agent, function_tool
from agents.mcp.server import MCPServerStdio
from agents.extensions.visualization import draw_graph
@function_tool
def get_weather(city: str) -> str:
return f"The weather in {city} is sunny."
spanish_agent = Agent(
name="Spanish agent",
instructions="You only speak Spanish.",
)
english_agent = Agent(
name="English agent",
    instructions="You only speak English.",
)
current_dir = os.path.dirname(os.path.abspath(__file__))
samples_dir = os.path.join(current_dir, "sample_files")
mcp_server = MCPServerStdio(
name="Filesystem Server, via npx",
params={
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", samples_dir],
},
)
triage_agent = Agent(
name="Triage agent",
instructions="Handoff to the appropriate agent based on the language of the request.",
handoffs=[spanish_agent, english_agent],
tools=[get_weather],
mcp_servers=[mcp_server],
)
draw_graph(triage_agent)
```

This generates a graph that visually represents the structure of the **triage agent** and its connections to sub-agents and tools.
## Understanding the Visualization
The generated graph includes:
- A **start node** (`__start__`) indicating the entry point.
- Agents represented as **rectangles** with yellow fill.
- Tools represented as **ellipses** with green fill.
- MCP Servers represented as **rectangles** with grey fill.
- Directed edges indicating interactions:
- **Solid arrows** for agent-to-agent handoffs.
- **Dotted arrows** for tool invocations.
- **Dashed arrows** for MCP server invocations.
- An **end node** (`__end__`) indicating where execution terminates.
**Note:** MCP servers are rendered in recent versions of the
`agents` package (verified in **v0.2.8**). If you don’t see MCP boxes
in your visualization, upgrade to the latest release.
## Customizing the Graph
### Showing the Graph
By default, `draw_graph` displays the graph inline. To show the graph in a separate window, write the following:
```python
draw_graph(triage_agent).view()
```
### Saving the Graph
To save the graph as a file instead, specify a filename:
```python
draw_graph(triage_agent, filename="agent_graph")
```
This will generate `agent_graph.png` in the working directory.