Changes from 11 commits
23 changes: 20 additions & 3 deletions assets/lab/environments/AGENTS.md
@@ -443,7 +443,7 @@ During rollouts, the model can call tools, receive results, and continue reasoni

### MCP Tool Environments

For tools implemented as MCP (Model Context Protocol) servers, `MCPEnv` extends `ToolEnv` to provide an integration that automatically connects to MCP servers and exposes their tools to the model:
For tools implemented as MCP (Model Context Protocol) servers, `MCPEnv` extends `StatefulToolEnv` and can connect to MCP servers over stdio, streamable HTTP, or per-rollout sandbox transports:

```python
mcp_servers = [
@@ -456,14 +456,31 @@ mcp_servers = [

vf_env = vf.MCPEnv(
    mcp_servers=mcp_servers,
    transport_type="stdio",  # or "http" / "sandbox"
    dataset=dataset,
    rubric=rubric,
)
```

By default, stdio/http transports are shared across rollouts (`connection_scope="shared"`), which is a good fit for stateless read-only MCP servers. For stateful MCP workflows, use `connection_scope="rollout"` or the sandbox transport, which defaults to isolated per-rollout state.
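
To make the difference between the two scopes concrete, here is a minimal sketch of a connection cache. It does not use `MCPEnv` or any verifiers code: `ConnectionCache` and `FakeConnection` are hypothetical stand-ins, and the only thing taken from the text above is the rule that `"shared"` reuses one connection across rollouts while `"rollout"` gives each rollout its own.

```python
from __future__ import annotations

import itertools
from typing import Literal

Scope = Literal["shared", "rollout"]


class FakeConnection:
    """Hypothetical stand-in for an MCP client connection (not a verifiers class)."""

    _ids = itertools.count(1)

    def __init__(self, server_name: str) -> None:
        self.server_name = server_name
        self.conn_id = next(FakeConnection._ids)


class ConnectionCache:
    """Hands out connections according to a connection scope (illustrative only)."""

    def __init__(self, scope: Scope) -> None:
        self.scope = scope
        self._conns: dict[tuple[str, str | None], FakeConnection] = {}

    def get(self, server_name: str, rollout_id: str) -> FakeConnection:
        # "shared": one connection per server, reused by every rollout.
        # "rollout": one connection per (server, rollout) pair.
        key = (server_name, None if self.scope == "shared" else rollout_id)
        if key not in self._conns:
            self._conns[key] = FakeConnection(server_name)
        return self._conns[key]


shared = ConnectionCache("shared")
isolated = ConnectionCache("rollout")
```

With `"shared"`, any state a server keeps (open sessions, cursors, temporary files) is visible to every rollout that uses the connection, which is why the text reserves that scope for stateless, read-only servers.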

For HTTP transports, each server needs a URL either inline or via `http_urls`:

```python
vf_env = vf.MCPEnv(
    mcp_servers=[{"name": "remote-search", "url": "https://example.com/mcp"}],
    transport_type="http",
    http_timeout=30.0,  # applies to MCP handshake and tool calls
    dataset=dataset,
    rubric=rubric,
)
```
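
The URL lookup rule can be sketched as a small helper. `resolve_http_url` is hypothetical, not part of `MCPEnv`; it assumes that an entry in `http_urls` takes precedence over an inline `"url"` (since `http_urls` is intended for per-server URL overrides) and that a server with neither should be rejected before any connection is attempted.

```python
from __future__ import annotations


def resolve_http_url(server: dict, http_urls: dict[str, str] | None = None) -> str:
    """Pick the URL for one MCP server (hypothetical helper, not the verifiers API).

    Assumption: an http_urls entry for the server's name overrides an inline "url".
    """
    name = server["name"]
    if http_urls and name in http_urls:
        return http_urls[name]
    if server.get("url"):
        return server["url"]
    raise ValueError(
        f"MCP server {name!r} has no URL: set 'url' inline or pass it via http_urls"
    )
```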

For sandbox transports, `command`/`args` must start an MCP server that serves streamable HTTP on the exposed sandbox port. `MCPEnv` will expose that port and connect to the server's `/mcp` endpoint.

### Stateful Tool Environments

`ToolEnv` and `MCPEnv` are designed for stateless, read-only tools where no session state needs to persist across calls within a rollout. For tools that require per-rollout state—such as a sandbox container, database connection, or session ID—use `StatefulToolEnv`.
`ToolEnv` is designed for stateless, read-only tools where no session state needs to persist across calls within a rollout. For tools that require per-rollout state—such as a sandbox container, database connection, or session ID—use `StatefulToolEnv`. `MCPEnv` builds on this same stateful foundation for MCP-backed tools.

The `setup_state` method is called at the beginning of each rollout for all environments which extend `MultiTurnEnv`, but is a no-op by default (including in `ToolEnv`).

@@ -598,7 +615,7 @@ Verifiers defines a hierarchy of error types under `vf.Error`:
- `vf.ModelError` — errors from model interactions (e.g., `vf.EmptyModelResponseError`)
- `vf.OverlongPromptError` — prompt exceeds model context length
- `vf.ToolError` — tool-related errors (`vf.ToolParseError`, `vf.ToolCallError`)
- `vf.InfraError` — infrastructure errors (e.g., `vf.SandboxError`)
- `vf.InfraError` — infrastructure errors (e.g., `vf.SandboxError`, `vf.TunnelError`)

When a `vf.Error` is raised during a rollout, it is automatically caught and stored in `state["error"]`, triggering the built-in `has_error` stop condition at the next check. This allows rollouts to terminate gracefully rather than crashing.
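
The catch-and-store behavior can be modeled in a few lines of plain Python. The classes below are local stand-ins that mirror the documented hierarchy (including `TunnelError` under `InfraError`); `run_step`, `has_error`, and `rollout` are simplified, hypothetical helpers, not verifiers' actual rollout loop.

```python
from __future__ import annotations

from typing import Callable, Iterable


# Local stand-ins mirroring the documented hierarchy; the real classes are vf.Error etc.
class Error(Exception): ...
class InfraError(Error): ...
class SandboxError(InfraError): ...
class TunnelError(InfraError): ...


def run_step(state: dict, step: Callable[[], None]) -> None:
    """Run one rollout step, storing framework errors in state instead of raising."""
    try:
        step()
    except Error as exc:  # only the Error family is captured
        state["error"] = exc


def has_error(state: dict) -> bool:
    # Simplified stop condition: stop once an error has been recorded.
    return state.get("error") is not None


def rollout(steps: Iterable[Callable[[], None]]) -> dict:
    state: dict = {}
    for step in steps:
        if has_error(state):  # checked before each subsequent step
            break
        run_step(state, step)
    return state
```

In this sketch, exceptions outside the `Error` family are not caught and still propagate; the text above only describes how `vf.Error` subclasses are handled.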

21 changes: 19 additions & 2 deletions docs/environments.md
@@ -437,7 +437,7 @@ During rollouts, the model can call tools, receive results, and continue reasoni

### MCP Tool Environments

For tools implemented as MCP (Model Context Protocol) servers, `MCPEnv` extends `ToolEnv` to provide an integration that automatically connects to MCP servers and exposes their tools to the model:
For tools implemented as MCP (Model Context Protocol) servers, `MCPEnv` extends `StatefulToolEnv` and can connect to MCP servers over stdio, streamable HTTP, or per-rollout sandbox transports:

```python
mcp_servers = [
@@ -450,14 +450,31 @@ mcp_servers = [

vf_env = vf.MCPEnv(
    mcp_servers=mcp_servers,
    transport_type="stdio",  # or "http" / "sandbox"
    dataset=dataset,
    rubric=rubric,
)
```

By default, stdio/http transports are shared across rollouts (`connection_scope="shared"`), which is a good fit for stateless read-only MCP servers. For stateful MCP workflows, use `connection_scope="rollout"` or the sandbox transport, which defaults to isolated per-rollout state.

For HTTP transports, each server needs a URL either inline or via `http_urls`:

```python
vf_env = vf.MCPEnv(
    mcp_servers=[{"name": "remote-search", "url": "https://example.com/mcp"}],
    transport_type="http",
    http_timeout=30.0,  # applies to MCP handshake and tool calls
    dataset=dataset,
    rubric=rubric,
)
```

For sandbox transports, `command`/`args` must start an MCP server that serves streamable HTTP on the exposed sandbox port. `MCPEnv` will expose that port and connect to the server's `/mcp` endpoint.
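
As a rough illustration of what the sandbox transport expects, the hypothetical helper below turns a server's `command`/`args` into a shell command and derives the `/mcp` endpoint from the public URL of the exposed port. `sandbox_launch_plan`, the `server.py --port 8000` arguments, and the example URL are assumptions for illustration; how `MCPEnv` actually starts the process and obtains the exposed URL is not shown here. The server script itself must listen for streamable HTTP on the exposed port.

```python
from __future__ import annotations

import shlex


def sandbox_launch_plan(server: dict, exposed_base_url: str) -> tuple[str, str]:
    """Return (shell command that starts the server, MCP endpoint URL).

    Hypothetical helper: not part of verifiers.
    """
    command = server.get("command")
    if not command:
        raise ValueError(f"sandbox MCP server {server.get('name')!r} needs a 'command'")
    # Quote each token so arguments with spaces survive the shell.
    shell_cmd = shlex.join([command, *server.get("args", [])])
    # The client talks to the server's /mcp endpoint on the exposed port.
    endpoint = exposed_base_url.rstrip("/") + "/mcp"
    return shell_cmd, endpoint


cmd, url = sandbox_launch_plan(
    {"name": "files", "command": "python", "args": ["server.py", "--port", "8000"]},
    "https://sandbox-123.example.dev/",
)
```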

### Stateful Tool Environments

`ToolEnv` and `MCPEnv` are designed for stateless, read-only tools where no session state needs to persist across calls within a rollout. For tools that require per-rollout state—such as a sandbox container, database connection, or session ID—use `StatefulToolEnv`.
`ToolEnv` is designed for stateless, read-only tools where no session state needs to persist across calls within a rollout. For tools that require per-rollout state—such as a sandbox container, database connection, or session ID—use `StatefulToolEnv`. `MCPEnv` builds on this same stateful foundation for MCP-backed tools.

The `setup_state` method is called at the beginning of each rollout for all environments which extend `MultiTurnEnv`, but is a no-op by default (including in `ToolEnv`).

43 changes: 43 additions & 0 deletions docs/reference.md
@@ -379,6 +379,49 @@ Tool calling with stateless Python functions. Automatically converts functions t

Tools requiring per-rollout state. Override `setup_state` and `update_tool_args` to inject state.
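
A minimal sketch of the injection pattern, using plain Python rather than verifiers classes: `setup_state` creates a per-rollout value once, and `update_tool_args` adds it to each tool call so the model never has to supply it. The method names follow this reference, but the signatures here are simplified, synchronous, and hypothetical; `SessionToolEnvSketch` and `lookup` are illustrative names only.

```python
from __future__ import annotations

import uuid


class SessionToolEnvSketch:
    """Illustrative stand-in for the StatefulToolEnv pattern (not verifiers' signatures)."""

    def setup_state(self, state: dict) -> dict:
        # Called once per rollout: create the per-rollout resource here.
        state["session_id"] = uuid.uuid4().hex
        return state

    def update_tool_args(self, tool_args: dict, state: dict) -> dict:
        # Called before each tool call: inject state the model never sees.
        return {**tool_args, "session_id": state["session_id"]}


def lookup(query: str, session_id: str) -> str:
    # Example tool: the model supplies `query`; the environment supplies `session_id`.
    return f"[{session_id[:8]}] results for {query}"
```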

#### MCPEnv

```python
class MCPEnv(StatefulToolEnv):
    def __init__(
        self,
        mcp_servers: list[MCPServerConfig | dict] | None = None,
        tools: list[Callable] | None = None,
        transport_type: Literal["stdio", "http", "sandbox"] = "stdio",
        connection_scope: Literal["shared", "rollout"] | None = None,
        http_urls: dict[str, str] | None = None,
        http_timeout: float = 30.0,
        http_max_retries: int = 3,
        sandbox_image: str = "python:3.11-slim",
        sandbox_start_command: str = "tail -f /dev/null",
        sandbox_environment_vars: dict[str, str] | None = None,
        sandbox_cpu_cores: int = 1,
        sandbox_memory_gb: int = 2,
        sandbox_disk_size_gb: int = 5,
        sandbox_timeout_minutes: int = 60,
        sandbox_port_to_expose: int = 8000,
        **kwargs,
    ): ...
```

Transport-backed MCP tool environment built on `StatefulToolEnv`.

**Key parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `mcp_servers` | `list[MCPServerConfig \| dict] \| None` | MCP server definitions |
| `transport_type` | `"stdio" \| "http" \| "sandbox"` | MCP transport backend |
| `connection_scope` | `"shared" \| "rollout" \| None` | Shared transports across rollouts or isolated per-rollout transports |
| `http_urls` | `dict[str, str] \| None` | Per-server URL overrides for HTTP transports |
| `http_timeout` | `float` | Timeout for MCP handshake and tool calls |
| `sandbox_image` | `str` | Docker image used for sandbox MCP servers |
| `sandbox_start_command` | `str` | Initial sandbox start command |
| `sandbox_environment_vars` | `dict[str, str] \| None` | Extra sandbox environment variables |
| `sandbox_port_to_expose` | `int` | Sandbox port exposed for MCP connectivity |

By default, stdio and HTTP transports use `connection_scope="shared"` while sandbox transports default to `connection_scope="rollout"`.
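
The default-scope rule above can be written as a small, hypothetical resolver. It assumes an explicit `connection_scope` always takes precedence over the transport default; this reference only documents the defaults themselves.

```python
from __future__ import annotations

from typing import Literal, Optional

Transport = Literal["stdio", "http", "sandbox"]
Scope = Literal["shared", "rollout"]


def effective_connection_scope(
    transport_type: Transport, connection_scope: Optional[Scope] = None
) -> Scope:
    """Resolve the scope used when connection_scope is None (hypothetical sketch)."""
    if connection_scope is not None:
        return connection_scope  # assumed: an explicit choice wins
    # Documented defaults: sandbox is isolated per rollout, stdio/http are shared.
    return "rollout" if transport_type == "sandbox" else "shared"
```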

#### SandboxEnv

```python
23 changes: 20 additions & 3 deletions environments/AGENTS.md
@@ -443,7 +443,7 @@ During rollouts, the model can call tools, receive results, and continue reasoni

### MCP Tool Environments

For tools implemented as MCP (Model Context Protocol) servers, `MCPEnv` extends `ToolEnv` to provide an integration that automatically connects to MCP servers and exposes their tools to the model:
For tools implemented as MCP (Model Context Protocol) servers, `MCPEnv` extends `StatefulToolEnv` and can connect to MCP servers over stdio, streamable HTTP, or per-rollout sandbox transports:

```python
mcp_servers = [
@@ -456,14 +456,31 @@ mcp_servers = [

vf_env = vf.MCPEnv(
    mcp_servers=mcp_servers,
    transport_type="stdio",  # or "http" / "sandbox"
    dataset=dataset,
    rubric=rubric,
)
```

By default, stdio/http transports are shared across rollouts (`connection_scope="shared"`), which is a good fit for stateless read-only MCP servers. For stateful MCP workflows, use `connection_scope="rollout"` or the sandbox transport, which defaults to isolated per-rollout state.

For HTTP transports, each server needs a URL either inline or via `http_urls`:

```python
vf_env = vf.MCPEnv(
    mcp_servers=[{"name": "remote-search", "url": "https://example.com/mcp"}],
    transport_type="http",
    http_timeout=30.0,  # applies to MCP handshake and tool calls
    dataset=dataset,
    rubric=rubric,
)
```

For sandbox transports, `command`/`args` must start an MCP server that serves streamable HTTP on the exposed sandbox port. `MCPEnv` will expose that port and connect to the server's `/mcp` endpoint.

### Stateful Tool Environments

`ToolEnv` and `MCPEnv` are designed for stateless, read-only tools where no session state needs to persist across calls within a rollout. For tools that require per-rollout state—such as a sandbox container, database connection, or session ID—use `StatefulToolEnv`.
`ToolEnv` is designed for stateless, read-only tools where no session state needs to persist across calls within a rollout. For tools that require per-rollout state—such as a sandbox container, database connection, or session ID—use `StatefulToolEnv`. `MCPEnv` builds on this same stateful foundation for MCP-backed tools.

The `setup_state` method is called at the beginning of each rollout for all environments which extend `MultiTurnEnv`, but is a no-op by default (including in `ToolEnv`).

@@ -598,7 +615,7 @@ Verifiers defines a hierarchy of error types under `vf.Error`:
- `vf.ModelError` — errors from model interactions (e.g., `vf.EmptyModelResponseError`)
- `vf.OverlongPromptError` — prompt exceeds model context length
- `vf.ToolError` — tool-related errors (`vf.ToolParseError`, `vf.ToolCallError`)
- `vf.InfraError` — infrastructure errors (e.g., `vf.SandboxError`)
- `vf.InfraError` — infrastructure errors (e.g., `vf.SandboxError`, `vf.TunnelError`)

When a `vf.Error` is raised during a rollout, it is automatically caught and stored in `state["error"]`, triggering the built-in `has_error` stop condition at the next check. This allows rollouts to terminate gracefully rather than crashing.
