Merged
2 changes: 1 addition & 1 deletion .github/upstream-projects.yaml
@@ -35,7 +35,7 @@ projects:

- id: toolhive
repo: stacklok/toolhive
-version: v0.27.2
+version: v0.28.0
# toolhive is a monorepo covering the CLI, the Kubernetes
# operator, and the vMCP gateway. It also introduces cross-
# cutting features that land in concepts/, integrations/,
15 changes: 15 additions & 0 deletions docs/toolhive/faq.mdx
@@ -240,6 +240,21 @@ export TOOLHIVE_USAGE_METRICS_ENABLED=false
Once you opt out, ToolHive stops collecting and sending usage metrics. You need
to restart any running servers for the change to take effect.

### How do I disable update checks?

ToolHive periodically checks for new versions. To disable this check (and the
usage-metrics collection it gates), set the `TOOLHIVE_SKIP_UPDATE_CHECK`
environment variable to `true`:

```bash
export TOOLHIVE_SKIP_UPDATE_CHECK=true
```

The setting is honored by the CLI, the API server, and the Kubernetes operator
telemetry service. For the operator, add it to the `operator.env` list in your
Helm values. Update checks are also skipped automatically when ToolHive detects
a CI environment.
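
The gating rules above can be summarized in a short Python sketch. It is a hypothetical model rather than ToolHive's code: `should_check_for_updates` is an illustrative name, only the exact value `true` is treated as an opt-out, and CI detection is assumed to look at the conventional `CI` environment variable, which may not match ToolHive's actual heuristics.

```python
import os
from typing import Mapping, Optional


def should_check_for_updates(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical model of the update-check gate (not ToolHive's code)."""
    if env is None:
        env = os.environ
    # Documented opt-out: TOOLHIVE_SKIP_UPDATE_CHECK=true disables the check.
    if env.get("TOOLHIVE_SKIP_UPDATE_CHECK") == "true":
        return False
    # Assumption: CI is detected through the conventional CI variable.
    if env.get("CI", "").lower() in ("true", "1"):
        return False
    return True


# Usage-metrics collection is gated by this check, so it follows the same result.
print(should_check_for_updates({"TOOLHIVE_SKIP_UPDATE_CHECK": "true"}))  # False
```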

## Security and permissions

### Is it safe to run MCP servers?
13 changes: 13 additions & 0 deletions docs/toolhive/guides-cli/run-mcp-servers.mdx
@@ -248,6 +248,19 @@ specific proxy port instead, use the `--proxy-port` flag:
thv run --proxy-port <PORT_NUMBER> <SERVER>
```

### Override the session timeout

ToolHive's proxy evicts idle MCP sessions after 2 hours by default. To raise or
lower this inactivity timeout for a workload, pass `--session-ttl` with a Go
duration string:

```bash
thv run --session-ttl 4h <SERVER>
```

Set a longer value when clients hold sessions open for long-running operations,
or a shorter value to free resources faster.
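
To make the inactivity semantics concrete, here is a minimal Python sketch of an idle-timeout session table in which every request refreshes a session's deadline, so only sessions left untouched for the full TTL are evicted. `IdleSessionStore` and its methods are hypothetical, and the proxy's real eviction mechanism may differ; passing `--session-ttl 4h` corresponds to `ttl=timedelta(hours=4)` here.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List


class IdleSessionStore:
    """Hypothetical idle-timeout session table (illustration only)."""

    def __init__(self, ttl: timedelta = timedelta(hours=2),
                 clock: Callable[[], datetime] = datetime.now) -> None:
        self.ttl = ttl  # 2h mirrors the documented proxy default
        self.clock = clock
        self._last_seen: Dict[str, datetime] = {}

    def touch(self, session_id: str) -> None:
        # Any request on the session resets its inactivity timer.
        self._last_seen[session_id] = self.clock()

    def evict_idle(self) -> List[str]:
        # Drop sessions idle for longer than the TTL and report which ones.
        now = self.clock()
        expired = [sid for sid, seen in self._last_seen.items()
                   if now - seen > self.ttl]
        for sid in expired:
            del self._last_seen[sid]
        return expired

    def is_active(self, session_id: str) -> bool:
        return session_id in self._last_seen


# A fake clock makes the timeline explicit.
now = [datetime(2025, 1, 1, 9, 0)]
store = IdleSessionStore(ttl=timedelta(hours=4), clock=lambda: now[0])
store.touch("a")
store.touch("b")
now[0] += timedelta(hours=3)
store.touch("a")  # "a" gets a fresh 4h window; "b" does not
now[0] += timedelta(hours=2)
print(store.evict_idle())  # ['b']
```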

### Run a server exposing only selected tools

ToolHive can filter the tools returned to the client as a result of a `tools/list`
76 changes: 71 additions & 5 deletions docs/toolhive/guides-k8s/auth-k8s.mdx
@@ -779,12 +779,78 @@ from the configured endpoint, and the `fieldMapping` section maps
provider-specific response fields to standard user identity fields (for example,
GitHub returns `login` instead of the standard `name` field).

-When you omit `userInfo`, the embedded auth server runs in synthesis mode for
-this upstream: it derives a non-personally-identifying subject (with a `tk-`
-prefix) from the access token and leaves `name` and `email` empty. Use this
-configuration for OAuth 2.0 servers that don't expose a userinfo endpoint, such
-as MCP authorization servers that comply with the
+When you omit `userInfo` and `identityFromToken`, the embedded auth server runs
+in synthesis mode for this upstream: it derives a non-personally-identifying
+subject (with a `tk-` prefix) from the access token and leaves `name` and
+`email` empty. Use this configuration for OAuth 2.0 servers that don't expose a
+userinfo endpoint and don't return identity in the token response, such as MCP
+authorization servers that comply with the
[MCP authorization specification](https://modelcontextprotocol.io/specification/2025-11-25/basic/authorization).
For OAuth 2.0 servers that return identity in the token response itself, see
[Extract identity from the token response](#extract-identity-from-the-token-response).

:::

### Extract identity from the token response

Some providers don't expose a userinfo endpoint but return user identity in the
OAuth 2.0 token response itself. For these providers, set `identityFromToken` on
`oauth2Config` instead of `userInfo`. The embedded auth server then skips the
userinfo HTTP call and extracts identity from the token response body using
[gjson dot-notation paths](https://github.com/tidwall/gjson#path-syntax):
`username` extracts a top-level field, `authed_user.id` extracts a nested field,
and the pipe operator chains modifiers like `@upstreamjwt`.

For example, Slack's `oauth.v2.access` response includes the authenticated user
ID at `authed_user.id`:

```yaml title="oauth2Config snippet for Slack"
oauth2Config:
# highlight-start
identityFromToken:
subjectPath: authed_user.id
# highlight-end
```
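
The plain dot-notation case can be pictured with a small resolver. This Python sketch is a simplified stand-in for gjson that handles only object keys separated by dots (no escaping, wildcards, array queries, or modifiers). The Slack-style response is illustrative sample data, not a verbatim Slack payload, and the string-only subject check is an assumption about what counts as an "unexpected type".

```python
import json
from typing import Any


def resolve_dot_path(document: Any, path: str) -> Any:
    """Follow dot-separated object keys (a small subset of gjson path syntax)."""
    current = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            # A missing segment fails extraction, which fails the login attempt.
            raise KeyError(f"{path!r}: no value at segment {key!r}")
        current = current[key]
    return current


def extract_subject(token_response: Any, subject_path: str) -> str:
    """Return the subject; assumes only string values are acceptable."""
    value = resolve_dot_path(token_response, subject_path)
    if not isinstance(value, str):
        raise TypeError(f"{subject_path!r}: expected a string, got {type(value).__name__}")
    return value


# Illustrative response shaped like Slack's oauth.v2.access output (sample data).
slack_response = json.loads("""
{
  "ok": true,
  "access_token": "xoxb-example",
  "team": {"id": "T0001", "name": "Example"},
  "authed_user": {"id": "U0001", "access_token": "xoxp-example"}
}
""")

print(extract_subject(slack_response, "authed_user.id"))  # U0001
```

A single-segment path such as Snowflake's `username` resolves the same way, one level deep.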

Snowflake returns the authenticated login name as a top-level `username` field
in every authorization-code grant response and does not expose a userinfo
endpoint:

```yaml title="oauth2Config snippet for Snowflake"
oauth2Config:
# highlight-start
identityFromToken:
subjectPath: username
namePath: username
# highlight-end
```

For providers whose token response embeds identity inside a JWT-shaped access
token, the `@upstreamjwt` modifier decodes the JWT payload so subsequent path
segments can drill into it:

```yaml title="oauth2Config snippet for JWT-embedded identity"
oauth2Config:
# highlight-start
identityFromToken:
subjectPath: 'access_token|@upstreamjwt|sub'
# highlight-end
```
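
To see why the trust-model warning matters, here is a minimal Python sketch of what `access_token|@upstreamjwt|sub` conceptually does: take the `access_token` field, base64url-decode the JWT's middle segment, and read `sub`. It mirrors the documented behavior (decoding without signature verification) but is not the gjson modifier's implementation; `decode_jwt_payload_unverified` is a hypothetical name and the token is fabricated test data.

```python
import base64
import json
from typing import Any, Dict


def decode_jwt_payload_unverified(token: str) -> Dict[str, Any]:
    """Decode a JWT's payload segment WITHOUT checking its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected header.payload.signature")
    payload = parts[1]
    # JWT segments are unpadded base64url; restore padding before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def _b64url(obj: Dict[str, Any]) -> str:
    """Encode a JSON object as an unpadded base64url segment (test helper)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Fabricated token: the signature segment is never inspected.
token = ".".join([_b64url({"alg": "RS256"}), _b64url({"sub": "user-123"}), "not-a-signature"])
response = {"access_token": token}
print(decode_jwt_payload_unverified(response["access_token"])["sub"])  # user-123
```

Because any signature string decodes identically, the claims are only as trustworthy as the TLS connection that delivered them.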

`subjectPath` is required; `namePath` and `emailPath` are optional. Omit
`namePath` and `emailPath` rather than setting them to empty strings.

If you set both `identityFromToken` and `userInfo`, `identityFromToken` takes
precedence and the userinfo HTTP call is skipped. If `identityFromToken` is set
and extraction fails (path missing or unexpected type), authentication fails for
that login attempt. There is no fallback to `userInfo`.
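
The precedence and fallback rules above reduce to a short decision. This Python sketch encodes them; the class and function names are hypothetical, the returned label strings stand in for the auth server's real code paths, and rejecting empty `namePath`/`emailPath` values is an assumption used to model the "omit rather than set to empty strings" advice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdentityFromToken:
    subject_path: str
    name_path: Optional[str] = None
    email_path: Optional[str] = None

    def validate(self) -> None:
        if not self.subject_path:
            raise ValueError("subjectPath is required")
        for label, value in (("namePath", self.name_path), ("emailPath", self.email_path)):
            # Assumption: model the advice to omit these instead of using "".
            if value == "":
                raise ValueError(f"omit {label} instead of setting it to an empty string")


def identity_source(identity_from_token: Optional[IdentityFromToken],
                    user_info_configured: bool) -> str:
    """Pick which mechanism supplies user identity for an OAuth 2.0 upstream."""
    if identity_from_token is not None:
        identity_from_token.validate()
        # Takes precedence: the userinfo call is skipped, and a failed
        # extraction fails the login with no fallback to userInfo.
        return "token-response"
    if user_info_configured:
        return "userinfo"
    # Neither configured: synthesis mode (tk- subject, empty name and email).
    return "synthesis"
```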

:::warning[Trust model]

Claims read from the token response are trusted via TLS only and are not
cryptographically verified. The `@upstreamjwt` modifier decodes the JWT payload
without verifying its signature. Prefer OIDC ID tokens when you need
cryptographically verifiable claims.

:::

61 changes: 55 additions & 6 deletions docs/toolhive/guides-k8s/rate-limiting.mdx
@@ -1,14 +1,14 @@
---
title: Rate limiting
description:
-Configure per-user and shared rate limits on MCPServer resources to prevent
-noisy neighbors and protect downstream services.
+Configure per-user and shared rate limits on MCPServer and VirtualMCPServer
+resources to prevent noisy neighbors and protect downstream services.
---

-Configure token bucket rate limits on MCPServer resources to control how many
-tool invocations users can make. Rate limiting prevents individual users from
-monopolizing shared servers and protects downstream services from traffic
-spikes.
+Configure token bucket rate limits on MCPServer and VirtualMCPServer resources
+to control how many tool invocations users can make. Rate limiting prevents
+individual users from monopolizing shared servers and protects downstream
+services from traffic spikes.

ToolHive supports two scopes of rate limiting:

@@ -219,6 +219,55 @@ In this example:
also count toward the 100 server-level limit).
- All users combined can make 50 `shared_resource` calls per minute.

## Rate limit a VirtualMCPServer

VirtualMCPServer resources accept the same rate limit shape under
`spec.config.rateLimiting`. The fields and token bucket semantics match the
MCPServer examples above, but the prerequisites are stricter:

- `spec.sessionStorage.provider` must be `redis`. The CRD rejects any
`rateLimiting` configuration without Redis-backed session storage.
- `spec.incomingAuth.type` must be `oidc` when you configure any per-user
bucket, either at the server level or on a per-tool override.
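
The two prerequisites can be checked before you apply a manifest. This Python sketch is hypothetical: `rate_limit_errors` mirrors the rules listed above for a spec already parsed into a dict, while the CRD's own validation remains authoritative and reports errors in its own words.

```python
from typing import Any, Dict, List


def rate_limit_errors(spec: Dict[str, Any]) -> List[str]:
    """Check a VirtualMCPServer spec (as a dict) against the documented prerequisites."""
    errors: List[str] = []
    rate_limiting = spec.get("config", {}).get("rateLimiting")
    if not rate_limiting:
        return errors
    # Any rateLimiting block needs Redis-backed session storage.
    if spec.get("sessionStorage", {}).get("provider") != "redis":
        errors.append("rateLimiting requires spec.sessionStorage.provider: redis")
    # A per-user bucket anywhere (server level or per-tool) needs OIDC identities.
    has_per_user = "perUser" in rate_limiting or any(
        "perUser" in tool for tool in rate_limiting.get("tools", []))
    if has_per_user and spec.get("incomingAuth", {}).get("type") != "oidc":
        errors.append("per-user rate limits require spec.incomingAuth.type: oidc")
    return errors
```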

A request must pass both the server-level vMCP limit and the per-tool limit (if
defined). Limits apply to the vMCP aggregator and are independent of any
limits configured on the backend MCPServers it routes to.

```yaml title="vmcp-ratelimit.yaml"
apiVersion: toolhive.stacklok.dev/v1beta1
kind: VirtualMCPServer
metadata:
name: shared-toolkit
namespace: toolhive-system
spec:
groupRef:
name: my-backends
incomingAuth:
type: oidc
oidcConfigRef:
name: my-oidc-config
audience: shared-toolkit
sessionStorage:
provider: redis
address: <YOUR_REDIS_ADDRESS>
config:
# highlight-start
rateLimiting:
shared:
maxTokens: 5000
refillPeriod: 1m0s
perUser:
maxTokens: 200
refillPeriod: 1m0s
tools:
- name: expensive_search
perUser:
maxTokens: 20
refillPeriod: 1m0s
# highlight-end
```
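
To make the token bucket behavior and the "pass both limits" rule concrete, the following Python sketch models the manifest above for one user calling `expensive_search` in a burst. It assumes each bucket starts full, refills continuously at `maxTokens / refillPeriod`, and that a rejected call consumes no tokens from any bucket; ToolHive's Redis-backed implementation may refill or account differently.

```python
import time
from typing import Callable


class TokenBucket:
    """Holds up to max_tokens; refills continuously at max_tokens per refill_period_s."""

    def __init__(self, max_tokens: int, refill_period_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_tokens = max_tokens
        self.rate = max_tokens / refill_period_s  # tokens added per second
        self.clock = clock
        self.tokens = float(max_tokens)  # assumption: buckets start full
        self.updated = clock()

    def has_token(self) -> bool:
        # Top up for the time elapsed since the last check, capped at max_tokens.
        now = self.clock()
        self.tokens = min(self.max_tokens, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        return self.tokens >= 1

    def take(self) -> None:
        self.tokens -= 1


def allow(*buckets: TokenBucket) -> bool:
    """Admit a call only if every applicable bucket has a token.

    Assumption: a rejected call consumes nothing from any bucket.
    """
    if not all(bucket.has_token() for bucket in buckets):
        return False
    for bucket in buckets:
        bucket.take()
    return True


# Frozen clock so the burst happens at a single instant.
now = [0.0]


def clock() -> float:
    return now[0]


shared = TokenBucket(5000, 60, clock)     # rateLimiting.shared (1m0s = 60 s)
per_user = TokenBucket(200, 60, clock)    # rateLimiting.perUser
search_user = TokenBucket(20, 60, clock)  # tools[expensive_search].perUser

results = [allow(shared, per_user, search_user) for _ in range(25)]
print(results.count(True))  # 20
now[0] += 6.0  # six seconds later, the 20-per-minute bucket holds about 2 tokens
print(allow(shared, per_user, search_user))  # True
```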

## Next steps

- [Token exchange](./token-exchange-k8s.mdx) to configure token exchange for
11 changes: 11 additions & 0 deletions docs/toolhive/guides-vmcp/authentication.mdx
@@ -491,6 +491,17 @@ at `authed_user.access_token`). Add a `tokenResponseMapping` block to the

:::

:::tip[Identity in the token response]

When an upstream returns user identity in the token response itself (Slack
returns it at `authed_user.id`; Snowflake returns it as a top-level `username` field),
set `identityFromToken` on the `oauth2Config` with gjson dot-notation paths for
`subjectPath` (required), `namePath`, and `emailPath`. See
[Extract identity from the token response](../guides-k8s/auth-k8s.mdx#extract-identity-from-the-token-response)
for the full pattern and trust-model caveats.

:::

### Incoming auth with the embedded auth server

When using the embedded auth server, configure `incomingAuth` to validate the
1 change: 1 addition & 0 deletions docs/toolhive/guides-vmcp/local-cli.mdx
@@ -272,6 +272,7 @@ All `thv vmcp` flags, with their defaults:
| `--optimizer-embedding` | `false` | Enable Tier 2 semantic optimizer (implies `--optimizer`) |
| `--embedding-model` | `BAAI/bge-small-en-v1.5` | HuggingFace model name for the managed TEI container |
| `--embedding-image` | `ghcr.io/huggingface/text-embeddings-inference:cpu-latest` | TEI container image |
| `--session-ttl` | `30m` | Session inactivity timeout as a Go duration (`30m`, `2h`, `168h`) |

### `thv vmcp init`

8 changes: 5 additions & 3 deletions docs/toolhive/guides-vmcp/scaling-and-performance.mdx
@@ -159,9 +159,11 @@ configure Redis session storage. Total capacity scales as `replicas × 1,000`.

### Session time-to-live (TTL)

-The vMCP server applies a **30-minute inactivity TTL** to session metadata. A
-session that receives no activity for 30 minutes expires, and the client must
-reinitialize it.
+The vMCP server applies a **30-minute inactivity TTL** to session metadata by
+default. A session that receives no activity for the TTL window expires, and the
+client must reinitialize it. When running locally with `thv vmcp serve`, pass
+`--session-ttl` (Go duration, for example `--session-ttl=2h`) to raise or lower
+this default.

With Redis session storage, the TTL is a sliding window: every request
atomically refreshes the key's expiry. Active sessions remain valid indefinitely
1 change: 1 addition & 0 deletions docs/toolhive/reference/cli/thv_client_register.md
@@ -28,6 +28,7 @@ Valid clients:
- cline: VS Code Cline extension
- codex: OpenAI Codex CLI
- continue: Continue.dev IDE plugins
- copilot-cli: GitHub Copilot CLI
- cursor: Cursor editor
- factory: Factory.ai Droid CLI
- gemini-cli: Google Gemini CLI
1 change: 1 addition & 0 deletions docs/toolhive/reference/cli/thv_client_remove.md
@@ -28,6 +28,7 @@ Valid clients:
- cline: VS Code Cline extension
- codex: OpenAI Codex CLI
- continue: Continue.dev IDE plugins
- copilot-cli: GitHub Copilot CLI
- cursor: Cursor editor
- factory: Factory.ai Droid CLI
- gemini-cli: Google Gemini CLI
1 change: 1 addition & 0 deletions docs/toolhive/reference/cli/thv_run.md
@@ -178,6 +178,7 @@ thv run [flags] SERVER_OR_IMAGE_OR_PROTOCOL [-- ARGS...]
--runtime-add-package stringArray Add additional packages to install in the builder and runtime stages (can be repeated)
--runtime-image string Override the default base image for protocol schemes (e.g., golang:1.24-alpine, node:20-alpine, python:3.11-slim)
--secret stringArray Specify a secret to be fetched from the secrets manager and set as an environment variable (format: NAME,target=TARGET)
--session-ttl duration Session inactivity timeout (e.g., 30m, 2h); zero uses the default (2h)
--stateless Declare the server as stateless (POST-only, no SSE). Use for MCP servers implementing streamable-HTTP stateless mode.
--target-host string Host to forward traffic to (only applicable to SSE or Streamable HTTP transport) (default "127.0.0.1")
--target-port int Port for the container to expose (only applicable to SSE or Streamable HTTP transport)
1 change: 1 addition & 0 deletions docs/toolhive/reference/cli/thv_vmcp_serve.md
@@ -42,6 +42,7 @@ thv vmcp serve [flags]
--optimizer Enable FTS5 keyword optimizer (Tier 1): exposes find_tool and call_tool instead of all backend tools
--optimizer-embedding Enable managed TEI semantic optimizer (Tier 2); implies --optimizer
--port int Port to listen on (default 4483)
--session-ttl duration Session inactivity timeout (e.g., 30m, 2h); zero uses the default (30m)
```

### Options inherited from parent commands
23 changes: 23 additions & 0 deletions docs/toolhive/reference/client-compatibility.mdx
@@ -15,6 +15,7 @@ We've tested ToolHive with these clients:
| Client | Supported | Auto-configuration | Skills support | Notes |
| -------------------------- | :-------: | :----------------: | :------------: | ------------------------------------------- |
| GitHub Copilot (VS Code) | ✅ | ✅ | ✅ | v1.102+ or Insiders version ([see note][3]) |
| GitHub Copilot CLI | ✅ | ✅ | ❌ | |
| Claude Code | ✅ | ✅ | ✅ | v1.0.27+ |
| Cursor | ✅ | ✅ | ✅ | v0.50.0+ |
| Cline (VS Code) | ✅ | ✅ | ✅ | v3.17.10+ |
@@ -281,6 +282,28 @@ global MCP configuration file whenever you run an MCP server. You can also
configure project-specific MCP servers by creating a
`.continue/mcpServers/<name>.yaml` file in your project directory.

### GitHub Copilot CLI

The [GitHub Copilot CLI](https://docs.github.com/en/copilot/how-tos/copilot-cli)
stores its MCP configuration in a JSON file in your home directory.

- **All platforms**: `~/.copilot/mcp-config.json`

Example configuration:

```json
{
"mcpServers": {
"github": { "url": "http://localhost:19046/mcp", "type": "http" },
"fetch": { "url": "http://localhost:43832/mcp", "type": "http" },
"sqlite": { "url": "http://localhost:51712/sse#sqlite", "type": "sse" }
}
}
```

When you register the Copilot CLI as a client, ToolHive automatically updates
this file whenever you run an MCP server.
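
To confirm which servers ToolHive has written to the Copilot CLI config, you can read the file directly. This Python sketch only reads the file and assumes the `mcpServers` layout shown in the example above; `list_copilot_mcp_servers` is a hypothetical helper, not part of ToolHive.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def list_copilot_mcp_servers(config_path: Optional[Path] = None) -> Dict[str, str]:
    """Map each server name in the Copilot CLI MCP config to its URL."""
    path = config_path or Path.home() / ".copilot" / "mcp-config.json"
    if not path.exists():
        return {}
    config = json.loads(path.read_text(encoding="utf-8"))
    return {name: entry.get("url", "")
            for name, entry in config.get("mcpServers", {}).items()}


if __name__ == "__main__":
    for name, url in list_copilot_mcp_servers().items():
        print(f"{name}\t{url}")
```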

## Manual configuration

If your client doesn't support automatic configuration, you'll need to set up