@yrobla yrobla commented Feb 9, 2026

Description

Add comprehensive documentation for Virtual MCP Server sizing guidance and health check configuration to help operators plan deployments and monitor backend availability.

Type of change

  • New documentation

Related issues/PRs

#512

Submitter checklist

Content and formatting

  • I have reviewed the content for technical accuracy
  • I have reviewed the content for spelling, grammar, and style

Navigation

  • New pages include a frontmatter section with title and description at a minimum
  • Sidebar navigation (sidebars.ts) updated for added, deleted, reordered, or renamed files
  • Redirects added to vercel.json for moved, renamed, or deleted pages (i.e., if the URL slug changed)

Reviewer checklist

Content

  • I have reviewed the content for technical accuracy
  • I have reviewed the content for spelling, grammar, and style

Copilot AI review requested due to automatic review settings February 9, 2026 12:14
vercel bot commented Feb 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| ------------ | ---------- | ---------------- | ------------------- |
| docs-website | Ready | Preview, Comment | Feb 11, 2026 9:19am |

Copilot AI left a comment

Pull request overview

Adds operator-focused documentation for Virtual MCP Server (vMCP) deployment planning/sizing and for configuring and observing backend health checks, to help plan production deployments and monitor backend availability.

Changes:

  • Added “Deployment planning” guidance (baseline resources, scaling factors, and operational indicators) to the vMCP introduction.
  • Added a new “Configure health checks” section to backend discovery docs, including CRD-based configuration examples and operational monitoring notes.
  • Updated the vMCP configuration guide to link to the new health check documentation and related backend discovery info.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| ------------------------------------------- | ---------------------------------------------------------------------------------------------- |
| docs/toolhive/guides-vmcp/intro.mdx | Adds deployment planning guidance for sizing/capacity and scaling considerations. |
| docs/toolhive/guides-vmcp/configuration.mdx | Updates “Next steps” and related links to point readers to health checks and backend discovery. |
| docs/toolhive/guides-vmcp/backend-discovery.mdx | Documents health check configuration, circuit breaker settings, timeouts, and health status monitoring. |

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

danbarr previously approved these changes Feb 10, 2026
Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

@jerm-dro jerm-dro left a comment

Just a few blocking questions, but otherwise LGTM

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment on lines +276 to +303
:::caution[Network overhead]

Enabling health checks for remote backends increases network traffic to external
services. Only enable this if you need real-time health status for remote
endpoints.

:::

#### Degraded backend detection

Backends are marked as **degraded** when:

- Health checks succeed but response times exceed 5 seconds (slow performance)
- Backend recently recovered from failures and is stabilizing

Degraded backends remain in the routing table but may indicate performance
problems.

#### Health check behavior

1. **Initial check**: All backends checked immediately on startup
2. **Periodic checks**: Repeated at `healthCheckInterval` (default: 30s)
3. **Status updates**: Reported to Kubernetes at `statusReportingInterval`
   (default: 30s)
4. **Backend unhealthy**: After `unhealthyThreshold` consecutive failures
   (default: 3), backend marked unhealthy
5. **Recovery**: One successful check marks backend as healthy (or degraded if
   slow)

This seems like too much content for this page. I'd recommend deleting it. This page is really just "how do I configure X" without detail on the implementation or implications.
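
For readers following the thread, the health check rules quoted in this hunk can be read as a small state machine. The sketch below is a minimal Python illustration under stated assumptions, not vMCP's implementation: the class and constant names are hypothetical, the threshold of 3 failures and the 5-second slow-response limit are taken from the quoted text, and the "recently recovered and stabilizing" case is deliberately not modeled.

```python
from dataclasses import dataclass

# Values taken from the quoted documentation; the names are hypothetical.
UNHEALTHY_THRESHOLD = 3      # consecutive failures before "unhealthy"
SLOW_RESPONSE_SECONDS = 5.0  # a success slower than this counts as "degraded"


@dataclass
class BackendHealth:
    status: str = "unknown"
    consecutive_failures: int = 0

    def record(self, ok: bool, response_seconds: float = 0.0) -> str:
        """Apply one health check result and return the resulting status."""
        if ok:
            # One successful check resets the failure streak (recovery rule).
            self.consecutive_failures = 0
            slow = response_seconds > SLOW_RESPONSE_SECONDS
            self.status = "degraded" if slow else "healthy"
        else:
            self.consecutive_failures += 1
            # The status only flips once the threshold is reached.
            if self.consecutive_failures >= UNHEALTHY_THRESHOLD:
                self.status = "unhealthy"
        return self.status


backend = BackendHealth()
history = [
    backend.record(ok=True, response_seconds=0.2),
    backend.record(ok=False),
    backend.record(ok=False),
    backend.record(ok=False),
    backend.record(ok=True, response_seconds=6.0),
    backend.record(ok=True, response_seconds=0.3),
]
print(history)
```

In this model the first two failures leave the status unchanged, the third marks the backend unhealthy, and a slow success brings it back as degraded rather than healthy.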

Comment on lines +90 to +95
## Next steps

- Review [performance and sizing guidance](./performance.mdx) for deployment
  planning
- Follow the [Quickstart: Virtual MCP Server](../tutorials/quickstart-vmcp.mdx)
  tutorial

This section is redundant with "Related Information." I'd also recommend putting the Quickstart at the top of the list, given this is an intro page.

Comment on lines +37 to +46
## Backend scale recommendations

vMCP performs well across different scales:

| Backend Count | Use Case | Notes |
| ------------- | ----------------------------- | ------------------------------------ |
| 1-5 | Small teams, focused toolsets | Minimal resource overhead |
| 5-15 | Medium teams, diverse tools | Recommended range for most use cases |
| 15-30 | Large teams, comprehensive | Increase health check interval |
| 30+ | Enterprise-scale deployments | Consider multiple vMCP instances |

Suggestion: delete. This is a long page and doesn't add much information.

Comment on lines +111 to +119
:::info[Why no replicas field?]

VirtualMCPServer intentionally omits a `spec.replicas` field to avoid conflicts
with HPA/VPA autoscaling. This design allows you to choose between static
scaling (kubectl) or dynamic autoscaling (HPA/VPA) without operator
interference.

For static replica counts, scale the Deployment after creating the
VirtualMCPServer. The operator will preserve your scaling configuration.

Suggestion: delete. It's redundant with the information above.

Comment on lines +267 to +280
### Monitoring

Track these metrics via [telemetry integration](./telemetry-and-metrics.mdx):

| Metric | Healthy State | Action Threshold |
| ------------------------- | ------------- | -------------------------- |
| Backend request latency | P95 < SLO | Alert on spikes |
| Backend error rate | < 1% | Investigate > 5% |
| Health check success rate | > 95% | Early warning |
| Workflow execution time | Varies | Check for serial execution |

**Setup:** Create dashboards for trend analysis and configure alerts for
anomalies. This catches degradation before users notice.

Suggestion: delete. Anyone operating vMCP at scale would already be doing this.
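
As a point of reference for the thresholds in the quoted table, the following is a rough Python sketch of how they might be turned into alert classifications. The function names and the "watch" label for the 1% to 5% band are assumptions made for illustration; the table itself does not say what to do in that band, and the sample rates are made up.

```python
def classify_error_rate(error_rate: float) -> str:
    """Classify a backend error rate given as a fraction (0.02 means 2%)."""
    if error_rate > 0.05:
        return "investigate"  # table: "Investigate > 5%"
    if error_rate < 0.01:
        return "healthy"      # table: healthy state is "< 1%"
    return "watch"            # assumption: the table leaves 1% to 5% unnamed


def health_checks_healthy(success_rate: float) -> bool:
    """Table: health check success rate is healthy above 95%."""
    return success_rate > 0.95


# Illustrative sample values only, not measurements.
samples = {"backend-a": 0.002, "backend-b": 0.03, "backend-c": 0.08}
report = {name: classify_error_rate(rate) for name, rate in samples.items()}
print(report)
```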

Comment on lines +123 to +153
:::caution[Backend scaling]

When scaling vMCP horizontally, the backend MCP servers will also see increased
load. Ensure your backend deployments (MCPServer resources) are also scaled
appropriately to handle the additional traffic.

:::

**Session affinity is required** when using multiple replicas. Clients must be
routed to the same vMCP instance for the duration of their session. Configure
based on your deployment:

- **Kubernetes Service**: Use `sessionAffinity: ClientIP` for basic
  client-to-pod stickiness
  - Note: This is IP-based and may not work well behind proxies or with changing
    client IPs
- **Ingress Controller**: Configure cookie-based sticky sessions (recommended)
  - nginx: Use `nginx.ingress.kubernetes.io/affinity: cookie`
  - Other controllers: Consult your Ingress controller documentation
- **Gateway API**: Use appropriate session affinity configuration based on your
  Gateway implementation

:::tip[Session affinity recommendations]

- For **stateless backends**: Cookie-based sticky sessions work well and provide
  reliable routing through proxies
- For **stateful backends** (Playwright, databases): Consider vertical scaling
  or dedicated vMCP instances instead of horizontal scaling with session
  affinity, as session resumption may not work reliably

:::

Suggestion: delete. I'd prefer to reduce the information here. A lot of this is specific to the environment in which vMCP is scaled.
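
To illustrate the caveat in the quoted hunk that IP-based affinity "may not work well behind proxies", here is a toy Python model. It is only a sketch: the `pick_replica` helper and the hash-based mapping are hypothetical and are not how Kubernetes implements `sessionAffinity: ClientIP`. The point is simply that any routing keyed on source IP treats all clients behind one proxy as a single client.

```python
import hashlib


def pick_replica(client_ip: str, replica_count: int) -> int:
    """Map a source IP to a replica index with a stable hash (toy model)."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % replica_count


REPLICAS = 3

# A client with a stable source IP always reaches the same replica.
first = pick_replica("203.0.113.10", REPLICAS)
second = pick_replica("203.0.113.10", REPLICAS)

# Clients behind one proxy share a source IP, so they all land together.
proxied_clients = {
    name: pick_replica("198.51.100.7", REPLICAS)
    for name in ("alice", "bob", "carol")
}

print(first == second)
print(len(set(proxied_clients.values())))
```

A client whose source IP changes mid-session may be mapped to a different replica under this kind of scheme, which is the other half of the quoted note.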

@@ -0,0 +1,287 @@
---
title: Performance and sizing

Blocker: rename this page "Scaling" and keep the content just focused on:

  1. How can I vertically scale?
  2. How can I horizontally scale?
  3. When is horizontal scaling hard?

Comment on lines +13 to +23
### Baseline resources

**Minimal deployment** (development/testing):

- **CPU**: 100m (0.1 cores)
- **Memory**: 128Mi

**Production deployment** (recommended):

- **CPU**: 500m (0.5 cores)
- **Memory**: 512Mi

Suggestion: delete. This isn't that useful and takes up a lot of visual space

Comment on lines +168 to +186
## When to scale

### Scale up (increase resources)

Increase CPU and memory when you observe:

- High CPU usage (>70% sustained) during normal operations
- Memory pressure or OOM (out-of-memory) kills
- Slow response times (>1 second) for simple tool calls
- Health check timeouts or frequent backend unavailability

### Scale out (increase replicas)

Add more vMCP instances when:

- CPU usage remains high despite increasing resources
- You need higher availability and fault tolerance
- Request volume exceeds capacity of a single instance
- You want to distribute load across multiple availability zones

Suggestion: delete. This is redundant with information elsewhere.
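
The indicators in the quoted "When to scale" section could be condensed into a single decision helper. The Python sketch below is one possible reading, with assumptions: the function and parameter names are hypothetical, the caller is assumed to supply a sustained CPU figure as a fraction, and only the CPU, OOM, and latency indicators are modeled.

```python
def scaling_hint(
    cpu_utilization: float,
    oom_kills: int,
    simple_call_seconds: float,
    resources_already_increased: bool,
) -> str:
    """Return "scale up", "scale out", or "no change" from the quoted rules."""
    cpu_high = cpu_utilization > 0.70  # quoted: ">70% sustained"
    if cpu_high and resources_already_increased:
        # Quoted: CPU usage remains high despite increasing resources.
        return "scale out"
    if cpu_high or oom_kills > 0 or simple_call_seconds > 1.0:
        return "scale up"
    return "no change"


print(scaling_hint(0.85, 0, 0.2, resources_already_increased=False))
print(scaling_hint(0.85, 0, 0.2, resources_already_increased=True))
```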

items: [
'toolhive/guides-vmcp/intro',
'toolhive/guides-vmcp/configuration',
'toolhive/guides-vmcp/performance',

Performance should be the bottom-most item in the vMCP guide. It's arguably the most advanced topic, so it should come last.

3 participants