Document vMCP performance and health check configuration #538
base: main
Conversation
Pull request overview
Adds operator-focused documentation for Virtual MCP Server (vMCP) deployment planning/sizing and for configuring and observing backend health checks, to help plan production deployments and monitor backend availability.
Changes:
- Added “Deployment planning” guidance (baseline resources, scaling factors, and operational indicators) to the vMCP introduction.
- Added a new “Configure health checks” section to backend discovery docs, including CRD-based configuration examples and operational monitoring notes.
- Updated the vMCP configuration guide to link to the new health check documentation and related backend discovery info.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| docs/toolhive/guides-vmcp/intro.mdx | Adds deployment planning guidance for sizing/capacity and scaling considerations. |
| docs/toolhive/guides-vmcp/configuration.mdx | Updates “Next steps” and related links to point readers to health checks and backend discovery. |
| docs/toolhive/guides-vmcp/backend-discovery.mdx | Documents health check configuration, circuit breaker settings, timeouts, and health status monitoring. |
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
**jerm-dro** left a comment:
Just a few blocking questions, but otherwise LGTM
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
> :::caution[Network overhead]
>
> Enabling health checks for remote backends increases network traffic to external
> services. Only enable this if you need real-time health status for remote
> endpoints.
>
> :::
>
> #### Degraded backend detection
>
> Backends are marked as **degraded** when:
>
> - Health checks succeed but response times exceed 5 seconds (slow performance)
> - Backend recently recovered from failures and is stabilizing
>
> Degraded backends remain in the routing table but may indicate performance
> problems.
>
> #### Health check behavior
>
> 1. **Initial check**: All backends checked immediately on startup
> 2. **Periodic checks**: Repeated at `healthCheckInterval` (default: 30s)
> 3. **Status updates**: Reported to Kubernetes at `statusReportingInterval`
>    (default: 30s)
> 4. **Backend unhealthy**: After `unhealthyThreshold` consecutive failures
>    (default: 3), backend marked unhealthy
> 5. **Recovery**: One successful check marks backend as healthy (or degraded if
>    slow)
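As a rough sketch, the three parameters in the numbered steps above could appear together in a VirtualMCPServer resource like this (the API group/version, resource name, and the placement of these fields directly under `spec` are assumptions for illustration; the actual CRD schema is defined in the files under review):

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1 # assumed API group/version
kind: VirtualMCPServer
metadata:
  name: my-vmcp # hypothetical name
spec:
  # Health check settings referenced above, with their documented defaults
  healthCheckInterval: 30s # how often each backend is probed
  statusReportingInterval: 30s # how often status is reported to Kubernetes
  unhealthyThreshold: 3 # consecutive failures before marking a backend unhealthy
```

Raising `healthCheckInterval` reduces probe traffic at larger backend counts, at the cost of slower detection of failures.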
This seems like too much content for this page. I'd recommend deleting it. This page is really just "how do I configure X" without detail on the implementation or implications.
> ## Next steps
>
> - Review [performance and sizing guidance](./performance.mdx) for deployment
>   planning
> - Follow the [Quickstart: Virtual MCP Server](../tutorials/quickstart-vmcp.mdx)
>   tutorial
This section is redundant with "Related Information." I'd also recommend putting the Quickstart at the top of the list, given this is an intro page.
> ## Backend scale recommendations
>
> vMCP performs well across different scales:
>
> | Backend Count | Use Case                      | Notes                                |
> | ------------- | ----------------------------- | ------------------------------------ |
> | 1-5           | Small teams, focused toolsets | Minimal resource overhead            |
> | 5-15          | Medium teams, diverse tools   | Recommended range for most use cases |
> | 15-30         | Large teams, comprehensive    | Increase health check interval       |
> | 30+           | Enterprise-scale deployments  | Consider multiple vMCP instances     |
Suggestion: delete. This is a long page and doesn't add much information.
> :::info[Why no replicas field?]
>
> VirtualMCPServer intentionally omits a `spec.replicas` field to avoid conflicts
> with HPA/VPA autoscaling. This design allows you to choose between static
> scaling (kubectl) or dynamic autoscaling (HPA/VPA) without operator
> interference.
>
> For static replica counts, scale the Deployment after creating the
> VirtualMCPServer. The operator will preserve your scaling configuration.
Suggestion: delete. It's redundant with the information above.
> ### Monitoring
>
> Track these metrics via [telemetry integration](./telemetry-and-metrics.mdx):
>
> | Metric                    | Healthy State | Action Threshold           |
> | ------------------------- | ------------- | -------------------------- |
> | Backend request latency   | P95 < SLO     | Alert on spikes            |
> | Backend error rate        | < 1%          | Investigate > 5%           |
> | Health check success rate | > 95%         | Early warning              |
> | Workflow execution time   | Varies        | Check for serial execution |
>
> **Setup:** Create dashboards for trend analysis and configure alerts for
> anomalies. Catches degradation before users notice.
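The error-rate row in the table above could be wired into a Prometheus alerting rule along these lines. The metric names here are placeholders, not metrics the telemetry integration is known to export; substitute the names your exporter actually produces:

```yaml
# Sketch of a Prometheus alerting rule for the "investigate > 5%" threshold.
# vmcp_backend_requests_errors_total / vmcp_backend_requests_total are
# hypothetical metric names used only for illustration.
groups:
  - name: vmcp-backend-health
    rules:
      - alert: VMCPBackendErrorRateHigh
        expr: |
          sum(rate(vmcp_backend_requests_errors_total[5m]))
            / sum(rate(vmcp_backend_requests_total[5m])) > 0.05
        for: 10m # require the condition to hold before firing
        labels:
          severity: warning
        annotations:
          summary: vMCP backend error rate above 5%
```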
Suggestion: delete. Anyone operating vMCP at scale would already be doing this.
> :::caution[Backend scaling]
>
> When scaling vMCP horizontally, the backend MCP servers will also see increased
> load. Ensure your backend deployments (MCPServer resources) are also scaled
> appropriately to handle the additional traffic.
>
> :::
>
> **Session affinity is required** when using multiple replicas. Clients must be
> routed to the same vMCP instance for the duration of their session. Configure
> based on your deployment:
>
> - **Kubernetes Service**: Use `sessionAffinity: ClientIP` for basic
>   client-to-pod stickiness
>   - Note: This is IP-based and may not work well behind proxies or with
>     changing client IPs
> - **Ingress Controller**: Configure cookie-based sticky sessions (recommended)
>   - nginx: Use `nginx.ingress.kubernetes.io/affinity: cookie`
>   - Other controllers: Consult your Ingress controller documentation
> - **Gateway API**: Use appropriate session affinity configuration based on your
>   Gateway implementation
>
> :::tip[Session affinity recommendations]
>
> - For **stateless backends**: Cookie-based sticky sessions work well and
>   provide reliable routing through proxies
> - For **stateful backends** (Playwright, databases): Consider vertical scaling
>   or dedicated vMCP instances instead of horizontal scaling with session
>   affinity, as session resumption may not work reliably
>
> :::
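The `sessionAffinity: ClientIP` option in the first bullet is standard Kubernetes Service configuration. A minimal sketch, assuming a Service and pod label named `vmcp` (both hypothetical) and the vMCP listening port is 8080:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: vmcp # hypothetical Service name
spec:
  selector:
    app: vmcp # hypothetical pod label
  sessionAffinity: ClientIP # route a given client IP to the same pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800 # sticky window (Kubernetes default: 3 hours)
  ports:
    - port: 8080
      targetPort: 8080
```

As the note warns, ClientIP affinity breaks down when many clients share a source IP (corporate NAT, proxies), which is why the cookie-based Ingress approach is the recommended option above.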
Suggestion: delete. I'd prefer to reduce the information here. A lot of this is specific to the environment vMCP is specifically scaled in.
> @@ -0,0 +1,287 @@
> ---
> title: Performance and sizing
Blocker: rename this page "Scaling" and keep the content just focused on:
- How can I vertically scale?
- How can I horizontally scale?
- When is horizontally scaling hard?
> ### Baseline resources
>
> **Minimal deployment** (development/testing):
>
> - **CPU**: 100m (0.1 cores)
> - **Memory**: 128Mi
>
> **Production deployment** (recommended):
>
> - **CPU**: 500m (0.5 cores)
> - **Memory**: 512Mi
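The production-tier numbers above translate directly into a standard Kubernetes container `resources` block. The limit values here are illustrative headroom, not figures from the document:

```yaml
# Container resources for the recommended production baseline above
resources:
  requests:
    cpu: 500m # 0.5 cores, per the production baseline
    memory: 512Mi
  limits:
    cpu: "1" # illustrative headroom above the 500m request
    memory: 1Gi # illustrative; size to your observed working set
```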
> ## When to scale
>
> ### Scale up (increase resources)
>
> Increase CPU and memory when you observe:
>
> - High CPU usage (>70% sustained) during normal operations
> - Memory pressure or OOM (out-of-memory) kills
> - Slow response times (>1 second) for simple tool calls
> - Health check timeouts or frequent backend unavailability
>
> ### Scale out (increase replicas)
>
> Add more vMCP instances when:
>
> - CPU usage remains high despite increasing resources
> - You need higher availability and fault tolerance
> - Request volume exceeds capacity of a single instance
> - You want to distribute load across multiple availability zones
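Since VirtualMCPServer deliberately leaves `spec.replicas` unset, the scale-out signals above can be automated with a standard HorizontalPodAutoscaler targeting the Deployment the operator manages. A sketch, assuming that Deployment is named `vmcp` (hypothetical):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vmcp-hpa # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vmcp # hypothetical Deployment created for the vMCP instance
  minReplicas: 2 # >1 replica requires session affinity (see above)
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # matches the ">70% sustained" CPU signal
```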
Suggestion: delete. This is redundant with information elsewhere.
> items: [
>   'toolhive/guides-vmcp/intro',
>   'toolhive/guides-vmcp/configuration',
>   'toolhive/guides-vmcp/performance',
Performance should be the bottom-most item in the vMCP guide. It's arguably the most advanced topic, so it should come last.

Description
Add comprehensive documentation for Virtual MCP Server sizing guidance and health check configuration to help operators plan deployments and monitor backend availability.
Type of change
Related issues/PRs
#512
Submitter checklist
Content and formatting
Navigation
- Sidebar (`sidebars.ts`) updated for added, deleted, reordered, or renamed files
- `vercel.json` updated for moved, renamed, or deleted pages (i.e., if the URL slug changed)
Content