Conversation
Pull request overview
Adds an RFC proposing a shift from per-request/lazy MCP backend client creation to session-scoped client lifecycle management to simplify the current pooling approach while preserving backend state across tool calls.
Changes:
- Introduces a new RFC describing a session-scoped backend client map owned by `VMCPSession`.
- Specifies eager client initialization during session setup (`AfterInitialize`) and cleanup during session close.
- Outlines phased implementation steps, testing strategy, and alternatives considered.
Force-pushed from f935166 to 6e503d4
The RFC captures the complete architectural vision from the discussion, providing a roadmap for simplifying the client pooling implementation while maintaining all current functionality.
Force-pushed from 6e503d4 to 4f970e6
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.
Force-pushed from 840b512 to ddf4273
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.
Force-pushed from 50f4742 to ec70d08
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
Force-pushed from 54dd1b4 to d714819
jerm-dro
left a comment
I left two bigger comments elaborating on my thoughts. To summarize, I think there are larger problems with how the session concept is handled within vMCP. Namely, we have session concerns scattered throughout the codebase and all of these concerns are tightly coupled. As a solution, I think we should create an interface that encapsulates these concerns:
```go
// Session represents an active MCP session with all its capabilities and resources.
// This is a pure domain object - no protocol concerns and can be run without being spun up in a server.
type Session interface {
	ID() string

	// Capabilities - returns discovered tools/resources for this session
	// Perhaps these could be combined into one "GetCapabilities" method.
	Tools() []Tool
	Resources() []Resource

	// MCP operations - routing logic is encapsulated here
	CallTool(ctx context.Context, name string, arguments map[string]any) (*ToolResult, error)
	ReadResource(ctx context.Context, uri string) (*ResourceResult, error)
	GetPrompt(ctx context.Context, name string, arguments map[string]any) (*PromptResult, error)

	// Lifecycle
	Close() error
}
```

That would make solving this client-lifetime problem trivial.
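For illustration, a minimal sketch of how a concrete implementation of such an interface could make client lifetime follow session lifetime. The type and field names below (`backendSession`, `backendClient`, `routes`) are hypothetical placeholders, not names from the RFC or the vMCP codebase:

```go
package session

import (
	"context"
	"errors"
	"fmt"
)

// ToolResult is a placeholder for the RFC's tool-call result type.
type ToolResult struct{ Content any }

// backendClient stands in for whatever MCP client type the backends expose.
type backendClient interface {
	CallTool(ctx context.Context, name string, args map[string]any) (*ToolResult, error)
	Close() error
}

// backendSession owns one pre-initialized client per backend, so client
// lifetime follows session lifetime: created once, reused across requests,
// and closed together when the session ends.
type backendSession struct {
	id      string
	clients map[string]backendClient // backend name -> client
	routes  map[string]string        // tool name -> backend name
}

func (s *backendSession) ID() string { return s.id }

// CallTool encapsulates routing: look up the owning backend, reuse its client.
func (s *backendSession) CallTool(ctx context.Context, name string, args map[string]any) (*ToolResult, error) {
	backend, ok := s.routes[name]
	if !ok {
		return nil, fmt.Errorf("unknown tool %q", name)
	}
	return s.clients[backend].CallTool(ctx, name, args)
}

// Close releases every backend client exactly once, when the session ends.
func (s *backendSession) Close() error {
	var errs []error
	for name, c := range s.clients {
		if err := c.Close(); err != nil {
			errs = append(errs, fmt.Errorf("close backend %s: %w", name, err))
		}
	}
	return errors.Join(errs...)
}
```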
Refactor RFC to address scattered session architecture concerns

Address architectural feedback from PR #38 review, specifically jerm-dro's comments about the root cause being scattered session concerns rather than just client lifecycle management.

Major changes:

1. **Expanded Problem Statement**:
   - Identified root cause: session concerns scattered across middleware, adapters, and hooks
   - Current VMCPSession is a passive data container, not a domain object
   - Session construction tangled with SDK lifecycle callbacks
   - Hard to create session-scoped resources (clients, optimizer features)
2. **Introduced Core Interfaces**:
   - Session interface: Domain object that owns clients, encapsulates routing, manages lifecycle (not just a data container)
   - SessionFactory interface: Creates fully-formed sessions with all dependencies (discovery, client init, routing setup)
   - SessionManager: Bridges between domain (Session) and protocol (SDK)
3. **Updated Goals**:
   - Primary goal: Encapsulate session behavior in clean interfaces
   - Decouple session creation from SDK wiring
   - Enable testing without full server (addresses toolhive#2852)
   - Enable future features through interface decoration
4. **Revised Implementation Plan**:
   - Phase 1: Introduce Session/SessionFactory interfaces
   - Phase 2: Wire SessionManager to SDK
   - Phase 3: Migrate existing code to use Session interface
   - Phase 4: Cleanup and observability
5. **Added Benefits Documentation**:
   - Concrete examples of decorator pattern (optimizer-in-vmcp)
   - Simplified capability refresh approach
   - Clear testing strategy without server dependency
6. **Fixed Formatting Issues** (from Copilot review):
   - Split Goals & Non-Goals into separate sections
   - Changed "Security" to "Security Considerations"
   - Restructured Compatibility section
   - Fixed References to use full repo-qualified format
   - Added credential refresh considerations

The RFC now addresses the architectural root cause (scattered session concerns) rather than just the symptom (per-request client creation).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Force-pushed from d698c3e to 9654f19
Update Last Updated date to 2026-02-09
Hey, appreciate the work on this RFC. The per-request client creation problem is very real and worth solving (I filed stacklok/toolhive#2417 about exactly this pain point). The session-scoped client lifecycle and the Session-as-domain-object approach are solid improvements. That said, I have some concerns about how this interacts with the existing session storage architecture we built specifically for externalizing sessions to Redis/Valkey.
Let me dig in...
What we already have in place
Over PRs #1677, #1989, #1770, and #1771, we built a deliberate, phased session management architecture:
- `Storage` interface (`pkg/transport/session/storage.go`): a 5-method pluggable abstraction (`Store`, `Load`, `Delete`, `DeleteExpired`, `Close`) designed specifically to swap between in-memory and distributed backends (Redis/Valkey)
- `LocalStorage`: in-memory `sync.Map` implementation behind the `Storage` interface
- `Manager`: uses `Storage` + `Factory`, with `NewManagerWithStorage()` for injecting custom backends
- `serialization.go`: JSON ser/deser already prepared for Redis/Valkey (currently `nolint:unused`, ready for when we implement a distributed backend)
The whole point of this layering was: when we're ready for distributed sessions, we implement a RedisStorage (or equivalent) satisfying the Storage interface, and nothing else changes. Not the Manager, not the proxies, not the vMCP server.
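For context, a rough sketch of the shape of that abstraction. The real interface lives in `pkg/transport/session/storage.go`; the method names come from the list above, but the signatures here are approximations:

```go
package storage

// Session stands in for the transport-level session type
// (transportsession.Session in the real code).
type Session interface {
	ID() string
}

// Storage approximates the 5-method abstraction described above;
// the exact signatures in pkg/transport/session/storage.go may differ.
type Storage interface {
	Store(sess Session) error              // persist or overwrite a session
	Load(id string) (Session, bool, error) // fetch a session by ID
	Delete(id string) error                // remove a session
	DeleteExpired() error                  // purge sessions whose TTL has elapsed
	Close() error                          // release backend resources (e.g., connections)
}
```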
With this in mind, here are my concerns:
The parallel storage path problem
The RFC's SessionManager introduces its own sync.Map for session storage:
```go
// From RFC: SessionManager
type sessionManager struct {
	sessions sync.Map // bypasses the Storage interface
	factory  SessionFactory
}
```

So, basically, this creates a parallel session management path that doesn't go through `transportsession.Manager` or the `Storage` interface. The RFC says "TTL management and storage are handled by the existing session storage layer", but the proposed implementation stores sessions in its own `sync.Map`, not through the existing `Storage` interface.
This means when we implement Redis/Valkey for transport sessions, the vMCP session data managed by this new SessionManager won't be externalized. We'd need to separately refactor this sync.Map later, which is exactly the kind of rework the Storage interface was designed to prevent.
Suggestion: The SessionManager should delegate session storage to transportsession.Manager (which already uses the pluggable Storage interface) rather than maintaining its own sync.Map. The existing Manager.AddSession(), Manager.Get(), and Manager.Delete() methods already provide exactly what's needed here.
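To make the suggestion concrete, a rough sketch of that delegation. The `sessionStore` interface is a stand-in for the subset of `transportsession.Manager` used here (`AddSession`/`Get`/`Delete` per the comment above, with approximated signatures), and `DomainSession`/`SessionFactory` are placeholders for the RFC's interfaces:

```go
package vmcp

import "context"

// sessionStore captures the subset of transportsession.Manager used here;
// the method names follow the comment above, but signatures are approximations.
type sessionStore interface {
	AddSession(s DomainSession) error
	Get(id string) (DomainSession, bool)
	Delete(id string) error
}

// DomainSession and SessionFactory are placeholders for the RFC's interfaces.
type DomainSession interface {
	ID() string
	Close() error
}

type SessionFactory interface {
	MakeSession(ctx context.Context, id string) (DomainSession, error)
}

// sessionManager delegates storage to the transport session manager instead of
// keeping its own sync.Map, so a future Redis/Valkey Storage backend is picked
// up here with no further changes.
type sessionManager struct {
	store   sessionStore
	factory SessionFactory
}

func (m *sessionManager) register(ctx context.Context, id string) (DomainSession, error) {
	sess, err := m.factory.MakeSession(ctx, id)
	if err != nil {
		return nil, err
	}
	if err := m.store.AddSession(sess); err != nil {
		return nil, err
	}
	return sess, nil
}

func (m *sessionManager) close(id string) error {
	sess, ok := m.store.Get(id)
	if !ok {
		return nil
	}
	_ = m.store.Delete(id)
	return sess.Close()
}
```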
How do the two Session interfaces compose?
The RFC proposes a new Session interface (behavior-oriented: CallTool(), ReadResource(), etc.) while we already have transportsession.Session (storage-oriented: ID(), Type(), Touch(), GetData()). These serve different purposes, but the RFC doesn't show how they compose.
Note that currently VMCPSession embeds StreamableSession which implements transportsession.Session. That's what makes it compatible with the Storage interface and Manager. The new behavior-oriented Session needs to either:
- Embed/implement `transportsession.Session` so it remains compatible with the existing storage layer, or
- Clearly document the two-layer model: transport session (storage) wraps/references domain session (behavior)
Without this, the new Session type won't be storable through the Storage interface, which breaks the externalization path.
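As one possible shape for option 1, a sketch in which the domain session embeds the transport-level session; `transportSession` approximates `transportsession.Session`, and `backendClient`/`ToolResult` are the placeholders from the earlier sketch:

```go
// transportSession approximates transportsession.Session (storage-oriented:
// ID, TTL touch, serializable data).
type transportSession interface {
	ID() string
	Touch()
}

// defaultSession embeds the transport session (option 1 above), so it remains
// storable through the existing Storage interface while adding domain behavior.
type defaultSession struct {
	transportSession // storage concerns: ID, TTL touch, serializable metadata

	// runtime state owned by the domain session (never serialized)
	clients map[string]backendClient
	routes  map[string]string
}

func (s *defaultSession) CallTool(ctx context.Context, name string, args map[string]any) (*ToolResult, error) {
	s.Touch() // keep the underlying transport session alive on activity
	backend, ok := s.routes[name]
	if !ok {
		return nil, fmt.Errorf("unknown tool %q", name)
	}
	return s.clients[backend].CallTool(ctx, name, args)
}
```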
Partial initialization and serializability
The RFC allows sessions to continue with partial backends (some failed during MakeSession()). If we later add Redis/Valkey persistence, a session restored on a different instance wouldn't know which backends originally failed vs. which were never attempted. The session state model needs to account for this if it's going to be serializable.
What I think should change
- Wire through `transportsession.Manager` instead of a standalone `sync.Map`. The `Manager` already supports custom factories and pluggable storage.
- Show the composition model: how does the new behavior-oriented `Session` relate to `transportsession.Session`? Does it embed it? Wrap it? Replace it?
- Add a section on session serializability: what part of the new `Session` is metadata (storable in Redis/Valkey) vs. runtime state (client connections, goroutines that must stay in-memory)? Issue stacklok/toolhive#2417 already sketched this dual-layer model (see the sketch after this list).
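A possible illustration of that split, with field names that are assumptions rather than a schema from the RFC or issue stacklok/toolhive#2417 (`backendClient` is the placeholder from the earlier sketches):

```go
// sessionMetadata is the part that could be serialized to Redis/Valkey.
type sessionMetadata struct {
	ID           string    `json:"id"`
	CreatedAt    time.Time `json:"created_at"`
	BackendNames []string  `json:"backend_names"` // backends configured for this session
	FailedInit   []string  `json:"failed_init"`   // backends that failed during MakeSession
}

// sessionRuntime is the part that must stay in-memory on the owning instance.
type sessionRuntime struct {
	mu      sync.RWMutex
	clients map[string]backendClient // live MCP client connections; never serialized
}
```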
The RFC solves a real problem and the domain-object approach is the right direction. I just want to make sure we don't accidentally create architectural debt that blocks the Redis/Valkey externalization path we've been building toward.
Force-pushed from 6f2f6a9 to d79bb82
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
Force-pushed from b05416f to 63c5d62
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
Force-pushed from 63c5d62 to 6d03d71
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.
Force-pushed from 6d03d71 to 6bcdb92
@JAORMX Thanks for the review and the context around the existing session storage architecture.

I'm not opposed to preserving the existing session storage mechanism as part of this work. The main thrust of this RFC is encapsulating the session concept and behavior into a coherent domain object, and, based on my read of the code, that can still be accomplished without changing the storage mechanism.

That said, it's worth acknowledging that we're currently paying a cognitive cost to maintain the storage abstraction layer ahead of a distributed backend that doesn't exist yet. I'd be curious to hear your thoughts on the timeline for that work, since it would help us calibrate how much we invest in preserving compatibility with it right now vs. keeping things simple. In the future, I also think it's worth exploring whether we can simplify that abstraction.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 11 comments.
> - **Bounded concurrency**: Limit parallel goroutines (e.g., 10 concurrent initializations) to avoid resource exhaustion
> - **Per-backend timeout**: Apply context timeout (e.g., 5s per backend) so one slow backend doesn't block session creation
> - **Partial initialization**: If some backends fail, log warnings and continue with successfully initialized backends (failed backends not added to clients map)
The parallel client initialization uses bounded concurrency (e.g., 10 concurrent initializations) and per-backend timeouts (e.g., 5s), but these specific values are provided as examples without justification. The optimal values depend on expected network latency, backend responsiveness, and resource constraints. Consider providing guidance on how to tune these parameters based on deployment characteristics, or making them configurable. Additionally, document the trade-offs: higher concurrency reduces session creation latency but increases resource usage and potential backend load spikes.
Suggested change:

- **Bounded concurrency**: Limit parallel goroutines (for example, 10 concurrent initializations) to avoid resource exhaustion.
  - These values **MUST be configurable** (e.g., via server configuration or environment variables), not hard-coded constants in the implementation.
  - Tuning guidance:
    - Start with a baseline such as `min(numBackends, 8–16)` and adjust based on CPU usage, memory pressure, backend rate limits, and startup latency SLOs.
    - Higher concurrency reduces end-to-end session creation latency but increases CPU/memory usage and can cause load spikes or connection storms against backends.
    - Lower concurrency smooths backend load and reduces resource contention at the cost of slower session initialization when many backends are configured.
- **Per-backend timeout**: Apply a per-backend context timeout (for example, 5s per backend) so one slow or hung backend does not block overall session creation.
  - This timeout SHOULD be deployment-specific: shorter in low-latency, co-located deployments; longer in high-latency or intermittently slow environments.
  - Consider setting the per-backend timeout relative to observed p95–p99 connection + handshake latency (e.g., `2–3x p99`) and revisiting it as telemetry is collected in production.
  - Trade-off: shorter timeouts improve resilience and responsiveness to unhealthy backends but may prematurely drop slow-yet-healthy backends; longer timeouts improve tolerance for slow backends but increase worst-case session startup time.
- **Partial initialization**: If some backends fail (e.g., time out or error during client creation), log warnings and continue with successfully initialized backends (failed backends are not added to the clients map).
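One way the bounded-concurrency and per-backend-timeout behavior could be sketched, assuming an `initClient` helper that dials and initializes a single backend and the `backendClient` placeholder type from earlier; the limit and timeout values would come from configuration as suggested above:

```go
import (
	"context"
	"log"
	"sync"
	"time"

	"golang.org/x/sync/errgroup"
)

// initClients dials and initializes one MCP client per backend, with bounded
// parallelism and a per-backend timeout. Failed backends are skipped
// (partial initialization) rather than failing the whole session.
func initClients(ctx context.Context, backends []string, maxParallel int, perBackend time.Duration) map[string]backendClient {
	var mu sync.Mutex
	clients := make(map[string]backendClient, len(backends))

	g, gctx := errgroup.WithContext(ctx)
	g.SetLimit(maxParallel) // bounded concurrency

	for _, name := range backends {
		name := name
		g.Go(func() error {
			cctx, cancel := context.WithTimeout(gctx, perBackend)
			defer cancel()

			c, err := initClient(cctx, name) // assumed helper: dial + MCP initialize handshake
			if err != nil {
				log.Printf("backend %s failed to initialize, continuing without it: %v", name, err)
				return nil // never abort sibling initializations
			}
			mu.Lock()
			clients[name] = c
			mu.Unlock()
			return nil
		})
	}
	_ = g.Wait()
	return clients
}
```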
> 1. **`Generate()` phase**: Creates empty session (UUID only), always succeeds
>    - Stores empty session in map
>    - Returns session ID to SDK
>    - SDK sends `Mcp-Session-Id` header in `InitializeResult`
> 2. **`OnRegisterSession` hook phase**: Calls `SessionFactory.MakeSession()` - failure possible here
>    - If `MakeSession()` fails completely (e.g., all backends unreachable):
>      - Log error with session ID and failure details
>      - Create session with **empty capabilities** (no tools, resources, or prompts)
>      - SDK registers empty capability list, client sees "no tools available"
>      - Keep session in map (allows graceful handling without special error checks)
>    - Client experience:
>      - Session exists but has no capabilities
>      - Client can query session but sees empty tool list
>      - Client may delete session and re-initialize, or backend recovery may populate capabilities later
>
> **Rationale**: The two-phase creation pattern (empty session + populate via hook) is necessary because the SDK's `Generate()` must return a session ID. Additionally, the SDK's `OnRegisterSession` hook does not allow returning an error, so failures during `MakeSession()` cannot be propagated directly. Instead of storing initialization errors that require checks in every method (`InitError()` pattern is error-prone - easy to forget checks when adding new methods), we create sessions with empty capabilities. Failed backends don't advertise tools/resources, so clients never try to call them. This is simpler and safer than requiring defensive `InitError()` checks throughout the codebase.
The error handling section describes creating sessions with empty capabilities when MakeSession fails, but this approach could lead to confusing user experience. A client successfully creates a session but then finds no tools available, without clear indication that initialization failed. Consider enhancing the error handling to provide explicit feedback mechanisms, such as adding a session status field that clients can query, or including initialization errors in the session metadata that can be retrieved via a dedicated endpoint.
Suggested change:

1. **`Generate()` phase**: Creates an empty session shell (UUID only), always succeeds
   - Stores session in map with status `initializing`
   - Returns session ID to SDK
   - SDK sends `Mcp-Session-Id` header in `InitializeResult`
2. **`OnRegisterSession` hook phase**: Calls `SessionFactory.MakeSession()` - failure possible here
   - On **success**:
     - Populate session capabilities (tools, resources, prompts, clients map, etc.)
     - Update session status to `ready`
     - Optionally store a brief "initialized at <timestamp>" metadata record
   - If `MakeSession()` fails completely (e.g., all backends unreachable):
     - Log error with session ID and failure details
     - Do **not** attempt to fabricate capabilities; leave capabilities empty
     - Update session status to `failed`
     - Store an initialization error record in session metadata, e.g.:
       - `error_message` (human-readable summary)
       - optional `error_code` / `backend_errors` for structured inspection
     - Keep session in map (so the ID remains valid and can be inspected/deleted)

**Client experience for failed initialization**:
- Session exists but is marked as `failed`
- Capabilities list is empty (no tools, resources, or prompts)
- A dedicated "session inspection" surface exposes status + error metadata, e.g.:
  - via an existing `GetSession`/`ListSessions` API augmented with `status` and `init_error`
  - or via a small MCP-specific endpoint / tool that returns session status + error info
- SDK clients / UIs are expected to:
  - Check the session status when they observe an empty capability list
  - Surface the initialization error message to users
  - Allow users (or automation) to delete and re-initialize the session once backends recover

**Rationale**: The two-phase creation pattern (empty session shell + populate via hook) is necessary because the SDK's `Generate()` must return a session ID, and the SDK's `OnRegisterSession` hook does not allow returning an error, so failures during `MakeSession()` cannot be propagated directly in-band. Instead of silently treating initialization failures as "normal" sessions with empty capabilities, we centralize failure handling in the session's `status` and initialization error metadata. This avoids the `InitError()` pattern (which requires checks in every method and is easy to forget when adding new methods), while still giving clients an explicit, queryable signal that initialization failed. Failed backends don't advertise tools/resources, so clients never try to call them, and the additional status+metadata allow UIs and automated clients to distinguish "no tools configured" from "initialization failed."
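A sketch of how the two-phase flow with status and init-error metadata could look, reusing the `DomainSession`/`SessionFactory` placeholders from the earlier sketches. Real code would guard the map with a lock or delegate storage to `transportsession.Manager`, and the SDK hook names/signatures may differ:

```go
import (
	"context"
	"log"

	"github.com/google/uuid"
)

type sessionStatus string

const (
	statusInitializing sessionStatus = "initializing"
	statusReady        sessionStatus = "ready"
	statusFailed       sessionStatus = "failed"
)

// sessionRecord pairs the domain session with status/error metadata;
// field names are illustrative, not the RFC's final schema.
type sessionRecord struct {
	status    sessionStatus
	initError string        // populated when MakeSession fails
	session   DomainSession // nil / empty capabilities when initialization failed
}

type initFlow struct {
	records map[string]*sessionRecord // not shown: locking or delegation to transportsession.Manager
	factory SessionFactory
}

// generate: phase 1 - always succeeds and yields the Mcp-Session-Id value.
func (f *initFlow) generate() string {
	id := uuid.NewString()
	f.records[id] = &sessionRecord{status: statusInitializing}
	return id
}

// onRegisterSession: phase 2 - the SDK hook cannot return an error, so
// failures are recorded on the session instead of being propagated.
func (f *initFlow) onRegisterSession(ctx context.Context, id string) {
	rec := f.records[id]
	sess, err := f.factory.MakeSession(ctx, id)
	if err != nil {
		log.Printf("session %s: initialization failed: %v", id, err)
		rec.status = statusFailed
		rec.initError = err.Error()
		return // session stays registered; clients can query status / init_error
	}
	rec.session = sess
	rec.status = statusReady
}
```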
> - Store a cryptographic hash of the original authentication token (e.g., `SHA256(bearerToken)`) in the session during creation
> - On each request, validate that the current auth token hash matches the session's bound token hash
> - If mismatch, reject with "session authentication mismatch" error and terminate session
> - This prevents stolen session IDs from being used with different credentials
The session hijacking prevention section recommends storing a cryptographic hash of the authentication token (e.g., SHA256(bearerToken)), but doesn't address token rotation scenarios. If a client's authentication token is legitimately refreshed (e.g., OAuth token refresh), the hash would no longer match, causing session validation to fail even for the legitimate client. Consider designing the token binding mechanism to support token refresh patterns, such as storing multiple valid token hashes or implementing a token family concept.
Suggested change:

- Store a **token binding record** in the session during creation, which includes a cryptographic hash of the initial authentication token (e.g., `SHA256(bearerToken)`)
- Support **token refresh/rotation** by allowing the token binding record to track a small set of non-expired token hashes (e.g., the current token plus one or two recently rotated tokens with short grace periods)
- On each request, validate that the hash of the current auth token matches one of the non-expired hashes in the token binding record; if it matches a new token, update the record to rotate out old hashes
- If no match is found, reject with a "session authentication mismatch" error and terminate the session
- This prevents stolen session IDs from being used with different credentials while remaining compatible with OAuth-style token refresh and rotation
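A minimal sketch of the hash-and-compare step, assuming the rotation bookkeeping (adding new hashes, expiring old ones after the grace period) happens elsewhere:

```go
import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
)

// hashToken derives the value stored in the session's token binding record.
func hashToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// tokenMatchesSession checks the presented token against the set of
// non-expired hashes in the binding record (current token plus recently
// rotated ones still within their grace period).
func tokenMatchesSession(presented string, validHashes []string) bool {
	h := hashToken(presented)
	for _, v := range validHashes {
		if subtle.ConstantTimeCompare([]byte(h), []byte(v)) == 1 {
			return true
		}
	}
	return false // caller rejects with "session authentication mismatch" and terminates the session
}
```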
> - **Note**: Phase 1 will be done incrementally across multiple PRs if necessary, reusing existing implementation pieces. This allows us to introduce the bulk of the code without having to worry about refactoring the existing system.
The Phase 1 implementation mentions "reusing existing implementation pieces" and doing work "incrementally across multiple PRs if necessary," but doesn't provide clear guidance on what can be reused versus what needs to be written fresh. This ambiguity could lead to implementation inconsistencies or unnecessary refactoring. Consider providing a more detailed breakdown of which existing components can be leveraged (e.g., aggregator, client factory) and which require new implementations.
Suggested change:

- **Execution model**: Phase 1 will be done incrementally across multiple PRs. Each PR should be strictly additive (no behavior changes for existing session types) and should keep the old and new paths side-by-side.
- **Reused components**:
  - Existing session storage implementations used by `transportsession.Manager` (e.g., in-memory/local, Redis), including their TTL/`Touch()` behavior and expiration cleanup.
  - Existing client creation code paths where possible (e.g., HTTP/backend client factory, aggregators, or other shared client wiring logic), wrapped or adapted inside the new `SessionFactory` rather than re-implemented.
  - Existing logging, metrics, and configuration loading mechanisms for sessions and backends.
- **New components**:
  - Domain-level `Session` type, `SessionFactory`, and `SessionManager` interfaces plus their initial concrete implementations.
  - Wiring from `SessionManager` into the existing MCP SDK integration points (e.g., hooks that call `SessionFactory.MakeSession()` and map SDK session IDs to stored domain sessions).
  - Any glue code needed to adapt existing client factories/aggregators into the new session-scoped lifecycle (without changing their external behavior).
> - **Owns backend clients**: Session stores pre-initialized MCP clients in an internal map
> - **Encapsulates routing**: `CallTool()` looks up tool in routing table, routes to correct backend client
> - **Manageable lifetime**: `Close()` cleans up clients and any other resources. The SessionManager/caller is decoupled from what exactly happens on close()
> - **Thread-safe**: All methods protected by internal RWMutex for concurrent access to prevent data races on internal state. Read operations (Tools, Resources, Prompts, CallTool, ReadResource, GetPrompt) use read locks; write operations (Close) use write locks. Methods like `Tools()`, `Resources()`, `Prompts()` return defensive copies (not references to internal slices) to prevent caller mutations. Internal maps and slices are never exposed directly. Note: The RWMutex prevents data races but does not prevent `CallTool()` from using a closed client after `Close()` completes—the session implementation must track closed state explicitly (see "Client Closed Mid-Request" in Error Handling). Proper encapsulation may allow removal of mutexes elsewhere (e.g., SessionManager, current VMCPSession) since state is now owned by the Session
The thread safety section (line 299) states that "Proper encapsulation may allow removal of mutexes elsewhere (e.g., SessionManager, current VMCPSession)," but this is speculative without concrete analysis. Removing mutexes in other components could introduce race conditions if those components have concurrent access patterns not covered by Session's internal locking. This statement should either be removed, made conditional ("may enable future investigation into..."), or backed by specific analysis of where mutex removal is safe.
Suggested change:

- **Thread-safe**: All methods protected by internal RWMutex for concurrent access to prevent data races on internal state. Read operations (Tools, Resources, Prompts, CallTool, ReadResource, GetPrompt) use read locks; write operations (Close) use write locks. Methods like `Tools()`, `Resources()`, `Prompts()` return defensive copies (not references to internal slices) to prevent caller mutations. Internal maps and slices are never exposed directly. Note: The RWMutex prevents data races but does not prevent `CallTool()` from using a closed client after `Close()` completes—the session implementation must track closed state explicitly (see "Client Closed Mid-Request" in Error Handling). Proper encapsulation confines concurrency concerns within the Session and may enable future investigation into whether mutex usage in other components (e.g., SessionManager, current VMCPSession) can be simplified, but this RFC does not prescribe or assume any such changes.
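A sketch of the locking and closed-state discipline this describes, using placeholder types (`Tool` defined here, plus the `ToolResult`/`backendClient` placeholders from earlier sketches) rather than the RFC's actual types:

```go
// Tool is a placeholder for the RFC's tool descriptor type.
type Tool struct{ Name string }

type lockedSession struct {
	mu      sync.RWMutex
	closed  bool
	tools   []Tool
	routes  map[string]string // tool name -> backend name
	clients map[string]backendClient
}

// Tools returns a defensive copy so callers cannot mutate internal state.
func (s *lockedSession) Tools() []Tool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]Tool, len(s.tools))
	copy(out, s.tools)
	return out
}

func (s *lockedSession) CallTool(ctx context.Context, name string, args map[string]any) (*ToolResult, error) {
	s.mu.RLock()
	if s.closed {
		s.mu.RUnlock()
		return nil, errors.New("session closed")
	}
	c, ok := s.clients[s.routes[name]]
	// Release the read lock before the network call; as noted above, Close()
	// can still complete mid-request, so callers must tolerate errors from a
	// client that was closed underneath them.
	s.mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("unknown tool %q", name)
	}
	return c.CallTool(ctx, name, args)
}

func (s *lockedSession) Close() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.closed {
		return nil
	}
	s.closed = true
	var errs []error
	for _, c := range s.clients {
		errs = append(errs, c.Close())
	}
	return errors.Join(errs...)
}
```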
JAORMX
left a comment
Thanks for addressing the feedback, Yolanda! The integration with transportsession.Manager (instead of a parallel sync.Map), the dual-layer storage model (metadata vs. runtime), and the composition model (defaultSession embedding transportsession.Session) all look good. This preserves the Redis/Valkey externalization path we built out in #1677/#1989/#1770/#1771 while delivering the session encapsulation improvements.
LGTM!
Force-pushed from 6bcdb92 to 1f73b30
jerm-dro
left a comment
The changes motivated by others' feedback LGTM.
RFC: Session-Scoped Architecture for vMCP
This RFC refactors vMCP session management to fix per-request client creation and scattered session concerns.
Problem
- Per-request client creation loses backend state across tool calls for stateful backends (Playwright, databases)
- Session concerns are scattered across the codebase, making session behavior hard to test without spinning up the full server
Solution
Introduce two core interfaces:
- **Session**: a domain object that owns backend clients, encapsulates routing, and manages its own lifecycle
- **SessionFactory**: creates fully-formed sessions with all dependencies (discovery, client initialization, routing setup)
Key improvement: Clients created once during session init, reused for all requests, closed on expiration.