chore(auto mode): prototype for model router #2788
base: main
Conversation
Pull request overview
This PR introduces a prototype model router that uses an external classification API to intelligently select which AI model should handle a user query based on whether it requires reasoning capabilities.
Changes:
- Adds `RouterDecisionFetcher`, a class that calls an external router API to determine the best model for a given prompt
- Integrates the router into `AutomodeService` to use routing decisions during endpoint resolution
- Changes the cache structure from storing a single endpoint per conversation to an array of endpoints
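For reference, the router response the fetcher consumes appears to carry the following fields. This shape is inferred from the logging call in the diff; the exact schema (and the sample model names below) are illustrative assumptions, not the authoritative API contract:

```typescript
// Sketch of the router API response, inferred from the fields the fetcher
// logs (predicted_label, chosen_model, confidence, scores, latency_ms,
// candidate_models). Not an authoritative schema.
interface RouterDecisionResponse {
	predicted_label: string;                                  // e.g. "needs_reasoning" or "no_reasoning"
	chosen_model: string;                                     // model id the router selected
	confidence: number;                                       // 0..1
	scores: { needs_reasoning: number; no_reasoning: number };
	latency_ms: number;
	candidate_models: string[];
}

// Hypothetical payload a router call might return (model names invented).
const example: RouterDecisionResponse = {
	predicted_label: 'needs_reasoning',
	chosen_model: 'model-large',
	confidence: 0.92,
	scores: { needs_reasoning: 0.92, no_reasoning: 0.08 },
	latency_ms: 41,
	candidate_models: ['model-large', 'model-mini'],
};
```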
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/platform/endpoint/node/routerDecisionFetcher.ts | New class that fetches model routing decisions from an external API based on query requirements |
| src/platform/endpoint/node/automodeService.ts | Integrates router decision fetcher, updates cache to store multiple endpoints, and uses router to select models |
Comments suppressed due to low confidence (3)
src/platform/endpoint/node/routerDecisionFetcher.ts:60
- The error message refers to "Reasoning classifier" in multiple places (lines 50, 55, 59) but the class is named RouterDecisionFetcher and the purpose is model routing. The terminology should be consistent - either rename references to use "model router" or clarify the relationship between "reasoning classification" and "routing" in the class documentation.
```typescript
			throw new Error(`Reasoning classifier API request failed: ${response.statusText}`);
		}
		const result: RouterDecisionResponse = await response.json();
		this._logService.trace(`Reasoning classifier prediction: ${result.predicted_label}, model: ${result.chosen_model} (confidence: ${(result.confidence * 100).toFixed(1)}%, scores: needs_reasoning=${(result.scores.needs_reasoning * 100).toFixed(1)}%, no_reasoning=${(result.scores.no_reasoning * 100).toFixed(1)}%) (latency_ms: ${result.latency_ms}, candidate models: ${result.candidate_models.join(', ')}, preferred models: ${preferredModels.join(', ')})`);
		return result.chosen_model;
	} catch (error) {
		this._logService.error('Reasoning classification failed', error);
		throw error;
```
src/platform/endpoint/node/automodeService.ts:173
- The API call does not validate that the chosen_model from the router response exists in the knownEndpoints array before attempting to use it. If the router returns a model that's not in knownEndpoints, selectedModel will be undefined and the fallback at line 174-175 will catch it. However, it would be more robust to validate that the routedModel is actually available in knownEndpoints before setting selectedModel, to avoid potential issues if the fallback logic changes.
```typescript
	const routedModel = await this._routerDecisionFetcher.getRoutedModel(chatRequest.prompt.trim(), availableModels, preferredModels);
	selectedModel = knownEndpoints.find(e => e.model === routedModel);
}
```
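The explicit validation the comment asks for could look roughly like this. It is a sketch, not the PR's implementation; `Endpoint` is a stand-in for the real endpoint type, and the helper name is hypothetical:

```typescript
// Stand-in for the real endpoint type used in the diff.
interface Endpoint { model: string; }

// Sketch: validate the router's choice against known endpoints before
// trusting it, returning undefined so the existing fallback logic runs
// even if that logic changes later.
function pickRoutedEndpoint(knownEndpoints: Endpoint[], routedModel: string | undefined): Endpoint | undefined {
	if (!routedModel) {
		return undefined;
	}
	// Explicit membership check: only accept models we actually know about.
	return knownEndpoints.find(e => e.model === routedModel);
}
```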
src/platform/endpoint/node/automodeService.ts:168
- The preferredModels array includes all previously used models from the cache (line 168), but the cache structure change means endpoints can now accumulate without bound (see related issue with line 179). This could result in preferredModels becoming very large over the lifetime of a conversation, potentially impacting router performance. Consider limiting the number of preferred models or implementing proper cache management.
```typescript
const cachedModels = this._autoModelCache.get(conversationId)?.endpoints.map(e => e.model) || [];
const preferredModels = [...new Set([reserveToken.selected_model, ...cachedModels])];
```
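A minimal way to bound the preferred-models list, as the comment suggests, is to dedupe and cap it. The cap value and helper name below are arbitrary illustrative choices, not values from the PR:

```typescript
// MAX_PREFERRED_MODELS is an arbitrary illustrative limit, not a value
// from the PR; the right cap depends on what the router can handle.
const MAX_PREFERRED_MODELS = 5;

// Sketch: dedupe (keeping the currently selected model first) and cap the
// list so it cannot grow without bound over a long conversation.
function buildPreferredModels(selectedModel: string, cachedModels: string[]): string[] {
	return [...new Set([selectedModel, ...cachedModels])].slice(0, MAX_PREFERRED_MODELS);
}
```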
```typescript
if (chatRequest?.prompt.trim().length) {
	const routedModel = await this._routerDecisionFetcher.getRoutedModel(chatRequest.prompt.trim(), availableModels, preferredModels);
	selectedModel = knownEndpoints.find(e => e.model === routedModel);
}
if (!selectedModel) {
	selectedModel = knownEndpoints.find(e => e.model === reserveToken.selected_model) || knownEndpoints[0];
}
```
Copilot AI · Jan 12, 2026
The router call will throw an error on network failures, but there's no try-catch around this code. When the router fails, the error is caught in getRoutedModel and re-thrown, which will propagate up and potentially break the entire endpoint resolution. Consider wrapping the router call in a try-catch block to gracefully fall back to the default model selection (reserveToken.selected_model) when the router is unavailable or fails.
This issue also appears in the following locations of the same file:
- line 171
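The graceful fallback this comment suggests could be sketched as follows. Names mirror the diff, but the function itself is illustrative, not the PR's code:

```typescript
// Stand-in for the real endpoint type used in the diff.
interface Endpoint { model: string; }

// Sketch of the suggested fallback: if the router call throws, log the
// failure and fall back to the reserve token's selection instead of
// letting the error break endpoint resolution.
async function selectModelWithFallback(
	getRoutedModel: () => Promise<string>,
	knownEndpoints: Endpoint[],
	defaultModel: string,
	logError: (msg: string, err: unknown) => void,
): Promise<Endpoint> {
	let selected: Endpoint | undefined;
	try {
		const routed = await getRoutedModel();
		selected = knownEndpoints.find(e => e.model === routed);
	} catch (err) {
		// Router unavailable: degrade gracefully rather than propagating.
		logError('Model router failed, falling back to default selection', err);
	}
	return selected ?? knownEndpoints.find(e => e.model === defaultModel) ?? knownEndpoints[0];
}
```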
```diff
 const autoEndpoint = this._instantiationService.createInstance(AutoChatEndpoint, selectedModel, reserveToken.session_token, reserveToken.discounted_costs?.[selectedModel.model] || 0, this._calculateDiscountRange(reserveToken.discounted_costs));
-this._autoModelCache.set(conversationId, { endpoint: autoEndpoint, tokenBank: reserveTokenBank });
+const existingEndpoints = this._autoModelCache.get(conversationId)?.endpoints || [];
+const newEndpoints = existingEndpoints.includes(autoEndpoint) ? existingEndpoints : [...existingEndpoints, autoEndpoint];
+this._autoModelCache.set(conversationId, { endpoints: newEndpoints, tokenBank: reserveTokenBank });
```
Copilot AI · Jan 12, 2026
The endpoint comparison using includes at line 179 will always be false because AutoChatEndpoint instances are created fresh each time. This means newEndpoints will always grow unboundedly in the cache, leading to a memory leak. The cache should either: (1) compare by model name rather than instance identity, (2) reuse existing endpoint instances when the model hasn't changed, or (3) use a different cache structure that maps model names to endpoints.
This issue also appears in the following locations of the same file:
- line 167
See below for a potential fix:
```typescript
const existingEndpoints = this._autoModelCache.get(conversationId)?.endpoints || [];
const existingEndpointForModel = existingEndpoints.find(e => e.model === selectedModel.model);
const autoEndpoint = existingEndpointForModel instanceof AutoChatEndpoint
	? existingEndpointForModel
	: this._instantiationService.createInstance(
		AutoChatEndpoint,
		selectedModel,
		reserveToken.session_token,
		reserveToken.discounted_costs?.[selectedModel.model] || 0,
		this._calculateDiscountRange(reserveToken.discounted_costs)
	);
const newEndpoints = existingEndpointForModel ? existingEndpoints : [...existingEndpoints, autoEndpoint];
```
```typescript
const result: RouterDecisionResponse = await response.json();

this._logService.trace(`Reasoning classifier prediction: ${result.predicted_label}, model: ${result.chosen_model} (confidence: ${(result.confidence * 100).toFixed(1)}%, scores: needs_reasoning=${(result.scores.needs_reasoning * 100).toFixed(1)}%, no_reasoning=${(result.scores.no_reasoning * 100).toFixed(1)}%) (latency_ms: ${result.latency_ms}, candidate models: ${result.candidate_models.join(', ')}, preferred models: ${preferredModels.join(', ')})`);
```
Copilot AI · Jan 12, 2026
When preferredModels is an empty array and passed to join, the log message at line 55 will include "preferred models: " with nothing after it. Consider handling the case where preferredModels is empty to make the log message clearer, such as showing "preferred models: none" or omitting this part when empty.
This issue also appears in the following locations of the same file:
- line 50
```diff
-this._logService.trace(`Reasoning classifier prediction: ${result.predicted_label}, model: ${result.chosen_model} (confidence: ${(result.confidence * 100).toFixed(1)}%, scores: needs_reasoning=${(result.scores.needs_reasoning * 100).toFixed(1)}%, no_reasoning=${(result.scores.no_reasoning * 100).toFixed(1)}%) (latency_ms: ${result.latency_ms}, candidate models: ${result.candidate_models.join(', ')}, preferred models: ${preferredModels.join(', ')})`);
+const preferredModelsDescription = preferredModels.length > 0 ? preferredModels.join(', ') : 'none';
+this._logService.trace(`Reasoning classifier prediction: ${result.predicted_label}, model: ${result.chosen_model} (confidence: ${(result.confidence * 100).toFixed(1)}%, scores: needs_reasoning=${(result.scores.needs_reasoning * 100).toFixed(1)}%, no_reasoning=${(result.scores.no_reasoning * 100).toFixed(1)}%) (latency_ms: ${result.latency_ms}, candidate models: ${result.candidate_models.join(', ')}, preferred models: ${preferredModelsDescription})`);
```
```typescript
export class RouterDecisionFetcher extends Disposable {
	constructor(
		private readonly _fetcherService: IFetcherService,
		private readonly _logService: ILogService
	) {
		super();
	}

	async getRoutedModel(query: string, availableModels: string[], preferredModels: string[]): Promise<string> {
		try {
			const response = await this._fetcherService.fetch(ROUTER_API_URL, {
				method: 'POST',
				headers: {
					'Content-Type': 'application/json'
				},
				body: JSON.stringify({ prompt: query, available_models: availableModels, preferred_models: preferredModels })
			});

			if (!response.ok) {
				throw new Error(`Reasoning classifier API request failed: ${response.statusText}`);
			}

			const result: RouterDecisionResponse = await response.json();

			this._logService.trace(`Reasoning classifier prediction: ${result.predicted_label}, model: ${result.chosen_model} (confidence: ${(result.confidence * 100).toFixed(1)}%, scores: needs_reasoning=${(result.scores.needs_reasoning * 100).toFixed(1)}%, no_reasoning=${(result.scores.no_reasoning * 100).toFixed(1)}%) (latency_ms: ${result.latency_ms}, candidate models: ${result.candidate_models.join(', ')}, preferred models: ${preferredModels.join(', ')})`);

			return result.chosen_model;
		} catch (error) {
			this._logService.error('Reasoning classification failed', error);
			throw error;
		}
	}
}
```
Copilot AI · Jan 12, 2026
The RouterDecisionFetcher class lacks test coverage. Given that there is a test file for similar endpoint functionality (copilotChatEndpoint.spec.ts) and this is a critical path for model selection, tests should be added to verify: (1) successful routing responses, (2) handling of API failures, (3) proper model selection from the response, and (4) logging behavior.
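The behaviors those tests should pin down can be sketched in a self-contained way. The stub below mirrors the shape of `getRoutedModel` so the example runs without the PR's files; it is an illustration of the cases to cover, not the real class or a drop-in spec file:

```typescript
// Minimal stand-in for the fetch response the router client sees.
interface RouterResponse {
	ok: boolean;
	statusText?: string;
	json(): Promise<{ chosen_model: string }>;
}

// Stand-in mirroring getRoutedModel's control flow, so the behaviors to
// test are explicit: (1) success returns chosen_model, (2) non-ok
// responses surface as thrown errors rather than silent defaults.
async function getRoutedModelStub(fetchImpl: () => Promise<RouterResponse>): Promise<string> {
	const response = await fetchImpl();
	if (!response.ok) {
		// API failures must propagate so callers can decide how to fall back.
		throw new Error(`Router API request failed: ${response.statusText}`);
	}
	const result = await response.json();
	// The chosen model comes straight from the response body.
	return result.chosen_model;
}
```

In the real spec file, the same two paths would be exercised against `RouterDecisionFetcher` with a mocked `IFetcherService`, plus assertions on the trace/error logging calls.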
```typescript
import { IFetcherService } from '../../networking/common/fetcherService';

// Remote reasoning classifier configuration
export const ROUTER_API_URL = 'https://gh-model-router-v1.yellowforest-598004f3.westus3.azurecontainerapps.io/predict';
```
Copilot AI · Jan 12, 2026
The router API URL is hardcoded and publicly exposed. Consider whether this endpoint should have authentication or if the URL should be configurable through environment variables or settings, especially for testing and deployment flexibility. This would also align with patterns used elsewhere in the codebase for external service endpoints.
```diff
-export const ROUTER_API_URL = 'https://gh-model-router-v1.yellowforest-598004f3.westus3.azurecontainerapps.io/predict';
+export const ROUTER_API_URL = process.env.GH_MODEL_ROUTER_API_URL ?? 'https://gh-model-router-v1.yellowforest-598004f3.westus3.azurecontainerapps.io/predict';
```
```typescript
import { IFetcherService } from '../../networking/common/fetcherService';

// Remote reasoning classifier configuration
export const ROUTER_API_URL = 'https://gh-model-router-v1.yellowforest-598004f3.westus3.azurecontainerapps.io/predict';
```
let's bring this in via ExP
```typescript
if (chatRequest?.prompt.trim().length) {
	const routedModel = await this._routerDecisionFetcher.getRoutedModel(chatRequest.prompt.trim(), availableModels, preferredModels);
```
```diff
-if (chatRequest?.prompt.trim().length) {
-	const routedModel = await this._routerDecisionFetcher.getRoutedModel(chatRequest.prompt.trim(), availableModels, preferredModels);
+const prompt = chatRequest?.prompt.trim();
+if (prompt?.length) {
+	const routedModel = await this._routerDecisionFetcher.getRoutedModel(prompt, availableModels, preferredModels);
```
```typescript
import { IFetcherService } from '../../networking/common/fetcherService';

// Remote reasoning classifier configuration
export const ROUTER_API_URL = '';
```
We shouldn't have this and should instead get the URL from ExP.