Commit 9bdb70a

🤖 fix: handle GitHub Copilot context limits and retry behavior (#2431)
## Summary

Fix GitHub Copilot models getting stuck in infinite retry loops when the prompt exceeds the provider's context window. Three root causes addressed: model stats lookup failures, error misclassification, and incorrect tokenizer fallbacks.

## Background

A user reported Mux showing repeated **Stream Error (API)** messages with Copilot models (e.g., `prompt token count of 128067 exceeds the limit of 128000`) and auto-retrying endlessly (attempt 9+), never surfacing the existing **"Compact & retry"** recovery UI. Three independent issues combined to cause this:

1. **Model stats lookup failed for Copilot models.** `models.json` has Copilot entries under `github_copilot/` keys (with underscore) but without cost fields. Our `isValidModelData()` required cost fields, and `generateLookupKeys()` checked bare model names first (matching OpenAI entries) and didn't normalize `github-copilot` → `github_copilot`.
2. **Token-limit errors were misclassified.** `categorizeError()` only detected Anthropic-style context errors (`prompt is too long`). Copilot's `prompt token count ... exceeds the limit ...` message was classified as `"api"` (retryable), not `"context_exceeded"` (non-retryable).
3. **Wrong tokenizer fallback.** Copilot hosts models from multiple providers (Claude, Gemini, GPT), but all `github-copilot:*` models fell back to the OpenAI tokenizer regardless of the underlying model.

## Implementation

### 1. `modelStats.ts` — Fix Copilot token limit resolution

- Added provider alias mapping (`github-copilot` → `github_copilot`) for LiteLLM key generation
- Reversed lookup priority: provider-prefixed keys checked first, bare model name last
- Relaxed `isValidModelData()` to only require `max_input_tokens` (not cost fields)
- Default missing costs to `0` (Copilot is subscription-based)
- Added `parseNum()` helper for safe numeric string parsing

### 2. `streamManager.ts` — Classify Copilot context errors correctly

- Expanded `categorizeError()` to detect the `"token" + "exceeds" + "limit"` pattern
- This makes Copilot token-limit errors return `context_exceeded`, which:
  - Stops auto-retry (already in `NON_RETRYABLE_STREAM_ERRORS`)
  - Shows the existing **"Compact & retry"** UI (already in `StreamErrorMessage`)

### 3. `tokenizer.ts` — Smart fallback for Copilot model tokenizers

- When the provider is `github-copilot`, infer the tokenizer from the model name prefix:
  - `claude-*` → Anthropic tokenizer
  - `gemini-*` → Google tokenizer
  - `gpt-*` / others → OpenAI tokenizer

## Validation

- `make static-check` — all checks pass
- `bun test src/common/utils/tokens/modelStats.test.ts` — 27/27 pass (4 new Copilot-specific tests)

---

_Generated with `mux` • Model: `anthropic:claude-opus-4-6` • Thinking: `xhigh` • Cost: `$1.99`_

<!-- mux-attribution: model=anthropic:claude-opus-4-6 thinking=xhigh costs=1.99 -->
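The `parseNum()` helper called out under Implementation accepts values that arrive from `models.json` either as numbers or as formatted strings like `"128,000"`. A self-contained copy of its behavior, taken from the patch:

```typescript
// Safe numeric parsing for models.json fields, which may be numbers or
// string-formatted values (e.g. "128,000"). Mirrors the parseNum() added
// in modelStats.ts by this commit.
function parseNum(value: unknown): number | null {
  if (typeof value === "number" && Number.isFinite(value)) {
    return value;
  }

  if (typeof value === "string") {
    // Strip thousands separators and whitespace before converting.
    const parsed = Number(value.replace(/,/g, "").trim());
    return Number.isFinite(parsed) ? parsed : null;
  }

  return null;
}
```

Returning `null` (rather than `NaN` or `0`) for unparseable values lets `isValidModelData()` reject entries with no usable `max_input_tokens` instead of silently treating them as zero-limit models.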
1 parent 62deee1 commit 9bdb70a

4 files changed

Lines changed: 89 additions & 26 deletions

src/common/utils/tokens/modelStats.test.ts

Lines changed: 27 additions & 0 deletions
```diff
@@ -83,6 +83,33 @@ describe("getModelStats", () => {
     });
   });
 
+  describe("github copilot models", () => {
+    test("should prefer github copilot provider-specific limits", () => {
+      const stats = getModelStats("github-copilot:gpt-4-o-preview");
+      expect(stats).not.toBeNull();
+      expect(stats?.max_input_tokens).toBe(64000);
+    });
+
+    test("should default missing copilot costs to zero", () => {
+      const stats = getModelStats("github-copilot:gpt-4.1");
+      expect(stats).not.toBeNull();
+      expect(stats?.max_input_tokens).toBe(128000);
+      expect(stats?.input_cost_per_token).toBe(0);
+      expect(stats?.output_cost_per_token).toBe(0);
+    });
+
+    test("should resolve claude sonnet copilot entries", () => {
+      const stats = getModelStats("github-copilot:claude-sonnet-4.5");
+      expect(stats).not.toBeNull();
+      expect(stats?.max_input_tokens).toBeGreaterThan(0);
+    });
+
+    test("should resolve claude haiku copilot entries", () => {
+      const stats = getModelStats("github-copilot:claude-haiku-4.5");
+      expect(stats).not.toBeNull();
+    });
+  });
+
   describe("unknown models", () => {
     test("should return null for completely unknown model", () => {
       const stats = getModelStats("unknown:fake-model-9000");
```

src/common/utils/tokens/modelStats.ts

Lines changed: 34 additions & 22 deletions
```diff
@@ -21,29 +21,44 @@ interface RawModelData {
   [key: string]: unknown;
 }
 
+const PROVIDER_KEY_ALIASES: Record<string, string> = {
+  // GitHub Copilot keys in models.json use underscores for LiteLLM provider names.
+  "github-copilot": "github_copilot",
+};
+
+function parseNum(value: unknown): number | null {
+  if (typeof value === "number" && Number.isFinite(value)) {
+    return value;
+  }
+
+  if (typeof value === "string") {
+    const parsed = Number(value.replace(/,/g, "").trim());
+    return Number.isFinite(parsed) ? parsed : null;
+  }
+
+  return null;
+}
+
 /**
  * Validates raw model data has required fields
  */
 function isValidModelData(data: RawModelData): boolean {
-  return (
-    typeof data.max_input_tokens === "number" &&
-    typeof data.input_cost_per_token === "number" &&
-    typeof data.output_cost_per_token === "number"
-  );
+  const maxInputTokens = parseNum(data.max_input_tokens);
+  return maxInputTokens != null && maxInputTokens > 0;
 }
 
 /**
  * Extracts ModelStats from validated raw data
  */
 function extractModelStats(data: RawModelData): ModelStats {
-  // Type assertions are safe here because isValidModelData() already validated these fields
-  /* eslint-disable @typescript-eslint/non-nullable-type-assertion-style */
   return {
-    max_input_tokens: data.max_input_tokens as number,
-    max_output_tokens:
-      typeof data.max_output_tokens === "number" ? data.max_output_tokens : undefined,
-    input_cost_per_token: data.input_cost_per_token as number,
-    output_cost_per_token: data.output_cost_per_token as number,
+    max_input_tokens: parseNum(data.max_input_tokens) ?? 0,
+    max_output_tokens: parseNum(data.max_output_tokens) ?? undefined,
+    // Subscription providers like GitHub Copilot omit per-token costs.
+    input_cost_per_token:
+      typeof data.input_cost_per_token === "number" ? data.input_cost_per_token : 0,
+    output_cost_per_token:
+      typeof data.output_cost_per_token === "number" ? data.output_cost_per_token : 0,
     cache_creation_input_token_cost:
       typeof data.cache_creation_input_token_cost === "number"
         ? data.cache_creation_input_token_cost
@@ -53,7 +68,6 @@ function extractModelStats(data: RawModelData): ModelStats {
         ? data.cache_read_input_token_cost
         : undefined,
   };
-  /* eslint-enable @typescript-eslint/non-nullable-type-assertion-style */
 }
 
 /**
@@ -64,26 +78,24 @@ function generateLookupKeys(modelString: string): string[] {
   const colonIndex = modelString.indexOf(":");
   const provider = colonIndex !== -1 ? modelString.slice(0, colonIndex) : "";
   const modelName = colonIndex !== -1 ? modelString.slice(colonIndex + 1) : modelString;
+  const litellmProvider = PROVIDER_KEY_ALIASES[provider] ?? provider;
 
-  const keys: string[] = [
-    modelName, // Direct model name (e.g., "claude-opus-4-1")
-  ];
+  const keys: string[] = [];
 
-  // Add provider-prefixed variants for Ollama and other providers
+  // Prefer provider-scoped matches first so provider-specific limits win over generic entries.
   if (provider) {
-    keys.push(
-      `${provider}/${modelName}`, // "ollama/gpt-oss:20b"
-      `${provider}/${modelName}-cloud` // "ollama/gpt-oss:20b-cloud" (LiteLLM convention)
-    );
+    keys.push(`${litellmProvider}/${modelName}`, `${litellmProvider}/${modelName}-cloud`);
 
     // Fallback: strip size suffix for base model lookup
     // "ollama:gpt-oss:20b" → "ollama/gpt-oss"
     if (modelName.includes(":")) {
      const baseModel = modelName.split(":")[0];
-      keys.push(`${provider}/${baseModel}`);
+      keys.push(`${litellmProvider}/${baseModel}`);
    }
  }
 
+  keys.push(modelName);
+
   return keys;
 }
```
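The resulting lookup order can be exercised in isolation. The following standalone sketch mirrors the patched `generateLookupKeys()` logic (copied out of its module so it runs on its own):

```typescript
// Standalone mirror of the patched generateLookupKeys(): provider-scoped keys
// (with the LiteLLM alias applied) come first; the bare model name is last,
// so provider-specific limits win over generic entries.
const PROVIDER_KEY_ALIASES: Record<string, string> = {
  "github-copilot": "github_copilot",
};

function generateLookupKeys(modelString: string): string[] {
  const colonIndex = modelString.indexOf(":");
  const provider = colonIndex !== -1 ? modelString.slice(0, colonIndex) : "";
  const modelName = colonIndex !== -1 ? modelString.slice(colonIndex + 1) : modelString;
  const litellmProvider = PROVIDER_KEY_ALIASES[provider] ?? provider;

  const keys: string[] = [];
  if (provider) {
    keys.push(`${litellmProvider}/${modelName}`, `${litellmProvider}/${modelName}-cloud`);
    // Strip size suffix for base model lookup: "ollama:gpt-oss:20b" → "ollama/gpt-oss"
    if (modelName.includes(":")) {
      keys.push(`${litellmProvider}/${modelName.split(":")[0]}`);
    }
  }
  keys.push(modelName); // generic entry is now the last resort
  return keys;
}
```

For `"github-copilot:gpt-4.1"` this yields `github_copilot/gpt-4.1` before the bare `gpt-4.1`, so the Copilot-specific 128k limit is found before the generic OpenAI entry.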

src/node/services/streamManager.ts

Lines changed: 10 additions & 2 deletions
```diff
@@ -2353,9 +2353,17 @@ export class StreamManager extends EventEmitter {
       return "model_not_found";
     }
 
-    // Check for Anthropic context exceeded errors
+    // Check for context exceeded errors (Anthropic + OpenAI-compatible / Copilot)
     const msgLower = error.message.toLowerCase();
-    if (msgLower.includes("prompt is too long") || msgLower.includes("input is too long")) {
+
+    // Anthropic: "prompt is too long" / "input is too long"
+    // Copilot / OpenAI-compatible: "prompt token count of X exceeds the limit of Y"
+    const isContextExceeded =
+      msgLower.includes("prompt is too long") ||
+      msgLower.includes("input is too long") ||
+      (msgLower.includes("token") && msgLower.includes("exceeds") && msgLower.includes("limit"));
+
+    if (isContextExceeded) {
       return "context_exceeded";
     }
```
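The new classification can be sketched as a standalone predicate (the function name here is illustrative; in mux this logic lives inside `categorizeError()`):

```typescript
// Context-exceeded detection as a standalone predicate. The combined
// "token" + "exceeds" + "limit" check is deliberately loose so it matches
// Copilot/OpenAI-compatible wordings without pinning an exact message format.
function isContextExceededError(message: string): boolean {
  const msgLower = message.toLowerCase();
  return (
    msgLower.includes("prompt is too long") || // Anthropic
    msgLower.includes("input is too long") ||  // Anthropic
    (msgLower.includes("token") && msgLower.includes("exceeds") && msgLower.includes("limit"))
  );
}
```

Note that requiring all three substrings keeps rate-limit messages like `"rate limit exceeded"` out of this bucket: they contain `limit` but neither `token` nor `exceeds`, so they remain retryable `"api"` errors.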

src/node/utils/main/tokenizer.ts

Lines changed: 18 additions & 2 deletions
```diff
@@ -81,10 +81,26 @@ function resolveModelName(modelString: string): ModelName {
 
   if (!modelName) {
     const provider = normalized.split(":")[0] || "anthropic";
+
+    // GitHub Copilot hosts models from multiple providers.
+    // Infer the tokenizer family from the model name prefix.
+    let effectiveProvider = provider;
+    if (provider === "github-copilot") {
+      const modelId = normalized.split(":")[1] || "";
+      if (modelId.startsWith("claude-")) {
+        effectiveProvider = "anthropic";
+      } else if (modelId.startsWith("gemini-")) {
+        effectiveProvider = "google";
+      } else {
+        // gpt-*, grok-*, and unknown models use OpenAI tokenizer
+        effectiveProvider = "openai";
+      }
+    }
+
     const fallbackModel =
-      provider === "anthropic"
+      effectiveProvider === "anthropic"
         ? "anthropic/claude-sonnet-4.5"
-        : provider === "google"
+        : effectiveProvider === "google"
           ? "google/gemini-2.5-pro"
           : "openai/gpt-5";
```
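The prefix-based inference can be distilled into a small pure function. This is a simplified sketch (the real `resolveModelName()` also handles explicit model overrides and normalization not shown here):

```typescript
// Simplified sketch of the Copilot tokenizer-family inference. Copilot fronts
// Claude, Gemini, and GPT models behind a single provider id, so the tokenizer
// family is chosen from the model name prefix rather than the provider alone.
function inferTokenizerProvider(modelString: string): "anthropic" | "google" | "openai" {
  const [provider, modelId = ""] = modelString.split(":");
  if (provider === "anthropic") return "anthropic";
  if (provider === "google") return "google";
  if (provider === "github-copilot") {
    if (modelId.startsWith("claude-")) return "anthropic";
    if (modelId.startsWith("gemini-")) return "google";
  }
  // gpt-*, grok-*, and unknown models fall back to the OpenAI tokenizer.
  return "openai";
}
```

Token counts only feed limit estimation, so an occasional wrong guess for an exotic model degrades the estimate slightly rather than breaking the stream.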
