Skip to content

Don't crash BYOK/custom models when the response hits the length limit#324121

Merged
vritant24 merged 1 commit into
mainfrom
agents/investigate-unit-test-failure-fix
Jul 3, 2026
Merged

Don't crash BYOK/custom models when the response hits the length limit#324121
vritant24 merged 1 commit into
mainfrom
agents/investigate-unit-test-failure-fix

Conversation

@vritant24

Copy link
Copy Markdown
Member

Fixes #319592

When a custom/BYOK model returns finish_reason: "length" (hit its context
window), we threw Error: Response too long. and discarded everything the model
already streamed.

By the time we get the Length result the partial text has already been
streamed via the finished callback, so instead of throwing we just resolve and
keep it. This matches how the rest of the codebase already handles Length
(defaultIntentRequestHandler, agentIntent, codeMapper all use the partial
text rather than failing).

Copilot AI review requested due to automatic review settings July 2, 2026 22:41
@vritant24 vritant24 self-assigned this Jul 2, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes issue #319592, where custom/BYOK OpenAI-compatible endpoints that return finish_reason: "length" (i.e. hit their context window) caused the Copilot language-model wrapper to throw Error: Response too long. and discard all text the model had already streamed. The fix intercepts the ChatFetchResponseType.Length result in CopilotLanguageModelWrapper._provideLanguageModelResponse and resolves gracefully (returning undefined) instead of throwing, since the partial text was already reported to the consumer via the streaming finishedCb. A warning is logged, and a unit test verifies the truncated response no longer surfaces as a hard failure.

Changes:

  • Treat a Length finish reason as a successful (truncated) response rather than throwing, keeping the already-streamed partial text.
  • Log a warning when a response is truncated due to the length limit.
  • Add a unit test asserting provideLanguageModelResponse does not throw on a Length response.
Show a summary per file
File Description
extensions/copilot/src/extension/conversation/vscode-node/languageModelAccess.ts Adds an early return for ChatFetchResponseType.Length before the failure path, logging a warning and returning undefined instead of throwing "Response too long."
extensions/copilot/src/extension/conversation/vscode-node/test/languageModelAccess.test.ts Adds a "length finish reason" suite verifying that a truncated (Length) response does not throw.

Review details

  • Files reviewed: 2/2 changed files
  • Comments generated: 0
  • Review effort level: Medium

@vritant24 vritant24 enabled auto-merge July 2, 2026 22:50
@vritant24 vritant24 merged commit f85c531 into main Jul 3, 2026
47 of 48 checks passed
@vritant24 vritant24 deleted the agents/investigate-unit-test-failure-fix branch July 3, 2026 00:47
@vs-code-engineering vs-code-engineering Bot added this to the 1.128.0 milestone Jul 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Custom Endpoints with finish_reason: "length" violently crash with "Response too long" instead of rendering generated text

3 participants