fix(copilot): populate Context Window indicator for BYOK chat providers #314801
Open
ba-work wants to merge 4 commits into microsoft:main from
Conversation
Fixes microsoft#314722.

Problem

The host-internal `ExtensionContributedChatEndpoint.makeChatRequest2` hardcoded a zero-filled `APIUsage` literal regardless of what the extension-contributed `LanguageModelChatProvider` actually streamed back, which left the Context Window indicator stuck at 0 / max for every BYOK provider that goes through this endpoint (Anthropic, Gemini-native, and any custom `registerLanguageModelChatProvider`).

Fix

Add a new `CustomDataPartMimeTypes.Usage = 'usage'` branch on the existing `LanguageModelDataPart` switch in the response stream, plus a permissive `parseExtensionContributedUsage(data)` helper that decodes the UTF-8 JSON payload into the host's existing `APIUsage` shape. When the provider doesn't emit a Usage part the host keeps the historical zero-fallback behaviour, so this is purely additive for existing extensions: no behaviour change unless the provider opts in.

Provider opt-in

Opting in takes one extra `progress.report` on the response stream:

```ts
progress.report(new vscode.LanguageModelDataPart(
	new TextEncoder().encode(JSON.stringify({
		prompt_tokens: 12345,
		completion_tokens: 678,
		total_tokens: 13023,
		prompt_tokens_details: { cached_tokens: 9000 }
	})),
	'usage' // CustomDataPartMimeTypes.Usage
));
```

Last-write-wins on multiple Usage parts, matching the OpenAI streaming convention where the terminating chunk carries the final tally. The parser tolerates partial payloads (zero-fills missing fields) and malformed JSON (returns `undefined`; the host falls back to zeros), so a misbehaving provider cannot break the host.

Defensive coercion

All numeric fields are clamped to non-negative finite numbers via `Math.max(0, ...)` on both the strict and permissive paths. `isApiUsage` only validates `typeof === 'number'`, so without this clamp a provider could emit negatives that would silently corrupt the host's monotonic completion-token counter in `ChatResponseModel.setUsage`. The strict path also preserves any extra fields the provider sent (`cache_creation_input_tokens`, `completion_tokens_details`) for telemetry and future consumers.

Tests

14 new specs in `extensions/copilot/src/platform/endpoint/vscode-node/test/extChatEndpoint.spec.ts` covering the parse helper (full / partial / malformed / non-object / type-coerced / negative-clamp / strict-path-extras payloads) and the end-to-end stream behaviour (zero-fallback regression test for microsoft#314722, full usage propagation, `cached_tokens` path, malformed-JSON fallback, partial payload, last-write-wins). All 241 endpoint specs in `extensions/copilot/src/platform/endpoint/` pass; lint and tsc are clean on the three changed files.
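The permissive parse path and defensive clamping described above can be sketched roughly as follows. This is a minimal sketch: the names (`parseUsageSketch`, `ApiUsageSketch`, `clamp`) are hypothetical, and the real helper lives in `extChatEndpoint.ts` and may differ in detail.

```typescript
// Hypothetical sketch of the permissive parse path; field names follow
// the OpenAI usage shape used in the opt-in snippet above.
interface ApiUsageSketch {
	prompt_tokens: number;
	completion_tokens: number;
	total_tokens: number;
	prompt_tokens_details?: { cached_tokens: number };
}

function clamp(value: unknown): number {
	// Coerce to a non-negative finite number; anything else becomes 0.
	return typeof value === 'number' && Number.isFinite(value) ? Math.max(0, value) : 0;
}

function parseUsageSketch(data: Uint8Array): ApiUsageSketch | undefined {
	let raw: unknown;
	try {
		raw = JSON.parse(new TextDecoder().decode(data));
	} catch {
		return undefined; // malformed JSON: caller falls back to zeros
	}
	if (typeof raw !== 'object' || raw === null || Array.isArray(raw)) {
		return undefined; // reject primitives, null, and top-level arrays
	}
	const obj = raw as Record<string, unknown>;
	const usage: ApiUsageSketch = {
		prompt_tokens: clamp(obj.prompt_tokens),
		completion_tokens: clamp(obj.completion_tokens),
		total_tokens: clamp(obj.total_tokens),
	};
	const details = obj.prompt_tokens_details;
	if (typeof details === 'object' && details !== null && !Array.isArray(details)) {
		usage.prompt_tokens_details = {
			cached_tokens: clamp((details as Record<string, unknown>).cached_tokens),
		};
	}
	return usage;
}
```

Partial payloads zero-fill through `clamp`, negatives are floored to 0, and malformed or non-object payloads return `undefined` so the caller's zero-fallback stays in effect.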
Author
@microsoft-github-policy-service agree
Contributor
Pull request overview
This PR fixes missing Context Window token-usage reporting for extension-contributed (BYOK) chat providers by allowing them to stream an APIUsage payload through a new LanguageModelDataPart MIME type ('usage') and propagating it through ExtensionContributedChatEndpoint.makeChatRequest2.
Changes:
- Add `CustomDataPartMimeTypes.Usage = 'usage'` to define the new stream payload type for usage.
- Implement `parseExtensionContributedUsage` and consume `'usage'` data parts in `ExtensionContributedChatEndpoint` to populate the success envelope's `usage` (with zero fallback).
- Add vitest coverage for parsing behavior and end-to-end usage propagation through the endpoint.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| extensions/copilot/src/platform/endpoint/common/endpointTypes.ts | Adds the new 'usage' custom data part MIME type and documents its intent/format. |
| extensions/copilot/src/platform/endpoint/vscode-node/extChatEndpoint.ts | Parses and captures streamed usage parts from extension-contributed providers and returns them in the fetch result. |
| extensions/copilot/src/platform/endpoint/vscode-node/test/extChatEndpoint.spec.ts | Adds unit + end-to-end tests validating usage parsing and propagation (including fallback behavior). |
…tighten last-valid-wins semantics

Three follow-ups from the Copilot review on microsoft#314801:

1. `parseExtensionContributedUsage` now validates and clamps every numeric field inside the optional `prompt_tokens_details` / `completion_tokens_details` nested objects, not just the three top-level counters and `cached_tokens`. Without this, a provider could leak negative or non-finite `cache_creation_input_tokens`, `reasoning_tokens`, etc. through to OTel attributes / telemetry. Nested values that are not a plain object (string, array, null) are dropped rather than passed through, so downstream consumers only ever see well-formed shapes. Refactored the strict and permissive paths into a single unified path with two small builder helpers.
2. The 'coerces non-numeric fields to 0' test was passing `total_tokens: NaN` through `JSON.stringify`, which lossily becomes `null` on the wire and so didn't actually exercise non-finite handling. Replaced the NaN value with an array (which exercises the same coercion code path) and added a dedicated 'coerces non-finite numeric fields (Infinity) to 0' test that builds the JSON by hand so `1e999` / `-1e999` parse to Infinity / -Infinity and prove the `Number.isFinite` gate fires.
3. The dispatch-site comment claimed 'last-write-wins' but the code only updates `extensionUsage` when parsing succeeds, so the actual contract is 'last-*valid*-wins'. Updated the comment to match reality and added a regression test that emits a valid usage part followed by a malformed one and asserts the earlier valid reading is preserved.

Also adds two more spec cases pinning the nested-clamp and drop-on-malformed contracts. 18 specs in `extChatEndpoint.spec.ts` (was 14); 245 specs in `extensions/copilot/src/platform/endpoint/` (was 241). Lint and tsc clean.
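The nested-detail clamping in follow-up 1 can be sketched as a small builder helper. The names (`buildDetails`, `clampNum`) are hypothetical; the real builder helpers live in `extChatEndpoint.ts`.

```typescript
// Hypothetical sketch of the nested-detail clamping: every numeric field
// of an optional details object is clamped, and non-object values
// (string, array, null) are dropped entirely so downstream consumers
// only ever see well-formed shapes.
function clampNum(value: unknown): number {
	return typeof value === 'number' && Number.isFinite(value) ? Math.max(0, value) : 0;
}

function buildDetails(value: unknown): Record<string, number> | undefined {
	if (typeof value !== 'object' || value === null || Array.isArray(value)) {
		return undefined; // drop malformed nested values outright
	}
	const out: Record<string, number> = {};
	for (const [key, field] of Object.entries(value as Record<string, unknown>)) {
		out[key] = clampNum(field); // negatives and non-finite values become 0
	}
	return out;
}
```

Clamping per key (rather than validating a fixed schema) is what lets provider-specific fields such as `cache_creation_input_tokens` or `reasoning_tokens` pass through safely.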
…rify Usage doc
1. parseExtensionContributedUsage now rejects top-level JSON arrays.
Without this, a stray '[]' chunk would parse to a zero-filled
APIUsage (since 'typeof [] === "object"' in JS) and could
overwrite an earlier valid reading at the last-valid-wins
dispatch site in makeChatRequest2.
2. CustomDataPartMimeTypes.Usage doc comment used to say the payload
was 'at minimum prompt_tokens/completion_tokens/total_tokens',
which contradicted parseExtensionContributedUsage's permissive
partial-payload behaviour. Updated the comment to reflect that
all fields are optional and missing/non-finite values are treated
as 0, so extension authors aren't misled.
Also extended the existing non-object spec with two array-rejection
cases ('[]', '[1,2,3]') to pin the contract. 245/245 endpoint specs
pass; lint and tsc clean.
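The array-rejection gate described above is tiny but easy to get wrong, since `typeof` alone cannot tell arrays from plain objects. A minimal illustration (function name hypothetical):

```typescript
// Why the explicit Array.isArray check is needed: typeof [] === 'object'
// in JavaScript, so a bare typeof check would let a stray '[]' chunk
// parse to a zero-filled APIUsage and overwrite an earlier valid reading.
function isUsageShape(raw: unknown): boolean {
	return typeof raw === 'object' && raw !== null && !Array.isArray(raw);
}
```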
…harden ContextManagement shape
1. parseExtensionContributedUsage now rejects payloads that coerce to
an all-zero, detail-less APIUsage. The previous round's key-presence
gate let inputs like {prompt_tokens_details:'oops'},
{completion_tokens_details:null}, {prompt_tokens_details:{}}, and
{prompt_tokens:-3,completion_tokens:-5} pass through to a zero-filled
result that would clobber an earlier valid reading at the
last-valid-wins dispatch site in makeChatRequest2. The new gate
checks the *coerced* result for any positive signal (a non-zero
top-level counter or a non-zero nested-detail field) before
returning truthy. Fully-shaped strict-path payloads still carry the
historical prompt_tokens_details:{cached_tokens:0} placeholder, but
that placeholder no longer counts as a signal on its own.
2. CustomDataPartMimeTypes.ContextManagement now mirrors the first-step
shape rejection from parseExtensionContributedUsage: a parsed
payload that is null, a primitive, or an array is treated the same
as a JSON.parse throw — the data part is silently skipped instead
of forwarded into streamRecorder.callback as a malformed
ContextManagementResponse. Indentation of the new block was also
corrected to match the sibling Usage / StatefulMarker branches.
3. JSDoc on parseExtensionContributedUsage and on
CustomDataPartMimeTypes.Usage updated to describe the new
no-signal-rejection contract instead of the previous (and never
quite accurate) key-presence framing. The function's own contract
now matches the mime-type contract.
Tests: 5 new specs (rejects no-signal-after-coercion, rejects empty /
keyless objects, accepts non-zero nested-detail-only, ContextManagement
malformed-JSON tolerance, ContextManagement non-object-shape rejection
via the same path) plus updates to two existing coercion tests so they
keep at least one positive signal and observe coercion as intended.
250/250 endpoint specs pass; lint and tsc clean on the three changed
files.
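The no-signal rejection gate from point 1 can be sketched as a predicate over the already-coerced result. This is a sketch under the assumptions above; the names (`hasSignal`, `CoercedUsage`) are hypothetical and the real gate sits inside `parseExtensionContributedUsage`.

```typescript
// Hypothetical sketch: after coercion, a usage reading must carry at
// least one positive value, otherwise it is rejected so it cannot
// clobber an earlier valid reading at the last-valid-wins dispatch site.
interface CoercedUsage {
	prompt_tokens: number;
	completion_tokens: number;
	total_tokens: number;
	prompt_tokens_details?: Record<string, number>;
	completion_tokens_details?: Record<string, number>;
}

function hasSignal(usage: CoercedUsage): boolean {
	if (usage.prompt_tokens > 0 || usage.completion_tokens > 0 || usage.total_tokens > 0) {
		return true;
	}
	// Any non-zero nested-detail field also counts; a zero-filled
	// placeholder like { cached_tokens: 0 } does not.
	for (const details of [usage.prompt_tokens_details, usage.completion_tokens_details]) {
		if (details && Object.values(details).some(v => v > 0)) {
			return true;
		}
	}
	return false;
}
```

Checking the coerced result rather than key presence is the point: inputs like `{prompt_tokens: -3}` or `{prompt_tokens_details: {}}` coerce to all zeros and are rejected, while a genuinely zero-cost-but-cached reading (`{cached_tokens: 5}` with zero counters) still passes.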
How to verify
- …`extChatEndpoint.ts`.
- …(`LanguageModelChatProvider`) and have it report usage via the snippet above.