fix(copilot): populate Context Window indicator for BYOK chat providers #314801

Open
ba-work wants to merge 4 commits into microsoft:main from ba-work:ba-work/byok-ext-usage-314722

Conversation


@ba-work ba-work commented May 6, 2026

Fixes #314722.

Problem

The host-internal ExtensionContributedChatEndpoint.makeChatRequest2 hardcoded a zero-filled APIUsage literal in its success envelope regardless of what the extension-contributed LanguageModelChatProvider actually streamed back. As a result, the Context Window indicator stayed pinned at 0 / max for every BYOK provider routed through this endpoint (Anthropic, Gemini-native, and any custom registerLanguageModelChatProvider).

Fix

Add a new CustomDataPartMimeTypes.Usage = 'usage' branch on the existing LanguageModelDataPart switch in the response stream, plus a permissive parseExtensionContributedUsage(data) helper that decodes the UTF-8 JSON payload into the host's existing APIUsage shape. When the provider doesn't emit a Usage part the host keeps the historical zero-fallback behaviour, so this is purely additive — no behaviour change for existing extensions unless the provider opts in.

Last-write-wins on multiple Usage parts, matching the OpenAI streaming convention where the terminating chunk carries the final tally. The parser tolerates partial payloads (zero-fills missing fields) and malformed JSON (returns undefined, host falls back to zeros) so a misbehaving provider cannot break the host.
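A minimal sketch of what such a permissive decoder can look like. The PR's helper is named parseExtensionContributedUsage; the body below is an assumption based on the behaviour described above (zero-fill on missing fields, undefined on malformed input), not the actual host code:

```typescript
// Hypothetical sketch of a permissive usage decoder. Field names follow the
// OpenAI-style usage shape from the opt-in snippet; the real host code may differ.
interface UsageSketch {
	prompt_tokens: number;
	completion_tokens: number;
	total_tokens: number;
}

function parseUsageSketch(data: Uint8Array): UsageSketch | undefined {
	let raw: unknown;
	try {
		raw = JSON.parse(new TextDecoder().decode(data));
	} catch {
		return undefined; // malformed JSON: caller keeps the zero fallback
	}
	if (typeof raw !== 'object' || raw === null || Array.isArray(raw)) {
		return undefined; // non-object payloads are rejected, not zero-filled
	}
	const o = raw as Record<string, unknown>;
	// Missing or non-finite fields zero-fill; negative values clamp to 0.
	const num = (v: unknown): number =>
		typeof v === 'number' && Number.isFinite(v) ? Math.max(0, v) : 0;
	return {
		prompt_tokens: num(o['prompt_tokens']),
		completion_tokens: num(o['completion_tokens']),
		total_tokens: num(o['total_tokens']),
	};
}
```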

Provider opt-in

One extra progress.report on the response stream:

progress.report(new vscode.LanguageModelDataPart(
    new TextEncoder().encode(JSON.stringify({
        prompt_tokens: 12345,
        completion_tokens: 678,
        total_tokens: 13023,
        prompt_tokens_details: { cached_tokens: 9000 }
    })),
    'usage' // CustomDataPartMimeTypes.Usage
));

Defensive coercion

All numeric fields are clamped to non-negative finite numbers via Math.max(0, ...) on both the strict and permissive paths. isApiUsage only validates typeof === 'number', so without this clamp a provider could emit negatives that would silently corrupt the host's monotonic completion-token counter in ChatResponseModel.setUsage. The strict path also preserves any extra fields the provider sent (cache_creation_input_tokens, completion_tokens_details) for telemetry / future consumers.
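To illustrate the failure mode, here is a stand-in accumulator. Both the function and its additive semantics are assumptions for illustration; the real ChatResponseModel.setUsage is not shown in this PR:

```typescript
// Illustrative only: a stand-in for the monotonic completion-token counter
// the PR mentions in ChatResponseModel.setUsage, not the real code.
let completionTokensSeen = 0;

function recordCompletionTokens(reported: number): number {
	// Clamp to a non-negative finite number before accumulating: without
	// this, a negative report would move the running total backwards and a
	// non-finite one would poison it with NaN or Infinity.
	completionTokensSeen += Number.isFinite(reported) ? Math.max(0, reported) : 0;
	return completionTokensSeen;
}
```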

Tests

14 new specs in extensions/copilot/src/platform/endpoint/vscode-node/test/extChatEndpoint.spec.ts covering the parse helper (full / partial / malformed / non-object / type-coerced / negative-clamp / strict-path-extras payloads) and the end-to-end stream behaviour (zero-fallback regression test for #314722, full usage propagation, cached_tokens path, malformed-JSON fallback, partial payload, last-write-wins).

All 241 specs in extensions/copilot/src/platform/endpoint/ pass. Lint and tsc --noEmit are clean on the three changed files.

How to verify

  1. Install the build with the patched extChatEndpoint.ts.
  2. Use any extension-contributed BYOK provider (e.g. an Anthropic / Gemini-native LanguageModelChatProvider) and have it report usage via the snippet above.
  3. Open the Context Window widget in the chat side-panel — the bar should now reflect real prompt/completion tokens and (if the provider also reports it) the hatched output-buffer band.

Copilot AI review requested due to automatic review settings May 6, 2026 18:31

ba-work commented May 6, 2026

@microsoft-github-policy-service agree


Copilot AI left a comment


Pull request overview

This PR fixes missing Context Window token-usage reporting for extension-contributed (BYOK) chat providers by allowing them to stream an APIUsage payload through a new LanguageModelDataPart MIME type ('usage') and propagating it through ExtensionContributedChatEndpoint.makeChatRequest2.

Changes:

  • Add CustomDataPartMimeTypes.Usage = 'usage' to define the new stream payload type for usage.
  • Implement parseExtensionContributedUsage and consume 'usage' data parts in ExtensionContributedChatEndpoint to populate the success envelope’s usage (with zero fallback).
  • Add vitest coverage for parsing behavior and end-to-end usage propagation through the endpoint.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

  • extensions/copilot/src/platform/endpoint/common/endpointTypes.ts: Adds the new 'usage' custom data part MIME type and documents its intent/format.
  • extensions/copilot/src/platform/endpoint/vscode-node/extChatEndpoint.ts: Parses and captures streamed usage parts from extension-contributed providers and returns them in the fetch result.
  • extensions/copilot/src/platform/endpoint/vscode-node/test/extChatEndpoint.spec.ts: Adds unit + end-to-end tests validating usage parsing and propagation (including fallback behavior).

…tighten last-valid-wins semantics

Three follow-ups from the Copilot review on microsoft#314801:

1. parseExtensionContributedUsage now validates and clamps every
   numeric field inside the optional 'prompt_tokens_details' /
   'completion_tokens_details' nested objects, not just the three
   top-level counters and 'cached_tokens'. Without this, a provider
   could leak negative or non-finite 'cache_creation_input_tokens',
   'reasoning_tokens', etc. through to OTel attributes / telemetry.
   Nested values that are not a plain object (string, array, null) are
   dropped rather than passed through, so downstream consumers only
   ever see well-formed shapes. Refactored the strict and permissive
   paths into a single unified path with two small builder helpers.

2. The 'coerces non-numeric fields to 0' test was passing
   total_tokens: NaN through JSON.stringify, which lossily becomes
   null on the wire and so didn't actually exercise non-finite
   handling. Replaced the NaN value with an array (which exercises
   the same coercion code path) and added a dedicated
   'coerces non-finite numeric fields (Infinity) to 0' test that
   builds the JSON by hand so '1e999' / '-1e999' parse to Infinity /
   -Infinity and prove the Number.isFinite gate fires.

3. The dispatch-site comment claimed 'last-write-wins' but the code
   only updates extensionUsage when parsing succeeds, so the actual
   contract is 'last-*valid*-wins'. Updated the comment to match
   reality and added a regression test that emits a valid usage part
   followed by a malformed one and asserts the earlier valid reading
   is preserved.

Also adds two more spec cases pinning the nested-clamp and
drop-on-malformed contracts. 18 specs in extChatEndpoint.spec.ts
(was 14); 245 specs in extensions/copilot/src/platform/endpoint/
(was 241). Lint and tsc clean.
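The JSON round-trip behaviour that point 2 relies on is plain JavaScript semantics and easy to check directly (nothing here is PR-specific):

```typescript
// JSON.stringify lossily serialises NaN as null, so a test that round-trips
// { total_tokens: NaN } never actually puts a non-finite number on the wire.
const wire = JSON.stringify({ total_tokens: NaN }); // '{"total_tokens":null}'

// Hand-built JSON can deliver non-finite values: 1e999 overflows to Infinity
// on parse, which is exactly what a Number.isFinite gate must catch.
const parsed = JSON.parse('{"total_tokens":1e999,"prompt_tokens":-1e999}');
const gate = (v: unknown): number =>
	typeof v === 'number' && Number.isFinite(v) ? Math.max(0, v) : 0;
// parsed.total_tokens is Infinity; gate maps both non-finite values to 0
```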

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

…rify Usage doc

1. parseExtensionContributedUsage now rejects top-level JSON arrays.
   Without this, a stray '[]' chunk would parse to a zero-filled
   APIUsage (since 'typeof [] === "object"' in JS) and could
   overwrite an earlier valid reading at the last-valid-wins
   dispatch site in makeChatRequest2.

2. CustomDataPartMimeTypes.Usage doc comment used to say the payload
   was 'at minimum prompt_tokens/completion_tokens/total_tokens',
   which contradicted parseExtensionContributedUsage's permissive
   partial-payload behaviour. Updated the comment to reflect that
   all fields are optional and missing/non-finite values are treated
   as 0, so extension authors aren't misled.

Also extended the existing non-object spec with two array-rejection
cases ('[]', '[1,2,3]') to pin the contract. 245/245 endpoint specs
pass; lint and tsc clean.
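The typeof pitfall point 1 describes is standard JavaScript; the guard below sketches the shape check in isolation (it is not the PR's exact code):

```typescript
// typeof reports 'object' for arrays, so a typeof-only shape check lets a
// stray '[]' chunk through, where it would zero-fill into a "valid" usage.
const typeofOnly = (v: unknown): boolean => typeof v === 'object' && v !== null;

// Adding Array.isArray closes the gap: array payloads are rejected outright.
const isPlainObjectShape = (v: unknown): boolean =>
	typeof v === 'object' && v !== null && !Array.isArray(v);
```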

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

…harden ContextManagement shape

1. parseExtensionContributedUsage now rejects payloads that coerce to
   an all-zero, detail-less APIUsage. The previous round's key-presence
   gate let inputs like {prompt_tokens_details:'oops'},
   {completion_tokens_details:null}, {prompt_tokens_details:{}}, and
   {prompt_tokens:-3,completion_tokens:-5} pass through to a zero-filled
   result that would clobber an earlier valid reading at the
   last-valid-wins dispatch site in makeChatRequest2. The new gate
   checks the *coerced* result for any positive signal (a non-zero
   top-level counter or a non-zero nested-detail field) before
   returning truthy. Fully-shaped strict-path payloads still carry the
   historical prompt_tokens_details:{cached_tokens:0} placeholder, but
   that placeholder no longer counts as a signal on its own.

2. CustomDataPartMimeTypes.ContextManagement now mirrors the first-step
   shape rejection from parseExtensionContributedUsage: a parsed
   payload that is null, a primitive, or an array is treated the same
   as a JSON.parse throw — the data part is silently skipped instead
   of forwarded into streamRecorder.callback as a malformed
   ContextManagementResponse. Indentation of the new block was also
   corrected to match the sibling Usage / StatefulMarker branches.

3. JSDoc on parseExtensionContributedUsage and on
   CustomDataPartMimeTypes.Usage updated to describe the new
   no-signal-rejection contract instead of the previous (and never
   quite accurate) key-presence framing. The function's own contract
   now matches the mime-type contract.

Tests: 5 new specs (rejects no-signal-after-coercion, rejects empty /
keyless objects, accepts non-zero nested-detail-only, ContextManagement
malformed-JSON tolerance, ContextManagement non-object-shape rejection
via the same path) plus updates to two existing coercion tests so they
keep at least one positive signal and observe coercion as intended.
250/250 endpoint specs pass; lint and tsc clean on the three changed
files.
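A sketch of the no-signal gate described in point 1. The function name and exact field list are assumptions; the idea from the commit message is that the check runs on the *coerced* result, and a zero-valued details placeholder does not count as a signal:

```typescript
// Hypothetical: reject coerced results that carry no positive token signal,
// so an all-zero payload cannot clobber an earlier valid reading.
interface CoercedUsage {
	prompt_tokens: number;
	completion_tokens: number;
	total_tokens: number;
	prompt_tokens_details?: { cached_tokens: number };
}

function hasPositiveSignal(u: CoercedUsage): boolean {
	if (u.prompt_tokens > 0 || u.completion_tokens > 0 || u.total_tokens > 0) {
		return true;
	}
	// The historical { cached_tokens: 0 } placeholder is not a signal on its own.
	return (u.prompt_tokens_details?.cached_tokens ?? 0) > 0;
}
```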

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.



Development

Successfully merging this pull request may close these issues.

Context Window indicator stays at 0% for third-party BYOK chat providers because ExtensionContributedChatEndpoint hardcodes usage to zero
