@SteveSandersonMS SteveSandersonMS commented Jan 16, 2026

After updating to the latest CLI, the E2E tests failed because of response format changes.

1. Update snapshots for Anthropic extended thinking

The CLI now coalesces tool calls into single assistant messages for Anthropic extended thinking compatibility. This updates all affected snapshots to match the new format.
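As a rough illustration of what "coalescing" means here, the sketch below merges runs of consecutive assistant messages that each carry tool calls into a single assistant message. The message shape (`role`, `tool_calls`) and the function name `coalesce_tool_calls` are assumptions made for this example; the PR does not show the CLI's actual implementation or the exact snapshot schema.

```python
from typing import Any

# Assumed message shape: a dict with a "role" key and, for assistant
# messages that invoke tools, a "tool_calls" list. This is illustrative only.
Message = dict[str, Any]


def coalesce_tool_calls(messages: list[Message]) -> list[Message]:
    """Merge consecutive assistant messages that carry tool calls into one."""
    result: list[Message] = []
    for msg in messages:
        prev = result[-1] if result else None
        if (
            prev is not None
            and prev.get("role") == "assistant"
            and msg.get("role") == "assistant"
            and prev.get("tool_calls")
            and msg.get("tool_calls")
        ):
            # Fold this message's tool calls into the previous assistant
            # message instead of emitting a second assistant message.
            prev["tool_calls"] = prev["tool_calls"] + msg["tool_calls"]
        else:
            # Shallow-copy so the caller's message dicts are not mutated.
            result.append(dict(msg))
    return result
```

Snapshots recorded before such a change would contain one assistant message per tool call, which is why they no longer match after the CLI update.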

2. Skip writing snapshots on test failure

Prevents corrupted snapshots from being written when tests fail. Each language uses its native test framework hooks:

  • Node.js: vitest `onTestFailed()` hook
  • Python: `pytest_runtest_makereport` hook
  • Go: `t.Failed()` check in cleanup
  • .NET: Checks `CI` env var (xUnit lacks failure detection hooks)
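The common pattern behind these hooks can be sketched independently of any test framework: record whether the test failed, then pass that flag to the proxy's teardown so it skips the write. In the sketch below, `SnapshotProxy`, `record`, and `run_with_snapshot` are hypothetical names for illustration; only the `stop()` method with a `skip_writing_cache` parameter mirrors what the PR describes for the Python harness, and the JSON file format is an assumption (the real snapshots are YAML).

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


class SnapshotProxy:
    """Hypothetical stand-in for the replaying proxy used by the E2E tests."""

    def __init__(self, snapshot_path: Path) -> None:
        self.snapshot_path = snapshot_path
        self.exchanges: list[dict] = []

    def record(self, exchange: dict) -> None:
        # Collect exchanges in memory; nothing touches disk until stop().
        self.exchanges.append(exchange)

    def stop(self, skip_writing_cache: bool = False) -> None:
        # On failure the recorded exchanges may be partial or wrong,
        # so leave any existing snapshot file untouched.
        if skip_writing_cache:
            return
        self.snapshot_path.write_text(json.dumps(self.exchanges, indent=2))


def run_with_snapshot(
    proxy: SnapshotProxy, test_body: Callable[[SnapshotProxy], None]
) -> bool:
    """Run a test body and only persist the snapshot if it passed."""
    test_failed = False
    try:
        test_body(proxy)
    except AssertionError:
        test_failed = True
    finally:
        proxy.stop(skip_writing_cache=test_failed)
    return not test_failed


if __name__ == "__main__":
    path = Path(tempfile.mkdtemp()) / "snapshot.json"
    passed = run_with_snapshot(
        SnapshotProxy(path), lambda p: p.record({"request": "ping"})
    )
    print(passed, path.exists())
```

In a real pytest setup the `test_failed` flag would come from the `pytest_runtest_makereport` hook rather than from catching `AssertionError` directly; the teardown logic stays the same.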

@SteveSandersonMS SteveSandersonMS requested a review from a team as a code owner January 16, 2026 00:37
Copilot AI review requested due to automatic review settings January 16, 2026 00:37
Copilot AI left a comment

Pull request overview

This PR introduces two improvements to the E2E test infrastructure: preventing corrupted snapshots on test failure and updating snapshots to reflect CLI changes for Anthropic extended thinking compatibility.

Changes:

  • Implements test failure detection in each language's test framework to skip writing snapshots when tests fail
  • Updates snapshot files to reflect new CLI behavior that coalesces tool calls into single assistant messages for Anthropic extended thinking

Reviewed changes

Copilot reviewed 26 out of 27 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `test/harness/replayingCapiProxy.ts` | Adds support for `skipWritingCache` parameter to prevent snapshot writes on test failure |
| `python/e2e/testharness/proxy.py` | Adds `skip_writing_cache` parameter to `stop()` method |
| `python/e2e/testharness/context.py` | Passes `test_failed` flag to proxy teardown to conditionally skip snapshot writes |
| `python/e2e/conftest.py` | Implements pytest hook to track test failures and skip snapshot writes |
| `nodejs/test/e2e/harness/sdkTestContext.ts` | Uses Vitest's `onTestFailed` hook to track failures and adds `COPILOT_CLI_PATH` env var support |
| `nodejs/test/e2e/harness/CapiProxy.ts` | Adds `skipWritingCache` parameter to `stop()` method |
| `go/e2e/testharness/proxy.go` | Adds `StopWithOptions` method with `skipWritingCache` parameter |
| `go/e2e/testharness/context.go` | Uses Go's `t.Failed()` to detect test failures and skip snapshot writes |
| `dotnet/test/Harness/E2ETestContext.cs` | Checks `CI` environment variable to skip snapshot writes (xUnit limitation) |
| `dotnet/test/Harness/CapiProxy.cs` | Adds `skipWritingCache` parameter to `StopAsync` method |
| `test/snapshots/tools/invokes_built_in_tools.yaml` | Updated snapshot reflecting coalesced tool calls format |
| `test/snapshots/session/*.yaml` | Updated snapshots with minor variations in assistant responses |
| `test/snapshots/permissions/*.yaml` | Updated snapshots with coalesced tool calls and response variations |
| `test/snapshots/mcp-and-agents/*.yaml` | Updated snapshots with response variations |

@SteveSandersonMS SteveSandersonMS force-pushed the stevesa/e2e-infra-snapshots branch 4 times, most recently from d82a2c3 to 40de300 Compare January 16, 2026 00:48
@SteveSandersonMS SteveSandersonMS force-pushed the stevesa/e2e-infra-snapshots branch 3 times, most recently from 2055997 to 456dc7d Compare January 16, 2026 01:03
@SteveSandersonMS SteveSandersonMS force-pushed the stevesa/e2e-infra-snapshots branch from 456dc7d to 4388a4e Compare January 16, 2026 01:06
@SteveSandersonMS SteveSandersonMS changed the title from "E2E test infrastructure improvements" to "React to CLI update and Anthropic response format changes" Jan 16, 2026
@SteveSandersonMS SteveSandersonMS added this pull request to the merge queue Jan 16, 2026
Merged via the queue into main with commit 1e23513 Jan 16, 2026
33 of 34 checks passed