Skip to content

feat(rewrite): address assertion rewriting blind spots with systematic tests#14448

Open
RonnyPfannschmidt wants to merge 13 commits intopytest-dev:mainfrom
RonnyPfannschmidt:ronny/assert-rewrite-blind-spots
Open

feat(rewrite): address assertion rewriting blind spots with systematic tests#14448
RonnyPfannschmidt wants to merge 13 commits intopytest-dev:mainfrom
RonnyPfannschmidt:ronny/assert-rewrite-blind-spots

Conversation

@RonnyPfannschmidt
Copy link
Copy Markdown
Member

Summary

Note: This is an AI-assisted experiment (Cursor + Claude Opus 4) that executed a plan after assessing initial details about assertion rewriting blind spots. More work is needed to make this clean and production-ready — this PR has known weaknesses and is meant as a starting point for discussion, not a final implementation.

This PR builds on top of the walrus operator double-evaluation fix (#14445) and addresses additional blind spots in the assertion rewriter:

  • Add a systematic test infrastructure for assertion rewriting coverage (testing/test_assertrewrite_coverage.py)
  • Implement visit_Subscript for container[key] introspection (shows container and key separately in failure messages)
  • Implement visit_IfExp for ternary expression introspection (shows condition value while preserving short-circuit semantics)
  • Flatten method call display to show where result = obj.method(args) instead of nesting the bound-method intermediate

Before/After

Subscript (new)

# Before: assert 1 == 99
# After:  assert 1 == 99
#          +  where 1 = {'a': 1, 'b': 2}['a']

IfExp (new)

# Before: assert 0 == 99
# After:  assert 0 == 99
#          +  where 0 = (... if True else ...)

Method calls (improved)

# Before: assert 42 == 100
#          +  where 42 = compute()
#          +    where compute = Obj().compute
# After:  assert 42 == 100
#          +  where 42 = Obj().compute()

Known weaknesses / areas for improvement

  • The visit_IfExp only introspects the condition, not the branch values (to preserve short-circuit) — a more sophisticated implementation could emit an if/else block to get branch values too
  • Container literal introspection (list, dict, set literals) is still unaddressed (xfail)
  • Callable variables still show <function ...> repr instead of variable name (xfail)
  • The method call flattening may interact unexpectedly with deeply chained attribute access patterns not yet covered
  • No changelog fragment yet — this needs proper evaluation before merging
  • Includes the Walrus expression duplicate evaluation failures with rewrite #14445 walrus fix commits (those should be merged first or this PR rebased after)

Test plan

  • 75 tests pass in testing/test_assertrewrite_coverage.py (61 pass + 14 edge cases)
  • 3 remaining xfail for lower-priority blind spots (container literals, callable variable repr)
  • All 132 existing testing/test_assertrewrite.py tests pass (no regressions)
  • All 162 testing/test_assertion.py tests pass (no regressions)
  • Single-evaluation guarantees verified for all new visitors
  • Short-circuit semantics preserved for IfExp
  • Pre-commit hooks pass (ruff, mypy, codespell)

Made with Cursor

RonnyPfannschmidt and others added 12 commits May 8, 2026 10:37
Fixes pytest-dev#14445 - assertion rewriting evaluated NamedExpr (:=) expressions
multiple times, causing side effects to fire repeatedly.

The root cause was the `variables_overwrite` mechanism which stored and
re-evaluated NamedExpr AST nodes in subsequent assertions, in
`_call_reprcompare`'s results tuple, and in explanation formatting.

The fix:
- visit_NamedExpr: reference the target variable in explanations instead
  of re-evaluating the full expression
- visit_Compare: assign left-side NamedExpr to a temp before right-side
  hoisting; freeze left_res when a comparator walrus targets the same
  name; replace NamedExpr entries in `results` with target variables
- visit_BoolOp: capture short-circuit condition in a stable temp for the
  explanation path; remove walrus target rename logic
- visit_Call: remove variables_overwrite substitution (walrus now properly
  assigns to user variables in its natural evaluation position)
- Remove variables_overwrite, scope tracking, Sentinel class

Co-authored-by: Cursor AI <ai@cursor.sh>
Co-authored-by: Anthropic Claude Sonnet 4 <claude@anthropic.com>
Co-authored-by: Cursor AI <ai@cursor.sh>
Co-authored-by: Anthropic Claude Sonnet 4 <claude@anthropic.com>
Add tests for two remaining walrus double-evaluation scenarios:
- Bare NamedExpr as BoolOp operand evaluated twice via condition check
- Same walrus target in chained comparison evaluated multiple times

Co-authored-by: Cursor AI <ai@cursor.sh>
Co-authored-by: Anthropic Claude Sonnet 4 <claude@anthropic.com>
Use the already-assigned res_var to build the short-circuit condition
instead of the raw visitor result, preventing bare NamedExpr operands
from being evaluated a second time when checking truthiness.

Co-authored-by: Cursor AI <ai@cursor.sh>
Co-authored-by: Anthropic Claude Sonnet 4 <claude@anthropic.com>
In a chained comparison like `(x := f()) < (x := g()) < (x := h())`,
each NamedExpr comparator is now assigned to a temp variable so it
evaluates exactly once. Previously the raw NamedExpr node would be
reused as left_res in the next iteration, causing double evaluation.

Co-authored-by: Cursor AI <ai@cursor.sh>
Co-authored-by: Anthropic Claude Sonnet 4 <claude@anthropic.com>
Create testing/test_assertrewrite_coverage.py with reusable helpers for
verifying assertion rewriting behavior across all expression types:

- get_failure_message: compile rewritten source and extract failure text
- assert_introspects: verify failure messages contain expected intermediates
- assert_single_evaluation: verify no double-evaluation of side effects
- assert_passes_when_true: verify rewritten asserts don't false-positive
- assert_semantically_equivalent: verify rewrite preserves pass/fail semantics

Includes smoke tests validating the helpers themselves.

Co-authored-by: Cursor AI <ai@cursor.sh>
Co-authored-by: Anthropic Claude Opus 4 <claude@anthropic.com>
Add parametrized test classes that document current assertion rewriting
behavior for each AST expression type:

- Compare, BoolOp, UnaryOp, BinOp: verified working with correct output
- Call: works but shows <function ...> repr for local/variable callables
- Attribute, Name, Walrus: verified working
- Subscript, IfExp, ContainerLiteral: marked xfail (blind spots)
- MethodCall: shows noisy bound-method intermediate (xfail)
- Comprehension, FString: semantics preserved, result shown in compare

The xfail tests document expected improvements and will be unmarked as
each blind spot is addressed in subsequent commits.

Co-authored-by: Cursor AI <ai@cursor.sh>
Co-authored-by: Anthropic Claude Opus 4 <claude@anthropic.com>
Add TestSingleEvaluation class that systematically verifies no expression
is evaluated multiple times during assertion rewriting. Covers:

- Calls in compare, boolean, unary, binop contexts
- Attribute/property access
- Subscript (dict __getitem__)
- Walrus operator in compare, boolean, and chained compare
- Method calls
- Nested calls (inner + outer counted separately)
- Multiple comparators in chained comparisons
- IfExp conditions
- Comprehension generators

All tests pass, confirming the pytest-dev#14445 fix holds across all expression types.

Co-authored-by: Cursor AI <ai@cursor.sh>
Co-authored-by: Anthropic Claude Opus 4 <claude@anthropic.com>
Implement a dedicated visitor for ast.Subscript in AssertionRewriter
that decomposes container[key] expressions into separate container and
key introspection. This produces failure messages like:

    assert 1 == 99
     +  where 1 = {'a': 1, 'b': 2}['a']

Previously, subscript expressions hit generic_visit and only showed
the final value without decomposition into container and key.

Slices (a[1:3]) still fall back to generic_visit since decomposing
start/stop/step is rarely useful in assertion messages.

Co-authored-by: Cursor AI <ai@cursor.sh>
Co-authored-by: Anthropic Claude Opus 4 <claude@anthropic.com>
Implement a dedicated visitor for ast.IfExp that introspects the
condition value while preserving short-circuit semantics. Produces
failure messages like:

    assert 0 == 99
     +  where 0 = (... if True else ...)

The condition is rewritten for introspection (showing its evaluated
value), but branches are kept as-is to preserve Python's short-circuit
behavior — only the selected branch is evaluated.

Previously, IfExp hit generic_visit showing only the final result
without any insight into which branch was taken or why.

Co-authored-by: Cursor AI <ai@cursor.sh>
Co-authored-by: Anthropic Claude Opus 4 <claude@anthropic.com>
Refactor visit_Call to detect obj.method() patterns (where call.func is
an ast.Attribute) and produce a flat explanation format:

    assert 42 == 100
     +  where 42 = Obj().compute()

Instead of the previous nested format:

    assert 42 == 100
     +  where 42 = compute()
     +    where compute = Obj().compute

The bound-method intermediate line was noisy and unhelpful — users want
to see what object the method was called on and what it returned, not
the method object itself.

Regular function calls (non-attribute) are unchanged.

Co-authored-by: Cursor AI <ai@cursor.sh>
Co-authored-by: Anthropic Claude Opus 4 <claude@anthropic.com>
Add TestEdgeCases class combining the new visitors (Subscript, IfExp,
method call) with existing ones to verify correct behavior in complex
scenarios:

- Subscript with variable keys, call keys, and nested subscripts
- Method calls with arguments, chained calls, and global objects
- IfExp with call conditions
- Walrus operator in subscript keys
- Single-evaluation guarantees for all new visitors
- Custom assert messages still work with new decomposition
- Complex assertions combining multiple visitor types

Co-authored-by: Cursor AI <ai@cursor.sh>
Co-authored-by: Anthropic Claude Opus 4 <claude@anthropic.com>
Copilot AI review requested due to automatic review settings May 8, 2026 16:15
@psf-chronographer psf-chronographer Bot added the bot:chronographer:provided (automation) changelog entry is part of PR label May 8, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends pytest’s assertion rewriting to reduce “blind spots” and regressions (notably around walrus operator single-evaluation), while introducing a new systematic test suite to track introspection/semantics coverage across expression types.

Changes:

  • Add targeted regression tests for #14445 (walrus operator double-evaluation) in testing/test_assertrewrite.py.
  • Introduce a new systematic assertion-rewriting coverage test module (testing/test_assertrewrite_coverage.py) with helpers for introspection, semantic equivalence, and single-evaluation checks.
  • Update the assertion rewriter to improve introspection for subscripts (container[key]), if-expressions (a if cond else b), and to flatten method-call explanations.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
testing/test_assertrewrite.py Updates expectations for walrus/boolop output and adds #14445 regression tests.
testing/test_assertrewrite_coverage.py New structured coverage suite for assertion rewriting behaviors (introspection/semantics/single-eval).
src/_pytest/assertion/rewrite.py Implements new visitors (Subscript/IfExp), flattens method call formatting, and refines walrus-related evaluation behavior.
changelog/14445.bugfix.rst Adds changelog entry documenting the walrus double-evaluation fix.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread testing/test_assertrewrite_coverage.py Outdated
Comment on lines +188 to +207
# Run without rewriting
plain_code = compile(src, "<test-plain>", "exec")
plain_ns: dict[str, object] = {}
if extra_ns is not None:
plain_ns.update(extra_ns)
exec(plain_code, plain_ns)
plain_func = cast(Callable[[], None], plain_ns["check"])
plain_raised = False
try:
plain_func()
except AssertionError:
plain_raised = True

# Run with rewriting
mod = _rewrite_source(src)
rewritten_code = compile(mod, "<test-rewritten>", "exec")
rewritten_ns: dict[str, object] = {}
if extra_ns is not None:
rewritten_ns.update(extra_ns)
exec(rewritten_code, rewritten_ns)
Update raises_group.py::test_assert_matches to expect the new flat
method call format (where False = RaisesExc(TypeError).matches(...))
instead of the old nested format with a separate bound-method line.

Also address Copilot review: use copy.deepcopy in
assert_semantically_equivalent to isolate mutable state between the
plain and rewritten execution runs.

Co-authored-by: Cursor AI <ai@cursor.sh>
Co-authored-by: Anthropic Claude Opus 4 <claude@anthropic.com>
@RonnyPfannschmidt RonnyPfannschmidt mentioned this pull request May 9, 2026
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bot:chronographer:provided (automation) changelog entry is part of PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants