Repeated test execution using --repeat CLI option and #[Repeat] attribute #6591

Draft
sebastianbergmann wants to merge 4 commits into main from issue-5718/repeat

@sebastianbergmann
Owner

@sebastianbergmann sebastianbergmann commented Apr 16, 2026

Use Cases

Stress-testing concurrent or stateful code

Code that manages connections, caches, file handles, or other resources may behave correctly once but leak or corrupt state over repeated invocations. --repeat provides a lightweight way to exercise these paths without writing dedicated stress tests.

Detecting flaky tests

Tests that pass in isolation but fail intermittently under repeated execution are a common source of CI instability. Causes include shared mutable state, timing-dependent logic, non-deterministic ordering, and resource leaks. Running each test multiple times in a single PHPUnit invocation surfaces these failures without requiring external scripting or CI-level retry loops.

Tolerating known-flaky tests with a failure threshold

Some tests interact with inherently unreliable systems (network services, hardware timing, random seeds). The #[Repeat] attribute's failureThreshold parameter lets a test tolerate a configured number of failures before being considered failed, avoiding false negatives while still catching systematic regressions.

Prior work

--repeat before PHPUnit 10

PHPUnit had a --repeat CLI option from early versions through PHPUnit 9. Its semantics were fundamentally different from the implementation proposed here: it re-ran the entire test suite N times rather than repeating each test individually.

The old implementation worked by adding the suite to itself multiple times in TestRunner:

foreach (range(1, $configuration->repeat()) as $step) {
    $_suite->addTest($suite);
}

This produced an execution order of A, B, A, B (interleaved) rather than A, A, B, B (grouped). There was no per-test failure isolation. If test A failed on the second run, all remaining tests in that suite iteration still executed.
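The difference between the two orderings can be illustrated with a small sketch. The two helper functions are hypothetical and only model the order in which tests would execute; they are not PHPUnit code.

```php
<?php declare(strict_types=1);

/**
 * Hypothetical model of the old whole-suite repetition: the complete
 * list of tests is appended N times.
 *
 * @param list<string> $tests
 * @return list<string>
 */
function wholeSuiteOrder(array $tests, int $times): array
{
    $order = [];

    for ($i = 0; $i < $times; $i++) {
        foreach ($tests as $test) {
            $order[] = $test;
        }
    }

    return $order;
}

/**
 * Hypothetical model of per-test repetition: each test is repeated N
 * times before the next test starts.
 *
 * @param list<string> $tests
 * @return list<string>
 */
function perTestOrder(array $tests, int $times): array
{
    $order = [];

    foreach ($tests as $test) {
        for ($i = 0; $i < $times; $i++) {
            $order[] = $test;
        }
    }

    return $order;
}

echo implode(', ', wholeSuiteOrder(['A', 'B'], 2)), PHP_EOL; // A, B, A, B
echo implode(', ', perTestOrder(['A', 'B'], 2)), PHP_EOL;    // A, A, B, B
```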

The --repeat option was removed in 442b9ab as part of the work on PHPUnit 10. I only ever considered the whole-suite repetition model a benchmarking feature, and it did not fit the direction of PHPUnit 10's redesigned architecture and event system.

Community discussion

Shortly after PHPUnit 10's release, issue #5718 was opened requesting that --repeat be brought back. It received over 50 thumbs-up reactions, reflecting strong community demand for built-in repetition support. Commenters described use cases including flaky test detection, CI stability, and stress-testing stateful code.

At the Code Sprint in Munich in October 2024, a new consensus emerged: --repeat should return with per-test repetition semantics rather than the old whole-suite model. Each test would run up to N times, stopping at the first failure. This per-test granularity provides more useful failure isolation and matches the expectations of developers using repetition for flaky test detection.

PR #6397

PR #6397 by @nikophil was the first implementation attempt following the new semantics. It introduced the RepeatTestSuite concept: a dedicated wrapper class that holds N TestCase instances for a single test method and controls their execution. This design decision, separating RepeatTestSuite from TestSuite as a leaf node rather than a container, informed the architecture suggested in this pull request.

Inspiration from JUnit 5

During the discussion on PR #6397, @marcphilipp pointed to JUnit 5's @RepeatedTest annotation as a reference design.

JUnit 5 supports @RepeatedTest(value = 100, failureThreshold = 1), where each repetition is reported as a child of a container node in the test tree. The failureThreshold parameter allows tolerating a configured number of failures before considering the test failed.

This directly inspired the #[Repeat(int $times, int $failureThreshold)] attribute suggested in this pull request. It provides the same per-method granularity and failure tolerance semantics.

--repeat CLI option and #[Repeat] attribute

Mechanism                            Scope                Granularity
--repeat <N>                         All eligible tests   Global, from CLI
#[Repeat(times, failureThreshold)]   Single test method   Per-method, in source code

When --repeat is used and a test method also has a #[Repeat] attribute, the attribute's semantics take precedence over the general --repeat semantics.
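A minimal sketch of how this precedence could be resolved, assuming that the attribute replaces the CLI value wholesale. The Repeat class below is a stand-in defined only for this example (the real class name, namespace, and defaults in the pull request may differ), and resolveRepeat() is a hypothetical helper, not part of PHPUnit.

```php
<?php declare(strict_types=1);

// Stand-in for the proposed attribute; the real class name, namespace, and
// constructor defaults in the pull request may differ.
#[Attribute(Attribute::TARGET_METHOD)]
final class Repeat
{
    public function __construct(
        public readonly int $times,
        public readonly int $failureThreshold = 1,
    ) {
    }
}

final class ExampleTest
{
    #[Repeat(5, 2)]
    public function testWithAttribute(): void
    {
    }

    public function testWithoutAttribute(): void
    {
    }
}

/**
 * Hypothetical resolver: returns [times, failureThreshold] for a method.
 * $cliRepeat models the value of --repeat (null when the option is absent).
 *
 * @return array{int, int}
 */
function resolveRepeat(string $class, string $method, ?int $cliRepeat): array
{
    $attributes = (new ReflectionMethod($class, $method))->getAttributes(Repeat::class);

    if ($attributes !== []) {
        // The attribute wins over --repeat.
        $repeat = $attributes[0]->newInstance();

        return [$repeat->times, $repeat->failureThreshold];
    }

    // No attribute: fall back to --repeat (threshold 1), or run once.
    return [$cliRepeat ?? 1, 1];
}

print_r(resolveRepeat(ExampleTest::class, 'testWithAttribute', 10));
print_r(resolveRepeat(ExampleTest::class, 'testWithoutAttribute', 10));
```

With --repeat 10, the attributed method would still run up to 5 times with a threshold of 2, while the method without the attribute would run up to 10 times.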

Architecture

Test suite structure

Repeated tests are wrapped in a RepeatTestSuite, a dedicated class that implements Test, Reorderable, and SelfDescribing. It is intentionally not a subclass of TestSuite. It is a leaf node in the suite tree, not a container that can be recursively iterated.

A RepeatTestSuite holds N independent TestCase instances for the same test method. Each instance has its own repetition (1-based index) and totalRepetitions values set via TestCase::setRepetition().

TestSuite (class level)
  RepeatTestSuite [testOne, 3 repetitions]
    TestCase (testOne, repetition 1 of 3)
    TestCase (testOne, repetition 2 of 3)
    TestCase (testOne, repetition 3 of 3)
  RepeatTestSuite [testTwo, 3 repetitions]
    TestCase (testTwo, repetition 1 of 3)
    ...

Execution flow

RepeatTestSuite::run() iterates its internal tests sequentially. Each TestCase::run() goes through the normal TestRunner path, emitting the full lifecycle of events (TestPreparationStarted, TestPrepared, TestPassed/TestFailed, TestFinished).

When a test fails or errors, the failure count is incremented. Once the failure count reaches the configured threshold (default 1), all remaining repetitions are skipped via TestCase::markSkippedForRepeatAbort(), which emits a TestSkipped event with a message identifying which repetition caused the abort.
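The loop described above can be modelled roughly as follows. This is a simplified sketch, not the code from this pull request: callables stand in for TestCase instances, plain strings stand in for the emitted events, and runRepetitions() is a hypothetical name.

```php
<?php declare(strict_types=1);

/**
 * Simplified model of the repetition loop. Each callable stands in for one
 * TestCase instance and returns true on success, false on failure or error.
 *
 * @param list<callable(): bool> $repetitions
 * @return list<string> one status per repetition: 'passed', 'failed', or 'skipped'
 */
function runRepetitions(array $repetitions, int $failureThreshold = 1): array
{
    $statuses = [];
    $failures = 0;

    foreach ($repetitions as $repetition) {
        // Once the threshold is reached, remaining repetitions are skipped.
        if ($failures >= $failureThreshold) {
            $statuses[] = 'skipped';

            continue;
        }

        if ($repetition()) {
            $statuses[] = 'passed';
        } else {
            $statuses[] = 'failed';
            $failures++;
        }
    }

    return $statuses;
}

$pass = static fn (): bool => true;
$fail = static fn (): bool => false;

// Default threshold of 1: the first failure aborts the remaining repetitions.
print_r(runRepetitions([$pass, $fail, $pass, $pass]));

// Threshold of 2: one failure is tolerated, the second one aborts.
print_r(runRepetitions([$fail, $pass, $fail, $pass], 2));
```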

Not every test can be repeated

TestBuilder checks two conditions before wrapping a test in RepeatTestSuite:

  1. Void return type: Tests that return values are used by #[Depends] to pass data between tests. Repeating such a test would produce N potentially different return values, creating ambiguity. Only void-returning tests are repeated.

  2. No dependencies: Tests attributed with #[Depends] are not repeated. They run once, after all repetitions of their dependency have passed.

Tests that fail these checks run exactly once, regardless of --repeat or #[Repeat].
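The two conditions could be checked with reflection along these lines. The sketch assumes a stand-in Depends attribute so that it runs without PHPUnit, and isEligibleForRepetition() is a hypothetical helper rather than the actual TestBuilder logic. Treating a method without a declared return type as not eligible is an assumption of this sketch.

```php
<?php declare(strict_types=1);

// Stand-in for PHPUnit's #[Depends] attribute so that the sketch runs without PHPUnit.
#[Attribute(Attribute::TARGET_METHOD | Attribute::IS_REPEATABLE)]
final class Depends
{
    public function __construct(public readonly string $methodName)
    {
    }
}

final class EligibilityExampleTest
{
    public function testProducer(): int
    {
        return 42;
    }

    #[Depends('testProducer')]
    public function testConsumer(int $value): void
    {
    }

    public function testStandalone(): void
    {
    }
}

/**
 * Hypothetical eligibility check mirroring the two conditions above.
 */
function isEligibleForRepetition(string $class, string $method): bool
{
    $reflector  = new ReflectionMethod($class, $method);
    $returnType = $reflector->getReturnType();

    // Condition 1: the method must declare a void return type.
    $returnsVoid = $returnType instanceof ReflectionNamedType
        && $returnType->getName() === 'void';

    // Condition 2: the method must not declare a dependency.
    $hasDependency = $reflector->getAttributes(Depends::class) !== [];

    return $returnsVoid && !$hasDependency;
}

var_dump(isEligibleForRepetition(EligibilityExampleTest::class, 'testProducer'));   // bool(false)
var_dump(isEligibleForRepetition(EligibilityExampleTest::class, 'testConsumer'));   // bool(false)
var_dump(isEligibleForRepetition(EligibilityExampleTest::class, 'testStandalone')); // bool(true)
```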

Interaction with data providers

When a test uses #[DataProvider] and is eligible for repetition, each data set gets its own RepeatTestSuite:

DataProviderTestSuite (testFoo)
  RepeatTestSuite [data set #0, 3 repetitions]
    TestCase (testFoo, data set #0, repetition 1 of 3)
    TestCase (testFoo, data set #0, repetition 2 of 3)
    TestCase (testFoo, data set #0, repetition 3 of 3)
  RepeatTestSuite [data set #1, 3 repetitions]
    TestCase (testFoo, data set #1, repetition 1 of 3)
    ...

A failure in one data set's repetitions does not affect other data sets. This provides per-data-set granularity: if data set 0 fails on repetition 2, its remaining repetitions are skipped, but data set 1 still runs all its repetitions independently.
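To make the per-data-set grouping concrete, the tree above can be modelled as nested arrays. buildRepetitionGroups() is a hypothetical helper written for this illustration, and the label format is only an approximation of what PHPUnit prints.

```php
<?php declare(strict_types=1);

/**
 * Hypothetical builder that models the tree above as nested arrays:
 * one group per data set, each holding N repetition labels.
 *
 * @param list<int|string> $dataSetKeys keys returned by the data provider
 * @return array<string, list<string>>
 */
function buildRepetitionGroups(string $method, array $dataSetKeys, int $times): array
{
    $groups = [];

    foreach ($dataSetKeys as $key) {
        // Integer keys are rendered as #0, named data sets in double quotes.
        $dataSet = is_int($key) ? '#' . $key : '"' . $key . '"';
        $labels  = [];

        for ($repetition = 1; $repetition <= $times; $repetition++) {
            $labels[] = sprintf(
                '%s, data set %s, repetition %d of %d',
                $method,
                $dataSet,
                $repetition,
                $times,
            );
        }

        // Each data set gets its own, independent group of repetitions.
        $groups['data set ' . $dataSet] = $labels;
    }

    return $groups;
}

print_r(buildRepetitionGroups('testFoo', [0, 1], 3));
```

Because each group is separate, a failure threshold would be counted per data set, not across the whole data provider.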

Interaction with dependencies

When test B depends on test A (via #[Depends]):

  • Test A is wrapped in a RepeatTestSuite (if eligible) and runs all N repetitions first.
  • Only if all repetitions of A pass does test B start.
  • Test B itself is not repeated (it has a dependency).
  • If A fails any repetition, B is skipped due to the unsatisfied dependency.
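The gating rule in this list can be sketched as a single predicate over the outcomes of A's repetitions. dependentMayRun() and the status strings are hypothetical and only model the rule as it is stated here.

```php
<?php declare(strict_types=1);

/**
 * Hypothetical gate for a dependent test: it may start only if every
 * repetition of its dependency passed.
 *
 * @param list<string> $dependencyStatuses statuses of A's repetitions
 */
function dependentMayRun(array $dependencyStatuses): bool
{
    foreach ($dependencyStatuses as $status) {
        // A failed or skipped repetition leaves the dependency unsatisfied.
        if ($status !== 'passed') {
            return false;
        }
    }

    return true;
}

var_dump(dependentMayRun(['passed', 'passed', 'passed']));  // bool(true)
var_dump(dependentMayRun(['passed', 'failed', 'skipped'])); // bool(false)
```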

Event System

The TestMethod value object carries repetition and totalRepetitions properties, populated from the TestCase by TestMethodBuilder::fromTestCase(). These properties affect two methods:

  • id() appends (repetition N of M) when totalRepetitions > 1. This ensures each repetition has a distinct identity in debug output, logging, and result collection.
  • name() appends the same suffix. This appears in failure messages, JUnit XML, and Open Test Reporting output.

Both default to 1, so non-repeated tests are completely unaffected.

The isRepeated() convenience method returns true when totalRepetitions > 1.
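A simplified stand-in for the value object shows how the suffix could be derived. TestMethodSketch is not the real TestMethod class, and the exact spacing of the suffix is an assumption.

```php
<?php declare(strict_types=1);

/**
 * Simplified stand-in for the TestMethod value object; only the parts
 * relevant to repetition are modelled.
 */
final class TestMethodSketch
{
    public function __construct(
        private readonly string $className,
        private readonly string $methodName,
        private readonly int $repetition = 1,
        private readonly int $totalRepetitions = 1,
    ) {
    }

    public function isRepeated(): bool
    {
        return $this->totalRepetitions > 1;
    }

    public function name(): string
    {
        return $this->methodName . $this->suffix();
    }

    public function id(): string
    {
        return $this->className . '::' . $this->methodName . $this->suffix();
    }

    private function suffix(): string
    {
        // Non-repeated tests keep their original name and id.
        if (!$this->isRepeated()) {
            return '';
        }

        return sprintf(' (repetition %d of %d)', $this->repetition, $this->totalRepetitions);
    }
}

$plain  = new TestMethodSketch('ExampleTest', 'testOne');
$second = new TestMethodSketch('ExampleTest', 'testOne', 2, 3);

echo $plain->id(), PHP_EOL;  // ExampleTest::testOne
echo $second->id(), PHP_EOL; // ExampleTest::testOne (repetition 2 of 3)
```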

Failure threshold

The failureThreshold parameter (available only via #[Repeat], defaults to 1) controls how many failures are tolerated before aborting:

  • #[Repeat(10)]: Run up to 10 times, stop at first failure (threshold = 1)
  • #[Repeat(10, 3)]: Run up to 10 times, stop after 3 failures
  • With #[Repeat(10, 3)], if the test completes all 10 runs with fewer than 3 failures, it is considered passed

This is inspired by JUnit 5's @RepeatedTest annotation.

@sebastianbergmann added the type/enhancement (A new idea that should be implemented) and feature/test-runner (CLI test runner) labels Apr 16, 2026
@sebastianbergmann
Owner Author

@nikophil Would be great to get your feedback on this.

@codecov

codecov Bot commented Apr 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.62%. Comparing base (2dbf59d) to head (b563551).
⚠️ Report is 19 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #6591      +/-   ##
============================================
+ Coverage     96.59%   96.62%   +0.03%     
- Complexity     8063     8127      +64     
============================================
  Files           838      841       +3     
  Lines         24898    25110     +212     
============================================
+ Hits          24050    24263     +213     
+ Misses          848      847       -1     


@jakubostalec

Hi @sebastianbergmann
Could you add a feature like the one in the bshaffer/phpunit-retry-annotations package? Something like "retry until the test passes".

Here's an example attribute class that briefly describes its purpose:

#[\Attribute(\Attribute::TARGET_METHOD)]
final class Retry
{
    /**
     * @param int                           $attempts        Max number of attempts (including the first run)
     * @param int                           $delaySeconds    Fixed delay in seconds between retries
     * @param int                           $delayMultiplier When > 0, delay = attempt * multiplier (linear back-off)
     * @param int                           $maxDelaySeconds Cap for multiplier-based delay (0 = no cap)
     * @param class-string<\Throwable>|null $onlyOnException Retry only when this exception type is thrown
     */
    public function __construct(
        public readonly int $attempts = 3,
        public readonly int $delaySeconds = 0,
        public readonly int $delayMultiplier = 0,
        public readonly int $maxDelaySeconds = 0,
        public readonly ?string $onlyOnException = null,
    ) {
    }
}

@sebastianbergmann
Owner Author

Could you add a feature like the one in the bshaffer/phpunit-retry-annotations package? Something like "retry until the test passes".

Thank you for your suggestion.

At this time, I am not able to consider suggestions like this. I think that --repeat and #[Repeat] should be implemented together, but this is already a lot. Maybe too much, and we need to see how it goes.

If and when this work is merged, then and only then am I able to think about further additions.

I mean no disrespect, but right now such suggestions are a distraction for me.

@stof
Contributor

stof commented Apr 17, 2026

@sebastianbergmann I think using #[Repeat] on a non-eligible test should trigger a PHPUnit warning. The attribute makes it clear that repetition of the test is intended, so the developer should be made aware that the test is not actually repeated.

@sebastianbergmann
Owner Author

I think using #[Repeat] on a non-eligible test should trigger a PHPUnit warning

Implemented now.
