ref(jest-balance): Parallelize balancer across 8 shards #117059

Open
ryan953 wants to merge 6 commits into master from ryan953/jest-parallel-balancer

@ryan953 ryan953 commented Jun 7, 2026

Member

The single-runner balancer took ~35 mins to run the entire Jest suite sequentially, and it often failed because of flaky tests. This PR splits the work across 8 workers using the same sharding mechanism as the frontend CI (jest-test-config.sh plus the CI_NODE_TOTAL/CI_NODE_INDEX env vars), so the jest.config.ts balancing logic is reused without modification.

So it runs faster now, and failing tests are now omitted from the balance metrics. This means we'll always have up-to-date balance data for non-flaky tests. I think a side effect is that flakes will sort to the bottom of the list, so people will find out later that a flake caused CI to fail. That isn't ideal, but it's something we can address separately in frontend.yml or something if it becomes visible.
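To make the "omit failing tests" behavior concrete, here is a minimal sketch of what a results processor along these lines could look like. It is not the actual resultsProcessor.ts: the local interfaces only mirror the few fields of Jest's aggregated result that the sketch reads, and the output shape (a map from test file path to duration in milliseconds) is an assumption about jest-balance.json.

```typescript
// Minimal local shapes; a real processor would use Jest's own result types.
interface FileResult {
  testFilePath: string;
  numFailingTests: number;
  // Set by Jest when the file could not be executed at all.
  testExecError?: unknown;
  perfStats: {start: number; end: number};
}

interface RunResults {
  testResults: FileResult[];
}

// Assumed balance format: test file path -> duration in milliseconds.
type BalanceData = Record<string, number>;

// Keep timings only for files that ran and passed. Failing files are
// skipped instead of throwing, so one flake does not discard the shard.
function collectPassingDurations(results: RunResults): BalanceData {
  const durations: BalanceData = {};
  for (const file of results.testResults) {
    const failed = file.numFailingTests > 0 || file.testExecError != null;
    if (failed) {
      continue;
    }
    durations[file.testFilePath] = file.perfStats.end - file.perfStats.start;
  }
  return durations;
}
```

Files that are skipped simply have no entry, so whatever consumes the balance data has to decide how to weight files without a recorded duration.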

Architecture:

  • jest-config job: reuses jest-test-config.sh to list all test files and
    compute the shard matrix. For schedule/workflow_dispatch events the
    script always produces 8 runners with the full test list.

  • jest-balance job (8x matrix): each shard downloads jest-test-files.json
    (so jest.config.ts can split tests) and the previous run's
    jest-balance.json (so the split is duration-aware rather than naive
    alphabetical). The resultsProcessor writes per-shard timing data.

  • combine-balance job: merges the 8 per-shard JSON files into a single
    jest-balance.json artifact that downstream consumers (frontend.yml)
    fetch for their own balanced sharding.
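For readers unfamiliar with duration-aware sharding, the idea can be sketched as a greedy "longest test first" assignment. This is an illustration only, not the logic in jest.config.ts: the function name, the balance format (file path to milliseconds), and the fallback weight for files with no recorded duration are all assumptions.

```typescript
// Assumed balance format: test file path -> duration in milliseconds.
type BalanceData = Record<string, number>;

// Greedy split: sort files by known duration (longest first), then always
// hand the next file to the shard with the smallest total so far.
function splitByDuration(
  files: string[],
  balance: BalanceData,
  total: number
): string[][] {
  if (!Number.isInteger(total) || total < 1) {
    throw new Error('total must be a positive integer');
  }

  // Files without timing data get the average known duration (assumption).
  const known = files
    .map(file => balance[file])
    .filter((ms): ms is number => typeof ms === 'number');
  const fallback =
    known.length > 0 ? known.reduce((sum, ms) => sum + ms, 0) / known.length : 1;

  const weighted = files
    .map(file => {
      const ms = balance[file];
      return {file, ms: typeof ms === 'number' ? ms : fallback};
    })
    // Tie-break on the file name so every shard computes the same split.
    .sort((a, b) => b.ms - a.ms || a.file.localeCompare(b.file));

  const shards = Array.from({length: total}, () => ({
    files: [] as string[],
    ms: 0,
  }));
  for (const item of weighted) {
    const lightest = shards.reduce((min, s) => (s.ms < min.ms ? s : min));
    lightest.files.push(item.file);
    lightest.ms += item.ms;
  }
  return shards.map(s => s.files);
}
```

Each runner would compute the same split and then keep only its own slice, with the shard index and count coming from CI_NODE_INDEX and CI_NODE_TOTAL. The split has to be deterministic for that to work, which is why the sort breaks ties by file name.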

The combine job includes setup-node-pnpm because combine.ts uses Node's
native TypeScript stripping (requires Node 24 from .node-version).

Also renames index.js → resultsProcessor.ts (it's a TS file now; failing tests no longer throw, and we skip them in the output instead) and adds combine.ts for the merge step.
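The merge step can be pictured roughly like this. It is a sketch rather than the contents of combine.ts: it assumes each per-shard file is a flat map from test file path to duration in milliseconds, and it leaves out reading the downloaded artifacts from disk.

```typescript
// Assumed per-shard format: test file path -> duration in milliseconds.
type BalanceData = Record<string, number>;

// Merge per-shard timing maps into one. Shards should not overlap, but if
// a file shows up twice the later shard wins (an arbitrary choice here).
function mergeBalance(shards: BalanceData[]): BalanceData {
  const merged: BalanceData = {};
  for (const shard of shards) {
    for (const file of Object.keys(shard)) {
      merged[file] = shard[file];
    }
  }

  // Emit keys in sorted order so the combined JSON is stable between runs.
  const sorted: BalanceData = {};
  for (const file of Object.keys(merged).sort()) {
    sorted[file] = merged[file];
  }
  return sorted;
}
```

A shard that produced no timings contributes an empty map, so the merge still succeeds with partial data.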

github-actions Bot added the labels Scope: Frontend (automatically applied to PRs that change frontend components) and Scope: Backend (automatically applied to PRs that change backend components) Jun 7, 2026

github-actions Bot commented Jun 7, 2026

Contributor

🚨 Warning: This pull request contains Frontend and Backend changes!

It's discouraged to make changes to Sentry's Frontend and Backend in a single pull request. The Frontend and Backend are not atomically deployed. If the changes are interdependent of each other, they must be separated into two pull requests and be made forward or backwards compatible, such that the Backend or Frontend can be safely deployed independently.

Have questions? Please ask in the #discuss-dev-infra channel.

The single-runner balancer took ~60 minutes to run the entire Jest suite
sequentially. This splits the work across 8 workers using the same
sharding mechanism as the frontend CI (jest-test-config.sh + CI_NODE_TOTAL/
CI_NODE_INDEX env vars), so the jest.config.ts balancing logic is reused
without modification.

Architecture:

- jest-config job: reuses jest-test-config.sh to list all test files and
  compute the shard matrix. For schedule/workflow_dispatch events the
  script always produces 8 runners with the full test list.

- jest-balance job (8x matrix): each shard downloads jest-test-files.json
  (so jest.config.ts can split tests) and the previous run's
  jest-balance.json (so the split is duration-aware rather than naive
  alphabetical). The resultsProcessor writes per-shard timing data.

- combine-balance job: merges the 8 per-shard JSON files into a single
  jest-balance.json artifact that downstream consumers (frontend.yml)
  fetch for their own balanced sharding.

The combine job includes setup-node-pnpm because combine.ts uses Node's
native TypeScript stripping (requires Node 24 from .node-version).

Also renames index.js → resultsProcessor.ts (typed, skips failing tests
instead of throwing) and adds combine.ts for the merge step.
@ryan953 ryan953 force-pushed the ryan953/jest-parallel-balancer branch from ad1d004 to 8d9e370 Compare June 7, 2026 19:34
@ryan953 ryan953 marked this pull request as ready for review June 7, 2026 19:35
@ryan953 ryan953 requested review from a team as code owners June 7, 2026 19:35
Comment thread tests/js/test-balancer/combine.ts
search_artifacts: true
if_no_artifact_found: warn

- name: jest balancer (shard ${{ matrix.index }}/${{ matrix.total }})

Member Author

@cursoragent If this step fails, I want to keep going and not fail the whole run. Do we need continue-on-error: true somewhere?

@cursor cursor Bot Jun 9, 2026

Contributor

Done! I've added continue-on-error: true to the jest balancer step at line 66.

This change means:

  • Flaky test failures won't block the workflow - each shard will continue and upload its artifact even if some tests fail
  • The combine job will always run - since individual jest-balance jobs won't be marked as failed
  • You get timing data from all successful tests - aligning with your goal to "omit failing tests from balance metrics"

The change has been committed and pushed to your branch.

cursoragent and others added 2 commits June 9, 2026 22:08
Ensures artifacts are uploaded even when tests fail, allowing the
combine job to merge partial results from successful shards.

Co-authored-by: Ryan Albrecht <ryan@ryanalbrecht.ca>
Co-authored-by: Ryan Albrecht <ryan@ryanalbrecht.ca>
Correct @types/node references from 22.19.19 (non-existent) to 22.19.15
in @jest/console and jest-util snapshots.

Co-authored-by: Ryan Albrecht <ryan@ryanalbrecht.ca>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 630d6c2. Configure here.

  - uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # 4.6.2
    with:
-     name: jest-balance.json
+     name: jest-balance-${{ matrix.index }}

Stale balance shard artifact uploaded

Medium Severity

Each shard downloads the prior full jest-balance.json into the same path resultsProcessor.ts writes, but the upload step always publishes that file even when Jest exits before the processor runs. A shard can then upload the entire previous balance as its shard artifact, and combine.ts merges misleading timings into the published artifact.
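One way to picture the failure mode, and one possible guard against it, is sketched below. This is hypothetical and not part of the PR: it assumes the balance files are flat maps from test file path to milliseconds, and the helper name is made up. Writing shard output to a path different from the downloaded file, or deleting the downloaded copy before Jest starts, would be alternative ways to avoid the problem.

```typescript
// Assumed balance format: test file path -> duration in milliseconds.
type BalanceData = Record<string, number>;

// Hypothetical guard: a shard artifact that is entry-for-entry identical to
// the previous full balance was almost certainly never rewritten by the
// results processor, so it should not be merged as fresh shard data.
function isStaleShardOutput(
  shardOutput: BalanceData,
  previous: BalanceData
): boolean {
  const keys = Object.keys(shardOutput);
  if (keys.length === 0 || keys.length !== Object.keys(previous).length) {
    return false;
  }
  return keys.every(file => shardOutput[file] === previous[file]);
}

// Example use before merging: drop any shard that looks stale.
function dropStaleShards(
  shards: BalanceData[],
  previous: BalanceData
): BalanceData[] {
  return shards.filter(shard => !isStaleShardOutput(shard, previous));
}
```

The comparison is deliberately strict, so a genuine shard result would only be dropped if it matched the previous full balance exactly.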
