ref(jest-balance): Parallelize balancer across 8 shards #117059
🚨 Warning: This pull request contains Frontend and Backend changes! It's discouraged to make changes to Sentry's Frontend and Backend in a single pull request. The Frontend and Backend are not atomically deployed. If the changes are interdependent of each other, they must be separated into two pull requests and be made forward or backwards compatible, such that the Backend or Frontend can be safely deployed independently. Have questions? Please ask in the |
The single-runner balancer took ~60 minutes to run the entire Jest suite sequentially. This splits the work across 8 workers using the same sharding mechanism as the frontend CI (jest-test-config.sh + CI_NODE_TOTAL/CI_NODE_INDEX env vars), so the jest.config.ts balancing logic is reused without modification.

Architecture:
- jest-config job: reuses jest-test-config.sh to list all test files and compute the shard matrix. For schedule/workflow_dispatch events the script always produces 8 runners with the full test list.
- jest-balance job (8x matrix): each shard downloads jest-test-files.json (so jest.config.ts can split tests) and the previous run's jest-balance.json (so the split is duration-aware rather than naive alphabetical). The resultsProcessor writes per-shard timing data.
- combine-balance job: merges the 8 per-shard JSON files into a single jest-balance.json artifact that downstream consumers (frontend.yml) fetch for their own balanced sharding.

The combine job includes setup-node-pnpm because combine.ts uses Node's native TypeScript stripping (requires Node 24 from .node-version).

Also renames index.js → resultsProcessor.ts (typed, skips failing tests instead of throwing) and adds combine.ts for the merge step.
ad1d004 to
8d9e370
    search_artifacts: true
    if_no_artifact_found: warn

- name: jest balancer (shard ${{ matrix.index }}/${{ matrix.total }})
@cursoragent If this step fails, I want to keep going and not fail the whole run. Do we need continue-on-error: true somewhere?
Done! I've added continue-on-error: true to the jest balancer step at line 66.
This change means:
- Flaky test failures won't block the workflow - each shard will continue and upload its artifact even if some tests fail
- The combine job will always run - since individual jest-balance jobs won't be marked as failed
- You get timing data from all successful tests - aligning with your goal to "omit failing tests from balance metrics"
The change has been committed and pushed to your branch.
Ensures artifacts are uploaded even when tests fail, allowing the combine job to merge partial results from successful shards. Co-authored-by: Ryan Albrecht <ryan@ryanalbrecht.ca>
Co-authored-by: Ryan Albrecht <ryan@ryanalbrecht.ca>
Correct @types/node references from 22.19.19 (non-existent) to 22.19.15 in @jest/console and jest-util snapshots. Co-authored-by: Ryan Albrecht <ryan@ryanalbrecht.ca>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 630d6c2. Configure here.
  - uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # 4.6.2
    with:
-     name: jest-balance.json
+     name: jest-balance-${{ matrix.index }}
Stale balance shard artifact uploaded
Medium Severity
Each shard downloads the prior full jest-balance.json into the same path resultsProcessor.ts writes, but the upload step always publishes that file even when Jest exits before the processor runs. A shard can then upload the entire previous balance as its shard artifact, and combine.ts merges misleading timings into the published artifact.
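One possible way to guard against the stale-artifact case described above, sketched in TypeScript. This is not what combine.ts does today; the `ShardOutput` wrapper, `isFreshShardOutput`, and `mergeFreshShards` are hypothetical names. The idea is that each shard writes its timings inside an envelope tagged with the current run's ID (for example the value of `GITHUB_RUN_ID`), and the merge step ignores any file that lacks a matching tag. A leftover copy of the previous jest-balance.json is a flat path-to-duration map, so it would fail the check and be dropped.

```typescript
// Hypothetical envelope a shard could write instead of a bare duration map.
interface ShardOutput {
  runId: string;
  shardIndex: number;
  durations: Record<string, number>; // test file path -> duration in ms (assumed shape)
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// True only for an envelope produced by the current run.
function isFreshShardOutput(value: unknown, expectedRunId: string): value is ShardOutput {
  if (!isRecord(value)) return false;
  if (value.runId !== expectedRunId) return false;
  if (typeof value.shardIndex !== 'number') return false;
  if (!isRecord(value.durations)) return false;
  return Object.values(value.durations).every(d => typeof d === 'number');
}

// Merge the raw JSON text of each shard file, skipping unparseable or stale files.
function mergeFreshShards(rawFiles: string[], expectedRunId: string): Record<string, number> {
  const merged: Record<string, number> = {};
  for (const raw of rawFiles) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      continue; // not valid JSON: ignore this shard
    }
    if (!isFreshShardOutput(parsed, expectedRunId)) continue;
    Object.assign(merged, parsed.durations);
  }
  return merged;
}
```

A simpler alternative would be to download the previous balance to a different path than the one the processor writes, so a shard that exits early has nothing to upload.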




The single-runner balancer took ~35 minutes to run the entire Jest suite sequentially, and it often failed because of flaky tests. This splits the work across 8 workers using the same sharding mechanism as the frontend CI (jest-test-config.sh + CI_NODE_TOTAL/CI_NODE_INDEX env vars), so the jest.config.ts balancing logic is reused without modification.
It runs faster now, and failing tests are omitted from the balance metrics, so we'll always have up-to-date balance data for non-flaky tests. I think a side effect is that flakes will sort to the bottom of the list, so people will find out later that a flake caused CI to fail. That isn't ideal, but we can address it separately in frontend.yml or somewhere similar if it becomes visible.
Architecture:
- jest-config job: reuses jest-test-config.sh to list all test files and compute the shard matrix. For schedule/workflow_dispatch events the script always produces 8 runners with the full test list.
- jest-balance job (8x matrix): each shard downloads jest-test-files.json (so jest.config.ts can split tests) and the previous run's jest-balance.json (so the split is duration-aware rather than naive alphabetical). The resultsProcessor writes per-shard timing data.
- combine-balance job: merges the 8 per-shard JSON files into a single jest-balance.json artifact that downstream consumers (frontend.yml) fetch for their own balanced sharding.

The combine job includes setup-node-pnpm because combine.ts uses Node's native TypeScript stripping (requires Node 24 from .node-version).
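To illustrate what "duration-aware rather than naive alphabetical" can mean in practice, here is a minimal sketch of a greedy split. It is not the logic in jest.config.ts, which this PR leaves untouched; `balanceShards`, the `Balance` shape, and the average-duration fallback for tests without history are all assumptions made for the example.

```typescript
// Assumed shape of jest-balance.json: test file path -> duration in ms.
type Balance = Record<string, number>;

// Greedy "longest first" split: sort files by known duration (descending),
// then hand each file to the shard with the smallest accumulated load.
function balanceShards(testFiles: string[], balance: Balance, total: number): string[][] {
  if (!Number.isInteger(total) || total < 1) {
    throw new Error(`total must be a positive integer, got ${total}`);
  }

  // Files with no recorded timing get the average known duration (an assumption).
  const known = testFiles
    .map(file => balance[file])
    .filter((d): d is number => typeof d === 'number');
  const fallback = known.length > 0 ? known.reduce((a, b) => a + b, 0) / known.length : 1;

  const weighted = testFiles
    .map(file => ({file, duration: balance[file] ?? fallback}))
    // Tie-break on the path so every shard computes the same ordering.
    .sort((a, b) => b.duration - a.duration || a.file.localeCompare(b.file));

  const shards = Array.from({length: total}, () => ({files: [] as string[], load: 0}));
  for (const {file, duration} of weighted) {
    let target = shards[0];
    for (const shard of shards) {
      if (shard.load < target.load) target = shard;
    }
    target.files.push(file);
    target.load += duration;
  }
  return shards.map(shard => shard.files);
}
```

Under this sketch, a shard would take `balanceShards(files, balance, total)[index]`, with `total` and `index` parsed from CI_NODE_TOTAL and CI_NODE_INDEX. Because the split is deterministic, all 8 shards arrive at the same partition without coordinating.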
Also renames index.js → resultsProcessor.ts (now TypeScript; failing tests no longer throw and are instead skipped in the output) and adds combine.ts for the merge step.
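For readers who haven't opened the diff, the skip-on-failure and merge behavior might look roughly like the sketch below. This is not the actual contents of resultsProcessor.ts or combine.ts. The local interfaces mirror only a small subset of the fields Jest reports per test file (`testFilePath`, `numFailingTests`, `testExecError`, `perfStats`), and `toShardBalance` and `combineBalances` are hypothetical names.

```typescript
// Minimal local stand-ins for the parts of Jest's aggregated result used here.
interface FileResultLike {
  testFilePath: string;
  numFailingTests: number;
  testExecError?: unknown;
  perfStats: {start: number; end: number};
}
interface AggregatedResultLike {
  testResults: FileResultLike[];
}

// Assumed output shape: test file path -> duration in ms.
type ShardBalance = Record<string, number>;

// Record timings for passing files only; failing files are skipped, not thrown on.
function toShardBalance(results: AggregatedResultLike): ShardBalance {
  const balance: ShardBalance = {};
  for (const result of results.testResults) {
    if (result.numFailingTests > 0 || result.testExecError) continue;
    balance[result.testFilePath] = result.perfStats.end - result.perfStats.start;
  }
  return balance;
}

// Merge per-shard maps into one, with keys sorted so the artifact is stable.
function combineBalances(shards: ShardBalance[]): ShardBalance {
  const merged: ShardBalance = {};
  for (const shard of shards) Object.assign(merged, shard);
  const sorted: ShardBalance = {};
  for (const key of Object.keys(merged).sort()) {
    sorted[key] = merged[key];
  }
  return sorted;
}
```

Since each test file runs on exactly one shard, the per-shard maps should not share keys, so the order in which they are merged does not matter.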