fix(core): wait for replay propagation before suspending#1961
fix(core): wait for replay propagation before suspending#1961karthikscale3 wants to merge 14 commits intomainfrom
Conversation
Adds an e2e regression test that exercises 50 concurrent items each running two sequential steps (search → addResult). The pattern produces the timing skew that triggers the scheduleWhenIdle premature-suspension race seen in production runs (e.g. wrun_01KQ05J17ZJHGZFRYZ20QM1DBS, where 250 steps completed cleanly server-side but the workflow still failed with WorkflowRuntimeError "Unconsumed event in event log" due to scheduleWhenIdle firing WorkflowSuspension before the addResult callback could be registered). The race only manifests reliably when flow handlers run across separate function invocations, so this test should be evaluated against a real Vercel deployment (which CI does for the nextjs-turbopack matrix entry). Co-authored-by: Cursor <cursoragent@cursor.com>
🦋 Changeset detectedLatest commit: d48b6df The changes in this PR will be included in the next version bump. This PR includes changesets to release 18 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
📊 Benchmark Results
workflow with no steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) workflow with 1 step💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) workflow with 10 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) workflow with 25 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) workflow with 50 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) Promise.all with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) Promise.all with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) Promise.all with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Nitro Promise.race with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) Promise.race with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Nitro Promise.race with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) workflow with 10 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) workflow with 25 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) workflow with 50 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) workflow with 10 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) workflow with 25 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) workflow with 50 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Nitro Stream Benchmarks (includes TTFB metrics)workflow with stream💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) stream pipeline with 5 transform steps (1MB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) 10 parallel streams (1MB each)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) fan-out fan-in 10 streams (1MB each)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) SummaryFastest Framework by WorldWinner determined by most benchmark wins
Fastest World by FrameworkWinner determined by most benchmark wins
Column Definitions
Worlds:
❌ Some benchmark jobs failed:
Check the workflow run for details. |
🧪 E2E Test Results✅ All tests passed Summary
Details by Category✅ ▲ Vercel Production
✅ 💻 Local Development
✅ 📦 Local Production
✅ 🐘 Local Postgres
✅ 🪟 Windows
✅ 📋 Other
|
Replaces the simple 50-item × 2-step pattern with the production failure pattern: 80 concurrent items each running 5 nested waves of steps (parallel search reps → sequential addResult → sequential getProjectResults → parallel exa-source loop → sequential getToday + parallel fetchStatus). A few items per wave 1 are stragglers whose searchStep lags 10-15s behind the rest of the batch, mirroring the T97/T9T/T9V pattern from production run wrun_01KQ05J17ZJHGZFRYZ20QM1DBS. This timing skew is what triggers scheduleWhenIdle to fire WorkflowSuspension in the gap between fast hydrations completing (pendingDeliveries → 0) and the next useStep callback registering, leaving the next-wave step's step_created event unclaimed → WorkflowRuntimeError. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Adds focused unit tests for scheduleWhenIdle's fast and deferred paths, plus a local REPLAY_PROPAGATION_DELAY_MS alias and a doc note explaining why the deferred timer does not need cancellation when a follow-up useStep registers mid-wait. Co-authored-by: Cursor <cursoragent@cursor.com>
Move 96_many_steps.ts to the canonical workbench/example/workflows location and add the matching symlinks in every workbench whose workflows directory is a real directory (nextjs-turbopack, nextjs-webpack, nitro-v3, sveltekit). Adapters whose workflows directory is itself a symlink chain (express/fastify/hono/nitro/ nitro-v2/nuxt/tanstack-start/vite via nitro-v3, astro via sveltekit, nest via example) pick the file up automatically through the chain. Drop the `test.skipIf(APP_NAME !== 'nextjs-turbopack')` guard so every adapter in the e2e matrix exercises the regression workflow. Co-authored-by: Cursor <cursoragent@cursor.com>
Switches private.test.ts to vi.useFakeTimers + advanceTimersByTimeAsync so the state-machine assertions are deterministic and instant rather than wall-clock dependent. Uses a getter-based pendingDeliveries stub to avoid sinon-fake-timers' loopLimit when polling iterations would otherwise persist across many 0ms re-polls. Also adds a TODO next to REPLAY_PROPAGATION_DELAY_MS pointing at a future deterministic VM-resumption-in-flight counter so the 100ms heuristic isn't load-bearing forever. Co-authored-by: Cursor <cursoragent@cursor.com>
Keep the high-concurrency replay stress case on the representative Next.js Turbopack lane after CI showed the full adapter matrix timing out or failing under the added load. Co-authored-by: Cursor <cursoragent@cursor.com>
Limit the high-concurrency replay regression to the Vercel-backed Next.js Turbopack lane so local and Windows matrices do not time out under the stress workload. Co-authored-by: Cursor <cursoragent@cursor.com>
Reduce the regression workload to keep the Vercel Turbopack lane mergeable while preserving the same multi-wave replay shape. Co-authored-by: Cursor <cursoragent@cursor.com>
TooTallNate
left a comment
There was a problem hiding this comment.
Comment-only review
Looks good overall — fix logic is sound and the test setup is well-instrumented. A few small observations, none blocking.
What I verified
Diagnosis matches the fix. The race-window argument in the PR description is exactly right: between when host-side delivery decrements pendingDeliveries and when the workflow VM re-enters and registers the next-wave useStep callback, there's a synchronous window where pendingDeliveries === 0 but the workflow is not idle. The fix mirrors the existing pattern in EventsConsumer.deferredCheck (packages/core/src/events-consumer.ts:130-150) and reuses the same DEFERRED_CHECK_DELAY_MS = 100 budget — keeping the two propagation guards in sync via a single source of truth, with a TODO for replacing the wall-clock heuristic with a deterministic counter later. Sensible.
State-machine traversal: walked through each combination (fast path / observed-then-drained / drained-then-reappeared mid-wait) and the logic is correct. runWhenStillIdle does the queue → setTimeout(0) → queue dance to flush cross-VM microtasks before declaring idle, and fireWhenReady adds the propagation delay only when sawPendingDeliveries is set, preserving the fast suspension path for ordinary new-work scheduling.
Tests pass locally: 944 / 944 in @workflow/core, 3 / 3 in the new private.test.ts. The fake-timer harness is well-thought-out — the stubPendingDeliveries getter-with-sequence pattern sidesteps the loopLimit issue with infinite 0ms polling, and the DRAIN_MS = floor(DEFERRED_CHECK_DELAY_MS / 2) constant is the right approach to drive the microtask chain past internal hops without crossing the propagation boundary.
E2E regression is appropriately scoped: test.skipIf(APP_NAME !== 'nextjs-turbopack' || !WORKFLOW_VERCEL_ENV) keeps a known-flaky stress workflow off the local matrix while still exercising the production-shaped replay race on the representative adapter. The PR description's transparency about prior matrix-wide failures (timeouts on local/Windows lanes, Cannot read properties of undefined (reading 'map') on some adapters) is appreciated.
CI
107 / 107 of the test-matrix checks pass. The one red check, Benchmark Vercel (express), fails on consumeAndVerifyStreams from 97_bench.ts with Stream 5 correctness failure: expected 1048576 bytes, got 917504 — a stream truncation in express's bench workflow, unrelated to scheduleWhenIdle. Benchmark Vercel (nitro-v3) and Benchmark Vercel (nextjs-turbopack) pass on the same run, so this is an express-specific bench flake, not a regression from this PR.
Inline comments
Three small non-blocking observations — see inline.
Suggestion for landing
14 commits is a bit much for a "fix one bug + add one test" PR (most of the churn is iterating on e2e gating + scaling down the stress workflow). Worth squashing on merge so the eventual main history shows just fix(core): wait for replay propagation before suspending plus the test commit, rather than the dev-time exploration. GitHub's "Squash and merge" defaults to using the PR title, which already reads cleanly.
| } else { | ||
| fn(); | ||
| } | ||
| }, REPLAY_PROPAGATION_DELAY_MS); |
There was a problem hiding this comment.
Minor: this sawPendingDeliveries = true is unreachable. fireWhenReady early-returns when sawPendingDeliveries is false, so by the time this setTimeout callback fires, the flag is already true. Can drop the assignment for clarity.
| }, REPLAY_PROPAGATION_DELAY_MS); | |
| setTimeout(() => { | |
| if (ctx.pendingDeliveries > 0) { | |
| ctx.promiseQueue.then(() => { | |
| setTimeout(check, 0); | |
| }); | |
| } else { | |
| fn(); | |
| } | |
| }, REPLAY_PROPAGATION_DELAY_MS); |
Non-blocking — same observation applies regardless of how the assignment ordering shakes out.
| await vi.advanceTimersByTimeAsync(DEFERRED_CHECK_DELAY_MS * 2); | ||
| expect(fn).toHaveBeenCalledTimes(1); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
Heads up — the PR description says these tests cover four cases:
(a) the fast path firing immediately when no deliveries are observed, (b) the deferred path firing after the propagation window when deliveries were observed, (c) re-looping when
pendingDeliveriesreappears mid-wait, and (d) continued polling while deliveries persist.
…but only (a), (b), and (c) are actually here. (d) — the multi-iteration polling case where pendingDeliveries stays > 0 across several check rounds before finally draining — isn't tested. Either:
- Add a test that drives
[2, 1, 1, 0, 0]or similar and assertsfndoesn't fire until the late0s - Or drop case (d) from the PR description
Non-blocking; just keeps the description accurate.
| } | ||
| } | ||
|
|
||
| export async function concurrentMultiWaveWorkflow() { |
There was a problem hiding this comment.
Tiny observation about the symlink set: this file gets symlinked into nextjs-turbopack, nextjs-webpack, nitro-v3, and sveltekit, but the concurrentMultiWaveWorkflow e2e test below in e2e.test.ts skips unless APP_NAME === 'nextjs-turbopack' && WORKFLOW_VERCEL_ENV is set. So in the gated CI matrix today, only the nextjs-turbopack symlink is reachable — the other three are dead weight.
Not really a problem (harmless, and keeps the door open if anyone wants to widen the gate or run it manually from those workbenches). Just flagging in case the symlink set was intended to match the test matrix.
Summary
Fixes a false-positive
WorkflowRuntimeError: Unconsumed event in event logthat can happen during high-concurrency replay when many parallel branches advance into follow-up sequential steps.Problem
pendingDeliveriescan briefly drop to0after a step result finishes hydrating, before the workflow VM has resumed across the VM boundary and registered callbacks for the next wave ofuseStepcalls.scheduleWhenIdletreated that transient0as truly idle and firedWorkflowSuspensionimmediately. In that case replay could abort before the next-wave step callback was registered, leaving an existingstep_createdevent unclaimed. The deferred unconsumed-event check then failed the run withWorkflowRuntimeErroreven though the event log and step execution were valid.Fix
Update
scheduleWhenIdleso idle suspension waits for replay delivery propagation before firing:promiseQueueto drain.setTimeout(0)so cross-VM promise continuations can run.promiseQueueagain.pendingDeliveriesand loop if new deliveries appeared.EventsConsumer'sDEFERRED_CHECK_DELAY_MS) before suspending, then re-checkpendingDeliveriesone more time.The non-zero deferred delay only applies after
scheduleWhenIdlehas seen replay deliveries in flight. That gives follow-upuseStepcallbacks time to register after hydrated results cross the VM boundary, while preserving the fast suspension path for ordinary new-work scheduling where no replay delivery was active.Latency note
The added wait is bounded to "idle cycles that observed deliveries", which in practice is at most one per replay round per scheduling site, not per step. Cold-start single-step workflows and ordinary new-work suspensions are unaffected. Worth watching CI/prod data, but it should not be a meaningful added cost relative to the I/O involved in writing
step_createdevents.Why no cancellation hook
Unlike
EventsConsumer's deferred unconsumed-event check, this propagation timer is intentionally not cancelled when a follow-upuseStep/hook/sleep registers during the wait. If a callback arrives mid-wait and consumes the pending*_createdevent, the suspension still fires after the delay, but it is harmless: the matching invocation already hashasCreatedEvent=true, so the suspension handler does not re-create the step and the run simply continues replay from the persisted log.Tests
packages/core/src/private.test.ts) for thescheduleWhenIdlestate machine: covers (a) the fast path firing immediately when no deliveries are observed, (b) the deferred path firing after the propagation window when deliveries were observed, (c) re-looping whenpendingDeliveriesreappears mid-wait, and (d) continued polling while deliveries persist.96_many_steps.ts::concurrentMultiWaveWorkflowwith concurrent branches and multiple sequential waves. The workflow lives inworkbench/example/workflows/(canonical) with symlinks across the rest of the matrix.nextjs-turbopacklane (APP_NAME=nextjs-turbopackwithWORKFLOW_VERCEL_ENVset), which is the representative adapter/environment for the production-shaped replay race. What we observed in CI when it ran more broadly: local dev/prod/postgres and Windows lanes timed out at the 600s test limit with the workflow still running, while some full-matrix Vercel adapter lanes failed inside the stress workflow withCannot read properties of undefined (reading 'map'). Even on Vercel Turbopack, the original 45-item x 3-rep workload could run for ~40 minutes, so the regression now uses a smaller 12-item x 2-rep workload with 2-3s stragglers to preserve the multi-wave replay shape without making required CI depend on an oversized stress run.pnpm --filter @workflow/core buildandpnpm --filter @workflow/core testpass locally.