feat(webapp): mollifier API GET read-fallback — synthetic primitives + route wiring by d-cs · Pull Request #3755 · triggerdotdev/trigger.dev

d-cs · 2026-05-26T10:48:37Z

Summary

Synthesise QUEUED/FAILED responses from the mollifier buffer when a TaskRun row hasn't landed in Postgres yet. Wires the synthesis into:

ApiRetrieveRunPresenter
v1 trace GET route
v1 spans GET route
attempts route gains a GET loader (fixes pre-existing Remix "no loader" 400)

The readFallback infra itself lives on the trigger PR (consumed by IdempotencyKeyConcern); this PR adds the route-level synthetic-rendering primitives.

Stacked on the replay PR.

Test plan

`pnpm run typecheck --filter webapp` passes
`pnpm run test --filter webapp test/mollifierSyntheticRedirectInfo.test.ts` passes
`pnpm run test --filter webapp test/mollifierSyntheticSpanRun.test.ts` passes

changeset-bot · 2026-05-26T10:48:42Z

🦋 Changeset detected

Latest commit: 645b231

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 32 packages

Name	Type
@trigger.dev/redis-worker	Minor
@internal/run-engine	Patch
@internal/schedule-engine	Patch
@trigger.dev/build	Minor
@trigger.dev/core	Minor
@trigger.dev/plugins	Minor
@trigger.dev/python	Minor
@trigger.dev/react-hooks	Minor
@trigger.dev/rsc	Minor
@trigger.dev/schema-to-json	Minor
@trigger.dev/sdk	Minor
@trigger.dev/database	Minor
@trigger.dev/otlp-importer	Minor
@trigger.dev/rbac	Minor
trigger.dev	Minor
references-ai-chat	Patch
d3-chat	Patch
references-d3-openai-agents	Patch
@internal/cache	Patch
@internal/clickhouse	Patch
@internal/llm-model-catalog	Patch
@internal/redis	Patch
@internal/replication	Patch
@internal/testcontainers	Patch
@internal/tracing	Patch
@internal/tsql	Patch
@internal/zod-worker	Patch
references-nextjs-realtime	Patch
references-realtime-hooks-test	Patch
references-realtime-streams	Patch
@internal/sdk-compat-tests	Patch
references-telemetry	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

coderabbitai · 2026-05-26T10:48:44Z

Note

Currently processing new changes in this PR. This may take a few minutes, please wait...

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: bd18af83-0c36-4ab2-aa68-982e821be836

📥 Commits

Reviewing files that changed from the base of the PR and between 4745754 and 645b231.

📒 Files selected for processing (36)

.changeset/mollifier-drainer-terminal-failure-callback.md
.server-changes/mollifier-drainer-replay.md
.server-changes/mollifier-reads.md
apps/webapp/app/entry.server.tsx
apps/webapp/app/env.server.ts
apps/webapp/app/presenters/v3/ApiRetrieveRunPresenter.server.ts
apps/webapp/app/routes/api.v1.runs.$runId.events.ts
apps/webapp/app/routes/api.v1.runs.$runId.spans.$spanId.ts
apps/webapp/app/routes/api.v1.runs.$runId.trace.ts
apps/webapp/app/runEngine/services/triggerFailedTask.server.ts
apps/webapp/app/v3/mollifier/mollifierDrainer.server.ts
apps/webapp/app/v3/mollifier/mollifierDrainerHandler.server.ts
apps/webapp/app/v3/mollifier/mollifierStaleSweep.server.ts
apps/webapp/app/v3/mollifier/mollifierStaleSweepState.server.ts
apps/webapp/app/v3/mollifier/mollifierTelemetry.server.ts
apps/webapp/app/v3/mollifier/syntheticApiResponses.server.ts
apps/webapp/app/v3/mollifier/syntheticRedirectInfo.server.ts
apps/webapp/app/v3/mollifier/syntheticSpanRun.server.ts
apps/webapp/app/v3/mollifier/syntheticTrace.server.ts
apps/webapp/app/v3/mollifierStaleSweepWorker.server.ts
apps/webapp/test/mollifierDrainerHandler.test.ts
apps/webapp/test/mollifierDrainerWorker.test.ts
apps/webapp/test/mollifierReadFallback.test.ts
apps/webapp/test/mollifierStaleSweep.test.ts
apps/webapp/test/mollifierSynthesiseFoundRun.test.ts
apps/webapp/test/mollifierSyntheticApiResponses.test.ts
apps/webapp/test/mollifierSyntheticRedirectInfo.test.ts
apps/webapp/test/mollifierSyntheticSpanRun.test.ts
apps/webapp/test/mollifierSyntheticTrace.test.ts
internal-packages/run-engine/src/engine/index.ts
internal-packages/run-engine/src/engine/tests/createCancelledRun.test.ts
internal-packages/run-engine/src/engine/tests/createFailedTaskRun.test.ts
packages/redis-worker/src/mollifier/buffer.ts
packages/redis-worker/src/mollifier/drainer.test.ts
packages/redis-worker/src/mollifier/drainer.ts
packages/redis-worker/src/mollifier/index.ts

 _____________________________
< I wish to make a complaint. >
 -----------------------------
  \
   \   (\__/)
       (•ㅅ•)
       / 　 づ

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch mollifier-phase-3-reads

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Five Devin findings on PR #3755: - ApiRetrieveRunPresenter.synthesiseFoundRunFromBuffer was assigning buffered.friendlyId to the FoundRun.id field. The id column on PG is the internal cuid; downstream log correlation reads taskRun.id and was getting the friendly token instead. Fixed to read the cuid that readFallback already derives via RunId.fromFriendlyId. - api.v1.runs.$runId.trace.ts buffered branch hardcoded isPartial: true. Cancelled is a terminal state — the sibling spans route and syntheticTrace already gate this on !isCancelled. Match. - synthesisePayload helper replaces the silent typeof === "string" coercion. Object-shaped payloads now JSON- stringify (matching how the trigger path would serialise them) with a warn log so format drift is visible. Truly unserialisable inputs fall back to "" with an error log instead of silently dropping. - syntheticRedirectInfo now uses deserialiseMollifierSnapshot (the webapp-side wrapper) instead of deserialiseSnapshot from the redis-worker package directly. Both share the same implementation today, but pinning the wrapper means the two read-side modules can't drift if the snapshot encoding ever changes (e.g. msgpack). - attempts route loader verifies the run belongs to the authenticated environment (PG-first, buffer fallback) before returning the parity-empty list. Other run-scoped endpoints (spans, trace, retrieve) 404 cross-env; matching that closes the exists-vs-doesn't-exist side channel. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…tage in mollifier drainer The mollifier drainer's cancel bifurcation called engine.createCancelledRun without handling its documented conflict contract: when the normal trigger replay path races ahead and materialises a live (non-CANCELED) row, the engine throws a conflict so the caller can "decide between engine.cancelRun() and skipping". The handler did neither — the conflict propagated, isRetryablePgError returned false, and the drainer buffer.fail()'d the entry, silently losing the cancellation while the run kept executing. Now route conflicts to engine.cancelRun() so the cancel actually wins. Separately, when engine.trigger fails non-retryably and the SYSTEM_FAILURE fallback write then fails because PG is transiently unreachable, rethrowing the original non-retryable error made the drainer buffer.fail() the entry — losing the run with no PG row ever landing, and dropping the write error entirely. Rethrow the retryable write error instead so the drainer requeues; the failure row lands once PG recovers. Non-retryable write failures still rethrow the original error as before. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Remove plan-tracking shorthand (Q# bifurcation, Phase C1/Q4) from replay-layer mollifier comments and test names; reword to plain English. Comment/test-name only; no behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…labels The mollifier queue moved from ZSET to LIST in an earlier refactor, but two comments still described it as a ZSET (`mollifierStaleSweep.server.ts:9` and `env.server.ts:1104` — both narrating the periodic stale sweep). Update to LIST. Also clean four residual internal plan-doc labels left over from prior cleanup passes: - `createCancelledRun` docstring (`engine/index.ts`) referenced "Q4 mollifier- cancel design" and "F4 bypass" — both dead nomenclature now that the gate's C1/C3/F4 labels have been rewritten. Restate the waitpoint-skip rationale in plain English: the mollifier gate refuses to buffer triggerAndWait children, so a cancelled buffered run never has a waiting parent to unblock. - `createCancelledRun.test.ts` empty-tags regression dropped "Found while running the Phase F challenge suite." — the comment describes the bug itself, which is self-contained. - `mollifierStaleSweep.test.ts` "scans across multiple orgs" rephrased away from "Phase-3 design has org-level fairness"; the prose now states the invariant directly. Comment/docstring-only; no behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous sweep was unbounded along two dimensions: every tick walked every org and every env (via buffer.listOrgs + listEnvsForOrg). At the sweep's default per-env entry cap of 1000, an incident-scale fan-out gave O(orgs * envs * 1000) Redis round-trips per tick — running far longer than the 5-minute interval and triggering the inFlight guard to drop every subsequent tick until the slow pass finished. Shard the work via a durable cursor: - New file `mollifierStaleSweepState.server.ts` owns three Redis keys (`mollifier:stale_sweep:{cursor,org_list,counts}`), all under the mollifier namespace but separated from the buffer's own state. The state class has its own Redis client; the buffer's existing `MollifierBuffer` API surface is untouched. - On `cursor === 0` the org list is rebuilt by snapshotting `buffer.listOrgs()` into the frozen LIST — the cycle's frozen view. - Each tick consumes up to `maxOrgsPerPass` orgs (default 100), processes them, and advances the cursor. When the cursor reaches the end of the LIST it wraps to 0; the next tick rebuilds and starts the next cycle. - The per-env counts HASH is the source of truth for the gauge snapshot. Visiting an env with zero stale entries HDEL's its hash field — gauge clears immediately on revisit. Envs not revisited this tick keep their last-known value (durability across ticks AND across webapp restarts), accepting a worst-case lag of one full cursor cycle before a no-longer-stale env clears. Snapshot contract change: only envs with non-zero stale counts appear in the reported `Map`. The telemetry layer (`mollifierTelemetry.server.ts` `reportStaleEntrySnapshot`) sums values, so absence is equivalent to zero for the gauge — the alert behaviour is unchanged. Tests: - New: "shards work across ticks: cursor advances by maxOrgsPerPass and wraps after a full cycle" — drives a 5-org fixture with cap=2, asserts the cursor's three-tick progression and wrap. - New: "clears an env from the durable snapshot on revisit when it has entries but none currently stale" — same entry flips stale→not-stale by varying the sweep's `now`, asserts HDEL on revisit. - Existing tests updated to inject `state`; one assertion shape rewritten ("snapshot omits envs that have entries but none stale") to match the new HDEL semantics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Five gaps in the sharded sweep's test coverage, added 9 new tests covering each: - State durability across process restart: state1 populates the cursor + counts hash, closes its Redis client (simulated webapp restart), state2 constructs against the same Redis and picks up exactly where state1 left off. This is the headline benefit of storing sweep state in Redis instead of process memory; without this test it could silently regress. - Cycle wrap rebuilds the org list: a third org joins between cycles and is visible only in the next cycle's snapshot. Pins the rebuildOrgList-on-cursor=0 contract. - Empty buffer (no orgs) advances cleanly with zero work, empty snapshot, cursor stays at 0 instead of tripping the wrap math. - Buffer-null branch's clearAll: previously asserted only "snapshot is empty"; now also asserts the durable state was actually wiped (cursor=0, counts hash empty) so a re-enable doesn't resume on a stale cursor. - MollifierStaleSweepState direct unit tests (5 tests): readCursor default + corrupt-value tolerance, writeCursor round-trip, rebuildOrgList replace-not-append semantics, setEnvStaleCount HSET vs HDEL, clearAll DELs all three keys. Suite total: 7 existing + 9 new = 16 tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…on terminal failure Two Devin review findings on PR #3754, both real and unresolved: 1. Sharded stale sweep's counts hash never cleared for fully-drained envs — gauge stayed permanently elevated, false-alerting the recommended `> 0 for 5m` rule. Root cause: when an env's last buffered entry is popped, the buffer's atomic Lua removes the env from `mollifier:org-envs:${orgId}` (and removes the org from `mollifier:orgs` if it has no other envs). The sweep's inner loop walks `buffer.listEnvsForOrg(orgId)`, so the env disappears from the iteration entirely — `setEnvStaleCount(envId, 0)` (which HDELs the field) is never called, and the counts hash retains the env's last-known stale count forever. Fix (Devin's Approach 2): cycle-bounded reconciliation. Add a Redis SET `mollifier:stale_sweep:visited` that the sweep SADDs into for every env it touches. When the cursor wraps (cycle complete), `reconcileVisited()` does `HKEYS counts → SMEMBERS visited → HDEL the difference → DEL visited`. Pipelined; orphans clear within at most one full cursor cycle of the env going quiet, which matches the sharding contract's existing one-cycle freshness window. Test: "evicts fully-drained envs from the counts hash at cycle wrap" — accepts an entry, sweep flags it stale, pops the entry (env vanishes from listEnvsForOrg), runs another sweep that triggers wrap, asserts the env is HDEL'd from both the snapshot and the underlying counts hash. 2. Drainer handler's terminal SYSTEM_FAILURE write dropped the snapshot's `batch` field. If the buffered run was part of a batch, the failure row wasn't associated with the batch and the batch parent's completion tracking could hang indefinitely waiting on a child that landed but isn't visible to the batch. Fix: extract `snapshot.batch` with structural type guards and pass it through to `createFailedTaskRun`. Same defensive pattern as the other snapshot fields in this code path (the snapshot is typed `Record<string, unknown>` because it came from cjson-decoded buffer payload). Test: "propagates the batch association into createFailedTaskRun" — asserts the call site receives `{ id, index }` from the snapshot. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…Redis Devin review on PR #3754: `stop()` previously called `deps.state.close()` immediately after `clearInterval`, but `tick()` only checks `stopped` at entry. A tick that was already past that guard would keep making `state.*` calls against an ioredis client that `stop()` had already `quit()`ed — those calls would throw, the tick's own try/catch would swallow them as `mollifier.stale_sweep.failed` warnings, and every graceful shutdown would emit spurious noise. Track the current tick promise as `currentTick`. `stop()` awaits it (if present) before invoking `state.close()`, so the tick's last state call lands BEFORE the Redis client is quit. The tick's own try/catch handles the (unexpected) case where it rejects; the await in `stop()` is solely for ordering. Also drop the `instanceof MollifierStaleSweepState` guard around `state.close()` — `close()` is part of the `StaleSweepStateStore` contract, so unconditional invocation is correct. Test fake states implement `close()` as a no-op. Test: `stop() waits for an in-flight tick to finish before closing the state` — gates a fake state's `readCursor` on a Deferred, kicks off the interval, waits for the tick to start, then races `stop()` against the gate. Asserts the stop promise stays unresolved while the tick is mid-flight and that the tick's final state operation lands before `close()`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three unresolved Devin threads, all addressed in this commit. Committed locally only — not pushed. 1. `callWithoutTraceEvents` was inheriting the new `emitRunFailedEvent` default of `true`, so its `createFailedTaskRun` call would fire the `runFailed` bus emit and the listener would write a ClickHouse completion event row with empty `traceId`/`spanId` — orphan row, directly contradicting the method's "without trace events" contract. Pass `emitRunFailedEvent: false` and enqueue `PerformTaskRunAlertsService` directly, mirroring the `call()` pattern so customers' ERROR channels still see the failure. 2. The cjson empty-tags defense lived only on `createCancelledRun`, not on `engine.trigger`. When the mollifier buffer's mutate-side Lua re-serialises a payload (e.g. `append_tags` on a buffered run that never had tags), an empty Lua table encodes as `{}` and decodes back to a JS object — and the previous `tags.length === 0` check passes that object straight to Prisma's `String[]` column. Mirror the same `Array.isArray && tags.length > 0 ? tags : undefined` guard `createCancelledRun` already uses. The defense is symmetric with the existing tested case for createCancelledRun, so the same contract holds for the trigger replay path. 3. `runCancelled` handler's `cancelRunEvent` lookup fails for buffered-only runs (no primary trace event exists, since the mollifier gate skipped `repository.traceEvent` for the not-yet-materialised run). The handler's `tryCatch` swallowed the error, but the systematic `[runCancelled] Failed to cancel run event` log fired on every cancelled buffered run. Add `emitRunCancelledEvent: boolean = true` to `createCancelledRun` (symmetric with the existing `emitRunFailedEvent` flag on `createFailedTaskRun`); drainer handler passes `false`. CANCELED PG row still writes; only the trace-event mirror is skipped. Tests: - `RunEngine.createCancelledRun > emitRunCancelledEvent: false suppresses the bus emit but still writes the CANCELED PG row` — pins the new flag's semantics. - `createDrainerHandler > calls createCancelledRun with emitRunCancelledEvent: false (suppresses orphan trace-event log noise)` — pins the call site's contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ts exhaustion Before: when engine.trigger() threw a retryable PG error (P1001 / P2024 / ECONNRESET) and the drainer exhausted TRIGGER_MOLLIFIER_DRAIN_MAX_ATTEMPTS (default 3), MollifierDrainer.processEntry called buffer.fail() directly, which atomically marks the entry FAILED and DELs the hash. No PG row was ever written, so the customer's run silently disappeared. This fix introduces a new optional `onTerminalFailure` callback on MollifierDrainerOptions that fires before buffer.fail() on any terminal path. The webapp wires it to a wrapper around the existing createFailedTaskRun(...) snapshot block from the non-retryable path — factored into a shared writeMollifierTerminalFailureRow helper so both paths land an identical SYSTEM_FAILURE row. Behaviour: - Non-retryable error: handler writes the row inline (unchanged), then returns normally. Drainer ack()s. - Retryable error, attempts left: drainer requeues (unchanged). - Retryable error, attempts exhausted: drainer invokes onTerminalFailure with cause: "max-attempts-exhausted"; callback writes the SYSTEM_FAILURE row, then drainer calls buffer.fail(). - Callback itself throws retryable: drainer requeues so the next tick gets another shot at the PG write. - Callback throws non-retryable: log and fall through to buffer.fail so a genuinely malformed snapshot doesn't loop forever. Also corrects the stale Lua comment in buffer.ts's failMollifierEntry script that claimed the drainer "has already written a SYSTEM_FAILURE PG row for terminal failures" — only true for the non-retryable path before this change. Includes 5 unit tests in drainer.test.ts covering max-attempts-exhausted, non-retryable, callback-retryable-rerequeue, callback-non-retryable-falls-through, and no-callback back-compat. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…failures `writeMollifierTerminalFailureRow` was calling `createFailedTaskRun` without `emitRunFailedEvent: false`. The engine's `runFailed` event handler then called `completeFailedRunEvent`, which looks up the run's primary trace span. Buffered-only runs never had a primary trace event written for them — the mollifier gate intercepts BEFORE `repository.traceEvent` runs — so the lookup always failed, producing a systematic `[runFailed] Failed to complete failed run event` error log per drainer terminal failure. The errors are caught by the handler's `tryCatch`, but the per-failure noise pollutes Sentry and masks real errors. Mirrors what `TriggerFailedTaskService` already does at `triggerFailedTask.server.ts:212` and `:324`: - Pass `emitRunFailedEvent: false` to suppress the trace-event mirror. - Manually enqueue `PerformTaskRunAlertsService.enqueue(run.id)` so the customer's ERROR channel still fires. Best-effort, logged on failure but does not bubble (the SYSTEM_FAILURE row landing is the load-bearing customer-visible outcome). Also tightens the helper's return type from `boolean` to `TaskRun | null` so the alert side has the run id without an extra lookup. Both existing call sites (inline non-retryable + max-attempts-exhausted callback) keep working — the `if (!wrote)` truthy check is identical under `TaskRun | null`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…RE fallback on cancel-bifurcation Two fixes from CodeRabbit on PR #3754: 1. `TRIGGER_MOLLIFIER_STALE_SWEEP_ENABLED` was inheriting `TRIGGER_MOLLIFIER_ENABLED`, so any deployment already running the mollifier would auto-start the sweep worker on upgrade — defeating the point of having a separate kill switch. Hard-default to "0" so the sweep is an explicit ops decision. 2. In the cancel-bifurcation path, a non-conflict + non-retryable error from `createCancelledRun` rethrew out of the handler. The drainer's `onTerminalFailure` handler gates on `cause === "max-attempts-exhausted"` and skips "non-retryable", so `buffer.fail()` deleted the entry without ever writing a PG row — the cancelled run disappeared silently. Mirror the SYSTEM_FAILURE fallback the non-cancel trigger path already uses: write a terminal row via the shared helper, falling back to the original throw only when the write also fails non-retryably (and re-throw retryable write errors so the drainer requeues). UX trade: the customer initiated a cancel but sees SYSTEM_FAILURE if the cancel write itself was structurally rejected. Terminal-but- mislabelled beats terminal silence — the alternative was the run disappearing from the dashboard entirely. The path is narrow (only non-conflict, non-retryable createCancelledRun failures). Two new unit tests cover the cancel branch: non-retryable creates the SYSTEM_FAILURE row + ACKs the entry; retryable rethrows so the drainer requeues. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… slice `MollifierStaleSweepState.readOrgListSlice` was logging-and-returning `{ orgs: [], total: 0 }` on pipeline-abort or per-result Redis errors. The caller (`runStaleSweepOnce`) treated that as a clean empty cycle: wrote cursor=0, reconciled visited envs against the empty result, and cleared the stale-entry gauge — silencing the alerts the sweep exists to raise. Re-throw the underlying error instead. The interval wrapper's catch logs `stale_sweep.failed` for the failed tick; durable cursor + counts hash stay untouched so the gauge keeps reporting its last-known value until a healthy tick repopulates it. New unit test on the caller contract pins this: error propagates, cursor stays at its seeded value, counts hash retains the seeded env, no snapshot is reported. Separately documents a known limitation in the .server-changes entry: the sweep runs per-webapp-instance, so its stale-entry counter multiplies by N webapps in HA until a distributed lease lands as a follow-up. The system is HA-safe (Redis ops are atomic, no torn state); only the metric output is inflated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…+ route wiring Synthesise QUEUED/FAILED responses from the mollifier buffer when a TaskRun row hasn't landed in Postgres yet. Wires the synthesis into: - ApiRetrieveRunPresenter - v1 trace GET route - v1 spans GET route - attempts route gains a GET loader (fixes pre-existing Remix 400) Stacked on the trigger-time decisions PR. The readFallback infra itself lives on the trigger PR (consumed by IdempotencyKeyConcern); this PR adds the route-level synthetic-rendering primitives. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the ad-hoc \`as Record<string, unknown>\` + \`typeof === "string"\` checks in \`findBufferedRunRedirectInfo\` with a Zod \`safeParse\` against a schema for the subset of fields the redirect needs (envSlug / projectSlug / orgSlug / optional spanId). Wrong-typed or missing fields now collapse into a single parse-fail branch that logs the structured issue list and returns null. Adds a regression test for the structural-vs-typeof distinction: \`environment.slug: 42\` (number) is now rejected, where the previous \`typeof slug === "string"\` chain would silently accept any string- typed value but had no defence against shape drift in other fields. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…l enum \`SyntheticRun.machinePreset\` is a plain string sourced from the mollifier snapshot, but \`SpanRun.machinePreset\` is the typed \`MachinePresetName\` enum (micro / small-1x / small-2x / medium-1x / medium-2x / large-1x / large-2x). The direct assignment failed \`tsc --noEmit\` and CI typecheck. Validate via \`MachinePresetName.safeParse\` and collapse unknown values to \`undefined\` so a stale preset returned by the buffer doesn't bleed into the UI as a typed-but-unknown value. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Five Devin findings on PR #3755: - ApiRetrieveRunPresenter.synthesiseFoundRunFromBuffer was assigning buffered.friendlyId to the FoundRun.id field. The id column on PG is the internal cuid; downstream log correlation reads taskRun.id and was getting the friendly token instead. Fixed to read the cuid that readFallback already derives via RunId.fromFriendlyId. - api.v1.runs.$runId.trace.ts buffered branch hardcoded isPartial: true. Cancelled is a terminal state — the sibling spans route and syntheticTrace already gate this on !isCancelled. Match. - synthesisePayload helper replaces the silent typeof === "string" coercion. Object-shaped payloads now JSON- stringify (matching how the trigger path would serialise them) with a warn log so format drift is visible. Truly unserialisable inputs fall back to "" with an error log instead of silently dropping. - syntheticRedirectInfo now uses deserialiseMollifierSnapshot (the webapp-side wrapper) instead of deserialiseSnapshot from the redis-worker package directly. Both share the same implementation today, but pinning the wrapper means the two read-side modules can't drift if the snapshot encoding ever changes (e.g. msgpack). - attempts route loader verifies the run belongs to the authenticated environment (PG-first, buffer fallback) before returning the parity-empty list. Other run-scoped endpoints (spans, trace, retrieve) 404 cross-env; matching that closes the exists-vs-doesn't-exist side channel. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The synthetic SpanRun/trace builders for buffered runs hardcoded non-terminal state, so a CANCELED or FAILED buffered run rendered as a healthy in-progress run: - syntheticSpanRun: FAILED now maps to SYSTEM_FAILURE (matching ApiRetrieveRunPresenter.bufferedStatusToTaskRunStatus); isFinished is true for CANCELED/FAILED; isError is true for FAILED; the error block is synthesised as STRING_ERROR and statusReason carries the message. - syntheticSpanRun: drop the empty-string spanId/taskIdentifier relationship stubs (blank task name + misleading `?span=` jump) since the snapshot only carries friendly IDs. - syntheticTrace: FAILED now renders as an errored, non-partial, "failed" root span instead of executing/partial. CANCELED stays "completed", matching RunPresenter's derivation. - tests: cover the CANCELED and FAILED terminal paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…truction Addresses the higher-confidence read-fallback review findings: - attempts GET loader: rebuilt on createLoaderApiRoute so it matches the sibling read routes — accepts JWTs with run/task/tag/batch resource scoping (was bare authenticateApiRequest, rejecting PUBLIC_JWT and doing no scope check), and 404s with `x-should-retry: true` so SDK pollers keep retrying a not-yet-materialised run instead of giving up. - batch reconstruction: the snapshot embeds the batch as `{ id, index }` (engine.trigger shape), but readFallback read a non-existent flat `batchId`, so SyntheticRun.batchId was always undefined. Read it from `snapshot.batch.id` (the internal cuid). synthesiseFoundRunFromBuffer now populates `batch` from it, and the spans/trace buffer-path authorization pushes the batch resource — so batch-scoped JWTs authorise against buffered runs and the retrieve response reports the correct batchId. - metadata: coerce a non-string buffered metadata defensively (JSON stringify + warn) instead of silently dropping to null, mirroring synthesisePayload. In practice metadata is always a string, so this is a no-op guard, but it surfaces format drift to ops. - tests: cover batchId extraction from the nested batch object and its absence for non-batched runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Comments referenced internal phase labels ("Phase A2", "Phase A5") from the development plan rather than describing what the code does. Replaced with self-contained prose; the surrounding explanations were already correct and are preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…derivation syntheticTrace.server.ts shipped without a test file; this adds one, covering the identity-field passthrough, taskIdentifier-and-spanId defaults, the three rootSpanStatus branches (QUEUED→executing, CANCELED→completed, FAILED→failed) with their isPartial/isError/isCancelled flags, the 1ms duration floor, rootStartedAt mapping, and the single-span trace shape (empty events/timelineEvents, empty linkedRunIdBySpanId, undefined overridesBySpanId/queuedDuration). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…utes + align workerQueue default Addresses three Devin review findings on PR #3755: - api.v1.runs.\$runId.spans.\$spanId.ts and api.v1.runs.\$runId.trace.ts: the buffered-run response branch hardcoded isError:false and only checked CANCELED for isPartial, so a FAILED buffered run rendered as "still in progress" — SDK consumers would poll forever. Now derives both flags from CANCELED and FAILED, matching syntheticTrace.server.ts. - ApiRetrieveRunPresenter.synthesiseFoundRunFromBuffer: workerQueue defaulted to "main" while syntheticSpanRun.server.ts uses "". The API response's `region` is sourced via `run.workerQueue || undefined`, so "main" was advertising a region the run hadn't yet been assigned to. Aligned to "" so unassigned buffered runs coerce to region: undefined. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…drop scaffolding GET on attempts Two related cleanups on the mollifier read surface: 1. Extract the buffered-run response bodies for the spans-detail and trace endpoints into pure helpers (apps/webapp/app/v3/mollifier/syntheticApiResponses.server.ts: buildSyntheticSpanDetailBody, buildSyntheticTraceBody). The route bodies were carrying the only copy of the terminal-state derivation (CANCELED / FAILED → isError / isPartial / isCancelled) with no unit coverage; extracting them lets us pin the contract directly. The route files now just authenticate, resolve, validate the spanId, and forward — no body shape logic in routes. 2. Drop the GET loader on api.v1.runs.\$runParam.attempts.ts. It was added in this PR solely to fix a pre-existing Remix "no loader" 400 on a URL no SDK consumer was actually calling, and to give the mollifier-parity script a stable assertion target. The detailed attempt list lives on the v3 retrieve endpoint — the GET was scaffolding rather than product surface, and Devin's review flagged it as such. Reverted to action-only. Tests: 16 cases in apps/webapp/test/mollifierSyntheticApiResponses.test.ts covering QUEUED / CANCELED / FAILED for each body, plus identity and default-field passthrough. Pins the FAILED-terminal-state regression that shipped briefly with isPartial:true. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…pin synthesise contract with regression tests synthesiseFoundRunFromBuffer hardcoded `scheduleId: null`, which dropped the schedule field from the retrieve-API response for any scheduled trigger that landed in the mollifier buffer. Scheduled triggers go through the same TriggerTaskService path as API triggers and the gate doesn't bypass them, so the snapshot does carry scheduleId; the synthesis was just throwing it away. Forward `buffered.scheduleId ?? null` so resolveSchedule() can hydrate the schedule object from PG (the Schedule row exists before the trigger fires). Exported synthesiseFoundRunFromBuffer + the FoundRun type from the presenter and added apps/webapp/test/mollifierSynthesiseFoundRun.test.ts (16 cases). The test file pins the snapshot→FoundRun mapping that previously had no direct coverage — the new scheduleId forwarding plus earlier-session regressions (batch reconstruction, workerQueue default "", FAILED→SYSTEM_FAILURE status mapping, STRING_ERROR shape, defensive metadata coercion, idempotency defaults, execution-state zero defaults). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ns in events endpoint `api.v1.runs.$runId.events.ts` calls `ApiRetrieveRunPresenter.findRun`, which now returns a buffered fallback. The route then proceeds to `getRunEvents` against ClickHouse with `run.traceId` (empty string for buffered runs) and `run.taskEventStore` (`"taskEvent"` default) — a guaranteed-empty round trip, since the mollifier gate intercepts BEFORE any trace event is written. Devin's analysis on PR #3755 flagged this as a wasted ClickHouse round trip per request hitting the events endpoint with a buffered runId. Add an `isBuffered: boolean` flag to `FoundRun`. The PG branch of `findRun` sets `isBuffered: false` on the spread Prisma row; the buffered branch's `synthesiseFoundRunFromBuffer` sets `isBuffered: true`. The events route short-circuits on the flag and returns `{ events: [] }` with status 200, matching the semantically-correct behaviour without the ClickHouse hop. Gating on the explicit flag (rather than probing surrogates like `traceId === ""`) keeps the contract clear for future callers that need the same "PG-only" gate — they can introspect the same field. Includes a unit test in `mollifierSynthesiseFoundRun.test.ts` pinning `isBuffered: true` on the synth. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PG-resident SYSTEM_FAILURE rows always have completedAt populated by the engine, so SDK consumers that stop polling when isCompleted=true and then read finishedAt see a real timestamp. The buffer-synth path was returning completedAt=null for FAILED (only the CANCELED branch filled it from buffered.cancelledAt), producing isCompleted:true + finishedAt:undefined — exactly the inconsistency the synth shape exists to prevent. Fall back to buffered.createdAt for the SYSTEM_FAILURE case (the buffer entry has no separate failedAt; createdAt is the best-available proxy for when the terminal state landed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Symmetric fix to the just-applied `ApiRetrieveRunPresenter` change. PG-resident SYSTEM_FAILURE rows always have `completedAt` populated; the synthetic span run was returning null for FAILED, so the run-detail panel and any caller using `isFinished && completedAt` saw a finished run with no completion timestamp. Fall back to `run.createdAt` when `isFailed` is true (the buffer entry has no separate failedAt — createdAt is the best proxy for when the terminal state landed). Regression locked in `mollifierSyntheticSpanRun.test.ts` — the FAILED test now asserts `completedAt === NOW`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

devin-ai-integration

Devin Review found 1 new potential issue.

View 31 additional findings in Devin Review.

devin-ai-integration · 2026-06-01T12:11:18Z

        },
      },
    });
+
+    if (pgRow) return { ...pgRow, isBuffered: false };
+
+    // Postgres miss → fall back to the mollifier buffer. When the gate
+    // diverted a trigger, the run lives in Redis until the drainer replays
+    // it through engine.trigger. Synthesise the FoundRun shape so call()
+    // returns a `QUEUED` (or `FAILED`) response with empty output, no
+    // attempts, no relations.
+    const buffered = await findRunByIdWithMollifierFallback({
+      runId: friendlyId,
+      environmentId: env.id,
+      organizationId: env.organizationId,
+    });
+
+    if (!buffered) return null;
+
+    return synthesiseFoundRunFromBuffer(buffered);
  }


🚩 Implicit behavioral change: findRun now returns buffered runs for all call sites

The PR modified ApiRetrieveRunPresenter.findRun to include a mollifier buffer fallback. This affects ALL callers of findRun, not just the routes modified in this PR. The api.v3.runs.$runId.ts:19 and api.v1.runs.$runParam.reschedule.ts:66 routes also call findRun but were not updated. For api.v3.runs.$runId.ts, the presenter.call() method handles the synthesised FoundRun correctly (PENDING status skips output processing, resolveSchedule does a PG lookup that may miss but returns undefined gracefully). For the reschedule route, the downstream service would receive a synthetic FoundRun and attempt operations on a run that doesn't exist in PG yet — worth confirming that the reschedule service handles this gracefully (e.g., fails with a clear error rather than corrupting state).

(Refers to lines 107-160)

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration

Devin Review found 1 new potential issue.

View 36 additional findings in Devin Review.

devin-ai-integration · 2026-06-01T12:31:35Z

+      try {
+        await PerformTaskRunAlertsService.enqueue(failedRun.id);
+      } catch (alertsError) {
+        logger.warn("TriggerFailedTaskService: alert enqueue failed", {
+          taskId: request.taskId,
+          friendlyId: failedRun.friendlyId,
+          error:
+            alertsError instanceof Error ? alertsError.message : String(alertsError),
+        });
+      }


🚩 TriggerFailedTaskService now enqueues alerts where it previously didn't

Both call() (line 226) and callWithoutTraceEvents() (line 332) in TriggerFailedTaskService now call PerformTaskRunAlertsService.enqueue(failedRun.id) after createFailedTaskRun. Previously, failed task runs through this service never triggered customer alert channels — the runFailed event was never emitted, and no manual alert enqueue existed. This is an intentional feature addition (the comments explain the reasoning), but it changes the observable behavior: batch trigger failures that exceed queue limits, environment-not-found errors, etc. will now fire ERROR alert channels for customers who have them configured. If rollout needs to be gradual, this could be gated behind a feature flag.

Was this helpful? React with 👍 or 👎 to provide feedback.

`.server-changes/mollifier-reads.md` mentioned an "attempts" endpoint but no `api.v1.runs.$runParam.attempts.ts` changes exist in this PR (it's a POST/action route for creating attempts, not a read endpoint). The actual added endpoints are retrieve, trace, spans, and events — fix the changelog to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

d-cs commented May 26, 2026

View reviewed changes

Comment thread apps/webapp/app/v3/mollifier/syntheticRedirectInfo.server.ts

d-cs self-assigned this May 26, 2026

d-cs force-pushed the mollifier-phase-3-replay branch from 31f4726 to b05929b Compare May 26, 2026 11:12

d-cs force-pushed the mollifier-phase-3-reads branch from 1838229 to af0aeeb Compare May 26, 2026 11:12

d-cs force-pushed the mollifier-phase-3-replay branch from b05929b to b89da52 Compare May 26, 2026 13:24

d-cs force-pushed the mollifier-phase-3-reads branch 3 times, most recently from 0919f7a to f36c576 Compare May 26, 2026 15:12

d-cs force-pushed the mollifier-phase-3-replay branch from 74fdf6d to c6fa61f Compare May 26, 2026 16:20

d-cs force-pushed the mollifier-phase-3-reads branch 2 times, most recently from 047b240 to e57bc5e Compare May 26, 2026 16:32

This comment was marked as resolved.

Sign in to view

d-cs force-pushed the mollifier-phase-3-replay branch from 242ba73 to 6a8404d Compare May 27, 2026 12:04

d-cs force-pushed the mollifier-phase-3-reads branch from f4131eb to d8f6cf7 Compare May 27, 2026 12:04

d-cs force-pushed the mollifier-phase-3-replay branch from 6a8404d to bc9f4e2 Compare May 27, 2026 12:15

d-cs force-pushed the mollifier-phase-3-reads branch from d8f6cf7 to 796a2c0 Compare May 27, 2026 12:15

d-cs force-pushed the mollifier-phase-3-replay branch from bc9f4e2 to 637e8c0 Compare May 27, 2026 12:21

d-cs force-pushed the mollifier-phase-3-reads branch from 796a2c0 to b139391 Compare May 27, 2026 12:21

d-cs force-pushed the mollifier-phase-3-replay branch from 637e8c0 to 65219db Compare May 27, 2026 12:58

d-cs force-pushed the mollifier-phase-3-reads branch from b139391 to d153042 Compare May 27, 2026 12:58

d-cs force-pushed the mollifier-phase-3-replay branch from 65219db to ccdcd9c Compare May 27, 2026 14:06

d-cs force-pushed the mollifier-phase-3-reads branch 2 times, most recently from 0753300 to a1e1ad8 Compare May 27, 2026 15:07

d-cs and others added 27 commits June 1, 2026 12:17

docs(server-changes): document mollifier API GET read-fallback

f650cdf

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

devin-ai-integration Bot reviewed Jun 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(webapp): mollifier API GET read-fallback — synthetic primitives + route wiring#3755

feat(webapp): mollifier API GET read-fallback — synthetic primitives + route wiring#3755
d-cs wants to merge 34 commits into
mainfrom
mollifier-phase-3-reads

d-cs commented May 26, 2026 •

edited

Loading

Uh oh!

changeset-bot Bot commented May 26, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 26, 2026 •

edited

Loading

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

devin-ai-integration Bot Jun 1, 2026

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

devin-ai-integration Bot Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

d-cs commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

changeset-bot Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

coderabbitai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

d-cs commented May 26, 2026 •

edited

Loading

changeset-bot Bot commented May 26, 2026 •

edited

Loading

coderabbitai Bot commented May 26, 2026 •

edited

Loading