Skip to content

feat(webapp): mollifier API GET read-fallback — synthetic primitives + route wiring#3755

Open
d-cs wants to merge 34 commits into
mainfrom
mollifier-phase-3-reads
Open

feat(webapp): mollifier API GET read-fallback — synthetic primitives + route wiring#3755
d-cs wants to merge 34 commits into
mainfrom
mollifier-phase-3-reads

Conversation

@d-cs
Copy link
Copy Markdown
Collaborator

@d-cs d-cs commented May 26, 2026

Summary

Synthesise QUEUED/FAILED responses from the mollifier buffer when a TaskRun row hasn't landed in Postgres yet. Wires the synthesis into:

  • ApiRetrieveRunPresenter
  • v1 trace GET route
  • v1 spans GET route
  • attempts route gains a GET loader (fixes pre-existing Remix "no loader" 400)

The readFallback infra itself lives on the trigger PR (consumed by IdempotencyKeyConcern); this PR adds the route-level synthetic-rendering primitives.

Stacked on the replay PR.

Test plan

  • `pnpm run typecheck --filter webapp` passes
  • `pnpm run test --filter webapp test/mollifierSyntheticRedirectInfo.test.ts` passes
  • `pnpm run test --filter webapp test/mollifierSyntheticSpanRun.test.ts` passes

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 26, 2026

🦋 Changeset detected

Latest commit: 645b231

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 32 packages
Name Type
@trigger.dev/redis-worker Minor
@internal/run-engine Patch
@internal/schedule-engine Patch
@trigger.dev/build Minor
@trigger.dev/core Minor
@trigger.dev/plugins Minor
@trigger.dev/python Minor
@trigger.dev/react-hooks Minor
@trigger.dev/rsc Minor
@trigger.dev/schema-to-json Minor
@trigger.dev/sdk Minor
@trigger.dev/database Minor
@trigger.dev/otlp-importer Minor
@trigger.dev/rbac Minor
trigger.dev Minor
references-ai-chat Patch
d3-chat Patch
references-d3-openai-agents Patch
@internal/cache Patch
@internal/clickhouse Patch
@internal/llm-model-catalog Patch
@internal/redis Patch
@internal/replication Patch
@internal/testcontainers Patch
@internal/tracing Patch
@internal/tsql Patch
@internal/zod-worker Patch
references-nextjs-realtime Patch
references-realtime-hooks-test Patch
references-realtime-streams Patch
@internal/sdk-compat-tests Patch
references-telemetry Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 26, 2026

Note

Currently processing new changes in this PR. This may take a few minutes, please wait...

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: bd18af83-0c36-4ab2-aa68-982e821be836

📥 Commits

Reviewing files that changed from the base of the PR and between 4745754 and 645b231.

📒 Files selected for processing (36)
  • .changeset/mollifier-drainer-terminal-failure-callback.md
  • .server-changes/mollifier-drainer-replay.md
  • .server-changes/mollifier-reads.md
  • apps/webapp/app/entry.server.tsx
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/presenters/v3/ApiRetrieveRunPresenter.server.ts
  • apps/webapp/app/routes/api.v1.runs.$runId.events.ts
  • apps/webapp/app/routes/api.v1.runs.$runId.spans.$spanId.ts
  • apps/webapp/app/routes/api.v1.runs.$runId.trace.ts
  • apps/webapp/app/runEngine/services/triggerFailedTask.server.ts
  • apps/webapp/app/v3/mollifier/mollifierDrainer.server.ts
  • apps/webapp/app/v3/mollifier/mollifierDrainerHandler.server.ts
  • apps/webapp/app/v3/mollifier/mollifierStaleSweep.server.ts
  • apps/webapp/app/v3/mollifier/mollifierStaleSweepState.server.ts
  • apps/webapp/app/v3/mollifier/mollifierTelemetry.server.ts
  • apps/webapp/app/v3/mollifier/syntheticApiResponses.server.ts
  • apps/webapp/app/v3/mollifier/syntheticRedirectInfo.server.ts
  • apps/webapp/app/v3/mollifier/syntheticSpanRun.server.ts
  • apps/webapp/app/v3/mollifier/syntheticTrace.server.ts
  • apps/webapp/app/v3/mollifierStaleSweepWorker.server.ts
  • apps/webapp/test/mollifierDrainerHandler.test.ts
  • apps/webapp/test/mollifierDrainerWorker.test.ts
  • apps/webapp/test/mollifierReadFallback.test.ts
  • apps/webapp/test/mollifierStaleSweep.test.ts
  • apps/webapp/test/mollifierSynthesiseFoundRun.test.ts
  • apps/webapp/test/mollifierSyntheticApiResponses.test.ts
  • apps/webapp/test/mollifierSyntheticRedirectInfo.test.ts
  • apps/webapp/test/mollifierSyntheticSpanRun.test.ts
  • apps/webapp/test/mollifierSyntheticTrace.test.ts
  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/tests/createCancelledRun.test.ts
  • internal-packages/run-engine/src/engine/tests/createFailedTaskRun.test.ts
  • packages/redis-worker/src/mollifier/buffer.ts
  • packages/redis-worker/src/mollifier/drainer.test.ts
  • packages/redis-worker/src/mollifier/drainer.ts
  • packages/redis-worker/src/mollifier/index.ts
 _____________________________
< I wish to make a complaint. >
 -----------------------------
  \
   \   (\__/)
       (•ㅅ•)
       /   づ
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch mollifier-phase-3-reads

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread apps/webapp/app/v3/mollifier/syntheticRedirectInfo.server.ts
@d-cs d-cs self-assigned this May 26, 2026
@d-cs d-cs force-pushed the mollifier-phase-3-replay branch from 31f4726 to b05929b Compare May 26, 2026 11:12
@d-cs d-cs force-pushed the mollifier-phase-3-reads branch from 1838229 to af0aeeb Compare May 26, 2026 11:12
@d-cs d-cs force-pushed the mollifier-phase-3-replay branch from b05929b to b89da52 Compare May 26, 2026 13:24
@d-cs d-cs force-pushed the mollifier-phase-3-reads branch 3 times, most recently from 0919f7a to f36c576 Compare May 26, 2026 15:12
@d-cs d-cs force-pushed the mollifier-phase-3-replay branch from 74fdf6d to c6fa61f Compare May 26, 2026 16:20
@d-cs d-cs force-pushed the mollifier-phase-3-reads branch 2 times, most recently from 047b240 to e57bc5e Compare May 26, 2026 16:32
devin-ai-integration[bot]

This comment was marked as resolved.

d-cs added a commit that referenced this pull request May 26, 2026
Five Devin findings on PR #3755:

- ApiRetrieveRunPresenter.synthesiseFoundRunFromBuffer was assigning
  buffered.friendlyId to the FoundRun.id field. The id column on PG is
  the internal cuid; downstream log correlation reads taskRun.id and
  was getting the friendly token instead. Fixed to read the cuid that
  readFallback already derives via RunId.fromFriendlyId.

- api.v1.runs.$runId.trace.ts buffered branch hardcoded
  isPartial: true. Cancelled is a terminal state — the sibling spans
  route and syntheticTrace already gate this on !isCancelled. Match.

- synthesisePayload helper replaces the silent
  typeof === "string" coercion. Object-shaped payloads now JSON-
  stringify (matching how the trigger path would serialise them) with
  a warn log so format drift is visible. Truly unserialisable inputs
  fall back to "" with an error log instead of silently dropping.

- syntheticRedirectInfo now uses deserialiseMollifierSnapshot
  (the webapp-side wrapper) instead of deserialiseSnapshot from the
  redis-worker package directly. Both share the same implementation
  today, but pinning the wrapper means the two read-side modules can't
  drift if the snapshot encoding ever changes (e.g. msgpack).

- attempts route loader verifies the run belongs to the authenticated
  environment (PG-first, buffer fallback) before returning the
  parity-empty list. Other run-scoped endpoints (spans, trace,
  retrieve) 404 cross-env; matching that closes the
  exists-vs-doesn't-exist side channel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@d-cs d-cs force-pushed the mollifier-phase-3-replay branch from 242ba73 to 6a8404d Compare May 27, 2026 12:04
d-cs added a commit that referenced this pull request May 27, 2026
Five Devin findings on PR #3755:

- ApiRetrieveRunPresenter.synthesiseFoundRunFromBuffer was assigning
  buffered.friendlyId to the FoundRun.id field. The id column on PG is
  the internal cuid; downstream log correlation reads taskRun.id and
  was getting the friendly token instead. Fixed to read the cuid that
  readFallback already derives via RunId.fromFriendlyId.

- api.v1.runs.$runId.trace.ts buffered branch hardcoded
  isPartial: true. Cancelled is a terminal state — the sibling spans
  route and syntheticTrace already gate this on !isCancelled. Match.

- synthesisePayload helper replaces the silent
  typeof === "string" coercion. Object-shaped payloads now JSON-
  stringify (matching how the trigger path would serialise them) with
  a warn log so format drift is visible. Truly unserialisable inputs
  fall back to "" with an error log instead of silently dropping.

- syntheticRedirectInfo now uses deserialiseMollifierSnapshot
  (the webapp-side wrapper) instead of deserialiseSnapshot from the
  redis-worker package directly. Both share the same implementation
  today, but pinning the wrapper means the two read-side modules can't
  drift if the snapshot encoding ever changes (e.g. msgpack).

- attempts route loader verifies the run belongs to the authenticated
  environment (PG-first, buffer fallback) before returning the
  parity-empty list. Other run-scoped endpoints (spans, trace,
  retrieve) 404 cross-env; matching that closes the
  exists-vs-doesn't-exist side channel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@d-cs d-cs force-pushed the mollifier-phase-3-reads branch from f4131eb to d8f6cf7 Compare May 27, 2026 12:04
@d-cs d-cs force-pushed the mollifier-phase-3-replay branch from 6a8404d to bc9f4e2 Compare May 27, 2026 12:15
d-cs added a commit that referenced this pull request May 27, 2026
Five Devin findings on PR #3755:

- ApiRetrieveRunPresenter.synthesiseFoundRunFromBuffer was assigning
  buffered.friendlyId to the FoundRun.id field. The id column on PG is
  the internal cuid; downstream log correlation reads taskRun.id and
  was getting the friendly token instead. Fixed to read the cuid that
  readFallback already derives via RunId.fromFriendlyId.

- api.v1.runs.$runId.trace.ts buffered branch hardcoded
  isPartial: true. Cancelled is a terminal state — the sibling spans
  route and syntheticTrace already gate this on !isCancelled. Match.

- synthesisePayload helper replaces the silent
  typeof === "string" coercion. Object-shaped payloads now JSON-
  stringify (matching how the trigger path would serialise them) with
  a warn log so format drift is visible. Truly unserialisable inputs
  fall back to "" with an error log instead of silently dropping.

- syntheticRedirectInfo now uses deserialiseMollifierSnapshot
  (the webapp-side wrapper) instead of deserialiseSnapshot from the
  redis-worker package directly. Both share the same implementation
  today, but pinning the wrapper means the two read-side modules can't
  drift if the snapshot encoding ever changes (e.g. msgpack).

- attempts route loader verifies the run belongs to the authenticated
  environment (PG-first, buffer fallback) before returning the
  parity-empty list. Other run-scoped endpoints (spans, trace,
  retrieve) 404 cross-env; matching that closes the
  exists-vs-doesn't-exist side channel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@d-cs d-cs force-pushed the mollifier-phase-3-reads branch from d8f6cf7 to 796a2c0 Compare May 27, 2026 12:15
@d-cs d-cs force-pushed the mollifier-phase-3-replay branch from bc9f4e2 to 637e8c0 Compare May 27, 2026 12:21
d-cs added a commit that referenced this pull request May 27, 2026
Five Devin findings on PR #3755:

- ApiRetrieveRunPresenter.synthesiseFoundRunFromBuffer was assigning
  buffered.friendlyId to the FoundRun.id field. The id column on PG is
  the internal cuid; downstream log correlation reads taskRun.id and
  was getting the friendly token instead. Fixed to read the cuid that
  readFallback already derives via RunId.fromFriendlyId.

- api.v1.runs.$runId.trace.ts buffered branch hardcoded
  isPartial: true. Cancelled is a terminal state — the sibling spans
  route and syntheticTrace already gate this on !isCancelled. Match.

- synthesisePayload helper replaces the silent
  typeof === "string" coercion. Object-shaped payloads now JSON-
  stringify (matching how the trigger path would serialise them) with
  a warn log so format drift is visible. Truly unserialisable inputs
  fall back to "" with an error log instead of silently dropping.

- syntheticRedirectInfo now uses deserialiseMollifierSnapshot
  (the webapp-side wrapper) instead of deserialiseSnapshot from the
  redis-worker package directly. Both share the same implementation
  today, but pinning the wrapper means the two read-side modules can't
  drift if the snapshot encoding ever changes (e.g. msgpack).

- attempts route loader verifies the run belongs to the authenticated
  environment (PG-first, buffer fallback) before returning the
  parity-empty list. Other run-scoped endpoints (spans, trace,
  retrieve) 404 cross-env; matching that closes the
  exists-vs-doesn't-exist side channel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@d-cs d-cs force-pushed the mollifier-phase-3-reads branch from 796a2c0 to b139391 Compare May 27, 2026 12:21
@d-cs d-cs force-pushed the mollifier-phase-3-replay branch from 637e8c0 to 65219db Compare May 27, 2026 12:58
@d-cs d-cs force-pushed the mollifier-phase-3-reads branch from b139391 to d153042 Compare May 27, 2026 12:58
@d-cs d-cs force-pushed the mollifier-phase-3-replay branch from 65219db to ccdcd9c Compare May 27, 2026 14:06
d-cs added a commit that referenced this pull request May 27, 2026
Five Devin findings on PR #3755:

- ApiRetrieveRunPresenter.synthesiseFoundRunFromBuffer was assigning
  buffered.friendlyId to the FoundRun.id field. The id column on PG is
  the internal cuid; downstream log correlation reads taskRun.id and
  was getting the friendly token instead. Fixed to read the cuid that
  readFallback already derives via RunId.fromFriendlyId.

- api.v1.runs.$runId.trace.ts buffered branch hardcoded
  isPartial: true. Cancelled is a terminal state — the sibling spans
  route and syntheticTrace already gate this on !isCancelled. Match.

- synthesisePayload helper replaces the silent
  typeof === "string" coercion. Object-shaped payloads now JSON-
  stringify (matching how the trigger path would serialise them) with
  a warn log so format drift is visible. Truly unserialisable inputs
  fall back to "" with an error log instead of silently dropping.

- syntheticRedirectInfo now uses deserialiseMollifierSnapshot
  (the webapp-side wrapper) instead of deserialiseSnapshot from the
  redis-worker package directly. Both share the same implementation
  today, but pinning the wrapper means the two read-side modules can't
  drift if the snapshot encoding ever changes (e.g. msgpack).

- attempts route loader verifies the run belongs to the authenticated
  environment (PG-first, buffer fallback) before returning the
  parity-empty list. Other run-scoped endpoints (spans, trace,
  retrieve) 404 cross-env; matching that closes the
  exists-vs-doesn't-exist side channel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@d-cs d-cs force-pushed the mollifier-phase-3-reads branch 2 times, most recently from 0753300 to a1e1ad8 Compare May 27, 2026 15:07
d-cs and others added 27 commits June 1, 2026 12:17
…tage in mollifier drainer

The mollifier drainer's cancel bifurcation called engine.createCancelledRun
without handling its documented conflict contract: when the normal trigger
replay path races ahead and materialises a live (non-CANCELED) row, the engine
throws a conflict so the caller can "decide between engine.cancelRun() and
skipping". The handler did neither — the conflict propagated, isRetryablePgError
returned false, and the drainer buffer.fail()'d the entry, silently losing the
cancellation while the run kept executing. Now route conflicts to
engine.cancelRun() so the cancel actually wins.

Separately, when engine.trigger fails non-retryably and the SYSTEM_FAILURE
fallback write then fails because PG is transiently unreachable, rethrowing the
original non-retryable error made the drainer buffer.fail() the entry — losing
the run with no PG row ever landing, and dropping the write error entirely.
Rethrow the retryable write error instead so the drainer requeues; the failure
row lands once PG recovers. Non-retryable write failures still rethrow the
original error as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove plan-tracking shorthand (Q# bifurcation, Phase C1/Q4) from replay-layer mollifier comments and test names; reword to plain English. Comment/test-name only; no behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…labels

The mollifier queue moved from ZSET to LIST in an earlier refactor, but two
comments still described it as a ZSET (`mollifierStaleSweep.server.ts:9` and
`env.server.ts:1104` — both narrating the periodic stale sweep). Update to
LIST.

Also clean four residual internal plan-doc labels left over from prior cleanup
passes:

- `createCancelledRun` docstring (`engine/index.ts`) referenced "Q4 mollifier-
  cancel design" and "F4 bypass" — both dead nomenclature now that the gate's
  C1/C3/F4 labels have been rewritten. Restate the waitpoint-skip rationale
  in plain English: the mollifier gate refuses to buffer triggerAndWait
  children, so a cancelled buffered run never has a waiting parent to unblock.
- `createCancelledRun.test.ts` empty-tags regression dropped "Found while
  running the Phase F challenge suite." — the comment describes the bug
  itself, which is self-contained.
- `mollifierStaleSweep.test.ts` "scans across multiple orgs" rephrased away
  from "Phase-3 design has org-level fairness"; the prose now states the
  invariant directly.

Comment/docstring-only; no behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous sweep was unbounded along two dimensions: every tick walked
every org and every env (via buffer.listOrgs + listEnvsForOrg). At the
sweep's default per-env entry cap of 1000, an incident-scale fan-out
gave O(orgs * envs * 1000) Redis round-trips per tick — running far
longer than the 5-minute interval and triggering the inFlight guard
to drop every subsequent tick until the slow pass finished.

Shard the work via a durable cursor:

- New file `mollifierStaleSweepState.server.ts` owns three Redis keys
  (`mollifier:stale_sweep:{cursor,org_list,counts}`), all under the
  mollifier namespace but separated from the buffer's own state. The
  state class has its own Redis client; the buffer's existing
  `MollifierBuffer` API surface is untouched.
- On `cursor === 0` the org list is rebuilt by snapshotting
  `buffer.listOrgs()` into the frozen LIST — the cycle's frozen view.
- Each tick consumes up to `maxOrgsPerPass` orgs (default 100),
  processes them, and advances the cursor. When the cursor reaches the
  end of the LIST it wraps to 0; the next tick rebuilds and starts the
  next cycle.
- The per-env counts HASH is the source of truth for the gauge
  snapshot. Visiting an env with zero stale entries HDEL's its hash
  field — gauge clears immediately on revisit. Envs not revisited this
  tick keep their last-known value (durability across ticks AND across
  webapp restarts), accepting a worst-case lag of one full cursor cycle
  before a no-longer-stale env clears.

Snapshot contract change: only envs with non-zero stale counts appear
in the reported `Map`. The telemetry layer (`mollifierTelemetry.server.ts`
`reportStaleEntrySnapshot`) sums values, so absence is equivalent to
zero for the gauge — the alert behaviour is unchanged.

Tests:
- New: "shards work across ticks: cursor advances by maxOrgsPerPass and
  wraps after a full cycle" — drives a 5-org fixture with cap=2,
  asserts the cursor's three-tick progression and wrap.
- New: "clears an env from the durable snapshot on revisit when it has
  entries but none currently stale" — same entry flips
  stale→not-stale by varying the sweep's `now`, asserts HDEL on
  revisit.
- Existing tests updated to inject `state`; one assertion shape
  rewritten ("snapshot omits envs that have entries but none stale")
  to match the new HDEL semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five gaps in the sharded sweep's test coverage, added 9 new tests
covering each:

- State durability across process restart: state1 populates the
  cursor + counts hash, closes its Redis client (simulated webapp
  restart), state2 constructs against the same Redis and picks up
  exactly where state1 left off. This is the headline benefit of
  storing sweep state in Redis instead of process memory; without
  this test it could silently regress.
- Cycle wrap rebuilds the org list: a third org joins between cycles
  and is visible only in the next cycle's snapshot. Pins the
  rebuildOrgList-on-cursor=0 contract.
- Empty buffer (no orgs) advances cleanly with zero work, empty
  snapshot, cursor stays at 0 instead of tripping the wrap math.
- Buffer-null branch's clearAll: previously asserted only "snapshot
  is empty"; now also asserts the durable state was actually wiped
  (cursor=0, counts hash empty) so a re-enable doesn't resume on a
  stale cursor.
- MollifierStaleSweepState direct unit tests (5 tests): readCursor
  default + corrupt-value tolerance, writeCursor round-trip,
  rebuildOrgList replace-not-append semantics, setEnvStaleCount HSET
  vs HDEL, clearAll DELs all three keys.

Suite total: 7 existing + 9 new = 16 tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on terminal failure

Two Devin review findings on PR #3754, both real and unresolved:

1. Sharded stale sweep's counts hash never cleared for fully-drained
   envs — gauge stayed permanently elevated, false-alerting the
   recommended `> 0 for 5m` rule.

   Root cause: when an env's last buffered entry is popped, the buffer's
   atomic Lua removes the env from `mollifier:org-envs:${orgId}` (and
   removes the org from `mollifier:orgs` if it has no other envs). The
   sweep's inner loop walks `buffer.listEnvsForOrg(orgId)`, so the env
   disappears from the iteration entirely — `setEnvStaleCount(envId, 0)`
   (which HDELs the field) is never called, and the counts hash retains
   the env's last-known stale count forever.

   Fix (Devin's Approach 2): cycle-bounded reconciliation. Add a Redis
   SET `mollifier:stale_sweep:visited` that the sweep SADDs into for
   every env it touches. When the cursor wraps (cycle complete),
   `reconcileVisited()` does `HKEYS counts → SMEMBERS visited → HDEL the
   difference → DEL visited`. Pipelined; orphans clear within at most
   one full cursor cycle of the env going quiet, which matches the
   sharding contract's existing one-cycle freshness window.

   Test: "evicts fully-drained envs from the counts hash at cycle wrap"
   — accepts an entry, sweep flags it stale, pops the entry (env
   vanishes from listEnvsForOrg), runs another sweep that triggers
   wrap, asserts the env is HDEL'd from both the snapshot and the
   underlying counts hash.

2. Drainer handler's terminal SYSTEM_FAILURE write dropped the
   snapshot's `batch` field. If the buffered run was part of a batch,
   the failure row wasn't associated with the batch and the batch
   parent's completion tracking could hang indefinitely waiting on a
   child that landed but isn't visible to the batch.

   Fix: extract `snapshot.batch` with structural type guards and pass
   it through to `createFailedTaskRun`. Same defensive pattern as the
   other snapshot fields in this code path (the snapshot is typed
   `Record<string, unknown>` because it came from cjson-decoded buffer
   payload).

   Test: "propagates the batch association into createFailedTaskRun" —
   asserts the call site receives `{ id, index }` from the snapshot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Redis

Devin review on PR #3754: `stop()` previously called `deps.state.close()`
immediately after `clearInterval`, but `tick()` only checks `stopped` at
entry. A tick that was already past that guard would keep making
`state.*` calls against an ioredis client that `stop()` had already
`quit()`ed — those calls would throw, the tick's own try/catch would
swallow them as `mollifier.stale_sweep.failed` warnings, and every
graceful shutdown would emit spurious noise.

Track the current tick promise as `currentTick`. `stop()` awaits it (if
present) before invoking `state.close()`, so the tick's last state call
lands BEFORE the Redis client is quit. The tick's own try/catch handles
the (unexpected) case where it rejects; the await in `stop()` is solely
for ordering.

Also drop the `instanceof MollifierStaleSweepState` guard around
`state.close()` — `close()` is part of the `StaleSweepStateStore`
contract, so unconditional invocation is correct. Test fake states
implement `close()` as a no-op.

Test: `stop() waits for an in-flight tick to finish before closing the
state` — gates a fake state's `readCursor` on a Deferred, kicks off the
interval, waits for the tick to start, then races `stop()` against the
gate. Asserts the stop promise stays unresolved while the tick is
mid-flight and that the tick's final state operation lands before
`close()`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three unresolved Devin threads, all addressed in this commit. Committed
locally only — not pushed.

1. `callWithoutTraceEvents` was inheriting the new `emitRunFailedEvent`
   default of `true`, so its `createFailedTaskRun` call would fire the
   `runFailed` bus emit and the listener would write a ClickHouse
   completion event row with empty `traceId`/`spanId` — orphan row,
   directly contradicting the method's "without trace events" contract.
   Pass `emitRunFailedEvent: false` and enqueue
   `PerformTaskRunAlertsService` directly, mirroring the `call()`
   pattern so customers' ERROR channels still see the failure.

2. The cjson empty-tags defense lived only on `createCancelledRun`, not
   on `engine.trigger`. When the mollifier buffer's mutate-side Lua
   re-serialises a payload (e.g. `append_tags` on a buffered run that
   never had tags), an empty Lua table encodes as `{}` and decodes
   back to a JS object — and the previous `tags.length === 0` check
   passes that object straight to Prisma's `String[]` column.
   Mirror the same `Array.isArray && tags.length > 0 ? tags : undefined`
   guard `createCancelledRun` already uses. The defense is symmetric
   with the existing tested case for createCancelledRun, so the same
   contract holds for the trigger replay path.

3. `runCancelled` handler's `cancelRunEvent` lookup fails for
   buffered-only runs (no primary trace event exists, since the
   mollifier gate skipped `repository.traceEvent` for the
   not-yet-materialised run). The handler's `tryCatch` swallowed the
   error, but the systematic `[runCancelled] Failed to cancel run
   event` log fired on every cancelled buffered run.
   Add `emitRunCancelledEvent: boolean = true` to `createCancelledRun`
   (symmetric with the existing `emitRunFailedEvent` flag on
   `createFailedTaskRun`); drainer handler passes `false`. CANCELED PG
   row still writes; only the trace-event mirror is skipped.

Tests:
- `RunEngine.createCancelledRun > emitRunCancelledEvent: false
  suppresses the bus emit but still writes the CANCELED PG row` —
  pins the new flag's semantics.
- `createDrainerHandler > calls createCancelledRun with
  emitRunCancelledEvent: false (suppresses orphan trace-event log
  noise)` — pins the call site's contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ts exhaustion

Before: when engine.trigger() threw a retryable PG error (P1001 / P2024 /
ECONNRESET) and the drainer exhausted TRIGGER_MOLLIFIER_DRAIN_MAX_ATTEMPTS
(default 3), MollifierDrainer.processEntry called buffer.fail() directly,
which atomically marks the entry FAILED and DELs the hash. No PG row was
ever written, so the customer's run silently disappeared.

This fix introduces a new optional `onTerminalFailure` callback on
MollifierDrainerOptions that fires before buffer.fail() on any terminal
path. The webapp wires it to a wrapper around the existing
createFailedTaskRun(...) snapshot block from the non-retryable path —
factored into a shared writeMollifierTerminalFailureRow helper so both
paths land an identical SYSTEM_FAILURE row.

Behaviour:
- Non-retryable error: handler writes the row inline (unchanged), then
  returns normally. Drainer ack()s.
- Retryable error, attempts left: drainer requeues (unchanged).
- Retryable error, attempts exhausted: drainer invokes
  onTerminalFailure with cause: "max-attempts-exhausted"; callback
  writes the SYSTEM_FAILURE row, then drainer calls buffer.fail().
- Callback itself throws retryable: drainer requeues so the next tick
  gets another shot at the PG write.
- Callback throws non-retryable: log and fall through to buffer.fail
  so a genuinely malformed snapshot doesn't loop forever.

Also corrects the stale Lua comment in buffer.ts's failMollifierEntry
script that claimed the drainer "has already written a SYSTEM_FAILURE
PG row for terminal failures" — only true for the non-retryable path
before this change.

Includes 5 unit tests in drainer.test.ts covering max-attempts-exhausted,
non-retryable, callback-retryable-rerequeue, callback-non-retryable-falls-through,
and no-callback back-compat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…failures

`writeMollifierTerminalFailureRow` was calling `createFailedTaskRun`
without `emitRunFailedEvent: false`. The engine's `runFailed` event
handler then called `completeFailedRunEvent`, which looks up the run's
primary trace span. Buffered-only runs never had a primary trace event
written for them — the mollifier gate intercepts BEFORE
`repository.traceEvent` runs — so the lookup always failed, producing
a systematic `[runFailed] Failed to complete failed run event` error
log per drainer terminal failure. The errors are caught by the
handler's `tryCatch`, but the per-failure noise pollutes Sentry and
masks real errors.

Mirrors what `TriggerFailedTaskService` already does at
`triggerFailedTask.server.ts:212` and `:324`:
- Pass `emitRunFailedEvent: false` to suppress the trace-event mirror.
- Manually enqueue `PerformTaskRunAlertsService.enqueue(run.id)` so the
  customer's ERROR channel still fires. Best-effort, logged on failure
  but does not bubble (the SYSTEM_FAILURE row landing is the
  load-bearing customer-visible outcome).

Also tightens the helper's return type from `boolean` to `TaskRun |
null` so the alert side has the run id without an extra lookup. Both
existing call sites (inline non-retryable + max-attempts-exhausted
callback) keep working — the `if (!wrote)` truthy check is identical
under `TaskRun | null`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…RE fallback on cancel-bifurcation

Two fixes from CodeRabbit on PR #3754:

1. `TRIGGER_MOLLIFIER_STALE_SWEEP_ENABLED` was inheriting
   `TRIGGER_MOLLIFIER_ENABLED`, so any deployment already running the
   mollifier would auto-start the sweep worker on upgrade — defeating
   the point of having a separate kill switch. Hard-default to "0" so
   the sweep is an explicit ops decision.

2. In the cancel-bifurcation path, a non-conflict + non-retryable
   error from `createCancelledRun` rethrew out of the handler. The
   drainer's `onTerminalFailure` handler gates on
   `cause === "max-attempts-exhausted"` and skips "non-retryable", so
   `buffer.fail()` deleted the entry without ever writing a PG row —
   the cancelled run disappeared silently. Mirror the SYSTEM_FAILURE
   fallback the non-cancel trigger path already uses: write a terminal
   row via the shared helper, falling back to the original throw only
   when the write also fails non-retryably (and re-throw retryable
   write errors so the drainer requeues).

UX trade: the customer initiated a cancel but sees SYSTEM_FAILURE if
the cancel write itself was structurally rejected. Terminal-but-
mislabelled beats terminal silence — the alternative was the run
disappearing from the dashboard entirely. The path is narrow (only
non-conflict, non-retryable createCancelledRun failures).

Two new unit tests cover the cancel branch: non-retryable creates the
SYSTEM_FAILURE row + ACKs the entry; retryable rethrows so the
drainer requeues.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… slice

`MollifierStaleSweepState.readOrgListSlice` was logging-and-returning
`{ orgs: [], total: 0 }` on pipeline-abort or per-result Redis errors.
The caller (`runStaleSweepOnce`) treated that as a clean empty cycle:
wrote cursor=0, reconciled visited envs against the empty result, and
cleared the stale-entry gauge — silencing the alerts the sweep exists
to raise.

Re-throw the underlying error instead. The interval wrapper's catch
logs `stale_sweep.failed` for the failed tick; durable cursor + counts
hash stay untouched so the gauge keeps reporting its last-known value
until a healthy tick repopulates it.

New unit test on the caller contract pins this: error propagates,
cursor stays at its seeded value, counts hash retains the seeded env,
no snapshot is reported.

Separately documents a known limitation in the .server-changes entry:
the sweep runs per-webapp-instance, so its stale-entry counter
multiplies by N webapps in HA until a distributed lease lands as a
follow-up. The system is HA-safe (Redis ops are atomic, no torn
state); only the metric output is inflated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+ route wiring

Synthesise QUEUED/FAILED responses from the mollifier buffer when a
TaskRun row hasn't landed in Postgres yet. Wires the synthesis into:
- ApiRetrieveRunPresenter
- v1 trace GET route
- v1 spans GET route
- attempts route gains a GET loader (fixes pre-existing Remix 400)

Stacked on the trigger-time decisions PR. The readFallback infra
itself lives on the trigger PR (consumed by IdempotencyKeyConcern);
this PR adds the route-level synthetic-rendering primitives.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the ad-hoc \`as Record<string, unknown>\` + \`typeof ===
"string"\` checks in \`findBufferedRunRedirectInfo\` with a Zod
\`safeParse\` against a schema for the subset of fields the redirect
needs (envSlug / projectSlug / orgSlug / optional spanId). Wrong-typed
or missing fields now collapse into a single parse-fail branch that
logs the structured issue list and returns null.

Adds a regression test for the structural-vs-typeof distinction:
\`environment.slug: 42\` (number) is now rejected, where the previous
\`typeof slug === "string"\` chain would silently accept any string-
typed value but had no defence against shape drift in other fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…l enum

\`SyntheticRun.machinePreset\` is a plain string sourced from the
mollifier snapshot, but \`SpanRun.machinePreset\` is the typed
\`MachinePresetName\` enum (micro / small-1x / small-2x / medium-1x /
medium-2x / large-1x / large-2x). The direct assignment failed
\`tsc --noEmit\` and CI typecheck. Validate via \`MachinePresetName.safeParse\`
and collapse unknown values to \`undefined\` so a stale preset
returned by the buffer doesn't bleed into the UI as a typed-but-unknown
value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five Devin findings on PR #3755:

- ApiRetrieveRunPresenter.synthesiseFoundRunFromBuffer was assigning
  buffered.friendlyId to the FoundRun.id field. The id column on PG is
  the internal cuid; downstream log correlation reads taskRun.id and
  was getting the friendly token instead. Fixed to read the cuid that
  readFallback already derives via RunId.fromFriendlyId.

- api.v1.runs.$runId.trace.ts buffered branch hardcoded
  isPartial: true. Cancelled is a terminal state — the sibling spans
  route and syntheticTrace already gate this on !isCancelled. Match.

- synthesisePayload helper replaces the silent
  typeof === "string" coercion. Object-shaped payloads now JSON-
  stringify (matching how the trigger path would serialise them) with
  a warn log so format drift is visible. Truly unserialisable inputs
  fall back to "" with an error log instead of silently dropping.

- syntheticRedirectInfo now uses deserialiseMollifierSnapshot
  (the webapp-side wrapper) instead of deserialiseSnapshot from the
  redis-worker package directly. Both share the same implementation
  today, but pinning the wrapper means the two read-side modules can't
  drift if the snapshot encoding ever changes (e.g. msgpack).

- attempts route loader verifies the run belongs to the authenticated
  environment (PG-first, buffer fallback) before returning the
  parity-empty list. Other run-scoped endpoints (spans, trace,
  retrieve) 404 cross-env; matching that closes the
  exists-vs-doesn't-exist side channel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The synthetic SpanRun/trace builders for buffered runs hardcoded
non-terminal state, so a CANCELED or FAILED buffered run rendered as a
healthy in-progress run:

- syntheticSpanRun: FAILED now maps to SYSTEM_FAILURE (matching
  ApiRetrieveRunPresenter.bufferedStatusToTaskRunStatus); isFinished is
  true for CANCELED/FAILED; isError is true for FAILED; the error block
  is synthesised as STRING_ERROR and statusReason carries the message.
- syntheticSpanRun: drop the empty-string spanId/taskIdentifier
  relationship stubs (blank task name + misleading `?span=` jump) since
  the snapshot only carries friendly IDs.
- syntheticTrace: FAILED now renders as an errored, non-partial,
  "failed" root span instead of executing/partial. CANCELED stays
  "completed", matching RunPresenter's derivation.
- tests: cover the CANCELED and FAILED terminal paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…truction

Addresses the higher-confidence read-fallback review findings:

- attempts GET loader: rebuilt on createLoaderApiRoute so it matches the
  sibling read routes — accepts JWTs with run/task/tag/batch resource
  scoping (was bare authenticateApiRequest, rejecting PUBLIC_JWT and doing
  no scope check), and 404s with `x-should-retry: true` so SDK pollers keep
  retrying a not-yet-materialised run instead of giving up.
- batch reconstruction: the snapshot embeds the batch as `{ id, index }`
  (engine.trigger shape), but readFallback read a non-existent flat
  `batchId`, so SyntheticRun.batchId was always undefined. Read it from
  `snapshot.batch.id` (the internal cuid). synthesiseFoundRunFromBuffer now
  populates `batch` from it, and the spans/trace buffer-path authorization
  pushes the batch resource — so batch-scoped JWTs authorise against
  buffered runs and the retrieve response reports the correct batchId.
- metadata: coerce a non-string buffered metadata defensively (JSON
  stringify + warn) instead of silently dropping to null, mirroring
  synthesisePayload. In practice metadata is always a string, so this is a
  no-op guard, but it surfaces format drift to ops.
- tests: cover batchId extraction from the nested batch object and its
  absence for non-batched runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comments referenced internal phase labels ("Phase A2", "Phase A5") from
the development plan rather than describing what the code does. Replaced
with self-contained prose; the surrounding explanations were already
correct and are preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…derivation

syntheticTrace.server.ts shipped without a test file; this adds one,
covering the identity-field passthrough, taskIdentifier-and-spanId
defaults, the three rootSpanStatus branches (QUEUED→executing,
CANCELED→completed, FAILED→failed) with their isPartial/isError/isCancelled
flags, the 1ms duration floor, rootStartedAt mapping, and the
single-span trace shape (empty events/timelineEvents, empty
linkedRunIdBySpanId, undefined overridesBySpanId/queuedDuration).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…utes + align workerQueue default

Addresses three Devin review findings on PR #3755:

- api.v1.runs.\$runId.spans.\$spanId.ts and api.v1.runs.\$runId.trace.ts:
  the buffered-run response branch hardcoded isError:false and only
  checked CANCELED for isPartial, so a FAILED buffered run rendered as
  "still in progress" — SDK consumers would poll forever. Now derives
  both flags from CANCELED and FAILED, matching syntheticTrace.server.ts.
- ApiRetrieveRunPresenter.synthesiseFoundRunFromBuffer: workerQueue
  defaulted to "main" while syntheticSpanRun.server.ts uses "". The API
  response's `region` is sourced via `run.workerQueue || undefined`, so
  "main" was advertising a region the run hadn't yet been assigned to.
  Aligned to "" so unassigned buffered runs coerce to region: undefined.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…drop scaffolding GET on attempts

Two related cleanups on the mollifier read surface:

1. Extract the buffered-run response bodies for the spans-detail and
   trace endpoints into pure helpers
   (apps/webapp/app/v3/mollifier/syntheticApiResponses.server.ts:
   buildSyntheticSpanDetailBody, buildSyntheticTraceBody). The route
   bodies were carrying the only copy of the terminal-state derivation
   (CANCELED / FAILED → isError / isPartial / isCancelled) with no unit
   coverage; extracting them lets us pin the contract directly. The
   route files now just authenticate, resolve, validate the spanId, and
   forward — no body shape logic in routes.

2. Drop the GET loader on api.v1.runs.\$runParam.attempts.ts. It was
   added in this PR solely to fix a pre-existing Remix "no loader" 400
   on a URL no SDK consumer was actually calling, and to give the
   mollifier-parity script a stable assertion target. The detailed
   attempt list lives on the v3 retrieve endpoint — the GET was
   scaffolding rather than product surface, and Devin's review flagged
   it as such. Reverted to action-only.

Tests: 16 cases in apps/webapp/test/mollifierSyntheticApiResponses.test.ts
covering QUEUED / CANCELED / FAILED for each body, plus identity and
default-field passthrough. Pins the FAILED-terminal-state regression that
shipped briefly with isPartial:true.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pin synthesise contract with regression tests

synthesiseFoundRunFromBuffer hardcoded `scheduleId: null`, which dropped
the schedule field from the retrieve-API response for any scheduled
trigger that landed in the mollifier buffer. Scheduled triggers go
through the same TriggerTaskService path as API triggers and the gate
doesn't bypass them, so the snapshot does carry scheduleId; the synthesis
was just throwing it away. Forward `buffered.scheduleId ?? null` so
resolveSchedule() can hydrate the schedule object from PG (the Schedule
row exists before the trigger fires).

Exported synthesiseFoundRunFromBuffer + the FoundRun type from the
presenter and added apps/webapp/test/mollifierSynthesiseFoundRun.test.ts
(16 cases). The test file pins the snapshot→FoundRun mapping that
previously had no direct coverage — the new scheduleId forwarding plus
earlier-session regressions (batch reconstruction, workerQueue default
"", FAILED→SYSTEM_FAILURE status mapping, STRING_ERROR shape, defensive
metadata coercion, idempotency defaults, execution-state zero defaults).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ns in events endpoint

`api.v1.runs.$runId.events.ts` calls `ApiRetrieveRunPresenter.findRun`,
which now returns a buffered fallback. The route then proceeds to
`getRunEvents` against ClickHouse with `run.traceId` (empty string for
buffered runs) and `run.taskEventStore` (`"taskEvent"` default) — a
guaranteed-empty round trip, since the mollifier gate intercepts BEFORE
any trace event is written. Devin's analysis on PR #3755 flagged this
as a wasted ClickHouse round trip per request hitting the events
endpoint with a buffered runId.

Add an `isBuffered: boolean` flag to `FoundRun`. The PG branch of
`findRun` sets `isBuffered: false` on the spread Prisma row; the
buffered branch's `synthesiseFoundRunFromBuffer` sets `isBuffered:
true`. The events route short-circuits on the flag and returns
`{ events: [] }` with status 200, matching the semantically-correct
behaviour without the ClickHouse hop.

Gating on the explicit flag (rather than probing surrogates like
`traceId === ""`) keeps the contract clear for future callers that
need the same "PG-only" gate — they can introspect the same field.

Includes a unit test in `mollifierSynthesiseFoundRun.test.ts` pinning
`isBuffered: true` on the synth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PG-resident SYSTEM_FAILURE rows always have completedAt populated by
the engine, so SDK consumers that stop polling when isCompleted=true
and then read finishedAt see a real timestamp. The buffer-synth path
was returning completedAt=null for FAILED (only the CANCELED branch
filled it from buffered.cancelledAt), producing isCompleted:true +
finishedAt:undefined — exactly the inconsistency the synth shape
exists to prevent.

Fall back to buffered.createdAt for the SYSTEM_FAILURE case (the
buffer entry has no separate failedAt; createdAt is the best-available
proxy for when the terminal state landed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symmetric fix to the just-applied `ApiRetrieveRunPresenter` change.
PG-resident SYSTEM_FAILURE rows always have `completedAt` populated;
the synthetic span run was returning null for FAILED, so the
run-detail panel and any caller using `isFinished && completedAt` saw
a finished run with no completion timestamp. Fall back to
`run.createdAt` when `isFailed` is true (the buffer entry has no
separate failedAt — createdAt is the best proxy for when the terminal
state landed).

Regression locked in `mollifierSyntheticSpanRun.test.ts` —
the FAILED test now asserts `completedAt === NOW`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 31 additional findings in Devin Review.

Open in Devin Review

Comment on lines 140 to 160
},
},
});

if (pgRow) return { ...pgRow, isBuffered: false };

// Postgres miss → fall back to the mollifier buffer. When the gate
// diverted a trigger, the run lives in Redis until the drainer replays
// it through engine.trigger. Synthesise the FoundRun shape so call()
// returns a `QUEUED` (or `FAILED`) response with empty output, no
// attempts, no relations.
const buffered = await findRunByIdWithMollifierFallback({
runId: friendlyId,
environmentId: env.id,
organizationId: env.organizationId,
});

if (!buffered) return null;

return synthesiseFoundRunFromBuffer(buffered);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Implicit behavioral change: findRun now returns buffered runs for all call sites

The PR modified ApiRetrieveRunPresenter.findRun to include a mollifier buffer fallback. This affects ALL callers of findRun, not just the routes modified in this PR. The api.v3.runs.$runId.ts:19 and api.v1.runs.$runParam.reschedule.ts:66 routes also call findRun but were not updated. For api.v3.runs.$runId.ts, the presenter.call() method handles the synthesised FoundRun correctly (PENDING status skips output processing, resolveSchedule does a PG lookup that may miss but returns undefined gracefully). For the reschedule route, the downstream service would receive a synthetic FoundRun and attempt operations on a run that doesn't exist in PG yet — worth confirming that the reschedule service handles this gracefully (e.g., fails with a clear error rather than corrupting state).

(Refers to lines 107-160)

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 36 additional findings in Devin Review.

Open in Devin Review

Comment on lines +225 to +234
try {
await PerformTaskRunAlertsService.enqueue(failedRun.id);
} catch (alertsError) {
logger.warn("TriggerFailedTaskService: alert enqueue failed", {
taskId: request.taskId,
friendlyId: failedRun.friendlyId,
error:
alertsError instanceof Error ? alertsError.message : String(alertsError),
});
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 TriggerFailedTaskService now enqueues alerts where it previously didn't

Both call() (line 226) and callWithoutTraceEvents() (line 332) in TriggerFailedTaskService now call PerformTaskRunAlertsService.enqueue(failedRun.id) after createFailedTaskRun. Previously, failed task runs through this service never triggered customer alert channels — the runFailed event was never emitted, and no manual alert enqueue existed. This is an intentional feature addition (the comments explain the reasoning), but it changes the observable behavior: batch trigger failures that exceed queue limits, environment-not-found errors, etc. will now fire ERROR alert channels for customers who have them configured. If rollout needs to be gradual, this could be gated behind a feature flag.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

`.server-changes/mollifier-reads.md` mentioned an "attempts" endpoint
but no `api.v1.runs.$runParam.attempts.ts` changes exist in this PR
(it's a POST/action route for creating attempts, not a read
endpoint). The actual added endpoints are retrieve, trace, spans, and
events — fix the changelog to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants