fix(world-postgres): serialize graphile-worker bootstrap with advisory lock #1966
TooTallNate wants to merge 2 commits into
…y lock Prevents "duplicate key value violates unique constraint pg_namespace_nspname_index" errors when multiple processes (e.g. dev server + test runner) call world.start() concurrently against a fresh database. PostgreSQL's CREATE SCHEMA IF NOT EXISTS is not race-safe across concurrent sessions, and graphile-worker's installSchema does not lock its bootstrap path.
🦋 Changeset detected
Latest commit: f8f2980
The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages.
🧪 E2E Test Results
✅ All tests passed
Details by Category
- ✅ ▲ Vercel Production
- ✅ 💻 Local Development
- ✅ 📦 Local Production
- ✅ 🐘 Local Postgres
- ✅ 🪟 Windows
- ✅ 📋 Other
📊 Benchmark Results
Benchmarks run on 💻 Local Development and ▲ Production (Vercel): workflows with no steps, 1 step, and 10/25/50 sequential steps; Promise.all and Promise.race with 10/25/50 concurrent steps; workflows with 10/25/50 sequential and concurrent data payload steps (10KB); and stream benchmarks with TTFB metrics (workflow with stream, stream pipeline with 5 transform steps (1MB), 10 parallel streams (1MB each), fan-out fan-in 10 streams (1MB each)).
❌ Some benchmark jobs failed. Check the workflow run for details.
Pull request overview
This PR addresses a flaky race during @workflow/world-postgres startup where concurrent processes can both attempt graphile-worker schema installation on a fresh database, triggering duplicate key value violates unique constraint "pg_namespace_nspname_index".
Changes:
- Serialize graphile-worker bootstrap (`makeWorkerUtils` + `migrate`) using a Postgres advisory lock in `packages/world-postgres/src/queue.ts`.
- Update unit test pool mocks to include `pool.connect()` to match the new startup path.
- Add a changeset documenting the patch release.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/world-postgres/src/queue.ts | Adds transaction-scoped advisory lock wrapper around graphile-worker bootstrap to avoid concurrent schema install races. |
| packages/world-postgres/src/queue.test.ts | Updates pool mock to include connect() for the new locking code path. |
| packages/world-postgres/src/reenqueue.test.ts | Updates pool mock to include connect() for the new locking code path. |
| .changeset/fix-world-postgres-graphile-bootstrap-race.md | Declares a patch release and describes the race/lock fix. |
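To illustrate why the test files needed a `connect()` on the pool mock, here is a minimal hand-rolled sketch of such a mock. It is not the code from `queue.test.ts` or `reenqueue.test.ts`; the names `createMockPool` and `Call` are hypothetical, and the real tests may use a mocking library instead.

```typescript
// Hypothetical recording mock; not taken from the PR's test files.
type Call = { method: string; args: unknown[] };

function createMockPool() {
  const calls: Call[] = [];

  // Stand-in for a checked-out client: records queries and release().
  const client = {
    query: async (...args: unknown[]) => {
      calls.push({ method: "client.query", args });
      return { rows: [] };
    },
    release: () => {
      calls.push({ method: "client.release", args: [] });
    },
  };

  const pool = {
    query: async (...args: unknown[]) => {
      calls.push({ method: "pool.query", args });
      return { rows: [] };
    },
    // Startup code that checks out a dedicated client for the lock
    // needs connect() to exist on the mock.
    connect: async () => {
      calls.push({ method: "pool.connect", args: [] });
      return client;
    },
  };

  return { pool, calls };
}
```

Because the mock records every call in order, a test can also assert on the sequence of operations rather than only on their presence.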
- Release the locked connection before `makeWorkerUtils()` so the bootstrap can't deadlock against the same pool when `maxPoolSize` is small (Copilot review #1). Achieved by replicating graphile-worker's `installSchema` DDL ourselves under the lock; once the schema exists, graphile-worker's own `installSchema` is a no-op.
- Add a unit test that asserts the lock ordering: `pool.connect` -> `BEGIN` -> `pg_advisory_xact_lock` -> `CREATE SCHEMA` -> `COMMIT` -> release -> `makeWorkerUtils` (Copilot review #2).
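The ordering described in this commit message can be sketched roughly as follows. This is a simplified illustration, not the code in `queue.ts`: the structural `PoolLike`/`ClientLike` types, the function name `bootstrapSchemaUnderLock`, and the lock key value are all assumptions, and only the `CREATE SCHEMA` step stands in for the replicated `installSchema` DDL.

```typescript
// Minimal structural types so the sketch does not depend on a specific driver;
// a node-postgres Pool / PoolClient is assumed to satisfy them.
interface ClientLike {
  query(sql: string, params?: unknown[]): Promise<unknown>;
  release(): void;
}
interface PoolLike {
  connect(): Promise<ClientLike>;
}

// Hypothetical key: any constant 64-bit integer shared by all callers works.
const BOOTSTRAP_LOCK_KEY = 1966;

async function bootstrapSchemaUnderLock(pool: PoolLike): Promise<void> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    // Blocks until no other transaction holds the same advisory key.
    await client.query("SELECT pg_advisory_xact_lock($1::bigint)", [
      BOOTSTRAP_LOCK_KEY,
    ]);
    // Only one session at a time reaches this statement.
    await client.query("CREATE SCHEMA IF NOT EXISTS graphile_worker");
    // COMMIT ends the transaction and thereby releases the lock.
    await client.query("COMMIT");
  } catch (err) {
    // ROLLBACK also releases a transaction-scoped advisory lock.
    await client.query("ROLLBACK").catch(() => {});
    throw err;
  } finally {
    // Return the connection before any later step needs the pool again.
    client.release();
  }
}
```

A caller would await `bootstrapSchemaUnderLock(pool)` and only then call `makeWorkerUtils`, so the locked connection is already back in the pool when graphile-worker asks for one.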
Summary
Fixes the `E2E Local Postgres Tests` flake on `main` (example failing job) where the first test (`addTenWorkflow`) fails with `duplicate key value violates unique constraint "pg_namespace_nspname_index"`.

Root cause
PR #1959 removed `instrumentation.ts` from the workbenches, which previously pre-warmed `world.start()` in the Next.js dev server before any HTTP traffic arrived. Without that pre-warm, `world.start()` is now called lazily on the first workflow request. In the local-postgres CI matrix, the vitest test process itself also initializes a postgres world (because `WORKFLOW_TARGET_WORLD=@workflow/world-postgres` is set in the test env). So two independent Node processes (the test runner and the Next.js dev server) race to install the `graphile_worker` schema on a fresh database.

PostgreSQL's `CREATE SCHEMA IF NOT EXISTS` is not race-safe across concurrent sessions: the existence check cannot see a `pg_namespace` row that another session has inserted but not yet committed, so two sessions can both pass the check and both attempt the insert, and one then fails with `duplicate key value violates unique constraint "pg_namespace_nspname_index"`. graphile-worker's `installSchema` does no locking around this.

Fix
Wrap the `makeWorkerUtils({pgPool}).migrate()` call in `packages/world-postgres/src/queue.ts` with a Postgres advisory lock (`pg_advisory_xact_lock`) so that concurrent callers across processes are serialized on graphile-worker schema bootstrap. The lock is transaction-scoped, so it is automatically released if the bootstrapping process dies mid-flight.

Verification
- `pnpm --filter @workflow/world-postgres typecheck` ✓
- `pnpm --filter @workflow/world-postgres build` ✓
- `pnpm --filter @workflow/world-postgres test` ✓ (109/109, including the real-Postgres integration tests in `test/spec.test.ts` that exercise the full `world.start()` path)