Fix Materializer to handle duplicate records idempotently#3789
Fix Materializer to handle duplicate records idempotently#3789
Conversation
📝 WalkthroughWalkthroughThis PR adds changeset documentation and modifies the Materializer to handle duplicate records idempotently. When the same record surfaces in both snapshot and replication stream, the system now skips the insertion while preserving tag tracking for move-out operations, rather than raising an error. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3789 +/- ##
===========================================
+ Coverage 75.75% 87.34% +11.58%
===========================================
Files 11 23 +12
Lines 693 2039 +1346
Branches 172 542 +370
===========================================
+ Hits 525 1781 +1256
- Misses 167 256 +89
- Partials 1 2 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This comment has been minimized.
This comment has been minimized.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/sync-service/test/electric/shapes/consumer/materializer_test.exs (1)
303-320: Conflicting test: This test expects errors on duplicate inserts, but the new behavior skips them.This existing test expects the Materializer to raise an error when inserting a duplicate key. However, the changes in
materializer.ex(lines 316-326) now make duplicate handling idempotent by skipping the insert instead of raising an error.This test appears to be testing behavior that no longer exists. It should either be:
- Removed if the idempotent behavior is the intended final state
- Updated to verify the new idempotent behavior (though the new tests at lines 912-1002 already cover this)
🤖 Fix all issues with AI agents
In `@packages/sync-service/lib/electric/shapes/consumer/materializer.ex`:
- Around line 316-326: The test "insert of a PK we've already seen raises"
conflicts with the idempotent duplicate handling in the materializer: when
duplicate PKs are encountered the code now skips the insert, updates tag_indices
via add_row_to_tag_indices/1 for move_out tracking, and returns
increment_value/2 only for new inserts; update or remove the test accordingly —
either remove it, or change it to assert that calling the materializer path (the
branch that uses cast!/2, Map.put, add_row_to_tag_indices and returns
counts_and_events) does not raise, that the primary key remains present in the
index, and that tag_indices was updated for move_out instead of expecting an
error.
🧹 Nitpick comments (1)
packages/sync-service/test/electric/shapes/consumer/materializer_test.exs (1)
942-970: Test verifies no events for duplicate, but doesn't verify tag tracking.The test confirms that no events are emitted when a duplicate arrives with different
move_tags. However, it doesn't verify thatdifferent_tagis actually added to the tag indices. Test 3 covers this scenario, so this is acceptable, but you could optionally strengthen this test by adding amove_outassertion fordifferent_tag.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.changeset/fix-materializer-duplicate-records.mdpackages/sync-service/lib/electric/shapes/consumer/materializer.expackages/sync-service/test/electric/shapes/consumer/materializer_test.exs
🧰 Additional context used
📓 Path-based instructions (2)
packages/sync-service/lib/electric/shapes/**/*.{ex,exs}
📄 CodeRabbit inference engine (packages/sync-service/AGENTS.md)
Shapes should be defined and managed in
lib/electric/shapes/*for lifecycle management (creation, handles, checkpoints)
Files:
packages/sync-service/lib/electric/shapes/consumer/materializer.ex
packages/sync-service/test/**/*.{exs,ex}
📄 CodeRabbit inference engine (packages/sync-service/AGENTS.md)
Prefer a single structural assert with pattern matching for returned structures in tests rather than multiple asserts on individual elements
Files:
packages/sync-service/test/electric/shapes/consumer/materializer_test.exs
🧠 Learnings (6)
📚 Learning: 2026-01-22T19:39:13.351Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: packages/sync-service/AGENTS.md:0-0
Timestamp: 2026-01-22T19:39:13.351Z
Learning: Applies to packages/sync-service/lib/electric/replication/**/*.{ex,exs} : Replication logic consuming WAL and producing change events should be implemented in `lib/electric/replication/*`
Applied to files:
packages/sync-service/lib/electric/shapes/consumer/materializer.expackages/sync-service/test/electric/shapes/consumer/materializer_test.exs
📚 Learning: 2026-01-22T19:39:13.351Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: packages/sync-service/AGENTS.md:0-0
Timestamp: 2026-01-22T19:39:13.351Z
Learning: Applies to packages/sync-service/lib/electric/shapes/**/*.{ex,exs} : Shapes should be defined and managed in `lib/electric/shapes/*` for lifecycle management (creation, handles, checkpoints)
Applied to files:
packages/sync-service/lib/electric/shapes/consumer/materializer.expackages/sync-service/test/electric/shapes/consumer/materializer_test.exs
📚 Learning: 2026-01-22T19:39:13.351Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: packages/sync-service/AGENTS.md:0-0
Timestamp: 2026-01-22T19:39:13.351Z
Learning: Applies to packages/sync-service/lib/electric/shape_cache/**/*.{ex,exs} : Shape cache persistence and snapshot/changelog serving logic should be in `lib/electric/shape_cache/*`
Applied to files:
packages/sync-service/lib/electric/shapes/consumer/materializer.expackages/sync-service/test/electric/shapes/consumer/materializer_test.exs
📚 Learning: 2026-01-22T19:39:13.351Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: packages/sync-service/AGENTS.md:0-0
Timestamp: 2026-01-22T19:39:13.351Z
Learning: Applies to packages/sync-service/{lib/electric/postgres/**/*.{ex,exs},lib/pg_interop/**/*.{ex,exs}} : Postgres integration including connections, identifiers, and type handling should be in `lib/electric/postgres/*` and `lib/pg_interop/*`
Applied to files:
packages/sync-service/lib/electric/shapes/consumer/materializer.ex
📚 Learning: 2026-01-22T19:39:13.351Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: packages/sync-service/AGENTS.md:0-0
Timestamp: 2026-01-22T19:39:13.351Z
Learning: Applies to packages/sync-service/lib/electric/application.ex : Changes to `Electric.Application` APIs (`api/1`, `api_plug_opts/1`, `configuration/1`) can break downstream consumers including elixir-client and Phoenix.Sync
Applied to files:
packages/sync-service/lib/electric/shapes/consumer/materializer.ex
📚 Learning: 2026-01-22T19:39:13.351Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: packages/sync-service/AGENTS.md:0-0
Timestamp: 2026-01-22T19:39:13.351Z
Learning: Applies to packages/sync-service/test/**/*.{exs,ex} : Prefer a single structural assert with pattern matching for returned structures in tests rather than multiple asserts on individual elements
Applied to files:
packages/sync-service/test/electric/shapes/consumer/materializer_test.exs
🧬 Code graph analysis (1)
packages/sync-service/test/electric/shapes/consumer/materializer_test.exs (1)
packages/sync-service/lib/electric/shapes/consumer/materializer.ex (2)
get_link_values(58-63)new_changes(50-52)
🔇 Additional comments (3)
.changeset/fix-materializer-duplicate-records.md (1)
1-5: LGTM!The changeset accurately describes the fix and correctly marks it as a patch-level change.
packages/sync-service/test/electric/shapes/consumer/materializer_test.exs (2)
912-940: Good test coverage for the idempotent handling.The test correctly verifies that:
- The value remains unchanged when a duplicate arrives
- No spurious
move_inevent is emitted
972-1002: Well-structured test for the critical tag-tracking behavior.This test validates the key invariant: even when a duplicate record is skipped, its
move_tagsare still tracked, allowing subsequentmove_outevents to correctly remove the record.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
packages/sync-service/lib/electric/shapes/consumer/materializer.ex
Outdated
Show resolved
Hide resolved
icehaunter
left a comment
There was a problem hiding this comment.
Crashing versions of those functions were used explicitly because "not seeing an out of order row" is an assumed invariant
One case which led to the same crash was another bug which I've addressed.
When another "newRecord" arrives with some move tags, you don't know whether it's from snapshot, or actual move in or what, and if they are out of order/duplicated, you don't know which move tag set is valid, I don't think merging it is correct.
I would much rather keep the crash as an invariant insurance and fix only the underlying bug
) ## Summary Fixes #3788 — when a dependency shape is removed (due to crash/cleanup) before its dependent shape finishes registering, `DependencyLayers.add_after_dependencies/3` crashed with `FunctionClauseError`, taking down the `ShapeLogCollector` and cascading into OOM failures. This is the companion fix to #3789, which addressed the Materializer duplicate-key crash that triggered the dependency removal in the first place. ### The race condition 1. Dependency shape A is added to DependencyLayers 2. Dependent shape B starts its Consumer, checks `all_materializers_alive?` → passes 3. Shape A's Materializer crashes (e.g. duplicate key, fixed by #3789) 4. Shape A's Consumer terminates and calls `remove_shape` 5. Shape B's Consumer calls `add_shape` 6. `RequestBatcher` batches both operations together 7. `ShapeLogCollector` processes the batch — **removals first, then additions** 8. Shape A is removed from layers → layers becomes `[[]]` 9. Shape B tries to add with dependency on A → **CRASH** ### ECS logs from production (27 Jan 2026) ``` 18:10:10.437 [error] GenServer Materializer "97489818-..." terminating ** (RuntimeError) Key "public"."offers"/"d3c8d8a5-..." already exists 18:10:10.449 [error] GenServer ShapeLogCollector terminating ** (FunctionClauseError) no function clause matching in Electric.Shapes.DependencyLayers.add_after_dependencies/3 (electric) lib/electric/shapes/dependency_layers.ex args: [[], "30042127-...", MapSet.new(["97489818-..."])] ``` The 12ms gap between crashes shows the sequence: Materializer crash → Consumer termination → `remove_shape` batched with `add_shape` → removal processed first → crash on addition. ### Changes - **`DependencyLayers.add_dependency/3`** now returns `{:ok, layers}` or `{:error, {:missing_dependencies, missing}}` instead of crashing - **`add_after_dependencies/3`** gets a new clause for exhausted layers with unfound dependencies - **`ShapeLogCollector`** handles the error gracefully — logs a warning and fails the shape registration instead of crashing - **`handle_continue(:restore_shapes)`** pattern-matches `{:ok, layers}` (crashes on error, which is correct since restored shapes should always have valid dependencies) ## Test plan - [x] Updated existing `DependencyLayersTest` tests for new return type - [x] Added error handling tests: missing dependency, removed dependency, partially missing dependencies, ok-tuple for no-dependency shapes - [x] All 8 dependency layers tests pass - [x] All 24 shape log collector tests pass 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **Bug Fixes** * Fixed a race condition that caused crashes when shape dependencies were unavailable during registration or restoration * Improved error handling for missing dependencies with graceful recovery instead of system failures <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
746791f to
2f11f91
Compare
✅ Deploy Preview for electric-next ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Summary
Partial fix for #3787 - addresses the snapshot/replication race condition in the Materializer. The Dependency layers issue will be fixed in a separate PR.
The race condition
The Materializer subscribes to the Consumer before reading from storage:
During the window between subscribing and reading, any changes that arrive via
Consumer.new_changes()will be delivered to the Materializer. If those changes include records that are also in the snapshot being read, the Materializer receives duplicates and crashes.Production example
From the maxwell instance crash on 27 Jan 2026:
The transaction that triggered it:
The record was already in the snapshot (matched via
is_template = trueOR the subquery), AND was delivered via replication withmove_tagsas a move-in event.The fix
Make the Materializer idempotent - when a duplicate record arrives, skip the insert but still track move_tags so subsequent move_out events work correctly.
Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.