Skip to content

Fix Materializer to handle duplicate records idempotently#3789

Closed
robacourt wants to merge 0 commit intomainfrom
rob/fix-materializer-race
Closed

Fix Materializer to handle duplicate records idempotently#3789
robacourt wants to merge 0 commit intomainfrom
rob/fix-materializer-race

Conversation

@robacourt
Copy link
Contributor

@robacourt robacourt commented Jan 28, 2026

Summary

Partial fix for #3787 - addresses the snapshot/replication race condition in the Materializer. The Dependency layers issue will be fixed in a separate PR.

The race condition

The Materializer subscribes to the Consumer before reading from storage:

# In handle_continue(:start_materializer, ...)
Consumer.subscribe_materializer(stack_id, shape_handle, self())  # <- Subscribes first
{:noreply, state, {:continue, {:read_stream, shape_storage}}}    # <- Then reads storage

During the window between subscribing and reading, any changes that arrive via Consumer.new_changes() will be delivered to the Materializer. If those changes include records that are also in the snapshot being read, the Materializer receives duplicates and crashes.

Production example

From the maxwell instance crash on 27 Jan 2026:

18:10:10.437 [error] GenServer Materializer "97489818-..." terminating
** (RuntimeError) Key "public"."offers"/"d3c8d8a5-5060-4a36-a67d-240de0c95a88" already exists

The transaction that triggered it:

%Electric.Replication.Changes.NewRecord{
  relation: {"public", "offers"},
  record: %{"id" => "d3c8d8a5-5060-4a36-a67d-240de0c95a88"},
  key: "\"public\".\"offers\"/\"d3c8d8a5-5060-4a36-a67d-240de0c95a88\"",
  move_tags: ["e12422d3af57a36d01a50b4645a517e4"]  # <- Move-in event
}

The record was already in the snapshot (matched via is_template = true OR the subquery), AND was delivered via replication with move_tags as a move-in event.

The fix

Make the Materializer idempotent - when a duplicate record arrives, skip the insert but still track move_tags so subsequent move_out events work correctly.

Test plan

  • Added tests verifying duplicate records are handled idempotently
  • Verified move_tags are tracked from duplicates so subsequent move_out works correctly

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Fixed handling of duplicate records appearing in both snapshot and replication streams to prevent errors and ensure idempotent processing.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Jan 28, 2026

📝 Walkthrough

Walkthrough

This PR adds changeset documentation and modifies the Materializer to handle duplicate records idempotently. When the same record surfaces in both snapshot and replication stream, the system now skips the insertion while preserving tag tracking for move-out operations, rather than raising an error.

Changes

Cohort / File(s) Summary
Changeset Documentation
.changeset/fix-materializer-duplicate-records.md
Declares patch-level fix for @core/sync-service addressing idempotent handling of duplicate records
Materializer Core Logic
packages/sync-service/lib/electric/shapes/consumer/materializer.ex
Adds conditional check in NewRecord handling: if key exists, skip insert but update tag indices; if key missing, execute original insert/tag/count logic
Materializer Tests
packages/sync-service/test/electric/shapes/consumer/materializer_test.exs
Adds test group "idempotent handling of duplicate records" with three test cases: duplicate from snapshot, duplicate with different move_tags, and move_out after duplicate

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • icehaunter

Poem

🐰 When records duplicate, oh what a sight,
No more the crash, no error to fight!
Check, skip, and track with tags held tight,
Idempotent hops make things just right,
Snapshot and stream now dance in flight! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely describes the main change: making the Materializer handle duplicate records idempotently, which is the core fix addressing the snapshot/replication race condition.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link

codecov bot commented Jan 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.34%. Comparing base (2f11f91) to head (746791f).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #3789       +/-   ##
===========================================
+ Coverage   75.75%   87.34%   +11.58%     
===========================================
  Files          11       23       +12     
  Lines         693     2039     +1346     
  Branches      172      542      +370     
===========================================
+ Hits          525     1781     +1256     
- Misses        167      256       +89     
- Partials        1        2        +1     
Flag Coverage Δ
packages/experimental 87.73% <ø> (ø)
packages/react-hooks 86.48% <ø> (ø)
packages/start 82.83% <ø> (ø)
packages/typescript-client 93.31% <ø> (?)
packages/y-electric 56.05% <ø> (ø)
typescript 87.34% <ø> (+11.58%) ⬆️
unit-tests 87.34% <ø> (+11.58%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@blacksmith-sh

This comment has been minimized.

@robacourt
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Jan 28, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/sync-service/test/electric/shapes/consumer/materializer_test.exs (1)

303-320: Conflicting test: This test expects errors on duplicate inserts, but the new behavior skips them.

This existing test expects the Materializer to raise an error when inserting a duplicate key. However, the changes in materializer.ex (lines 316-326) now make duplicate handling idempotent by skipping the insert instead of raising an error.

This test appears to be testing behavior that no longer exists. It should either be:

  1. Removed if the idempotent behavior is the intended final state
  2. Updated to verify the new idempotent behavior (though the new tests at lines 912-1002 already cover this)
🤖 Fix all issues with AI agents
In `@packages/sync-service/lib/electric/shapes/consumer/materializer.ex`:
- Around line 316-326: The test "insert of a PK we've already seen raises"
conflicts with the idempotent duplicate handling in the materializer: when
duplicate PKs are encountered the code now skips the insert, updates tag_indices
via add_row_to_tag_indices/1 for move_out tracking, and returns
increment_value/2 only for new inserts; update or remove the test accordingly —
either remove it, or change it to assert that calling the materializer path (the
branch that uses cast!/2, Map.put, add_row_to_tag_indices and returns
counts_and_events) does not raise, that the primary key remains present in the
index, and that tag_indices was updated for move_out instead of expecting an
error.
🧹 Nitpick comments (1)
packages/sync-service/test/electric/shapes/consumer/materializer_test.exs (1)

942-970: Test verifies no events for duplicate, but doesn't verify tag tracking.

The test confirms that no events are emitted when a duplicate arrives with different move_tags. However, it doesn't verify that different_tag is actually added to the tag indices. Test 3 covers this scenario, so this is acceptable, but you could optionally strengthen this test by adding a move_out assertion for different_tag.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f11f91 and 746791f.

📒 Files selected for processing (3)
  • .changeset/fix-materializer-duplicate-records.md
  • packages/sync-service/lib/electric/shapes/consumer/materializer.ex
  • packages/sync-service/test/electric/shapes/consumer/materializer_test.exs
🧰 Additional context used
📓 Path-based instructions (2)
packages/sync-service/lib/electric/shapes/**/*.{ex,exs}

📄 CodeRabbit inference engine (packages/sync-service/AGENTS.md)

Shapes should be defined and managed in lib/electric/shapes/* for lifecycle management (creation, handles, checkpoints)

Files:

  • packages/sync-service/lib/electric/shapes/consumer/materializer.ex
packages/sync-service/test/**/*.{exs,ex}

📄 CodeRabbit inference engine (packages/sync-service/AGENTS.md)

Prefer a single structural assert with pattern matching for returned structures in tests rather than multiple asserts on individual elements

Files:

  • packages/sync-service/test/electric/shapes/consumer/materializer_test.exs
🧠 Learnings (6)
📚 Learning: 2026-01-22T19:39:13.351Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: packages/sync-service/AGENTS.md:0-0
Timestamp: 2026-01-22T19:39:13.351Z
Learning: Applies to packages/sync-service/lib/electric/replication/**/*.{ex,exs} : Replication logic consuming WAL and producing change events should be implemented in `lib/electric/replication/*`

Applied to files:

  • packages/sync-service/lib/electric/shapes/consumer/materializer.ex
  • packages/sync-service/test/electric/shapes/consumer/materializer_test.exs
📚 Learning: 2026-01-22T19:39:13.351Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: packages/sync-service/AGENTS.md:0-0
Timestamp: 2026-01-22T19:39:13.351Z
Learning: Applies to packages/sync-service/lib/electric/shapes/**/*.{ex,exs} : Shapes should be defined and managed in `lib/electric/shapes/*` for lifecycle management (creation, handles, checkpoints)

Applied to files:

  • packages/sync-service/lib/electric/shapes/consumer/materializer.ex
  • packages/sync-service/test/electric/shapes/consumer/materializer_test.exs
📚 Learning: 2026-01-22T19:39:13.351Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: packages/sync-service/AGENTS.md:0-0
Timestamp: 2026-01-22T19:39:13.351Z
Learning: Applies to packages/sync-service/lib/electric/shape_cache/**/*.{ex,exs} : Shape cache persistence and snapshot/changelog serving logic should be in `lib/electric/shape_cache/*`

Applied to files:

  • packages/sync-service/lib/electric/shapes/consumer/materializer.ex
  • packages/sync-service/test/electric/shapes/consumer/materializer_test.exs
📚 Learning: 2026-01-22T19:39:13.351Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: packages/sync-service/AGENTS.md:0-0
Timestamp: 2026-01-22T19:39:13.351Z
Learning: Applies to packages/sync-service/{lib/electric/postgres/**/*.{ex,exs},lib/pg_interop/**/*.{ex,exs}} : Postgres integration including connections, identifiers, and type handling should be in `lib/electric/postgres/*` and `lib/pg_interop/*`

Applied to files:

  • packages/sync-service/lib/electric/shapes/consumer/materializer.ex
📚 Learning: 2026-01-22T19:39:13.351Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: packages/sync-service/AGENTS.md:0-0
Timestamp: 2026-01-22T19:39:13.351Z
Learning: Applies to packages/sync-service/lib/electric/application.ex : Changes to `Electric.Application` APIs (`api/1`, `api_plug_opts/1`, `configuration/1`) can break downstream consumers including elixir-client and Phoenix.Sync

Applied to files:

  • packages/sync-service/lib/electric/shapes/consumer/materializer.ex
📚 Learning: 2026-01-22T19:39:13.351Z
Learnt from: CR
Repo: electric-sql/electric PR: 0
File: packages/sync-service/AGENTS.md:0-0
Timestamp: 2026-01-22T19:39:13.351Z
Learning: Applies to packages/sync-service/test/**/*.{exs,ex} : Prefer a single structural assert with pattern matching for returned structures in tests rather than multiple asserts on individual elements

Applied to files:

  • packages/sync-service/test/electric/shapes/consumer/materializer_test.exs
🧬 Code graph analysis (1)
packages/sync-service/test/electric/shapes/consumer/materializer_test.exs (1)
packages/sync-service/lib/electric/shapes/consumer/materializer.ex (2)
  • get_link_values (58-63)
  • new_changes (50-52)
🔇 Additional comments (3)
.changeset/fix-materializer-duplicate-records.md (1)

1-5: LGTM!

The changeset accurately describes the fix and correctly marks it as a patch-level change.

packages/sync-service/test/electric/shapes/consumer/materializer_test.exs (2)

912-940: Good test coverage for the idempotent handling.

The test correctly verifies that:

  1. The value remains unchanged when a duplicate arrives
  2. No spurious move_in event is emitted

972-1002: Well-structured test for the critical tag-tracking behavior.

This test validates the key invariant: even when a duplicate record is skipped, its move_tags are still tracked, allowing subsequent move_out events to correctly remove the record.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Copy link
Contributor

@icehaunter icehaunter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Crashing versions of those functions were used explicitly because "not seeing an out of order row" is an assumed invariant
One case which led to the same crash was another bug which I've addressed.

When another "newRecord" arrives with some move tags, you don't know whether it's from snapshot, or actual move in or what, and if they are out of order/duplicated, you don't know which move tag set is valid, I don't think merging it is correct.

I would much rather keep the crash as an invariant insurance and fix only the underlying bug

robacourt added a commit that referenced this pull request Jan 28, 2026
)

## Summary

Fixes #3788 — when a dependency shape is removed (due to crash/cleanup)
before its dependent shape finishes registering,
`DependencyLayers.add_after_dependencies/3` crashed with
`FunctionClauseError`, taking down the `ShapeLogCollector` and cascading
into OOM failures.

This is the companion fix to #3789, which addressed the Materializer
duplicate-key crash that triggered the dependency removal in the first
place.

### The race condition

1. Dependency shape A is added to DependencyLayers
2. Dependent shape B starts its Consumer, checks
`all_materializers_alive?` → passes
3. Shape A's Materializer crashes (e.g. duplicate key, fixed by #3789)
4. Shape A's Consumer terminates and calls `remove_shape`
5. Shape B's Consumer calls `add_shape`
6. `RequestBatcher` batches both operations together
7. `ShapeLogCollector` processes the batch — **removals first, then
additions**
8. Shape A is removed from layers → layers becomes `[[]]`
9. Shape B tries to add with dependency on A → **CRASH**

### ECS logs from production (27 Jan 2026)

```
18:10:10.437 [error] GenServer Materializer "97489818-..." terminating
** (RuntimeError) Key "public"."offers"/"d3c8d8a5-..." already exists

18:10:10.449 [error] GenServer ShapeLogCollector terminating
** (FunctionClauseError) no function clause matching in Electric.Shapes.DependencyLayers.add_after_dependencies/3
    (electric) lib/electric/shapes/dependency_layers.ex
    args: [[], "30042127-...", MapSet.new(["97489818-..."])]
```

The 12ms gap between crashes shows the sequence: Materializer crash →
Consumer termination → `remove_shape` batched with `add_shape` → removal
processed first → crash on addition.

### Changes

- **`DependencyLayers.add_dependency/3`** now returns `{:ok, layers}` or
`{:error, {:missing_dependencies, missing}}` instead of crashing
- **`add_after_dependencies/3`** gets a new clause for exhausted layers
with unfound dependencies
- **`ShapeLogCollector`** handles the error gracefully — logs a warning
and fails the shape registration instead of crashing
- **`handle_continue(:restore_shapes)`** pattern-matches `{:ok, layers}`
(crashes on error, which is correct since restored shapes should always
have valid dependencies)

## Test plan

- [x] Updated existing `DependencyLayersTest` tests for new return type
- [x] Added error handling tests: missing dependency, removed
dependency, partially missing dependencies, ok-tuple for no-dependency
shapes
- [x] All 8 dependency layers tests pass
- [x] All 24 shape log collector tests pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **Bug Fixes**
* Fixed a race condition that caused crashes when shape dependencies
were unavailable during registration or restoration
* Improved error handling for missing dependencies with graceful
recovery instead of system failures

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
@robacourt robacourt closed this Jan 28, 2026
@robacourt robacourt force-pushed the rob/fix-materializer-race branch from 746791f to 2f11f91 Compare January 28, 2026 16:58
@netlify
Copy link

netlify bot commented Jan 28, 2026

Deploy Preview for electric-next ready!

Name Link
🔨 Latest commit 2f11f91
🔍 Latest deploy log https://app.netlify.com/projects/electric-next/deploys/697a404913cf1100080b0f97
😎 Deploy Preview https://deploy-preview-3789--electric-next.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants