Replication stream multi-cursor #10147

Draft
robholland wants to merge 6 commits into temporalio:main from robholland:rh-stream-readergroup

robholland commented May 1, 2026

What changed?

  • Added ReplicationReaderGroup, a type that encapsulates the 3-scope QueueReaderState watermark arithmetic (catchup begin, reader-state construction, failover watermark) that was previously duplicated across recvSyncReplicationState and getSendCatchupBeginInclusiveWatermark. Gated by EnableReplicationReaderGroup.
  • Added namespaceIsolationManager, which coordinates per-namespace dedicated LOW readers on the sender. When the receiver signals that a namespace is throttled, a dedicated reader goroutine is created for it and the default LOW reader skips it. When the namespace is un-throttled, the dedicated reader drains to the recorded cursor before removing itself. Gated by EnableReplicationNamespaceIsolation.
  • Added NamespaceThrottler interface and NoopNamespaceThrottler default for the receiver side to report which namespaces exceeded the LOW-lane threshold.
  • Added throttled_namespace_ids to SyncReplicationState proto so the receiver can communicate the throttled namespace set to the sender on each ack.
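
For readers skimming the description, here is a minimal sketch of the reader lifecycle the isolation bullet describes: a namespace gets a dedicated reader when it appears in the throttled set, moves to a draining state when it drops out, and is removed once the drain finishes. Everything here is hypothetical (`isolationSketch`, `Reconcile`, `DrainComplete`, `DefaultReaderSkips`); it models state only, with no goroutines or cursors, and it assumes the default LOW reader stops skipping a namespace as soon as it is un-throttled. The real namespaceIsolationManager may differ on all of these points.

```go
package main

import (
	"fmt"
	"sort"
)

// readerState is a hypothetical lifecycle for one dedicated LOW reader.
type readerState int

const (
	stateActive   readerState = iota + 1 // namespace is currently throttled
	stateDraining                        // un-throttled, finishing its backlog
)

// isolationSketch is an illustrative stand-in, not the PR's type.
type isolationSketch struct {
	readers map[string]readerState // namespace ID -> dedicated reader state
}

func newIsolationSketch() *isolationSketch {
	return &isolationSketch{readers: make(map[string]readerState)}
}

// Reconcile applies the throttled set carried on one ack
// (the throttled_namespace_ids field in the description).
func (s *isolationSketch) Reconcile(throttled []string) {
	want := make(map[string]struct{}, len(throttled))
	for _, id := range throttled {
		want[id] = struct{}{}
		s.readers[id] = stateActive // create, or revive a draining reader
	}
	for id := range s.readers {
		if _, ok := want[id]; !ok {
			s.readers[id] = stateDraining // no longer throttled: start draining
		}
	}
}

// DrainComplete removes a reader once it has drained to its recorded cursor.
func (s *isolationSketch) DrainComplete(id string) {
	if s.readers[id] == stateDraining {
		delete(s.readers, id)
	}
}

// DefaultReaderSkips reports whether the default LOW reader should skip
// tasks for this namespace (assumption: only while it is throttled).
func (s *isolationSketch) DefaultReaderSkips(id string) bool {
	return s.readers[id] == stateActive
}

// Dedicated lists namespaces that currently own a dedicated reader.
func (s *isolationSketch) Dedicated() []string {
	ids := make([]string, 0, len(s.readers))
	for id := range s.readers {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	s := newIsolationSketch()
	s.Reconcile([]string{"ns-backfill"})
	fmt.Println(s.DefaultReaderSkips("ns-backfill"), s.DefaultReaderSkips("ns-other"))
	s.Reconcile(nil) // receiver reports nothing throttled
	fmt.Println(s.Dedicated())
	s.DrainComplete("ns-backfill")
	fmt.Println(s.Dedicated())
}
```

Reconciling against the full set on every ack, rather than sending add/remove deltas, keeps the sender correct even if an ack is lost, which is presumably why the set travels on each SyncReplicationState.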

Why?

A large backfill can flood the LOW-priority replication lane and stall catchup for every other namespace behind it. The isolation manager gives those heavily loaded namespaces their own reader so that the default LOW reader is unblocked. The ReplicationReaderGroup is a prerequisite refactor: the scope-index arithmetic needs to live in one place before we can extend it to support per-namespace cursors cleanly. Both features are off by default, and the stream restarts itself if either flag changes at runtime. The namespace throttler allows per-namespace adjustment of LOW-priority limits.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Potential risks

Any change is risky. Identify all risks you are aware of. If none, remove this section.

robholland changed the title from "Rh stream readergroup" to "Replication stream multi-cursor" May 1, 2026
h, ok := m.readers[namespaceID]
m.mu.RUnlock()
if !ok {
	return 0, false

Contributor

-1 if not found?

// namespace IDs to create dedicated readers, keeping the default LOW reader from stalling.
type NamespaceThrottler interface {
	// RecordTask records an incoming LOW-priority task for the given namespace.
	RecordTask(namespaceID string)

Contributor

How does this work without passing in the task ID?

Contributor Author

We don't actually care about the task per se, just that we've had a task for the given namespace. Is there better naming here maybe to make that clearer? The standard for throttling interfaces like this has been Allow but we aren't returning yes/no here as the throttling system is after-the-fact in this instance.
