
[WIP]: fix#4076

Closed
asddongmen wants to merge 11 commits into pingcap:master from asddongmen:0128-investiate-puller-stuck

Conversation


@asddongmen asddongmen commented Jan 28, 2026

What problem does this PR solve?

Issue Number: close #xxx

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

Summary by CodeRabbit

  • Chores

    • More efficient resolved-timestamp computation and caching of minimum timestamps for unlocked ranges.
    • Added optional per-table testing/instrumentation to simulate locking behavior and gather periodic stats.
  • Tests

    • Updated tests for the new per-table behavior and added benchmarks to compare timestamp/heap approaches across region mixes.


Signed-off-by: dongmen <414110582@qq.com>
@ti-chi-bot ti-chi-bot bot added labels on Jan 28, 2026: do-not-merge/needs-linked-issue; do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress); release-note (denotes a PR that will be considered when it comes time to generate release notes).
@asddongmen (Collaborator, Author)

/test all

@ti-chi-bot ti-chi-bot bot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on Jan 28, 2026.
@gemini-code-assist

Summary of Changes

Hello @asddongmen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly.

This pull request focuses on enhancing the performance of the RangeLock mechanism within the logpuller service by optimizing how the minimum resolved timestamp is calculated. The changes introduce a caching strategy for unlocked ranges and utilize a min-heap for locked ranges, drastically reducing the computational overhead associated with frequent queries for the global minimum resolved timestamp. These improvements are critical for maintaining the efficiency and responsiveness of the log service.

Highlights

  • Performance Optimization for Resolved Timestamp Calculation: The ResolvedTs method within the RangeLock component has been significantly optimized to improve efficiency in determining the minimum resolved timestamp across all ranges.
  • Caching of Unlocked Range Minimum Timestamp: A new field, unlockedRangesMinTs, was introduced to cache the minimum resolved timestamp of unlocked ranges, thereby eliminating the need for repeated, potentially expensive, recalculations.
  • Efficient Locked Range Minimum Timestamp Retrieval: The ResolvedTs method now leverages lockedRangeStateHeap.PeekTop() for O(1) retrieval of the minimum resolved timestamp from locked ranges, replacing a less efficient iteration over a btree.
  • Refactored ResolvedTs Update Logic: The logic in region_event_handler.go for updating the RangeLock's state and retrieving the resolved timestamp has been adjusted to utilize the newly optimized RangeLock.ResolvedTs() method.
  • New Performance Benchmarks: Dedicated benchmark tests (BenchmarkGetHeapMinTs and BenchmarkGetHeapMinTsVsResolvedTs) have been added to validate and compare the performance improvements of the new resolved timestamp calculation methods.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to optimize resolved timestamp calculation in the log puller using caching and a min-heap. While logservice/logpuller/regionlock/region_range_lock.go changes are sound, a critical high-severity logic error exists in logservice/logpuller/region_event_handler.go. The min-heap update logic was incorrectly placed within a throttled block, leading to potential heap inconsistency. This can cause the system to report an incorrect resolved timestamp, potentially resulting in silent data loss in CDC systems like TiCDC. It is crucial to ensure the heap is updated consistently whenever a region's resolved timestamp changes to maintain data integrity.

Signed-off-by: dongmen <414110582@qq.com>
Signed-off-by: dongmen <414110582@qq.com>
Signed-off-by: dongmen <414110582@qq.com>
@ti-chi-bot

ti-chi-bot bot commented Jan 28, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hongyunyan for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


coderabbitai bot commented Jan 28, 2026

📝 Walkthrough

Walkthrough

Adds a cached minimum resolved timestamp for unlocked ranges, introduces a tableID parameter to RangeLock, implements a hack/testing mode with a lock cap and background stats loop, updates call sites and tests to pass tableID, and adds benchmarks and helpers; no core control-flow changes outside range lock internals.

Changes

Cohort / File(s) Summary
Range lock implementation
logservice/logpuller/regionlock/region_range_lock.go
Added unlockedRangesMinTs and hackLimitLock; NewRangeLock gains tableID param and initializes unlockedRangesMinTs; ResolvedTs() now combines heap top (PeekTop) and cached unlocked min; added hack-mode logic, logStatsLoop, and minor unlock/lock-path adjustments.
Range lock helpers
logservice/logpuller/regionlock/range_ts_map.go
Added hackMode flag, countRanges(), hackSplitInto(n, ts) and isHackMode() helpers to support testing/hack behavior and key-splitting.
Tests & benchmarks
logservice/logpuller/regionlock/region_range_lock_test.go
Updated constructor calls to include new param; added setupRangeLockForBenchmark, BenchmarkGetHeapMinTs, BenchmarkGetHeapMinTsVsResolvedTs; tests refresh heap via UpdateLockedRangeStateHeap(...).
Call sites / event handling
logservice/logpuller/region_event_handler.go, logservice/logpuller/region_event_handler_test.go, logservice/logpuller/subscription_client.go
Updated NewRangeLock call sites to pass tableID; handleResolvedTs() includes new comments about potential large-region performance impacts and uses span.rangeLock.ResolvedTs().
Manifest / module
manifest_file, go.mod
Supporting manifest and module updates included in the diff.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I peek the heap and tuck a min away,

A table tag for hops, a tiny play,
I split the ground for tests and count the day,
Stats hum soft while unlocked whispers stay.

🚥 Pre-merge checks | ❌ 3
❌ Failed checks (2 warnings, 1 inconclusive)

  • Description check — ⚠️ Warning. The PR description is incomplete: the template is filled only with placeholders (Issue Number: close #xxx, an empty "What is changed" section, unanswered questions, and no release note). Resolution: fill in the actual issue number, describe the changes to RangeLock and range_ts_map, answer the performance/compatibility questions, and provide a release note.
  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 21.43%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check — ❓ Inconclusive. The title "[WIP]: fix" is too vague and generic; it conveys nothing about the actual changes (range lock optimization, unlockedRangesMinTs caching, hack mode for table 930). Resolution: replace it with a descriptive title summarizing the main change, e.g. "Add caching for unlocked range timestamps in RangeLock".


Signed-off-by: dongmen <414110582@qq.com>
Signed-off-by: dongmen <414110582@qq.com>
Signed-off-by: dongmen <414110582@qq.com>
Signed-off-by: dongmen <414110582@qq.com>
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@logservice/logpuller/regionlock/region_range_lock.go`:
- Around line 491-496: The created buffered channel (ch) returned when
l.hackLimitLock is reached is never signaled/closed, causing callers of
LockRange's WaitFn to block; modify the hack path in region_range_lock.go (the
block that checks l.hackLimitLock and lockedRanges) to return a channel that is
closed (or otherwise signaled) so waiters unblock—e.g., create ch := make(chan
interface{},1); close(ch) before returning the []<-chan interface{} so
LockRangeResult{Status: LockRangeStatusWait} consumers and the WaitFn (the wait
logic used by LockRange) do not leak goroutines.
- Around line 138-140: Remove the WIP debug code: delete the hackLimitLock field
from the struct and remove all code paths that reference it (the hardcoded table
ID 930 checks, the logic that force-sets the limit to 2/30, and the goroutine
that logs every 30s). If this behavior must be preserved for testing, replace it
with a proper, documented configuration flag (e.g., DebugLimitLock or
LimitLockConfig) passed into RegionRangeLock via constructor/config instead of
hardcoded values; otherwise just eliminate the field and associated
branches/goroutine so no magic IDs or periodic test logging remain.
🧹 Nitpick comments (1)
logservice/logpuller/regionlock/region_range_lock.go (1)

307-320: Good optimization, but inconsistent with GetHeapMinTs().

The ResolvedTs() method now uses the cached unlockedRangesMinTs for O(1) lookup instead of scanning all unlocked ranges. However, GetHeapMinTs() (lines 470-472) still calls l.unlockedRanges.getMinTs() directly:

// GetHeapMinTs (line 470)
unlockedMinTs := l.unlockedRanges.getMinTs()

Consider updating GetHeapMinTs() to also use the cached value for consistency, or document why the difference is intentional.

Comment on lines +138 to +140

// HACK: for testing table 930, limit locked ranges to 30
hackLimitLock int

⚠️ Potential issue | 🟠 Major

Remove debug/hack code before merging.

The hackLimitLock field and associated logic (lines 159-164, 491-496, 169-192) are clearly WIP/debug code:

  • Hardcoded magic table ID 930
  • Comment says "limit locked ranges to 30" but code sets it to 2
  • Starts a goroutine that logs every 30 seconds

This should either be removed or converted to a proper testing/debugging feature with configuration flags rather than hardcoded values.


Comment on lines +491 to +496
// HACK: for testing table 930, limit locked ranges to hackLimitLock
if l.hackLimitLock > 0 && l.lockedRanges.Len() >= l.hackLimitLock {
ch := make(chan interface{}, 1)
// Return wait status so it keeps retrying but never succeeds
return LockRangeResult{Status: LockRangeStatusWait}, []<-chan interface{}{ch}
}

⚠️ Potential issue | 🔴 Critical

Goroutine leak: channel is never signaled.

This hack creates a channel that is never signaled or closed. When callers invoke WaitFn() (see lines 205-221 in LockRange), they will block forever waiting on this channel unless the context is cancelled.

ch := make(chan interface{}, 1)
// ch is never written to or closed
return LockRangeResult{Status: LockRangeStatusWait}, []<-chan interface{}{ch}

This causes goroutine leaks for any lock attempt when hackLimitLock is reached. Even for debugging purposes, this should at least close the channel or signal it periodically.


Signed-off-by: dongmen <414110582@qq.com>
Signed-off-by: dongmen <414110582@qq.com>
@ti-chi-bot ti-chi-bot bot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) and removed the size/L label (100-499 lines) on Jan 28, 2026.
Signed-off-by: dongmen <414110582@qq.com>
@ti-chi-bot
Copy link

ti-chi-bot bot commented Jan 28, 2026

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
logservice/logpuller/regionlock/range_ts_map.go (1)

27-34: Preserve hackMode in clones

rangeTsMap.clone doesn’t copy the new hackMode flag, so a clone created after hackSplitInto will unexpectedly allow set/unset to mutate entries. Consider copying the flag to avoid surprising behavior in hack mode.

🧩 Proposed fix
func (m *rangeTsMap) clone() (res *rangeTsMap) {
	res = &rangeTsMap{
		m:     btree.NewG(16, rangeTsEntryLess),
		start: m.start,
		end:   m.end,
+		hackMode: m.hackMode,
	}
	m.m.Ascend(func(i rangeTsEntry) bool {
		res.m.ReplaceOrInsert(i)
		return true
	})
	return
}
🤖 Fix all issues with AI agents
In `@logservice/logpuller/regionlock/region_range_lock.go`:
- Around line 128-130: The hackSplitInto(...) call mutates the unlockedRanges
map but does not update the unlockedRangesMinTs cache; add an explicit cache
update after every hackSplitInto invocation to recompute/unify
unlockedRangesMinTs (same approach used in set() and unset()) so the cached
minimum resolved ts stays consistent with unlockedRanges; apply this fix in the
three places where hackSplitInto is called (the block around the first call, and
the other two blocks referenced) ensuring you hold mu while updating
unlockedRangesMinTs.
🧹 Nitpick comments (1)
logservice/logpuller/regionlock/range_ts_map.go (1)

234-250: Guard hackSplitInto against non-positive n

If n <= 0, the map is cleared and left empty, which can make later reads return MaxUint64. A small guard prevents accidental misuse.

🛡️ Suggested guard
func (m *rangeTsMap) hackSplitInto(n int, ts uint64) {
+	if n <= 0 {
+		log.Panic("hackSplitInto requires n > 0")
+	}
	// Clear existing entries
	m.m.Clear(false)

Comment on lines +128 to +130
// unlockedRangesMinTs caches the minimum resolvedTs among unlocked ranges.
// It is protected by `mu`.
unlockedRangesMinTs uint64

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Scripts executed (repository: pingcap/ticdc; output lengths omitted):

#!/bin/bash
# Find all occurrences of unlockedRanges with context to understand mutation patterns
rg -n --type=go -B2 -A2 'unlockedRanges' logservice/logpuller/regionlock/region_range_lock.go
# Check whether unlockedRanges is used in other files in the package or elsewhere
fd -e go . logservice/logpuller/regionlock/ | xargs rg -l 'unlockedRanges' 2>/dev/null
# Get the file size to understand the structure
wc -l logservice/logpuller/regionlock/region_range_lock.go
# Find the hackSplitInto implementation to see whether it changes min values
rg -n --type=go -B3 -A10 'func.*hackSplitInto'
# Check for assignments to unlockedRanges outside this file
rg -n --type=go 'unlockedRanges\s*=' logservice/logpuller/regionlock/
# Search more broadly for any references to unlockedRanges
rg -n --type=go '\bunlockedRanges\b' logservice/logpuller/regionlock/region_range_lock.go | head -30
# Get the full hackSplitInto implementation
rg -n --type=go -A30 'func \(m \*rangeTsMap\) hackSplitInto'
# Check getMinTs usage in range_ts_map.go
rg -n --type=go 'getMinTs|getMin' logservice/logpuller/regionlock/range_ts_map.go | head -20
# Verify the hackSplitInto call sites
rg -n --type=go 'hackSplitInto' logservice/logpuller/regionlock/
# Double-check the getMinTs implementation
rg -n --type=go -A15 'func \(m \*rangeTsMap\) getMinTs\(\)'


Add cache update after hackSplitInto mutation

The hackSplitInto() call at line 162 mutates unlockedRanges but does not update unlockedRangesMinTs. While the cache value happens to be correct (all entries created have ts=startTs), this violates the synchronization principle: every path that mutates unlockedRanges should update the cache. Add an explicit cache update after hackSplitInto() to maintain consistency with the pattern used in set() and unset().

Also applies to: 289-293, 524-528


@asddongmen asddongmen closed this Feb 27, 2026