Skip to content

Conversation

@andyedison
Copy link

This reverts commit 560d450.

There has been evidence that this backport applied to our version of vitess causes a memory leak because of code that this fix relies upon that is fundamentally different in v21 than the upstream v22 and v23.

Here is a copilot analysis of the fix in v21 https://gist.github.com/andyedison/4287968d85afa780abc8889404521078#file-memory-leak-md

Description

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Copilot AI review requested due to automatic review settings January 15, 2026 16:34
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reverts commit 560d450 which attempted to fix connection leaks in the connection pool. The revert is necessary because the backported fix from upstream Vitess v22/v23 causes memory leaks in v21 due to fundamental differences in the codebase between versions.

Changes:

  • Reverted logic in pop() method to continue skipping connections that fail borrow
  • Reverted closeIdleResources() to use connReopen() instead of creating new connections via getNew() and tryReturnConn()
  • Removed test TestIdleTimeoutConnectionLeak that was testing the reverted fix

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
go/pools/smartconnpool/pool.go Reverts connection pool handling logic in pop() and closeIdleResources() methods to pre-fix state
go/pools/smartconnpool/pool_test.go Removes test case that validated the reverted fix

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +438 to 440
if conn.timeUsed.borrow() {
return conn
}
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reverted logic may reintroduce the original connection leak issue. After reverting, when borrow() fails (connection is expired/busy), the loop continues to the next connection but doesn't handle the failed connection, leaving it in the stack. Consider documenting why this approach is acceptable in v21 and what trade-offs are being made by accepting potential leaked connections versus the memory leak caused by the fix.

Copilot uses AI. Check for mistakes.
pool.closedConn()

// Using context.Background() is fine since MySQL connection already enforces
// a connect timeout via the `db_connect_timeout_ms` config param.
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This revert changes the behavior from creating new connections to reopening existing connection objects. Given that the original fix was reverted due to memory leak issues specific to v21, this code section should include a comment explaining why connReopen() is used here instead of the getNew() + tryReturnConn() approach, and what the known trade-offs are. This will help future maintainers understand the context when they encounter connection pool issues.

Suggested change
// a connect timeout via the `db_connect_timeout_ms` config param.
// a connect timeout via the `db_connect_timeout_ms` config param.
// NOTE: We intentionally call connReopen() here instead of using the
// getNew() + tryReturnConn() pattern. An earlier change switched this
// logic to allocate a new connection and return the old one to the pool,
// but that approach was reverted in v21 due to a version-specific memory
// leak in the connection pooling layer. Reopening the existing connection
// object avoids that leak at the cost of reusing connection structs
// more aggressively and potentially holding onto them longer. If this
// behavior is revisited in the future, carefully reassess memory usage,
// pool accounting, and leak risks before switching back to getNew() +
// tryReturnConn().

Copilot uses AI. Check for mistakes.
@andyedison
Copy link
Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants