Fix on_up() destroying healthy pool after replace-with-same-IP#879
Draft
bitpathfinder wants to merge 1 commit intoscylladb:masterfrom
Draft
Fix on_up() destroying healthy pool after replace-with-same-IP#879bitpathfinder wants to merge 1 commit intoscylladb:masterfrom
bitpathfinder wants to merge 1 commit intoscylladb:masterfrom
Conversation
c1f7d6b to
582e86d
Compare
There was a problem hiding this comment.
Pull request overview
Addresses SCYLLADB-833 where Cluster.on_up() can be invoked with a stale Host instance after a replace-with-same-IP, causing the driver to tear down a healthy pool and briefly fail queries.
Changes:
- Add early-return guards in
Cluster.on_up()to skip handling when theHostreference is stale or when a healthy pool already exists. - Add unit tests covering stale-host and healthy-pool scenarios to prevent regressions.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| cassandra/cluster.py | Adds on_up() guard clauses to avoid tearing down an already-healthy pool when handling stale host references. |
| tests/unit/test_cluster.py | Adds unit tests validating the new on_up() early-return behavior for stale hosts and pre-existing healthy pools. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
When a node is replaced with the same IP, the driver receives both TOPOLOGY_CHANGE NEW_NODE and STATUS_CHANGE UP events. The NEW_NODE handler runs first, replacing the old host and establishing a new pool. The STATUS_CHANGE UP handler fires later with a stale reference to the old host object. Because Host.__eq__/__hash__ are endpoint-based, the stale on_up() tears down the new host's pool, causing a brief window where queries fail with NoHostAvailable. Add two guards at the top of on_up(): 1. If the host has been replaced in metadata (different object, same endpoint, new host already up), skip processing. 2. If a healthy (non-shutdown) pool already exists for this host, call set_up() and skip the teardown/rebuild cycle. Both guards reset _currently_handling_node_up under host.lock, consistent with the existing cleanup paths. Refs: SCYLLADB-833
582e86d to
c9b8e5e
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
When a node is replaced with the same IP, the driver receives both
TOPOLOGY_CHANGE NEW_NODEandSTATUS_CHANGE UPevents. TheNEW_NODEhandler runs first, replacing the old host and establishing a new pool viaon_add. TheSTATUS_CHANGE UPhandler fires later (after a random 0-2s delay) with a stale reference to the old host object. BecauseHost.__eq__/__hash__are endpoint-based, the staleon_up()tears down the new host's pool, causing a brief window where queries fail withNoHostAvailable.Fix
Two early-return guards added at the top of
Cluster.on_up():on_add), mark the host as up and skip the teardown/rebuild cycle.Fixes: SCYLLADB-833