Problem
Host.__eq__ and Host.__hash__ currently use Host.endpoint as identity:
- Host == Host compares endpoints.
- Host == "address" is supported for backward compatibility.
- hash(Host) delegates to hash(endpoint).
This is unsafe because Host.endpoint can be mutated during topology refresh/IP-change handling. It also violates Python's hash/equality contract for the string-compatibility path: host == "127.0.0.1" can be true while hash(host) != hash("127.0.0.1").
Relevant code:
- cassandra/pool.py: Host.__eq__, Host.__hash__, Host.__lt__
- cassandra/cluster.py: control connection mutates host.endpoint when a node keeps the same host_id but changes endpoint
- cassandra/cluster.py: Session._pools is keyed by Host
- cassandra/policies.py: load-balancing policies store hosts in frozenset/tuples and use equality for dedup/removal
Impact
Internal structures can lose or duplicate entries after endpoint changes:
- Session._pools lookup/removal can fail after endpoint mutation.
- Query execution can report "Host has been marked down or removed" even though the host object still exists.
- Load-balancing policy host sets can fail to remove/re-add the intended host.
- Token-aware routing and replica filtering use Host equality for membership checks.
- Metrics/Insights consume pool state keyed by Host.
- Custom load-balancing policies may rely on legacy Host == address behavior.
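The policy-set failure can be illustrated without the driver. The sketch below uses EndpointKeyedHost, a hypothetical stand-in that only mimics endpoint-based __eq__/__hash__; it is not the driver's Host class, and the frozenset handling is an assumed simplification of what a load-balancing policy does.

```python
class EndpointKeyedHost:
    """Hypothetical stand-in: identity derived from a mutable endpoint."""

    def __init__(self, endpoint):
        self.endpoint = endpoint

    def __eq__(self, other):
        if isinstance(other, EndpointKeyedHost):
            return self.endpoint == other.endpoint
        return NotImplemented

    def __hash__(self):
        return hash(self.endpoint)


host = EndpointKeyedHost("127.0.0.1")
live_hosts = frozenset([host])        # policy-style host set

# Simulate an IP change on the same host object.
host.endpoint = "10.0.0.250"

# The set stored the old hash, so membership and removal both miss.
found_by_membership = host in live_hosts
after_removal = live_hosts.difference((host,))
still_present = any(h is host for h in after_removal)
```

Here found_by_membership is False and still_present is True: the set still holds the host object, but the object can no longer be used to find or remove its own entry.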
Reproduction
import uuid
from cassandra.pool import Host
from cassandra.policies import SimpleConvictionPolicy
from cassandra.connection import DefaultEndPoint

# Build enough hosts that the dict has many occupied slots.
hosts = [
    Host(DefaultEndPoint("127.0.0.%d" % i), SimpleConvictionPolicy, host_id=uuid.uuid4())
    for i in range(64)
]
target = hosts[32]
pools = {host: "pool" for host in hosts}

# Simulate an IP change: same host object, same host_id, new endpoint.
target.endpoint = DefaultEndPoint("10.0.0.250")

# The entry is still in the dict, but the key's hash has changed.
assert pools.get(target) is None
assert target not in pools
String compatibility also violates the hash contract:
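host = Host("127.0.0.1", SimpleConvictionPolicy, host_id=uuid.uuid4())

# Equality holds in both directions...
assert host == "127.0.0.1"
assert "127.0.0.1" == host

# ...but the hashes differ, so hash-based containers disagree with ==.
assert host not in {"127.0.0.1"}
assert "127.0.0.1" not in {host}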
Proposed Direction
Define stable Host identity before removing topology mutability:
- Use host_id as Host identity for __eq__ and __hash__, or use object identity internally and expose explicit host-id comparison.
- Stop using endpoint/address equality for Host.__eq__; replace with explicit checks such as host.address == addr or metadata.get_host(addr).
- Keep endpoint lookup in metadata via _host_id_by_endpoint, but update it through one controlled topology-update path.
- Keep health mutable (is_up, conviction policy, reconnection handler) while making topology fields replaceable/immutable.
- Revisit __lt__; endpoint-only ordering is inconsistent if equality becomes host-id based.
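A minimal sketch of the host_id-based option, assuming plain strings for endpoints. HostSketch, MetadataSketch, add_host, and update_endpoint are hypothetical names for illustration only; only _host_id_by_endpoint and get_host are taken from the list above, and their real signatures may differ.

```python
import uuid


class HostSketch:
    """Hypothetical stand-in for Host with host_id as identity."""

    def __init__(self, endpoint, host_id):
        self.endpoint = endpoint   # topology field, changed only via metadata
        self.host_id = host_id     # stable identity
        self.is_up = None          # health stays freely mutable

    def __eq__(self, other):
        if not isinstance(other, HostSketch):
            return NotImplemented  # no implicit Host == "address"
        return self.host_id == other.host_id

    def __hash__(self):
        return hash(self.host_id)

    def __lt__(self, other):
        # Ordering uses the same key as equality.
        if not isinstance(other, HostSketch):
            return NotImplemented
        return self.host_id < other.host_id


class MetadataSketch:
    """Hypothetical registry: the single path that rewrites endpoints."""

    def __init__(self):
        self._hosts = {}                # host_id -> HostSketch
        self._host_id_by_endpoint = {}  # endpoint -> host_id

    def add_host(self, host):
        self._hosts[host.host_id] = host
        self._host_id_by_endpoint[host.endpoint] = host.host_id

    def get_host(self, endpoint):
        host_id = self._host_id_by_endpoint.get(endpoint)
        return self._hosts.get(host_id)

    def update_endpoint(self, host, new_endpoint):
        # Drop the old mapping, mutate the host, register the new mapping.
        self._host_id_by_endpoint.pop(host.endpoint, None)
        host.endpoint = new_endpoint
        self._host_id_by_endpoint[new_endpoint] = host.host_id


metadata = MetadataSketch()
host = HostSketch("127.0.0.1", uuid.uuid4())
metadata.add_host(host)
pools = {host: "pool"}

metadata.update_endpoint(host, "10.0.0.250")

assert pools[host] == "pool"                     # key survives the IP change
assert metadata.get_host("10.0.0.250") is host   # explicit address lookup
assert metadata.get_host("127.0.0.1") is None
assert host != "10.0.0.250"                      # string comparison is gone
```

Ordering by host_id here is only one way to keep __lt__ consistent with __eq__; a real change might prefer a different sort key.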
Compatibility Risk
This breaks tests, and possibly user code, that expect:
- two Host instances with the same endpoint but different host_id to be equal;
- Host == "address" to be true;
- sets of Host to deduplicate by endpoint.
This likely needs a major-version change or a deprecation phase.
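One possible shape for a deprecation phase, sketched with a hypothetical DeprecatingHost stand-in rather than the driver's Host. It covers only the string-comparison path: the legacy result is preserved but a DeprecationWarning points callers at the explicit check. The hash mismatch with strings remains during such a phase.

```python
import uuid
import warnings


class DeprecatingHost:
    """Hypothetical stand-in for Host during a deprecation phase."""

    def __init__(self, address, host_id):
        self.address = address
        self.host_id = host_id

    def __eq__(self, other):
        if isinstance(other, DeprecatingHost):
            return self.host_id == other.host_id
        if isinstance(other, str):
            # Legacy path: still answers, but tells the caller to migrate.
            warnings.warn(
                "Comparing Host to an address string is deprecated; "
                "compare host.address explicitly instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            return self.address == other
        return NotImplemented

    def __hash__(self):
        return hash(self.host_id)


host = DeprecatingHost("127.0.0.1", uuid.uuid4())

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_result = host == "127.0.0.1"   # old style, warns

explicit_result = host.address == "127.0.0.1"  # new style, no warning
```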
Acceptance Criteria
- hash(host) remains stable across endpoint/topology updates.
- Equal objects have equal hashes.
- Session pool lookup/removal works after IP change.
- Load-balancing policies remove/re-add the intended host after endpoint or dc/rack changes.
- Tests are updated for new identity semantics.
- Generated C artifacts are updated for cassandra/pool.py.
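The behavioral criteria above could be expressed as a small check along these lines. HostStandIn and check_acceptance are hypothetical names, the datacenter attribute is an assumed simplification of the dc/rack fields, and the factory signature does not match the driver's Host constructor; a real test would build Host objects directly.

```python
import uuid


class HostStandIn:
    """Hypothetical stand-in with host_id-based identity."""

    def __init__(self, endpoint, host_id, datacenter="dc1"):
        self.endpoint = endpoint
        self.host_id = host_id
        self.datacenter = datacenter

    def __eq__(self, other):
        if not isinstance(other, HostStandIn):
            return NotImplemented
        return self.host_id == other.host_id

    def __hash__(self):
        return hash(self.host_id)


def check_acceptance(host_factory):
    """Return a dict of criterion name -> bool for the given host type."""
    host_id = uuid.uuid4()
    host = host_factory("127.0.0.1", host_id)
    hash_before = hash(host)
    pools = {host: "pool"}             # Session._pools-style mapping
    live_hosts = frozenset([host])     # policy-style host set

    # Simulate an endpoint change plus a datacenter change.
    host.endpoint = "10.0.0.250"
    host.datacenter = "dc2"

    # A second object describing the same node after the change.
    twin = host_factory("10.0.0.250", host_id)

    return {
        "hash_stable": hash(host) == hash_before,
        "equal_objects_equal_hashes": host == twin and hash(host) == hash(twin),
        "pool_lookup": pools.get(host) == "pool",
        "pool_removal": pools.pop(host, None) == "pool",
        "policy_removal": live_hosts.difference((host,)) == frozenset(),
    }


results = check_acceptance(HostStandIn)
```

Every value in results should be True for an identity scheme that meets the criteria; an endpoint-keyed host type would be expected to fail the lookup and removal entries.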