Host equality/hash depend on mutable endpoint #867

@dkropachev

Problem

Host.__eq__ and Host.__hash__ currently use Host.endpoint as identity:

  • Host == Host compares endpoints.
  • Host == "address" is supported for backward compatibility.
  • hash(Host) delegates to hash(endpoint).

This is unsafe because Host.endpoint can be mutated during topology refresh/IP-change handling: a Host already stored in a dict or set stays filed under its old hash, so later lookups with the same object can miss. It also violates Python's hash/equality contract on the string-compatibility path: host == "127.0.0.1" can be true while hash(host) != hash("127.0.0.1").

Relevant code:

  • cassandra/pool.py: Host.__eq__, Host.__hash__, Host.__lt__
  • cassandra/cluster.py: control connection mutates host.endpoint when a node keeps the same host_id but changes endpoint
  • cassandra/cluster.py: Session._pools is keyed by Host
  • cassandra/policies.py: load-balancing policies store hosts in frozenset/tuples and use equality for dedup/removal

Impact

Internal structures can lose or duplicate entries after endpoint changes:

  • Session._pools lookup/removal can fail after endpoint mutation.
  • Query execution can report "Host has been marked down or removed" even though the host object still exists.
  • Load-balancing policy host sets can fail to remove/re-add the intended host.
  • Token-aware routing and replica filtering use Host equality for membership checks.
  • Metrics/Insights consume pool state keyed by Host.
  • Custom load-balancing policies may rely on legacy Host == address behavior.
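
The load-balancing bullet above can be illustrated without the driver. The sketch below is only a model: EndpointKeyedHost is a hypothetical stand-in that copies just the endpoint-based __eq__/__hash__ described in the Problem section, and it assumes a policy removes a host from its live-host set with frozenset.difference.

```python
class EndpointKeyedHost:
    """Hypothetical stand-in for Host: identity comes from a mutable endpoint."""

    def __init__(self, endpoint):
        self.endpoint = endpoint

    def __eq__(self, other):
        if isinstance(other, EndpointKeyedHost):
            return self.endpoint == other.endpoint
        return NotImplemented

    def __hash__(self):
        return hash(self.endpoint)


a = EndpointKeyedHost("127.0.0.1")
b = EndpointKeyedHost("127.0.0.2")
live_hosts = frozenset((a, b))

# Simulate an IP change: same object, but its hash is now different.
a.endpoint = "10.0.0.250"

# Removal hashes the host with its new endpoint and finds nothing to discard.
after_removal = live_hosts.difference((a,))
still_present = any(h is a for h in after_removal)
```

In this model the host object survives removal (still_present is true) while a membership test such as `a in live_hosts` is false, which matches the "fail to remove/re-add the intended host" symptom.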

Reproduction

import uuid
from cassandra.pool import Host
from cassandra.policies import SimpleConvictionPolicy
from cassandra.connection import DefaultEndPoint

hosts = [
    Host(DefaultEndPoint("127.0.0.%d" % i), SimpleConvictionPolicy, host_id=uuid.uuid4())
    for i in range(64)
]
target = hosts[32]
pools = {host: "pool" for host in hosts}

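# Simulate an IP change on a host that is already used as a dict key.
target.endpoint = DefaultEndPoint("10.0.0.250")

# The same object can no longer be found under its own key.
assert pools.get(target) is None
assert target not in pools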

String compatibility also violates the hash contract:

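import uuid
from cassandra.pool import Host
from cassandra.policies import SimpleConvictionPolicy

host = Host("127.0.0.1", SimpleConvictionPolicy, host_id=uuid.uuid4())

# Equal in both directions...
assert host == "127.0.0.1"
assert "127.0.0.1" == host
# ...but hashed differently, so set membership fails.
assert host not in {"127.0.0.1"}
assert "127.0.0.1" not in {host}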

Proposed Direction

Define stable Host identity before removing topology mutability:

  • Use host_id as Host identity for __eq__ and __hash__, or use object identity internally and expose explicit host-id comparison.
  • Stop using endpoint/address equality for Host.__eq__; replace with explicit checks such as host.address == addr or metadata.get_host(addr).
  • Keep endpoint lookup in metadata via _host_id_by_endpoint, but update it through one controlled topology-update path.
  • Keep health mutable (is_up, conviction policy, reconnection handler) while making topology fields replaceable/immutable.
  • Revisit __lt__; endpoint-only ordering is inconsistent if equality becomes host-id based.
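
One possible shape for the first three bullets is sketched below. This is not the driver's code: StableHost and HostRegistry are hypothetical names, endpoints are plain strings for brevity, and only _host_id_by_endpoint is borrowed from the list above. The sketch assumes host_id never changes for the lifetime of a host object.

```python
import uuid


class StableHost:
    """Hypothetical Host stand-in whose identity is its host_id."""

    def __init__(self, endpoint, host_id):
        self.endpoint = endpoint   # mutable topology field
        self._host_id = host_id    # identity; assumed never reassigned

    @property
    def host_id(self):
        return self._host_id

    def __eq__(self, other):
        if isinstance(other, StableHost):
            return self._host_id == other._host_id
        return NotImplemented      # no implicit Host == "address"

    def __hash__(self):
        return hash(self._host_id)


class HostRegistry:
    """Hypothetical metadata holder with a single endpoint-update path."""

    def __init__(self):
        self._hosts = {}                # host_id -> StableHost
        self._host_id_by_endpoint = {}  # endpoint -> host_id

    def add(self, host):
        self._hosts[host.host_id] = host
        self._host_id_by_endpoint[host.endpoint] = host.host_id

    def update_endpoint(self, host, new_endpoint):
        # The only place an endpoint changes, so the index stays in sync.
        self._host_id_by_endpoint.pop(host.endpoint, None)
        host.endpoint = new_endpoint
        self._host_id_by_endpoint[new_endpoint] = host.host_id

    def get_host(self, endpoint):
        return self._hosts.get(self._host_id_by_endpoint.get(endpoint))


registry = HostRegistry()
host = StableHost("127.0.0.1", uuid.uuid4())
registry.add(host)
pools = {host: "pool"}

registry.update_endpoint(host, "10.0.0.250")

found_in_pools = pools.get(host)                     # lookup by host_id hash
found_by_new_addr = registry.get_host("10.0.0.250")  # explicit address lookup
found_by_old_addr = registry.get_host("127.0.0.1")   # stale address
equals_string = (host == "10.0.0.250")               # no string equality
```

With identity tied to host_id, the pool lookup keeps working after the endpoint change, and address-based access goes through the explicit get_host call instead of Host.__eq__.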

Compatibility Risk

This breaks tests, and possibly user code, that expect:

  • two Host instances with the same endpoint but different host_id to be equal;
  • Host == "address" to be true;
  • sets of Host to deduplicate by endpoint.

This likely needs a major-version change or a deprecation phase.
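
A deprecation phase could, for example, keep the string comparison working while warning callers. The sketch below is one assumed way to do that using the standard warnings module; TransitionalHost is a hypothetical stand-in, not the driver's Host, and the warning text is illustrative.

```python
import warnings


class TransitionalHost:
    """Hypothetical stand-in showing a deprecation phase for Host == "address"."""

    def __init__(self, address, host_id):
        self.address = address
        self.host_id = host_id

    def __eq__(self, other):
        if isinstance(other, TransitionalHost):
            return self.host_id == other.host_id
        if isinstance(other, str):
            # Legacy path: keep the old answer for now, but warn the caller.
            warnings.warn(
                "Comparing Host to an address string is deprecated; "
                "compare host.address instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return self.address == other
        return NotImplemented

    def __hash__(self):
        return hash(self.host_id)


host = TransitionalHost("127.0.0.1", host_id=1)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_result = (host == "127.0.0.1")

explicit_result = (host.address == "127.0.0.1")
```

Note that this transitional form still breaks the hash contract for strings, since host == "127.0.0.1" is true while the hashes differ; it only gives users a release cycle to migrate to explicit address checks.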

Acceptance Criteria

  • hash(host) remains stable across endpoint/topology updates.
  • Equal objects have equal hashes.
  • Session pool lookup/removal works after IP change.
  • Load-balancing policies remove/re-add the intended host after endpoint or dc/rack changes.
  • Tests are updated for new identity semantics.
  • Generated C artifacts are updated for cassandra/pool.py.
