Path Quality and Selection: Analysis of Current State, Open PRs, and a Lightweight Proposal #1995

stachuman · 2026-03-10T19:07:36Z

Current behavior - v1.14

The current path selection is "first packet wins" — when a flood packet arrives at its destination, the first copy to arrive is processed and all duplicates are discarded via hasSeen(). The path from that first copy becomes the stored out_path for direct messaging.

From Mesh.cpp:
// NOTE: this is a 'first packet wins' impl. When receiving from multiple paths, the first to arrive wins.
// For flood mode, the path may not be the 'best' in terms of hops.
// FUTURE: could send back multiple paths, using createPathReturn(), and let sender choose which to use(?)

From BaseChatMesh.cpp:
// NOTE: default impl, we just replace the current 'out_path' regardless, whenever sender sends us a new out_path.
// FUTURE: could store multiple out_paths per contact, and try to find which is the 'best'(?)
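
To make the behavior concrete, here is a minimal sketch of "first packet wins" as described above. It is not the real Mesh.cpp code: FloodPacket, Contact, Endpoint, and the id field are hypothetical stand-ins, and a std::unordered_set takes the place of whatever hasSeen() uses internally.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_set>

// Hypothetical stand-ins for illustration only; not the real MeshCore types.
struct FloodPacket {
    uint32_t id;        // stand-in for whatever identity hasSeen() checks
    uint8_t  path_len;  // number of path bytes carried by this copy
    uint8_t  path[64];
};

struct Contact {
    uint8_t out_path_len = 0;
    uint8_t out_path[64] = {};
};

class Endpoint {
    std::unordered_set<uint32_t> seen_;  // assumption: a simple set replaces hasSeen()
public:
    // Returns true if this copy was processed, false if it was dropped as a duplicate.
    bool onFlood(const FloodPacket& pkt, Contact& c) {
        if (!seen_.insert(pkt.id).second) {
            return false;  // a copy already arrived: this path is never considered
        }
        // First copy: its path unconditionally replaces the stored out_path.
        std::size_t n = std::min<std::size_t>(pkt.path_len, sizeof(c.out_path));
        c.out_path_len = static_cast<uint8_t>(n);
        std::memcpy(c.out_path, pkt.path, n);
        return true;
    }
};
```

In this sketch, a later copy that took a shorter or stronger route is discarded before its path is even looked at, which is the limitation both source comments point out.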

The existing rxdelay mechanism provides implicit quality weighting: repeaters with stronger received signals forward faster, so the "first to arrive" tends to correlate with path quality. But this is not guaranteed, and it doesn't account for link asymmetry or path staleness. Manual checks (logging into repeaters and reading their signal-to-noise values) show that manually chosen paths give a much better connection quality and therefore generate less noise.

Problems this causes

Path churn: A newly received path blindly overwrites a working route, even if the new path is worse (more hops, weaker links)
No failure recovery: If a direct path goes stale (repeater offline, moved), there's no fallback — direct sends fail repeatedly until the path is manually reset or a new flood rediscovers a route
No quality awareness: Hop count and signal quality are not considered when accepting a new path

Open PRs addressing this

PR │ Approach │ Complexity │ RAM cost
#1911 │ Gate path updates: accept only if unknown, fewer hops, or stale (>60s) │ Low (~30 lines) │ ~0 bytes/contact
#1908 │ Active + backup path with failure tracking, cooldowns, and failover │ High (~350 lines changed) │ ~78 bytes/contact
#1782 │ Passive path candidates from overheard traffic │ Medium │ Varies

#1911 is the simplest — it prevents the "last path overwrites a working route" problem with three rules: accept if (a) no current path, (b) fewer hops, (c) current path older than 60 seconds. No backup storage, no failure tracking.
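
The three rules can be sketched as a single predicate. This is only an illustration of the rules as summarized here, not the code from the PR: StoredPath, shouldAcceptPath, and kStaleAfterMs are hypothetical names, and a millisecond timestamp is assumed.

```cpp
#include <cstdint>

// Hypothetical view of the currently stored route; field names are illustrative.
struct StoredPath {
    bool     known;       // false when no out_path is stored yet
    uint8_t  hops;        // hop count of the stored path
    uint32_t updated_ms;  // when the stored path was last written
};

constexpr uint32_t kStaleAfterMs = 60u * 1000u;  // the 60-second threshold from the summary

bool shouldAcceptPath(const StoredPath& cur, uint8_t candidate_hops, uint32_t now_ms) {
    if (!cur.known) return true;                 // (a) no current path
    if (candidate_hops < cur.hops) return true;  // (b) fewer hops
    // (c) current path older than 60 s; unsigned subtraction tolerates timer wraparound
    return (now_ms - cur.updated_ms) > kStaleAfterMs;
}
```

Under these assumptions, a 3-hop candidate arriving 10 seconds after a 2-hop path was stored is rejected, while the same candidate arriving 61 seconds later is accepted.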

#1908 is comprehensive — dual active/backup path storage, failure counting (2 failures triggers switch), 10s switch cooldown, 30min backup expiry, 15s direct-block window forcing flood rediscovery. It fixes real correctness bugs (encoded path_len misuse in memcpy, stale pointer risk). But the RAM overhead is ~78 bytes/contact — with MAX_CONTACTS=350 that's ~27 KB, which may be tight on nRF52/RP2040.
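
The RAM estimate can be checked at compile time. The constants below just restate the approximate figures quoted above; the names are illustrative.

```cpp
#include <cstddef>

constexpr std::size_t kMaxContacts          = 350;  // MAX_CONTACTS as quoted above
constexpr std::size_t kExtraBytesPerContact = 78;   // approximate per-contact overhead of #1908

constexpr std::size_t kExtraBytes = kMaxContacts * kExtraBytesPerContact;

// 350 * 78 = 27,300 bytes, i.e. roughly 27 KB (about 26.7 KiB).
static_assert(kExtraBytes == 27300, "350 contacts * 78 bytes");
```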

What's missing: signal quality. Honestly, after the last improvement (mesh path 2/3 bytes), this could have the biggest impact on limiting unnecessary noise.

None of the current proposals consider signal quality. isPathBetter() in PR #1908 compares only hop count and path byte length. A 2-hop path through a marginal link will always beat a 3-hop path through strong links.

Proposal 1: Add last-hop SNR to path acceptance (zero protocol changes)

When a destination endpoint receives a flood packet, it already has access to:

Hop count — from the path_len field
Last-hop SNR — from pkt->_snr (already stored by Dispatcher::checkRecv())

The rxdelay mechanism already ensures that at each repeater hop, the strongest-signal copy wins the forwarding race. By the time the packet reaches the destination, the last-hop SNR is a reasonable proxy for overall path health.

Concrete change: Store last_hop_snr (1 byte, SNR*4 as int8_t) alongside out_path in ContactInfo. Use it as a tiebreaker in path acceptance:

bool isPathBetter(uint8_t candidate_len, int8_t candidate_snr,
                  uint8_t current_len, int8_t current_snr) {
    uint8_t candidate_hops = candidate_len & 63;
    uint8_t current_hops = current_len & 63;
    if (candidate_hops != current_hops) {
        return candidate_hops < current_hops; // fewer hops wins
    }
    return candidate_snr > current_snr + 3; // same hops: better SNR wins (with hysteresis)
}

This adds 1 byte per contact, requires zero protocol changes, and works with both PR #1911 and #1908.
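
A sketch of how the 1-byte value could be produced and used follows. encodeSnrQ4 and decodeSnrQ4 are hypothetical helpers, and a float dB input is an assumption (the type of pkt->_snr is not shown here). isPathBetter is repeated from the proposal so the example is self-contained. With the SNR*4 encoding, the +3 margin corresponds to 0.75 dB.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: quantize an SNR in dB to quarter-dB steps, saturating to int8_t.
int8_t encodeSnrQ4(float snr_db) {
    float q = std::round(snr_db * 4.0f);
    if (q > 127.0f)  q = 127.0f;   // +31.75 dB ceiling
    if (q < -128.0f) q = -128.0f;  // -32 dB floor
    return static_cast<int8_t>(q);
}

// Hypothetical helper: convert the stored value back to dB.
float decodeSnrQ4(int8_t q) { return q / 4.0f; }

// Same logic as the proposal above.
bool isPathBetter(uint8_t candidate_len, int8_t candidate_snr,
                  uint8_t current_len, int8_t current_snr) {
    uint8_t candidate_hops = candidate_len & 63;
    uint8_t current_hops = current_len & 63;
    if (candidate_hops != current_hops) {
        return candidate_hops < current_hops;  // fewer hops wins, regardless of SNR
    }
    return candidate_snr > current_snr + 3;    // must beat the stored SNR by more than 0.75 dB
}
```

For two 2-hop paths, a candidate at 6.0 dB (stored as 24) does not replace a current path at 5.25 dB (stored as 21), because 24 is not greater than 21 + 3; a candidate at 6.25 dB (stored as 25) does. The hysteresis keeps two near-equal paths from replacing each other on every flood.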

Suggested incremental path

Merge #1911 (Gate path updates to unknown, better-hop, or stale routes) first: minimal risk, prevents the worst path churn, easy to validate in the field
Add the SNR tiebreaker on top of #1911: 1 byte/contact, simple logic, improves path selection quality
Evaluate #1908 (Harden contact path failover and simplify routing flow) for a future release: the active/backup failover is valuable but needs field testing; consider reducing backup_out_path to 32 bytes, since 32+ hop backup paths are unlikely to be useful

Future (and important!) consideration: min-SNR tracking (requires protocol change)

For true bottleneck detection, a 1-byte min_snr field could be added to flood packets, updated by each repeater: min_snr = min(current, this_hop_snr).
This would let the destination see the weakest link in the entire chain, not just the last hop. This would be a wire format change and should be bundled with other breaking changes (e.g., a future protocol version bump).
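
The per-hop update could look like the sketch below. No such field exists in the protocol today, so everything here is hypothetical: the names, the quarter-dB int8_t encoding (borrowed from Proposal 1), and the choice of 127 as the initial value set by the originator.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical: value the originator writes before any hop has been measured.
constexpr int8_t kMinSnrUnset = 127;

// Applied by each repeater before forwarding: min_snr = min(current, this_hop_snr).
int8_t updateMinSnr(int8_t carried_min, int8_t this_hop_snr) {
    return std::min(carried_min, this_hop_snr);
}

// Simulates a packet crossing a chain of hops; returns what the destination would see.
int8_t bottleneckSnr(const int8_t* hop_snrs, std::size_t n) {
    int8_t m = kMinSnrUnset;
    for (std::size_t i = 0; i < n; ++i) {
        m = updateMinSnr(m, hop_snrs[i]);
    }
    return m;
}
```

For a three-hop chain with per-hop values {40, -12, 28}, the destination would see -12, whereas the last-hop SNR alone would report 28 and hide the weak middle link.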
