
Add adaptive congestion control for backbone repeaters #1938

Open
wbijen wants to merge 1 commit into meshcore-dev:dev from wbijen:feature/adaptive-congestion-control

Conversation


@wbijen commented on Mar 6, 2026

Adaptive Congestion Control for Backbone Repeaters

Summary

This PR adds a lightweight, adaptive congestion control system to repeater nodes. When the network gets busy, repeaters automatically and gracefully reduce flood hop limits for group messages and advertisements, keeping the mesh healthy without manual intervention.

This addresses #1588 (repeater limit channel flooding) with a busy-adaptive approach rather than a static cap.

How it works

BusyTracker (src/helpers/BusyTracker.h) computes a composite busy score (0.0–1.0) from three weighted signals over a 60-second tumbling window:

| Signal | Weight | Rationale |
|---|---|---|
| Airtime ratio | 50% | "Am I personally overloaded?" — measures actual radio utilization (TX + RX time as a fraction of the window). This is the most direct indicator of channel saturation. |
| Queue pressure | 30% | "Is my TX pipeline backed up?" — instantaneous queue fill level. High queue depth means packets are waiting longer to transmit, a leading indicator that airtime will spike next. |
| Duplicate ratio | 20% | "Is the mesh around me oversaturated?" — proportion of received floods that are duplicates. High dups mean many neighbors are reforwarding the same packets. This doesn't show up in airtime (dups are short RX, not TX) or in the queue (dups are dropped, never queued), but it tells us that reducing our own flood reach won't hurt delivery — other paths already cover those destinations. |

All three weights are compile-time configurable (BUSY_WEIGHT_AIRTIME, BUSY_WEIGHT_QUEUE, BUSY_WEIGHT_DUP).

The busy score drives a piecewise-linear ramp that smoothly reduces effective hop limits as congestion builds:

```
effective_hops
     ^
  8  |███
  7  |     \        ← upper segment: max → mid
  6  |        \
  5  |...........█      ← knee (mid)
  4  |              \
  3  |                 \    ← lower segment: mid → floor
  2  |                     ████████
     +--+--+--+--+--+--+--+--> busy
     0  .15       .58     1.0
        ^onset    ^knee
```
| Busy range | Label | Behavior |
|---|---|---|
| 0.00 – 0.15 | Low | Dead zone — full reach, no penalty |
| 0.15 – 0.58 | Medium | Progressive shedding: max → mid |
| 0.58 – 1.00 | High | Aggressive shedding: mid → floor |

Example: A repeater running at 40% busy will forward group messages up to ~6 hops (down from 8), while advertisements drop to ~5 hops. At 80% busy, groups are limited to ~3 hops and adverts to ~2. Direct/unicast traffic is never throttled.
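Reconstructing from the diagram, table, and worked example, the ramp might be implemented roughly as follows (a sketch: `BUSY_KNEE` is an assumed name, and the truncation to integer hops is inferred from the example numbers, not taken from the diff):

```cpp
#include <cstdint>

#ifndef BUSY_ONSET
#define BUSY_ONSET 0.15f
#endif
#define BUSY_KNEE 0.58f   // assumed name for the knee point

// Piecewise-linear ramp: full reach below the onset, max -> mid between
// onset and knee, mid -> floor between knee and full saturation.
static uint8_t effectiveFloodMax(float busy, uint8_t max_h, uint8_t mid_h, uint8_t floor_h) {
  if (busy <= BUSY_ONSET) return max_h;                 // dead zone
  if (busy < BUSY_KNEE) {                               // upper segment
    float t = (busy - BUSY_ONSET) / (BUSY_KNEE - BUSY_ONSET);
    return (uint8_t)(max_h - t * (max_h - mid_h));
  }
  float t = (busy - BUSY_KNEE) / (1.0f - BUSY_KNEE);    // lower segment
  return (uint8_t)(mid_h - t * (mid_h - floor_h));
}
```

This sketch reproduces the worked example: at 40% busy, groups (8/5/2) give 6 hops and adverts (8/4/1) give 5; at 80% busy, 3 and 2 respectively.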

Traffic-aware retransmit delay

getRetransmitDelay() scales the random jitter window by (1 + busy × 2):

  • At busy = 0 (idle): delay unchanged — same as current behavior
  • At busy = 0.5 (moderate): 2× wider jitter window, spreading retransmissions
  • At busy = 1.0 (saturated): 3× wider jitter window

This naturally reduces collision probability under load by spreading retransmissions over a wider time window. getDirectRetransmitDelay() is intentionally not scaled — DM and ACK traffic should not be penalized during congestion, as these are high-value, low-volume packets that need reliable delivery.
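A minimal sketch of the window scaling, assuming the jitter window is in milliseconds (names here are illustrative, not the firmware's API):

```cpp
#include <cstdint>

// Widen the retransmit jitter window by (1 + busy * 2):
// busy = 0 -> 1x (unchanged), busy = 0.5 -> 2x, busy = 1.0 -> 3x.
static uint32_t scaledJitterWindow(uint32_t base_window_ms, float busy) {
  return (uint32_t)(base_window_ms * (1.0f + busy * 2.0f));
}
```

The actual retransmit delay would then be a uniform random draw within this widened window, which is what spreads competing retransmissions apart under load.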

Defaults

All thresholds are compile-time #defines, overridable via build_flags in platformio.ini:

| Parameter | Default | Purpose |
|---|---|---|
| BUSY_ONSET | 0.15 | Busy score below which no shedding occurs |
| BUSY_WINDOW_MS | 60000 | Tumbling window for score computation (ms) |
| BUSY_WEIGHT_AIRTIME | 0.5 | Weight for airtime ratio component |
| BUSY_WEIGHT_QUEUE | 0.3 | Weight for queue pressure component |
| BUSY_WEIGHT_DUP | 0.2 | Weight for duplicate ratio component |
| GROUP_FLOOD_MAX | 8 | Group message hop limit at idle |
| GROUP_FLOOD_MID | 5 | Group hops at the knee of the ramp |
| GROUP_FLOOD_FLOOR | 2 | Group hops at full congestion |
| ADVERT_FLOOD_MAX | 8 | Advertisement hop limit at idle |
| ADVERT_FLOOD_MID | 4 | Advert hops at the knee |
| ADVERT_FLOOD_FLOOR | 1 | Advert hops at full congestion |
| PACKET_POOL_SIZE | 32 | Packet pool capacity (shared with StaticPoolPacketManager) |
| BUSY_TX_DELAY_SCALE | 3 | TX delay scaling factor |

Operators can tune these per deployment; for example, a high-traffic urban repeater:

```ini
build_flags =
  -DGROUP_FLOOD_FLOOR=1
  -DADVERT_FLOOD_FLOOR=0
  -DBUSY_ONSET=0.10
  -DBUSY_WEIGHT_AIRTIME=0.6f
  -DBUSY_WEIGHT_QUEUE=0.2f
  -DBUSY_WEIGHT_DUP=0.2f
```
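Under the hood this relies on the standard `#ifndef` guard pattern, so an unset flag costs nothing (a sketch; the exact header layout of BusyTracker.h is assumed):

```cpp
// A -DGROUP_FLOOD_FLOOR=1 build flag wins; otherwise the default applies.
#ifndef GROUP_FLOOD_FLOOR
#define GROUP_FLOOD_FLOOR 2
#endif
#ifndef BUSY_ONSET
#define BUSY_ONSET 0.15f
#endif
```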

What's included

  • Adaptive hop limits in allowPacketForward() — per-type flood limiting for group, advert, and fallback to _prefs.flood_max
  • Traffic-aware retransmit delay — getRetransmitDelay() scales jitter by (1 + busy × 2), spreading retransmissions when congested. Direct message delays are intentionally untouched to protect DM/ACK delivery
  • Counter wrap protection — airtime counters wrap at ~49 days; BusyTracker detects implausible deltas (exceeding elapsed wall time), skips the window, re-baselines, and preserves the previous valid score
  • Misconfiguration guards — getEffectiveFloodMax() clamps mid and floor if build flag overrides violate ceiling >= mid >= floor
  • stats-busy CLI command — prints busy score, component breakdown, and effective hop limits over serial
  • OLED display line — live status: GRP:5/8 ADV:4/8 B:32%
  • Build flag docs in variants/heltec_v4/platformio.ini
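The counter wrap protection above could be sketched roughly like this (struct and member names are assumptions; the real BusyTracker code is not shown in this excerpt):

```cpp
#include <cstdint>

// If the airtime counter delta exceeds the elapsed wall time, the
// reading is implausible (wrap or reset): skip this window, re-baseline,
// and keep reporting the previous valid score.
struct WrapGuard {
  uint32_t last_airtime_ms = 0;
  float    last_score = 0.0f;

  float closeWindow(uint32_t airtime_now_ms, uint32_t elapsed_ms, float new_score) {
    uint32_t delta = airtime_now_ms - last_airtime_ms;  // unsigned wrap-around math
    last_airtime_ms = airtime_now_ms;                   // re-baseline either way
    if (delta > elapsed_ms) return last_score;          // implausible: keep old score
    last_score = new_score;
    return new_score;
  }
};
```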

Design choices

  • Zero heap allocation at runtime — everything is stack/static
  • Header-only BusyTracker avoids touching core library build
  • Piecewise-linear ramp gives smooth, predictable behavior without cliff edges
  • Dead zone prevents flapping during normal low-traffic operation
  • PACKET_POOL_SIZE constant shared between pool allocator and BusyTracker to prevent silent drift
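For the misconfiguration guards mentioned under "What's included", the clamping could look like this sketch (assumed logic, not the PR's exact code):

```cpp
#include <algorithm>
#include <cstdint>

// Enforce ceiling >= mid >= floor even if build flag overrides violate it.
static void clampRamp(uint8_t max_h, uint8_t &mid_h, uint8_t &floor_h) {
  mid_h   = std::min(mid_h, max_h);    // mid may not exceed the ceiling
  floor_h = std::min(floor_h, mid_h);  // floor may not exceed mid
}
```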

Is this overengineered?

Honest question — the core mechanism is simple (composite score → hop limit ramp), but there are a lot of configurable knobs and a linear ramp. The rationale is that different deployments face very different traffic patterns (urban dense mesh vs rural sparse backbone), and we won't know the right defaults until we have field data. The #ifndef guards make these zero-cost to ignore if you don't need them.

That said, if the consensus is that fewer knobs and hardcoded defaults would be better to start with, happy to strip the configurability down to just GROUP_FLOOD_MAX/FLOOR and ADVERT_FLOOD_MAX/FLOOR and bake the rest in. We can always add knobs back later if field testing shows they're needed. Would love to hear thoughts on this.

Files changed

| File | Change |
|---|---|
| src/helpers/BusyTracker.h | New — busy score computation, getEffectiveFloodMax(), all compile-time defines |
| examples/simple_repeater/MyMesh.h | BusyTracker member, getter, PACKET_POOL_SIZE define |
| examples/simple_repeater/MyMesh.cpp | Wired into loop/forward/delay/stats/CLI |
| examples/simple_repeater/UITask.h | Accept BusyTracker pointer |
| examples/simple_repeater/UITask.cpp | Render effective hop limits on display |
| examples/simple_repeater/main.cpp | Pass BusyTracker to UITask |
| variants/heltec_v4/platformio.ini | Documented congestion build flags |

Related work

| Issue | Title | Relationship |
|---|---|---|
| #1502 | Token bucket rate limiting | Complementary. Per-sender fairness vs our global congestion response. Composes well. |
| #1588 | Repeater limit channel flooding | Directly addressed. Our GROUP_FLOOD_MAX is this feature, but busy-adaptive rather than static. |

Looking forward to feedback! Happy to tune defaults or simplify if it feels like too much for a first pass.

@wbijen force-pushed the feature/adaptive-congestion-control branch from 92fb4ca to 995dbed on March 6, 2026 at 07:24
```
@@ -415,10 +415,24 @@ bool MyMesh::isLooped(const mesh::Packet* packet, const uint8_t max_counters[])

bool MyMesh::allowPacketForward(const mesh::Packet *packet) {
```
Contributor commented:

could a max based on `min(_prefs->flood_max, getEffectiveFloodMax(...))` be used, to adhere to a configured lower value?

@wbijen (Author) replied on Mar 8, 2026:

Lowest now wins, fixed and pushed! thanks.

It now has GROUP_FLOOD_MAX and ADVERT_FLOOD_MAX configured at 8. Do you think that value makes sense, or do you want me to set it to 64 so that by default there is no change in behavior at quiet times?
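The "lowest wins" fix can be as simple as taking the minimum of the operator-configured cap and the adaptive limit (a sketch of the shape, not the exact diff):

```cpp
#include <algorithm>
#include <cstdint>

// The operator's configured flood_max still caps the adaptive limit,
// so a deliberately low setting is always honored.
static uint8_t allowedFloodHops(uint8_t configured_flood_max, uint8_t adaptive_max) {
  return std::min(configured_flood_max, adaptive_max);
}
```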

Introduces BusyTracker — a lightweight busy score tracker that computes
a composite congestion metric from airtime ratio, queue pressure, and
flood duplicate ratio. The score drives a piecewise-linear ramp that
smoothly reduces flood hop limits for group messages and advertisements
as congestion builds, while leaving direct/unicast traffic untouched.

Includes counter wrap protection (~49 day uptime), misconfiguration
guards for build flag overrides, stats-busy CLI command, OLED display
integration showing effective hop limits, and traffic-aware retransmit
delay scaling.

All thresholds and weights are compile-time configurable via build_flags.

Addresses meshcore-dev#1588, complements meshcore-dev#1502.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@wbijen force-pushed the feature/adaptive-congestion-control branch from 995dbed to aecd4f9 on March 8, 2026 at 15:17
