Jitter for server_lifetime to break synchronized backend reconnect waves

## Problem

When PgDog opens a cohort of backend connections in a tight window (e.g. during a traffic ramp at the start of a sale), those connections share near-identical birth times. With a finite `server_lifetime`, the entire cohort reaches the cap at nearly the same moment, producing one large reconnect wave every `server_lifetime` interval. Clients that need a slot during the wave queue up while PgDog reopens connections, which surfaces as a periodic latency stripe in application metrics.
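The cohort effect is plain arithmetic: with a fixed lifetime, the spread of retirement times equals the spread of birth times. The sketch below is illustrative only; the ramp shape (100 connections opened 50 ms apart) and the helper names `expiry_times` and `spread_ms` are hypothetical and not part of PgDog.

```rust
/// Retirement time (ms) of each connection: birth time + fixed lifetime (no jitter).
fn expiry_times(births_ms: &[u64], lifetime_ms: u64) -> Vec<u64> {
    births_ms.iter().map(|b| b + lifetime_ms).collect()
}

/// Width (ms) of the window that contains every given time.
fn spread_ms(times: &[u64]) -> u64 {
    let min = times.iter().min().copied().unwrap_or(0);
    let max = times.iter().max().copied().unwrap_or(0);
    max - min
}

fn main() {
    // Hypothetical ramp: 100 connections opened 50 ms apart (~5 s total).
    let births: Vec<u64> = (0..100).map(|i| i * 50).collect();
    let expiries = expiry_times(&births, 3_600_000); // server_lifetime = 1 h, no jitter
    // prints "cohort retires between 3600000 and 3604950 ms (window 4950 ms)"
    println!(
        "cohort retires between {} and {} ms (window {} ms)",
        expiries[0],
        expiries[expiries.len() - 1],
        spread_ms(&expiries)
    );
}
```

The whole cohort retires inside the same ~5 s window an hour later, and its replacements, opened during that wave, inherit the same tight spread for the next cycle.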

This is the same failure mode HikariCP addresses with built-in `maxLifetime` jitter and that AWS RDS Proxy avoids by spreading recycle events across a window.

## Proposal

Add a configuration option to randomize `server_lifetime` per connection.

Option A, an explicit jitter knob (preferred; the same per-connection idea as HikariCP's lifetime jitter, but operator-configurable):

```toml
[general]
server_lifetime = 3600000          # 1h base
server_lifetime_jitter = 600000    # +/- up to 10 min, sampled per connection at creation time
```

Each backend connection's effective lifetime would be `server_lifetime + uniform(-server_lifetime_jitter, +server_lifetime_jitter)`. Default `server_lifetime_jitter = 0` preserves current behavior.
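The sampling rule above can be sketched as follows. This is not PgDog code: `SplitMix64` is a tiny dependency-free PRNG chosen only so the example is self-contained, `effective_lifetime_ms` is a hypothetical name, and clamping the result to at least 1 ms (for the case `server_lifetime_jitter >= server_lifetime`) is an assumption the proposal does not specify.

```rust
/// Minimal SplitMix64 PRNG so the sketch has no external dependencies.
/// Not cryptographic, which the proposal says is fine.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Effective lifetime = server_lifetime + uniform(-jitter, +jitter), in ms.
/// Assumption: clamped to at least 1 ms in case jitter >= base.
fn effective_lifetime_ms(base_ms: u64, jitter_ms: u64, rng: &mut SplitMix64) -> u64 {
    if jitter_ms == 0 {
        return base_ms; // default: identical to current behavior
    }
    // Number of integer offsets in [-jitter, +jitter].
    let span = 2 * jitter_ms + 1;
    // Modulo bias is negligible for these magnitudes.
    let offset = (rng.next_u64() % span) as i64 - jitter_ms as i64;
    (base_ms as i64 + offset).max(1) as u64
}

fn main() {
    let mut rng = SplitMix64(42); // fixed seed for reproducibility
    let samples: Vec<u64> = (0..10_000)
        .map(|_| effective_lifetime_ms(3_600_000, 600_000, &mut rng))
        .collect();
    let min = samples.iter().min().unwrap();
    let max = samples.iter().max().unwrap();
    let mean = samples.iter().sum::<u64>() as f64 / samples.len() as f64;
    // Expect min/max near 3_000_000 / 4_200_000 and a mean close to 3_600_000.
    println!("min {min} ms, max {max} ms, mean {mean:.0} ms");
}
```

With the example config, a cohort born within a few seconds now retires across a ~20 minute window instead of a ~5 second one, while the mean lifetime stays at `server_lifetime`.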

Option B, implicit fractional jitter:

```toml
[general]
server_lifetime = 3600000
server_lifetime_jitter_fraction = 0.1   # +/- 10% of server_lifetime
```
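One way to support Option B without a second sampling path is to resolve the fraction into an absolute jitter once, at config load, and then reuse Option A's logic. The helper below is hypothetical, and rejecting fractions outside `[0.0, 1.0)` is an assumption (it keeps the effective lifetime above zero).

```rust
/// Hypothetical helper: converts `server_lifetime_jitter_fraction` (Option B)
/// into the absolute +/- jitter in ms that Option A would configure directly.
/// Assumption: fractions outside [0.0, 1.0) are rejected.
fn jitter_from_fraction(server_lifetime_ms: u64, fraction: f64) -> Option<u64> {
    // Range::contains returns false for NaN, so NaN is rejected too.
    if !(0.0..1.0).contains(&fraction) {
        return None;
    }
    Some((server_lifetime_ms as f64 * fraction).round() as u64)
}

fn main() {
    // Option B's example: 10% of a 1 h server_lifetime.
    println!("{:?}", jitter_from_fraction(3_600_000, 0.1)); // prints "Some(360000)"
    println!("{:?}", jitter_from_fraction(3_600_000, 1.5)); // prints "None"
}
```

Note that `0.1` of one hour is +/- 6 minutes, narrower than the +/- 10 minutes in Option A's example; matching it would take a fraction of about `0.1667`.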

Either works; (A) is more explicit and easier to reason about for operators.

### Behavioral details that matter

- Jitter must be sampled **once at connection creation**, not on every check, so a connection's lifetime is deterministic from its birth time. Otherwise a connection could keep "rolling" past its cap.
- Jitter should be symmetric (added or subtracted with equal probability), not subtractive-only: operators want the *average* lifetime to remain `server_lifetime`, not skew shorter.
- The randomness source can be process-global; cryptographic randomness is not required.
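The three rules above fit together in a small sketch: the lifetime is sampled once in the constructor, the offset is symmetric, and the random source is a single process-global atomic. `ServerConn`, `next_random`, and the fixed seed are hypothetical illustrations, not PgDog's actual types; the 1 ms floor is again an assumption.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

// Process-global, non-cryptographic randomness: SplitMix64 over an atomic
// counter. Fixed seed for reproducibility; a real pool might seed from startup time.
static RNG_STATE: AtomicU64 = AtomicU64::new(0x1234_5678_9ABC_DEF0);

fn next_random() -> u64 {
    const GAMMA: u64 = 0x9E37_79B9_7F4A_7C15;
    // fetch_add wraps on overflow, so the state sequence never panics.
    let mut z = RNG_STATE.fetch_add(GAMMA, Ordering::Relaxed).wrapping_add(GAMMA);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Hypothetical stand-in for a pooled backend connection.
struct ServerConn {
    created_at: Instant,
    /// Sampled exactly once, in `new`; never re-rolled on later checks.
    lifetime: Duration,
}

impl ServerConn {
    fn new(created_at: Instant, base: Duration, jitter: Duration) -> Self {
        let base_ms = base.as_millis() as i64;
        let jitter_ms = jitter.as_millis() as i64;
        // Symmetric offset in [-jitter, +jitter], so the mean stays `base`.
        let offset = if jitter_ms == 0 {
            0
        } else {
            (next_random() % (2 * jitter_ms as u64 + 1)) as i64 - jitter_ms
        };
        let lifetime = Duration::from_millis((base_ms + offset).max(1) as u64);
        ServerConn { created_at, lifetime }
    }

    /// Deterministic: depends only on birth time and the stored lifetime.
    fn is_expired(&self, now: Instant) -> bool {
        now.duration_since(self.created_at) >= self.lifetime
    }
}

fn main() {
    let t0 = Instant::now();
    let conn = ServerConn::new(t0, Duration::from_secs(3600), Duration::from_secs(600));
    // Any check before `lifetime` elapses says "keep"; any check after says "retire".
    println!("lifetime for this connection: {:?}", conn.lifetime);
    println!("expired at t0 + 30 min? {}", conn.is_expired(t0 + Duration::from_secs(1800)));
}
```

Because `is_expired` only compares elapsed time against a stored value, calling it on every checkout cannot push a connection past its cap.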

## Workarounds today

The only mitigations available are increasing `server_lifetime` (reduces wave frequency but doesn't eliminate the cohort effect) or removing it entirely (gives up backend-memory recycling). Neither is a substitute for jitter.

## Prior art

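- HikariCP: [`maxLifetime`](https://github.com/brettwooldridge/HikariCP/wiki/MBean-(JMX)-Monitoring-and-Management) is jittered internally: each connection's lifetime is shortened by a random amount of up to 2.5%, which HikariCP's documentation describes as avoiding "mass-extinction in the pool".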
- AWS RDS Proxy spreads idle connection recycling across a window to avoid synchronized retirement.

Happy to send a PR if there's interest in (A). Would also be open to a `min_lifetime`/`max_lifetime` shape if maintainers prefer that.
