feat(pool): jitter for server_lifetime #963

Merged
levkk merged 2 commits into pgdogdev:main from loyaltylion:server-lifetime-jitter
May 7, 2026

Contributor

@frangz frangz commented May 6, 2026

Closes #962.

Summary

Adds server_lifetime_jitter (milliseconds) at the general, database, and user config levels. When non-zero, each backend connection samples a per-connection offset uniformly from [-jitter, +jitter] once at creation time, so its effective lifetime is fixed at birth. The comparison sites in the pool use that effective value instead of the bare server_lifetime.

This breaks the synchronized retirement of cohorts that opened together (e.g. during a traffic ramp), which otherwise produces a periodic reconnect wave at every server_lifetime interval — visible as a clean N-minute latency stripe in client metrics. See #962 for context and prior art.

Default is 0 (no jitter), preserving existing behavior.

Implementation

Config (pgdog-config)

  • general.rs: new server_lifetime_jitter: u64 with default 0, _Default:_ annotation, and docs URL placeholder per pgdog-config/CONTRIBUTING.md. Env override PGDOG_SERVER_LIFETIME_JITTER.
  • database.rs, users.rs: server_lifetime_jitter: Option<u64> overrides, mirroring the existing server_lifetime precedence.
  • url.rs: query-param parsing for the database-level override.

Pool config (pgdog-stats, pgdog/src/backend/pool/config.rs)

  • New Config::max_age_jitter: Duration (default Duration::ZERO). Resolved User → Database → General, same chain as server_lifetime.
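
The User → Database → General resolution can be sketched roughly as below. The three structs and the function name `resolve_max_age_jitter` are hypothetical stand-ins for illustration; only the field name, its types, and the precedence order come from this PR.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the `[general]` section: always has a value (default 0).
#[derive(Default)]
pub struct General {
    pub server_lifetime_jitter: u64,
}

/// Hypothetical stand-in for a database entry: optional override.
#[derive(Default)]
pub struct Database {
    pub server_lifetime_jitter: Option<u64>,
}

/// Hypothetical stand-in for a user entry: optional override.
#[derive(Default)]
pub struct User {
    pub server_lifetime_jitter: Option<u64>,
}

/// Pick the most specific setting: user, else database, else general.
/// The value is configured in milliseconds and converted to a Duration once.
pub fn resolve_max_age_jitter(general: &General, database: &Database, user: &User) -> Duration {
    let ms = user
        .server_lifetime_jitter
        .or(database.server_lifetime_jitter)
        .unwrap_or(general.server_lifetime_jitter);
    Duration::from_millis(ms)
}
```

With nothing configured at any level, the result is `Duration::ZERO`, which matches the "no jitter" default.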

Per-connection sampling (pgdog/src/backend/pool/monitor.rs)

  • After a successful Server::connect, sample once via rand::rng().random_range(-jitter..=+jitter) and store on the connection. Skipped (and free) when jitter is 0.
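
A dependency-free sketch of the sampling step follows. The PR itself uses `rand::rng().random_range(...)`; here the randomness source is swapped for a caller-supplied closure so the example compiles without external crates. The function name `sample_offset_ms` is hypothetical, and the modulo reduction has a slight bias that a real uniform sampler avoids.

```rust
/// Map one raw random u64 onto an integer offset in [-jitter_ms, +jitter_ms].
/// `next_u64` stands in for the process-global RNG (an assumption of this sketch).
pub fn sample_offset_ms(jitter_ms: i64, mut next_u64: impl FnMut() -> u64) -> i64 {
    if jitter_ms <= 0 {
        // Jitter disabled: return early without ever touching the RNG.
        return 0;
    }
    // Number of distinct integer offsets in the inclusive range.
    let span = 2 * (jitter_ms as u64) + 1;
    // Reduce into [0, 2 * jitter_ms], then shift down by jitter_ms.
    // Wrapping subtraction keeps the result correct even for extreme jitter values.
    ((next_u64() % span) as i64).wrapping_sub(jitter_ms)
}
```

Because the closure is only called when jitter is positive, the zero-jitter path does no random-number work, which is the "skipped (and free)" property described above.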

Effective lifetime (pgdog/src/backend/server.rs)

  • Server::set_max_age_offset_ms(i64) and Server::effective_max_age(base) -> Duration. Saturates at zero when a negative offset is larger than base; returns base unchanged for offset 0.
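
As a rough illustration of the saturating arithmetic, the sketch below uses a free function that takes the offset as a parameter rather than reading it from `Server`. That shape is an assumption made to keep the example self-contained and is not the PR's actual method.

```rust
use std::time::Duration;

/// Apply a signed millisecond offset to a base lifetime.
/// Positive offsets extend the lifetime, negative offsets shorten it,
/// and the result never goes below zero.
pub fn effective_max_age(base: Duration, offset_ms: i64) -> Duration {
    if offset_ms == 0 {
        // No jitter sampled: the base lifetime is used unchanged.
        return base;
    }
    let magnitude = Duration::from_millis(offset_ms.unsigned_abs());
    if offset_ms > 0 {
        base.saturating_add(magnitude)
    } else {
        // Clamps to Duration::ZERO when the offset is larger than the base.
        base.saturating_sub(magnitude)
    }
}
```

For example, a 60 s base with an offset of -5000 ms yields 55 s, while an offset of -120000 ms yields `Duration::ZERO` instead of underflowing.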

Comparison sites (pgdog/src/backend/pool/inner.rs)

  • close_old and maybe_check_in now compare server.age(now) against server.effective_max_age(self.config.max_age).
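
A simplified view of what such a comparison site might look like is sketched below. `Conn` is a hypothetical stand-in for `Server`, the per-connection value is stored as `Option<Duration>` (the form adopted later in this PR), and the choice of a strict `<` for "still young enough" is an assumption, not something the PR states.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a pooled backend connection.
pub struct Conn {
    pub created_at: Instant,
    /// Per-connection lifetime fixed at creation; None means "use the pool's base".
    pub max_age: Option<Duration>,
}

impl Conn {
    /// How long this connection has existed as of `now`.
    pub fn age(&self, now: Instant) -> Duration {
        now.saturating_duration_since(self.created_at)
    }

    /// The jittered lifetime if one was sampled, otherwise the pool-wide base.
    pub fn effective_max_age(&self, base: Duration) -> Duration {
        self.max_age.unwrap_or(base)
    }
}

/// Drop idle connections that have reached their own effective lifetime.
/// Returns how many were closed.
pub fn close_old(idle: &mut Vec<Conn>, base_max_age: Duration, now: Instant) -> usize {
    let before = idle.len();
    idle.retain(|c| c.age(now) < c.effective_max_age(base_max_age));
    before - idle.len()
}
```

Connections created at the same instant now leave the pool at different times, because each one is compared against its own stored lifetime rather than the shared `max_age`.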

Behavioral details (matches the issue's contract)

  • Sampled once at connection creation, not on every check, so each connection's lifetime is deterministic from its birth time.
  • Additive (uniform [-jitter, +jitter]) so the average effective lifetime stays at server_lifetime, not skewed shorter.
  • Process-global RNG (non-cryptographic).
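
The "average stays at server_lifetime" claim can be checked with a short expectation calculation. The symbols are introduced here for illustration only: $L$ is server_lifetime, $J$ is server_lifetime_jitter, and $U$ is the sampled offset.

$$
U \sim \mathrm{Uniform}[-J, +J], \qquad
\mathbb{E}[U] = \frac{(-J) + J}{2} = 0, \qquad
\mathbb{E}[L + U] = L + \mathbb{E}[U] = L .
$$

A one-sided scheme sampling from $[-J, 0]$ would instead give $\mathbb{E}[L + U] = L - J/2$, which shortens lifetimes on average. The calculation ignores the saturation at zero, which only comes into play when $J > L$.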

Tests

10 new tests, all passing locally:

  • pgdog-config: URL parsing of server_lifetime_jitter.
  • pgdog::backend::pool::config: precedence chain (user overrides database, which overrides general), default-zero, and the existing precedence test extended.
  • pgdog::backend::server: effective_max_age for default / positive / negative / negative-saturating cases.

Existing pool/server tests (27 in pool::inner, full pgdog-config suite) continue to pass.

cargo fmt applied; no new clippy warnings introduced at the touched call sites.

Docs

The /// doc comments reference https://docs.pgdog.dev/configuration/pgdog.toml/{general,databases}/#server_lifetime_jitter per pgdog-config/CONTRIBUTING.md. Happy to follow up with a docs PR (or update those URLs) once you tell me where the new headings should land.

CLAassistant commented May 6, 2026

CLA assistant check
All committers have signed the CLA.

codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 98.38710% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pgdog/src/backend/server.rs | 97.18% | 2 Missing ⚠️ |

Add `server_lifetime_jitter` (in milliseconds) at general, database, and
user levels. When non-zero, each backend connection samples a per-
connection offset uniformly from `[-jitter, +jitter]` once at creation,
so its effective lifetime is fixed at birth time. This breaks
synchronized retirement of cohorts that opened together (e.g. during a
traffic ramp), avoiding the periodic reconnect waves that show up as
clean N-minute latency stripes in client metrics.

Default is 0 (no jitter), preserving existing behavior.

Closes pgdogdev#962
@frangz frangz force-pushed the server-lifetime-jitter branch from bc8549c to 0c63bb4 May 6, 2026 17:46
Comment thread pgdog/src/backend/pool/inner.rs
Collaborator

@levkk levkk left a comment

Just a minor stylistic change, otherwise this is great!

Comment thread pgdog/src/backend/pool/monitor.rs Outdated
let mut error = Error::ServerError;
let now = Instant::now();

let max_age_jitter_ms = pool.config().max_age_jitter.as_millis() as i64;
Collaborator

nit: starting from here, let's propagate it as Duration so we are sure not to confuse units - I've done this before, even when the name of the variable is *_ms, it's easy to forget.

Replace Server's i64-millisecond offset with an Option<Duration>
max_age set by apply_lifetime_jitter(base: Duration, jitter: Duration).
The signed sampling math is contained inside the function body; the
public surface is Duration end-to-end so callers can't accidentally
treat ms as seconds. effective_max_age(base) now returns the stored
per-connection value or base. Tests updated for the new API; no
behavior change at jitter = 0.
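
A Duration-only variant along the lines this commit describes might look roughly like the following. The function name `jittered_max_age`, its free-function shape, and the closure-based randomness source are assumptions made so the sketch compiles without the `rand` crate; the PR's actual `apply_lifetime_jitter` lives on `Server` and stores the result there.

```rust
use std::time::Duration;

/// Compute a per-connection lifetime in [base - jitter, base + jitter],
/// clamped at zero. Returns None when jitter is disabled so callers fall
/// back to the base value. `next_u64` stands in for the real RNG.
pub fn jittered_max_age(
    base: Duration,
    jitter: Duration,
    mut next_u64: impl FnMut() -> u64,
) -> Option<Duration> {
    if jitter.is_zero() {
        return None; // no sampling and no stored override
    }
    // Work in whole milliseconds, matching the config unit.
    let jitter_ms = u64::try_from(jitter.as_millis()).unwrap_or(u64::MAX);
    let span = jitter_ms.saturating_mul(2).saturating_add(1);
    // Unsigned draw in [0, 2 * jitter]; the signed math stays inside this body.
    let draw = Duration::from_millis(next_u64() % span);
    // base + draw - jitter is equivalent to base + offset with offset in [-jitter, +jitter].
    Some(
        base.saturating_add(draw)
            .saturating_sub(Duration::from_millis(jitter_ms)),
    )
}
```

Callers only ever see `Duration` values, so there is no raw millisecond integer to misread as seconds, which is the point of the review comment above.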
Collaborator

@levkk levkk left a comment

Nice!

If you'd like to follow up with docs, you can just add the setting description here: https://github.com/pgdogdev/docs/blob/main/docs/configuration/pgdog.toml/general.md#server_lifetime

I'm going to let the CI finish and merge.

🙏

@levkk levkk merged commit 706d59d into pgdogdev:main May 7, 2026
23 checks passed

Development

Successfully merging this pull request may close these issues.

Jitter for server_lifetime to break synchronized backend reconnect waves

3 participants