feat(pool): jitter for server_lifetime #963
Merged
levkk merged 2 commits into pgdogdev:main on May 7, 2026
Add `server_lifetime_jitter` (in milliseconds) at the general, database, and user levels. When non-zero, each backend connection samples a per-connection offset uniformly from `[-jitter, +jitter]` once at creation, so its effective lifetime is fixed at birth. This breaks synchronized retirement of cohorts that opened together (e.g. during a traffic ramp), avoiding the periodic reconnect waves that show up as clean N-minute latency stripes in client metrics. Default is 0 (no jitter), preserving existing behavior.

Closes pgdogdev#962
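To illustrate the idea in the commit message above, here is a minimal, std-only sketch of sampling a lifetime once per connection and how that spreads out a cohort opened at the same moment. It is not the PR's code: `SplitMix64`, `sample_lifetime`, and `cohort_lifetimes` are hypothetical names, and the small generator only stands in for a real RNG so the example needs no external crates.

```rust
use std::time::Duration;

/// Tiny deterministic generator (SplitMix64). Assumption: used here only so
/// the sketch has no dependencies; it stands in for a real RNG.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical helper: pick one lifetime in [base - jitter, base + jitter],
/// saturating at zero. Called once per connection, at creation.
fn sample_lifetime(base: Duration, jitter: Duration, rng: &mut SplitMix64) -> Duration {
    if jitter.is_zero() {
        return base; // jitter = 0 keeps the existing behavior
    }
    let jitter_ms = jitter.as_millis().min(u64::MAX as u128) as u64;
    // Draw from 0..=2*jitter_ms (uniform up to negligible modulo bias),
    // then interpret the draw as an offset in [-jitter_ms, +jitter_ms].
    let span = jitter_ms.saturating_mul(2).saturating_add(1);
    let draw = rng.next_u64() % span;
    if draw >= jitter_ms {
        base.saturating_add(Duration::from_millis(draw - jitter_ms))
    } else {
        base.saturating_sub(Duration::from_millis(jitter_ms - draw))
    }
}

/// Lifetimes for `n` connections that were all opened at the same time.
fn cohort_lifetimes(n: usize, base: Duration, jitter: Duration, seed: u64) -> Vec<Duration> {
    let mut rng = SplitMix64(seed);
    (0..n).map(|_| sample_lifetime(base, jitter, &mut rng)).collect()
}
```

With a 5-minute base and 30 seconds of jitter, `cohort_lifetimes(100, ...)` yields values spread across 4m30s to 5m30s instead of 100 connections all retiring at exactly 5 minutes.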
Force-pushed from bc8549c to 0c63bb4.
levkk (Collaborator) reviewed on May 6, 2026 and left a comment:
Just a minor stylistic change, otherwise this is great!
levkk (Collaborator) reviewed on May 6, 2026, commenting on these lines:

    let mut error = Error::ServerError;
    let now = Instant::now();

    let max_age_jitter_ms = pool.config().max_age_jitter.as_millis() as i64;

nit: starting from here, let's propagate it as Duration so we are sure not to confuse units - I've done this before, even when the name of the variable is *_ms, it's easy to forget.
Replace `Server`'s i64-millisecond offset with an `Option<Duration>` `max_age` set by `apply_lifetime_jitter(base: Duration, jitter: Duration)`. The signed sampling math is contained inside the function body; the public surface is `Duration` end-to-end so callers can't accidentally treat ms as seconds. `effective_max_age(base)` now returns the stored per-connection value or `base`. Tests updated for the new API; no behavior change at jitter = 0.
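A rough sketch of what a `Duration`-only surface like the one described in this commit might look like follows. Only the names `apply_lifetime_jitter`, `effective_max_age`, and `max_age` come from the commit message; the `Server` struct here is a stripped-down stand-in, `random_u64` is a hypothetical std-only substitute for the `rand` crate, and the millisecond granularity of the sampling is an assumption.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::time::Duration;

/// Assumption: std-only source of a random u64, standing in for `rand`.
fn random_u64() -> u64 {
    RandomState::new().build_hasher().finish()
}

/// Stand-in for the real `Server`; only the field relevant to this sketch.
#[derive(Default)]
struct Server {
    /// Per-connection lifetime fixed at creation; `None` means "use base".
    max_age: Option<Duration>,
}

impl Server {
    /// Sample once and store the result. Callers pass and receive only
    /// `Duration`; the signed millisecond math never leaves this function.
    fn apply_lifetime_jitter(&mut self, base: Duration, jitter: Duration) {
        if jitter.is_zero() {
            return; // leave `max_age` unset so `effective_max_age` returns base
        }
        let j = jitter.as_millis() as i128;
        let span = (2 * j + 1) as u128;
        // Offset in [-j, +j] milliseconds.
        let offset = ((random_u64() as u128) % span) as i128 - j;
        // Saturate at zero if the offset would make the lifetime negative.
        let effective_ms = (base.as_millis() as i128 + offset).max(0);
        let effective_ms = effective_ms.min(u64::MAX as i128) as u64;
        self.max_age = Some(Duration::from_millis(effective_ms));
    }

    /// Stored per-connection value if one was sampled, otherwise `base`.
    fn effective_max_age(&self, base: Duration) -> Duration {
        self.max_age.unwrap_or(base)
    }
}
```

Because the stored value is an `Option<Duration>`, a connection created with jitter = 0 carries no extra state and simply falls back to whatever `base` the pool passes in.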
levkk (Collaborator) approved these changes on May 7, 2026 and left a comment:
Nice!
If you'd like to follow up with docs, you can just add the setting description here: https://github.com/pgdogdev/docs/blob/main/docs/configuration/pgdog.toml/general.md#server_lifetime
I'm going to let the CI finish and merge.
🙏
Closes #962.
Summary
Adds `server_lifetime_jitter` (milliseconds) at the general, database, and user config levels. When non-zero, each backend connection samples a per-connection offset uniformly from `[-jitter, +jitter]` once at creation time, so its effective lifetime is fixed at birth and the comparison sites in the pool use that effective value instead of the bare `server_lifetime`.

This breaks the synchronized retirement of cohorts that opened together (e.g. during a traffic ramp), which otherwise produces a periodic reconnect wave at every `server_lifetime` interval, visible as a clean N-minute latency stripe in client metrics. See #962 for context and prior art.

Default is `0` (no jitter), preserving existing behavior.

Implementation
Config (`pgdog-config`)
- `general.rs`: new `server_lifetime_jitter: u64` with default 0, `_Default:_` annotation, and docs URL placeholder per `pgdog-config/CONTRIBUTING.md`. Env override `PGDOG_SERVER_LIFETIME_JITTER`.
- `database.rs`, `users.rs`: `server_lifetime_jitter: Option<u64>` overrides, mirroring the existing `server_lifetime` precedence.
- `url.rs`: query-param parsing for the database-level override.

Pool config (`pgdog-stats`, `pgdog/src/backend/pool/config.rs`)
- `Config::max_age_jitter: Duration` (default `Duration::ZERO`). Resolved User → Database → General, same chain as `server_lifetime`.

Per-connection sampling (`pgdog/src/backend/pool/monitor.rs`)
- `Server::connect`: sample once via `rand::rng().random_range(-jitter..=+jitter)` and store on the connection. Skipped (and free) when jitter is 0.

Effective lifetime (`pgdog/src/backend/server.rs`)
- `Server::set_max_age_offset_ms(i64)` and `Server::effective_max_age(base) -> Duration`. Saturates at zero on negative overflow; returns `base` unchanged for offset 0.

Comparison sites (`pgdog/src/backend/pool/inner.rs`)
- `close_old` and `maybe_check_in` now compare `server.age(now)` against `server.effective_max_age(self.config.max_age)`.

Behavioral details (matches the issue's contract)
- Sampling is symmetric (`[-jitter, +jitter]`), so the average effective lifetime stays at `server_lifetime`, not skewed shorter.

Tests
10 new tests, all passing locally:
- `pgdog-config`: URL parsing of `server_lifetime_jitter`.
- `pgdog::backend::pool::config`: precedence chain (general → database → user), default-zero, and the existing precedence test extended.
- `pgdog::backend::server`: `effective_max_age` for default / positive / negative / negative-saturating cases.

Existing pool/server tests (27 in `pool::inner`, full `pgdog-config` suite) continue to pass. `cargo fmt` applied; no new clippy warnings introduced at the touched call sites.

Docs
The `///` doc comments reference `https://docs.pgdog.dev/configuration/pgdog.toml/{general,databases}/#server_lifetime_jitter` per `pgdog-config/CONTRIBUTING.md`. Happy to follow up with a docs PR (or update those URLs) once you tell me where the new headings should land.
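
For reference, the User → Database → General resolution described under Implementation could be sketched roughly as below. The struct shapes and the `resolve_max_age_jitter` name are hypothetical and only mirror the field types listed above (`u64` at the general level, `Option<u64>` overrides elsewhere); the actual config types in `pgdog-config` carry many more fields.

```rust
use std::time::Duration;

// Hypothetical, trimmed-down config structs holding only the new setting.
struct General {
    server_lifetime_jitter: u64, // milliseconds, default 0
}

struct Database {
    server_lifetime_jitter: Option<u64>, // optional override
}

struct User {
    server_lifetime_jitter: Option<u64>, // optional override
}

/// Resolve the jitter for a pool: the user override wins, then the database
/// override, then the general default. The result is a `Duration`.
fn resolve_max_age_jitter(general: &General, database: &Database, user: &User) -> Duration {
    let ms = user
        .server_lifetime_jitter
        .or(database.server_lifetime_jitter)
        .unwrap_or(general.server_lifetime_jitter);
    Duration::from_millis(ms)
}
```

With no overrides set and the general default left at 0, this resolves to `Duration::ZERO`, which is the "no jitter" case.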