
feat: Token caching for external identity providers #935

Open
MagicAbdel wants to merge 3 commits into pgdogdev:main from MagicAbdel:main

Conversation

@MagicAbdel
Contributor

Summary

Introduces an in-memory token cache shared by azure_workload_identity and rds_iam authentication backends. Tokens are now fetched once and reused until expiry, instead of being fetched on every connection.

Motivation

Token fetching from external identity providers can be slow — Azure Workload Identity in particular was measured at ~30s per token fetch. This was directly impacting pool startup time, as each connection attempt would block waiting for a fresh token.

Changes

  • Added a shared token_cache module with get/set helpers keyed by host, port, and user
  • Refactored azure_workload_identity to extract fetch_token() returning (String, SystemTime), using the expires_on field from the Azure SDK response as the cache TTL
  • Refactored rds_iam to follow the same pattern, with a fixed 15-minute TTL (RDS IAM tokens are valid for 15 minutes but the AWS SDK does not return an expiry)
  • Added cache hit/miss tests for both backends
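A minimal sketch of what a cache with get/set helpers keyed by host, port, and user could look like. This is not the PR's code: the `Key` struct, the `RDS_IAM_TTL` constant, and the use of a global `Mutex<HashMap>` are assumptions made for illustration, and only the standard library is used.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};
use std::time::{Duration, SystemTime};

/// Hypothetical cache key: one entry per (host, port, user).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Key {
    pub host: String,
    pub port: u16,
    pub user: String,
}

/// Assumed name for the fixed 15-minute TTL described for rds_iam.
pub const RDS_IAM_TTL: Duration = Duration::from_secs(15 * 60);

/// Process-wide map from key to (token, expiry time).
static CACHE: OnceLock<Mutex<HashMap<Key, (String, SystemTime)>>> = OnceLock::new();

fn cache() -> &'static Mutex<HashMap<Key, (String, SystemTime)>> {
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Store a token together with the time at which it stops being valid.
pub fn set(key: Key, token: String, expires_at: SystemTime) {
    cache().lock().unwrap().insert(key, (token, expires_at));
}

/// Return the cached token if it has not expired; drop the entry otherwise.
pub fn get(key: &Key) -> Option<String> {
    let mut map = cache().lock().unwrap();
    let fresh = match map.get(key) {
        Some((token, expires_at)) if *expires_at > SystemTime::now() => Some(token.clone()),
        _ => None,
    };
    if fresh.is_none() {
        // Either nothing was cached or the entry expired; removing is harmless in both cases.
        map.remove(key);
    }
    fresh
}
```

A backend without an SDK-provided expiry would call something like `set(key, token, SystemTime::now() + RDS_IAM_TTL)`, while the Azure backend would pass the expiry taken from `expires_on`.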

Impact

  • Pool startup is significantly faster when multiple connections share the same identity — only the first connection pays the token fetch cost
  • No behavioral change for single-connection or uncached scenarios
  • TTL-based expiry ensures tokens are refreshed before they become invalid

@codecov

codecov Bot commented Apr 26, 2026

Codecov Report

❌ Patch coverage is 88.88889% with 16 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pgdog/src/backend/auth/azure_workload_identity.rs | 69.38% | 15 Missing ⚠️ |
| pgdog/src/backend/auth/rds_iam.rs | 98.07% | 1 Missing ⚠️ |


@levkk
Collaborator

levkk commented Apr 26, 2026

Nice! Quick question: do you think it would be possible to run the token acquisition as a background task instead? That way, the token is always fresh when accessed for creating server connections.

Comment thread pgdog/src/backend/auth/azure_workload_identity.rs Outdated
@MagicAbdel
Contributor Author

Sorry for the delay, I finally had some time to circle back to your comments.

That was a great suggestion. I’ve added EXPIRY_BUFFER to token_cache.rs to trigger the refresh task 45 seconds before expiry. I extracted that logic into get_or_fetch and applied the same pattern to rds_iam to account for the 15-minute validity of its tokens.

Let me know if this looks good!
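For readers following along, here is a rough sketch of the "refresh 45 seconds before expiry" check behind a `get_or_fetch` helper. It is deliberately synchronous and single-threaded so it runs with the standard library alone; the `TokenCache` struct, the string key, the explicit `now` parameter, and the closure-based fetcher are all assumptions, not the code in this PR.

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

/// Treat a token as stale this long before it actually expires.
pub const EXPIRY_BUFFER: Duration = Duration::from_secs(45);

/// Hypothetical single-threaded cache; the module in the PR is shared and async.
#[derive(Default)]
pub struct TokenCache {
    entries: HashMap<String, (String, SystemTime)>,
}

impl TokenCache {
    /// True while `now` is still earlier than `expires_at - EXPIRY_BUFFER`.
    fn is_fresh(expires_at: SystemTime, now: SystemTime) -> bool {
        match expires_at.checked_sub(EXPIRY_BUFFER) {
            Some(refresh_at) => now < refresh_at,
            None => false,
        }
    }

    /// Return the cached token if it is fresh; otherwise call `fetcher` and cache its result.
    pub fn get_or_fetch<F, E>(&mut self, key: &str, now: SystemTime, fetcher: F) -> Result<String, E>
    where
        F: FnOnce() -> Result<(String, SystemTime), E>,
    {
        if let Some((token, expires_at)) = self.entries.get(key) {
            if Self::is_fresh(*expires_at, now) {
                return Ok(token.clone());
            }
        }
        // Cache miss or inside the refresh window: fetch and store a new token.
        let (token, expires_at) = fetcher()?;
        self.entries.insert(key.to_string(), (token.clone(), expires_at));
        Ok(token)
    }
}
```

With a 15-minute token, this sketch serves the cached value for the first 14 minutes 15 seconds and fetches again after that point.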


let (token, expires_at) = fetcher(addr.clone()).await?;
set(key, token.clone(), expires_at);
spawn_refresh_task(addr.clone(), expires_at, fetcher);
Collaborator

There is a race condition here: multiple connections can each reach spawn_refresh_task and spawn duplicate refresh tasks for the same address. You want to spawn one task at startup per configured address. A good place for this would be the connection pool (src/backend/pool/monitor.rs).
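To illustrate the invariant being asked for (at most one refresh task per address), here is a small sketch of a claim registry. Note this is a different technique from the suggested fix of spawning from the pool at startup; it only shows how concurrent callers can be reduced to a single winner. `RefreshRegistry`, `try_claim`, and `release` are hypothetical names, and the address is represented as a plain string.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical registry of addresses that already have a refresh task.
#[derive(Default)]
pub struct RefreshRegistry {
    active: Mutex<HashSet<String>>,
}

impl RefreshRegistry {
    /// Returns true only for the first caller that claims `addr`.
    /// Callers that get `false` must not spawn another refresh task.
    pub fn try_claim(&self, addr: &str) -> bool {
        // HashSet::insert reports whether the value was newly added,
        // and the mutex makes the check-and-insert a single atomic step.
        self.active.lock().unwrap().insert(addr.to_string())
    }

    /// Forget `addr`, e.g. when its pool shuts down; returns true if it was claimed.
    pub fn release(&self, addr: &str) -> bool {
        self.active.lock().unwrap().remove(addr)
    }
}
```

Spawning from the pool, as suggested above, avoids the need for such a registry because the pool already owns exactly one lifecycle per configured address.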

return;
}

tokio::spawn(async move {
Collaborator

This loop needs to check for when this address becomes invalid, e.g. when the config is reloaded. Check out the connection pool implementation; that might be a better place for long-lived loops like this one.
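One possible shape for a refresh loop that can be stopped when its address goes away is sketched below. It uses a standard-library thread and channel instead of tokio so it is self-contained; `spawn_refresh_loop`, `RefreshHandle`, and `shutdown` are hypothetical names, and the owner (for example the pool) is assumed to call `shutdown` on config reload.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handle held by whoever owns the address (e.g. the pool).
pub struct RefreshHandle {
    stop: Sender<()>,
    thread: JoinHandle<u32>,
}

/// Run `refresh` every `interval` until the handle is shut down or dropped.
pub fn spawn_refresh_loop<F>(interval: Duration, mut refresh: F) -> RefreshHandle
where
    F: FnMut() + Send + 'static,
{
    let (stop, stopped) = mpsc::channel::<()>();
    let thread = thread::spawn(move || {
        let mut runs = 0;
        loop {
            // Waiting on the channel doubles as the sleep between refreshes.
            match stopped.recv_timeout(interval) {
                // Interval elapsed with no stop signal: refresh and keep looping.
                Err(RecvTimeoutError::Timeout) => {
                    refresh();
                    runs += 1;
                }
                // Stop requested, or the handle was dropped: exit the loop.
                Ok(()) | Err(RecvTimeoutError::Disconnected) => return runs,
            }
        }
    });
    RefreshHandle { stop, thread }
}

impl RefreshHandle {
    /// Signal the loop to stop, wait for it, and return how many refreshes ran.
    pub fn shutdown(self) -> u32 {
        let _ = self.stop.send(());
        self.thread.join().expect("refresh thread panicked")
    }
}
```

Because dropping the sender also ends the loop, a task cannot outlive the owner of its handle, which is the property the config-reload case needs.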


2 participants