Skip to content

Extract pool blocking-period error state into BlockingPeriodErrorState#4395

Open
mdaigle wants to merge 4 commits into
mainfrom
dev/mdaigle/pool-blocking-period-refactor
Open

Extract pool blocking-period error state into BlockingPeriodErrorState#4395
mdaigle wants to merge 4 commits into
mainfrom
dev/mdaigle/pool-blocking-period-refactor

Conversation

@mdaigle

@mdaigle mdaigle commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Summary

This is Part 1 of 3 splitting #4376 ("Dev/mdaigle/pool rate limit") into stacked, reviewable PRs.

This PR extracts the connection pool's blocking-period (error backoff) logic out of WaitHandleDbConnectionPool into a reusable, testable BlockingPeriodErrorState class. This keeps the pool's connection-acquisition path focused on capacity/queue concerns. Part 2 will use the new class to implement blocking period support in the ChannelDbConnectionPool.

Changes

  • New BlockingPeriodErrorState — encapsulates cached-exception fast-fail, exponential backoff (5s → 60s cap), exit timer, and synchronization. Takes an injectable TimeProvider so timer scheduling is deterministic in tests.
  • WaitHandleDbConnectionPool — refactored to use BlockingPeriodErrorState instead of inline error-state fields/logic.
  • DbConnectionPoolGroup.IsBlockingPeriodEnabled() — new helper centralizing the PoolBlockingPeriod decision (Auto → not an Azure SQL endpoint, AlwaysBlock → true, NeverBlock → false). Consumed by Part 2.
  • ADP.UnsafeCreateTimer(TimeProvider, ...) — new overload returning ITimer that suppresses ExecutionContext flow while honoring the injected TimeProvider; required by BlockingPeriodErrorState.
  • BlockingPeriodErrorStateTest — unit tests using FakeTimeProvider (29 tests).

Stacking

  • Part 1 (this PR) → main
  • Part 2 (channel-pool rate limiting) → this branch
  • Part 3 (GitHub instructions) → Part 2

Checklist

  • Tests added (29 passing via FakeTimeProvider)
  • No public API changes
  • No breaking changes
  • Verified against customer repro (n/a — internal refactor)

Move the connection pool's blocking-period (error backoff) logic out of WaitHandleDbConnectionPool into a reusable, testable BlockingPeriodErrorState class with exponential backoff (5s..60s), cached-exception fast-fail, and an injectable TimeProvider for deterministic tests. Add DbConnectionPoolGroup.IsBlockingPeriodEnabled() and the ADP.UnsafeCreateTimer(TimeProvider,...) overload it depends on. Includes BlockingPeriodErrorStateTest.
Copilot AI review requested due to automatic review settings June 23, 2026 17:24
@github-project-automation github-project-automation Bot moved this to To triage in SqlClient Board Jun 23, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the legacy wait-handle connection pool’s blocking-period (error backoff) logic into a dedicated BlockingPeriodErrorState component, adds a pool-group helper for determining when blocking is enabled, and introduces a TimeProvider-based timer factory to enable deterministic unit testing.

Changes:

  • Added BlockingPeriodErrorState (cached exception + exponential backoff + exit timer) and corresponding unit tests using FakeTimeProvider.
  • Refactored WaitHandleDbConnectionPool to delegate blocking-period logic to the new state object and moved the blocking-period enablement decision into DbConnectionPoolGroup.
  • Added an ADP.UnsafeCreateTimer(TimeProvider, ...) overload returning ITimer to support TimeProvider-driven timers.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/Microsoft.Data.SqlClient/tests/UnitTests/ConnectionPool/BlockingPeriodErrorStateTest.cs Adds comprehensive unit tests for the new blocking-period state logic using FakeTimeProvider.
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/ConnectionPool/WaitHandleDbConnectionPool.cs Replaces inlined error/backoff fields and timer logic with BlockingPeriodErrorState.
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/ConnectionPool/DbConnectionPoolGroup.cs Centralizes the PoolBlockingPeriod decision into IsBlockingPeriodEnabled().
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/ConnectionPool/BlockingPeriodErrorState.cs Introduces the new reusable blocking-period error state implementation.
src/Microsoft.Data.SqlClient/src/Microsoft/Data/Common/AdapterUtil.cs Adds a TimeProvider-aware UnsafeCreateTimer overload returning ITimer.

Comment thread src/Microsoft.Data.SqlClient/src/Microsoft/Data/Common/AdapterUtil.cs Outdated
Comment on lines +102 to +106
newTimer = ADP.UnsafeCreateTimer(
_timeProvider,
ExitCallback,
null,
Timeout.InfiniteTimeSpan,
Comment on lines +5 to +9
using System;
using System.Threading;
using Microsoft.Extensions.Time.Testing;
using Microsoft.Data.SqlClient.ConnectionPool;
using Xunit;
@codecov

codecov Bot commented Jun 23, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 94.87179% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.67%. Comparing base (44e1b72) to head (b7dbf34).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...SqlClient/src/Microsoft/Data/Common/AdapterUtil.cs 75.00% 2 Missing ⚠️
.../SqlClient/ConnectionPool/DbConnectionPoolGroup.cs 75.00% 2 Missing ⚠️
...lient/ConnectionPool/WaitHandleDbConnectionPool.cs 88.23% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4395      +/-   ##
==========================================
- Coverage   65.32%   63.67%   -1.66%     
==========================================
  Files         285      281       -4     
  Lines       43373    66403   +23030     
==========================================
+ Hits        28335    42279   +13944     
- Misses      15038    24124    +9086     
Flag Coverage Δ
CI-SqlClient ?
PR-SqlClient-Project 63.67% <94.87%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

// so callers can inject a test double for deterministic scheduling.
if (ExecutionContext.IsFlowSuppressed())
{
return timeProvider.CreateTimer(callback, state, dueTime, period);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a timer via a TimeProvider allows for faking time in unit tests. I chose not to change existing overloads to feed into this one because there are many spots already use them and it would be a more extensive refactor.

// instead obtained creation mutex

DbConnectionInternal obj = null;
if (ErrorOccurred)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a check/act race condition here. The new class uses a local variable to check and throw.

SqlClientEventSource.Log.TryPoolerTraceEvent("<prov.DbConnectionPool.GetConnection|RES|CPOOL> {0}, Errors are set.", Id);
Interlocked.Decrement(ref _waitCount);
throw TryCloneCachedException();
_errorState.ThrowIfActive();

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a chance this may not throw, in which case, loop back around to check available wait handles.


// Reset the error wait:
_errorWait = ERROR_WAIT_DEFAULT;
// A successful creation clears any prior error state and resets backoff.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior was missing before. If we somehow successfully create a connection while in the blocking period (via some race condition with checking the error state), then we should reset the whole error state including the ManualResetEvent and the cached exception, not just the wait period. There's no need to wait until the timer fires to allow more creates if we know we're succeeding now.

newObj = null; // set to null, so we do not return bad new object

// Failed to create instance
_resError = e;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of race conditions possible here. I decided to move this all under a lock so that all of the relevant values are updated atomically.


// Clone SqlExceptions so stack traces are not shared across callers; other
// exception types are rethrown as-is.
throw cached is SqlException sqlEx ? sqlEx.InternalClone() : cached;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the existing cloning logic

@mdaigle mdaigle marked this pull request as ready for review June 23, 2026 20:45
Copilot AI review requested due to automatic review settings June 23, 2026 20:45
@mdaigle mdaigle requested a review from a team as a code owner June 23, 2026 20:45
@mdaigle mdaigle added this to the 7.1.0-preview2 milestone Jun 23, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Comment on lines +195 to +200
internal static ITimer UnsafeCreateTimer(
TimeProvider timeProvider,
TimerCallback callback,
object state,
TimeSpan dueTime,
TimeSpan period)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - let's put this new code inside a nullable block.

Comment on lines +598 to +608
/// <summary>
/// Verifies that the onEnter callback is invoked after the internal lock is released.
/// The callback reads <see cref="BlockingPeriodErrorState.HasError"/> and calls
/// <see cref="BlockingPeriodErrorState.ThrowIfActive"/> — operations that are safe only
/// when the lock is not held. If the implementation were changed to hold the lock during
/// the callback invocation, any re-entrant call from the callback that tries to acquire the
/// same lock (on a non-re-entrant lock) would deadlock.
/// </summary>
[Fact]
public void OnEnter_CalledOutsideLock_CanCallBackIntoState()
{
Comment on lines +633 to +643
/// <summary>
/// Verifies that the onExit callback is invoked after the internal lock is released.
/// The callback reads <see cref="BlockingPeriodErrorState.HasError"/> — confirming it
/// observes the cleared state — and calls <see cref="BlockingPeriodErrorState.ThrowIfActive"/>
/// without deadlocking. If the implementation were changed to hold the lock during the
/// callback, any re-entrant call from the callback would deadlock on a non-re-entrant lock.
/// </summary>
[Fact]
public void OnExit_CalledOutsideLock_CanCallBackIntoState()
{
// Arrange
// TimeProvider.System) so execution-context suppression lives in a single place.
internal static ITimer UnsafeCreateTimer(
TimeProvider timeProvider,
TimerCallback callback,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a shame TimerCallback<State> doesn't exist. Would it be worth adding to allow our calling code to be strongly typed?

return true;
case PoolBlockingPeriod.NeverBlock:
return false;
default:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default here hides a problem where we add/remove an enum member and forget to update this switch statement. This is a programming error in code we own (wherever _connectionOptions.PoolBlockingPeriod is assigned). Let's let the compiler tell us we're missing an enum member by omitting the default branch entirely.

I believe using a switch expression (rather than a switch statement) gives us the best exhaustiveness checking for enum member switches.

return _connectionOptions.PoolBlockingPeriod switch
{
  PoolBlockingPeriod.Auto => ...,
  PoolBlockingPeriod.AlwaysBlock => ...,
  PoolBlockingPeriod.NeverBlock => ...
}

ConnectionOptions should be guaranteeing that PoolBlockingPeriod is always a well-defined enum member value. Add that guarantee if it is missing.

return;
}

_disposed = true;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this last to avoid one of the operations below from failing and then never being able to re-run disposal to completion (i.e. keep IDisposable's idempotency promise). Obviously the actual operations here aren't going to fail (in a non-catastrophic way), but you never know who will be here next adding things.

/// error state so that any waiting callers do not observe a stale exception after
/// the owning pool is torn down.
/// </summary>
public void Dispose()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to put Dispose() (and the finalizer if it exists) beside the constructors since it's essentially the opposite operation, and it's nice to be able to see both together. And I try to remember to use a #region Construction. Just a style suggestion.

private readonly int _ownerPoolId;

// Optional callback invoked (under _lock) when the state enters the blocking period;
// used by the legacy wait-handle pool to signal its error wait handle. Must be cheap

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't mention who supplies it or why. But I would add what null means. And I would specify that these callbacks must not throw.


// Bump the backoff for the next failure, capped at MaxWait. FR-008.
TimeSpan doubled = _nextWait + _nextWait;
_nextWait = doubled >= MaxWait ? MaxWait : doubled;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just > is sufficient here.

// (onEnter/onExit) can never diverge from the internal state transitions under
// concurrent Enter/Clear/exit-timer activity. The callbacks are expected to be
// cheap, non-reentrant operations.
_onEnter?.Invoke();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guard against misbehaving callbacks and Dispose() with try/catch that swallows. Same for Clear(), ExitCallback(), and our Dispose().

/// Clears the cached error state, disposes the exit timer, and resets the backoff to
/// its initial value.
/// </summary>
internal void Clear()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's definitely some overlap between Clear, ExitCallback, and Dispose. I'm having trouble understanding why they aren't all the same. It seems like _cachedException, _inElevatedState, _exitTimer, and _nextWait could all live/die together in a helper class.

@github-project-automation github-project-automation Bot moved this from To triage to Waiting for customer in SqlClient Board Jun 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Waiting for customer

Development

Successfully merging this pull request may close these issues.

4 participants