JAVA-5950 Update Transactions Convenient API with exponential backoff on retries #1852

nhachicha · 2025-12-09T13:47:53Z

Relevant specification changes:

DRIVERS-1934: withTransaction API retries too frequently (#1851)
DRIVERS-1934: clarify drivers back off before all transaction retries (#1868) - the change has not yet been addressed in this PR, because it was introduced after creation of the PR.
DRIVERS-3364: Fix Retry Backoff is Enforced prose test. - the change has not yet been addressed in this PR, because it was introduced after creation of the PR.

Copilot

Pull request overview

This PR implements exponential backoff with jitter for transaction retries in MongoDB's withTransaction convenience API. The implementation adds a configurable backoff mechanism that applies delays between retry attempts when transient transaction errors occur, following the MongoDB specification with a growth factor of 1.5 for transactions.

Key Changes

Introduces ExponentialBackoff utility class with factory methods for transaction retries (5ms base, 500ms max, 1.5x growth) and command retries (100ms base, 10s max, 2.0x growth)
Integrates backoff logic into ClientSessionImpl.withTransaction() to delay between retry attempts
Adjusts test configuration to verify backoff behavior with multiple retry attempts
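The backoff computation summarized above can be sketched as follows. This is a minimal illustration, not the PR's code: the class and method names are hypothetical, and only the transaction constants (5 ms base, 500 ms maximum, 1.5x growth) come from the overview.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch; names are illustrative and not taken from the driver.
final class TransactionBackoffSketch {
    // Transaction retry constants quoted in the PR overview.
    static final double BASE_MS = 5.0;
    static final double MAX_MS = 500.0;
    static final double GROWTH = 1.5;

    private TransactionBackoffSketch() {
    }

    /** Upper bound of the backoff for the given retry count, before jitter is applied. */
    static double cappedBackoffMs(final int retryCount) {
        return Math.min(BASE_MS * Math.pow(GROWTH, retryCount), MAX_MS);
    }

    /** Jittered backoff: a random fraction in [0, 1) of the capped value, rounded to milliseconds. */
    static long jitteredBackoffMs(final int retryCount) {
        double jitter = ThreadLocalRandom.current().nextDouble();
        return Math.round(jitter * cappedBackoffMs(retryCount));
    }
}
```

For retry counts 0, 1, and 2 the capped values are 5 ms, 7.5 ms, and 11.25 ms. Because the jitter multiplies the capped value, the actual sleep can be anywhere between zero and that bound.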

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
driver-core/src/main/com/mongodb/internal/ExponentialBackoff.java	New utility class implementing exponential backoff with jitter using ThreadLocalRandom
driver-sync/src/main/com/mongodb/client/internal/ClientSessionImpl.java	Adds backoff delay before transaction retries and uses CSOT timeout when available
driver-core/src/test/unit/com/mongodb/internal/ExponentialBackoffTest.java	Comprehensive unit tests validating backoff calculations, growth factors, and maximum caps
driver-sync/src/test/functional/com/mongodb/client/WithTransactionProseTest.java	New functional test verifying exponential backoff behavior and adjusted existing test configuration

driver-sync/src/main/com/mongodb/client/internal/ClientSessionImpl.java

Copilot · 2025-12-09T14:25:58Z

driver-sync/src/test/functional/com/mongodb/client/WithTransactionProseTest.java

+            AtomicInteger retryCount = new AtomicInteger(0);
+
+            session.withTransaction(() -> {
+                retryCount.incrementAndGet();  // Count the attempt before the operation that might fail


The test verifies the retry count but does not validate that exponential backoff delays are actually applied. Consider measuring elapsed time and asserting minimum expected delays to ensure backoff is functioning correctly. For example, with 3 retries at delays of ~5ms, ~7.5ms, and ~11.25ms, the total elapsed time should be at least the sum of minimum expected delays.

ExponentialBackoffTest covers these unit tests already.

dariakp · 2025-12-17T19:43:50Z

@nhachicha Please take note of mongodb/specifications#1868

stIncMale

I haven't reviewed ExponentialBackoffTest, because it depends on ExponentialBackoff, where I left many suggestions.
I haven't reviewed ClientSessionImpl, because it has to implement the new specification change DRIVERS-1934: clarify drivers back off before all transaction retries (#1868).
The last reviewed commit is 90ec4d5.

driver-core/src/main/com/mongodb/internal/ExponentialBackoff.java

stIncMale · 2026-01-02T22:32:06Z

driver-core/src/main/com/mongodb/internal/ExponentialBackoff.java

+    public static ExponentialBackoff forTransactionRetry() {
+        return new ExponentialBackoff(
+                TRANSACTION_BASE_BACKOFF_MS,
+                TRANSACTION_MAX_BACKOFF_MS,
+                TRANSACTION_BACKOFF_GROWTH
+        );
+    }


Within this PR, I failed to find any benefits of expressing the backoff computation as behavior of an object (an instance of ExponentialBackoff), rather than just a static method; regardless of the approach, the ClientSessionImpl.withTransaction method has to maintain one new local variable: either the lazily initialized transactionBackoff, or the transactionAttempt (that's how it is named in the spec, but we are free to use a different name). Given this, I propose to go with the more straightforward approach of expressing the backoff computation as a static method¹, rather than as an object behavior.

If in the future we observe that this is not enough and the "object" approach is needed, we'll be able to change the approach. But we will have a clear reason for that.

Also, storing all the constants in each instance of ExponentialBackoff as instance fields is unnecessary. If we have to use the "object" approach in the future, we better implement it in such a way that does not duplicate constants as instance fields in each object instance (an interface / abstract class with two implementations is one way to achieve that).

P.S. Some other review comments I left are written within the current "object" approach (as opposed to the proposed "method" approach). If the suggestion in this comment is applied, those comments should not be automatically discarded, but rather each should be examined and adapted, if applicable, to the "method" approach.

¹ If in the future we need another static method for command retries, we will be able to move the computational logic in a private static method, and call that method from two public methods, passing the suitable constants as method arguments.
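A rough sketch of the static method approach described in this comment and its footnote might look like the following. The method names forTransaction and forCommand are the ones used later in this thread; the class name and the compute helper are hypothetical. The helper is package-private rather than private, and takes the jitter as a parameter, only so that the arithmetic can be checked deterministically.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of the "method" approach; not the code in this PR.
final class ExponentialBackoffSketch {
    private ExponentialBackoffSketch() {
    }

    /** Backoff before a transaction retry: 5 ms base, 500 ms cap, 1.5x growth. */
    static long forTransaction(final int retryCount) {
        return compute(retryCount, 5.0, 500.0, 1.5, ThreadLocalRandom.current().nextDouble());
    }

    /** Backoff before a command retry: 100 ms base, 10 s cap, 2.0x growth. */
    static long forCommand(final int retryCount) {
        return compute(retryCount, 100.0, 10_000.0, 2.0, ThreadLocalRandom.current().nextDouble());
    }

    // Shared computational logic; the constants arrive as method arguments,
    // so nothing is stored in instance fields.
    static long compute(final int retryCount, final double baseMs, final double maxMs,
            final double growth, final double jitter) {
        return Math.round(jitter * Math.min(baseMs * Math.pow(growth, retryCount), maxMs));
    }
}
```

With this shape, the caller keeps its own attempt counter and passes it in, which is the single local variable the comment says withTransaction has to maintain under either approach.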

I agree with your points (especially the memory overhead of the constants and the state management), however, the current approach also has the following benefit:

Encapsulation: Retry count management is internal - caller can't forget to increment

Configuration bundling: forTransactionRetry() vs forCommandRetry() clearly separates concerns

Type safety: Can't accidentally mix transaction/command constants

I pushed a "middle-ground" stateless solution based on Enum, which only defines convenient transaction retries 👍

The new code is better, but the above description of benefits the current enum approach provides is flawed. The current enum approach is actually conceptually equivalent to the approach with the static method I proposed, but achieves the same in a more complex way (from the source code standpoint), with a potentially less efficient memory organization, and probably allows fewer JIT/javac optimizations¹.

Encapsulation: Retry count management is internal - caller can't forget to increment

I like encapsulation, but the current enum approach cannot handle counting internally, and does not do that: retryCount is passed to the instance method ExponentialBackoff.calculateDelayBeforeNextRetryMs.

Note also that the way the overload retry policy is formulated at the moment (still work in progress), does not even allow us to encapsulate retry counting in ExponentialBackoff, because generally speaking backoff does not need to be computed/applied to each retry attempt, but the retry attempt index used to compute the backoff is the index of the retry attempt in the sequence of all retry attempts, regardless of what retry policy an attempt was executed under. That is, computing the backoff and counting retry attempts have to be done independently of each other. But if ExponentialBackoff were to encapsulate the counting, then they could only have been done together.

Configuration bundling: forTransactionRetry() vs forCommandRetry() clearly separates concerns

With the current enum approach, we'll have ExponentialBackoff.TRANSACTION.calculateDelayBeforeNextRetryMs(retryCount) vs ExponentialBackoff.COMMAND.calculateDelayBeforeNextRetryMs(retryCount), as opposed to what you wrote. With the static method approach, we'll have ExponentialBackoff.forTransaction(retryCount) vs ExponentialBackoff.forCommand(retryCount) - conceptually the same exact thing.

Type safety: Can't accidentally mix transaction/command constants

Enum constants all have the same type: ExponentialBackoff.TRANSACTION and ExponentialBackoff.COMMAND are both of the ExponentialBackoff type. Therefore, the enum approach addresses type-safety to the same extent as the static method approach - it does not.

Of course, an "object" approach where different instances of ExponentialBackoff have different compile-time types is still possible, just not with ExponentialBackoff being an enum. But it still won't improve type-safety, simply because the code that uses it won't be able to benefit from those different compile-time types due to its trivial nature.

Thus, the current enum approach gives us only "configuration bundling", which is achievable to the same extent with the static method approach without involving any objects and enum classes.

¹

I think performance-related considerations are practically inconsequential here, and the code complexity aspect totally dominates the performance aspect. However, when two approaches are conceptually equivalent, considering other aspects, including inconsequential ones, to make a choice is not unreasonable. I am using the performance aspect to add a tiny bit to my argument.

enum constants are references to objects on the heap, and accessing their state, even when it's final (baseMs, maxMs, growth), likely allows fewer JIT/javac optimizations than accessing constants from the run-time constant pool (which is where the numeric literals from a static method end up), because final fields are modifiable (for example, via reflection), while constants in the pool are not.

driver-core/src/main/com/mongodb/internal/ExponentialBackoff.java

driver-sync/src/test/functional/com/mongodb/client/WithTransactionProseTest.java

stIncMale · 2026-01-03T10:21:06Z

driver-sync/src/main/com/mongodb/client/internal/ClientSessionImpl.java

+                    } catch (InterruptedException e) {
+                        Thread.currentThread().interrupt();
+                        throw new MongoClientException("Transaction retry interrupted", e);


Let's use InterruptionUtil.interruptAndCreateMongoInterruptedException.

stIncMale · 2026-01-03T10:23:17Z

driver-sync/src/main/com/mongodb/client/internal/ClientSessionImpl.java

+                    long backoffMs = transactionBackoff.calculateDelayMs();
+                    try {
+                        if (backoffMs > 0) {
+                            Thread.sleep(backoffMs);
+                        }
+                    } catch (InterruptedException e) {
+                        Thread.currentThread().interrupt();
+                        throw new MongoClientException("Transaction retry interrupted", e);
+                    }


Let's extract this code to a private static method. The withTransaction method is already too long, we should not make it longer without good reason.
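One possible shape for the extracted helper is sketched below. The helper name sleepBeforeRetry and the exception class are hypothetical stand-ins so that the example is self-contained; in the driver, the catch block would presumably delegate to InterruptionUtil.interruptAndCreateMongoInterruptedException, as suggested in the previous comment, whose exact signature is not shown in this thread.

```java
// Hypothetical sketch; the exception type is a stand-in for the driver's own exception.
final class BackoffSleepSketch {
    /** Illustrative stand-in for the exception the driver would throw on interruption. */
    static final class InterruptedSketchException extends RuntimeException {
        InterruptedSketchException(final String message, final Throwable cause) {
            super(message, cause);
        }
    }

    private BackoffSleepSketch() {
    }

    /** Sleeps for the given backoff, doing nothing when the backoff is zero or negative. */
    static void sleepBeforeRetry(final long backoffMs) {
        if (backoffMs <= 0) {
            return;
        }
        try {
            Thread.sleep(backoffMs);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so that code further up the stack can observe it.
            Thread.currentThread().interrupt();
            throw new InterruptedSketchException("Transaction retry interrupted", e);
        }
    }
}
```

With such a helper, the retry loop in withTransaction shrinks to a single call per retry.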

driver-sync/src/main/com/mongodb/client/internal/ClientSessionImpl.java

stIncMale · 2026-01-03T10:28:09Z

driver-sync/src/main/com/mongodb/client/internal/ClientSessionImpl.java

    @Override
    public <T> T withTransaction(final TransactionBody<T> transactionBody, final TransactionOptions options) {
        notNull("transactionBody", transactionBody);
        long startTime = ClientSessionClock.INSTANCE.now();


[just a comment on code that wasn't changed in this PR]

I have just noticed this ClientSessionClock - it uses a non-monotonic clock. Horrendous.
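For context, a monotonic alternative for measuring elapsed time could look like the sketch below. It assumes the concern is wall-clock time (for example System.currentTimeMillis()), which can jump when the system clock is adjusted; how ClientSessionClock is actually implemented is not shown in this thread, and all names here are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of elapsed-time measurement based on System.nanoTime().
final class MonotonicElapsedSketch {
    private MonotonicElapsedSketch() {
    }

    /** Captures a starting point; the value is only meaningful relative to another nanoTime() reading. */
    static long start() {
        return System.nanoTime();
    }

    /** Milliseconds elapsed since the given starting point, unaffected by wall-clock adjustments. */
    static long elapsedMs(final long startNanos) {
        // Always compare nanoTime() readings by subtraction, never with < or >.
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }

    static boolean hasTimedOut(final long startNanos, final long timeoutMs) {
        return elapsedMs(startNanos) >= timeoutMs;
    }
}
```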

dariakp · 2026-01-08T19:46:46Z

@stIncMale @nhachicha Flagging one more relevant spec test adjustment here: mongodb/specifications#1876

stIncMale · 2026-01-08T20:04:04Z

@dariakp Thank you for the heads up, I updated the PR description.

stIncMale

This is a partial review, where I reviewed only ExponentialBackoff.java.

The last reviewed commit is 36ecbf9.

stIncMale · 2026-01-21T23:19:32Z

driver-core/src/main/com/mongodb/internal/time/ExponentialBackoff.java

+    /**
+     * Calculate the next delay in milliseconds based on the retry count and a provided jitter.
+     *
+     * @param retryCount The number of retries that have occurred.
+     * @param jitter     A double in the range [0, 1) to apply as jitter.
+     * @return The calculated delay in milliseconds.
+     */
+    public long calculateDelayBeforeNextRetryMs(final int retryCount, final double jitter) {
+        double backoff = Math.min(baseMs * Math.pow(growth, retryCount), maxMs);
+        return Math.round(jitter * backoff);
+    }


1. This method is called only from a test that tests this method, which makes the method effectively dead. If the method was introduced in anticipation of becoming useful in the future, then we should introduce it when, and in the PR where, it is actually going to be used.

2. Just for information: if the method were not dead, I would have suggested implementing calculateDelayBeforeNextRetryMs(int) by calling calculateDelayBeforeNextRetryMs(int, double), to reuse the code instead of duplicating it. But I am not suggesting that because of item 1 above.
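Purely to illustrate the delegation mentioned above (which the comment explicitly does not request), a sketch could look like this. The enum name is hypothetical, the constants mirror the TRANSACTION constant from the PR, and the test jitter supplier from the PR is left out for brevity.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch; not the code in this PR.
enum BackoffOverloadSketch {
    TRANSACTION(5.0, 500.0, 1.5);

    private final double baseMs;
    private final double maxMs;
    private final double growth;

    BackoffOverloadSketch(final double baseMs, final double maxMs, final double growth) {
        this.baseMs = baseMs;
        this.maxMs = maxMs;
        this.growth = growth;
    }

    /** Picks a random jitter and delegates, so the formula lives in exactly one place. */
    long calculateDelayBeforeNextRetryMs(final int retryCount) {
        return calculateDelayBeforeNextRetryMs(retryCount, ThreadLocalRandom.current().nextDouble());
    }

    /** Computes the delay for an explicitly provided jitter in the range [0, 1). */
    long calculateDelayBeforeNextRetryMs(final int retryCount, final double jitter) {
        return Math.round(jitter * Math.min(baseMs * Math.pow(growth, retryCount), maxMs));
    }
}
```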

stIncMale · 2026-01-22T00:26:37Z

driver-core/src/main/com/mongodb/internal/time/ExponentialBackoff.java

+
+    private final double baseMs, maxMs, growth;
+
+    // TODO remove this global state once https://jira.mongodb.org/browse/JAVA-6060 is done


The tag format is TODO-<ticket ID>. This way we can find all such tags by searching the codebase for "TODO-", and not have them mixed with the (useless) TODO tags that were introduced to the code before we started using the new approach.

Comments / error messages / etc. with such tags are a mechanism we resort to if the description of a ticket is not enough to conveniently specify all the information. In this particular case, the description of a ticket is very much enough, as we can just say in it something like "Use InternalMongoClientSettings to get rid of the ExponentialBackoff.testJitterSupplier global state". So we should probably not resort to leaving a comment with the TODO-<ticket ID> tag.

When we resort to this mechanism, we should also leave a note in the ticket description that addressing comments with the TODO-<ticket ID> tag is in scope of the ticket. See https://jira.mongodb.org/browse/JAVA-6005, https://jira.mongodb.org/browse/JAVA-6059 as examples (I updated the latter, as well as some other tickets, because they were missing the note). Such notes are important because they draw attention of the assignee to the tagged comments. Without a note, the assignee is more likely to not even realize there are relevant tagged comments.

https://jira.mongodb.org/browse/JAVA-6060 is about introducing InternalMongoClientSettings and getting rid of InternalStreamConnection.setRecordEverything, but is not about getting rid of ExponentialBackoff.testJitterSupplier. Therefore, we should

create another ticket that will use InternalMongoClientSettings to get rid of the ExponentialBackoff.testJitterSupplier global state;

link it to https://jira.mongodb.org/browse/JAVA-6060 via the "depends on" link.

stIncMale · 2026-01-22T06:31:05Z

driver-core/src/main/com/mongodb/internal/time/ExponentialBackoff.java

+    /**
+     * Calculate the next delay in milliseconds based on the retry count.
+     *
+     * @param retryCount The number of retries that have occurred.
+     * @return The calculated delay in milliseconds.
+     */
+    public long calculateDelayBeforeNextRetryMs(final int retryCount) {
+        double jitter = testJitterSupplier != null
+                ? testJitterSupplier.getAsDouble()
+                : ThreadLocalRandom.current().nextDouble();
+        double backoff = Math.min(baseMs * Math.pow(growth, retryCount), maxMs);
+        return Math.round(jitter * backoff);
+    }


The class name uses the term "backoff", but the method uses the term "delay" (both in its name and in the documentation comment). Let's not use two different terms to refer to the same thing.

I am guessing the above is an indirect result of you deciding to call backoff the intermediate result when computing the backoff. Given that this intermediate result does not serve any purpose, we don't have to make it a thing and name it, we can just write the whole formula like return Math.round(jitter * Math.min(baseMs * Math.pow(growth, retryCount), maxMs)).

stIncMale · 2026-01-22T07:00:10Z

driver-core/src/main/com/mongodb/internal/time/ExponentialBackoff.java

+public enum ExponentialBackoff {
+    TRANSACTION(5.0, 500.0, 1.5);
+
+    private final double baseMs, maxMs, growth;


We don't use this declaration style in the Java driver codebase. Let's declare each instance field separately.

stIncMale · 2026-01-22T07:19:56Z

driver-core/src/main/com/mongodb/internal/time/ExponentialBackoff.java

+    /**
+     * Calculate the next delay in milliseconds based on the retry count.
+     *
+     * @param retryCount The number of retries that have occurred.
+     * @return The calculated delay in milliseconds.
+     */
+    public long calculateDelayBeforeNextRetryMs(final int retryCount) {
+        double jitter = testJitterSupplier != null
+                ? testJitterSupplier.getAsDouble()
+                : ThreadLocalRandom.current().nextDouble();
+        double backoff = Math.min(baseMs * Math.pow(growth, retryCount), maxMs);
+        return Math.round(jitter * backoff);
+    }


This method accepts retryCount despite:

both the specification and ClientSessionImpl.withTransaction operating on the 0-based number of a transaction attempt, which is called transactionAttempt;

As a result, ClientSessionImpl has to pass transactionAttempt - 1 when calling calculateDelayBeforeNextRetryMs to compute the backoff for the attempt with number transactionAttempt.

RetryState.attempt() returning the 0-based attempt number.

Let's change this method so that it also operates on the 0-based attempt number, by accepting attempt, documented as the 0-based number of the transaction attempt for which the backoff is to be calculated.
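A sketch of what an attempt-based signature might look like is shown below. The names are hypothetical, and the handling of attempt 0 is an assumption: the sketch treats attempt 0 as the initial attempt, which is not a retry, and rejects it. The exponent attempt - 1 reflects the adjustment ClientSessionImpl currently makes on the caller's side.

```java
// Hypothetical sketch; not the code in this PR.
final class AttemptBasedBackoffSketch {
    private AttemptBasedBackoffSketch() {
    }

    /**
     * @param attempt The 0-based number of the transaction attempt for which the backoff is calculated.
     *                Assumption: attempt 0 is the initial attempt and has no backoff, so it is rejected.
     * @param jitter  A value in the range [0, 1).
     * @return The backoff in milliseconds.
     */
    static long calculateTransactionBackoffMs(final int attempt, final double jitter) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be at least 1, got " + attempt);
        }
        // The caller no longer subtracts 1; the method does it internally.
        return Math.round(jitter * Math.min(5.0 * Math.pow(1.5, attempt - 1), 500.0));
    }
}
```

The caller can then pass its transactionAttempt counter unchanged.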

nhachicha self-assigned this Dec 9, 2025

nhachicha mentioned this pull request Dec 9, 2025

JAVA-5950 Update Transactions Convenient API with exponential backoff on retries #1848

Closed

nhachicha requested a review from Copilot December 9, 2025 14:25

Copilot AI reviewed Dec 9, 2025

nhachicha marked this pull request as ready for review December 9, 2025 15:24

nhachicha requested a review from a team as a code owner December 9, 2025 15:24

nhachicha requested review from stIncMale and strogiyotec and removed request for strogiyotec December 9, 2025 15:24

vbabanin self-requested a review December 9, 2025 16:11

stIncMale requested changes Jan 3, 2026

nhachicha requested a review from stIncMale January 16, 2026 00:56

nhachicha and others added 15 commits January 19, 2026 17:53

JAVA-5950 - Update Transactions Convenient API with exponential backo…

1ed116f

…ff on retries

Simplifying test, clean up.

eb8b4ad

Fixing test

b8b0e1a

Update driver-sync/src/main/com/mongodb/client/internal/ClientSession…

c05ce05

…Impl.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

retrigger checks

aa96659

retrigger checks

98fc57b

retrigger checks

f98262e

retrigger checks

bfc89fc

test cleanup

d9405ef

retrigger checks

5de452b

Test cleanup

1867ff5

retrigger checks

f89d62d

Update the implementation according to the spec

3d646ae

Added prose test

9b4bf15

Flaky test

da83704

nhachicha and others added 14 commits January 19, 2026 17:53

Remove extra Test annotation

cb95167

Throwing correct exception when CSOT is used

ef734a0

Simplifying implementation by relying on CSOT to throw when timeout i…

96b5ed7

…s exceeded (ex operationContext.getTimeoutContext().getReadTimeoutMS())

Fixing implementation according to spec changes in JAVA-6046 and mong…

43eda52

…odb/specifications#1868

Update driver-sync/src/test/functional/com/mongodb/client/WithTransac…

a4193dd

…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>

Update driver-sync/src/test/functional/com/mongodb/client/WithTransac…

f2d8263

…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>

Update driver-sync/src/test/functional/com/mongodb/client/WithTransac…

f3daab0

…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>

Update driver-sync/src/test/functional/com/mongodb/client/WithTransac…

7afd9d4

…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>

Update driver-sync/src/test/functional/com/mongodb/client/WithTransac…

ccb8d03

…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>

Update driver-sync/src/test/functional/com/mongodb/client/WithTransac…

bbb9a68

…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>

Update driver-sync/src/test/functional/com/mongodb/client/WithTransac…

c44872d

…tionProseTest.java Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>

Update driver-core/src/main/com/mongodb/internal/ExponentialBackoff.java

f0dd916

Co-authored-by: Valentin Kovalenko <valentin.male.kovalenko@gmail.com>

PR feedback

0e00c90

remove annotation

36ecbf9

nhachicha force-pushed the nh/withTransaction_delay branch from 5c2145c to 36ecbf9 Compare January 19, 2026 18:01

stIncMale requested changes Jan 22, 2026
