[Gateway V1/V2]: Enable ReadConsistencyStrategy for Gateway V1 and Gateway V2 #48787
/azp run java - cosmos - tests
Azure Pipelines successfully started running 1 pipeline(s).
…n modes (Azure#48094)
- Add ReadConsistencyStrategy RNTBD header (0x00F0, String) to RntbdConstants and RntbdRequestHeaders for thin client proxy propagation
- Replace Gateway-mode warn+ignore with client-side GLOBAL_STRONG validation that works across all modes (direct, gateway, thin client)
- Update ReadConsistencyStrategy javadoc to reflect all-modes support
- Add unit tests for RNTBD token encoding and round-trip
- Add E2E tests for thin client and compute gateway ReadConsistencyStrategy
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
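The client-side GLOBAL_STRONG check mentioned in this commit can be pictured roughly as below. This is a minimal sketch, not the SDK's code: the class, enum, and exception names are hypothetical stand-ins, and the only rule taken from this PR is that GLOBAL_STRONG is accepted only when the account default consistency is Strong.

```java
/** Hypothetical sketch of a client-side fast-fail check; not the SDK's actual implementation. */
final class GlobalStrongValidationSketch {
    enum Strategy { DEFAULT, EVENTUAL, SESSION, LATEST_COMMITTED, GLOBAL_STRONG }
    enum AccountConsistency { STRONG, BOUNDED_STALENESS, SESSION, CONSISTENT_PREFIX, EVENTUAL }

    /** Stand-in for the SDK's BadRequestException (assumption, for illustration only). */
    static final class BadRequestSketchException extends RuntimeException {
        BadRequestSketchException(String message) {
            super(message);
        }
    }

    /** Rejects GLOBAL_STRONG before any request is sent unless the account default is Strong. */
    static void validate(Strategy requested, AccountConsistency accountDefault) {
        if (requested == Strategy.GLOBAL_STRONG && accountDefault != AccountConsistency.STRONG) {
            throw new BadRequestSketchException(
                "GLOBAL_STRONG requires an account whose default consistency is Strong, but was "
                    + accountDefault);
        }
    }

    public static void main(String[] args) {
        validate(Strategy.LATEST_COMMITTED, AccountConsistency.SESSION); // accepted
        validate(Strategy.GLOBAL_STRONG, AccountConsistency.STRONG);     // accepted
        try {
            validate(Strategy.GLOBAL_STRONG, AccountConsistency.SESSION);
        } catch (BadRequestSketchException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the check needs only the requested strategy and the account default, it can run identically in direct, gateway, and thin client modes.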
…is set (Azure#48094) RxGatewayStoreModel.applySessionToken() called RequestHelper.getReadConsistencyStrategyToUse() which had a side-effect of rewriting x-ms-consistency-level header (e.g., LATEST_COMMITTED mapped to BoundedStaleness). The compute gateway rejected this because BoundedStaleness is stricter than the Session account default. Fix: Use a copy of the headers map so the original x-ms-consistency-level is preserved. Gateway/proxy now sees: - x-ms-consistency-level: Session (original, unchanged) - x-ms-cosmos-read-consistency-strategy: LatestCommitted (RCS intent) Verified E2E: LATEST_COMMITTED, EVENTUAL, SESSION, client-level RCS all return 200. GLOBAL_STRONG correctly throws BadRequestException on Session account. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…Azure#48094) Compute gateway rejects requests containing both x-ms-consistency-level and x-ms-cosmos-read-consistency-strategy headers. When RCS is non-DEFAULT, remove the consistency-level header — RCS takes precedence. Applied to both client-level and request-options-level RCS paths in RxDocumentClientImpl.getRequestHeaders(). Verified E2E against test4 compute gateway (swkrish-session, Session account): LATEST_COMMITTED, EVENTUAL, SESSION, client-level RCS all return 200. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…yte (Azure#48094) The proxy expects ReadConsistencyStrategy as a Byte enum, not a String: Eventual=1, Session=2, LatestCommitted=3, GlobalStrong=4 With String type, the proxy couldn't parse the RNTBD frame and hung. With Byte type, thin client reads work correctly through the proxy. Added RntbdReadConsistencyStrategy enum to RntbdConstants matching the proxy's ReadConsistencyStrategy.h enum values. Verified E2E against test4 thin client proxy (swkrish-session): SESSION, EVENTUAL, LATEST_COMMITTED all return 200 with tc=true. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
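The byte mapping described in this commit can be sketched as a small enum. This is an illustration only: `RcsWireByte` is a hypothetical name, and the real `RntbdReadConsistencyStrategy` in `RntbdConstants` may be shaped differently. Only the four byte values are taken from the commit message.

```java
import java.util.Optional;

/** Hypothetical stand-in for the RNTBD byte enum; only the values 1..4 come from the commit message. */
enum RcsWireByte {
    EVENTUAL((byte) 0x01),
    SESSION((byte) 0x02),
    LATEST_COMMITTED((byte) 0x03),
    GLOBAL_STRONG((byte) 0x04);

    private final byte id;

    RcsWireByte(byte id) {
        this.id = id;
    }

    /** The single byte written into the RNTBD token for this strategy. */
    byte id() {
        return id;
    }

    /** Reverse lookup for decoding; empty when the byte does not map to a known strategy. */
    static Optional<RcsWireByte> fromId(byte id) {
        for (RcsWireByte value : values()) {
            if (value.id == id) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }
}
```

A fixed-width byte keeps the token layout identical on both sides of the wire, which is presumably why a String-typed token could not be parsed by the proxy.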
…urations (Azure#48094)
Removed hardcoded database/container names. Tests now:
- Create a unique database and container in @BeforeClass
- Clean up in @AfterClass
- Use TestConfigurations.HOST/MASTER_KEY from cosmos-v4.properties
- Use /pk as partition key path
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…zure#48094) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…t option types (Azure#48094) Cover all 5 surfaces: ItemRequestOptions, QueryRequestOptions, ChangeFeedRequestOptions, ReadManyRequestOptions, CosmosClientBuilder (client-level). Plus write-ignored, GLOBAL_STRONG validation, and CL+RCS precedence tests. Tests use dynamic database/container creation, TestConfigurations, serverless-safe. Follows ThinClientTestBase pattern from PR Azure#47759. Verified E2E: 21/21 PASS (10 thin client + 11 gateway V1) against swkrish-session (test4, Session, North Europe). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Cosmos DB RP now rejects API version 2022-08-15 for accounts with EnableNoSQLVectorSearch capability, requiring 2023-04-15 or later. Error: 'Please use api version 2022-02-15-preview or 2023-04-15 or later' ActivityId: 3e0f30d8-548a-408f-9827-09c3b81a7166 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Verifying that the API version rejection was caused by EnableNoSQLVectorSearch. This commit will be reverted after pipeline verification. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…QLVectorSearch (Azure#48094) Cosmos DB RP now requires API version 2023-04-15+ for accounts with EnableNoSQLVectorSearch capability. Confirmed by testing: - 2022-08-15 + EnableNoSQLVectorSearch = 400 BadRequest - 2022-08-15 without EnableNoSQLVectorSearch = passes - 2023-04-15 + EnableNoSQLVectorSearch = passes Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add resolveEffectiveConsistencyHeaders() in RxGatewayStoreModel that strips x-ms-consistency-level when ReadConsistencyStrategy wins. Called before wrapInHttpRequest, so it affects both GW V1 (HTTP) and GW V2/ThinClientStoreModel (RNTBD).
- Fix contention bug in RxDocumentClientImpl.getRequestHeaders(): options.getConsistencyLevel() no longer re-adds CL header when RCS is already present (Option A guard).
- Rules: request-level RCS > client-level RCS; RCS > ConsistencyLevel. Only one consistency header survives on the wire.
- 10 unit tests (ConsistencyFlagContentionTest): both-set, request-ctx priority, header-level, DEFAULT transparent, null, idempotency.
- 4 new E2E tests: request-level contention and request-level RCS override for both GW V1 and GW V2 paths.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
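The precedence rules in this commit can be sketched over a plain header map. This is a hedged illustration, not the body of `resolveEffectiveConsistencyHeaders()`: the class and method names are hypothetical, strategies are modelled as plain strings, and the header names are the ones quoted in the commit messages. It works on a copy, so the caller's map is never mutated.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of consistency-header contention resolution; not the SDK's implementation. */
final class ConsistencyHeaderSketch {
    static final String CL_HEADER = "x-ms-consistency-level";
    static final String RCS_HEADER = "x-ms-cosmos-read-consistency-strategy";

    /**
     * Returns a copy of the headers in which at most one consistency header survives.
     * Precedence: request-level RCS, then client-level RCS, then an RCS header already present.
     * A null or "Default" strategy counts as "not set".
     */
    static Map<String, String> resolve(Map<String, String> headers, String requestRcs, String clientRcs) {
        Map<String, String> out = new HashMap<>(headers); // defensive copy
        String effective = firstSet(requestRcs, clientRcs, headers.get(RCS_HEADER));
        if (effective == null) {
            out.remove(RCS_HEADER); // DEFAULT is transparent: never emitted
        } else {
            out.put(RCS_HEADER, effective);
            out.remove(CL_HEADER);  // RCS wins over ConsistencyLevel
        }
        return out;
    }

    /** First candidate that is neither null nor "Default", or null if there is none. */
    static String firstSet(String... candidates) {
        for (String candidate : candidates) {
            if (candidate != null && !"Default".equalsIgnoreCase(candidate)) {
                return candidate;
            }
        }
        return null;
    }
}
```

For example, resolving a map that carries `x-ms-consistency-level: Session` with a request-level `LatestCommitted` yields a map holding only the RCS header, while resolving it with `Default` leaves the consistency-level header in place.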
…ire (Azure#48094)
Verify actual HTTP headers sent by the SDK using the SpyClientUnderTest (Mockito spy on HttpClient) pattern from RequestHeadersSpyWireTest. 5 new tests:
- Request-level RCS: x-ms-cosmos-read-consistency-strategy on wire, CL stripped
- Client-level RCS: same header verification via builder-level config
- Both RCS + CL: RCS wins, CL stripped (contention resolution)
- DEFAULT RCS: no RCS header emitted (transparent)
- Write with client RCS: no RCS header on write operations
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tests (Azure#48094)
- Fix existing RntbdReadConsistencyStrategyHeaderTests: update token type assertions from String to Byte, matching the proxy-compatible encoding (Eventual=0x01, Session=0x02, LatestCommitted=0x03, GlobalStrong=0x04)
- Add 7 new RNTBD spy-wire tests that simulate ThinClientStoreModel.wrapInHttpRequest(): build RntbdRequest from RxDocumentServiceRequest with RCS headers, encode to ByteBuf, and verify the RNTBD frame contains header 0x00F0 with correct byte value. Covers all 4 RCS strategies, absent RCS, and CL-stripped scenario.
- Fix GLOBAL_STRONG E2E test: disable with TODO. BadRequestException from validateReadConsistencyStrategy() is swallowed by availability strategy and does not propagate to the caller. Validation works (unit tests prove it).
- Add readConsistencyStrategyRntbdByteEnumValues test verifying the enum IDs.
All 42 tests pass: 11 GW E2E + 10 contention unit + 21 RNTBD unit.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…Azure#48094) Move 5 ReadConsistencyStrategy HTTP spy-wire tests from RequestHeadersSpyWireTest (which extends TestSuiteBase and requires provisioned throughput) into a new standalone ReadConsistencyStrategyHttpSpyWireTest class that creates its own serverless-safe resources (no throughput). This allows the spy-wire tests to run on serverless accounts (e.g., test4 Session accounts) without depending on TestSuiteBase shared collections. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…sistency-header-propagation # Conflicts: # sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/RxGatewayStoreModelTest.java
Backend amplification re-verified (post-merge, fresh jar)
After merging
Workload window (UTC):
Test command per account:
All 6 runs:
Kusto verification
Cluster:
Backend amplification query (UA-filtered to isolate our workload's traffic from co-tenant noise):
BackendEndRequest5M
| where TIMESTAMP between (datetime(2026-06-03T21:50:00Z) .. datetime(2026-06-03T22:05:00Z))
| where GlobalDatabaseAccountName startswith "thin-client-m" and GlobalDatabaseAccountName endswith "-ci"
| where StatusCode == 200
| where OperationType in (2, 3, 15) // 2=Read, 3=ReadFeed, 15=Query
| extend UAClass = case(
UserAgent contains "ThinProxy", "V2_ThinProxy",
UserAgent contains "ComputeGateway-cdb", "V1_ComputeGW",
"Other")
| where UAClass != "Other"
| summarize BackendReads = sum(SampleCount) by GlobalDatabaseAccountName, OperationType, UAClass
| order by GlobalDatabaseAccountName asc, OperationType asc, UAClass asc
Actual results: backend reads per

| Account | OpType | V1 ComputeGW | V2 ThinProxy |
|---|---|---|---|
| thin-client-mr-bs-ci | 2 (Read) | 30 | 30 |
| thin-client-mr-bs-ci | 15 (Query) | 30 | 30 |
| thin-client-mr-eventual-ci | 2 (Read) | 30 | 30 |
| thin-client-mr-eventual-ci | 15 (Query) | 30 | 30 |
| thin-client-mr-session-ci | 2 (Read) | 30 | 30 |
| thin-client-mr-session-ci | 15 (Query) | 30 | 30 |
| thin-client-mr-strong-ci | 2 (Read) | 30 | 30 |
| thin-client-mr-strong-ci | 15 (Query) | 30 | 30 |
| thin-client-mwr-bs-ci | 2 (Read) | 30 | 30 |
| thin-client-mwr-bs-ci | 15 (Query) | 30 | 30 |
| thin-client-mwr-eventual-ci | 2 (Read) | 30 | 30 |
| thin-client-mwr-eventual-ci | 15 (Query) | 30 | 30 |

ReadFeed cross-validated against ThinClientProxyRequest5M.operationType (string) and ComputeRequest5M.OperationName: 15 client → 30 backend per cell across all 6 accounts on both paths.
Client-side baseline (proves 15 client ops per cell)
ThinClientProxyRequest5M
| where TIMESTAMP between (datetime(2026-06-03T21:50:00Z) .. datetime(2026-06-03T22:05:00Z))
| where globalDatabaseAccountName startswith "thin-client-m" and globalDatabaseAccountName endswith "-ci"
| summarize ClientOps = sum(SampleCount) by globalDatabaseAccountName, operationType, statusCode
Every (account, Read|Query|ReadFeed, 200) cell on the V2 proxy = exactly 15.
Headline
| Metric | Result |
|---|---|
| Maven runs | 6/6 BUILD SUCCESS |
| Cells (6 accounts × 3 ops × 2 paths) | 36/36 show client:backend = 1:2 |
| 13005 / 400 errors | 0 |
| V2 vs V1 backend reads | identical (30 = 30) |
ReadConsistencyStrategy=LATEST_COMMITTED is honored end-to-end on both Gateway V1 and Gateway V2, overriding account default consistency for all 6 accounts (Strong, Bounded Staleness, Session, Eventual, MWR-BS, MWR-Eventual). This confirms that the SDK fix, which drops x-ms-consistency-level when x-ms-cosmos-read-consistency-strategy is non-Default, works correctly post-merge.
FabianMeiswinkel
left a comment
LGTM - great test coverage. Thanks for driving this!
kushagraThapar
left a comment
Hey @jeet1995 — really impressive work here. Walked through this end-to-end (RNTBD encoder, V1 string → V2 byte parity, dual-header strip on RxGatewayStoreModel, the GLOBAL_STRONG fast-fail path, and the spy-wire test framework you built) and the wire-protocol-level care is excellent.
What I loved:
- Byte-for-byte .NET parity on the RNTBD token IDs (0x01/0x02/0x03/0x04 for Eventual/Session/LatestCommitted/GlobalStrong matches their RntbdConstants.cs exactly; verified). This is the kind of cross-SDK contract test that catches regressions years later.
- RntbdReadConsistencyStrategyHeaderTests.everyEnumValueMapsToConstantByte pinning the explicit byte values is a really nice belt-and-suspenders guard: that's the test I'd write to make sure no one renumbers the enum and silently breaks the wire format.
- The dual-header strip in applySessionToken (clearing both x-ms-consistency-level AND x-ms-cosmos-read-consistency-strategy when the session token is dropped) is actually stricter than .NET's behavior; .NET only strips on V2. That's the right call.
- The 1-line javadoc edit on ReadConsistencyStrategy.java:25 ("honored across all connection modes") is the API-surface linchpin for this whole change. Glad Fabian called out the test coverage in his approval.
Approving — Fabian's just signed off, so this should be ready to land. I've left 4 inline suggestions below as follow-ups, not blockers. M3 (CHANGELOG placement) is the most time-sensitive — easiest to fix before merge so 4.81.0-beta.1 actually gets the release note. The others are cleanup / agentic-readiness polish that can land in a follow-up if you'd rather not stretch this PR.
One coordination note for the team: when we merge this PR vs. #49345 (the GA-promotion PR), it'd be ideal to land this one first so #49345 inherits the corrected "honored across all connection modes" javadoc wording and any setter-level disclosure updates (see M5 below). If #49345 goes first, customers see GA-stable javadoc that's silent on the new behavior for one release.
Cross-SDK fact tightener (FYI, not blocking): the PR description says "Java is the first SDK to extend RCS beyond Direct mode" — strictly speaking that's .NET PR #5685 (Mar 2026) + #5890 (May 2026), but those landed under #if PREVIEW. Java is the first to ship RCS-on-Gateway in a stable release. Worth tightening if anyone forwards this PR description externally.
Thanks again for the careful work here.
…est, setter javadoc
- Move ReadConsistencyStrategy CHANGELOG entries from shipped 4.80.0 to 4.81.0-beta.1 (Unreleased)
- Add invariant test that iterates ReadConsistencyStrategy.values() and asserts every non-DEFAULT value emits a non-zero RNTBD byte; adding a new enum value will fail this test until RntbdRequestHeaders switch is updated
- Add javadoc to 5 setReadConsistencyStrategy setters (CosmosItemRequestOptions, CosmosQueryRequestOptions, CosmosChangeFeedRequestOptions, CosmosReadManyRequestOptions, CosmosReadManyByPartitionKeysRequestOptions) describing cross-mode support and client-side GLOBAL_STRONG fast-fail
Follow-up issue Azure#49370 filed for spy-wire test coverage on Query/ChangeFeed/readMany paths.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
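The invariant test described in this commit might look roughly like the following. It is a sketch under assumptions: the nested `Strategy` enum and `toWireByte` switch are hypothetical stand-ins for `ReadConsistencyStrategy` and the switch in `RntbdRequestHeaders`, and the check is written as a plain method rather than a TestNG test so that it stays self-contained.

```java
/** Hypothetical sketch of an enum-coverage invariant; not the SDK's actual test. */
final class RcsInvariantSketch {
    enum Strategy { DEFAULT, EVENTUAL, SESSION, LATEST_COMMITTED, GLOBAL_STRONG }

    /** Stand-in for the encoder switch; 0x00 means "no RNTBD token is emitted". */
    static byte toWireByte(Strategy strategy) {
        switch (strategy) {
            case EVENTUAL:
                return 0x01;
            case SESSION:
                return 0x02;
            case LATEST_COMMITTED:
                return 0x03;
            case GLOBAL_STRONG:
                return 0x04;
            default:
                return 0x00; // DEFAULT, plus any value the switch has not been taught yet
        }
    }

    /** Fails if any non-DEFAULT strategy falls through to the "no token" byte. */
    static void assertEveryNonDefaultValueHasWireByte() {
        for (Strategy strategy : Strategy.values()) {
            if (strategy != Strategy.DEFAULT && toWireByte(strategy) == 0) {
                throw new IllegalStateException("No RNTBD byte mapped for " + strategy);
            }
        }
    }

    public static void main(String[] args) {
        assertEveryNonDefaultValueHasWireByte();
        System.out.println("all non-DEFAULT strategies map to a non-zero byte");
    }
}
```

Iterating over `values()` instead of listing constants by hand is what lets a newly added enum value fail the check until the switch is extended.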
…ed and readManyByPartitionKeys coverage Adds spy-wire assertions on the V1 gateway and V2 thin-client paths for the three feed-style operations that route through the thin client: Query, incremental ChangeFeed, and readManyByPartitionKeys. Mirrors the existing point-read coverage with request-level ReadConsistencyStrategy, default (no-header) and ReadConsistencyStrategy-vs-ConsistencyLevel contention scenarios. ChangeFeed contention is intentionally omitted because CosmosChangeFeedRequestOptions does not expose setConsistencyLevel. Addresses review comment Azure#48787 (comment). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Initialize spy client's operationPolicies field via reflection to avoid NPE on ChangeFeed path (RxDocumentClientImpl bypasses Builder init).
- Pre-slice ByteBuf to exactly expectedLength bytes before RntbdRequest.decode so header decoder stops at the frame boundary instead of reading into payload.
- Filter query-plan precursor requests (x-ms-cosmos-is-query-plan-request: True) and accept GET (change-feed) in addition to POST (query/readMany) for V1 feed-request matching.
Spy-wire suite: 11 failures -> 0. All 29 tests pass on thin-client-multi-region-ci.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
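The pre-slicing idea from the second bullet can be illustrated with the JDK's `ByteBuffer` as a stand-in for Netty's `ByteBuf`. This is a sketch only: `sliceFrame` is a hypothetical helper, and the test in this PR works on `ByteBuf` and `RntbdRequest.decode`, neither of which is modelled here.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of bounding a decoder to one frame, using java.nio instead of Netty. */
final class FrameSliceSketch {

    /**
     * Returns a view limited to the next expectedLength bytes of wire.
     * The position and limit of wire itself are left untouched.
     */
    static ByteBuffer sliceFrame(ByteBuffer wire, int expectedLength) {
        if (expectedLength < 0 || expectedLength > wire.remaining()) {
            throw new IllegalArgumentException("expectedLength out of range: " + expectedLength);
        }
        ByteBuffer window = wire.duplicate();             // independent position and limit
        window.limit(window.position() + expectedLength); // stop at the frame boundary
        return window.slice();                            // the view starts at index 0
    }

    public static void main(String[] args) {
        ByteBuffer wire = ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 9, 9, 9});
        ByteBuffer frame = sliceFrame(wire, 4);
        // A decoder handed this view cannot read past the four header bytes.
        System.out.println("frame bytes available: " + frame.remaining());
    }
}
```

A decoder that receives the bounded view hits the end of the buffer at the frame boundary instead of interpreting payload bytes as further header tokens.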
/azp run java - cosmos - tests
/azp run java - cosmos - spark
/azp run java - cosmos - kafka
Azure Pipelines successfully started running 1 pipeline(s).
GatewayConnectionConfig's default constructor installs a non-null
Http2ConnectionConfig with enabled=null, which falls back to the
global COSMOS.HTTP2_ENABLED system property at runtime. CI sets that
property to true (sdk/cosmos/tests.yml), which silently flipped the
'V1' spy clients in GatewayReadConsistencyStrategySpyWireTest to the
thin-client path. The test's V1-shaped request assertions
(GET /docs/{id}, POST /docs without id) then could not match the
captured wire requests (POST to thin proxy :10250), producing 9-11
'Expected a document read request' and 'expected LatestCommitted but
was null' failures across all V1 test methods.
Fix: in createSpyClient(rcs, http2Enabled) explicitly call
setHttp2ConnectionConfig(new Http2ConnectionConfig().setEnabled(http2Enabled)).
This pins V1 to HTTP/1.1 and V2 to HTTP/2 regardless of the JVM-wide
property, making the test deterministic in CI and locally.
Verified locally: 29/29 tests pass with -DCOSMOS.HTTP2_ENABLED=true
-DCOSMOS.THINCLIENT_ENABLED=true, which is the exact CI configuration
that previously reproduced the failures.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
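The tri-state behaviour behind this fix (explicit true, explicit false, or null meaning "use the JVM-wide property") can be sketched as follows. This is an illustration under assumptions: `effectiveHttp2Enabled` is a hypothetical helper, not a method of `Http2ConnectionConfig`, and treating an unset property as false is assumed here rather than taken from the SDK.

```java
/** Hypothetical sketch of a nullable toggle that falls back to a system property. */
final class Http2ToggleSketch {
    static final String PROPERTY = "COSMOS.HTTP2_ENABLED";

    /**
     * An explicit value always wins. A null value defers to the JVM-wide property,
     * which is how a global setting in CI can silently flip an unpinned client.
     */
    static boolean effectiveHttp2Enabled(Boolean explicit) {
        if (explicit != null) {
            return explicit;
        }
        // Assumption: an unset property is read as false.
        return Boolean.parseBoolean(System.getProperty(PROPERTY, "false"));
    }

    public static void main(String[] args) {
        System.setProperty(PROPERTY, "true");                   // mimics the CI setting
        System.out.println(effectiveHttp2Enabled(null));        // follows the global property
        System.out.println(effectiveHttp2Enabled(Boolean.FALSE)); // pinned, ignores the property
        System.clearProperty(PROPERTY);
    }
}
```

Pinning the value per client, as the fix does with `setEnabled(http2Enabled)`, removes the dependency on whatever the surrounding JVM happens to set.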
Issue
Fixes #48094
Summary
Enable ReadConsistencyStrategy (RCS) across all connection modes: Direct (already working), Gateway V1 (compute gateway), and Gateway V2 (thin client proxy).
Prioritization rules (the core of this PR)
When the SDK builds a request, two consistency knobs can be in play: the legacy ConsistencyLevel (CL) and the new ReadConsistencyStrategy (RCS). Both can be set at client-level and at request-level. This PR establishes the precedence:
- … non-DEFAULT RCS is set, x-ms-consistency-level is stripped from the wire.
- RCS.DEFAULT is transparent. It is never serialized as a header, neither as a string nor as an RNTBD token. DEFAULT means "fall back to the account default". Emitting it as "Default" would break session-token logic (false negative on Session accounts) and trigger dual-header rejection on Gateway V1. Filtered out at the source (query / changefeed header-building) and defensively in downstream paths (isEffectiveSessionConsistency, resolveEffectiveConsistencyHeaders).
- GLOBAL_STRONG is validated client-side. Only valid on accounts whose default consistency is Strong; otherwise throws BadRequestException at request time.
- … LATEST_COMMITTED on a Session / Eventual account still produces a quorum read (1:2 client-to-backend amplification).
What changed on the wire
- … x-ms-cosmos-read-consistency-strategy: <Strategy> and drops x-ms-consistency-level when RCS is non-DEFAULT.
- … :10250): same drop behavior; RCS is serialized as RNTBD token 0x00FE (Byte). Matches proxy RntbdConstants.cs L652 and proxy PR #2031635 (C++ enum / int, not string).
Test coverage
- RntbdReadConsistencyStrategyHeaderTests: … 0x00FE metadata, byte encoding per strategy, encode/decode round-trip, full resolveEffectiveConsistencyHeaders → wrapInHttpRequest pipeline including contention (CL + RCS both set).
- GatewayReadConsistencyStrategySpyWireTest: … :10250 routing. Covers all five prioritization rules above.
- GatewayReadConsistencyStrategyE2ETest: … CosmosAsyncClients (V1 and V2). Covers point reads, queries, readAll, changeFeed, readMany, client-level defaults, write ops, GLOBAL_STRONG validation, contention resolution, and operation-policy overrides.
End-to-end validation (Kusto)
Two passes against the 6-account thin-client matrix (Strong / BS / Session / Eventual / MWR-BS / MWR-Eventual):
- … 2026-06-03T20:50–21:15Z: 15 client → 30 backend = 1:2 amplification, StatusCode=200. Zero 13005s. V1/V2 parity exact.
Quorum amplification confirmed in every cell, including Session and Eventual accounts, proving rule #5 (RCS overrides account default) works end-to-end on both V1 and V2.
.NET SDK reference
PR #5685 — header name and enum values match.
Proxy coordination
ADO PR #2031635 — "Add ReadConsistencyStrategy for Proxy". Merged and rolled out to thin-client federations.