
REST: treat HTTP 400 commit-validation failures as CommitFailedException #16644

Draft
martinskeem wants to merge 1 commit into apache:main from martinskeem:fix/databricks-bug


martinskeem commented Jun 1, 2026

Some REST catalog implementations (e.g., Databricks Unity Catalog) return HTTP 400 with a "commit validation failed" message for concurrent-write conflicts instead of the spec-mandated HTTP 409. Because CommitErrorHandler previously mapped all 400 responses to BadRequestException, these conflicts escaped SnapshotProducer's retry-with-refresh loop entirely and propagated as fatal errors. For instance:

Coordinator iceberg-sink-connector-epe-log-topics-v1-0 failed to commit for commit 51a29b27-d1b1-45f4-b6f3-61228b4c8481, will try again next cycle","debug_stacktrace":"org.apache.iceberg.exceptions.BadRequestException: Malformed request: Commit validation failed. Please contact Databricks support for assistance. [ErrorCode: 2010]
	at org.apache.iceberg.rest.ErrorHandlers$DefaultErrorHandler.accept(ErrorHandlers.java:341)
	at org.apache.iceberg.rest.ErrorHandlers$CommitErrorHandler.accept(ErrorHandlers.java:137)
	at org.apache.iceberg.rest.ErrorHandlers$CommitErrorHandler.accept(ErrorHandlers.java:119)
	at org.apache.iceberg.rest.HTTPClient.throwFailure(HTTPClient.java:242)
	at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:347)
	at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:299)
	at org.apache.iceberg.rest.BaseHTTPClient.post(BaseHTTPClient.java:112)
	at org.apache.iceberg.rest.RESTClient.post(RESTClient.java:150)
	at org.apache.iceberg.rest.RESTTableOperations.commit(RESTTableOperations.java:206)
	at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:501)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
	at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:473)
	at org.apache.iceberg.connect.channel.Coordinator.commitToTable(Coordinator.java:286)
	at org.apache.iceberg.connect.channel.Coordinator.lambda$doCommit$1(Coordinator.java:173)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
	at org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:315)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
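
The trace above shows the failure surfacing from inside the task retry machinery as a BadRequestException. A minimal sketch of why that matters, assuming a retry loop that retries only on CommitFailedException (the exception classes here are local stand-ins, and `commitWithRetry` is a hypothetical helper, not Iceberg's `Tasks` API):

```java
// Stand-in types so the sketch compiles without Iceberg on the classpath.
final class RetrySketch {
  static class CommitFailedException extends RuntimeException {
    CommitFailedException(String message) {
      super(message);
    }
  }

  static class BadRequestException extends RuntimeException {
    BadRequestException(String message) {
      super(message);
    }
  }

  // Hypothetical helper: runs `attempt` until it succeeds, refreshing table
  // state between tries. Only CommitFailedException is treated as retryable;
  // any other exception propagates to the caller on the first failure.
  static int commitWithRetry(int maxAttempts, Runnable refresh, Runnable attempt) {
    for (int tries = 1; ; tries++) {
      try {
        attempt.run();
        return tries; // number of attempts it took to succeed
      } catch (CommitFailedException e) {
        if (tries >= maxAttempts) {
          throw e; // retries exhausted
        }
        refresh.run(); // reload metadata before the next attempt
      }
    }
  }

  public static void main(String[] args) {
    int[] calls = {0};
    int[] refreshes = {0};

    // A conflict reported as CommitFailedException is retried after a refresh.
    int tries = commitWithRetry(4, () -> refreshes[0]++, () -> {
      if (++calls[0] < 3) {
        throw new CommitFailedException("conflict");
      }
    });
    System.out.println("succeeded after " + tries + " attempts, " + refreshes[0] + " refreshes");

    // The same conflict reported as BadRequestException is never retried.
    try {
      commitWithRetry(4, () -> {}, () -> {
        throw new BadRequestException("Malformed request: Commit validation failed.");
      });
    } catch (BadRequestException e) {
      System.out.println("fatal on first attempt: " + e.getMessage());
    }
  }
}
```

Under these assumptions, the only difference between a recoverable conflict and a fatal error is which exception type the error handler chooses.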

Add a case 400 check in CommitErrorHandler that recognises the conflict pattern and raises CommitFailedException instead, restoring normal optimistic-concurrency retry behaviour for non-compliant catalogs. Responses that do not match the pattern still fall through to the default 400 handler and raise BadRequestException as before.
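
The mapping described above could look roughly like the following. This is a sketch, not the patch itself: the exception classes are local stand-ins, `toException` and `looksLikeCommitConflict` are hypothetical names, and the case-insensitive substring match on "commit validation failed" is an assumption based on the error message in the log above.

```java
import java.util.Locale;

// Stand-in types so the sketch compiles without Iceberg on the classpath.
final class CommitErrorSketch {
  static class BadRequestException extends RuntimeException {
    BadRequestException(String message) {
      super(message);
    }
  }

  static class CommitFailedException extends RuntimeException {
    CommitFailedException(String message) {
      super(message);
    }
  }

  // Assumed conflict marker, taken from the Databricks error text in the log.
  private static final String CONFLICT_MARKER = "commit validation failed";

  // Hypothetical helper: true when a 400 body looks like a write conflict.
  static boolean looksLikeCommitConflict(String message) {
    return message != null
        && message.toLowerCase(Locale.ROOT).contains(CONFLICT_MARKER);
  }

  // Maps an HTTP status and error message to the exception a commit would raise.
  static RuntimeException toException(int code, String message) {
    switch (code) {
      case 400:
        if (looksLikeCommitConflict(message)) {
          // Non-compliant catalog: treat as a retryable conflict.
          return new CommitFailedException("Commit failed: " + message);
        }
        // Any other 400 keeps the existing behaviour.
        return new BadRequestException("Malformed request: " + message);
      case 409:
        // Spec-compliant conflict response.
        return new CommitFailedException("Commit failed: " + message);
      default:
        return new RuntimeException("Unhandled status " + code + ": " + message);
    }
  }

  public static void main(String[] args) {
    String conflict = "Commit validation failed. Please contact Databricks support"
        + " for assistance. [ErrorCode: 2010]";
    System.out.println(toException(400, conflict).getClass().getSimpleName());
    System.out.println(toException(400, "Unknown field: foo").getClass().getSimpleName());
  }
}
```

Matching on message text is inherently fragile, so keeping the check narrow and leaving every other 400 on the existing path limits the risk of retrying genuinely malformed requests.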

github-actions bot added the core label Jun 1, 2026
martinskeem marked this pull request as draft June 1, 2026 08:10
martinskeem force-pushed the fix/databricks-bug branch from 0f62f38 to 40363fc June 1, 2026 19:34