Skip to content

{182998236}: dispatch retry sql to pool#5873

Open
emelialei88 wants to merge 1 commit intobloomberg:mainfrom
emelialei88:feat/retry-pool
Open

{182998236}: dispatch retry sql to pool#5873
emelialei88 wants to merge 1 commit intobloomberg:mainfrom
emelialei88:feat/retry-pool

Conversation

@emelialei88
Copy link
Copy Markdown
Contributor

@emelialei88 emelialei88 commented Apr 16, 2026

When a verified retry happens, what happens currently is that the all the statements within this transaction will rerun in the COMMIT SQL thread. This will cause the thread to block and unfair for the other normal transactions. In this PR, we schedule each statement into its own SQL thread.

When a retry happens, we call srs_tran_replay_prepare to do some preparations like saving the original done_cb and assign the new done_cb to be srs_tran_replay_async. Then for each retry, srs_tran_replay_begin is called to set some states and dispatch the first statement. If srs_tran_replay_prepare returns without failure, we will return error code RC_INTERNAL_RETRY so that the callers will return the thread without any cleanup.

At the end of execution, srs_tran_replay_async will be called. It will either dispatch the next statement, start another retry if this one fails, handle error or successfully return. For the last two cases we restore the done_cb before returning.

@emelialei88 emelialei88 force-pushed the feat/retry-pool branch 2 times, most recently from 0a230d5 to 221983a Compare April 16, 2026 17:17
Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
selectv_rcode_serialretry_generated [failed with core dumped]
logfill [db unavailable at finish] **quarantined**
consumer_non_atomic_default_consumer_generated **quarantined**
api_tst
mem_tracker

@emelialei88 emelialei88 force-pushed the feat/retry-pool branch 3 times, most recently from ad3a581 to f6fb6d5 Compare April 17, 2026 19:04
Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
comdb2sys_queueodh_generated [db unavailable at finish]
truncatesc_offline_generated **quarantined**
sc_resume_logicalsc_generated **quarantined**
consumer_non_atomic_default_consumer_generated **quarantined**
replay_trans
mem_tracker

Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
truncatesc_offline_generated **quarantined**
consumer_non_atomic_default_consumer_generated **quarantined**
replay_trans
mem_tracker
sc_truncate_lockorder_generated [timeout] **quarantined**
reco-ddlk-sql [timeout] **quarantined**

@emelialei88 emelialei88 force-pushed the feat/retry-pool branch 2 times, most recently from 3e7966d to 2412e0c Compare April 27, 2026 15:42
Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Error ⚠.
Regression testing: Success ✓.

The first 10 failing tests are:
sc_truncate_multiddl_generated [db unavailable at finish] **quarantined**
consumer_non_atomic_default_consumer_generated **quarantined**
remsql_locks_rte_connect_generated **quarantined**
remsql_locks **quarantined**
mem_tracker
reco-ddlk-sql [timeout] **quarantined**

Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
logfill [db unavailable at finish] **quarantined**
scindex
sc_resume_logicalsc_generated **quarantined**
consumer_non_atomic_default_consumer_generated **quarantined**
remsql_locks_rte_connect_generated **quarantined**
remsql_locks **quarantined**
mem_tracker
reco-ddlk-sql [timeout] **quarantined**

@emelialei88 emelialei88 marked this pull request as ready for review April 27, 2026 20:03
@mohitkhullar
Copy link
Copy Markdown
Contributor

failing mem_tracker test

Comment thread net/sqlwriter.c
if (done_cb_evbuffer(clnt) != 0) {
return -1;
int rc = done_cb_evbuffer(clnt);
if (rc != 0) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handle rc RC_INTERNAL_RETRY here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is handled in newsql_done_cb by returning without doing anything. I still think the failure could be a timing issue but I'm looking into it more.

Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
sc_resume_logicalsc_generated **quarantined**
sc_resume
cldeadlock
consumer_non_atomic_default_consumer_generated **quarantined**
remsql_locks_rte_connect_generated **quarantined**
remsql_locks **quarantined**
mem_tracker

Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Error ⚠.
Regression testing: Success ✓.

The first 10 failing tests are:
scindex_logicalsc_generated
sc_resume_logicalsc_generated **quarantined**
noresetgen
consumer_non_atomic_default_consumer_generated **quarantined**
remsql_locks_rte_connect_generated **quarantined**
remsql_locks **quarantined**
manual_partition
skipscan **quarantined**
reco-ddlk-sql [timeout] **quarantined**

@akshatsikarwar
Copy link
Copy Markdown
Contributor

Please check robomark failures
cdb2test Apr 29 14:59:11 2026 failed feat/retry-pool.R20260429.6

@emelialei88 emelialei88 force-pushed the feat/retry-pool branch 2 times, most recently from e31f69e to fb958db Compare April 30, 2026 18:29
Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
sc_redo [failed with core dumped]
logfill [db unavailable at finish] **quarantined**
consumer_non_atomic_default_consumer_generated **quarantined**
remsql_locks_rte_connect_generated **quarantined**
remsql_locks **quarantined**
longreq_stats
truncatesc_offline_generated [timeout] **quarantined**
reco-ddlk-sql [timeout] **quarantined**

Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
sc_truncate [db unavailable at finish]
sc_resume_logicalsc_generated **quarantined**
reco-ddlk-sql **quarantined**
consumer_non_atomic_default_consumer_generated **quarantined**
remsql_locks_rte_connect_generated **quarantined**
remsql_locks **quarantined**

@emelialei88
Copy link
Copy Markdown
Contributor Author

Please check robomark failures cdb2test Apr 29 14:59:11 2026 failed feat/retry-pool.R20260429.6

I think it was a github clone issue and the most recently robomark run is fine.

@akshatsikarwar
Copy link
Copy Markdown
Contributor

I think it was a github clone issue and the most recently robomark run is fine.

It's still failing a few tests..
cdb2test Apr 30 18:05:43 2026 failed feat/retry-pool.R20260430.12

Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Error ⚠.
Regression testing: Success ✓.

The first 10 failing tests are:
sc_truncate [db unavailable at finish]
consumer_non_atomic_default_consumer_generated **quarantined**
reco-ddlk-sql [timeout] **quarantined**

Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
sc_resume [db unavailable at finish]
consumer_non_atomic_default_consumer_generated **quarantined**
remtran_origin
reco-ddlk-sql [timeout] **quarantined**

Signed-off-by: Emelia Lei <wlei29@bloomberg.net>
Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Error ⚠.
Regression testing: Success ✓.

The first 10 failing tests are:
logfill [db unavailable at finish] **quarantined**
scindex_logicalsc_generated
sc_resume
consumer_non_atomic_default_consumer_generated **quarantined**
sc_downgrade [timeout] **quarantined**
truncatesc_offline_generated [timeout] **quarantined**

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants