Skip to content

Always enable check hb on blocked write for chunk transactions#5864

Open
chands10 wants to merge 1 commit intobloomberg:mainfrom
chands10:chunk_hb
Open

Always enable check hb on blocked write for chunk transactions#5864
chands10 wants to merge 1 commit intobloomberg:mainfrom
chands10:chunk_hb

Conversation

@chands10
Copy link
Copy Markdown
Contributor

To help us review your pull request, please consider providing an overview of the following:

  • What is the type of the change (bug fix, feature, documentation and etc.) ?
  • What are the current behavior and expected behavior, if this is a bugfix ?
  • What are the steps required to reproduce the bug, if this is a bugfix ?
  • What is the current behavior and new behavior, if this is a feature change or enhancement ?
  • [Optional] Why is the new behavior better than the current behavior, if this is a feature change ?

@chands10 chands10 changed the title Always enable check hn on blocked write for chunk transactions Always enable check hb on blocked write for chunk transactions Apr 10, 2026
Comment thread cdb2api/cdb2api.c Outdated
Comment thread cdb2api/cdb2api.c Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this timing guess is not robust. can we get cdb2buf_flush to indicate that failure was due to timeout?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when cdb2buf_flush returns -1 currently, it either means that sb is NULL, or that it was a timeout error. Can I just check that? Or can add new bool param to cdb2buf_flush *timeout_error

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added commit with new param to explicitly set it

@chands10 chands10 force-pushed the chunk_hb branch 2 times, most recently from e2e3403 to 7e68096 Compare April 10, 2026 18:50
Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
sc_resume_logicalsc_generated **quarantined**
consumer_non_atomic_default_consumer_generated **quarantined**
sc_truncate_lockorder_generated [timeout] **quarantined**
incoh_remsql [timeout] **quarantined**

Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
scindex_logicalsc_generated
sc_resume_logicalsc_generated **quarantined**
queuedb_rollover **quarantined**
cdb2api_chunk
consumer_non_atomic_default_consumer_generated **quarantined**
incoh_remsql [timeout] **quarantined**

Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
cdb2api_chunk
consumer_non_atomic_default_consumer_generated **quarantined**
reco-ddlk-sql [timeout] **quarantined**
incoh_remsql [timeout] **quarantined**

Comment thread bbinc/comdb2buf.h Outdated
int CDB2BUF_FUNC(cdb2buf_flush)(COMDB2BUF *sb);
/* flush output. returns # of bytes written or <0 for error. If timeout_error is not NULL, it will be set to 1 if a
* timeout occurred */
int CDB2BUF_FUNC(cdb2buf_flush)(COMDB2BUF *sb, int *timeout_error);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduce cdb2_flush_chk_timeout (or another suitable name), so all cdb2buf_flush calls don't have to be updated?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

########################################
## RUN WITHOUT FIX EXPLICITLY ENABLED ##
########################################
echo 'comdb2_config:check_hb_on_blocked_write=0' >> ${cfg}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this means it's impossible to disable this for chunks

Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
sc_truncate [db unavailable at finish]
consumer_non_atomic_default_consumer_generated **quarantined**
api_tst
alias
cdb2api_chunk [timeout]
incoh_remsql [timeout] **quarantined**

Comment thread cdb2api/cdb2api.c Outdated
int64_t end_us = end.tv_sec * 1000000 + end.tv_usec;
int64_t start_us = start.tv_sec * 1000000 + start.tv_usec;
int64_t elapsed_us = end_us - start_us;
if (!is_begin && !hndl->is_read && hndl->in_trans && hndl->socket_timeout &&
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how we haven't gotten seg faults here before because hndl could be NULL

@chands10
Copy link
Copy Markdown
Contributor Author

Maybe let's wait making more changes till after we get the first cdb2api merged rollout in

Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
truncatesc_offline_generated **quarantined**
sp_snapshot_generated
reco-ddlk-sql **quarantined**
consumer_non_atomic_default_consumer_generated **quarantined**
api_tst
dc
incoh_remsql [timeout] **quarantined**

@chands10 chands10 marked this pull request as draft April 14, 2026 19:59
Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
truncatesc_offline_generated **quarantined**
sc_resume_logicalsc_generated **quarantined**
sp_snapshot_generated
consumer_non_atomic_default_consumer_generated **quarantined**
api_tst
reco-ddlk-sql [timeout] **quarantined**

Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Success ✓.
Regression testing: Success ✓.

Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Error ⚠.
Regression testing: Success ✓.

@chands10 chands10 marked this pull request as ready for review April 22, 2026 14:37
Signed-off-by: Salil Chandra <schandra107@bloomberg.net>
Copy link
Copy Markdown

@roborivers roborivers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cbuild submission: Success ✓.
Regression testing: Success ✓.

The first 10 failing tests are:
sc_timepart [failed with core dumped] **quarantined**
sc_timepart_multiddl_generated
analyze **quarantined**
consumer_non_atomic_default_consumer_generated **quarantined**
tunables

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants