-
-
Notifications
You must be signed in to change notification settings - Fork 2k
MDEV-39006: Galera test failure on galera_3nodes.galera_garbd_backup #5018
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| SET GLOBAL innodb_max_dirty_pages_pct=0.0; | ||
| SET GLOBAL innodb_max_dirty_pages_pct_lwm=0.0; | ||
| SET GLOBAL innodb_max_dirty_pages_pct=99; | ||
| SET GLOBAL innodb_max_dirty_pages_pct_lwm=99; | ||
| CREATE TABLE t1 (f1 INTEGER, f2 VARCHAR(1024)) ENGINE=InnoDB; | ||
| CREATE TABLE ten (f1 INTEGER) ENGINE=InnoDB; | ||
| INSERT INTO ten VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10); | ||
| INSERT INTO t1 (f2) SELECT REPEAT('x', 1024) FROM ten AS a1, ten AS a2, ten AS a3, ten AS a4; | ||
| SET GLOBAL debug_dbug = "+d,sst_disable_writes_now,sst_force_lsn_advance"; | ||
| SET GLOBAL innodb_max_dirty_pages_pct_lwm=99; | ||
| SET GLOBAL debug_dbug = ""; | ||
| SET GLOBAL innodb_max_dirty_pages_pct_lwm=0.0; | ||
| SET GLOBAL innodb_max_dirty_pages_pct=0.0; | ||
| SET GLOBAL innodb_max_dirty_pages_pct=99; | ||
| SET GLOBAL debug_dbug = "+d,sst_enable_writes_now"; | ||
| SET GLOBAL innodb_max_dirty_pages_pct_lwm=99; | ||
| SET GLOBAL debug_dbug = ""; | ||
| DROP TABLE t1; | ||
| DROP TABLE ten; | ||
| Warnings: | ||
| Warning 1210 innodb_max_dirty_pages_pct cannot be set lower than innodb_max_dirty_pages_pct_lwm. | ||
| Warning 1210 Lowering innodb_max_dirty_page_pct_lwm to 90.000000 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,99 @@ | ||
| # | ||
| # MDEV-39006: simple non-Galera reproducer for the InnoDB page-cleaner | ||
| # write leak after ha_disable_internal_writes(true). | ||
| # | ||
| # This test bypasses Galera entirely. It drives sst_disable_innodb_writes() | ||
| # / sst_enable_innodb_writes() through DBUG hooks (sst_disable_writes_now / | ||
| # sst_enable_writes_now) installed at the bottom of | ||
| # innodb_max_dirty_pages_pct_lwm_update(). The same setter wakes the page | ||
| # cleaner, so each `SET GLOBAL innodb_max_dirty_pages_pct_lwm = ...` is the | ||
| # single trigger for both the test action and the cleaner wake-up. | ||
| # | ||
| # The sst_force_lsn_advance DBUG injection (inside sst_disable_innodb_writes) | ||
| # deterministically advances LSN past the log_checkpoint_low() early-return | ||
| # threshold so the page cleaner's idle-path call has work to do. | ||
| # | ||
| # Without the wsrep_sst_disable_writes gate in | ||
| # storage/innobase/buf/buf0flu.cc:buf_flush_page_cleaner(), the page | ||
| # cleaner's idle-path log_checkpoint_low() trips | ||
| # ut_ad(!recv_no_log_write) (debug) or appends a fresh FILE_CHECKPOINT | ||
| # record to ib_logfile0 (release). With the gate, the cleaner skips | ||
| # both and the snapshot stays byte-identical. | ||
| # | ||
|
|
||
| --source include/have_innodb.inc | ||
| --source include/have_debug.inc | ||
| --source include/have_wsrep.inc | ||
|
|
||
| --let $datadir= `SELECT @@datadir` | ||
| --let $innodb_max_dirty_pages_pct = `SELECT @@innodb_max_dirty_pages_pct` | ||
| --let $innodb_max_dirty_pages_pct_lwm = `SELECT @@innodb_max_dirty_pages_pct_lwm` | ||
|
|
||
| # Drain any baseline dirty pages first, so the buffer pool is clean | ||
| # before we accumulate test dirty pages. | ||
| SET GLOBAL innodb_max_dirty_pages_pct=0.0; | ||
| SET GLOBAL innodb_max_dirty_pages_pct_lwm=0.0; | ||
| --let $wait_condition = SELECT variable_value = 0 FROM information_schema.global_status WHERE variable_name = 'INNODB_BUFFER_POOL_PAGES_DIRTY' | ||
| --source include/wait_condition.inc | ||
|
|
||
| # Raise the threshold so the page cleaner does not flush proactively | ||
| # while we build up dirty pages. | ||
| SET GLOBAL innodb_max_dirty_pages_pct=99; | ||
| SET GLOBAL innodb_max_dirty_pages_pct_lwm=99; | ||
|
|
||
| CREATE TABLE t1 (f1 INTEGER, f2 VARCHAR(1024)) ENGINE=InnoDB; | ||
| CREATE TABLE ten (f1 INTEGER) ENGINE=InnoDB; | ||
| INSERT INTO ten VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10); | ||
| INSERT INTO t1 (f2) SELECT REPEAT('x', 1024) FROM ten AS a1, ten AS a2, ten AS a3, ten AS a4; | ||
|
|
||
| # Arm two DBUG hooks on innodb_max_dirty_pages_pct_lwm_update(): | ||
| # sst_disable_writes_now -> calls ha_disable_internal_writes(true) | ||
| # sst_force_lsn_advance -> writes one FILE_MODIFY redo record from | ||
| # inside sst_disable_innodb_writes() so LSN | ||
| # moves past the log_checkpoint_low() | ||
| # early-return threshold. | ||
| SET GLOBAL debug_dbug = "+d,sst_disable_writes_now,sst_force_lsn_advance"; | ||
|
|
||
| # Trigger the disable hook. Setting lwm to 99 (its current value) is enough | ||
| # to invoke innodb_max_dirty_pages_pct_lwm_update(), which fires the DBUG | ||
| # block and drives ha_disable_internal_writes(true) -> sst_disable_innodb_writes | ||
| # (where sst_force_lsn_advance also fires). | ||
| SET GLOBAL innodb_max_dirty_pages_pct_lwm=99; | ||
|
|
||
| SET GLOBAL debug_dbug = ""; | ||
|
|
||
| # Snapshot of datadir BEFORE waking the page cleaner. Excludes transient | ||
| # SST/PID artefacts so the test isolates engine-level writes. | ||
| --exec find $datadir -type f ! -name 'tables_flushed' ! -name 'backup_sst_complete' ! -name 'wsrep_sst*.pid' ! -name 'wsrep_sst*.log' ! -name 'stunnel.conf' -exec md5sum {} \; 2>/dev/null | sort -u | md5sum > $MYSQLTEST_VARDIR/tmp/innodb_before | ||
|
|
||
| # Wake the page cleaner. With the production fix in | ||
| # buf_flush_page_cleaner() this is a no-op; without it the cleaner | ||
| # enters log_checkpoint_low() past the early-return because the | ||
| # injection bumped LSN, then either trips ut_ad(!recv_no_log_write) | ||
| # (debug) or writes FILE_CHECKPOINT to ib_logfile0 (release). | ||
| SET GLOBAL innodb_max_dirty_pages_pct_lwm=0.0; | ||
| SET GLOBAL innodb_max_dirty_pages_pct=0.0; | ||
| --sleep 2 | ||
|
|
||
| # Snapshot of datadir AFTER the page cleaner has had a chance to run. | ||
| --exec find $datadir -type f ! -name 'tables_flushed' ! -name 'backup_sst_complete' ! -name 'wsrep_sst*.pid' ! -name 'wsrep_sst*.log' ! -name 'stunnel.conf' -exec md5sum {} \; 2>/dev/null | sort -u | md5sum > $MYSQLTEST_VARDIR/tmp/innodb_after | ||
|
|
||
| # The two snapshots must be identical: no datadir writes while | ||
| # wsrep_sst_disable_writes is set. | ||
| --diff_files $MYSQLTEST_VARDIR/tmp/innodb_before $MYSQLTEST_VARDIR/tmp/innodb_after | ||
|
|
||
| # Re-enable internal writes via the matching DBUG hook. Clears | ||
| # recv_no_log_write and resumes normal page-cleaner operation. | ||
| # (raise pct first so the lwm setter does not warn about lwm > pct.) | ||
| SET GLOBAL innodb_max_dirty_pages_pct=99; | ||
| SET GLOBAL debug_dbug = "+d,sst_enable_writes_now"; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we clear dbug_dbug before setting max_dirty_pages* values? |
||
| SET GLOBAL innodb_max_dirty_pages_pct_lwm=99; | ||
| SET GLOBAL debug_dbug = ""; | ||
|
|
||
| DROP TABLE t1; | ||
| DROP TABLE ten; | ||
|
|
||
| --disable_query_log | ||
| --eval SET GLOBAL innodb_max_dirty_pages_pct = $innodb_max_dirty_pages_pct | ||
| --eval SET GLOBAL innodb_max_dirty_pages_pct_lwm = $innodb_max_dirty_pages_pct_lwm | ||
| --enable_query_log | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1798,10 +1798,29 @@ static void sst_disable_innodb_writes() | |
| buf_flush_page_cleaner(). Let us prevent that by invoking another | ||
| checkpoint (which will write the FILE_CHECKPOINT record). */ | ||
| log_make_checkpoint(); | ||
|
|
||
| bool skip_verify_checkpoint= false; | ||
| DBUG_EXECUTE_IF("sst_force_lsn_advance", | ||
| { | ||
| /* Test-only: write a single FILE_MODIFY redo record so that | ||
| log_sys.get_lsn() advances past | ||
| last_checkpoint_lsn + SIZE_OF_FILE_CHECKPOINT + 8*is_encrypted(). | ||
| A subsequent buf_flush_page_cleaner() wake-up will then enter | ||
| log_checkpoint_low() past its early-return and, absent the | ||
| wsrep_sst_disable_writes gate in buf0flu.cc, fire the | ||
| recv_no_log_write tripwire (debug) or write a FILE_CHECKPOINT | ||
| record to ib_logfile0 (release). With the gate in place this | ||
| extra LSN gets checkpointed past once sst_enable_innodb_writes() | ||
| runs, so it does not survive past the SST window. */ | ||
| skip_verify_checkpoint= true; | ||
| debug_advance_lsn_via_file_modify(); | ||
| }); | ||
|
Comment on lines
+1803
to
+1817
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This causes a compilation failure on many debug builders, because the low-level member function Is there really no other way to trigger this scenario? Could we write a dummy
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. missed that as it run successfully on local.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So this piece of code can be removed? |
||
|
|
||
| ut_d(recv_no_log_write= true); | ||
| /* If this were not a no-op, an assertion would fail due to | ||
| recv_no_log_write. */ | ||
| ut_d(log_make_checkpoint()); | ||
| if (!skip_verify_checkpoint) | ||
| ut_d(log_make_checkpoint()); | ||
| } | ||
|
|
||
| static void sst_enable_innodb_writes() | ||
|
|
@@ -17502,6 +17521,15 @@ innodb_max_dirty_pages_pct_lwm_update( | |
| mysql_mutex_lock(&buf_pool.flush_list_mutex); | ||
| buf_pool.page_cleaner_wakeup(); | ||
| mysql_mutex_unlock(&buf_pool.flush_list_mutex); | ||
|
|
||
| #ifdef WITH_WSREP | ||
| /* To drive ha_disable_internal_writes() through this setter. */ | ||
| DBUG_EXECUTE_IF("sst_disable_writes_now", | ||
| ha_disable_internal_writes(true);); | ||
| DBUG_EXECUTE_IF("sst_enable_writes_now", | ||
| ha_disable_internal_writes(false);); | ||
| #endif /* WITH_WSREP */ | ||
|
|
||
| mysql_mutex_lock(&LOCK_global_system_variables); | ||
| } | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any other way to see that page cleaner had a change to run? This introduces a race, page cleaner thread might not get change to run during this 2 second wait leading to fact that this test is not testing intended behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The page cleaner is a background thread spawned via std::thread, with no THD and no current_thd, it cannot host a per-THD DEBUG_SYNC action. We can increase sleep time to 5 second.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no counter available for dirty_pages that we could query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no existing status variable that increments per page-cleaner idle iteration. Innodb_buffer_pool_pages_dirty is already 0 throughout the SST window and never changes; pages_flushed, dblwr_writes, MONITOR_FLUSH_BACKGROUND_COUNT only increment on actual flush work, which the cleaner's set_almost_idle: block doesn't do.
The cleaner does flip buf_pool.page_cleaner_idle() right before reaching set_almost_idle:. That predicate is exactly what we'd want to observe - it goes false on page_cleaner_wakeup() and true again once the cleaner has fully completed an iteration. But it's not exposed as a status variable, sysvar, or INFORMATION_SCHEMA row anywhere. Adding it as a status variable is the cleanest fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hemantdangi-gc I do not think you need any new test cases and any of this debug injection. Current test cases already cover code path effected. See e.g. https://es-jenkins.mariadb.net/job/Test-Package//725056/artifact/test-MTR-custom.log/*view*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that current test covers issue and new test which is just forced injection can be removed.
But from where you got https://es-jenkins.mariadb.net/job/Test-Package//725056/artifact/test-MTR-custom.log/*view. Is this link from MDEV-3006 jenkins run? The link you provided is build on revision: 363280c179e6c8f7a62405b69121c276b7b3dfe3 but the current patch is on revision: "7106d8d72144117f71ef5feac56374207aab0de9".