Skip to content

Fix data race in POOL_create_advanced by protecting threadLimit#4558

Closed
richardsonnick wants to merge 1 commit intofacebook:devfrom
richardsonnick:threadpool-database-fix
Closed

Fix data race in POOL_create_advanced by protecting threadLimit#4558
richardsonnick wants to merge 1 commit intofacebook:devfrom
richardsonnick:threadpool-database-fix

Conversation

@richardsonnick
Copy link
Copy Markdown
Contributor

@richardsonnick richardsonnick commented Dec 23, 2025

Fixes a potential data race in POOL_create_advanced where ctx->threadLimit was being updated without holding the queueMutex. Worker threads could access threadLimit before it was fully initialized, leading to undefined behavior.

The update to ctx->threadLimit is now protected by the ctx->queueMutex to ensure thread safety during initialization.

ctx->threadCapacity remains unprotected, as it is only modified by the parent thread during setup.

Fix for #4547

Fixes a potential data race in `POOL_create_advanced` where `ctx->threadLimit` was being updated without holding the `queueMutex`. Worker threads (`POOL_thread`) could access `threadLimit` before it was fully initialized, leading to undefined behavior.

The update to `ctx->threadLimit` is now mutex-protected to ensure thread safety during initialization.

`ctx->threadCapacity` remains unprotected, as it is only modified by the parent thread during setup.
@meta-cla
Copy link
Copy Markdown

meta-cla bot commented Dec 23, 2025

Hi @richardsonnick!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@meta-cla
Copy link
Copy Markdown

meta-cla bot commented Dec 23, 2025

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

}
ctx->threadCapacity = numThreads;

ZSTD_pthread_mutex_lock(&ctx->queueMutex);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holding the lock around the set of threadLimit wouldn't prevent possible RC at all, because the threads going created before the main thread would enter the lock here...
Let alone such set is nearby to atomic operation, so it is strange to try to protect it (alone) with a mutex.
Not to mention that there is basically no RC, since the ctx (created here) isn't used somewhere else yet, excepting the workers in pool, where the first of them would hold the lock in:

ZSTD_pthread_mutex_lock(&ctx->queueMutex);

and then enter the conditional wait, because of ctx->queueEmpty, no matter what ctx->threadLimit is, because by || (or) the value of (ctx->numThreadsBusy >= ctx->threadLimit) is irrelevant:

zstd/lib/common/pool.c

Lines 74 to 75 in ae9f20c

while ( ctx->queueEmpty
|| (ctx->numThreadsBusy >= ctx->threadLimit) ) {

So this lock would not necessarily change something, still worse it may cause a typical deadlock, if the first worker thread would manage to enter the lock, and the main also trying to do that, so nobody would be able to put something to the queue and notify conditional wait to wake the worker up (which could also unlock the mutex for this thread).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this could cause a deadlock. But, I agree that it is unnecessary.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this could cause a deadlock

But it can, take a closer look at that scenario:

  1. pool master thread creates a 1st-worker, which enters the POOL_thread and then in line 72 it locks the mutex.
  2. pool master thread creates a 2nd- ... Nth-workers, and the longer it takes (and the more context switches happens) the more likely the 1st-worker manages to enter the aforementioned lock.
  3. before leave the POOL_create_advanced the pool master thread also trying to lock the mutex, but it is locked by 1st-worker (which is waiting for the queue), so the master never leave POOL_create_advanced and ctx remains unknown for whole program, and nobody can add something to the queue to wake up the 1st-worker. It is typical deadlock.

Just for the record.

Copy link
Copy Markdown
Contributor

@terrelln terrelln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @richardsonnick! I'm going to close because PR #4601 supercedes.

}
ctx->threadCapacity = numThreads;

ZSTD_pthread_mutex_lock(&ctx->queueMutex);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this could cause a deadlock. But, I agree that it is unnecessary.

@terrelln terrelln closed this Mar 2, 2026
terrelln pushed a commit that referenced this pull request Mar 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants