Fix data race in POOL_create_advanced by protecting threadLimit#4558
Fix data race in POOL_create_advanced by protecting threadLimit#4558richardsonnick wants to merge 1 commit intofacebook:devfrom
Conversation
Fixes a potential data race in `POOL_create_advanced` where `ctx->threadLimit` was being updated without holding the `queueMutex`. Worker threads (`POOL_thread`) could access `threadLimit` before it was fully initialized, leading to undefined behavior. The update to `ctx->threadLimit` is now mutex-protected to ensure thread safety during initialization. `ctx->threadCapacity` remains unprotected, as it is only modified by the parent thread during setup.
|
Hi @richardsonnick! Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention. You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted. ProcessIn order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks! |
|
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks! |
| } | ||
| ctx->threadCapacity = numThreads; | ||
|
|
||
| ZSTD_pthread_mutex_lock(&ctx->queueMutex); |
There was a problem hiding this comment.
Holding the lock around the set of threadLimit wouldn't prevent possible RC at all, because the threads going created before the main thread would enter the lock here...
Let alone such set is nearby to atomic operation, so it is strange to try to protect it (alone) with a mutex.
Not to mention that there is basically no RC, since the ctx (created here) isn't used somewhere else yet, excepting the workers in pool, where the first of them would hold the lock in:
Line 72 in ae9f20c
and then enter the conditional wait, because of ctx->queueEmpty, no matter what ctx->threadLimit is, because by || (or) the value of (ctx->numThreadsBusy >= ctx->threadLimit) is irrelevant:
Lines 74 to 75 in ae9f20c
So this lock would not necessarily change something, still worse it may cause a typical deadlock, if the first worker thread would manage to enter the lock, and the main also trying to do that, so nobody would be able to put something to the queue and notify conditional wait to wake the worker up (which could also unlock the mutex for this thread).
There was a problem hiding this comment.
I don't think this could cause a deadlock. But, I agree that it is unnecessary.
There was a problem hiding this comment.
I don't think this could cause a deadlock
But it can, take a closer look at that scenario:
- pool master thread creates a 1st-worker, which enters the
POOL_threadand then in line 72 it locks the mutex. - pool master thread creates a 2nd- ... Nth-workers, and the longer it takes (and the more context switches happens) the more likely the 1st-worker manages to enter the aforementioned lock.
- before leave the
POOL_create_advancedthe pool master thread also trying to lock the mutex, but it is locked by 1st-worker (which is waiting for the queue), so the master never leavePOOL_create_advancedandctxremains unknown for whole program, and nobody can add something to the queue to wake up the 1st-worker. It is typical deadlock.
Just for the record.
terrelln
left a comment
There was a problem hiding this comment.
Thanks for the PR @richardsonnick! I'm going to close because PR #4601 supercedes.
| } | ||
| ctx->threadCapacity = numThreads; | ||
|
|
||
| ZSTD_pthread_mutex_lock(&ctx->queueMutex); |
There was a problem hiding this comment.
I don't think this could cause a deadlock. But, I agree that it is unnecessary.
Fixes a potential data race in
POOL_create_advancedwherectx->threadLimitwas being updated without holding thequeueMutex. Worker threads could accessthreadLimitbefore it was fully initialized, leading to undefined behavior.The update to
ctx->threadLimitis now protected by thectx->queueMutexto ensure thread safety during initialization.ctx->threadCapacityremains unprotected, as it is only modified by the parent thread during setup.Fix for #4547