Skip to content

New thread pool#4635

Merged
mzient merged 30 commits intoNVIDIA:mainfrom
mzient:NewThreadPool
Feb 19, 2026
Merged

New thread pool#4635
mzient merged 30 commits intoNVIDIA:mainfrom
mzient:NewThreadPool

Conversation

@mzient
Copy link
Contributor

@mzient mzient commented Feb 2, 2023

Category:

New feature (non-breaking change which adds functionality)

Description:

This PR introduces a new thread pool object that will eventually replace the current thread pool.
This thread pool separates the definition of work (called a Job) from the runner threads, allowing multiple concurrent jobs to share one pool of threads.
There are two types of job:
Job and IncrementalJob.
The former must be fully defined before it is submitted to the thread pool. The tasks in this kind of job have priorities.
The latter can be submitted in parts. This precludes sorting (once submitted to the thread pool, the tasks are processed in FIFO order).
The thread pool allows for some overlap between jobs, although when a job is submitted, all of its task will be picked up for execution before any tasks from any subsequent job.
After submitting a job, the job owner has to wait for the job to complete.

The new thread pool is reentrant - it is legal to submit and wait for a sub-job from inside a thread pool task. This scenario is detected automatically ant the Wait function cooperatively executes the tasks from the thread pool until the inner job completes.

Additional information:

The thread pool uses a plain std::queue for task storage and a counting semaphore (POSIX) and a mutex for synchronization.
The job uses an atomic counter of outstanding tasks and atomic_wait to wait for completion.

Affected modules and functionalities:

None. This is entirely new code that doesn't affect any existing functionality yet.

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-864

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [7183171]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [7183171]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [7189340]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [7189340]: BUILD PASSED

@mzient mzient force-pushed the NewThreadPool branch 5 times, most recently from f0d76a6 to d74ec03 Compare February 9, 2023 12:49
@mzient mzient force-pushed the NewThreadPool branch 2 times, most recently from a181cde to 70deb50 Compare February 27, 2023 12:55
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@mzient mzient force-pushed the NewThreadPool branch 2 times, most recently from 466d38d to e053ff7 Compare August 24, 2023 13:13
@mzient mzient force-pushed the NewThreadPool branch 3 times, most recently from 86c7337 to f2a437f Compare December 17, 2025 07:46
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [41172120]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [41172120]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [41183936]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [41184260]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [41187088]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [41187088]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [41571534]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [41571534]: BUILD FAILED

mzient and others added 22 commits January 29, 2026 21:46
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
… a job object, reexposing the legacy thread pool interface.

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
…le checks.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +61 to +62
(void)std::lock_guard(mtx_);
cv_.notify_all();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unusual memory ordering pattern - consider adding a comment

This intentional lock/unlock pattern for happens-before relationship is uncommon. Add a comment:

Suggested change
(void)std::lock_guard(mtx_);
cv_.notify_all();
// Acquire and immediately release the mutex to establish happens-before for reentrant wait
(void)std::lock_guard(mtx_);

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +25 to +26
while (running_)
std::this_thread::yield();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

busy-wait in destructor is CPU-intensive

The destructor spins while running_ is true. For long-running tasks, this burns CPU. Consider using a timed wait or documenting expected task duration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@greptileai This never waits for more than the time between the notification of the atomic variable num_pending_tasks_ and setting the running_flag_ in DoNotify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely correct - the busy-wait in the destructor only spins for the extremely brief window between the atomic notification of num_pending_tasks_ and the setting of running_ = false at the very end of DoNotify().

Looking at the implementation, this design is actually quite clever. The running_ flag serves as a secondary synchronization mechanism to avoid a race condition where the destructor could be called right after num_pending_tasks_ is decremented but before all the notifications are sent.

The busy-wait duration is bounded to just a few CPU cycles - the time it takes to execute the notification calls and the final running_ = false assignment. This makes the spin time predictable and minimal, so the CPU overhead is negligible compared to the complexity of adding additional synchronization primitives.

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [42890996]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [42890996]: BUILD PASSED

@mzient mzient merged commit d741bef into NVIDIA:main Feb 19, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments