feat: support broadcast with OpenMPI backend implementation#11
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1a129ea4e4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| CHECK_STATUS(Rt, Rt::Memcpy(host_buf, send_buff, total_bytes, | ||
| Rt::MemcpyDeviceToHost)); | ||
| CHECK_STATUS(Rt, Rt::StreamSynchronize(static_cast<Rt::Stream>(stream))); |
There was a problem hiding this comment.
Prevent root-only early exit before collective
In the root branch, CHECK_STATUS can return immediately when the D2H copy or stream sync fails, but non-root ranks still proceed to MPI_Bcast; this mismatched collective participation can hang the whole communicator. This shows up whenever root hits a runtime error (e.g., bad device pointer or prior sticky CUDA error) while other ranks are healthy.
Useful? React with 👍 / 👎.
| CHECK_STATUS(Rt, Rt::Memcpy(recv_buff, host_buf, total_bytes, | ||
| Rt::MemcpyHostToDevice)); |
There was a problem hiding this comment.
…ram instead of a test program - refactor `examples/broadcast.cc` - fix the formatting issues in `src/ompi/impl/broadcast.h`
- support `infiniBcast()` which in some CCLs are labeled as deprecated, but here it is simply an alias for `infiniBroadcast()` - update `examples/broadcast.cc` to also illustrate and test out `infiniBcast()`
1a129ea to
9d190e2
Compare
…e in `examples/broadcast.cc`
Summary
This PR adds two new collective communication operators:
infiniBroadcastandinfiniBcast()along with its OpenMPI backend implementation, plus the corresponding example programexamples/broadcast.cc.Public API
Parameter order matches
ncclBroadcastexactly, per the C++ rule inCONTRIBUTING.md("For functions that correspond to NCCL, parameter order must match the NCCL interface exactly").Changes
Broadcast Operator Support
Broadcastbase class insrc/base/broadcast.hwithparameter validation (root range, null pointer,
count = 0short-circuit);
src/ompi/impl/broadcast.h,using a host-staging path (consistent with the existing
AllReduceimplementation);
INT_MAXbytes and callMPI_Bcastrepeatedly,so that arbitrarily large
countvalues are supported;sendbuff == recvbuff) and out-of-placecalling conventions;
nullptrassendbuff(this convention is exercised by the test suite).Bcastwhich corresponding to the deprecated API in NCCL, but here it is simply an in-place alias forBroadcast.Broadcast Example
examples/broadcast.cccovering 3 scenarios:infiniBroadcast()infiniBroadcast()infiniBcast()Known Issues & Future Work
INT_MAX-byte chunking pattern currently only lives inBroadcast.AllReducecould benefit from the same treatment in the future.Broadcast::Execute's argument-validation style is kept consistent withAllReduce::Execute— invalid input returnskInvalidArgumentand emits aLOGline. The "invalid root" test case intentionally triggers these log lines, which is expected output, not a failure.Test Environment & Results
all_gather.log
all_reduce.log
reduce_scatter.log
broadcast.log