feat: support `ReduceScatter` with OpenMPI backend implementation by halfman510 · Pull Request #12 · InfiniTensor/InfiniCCL

halfman510 · 2026-05-18T14:24:53Z

Summary

This PR introduces an OpenMPI-based implementation of ReduceScatter, along with a complete example program for functionality verification and basic performance evaluation.

Changes

OpenMPI-based ReduceScatter Implementation
- add the basic OpenMPI implementation for infiniReduceScatter(), including:
  - the core interface src/base/reduce_scatter.h;
  - the OpenMPI backend implementation in src/ompi/impl/reduce_scatter.h;
  - the public API declaration in include/infiniccl.h.
- add an example program examples/reduce_scatter.cc similar to examples/all_reduce.cc for correctness verification and simple performance testing.
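As a point of reference for the correctness check, the expected ReduceScatter result can be modeled in a single process. The sketch below is not taken from `examples/reduce_scatter.cc`. The function name `ReferenceReduceScatterSum` and the sum-only reduction are assumptions made for illustration. It follows the block layout of `MPI_Reduce_scatter_block`, where each rank sends `world_size * recv_count` elements and receives the reduced block at its own rank index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference model of a sum ReduceScatter, simulated in one process.
// send[r] is rank r's send buffer and must hold world_size * recv_count
// elements. The result recv[r] holds rank r's recv_count reduced elements.
std::vector<std::vector<float>> ReferenceReduceScatterSum(
    const std::vector<std::vector<float>>& send, std::size_t recv_count) {
  const std::size_t world_size = send.size();
  std::vector<std::vector<float>> recv(
      world_size, std::vector<float>(recv_count, 0.0f));
  for (std::size_t src = 0; src < world_size; ++src) {
    assert(send[src].size() == world_size * recv_count);
    for (std::size_t dst = 0; dst < world_size; ++dst) {
      for (std::size_t i = 0; i < recv_count; ++i) {
        // Block `dst` of every sender is reduced into rank `dst`.
        recv[dst][i] += send[src][dst * recv_count + i];
      }
    }
  }
  return recv;
}
```

With two ranks sending `{1, 2, 3, 4}` and `{10, 20, 30, 40}` and `recv_count = 2`, rank 0 would receive `{11, 22}` and rank 1 would receive `{33, 44}`.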

Known Issues & Future Work

The current OpenMPI ReduceScatter implementation uses the blocking MPI_Reduce_scatter_block, which prevents overlap between communication and computation. Future work may introduce non-blocking collectives (MPI_Ireduce_scatter_block) and stream-aware asynchronous execution to improve concurrency and performance.
The current implementation allocates temporary host staging buffers using malloc/free on every invocation. This may introduce noticeable overhead in high-frequency workloads. Future work may add reusable buffer pools, allocator caching, and pinned host memory support to improve transfer efficiency and reduce allocation overhead.
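One possible shape for the reusable buffer mentioned above is a staging buffer that grows on demand and is kept alive between calls. This is a minimal sketch, not code from this PR. The class name `StagingBuffer` and its interface are hypothetical, and it does not cover pinned memory or thread safety.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: a host staging buffer that is reused across calls and
// reallocated only when a larger size is requested.
class StagingBuffer {
 public:
  StagingBuffer() = default;
  StagingBuffer(const StagingBuffer&) = delete;
  StagingBuffer& operator=(const StagingBuffer&) = delete;
  ~StagingBuffer() { std::free(data_); }

  // Returns a buffer of at least `bytes` bytes, or nullptr if allocation
  // fails. Contents are not preserved when the buffer grows.
  void* Acquire(std::size_t bytes) {
    if (bytes > capacity_) {
      void* grown = std::malloc(bytes);
      if (grown == nullptr) return nullptr;  // The old buffer stays valid.
      std::free(data_);
      data_ = grown;
      capacity_ = bytes;
    }
    return data_;
  }

  std::size_t capacity() const { return capacity_; }

 private:
  void* data_ = nullptr;
  std::size_t capacity_ = 0;
};
```

A caller would keep one `StagingBuffer` per communicator (or per thread) and call `Acquire()` on each invocation, so repeated calls with the same or a smaller message size would not touch the allocator.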
For the heterogeneous UCX + InfiniBand cluster used in testing, large AllGather messages (e.g., 1 << 20 elements) may fail with mlx5 RC RDMA_READ errors due to UCX rendezvous RDMA_READ path limitations. This requires setting UCX_RNDV_SCHEME=put_zcopy to force a safe put-based transfer protocol. Without this setting, large-message AllGather execution is unstable on some NIC configurations.
Averaging (kAvg) is performed via a CPU-side loop after the MPI call. While functionally correct, this is not optimal for large recv_count. Future work may move scaling into the MPI operation (where supported) or use a more efficient vectorized/device-side post-processing step.
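The CPU-side averaging step described above can be sketched as a plain loop over the receive buffer. This is an illustration, not the code from `src/ompi/impl/reduce_scatter.h`. The function name `ScaleToAverage` and its signature are assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical post-processing step: after a sum reduction, divide each of
// the recv_count received elements by the number of ranks to get the average.
template <typename T>
void ScaleToAverage(T* recv, std::size_t recv_count, int world_size) {
  assert(world_size > 0);
  const T divisor = static_cast<T>(world_size);
  for (std::size_t i = 0; i < recv_count; ++i) {
    recv[i] /= divisor;  // Integer types truncate here.
  }
}
```

The loop costs one pass over `recv_count` elements on the host, which is why a vectorized or device-side variant becomes attractive for large buffers.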
recv_count is cast to int for MPI (with a safety check). Extremely large messages exceeding INT_MAX elements are rejected. This is acceptable for current use cases but may need MPI_Count support in future MPI-4+ integrations for very large tensors.
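A range check of this kind might look like the sketch below. The helper name `ToMpiCount` and the `std::size_t` type of the incoming count are assumptions. The actual check in this PR may be written differently.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: converts an element count to the int count that the
// classic MPI interfaces expect. Returns false and leaves *out untouched if
// the value does not fit in an int.
bool ToMpiCount(std::size_t recv_count, int* out) {
  if (recv_count > static_cast<std::size_t>(INT_MAX)) return false;
  *out = static_cast<int>(recv_count);
  return true;
}
```

Comparing before casting avoids the narrowing conversion entirely for out-of-range values, so the caller can return an error instead of passing a wrapped count to MPI.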

Logs & Screenshots

all_reduce test (MetaX-NVIDIA heterogeneous)
all_reduce.log
all_gather test (MetaX-NVIDIA heterogeneous)
all_gather.log
reduce_scatter test (MetaX-NVIDIA heterogeneous)
reduce_scatter.log


Ziminli changed the title ~~Feat: support ReduceScatter with OpenMPI backend implementationFeat/support reducescatter~~ feat: support ReduceScatter with OpenMPI backend implementation May 18, 2026

Ziminli requested changes May 19, 2026

Comment thread examples/reduce_scatter.cc Outdated

halfman510 added 3 commits May 19, 2026 06:31

feat: support reducescatter with OpenMPI backend implementation

1ad360b

Modified file:
- `include/comm.h`

Added files:
- `src/base/reduce_scatter.h`
- `src/ompi/impl/reduce_scatter.h`
- `examples/reduce_scatter.cc`

fix: correct the comment style for the output logs in `src/ompi/impl/reduce_scatter.h`

6761744

fix: correct the code formatting in the comments of `examples/reduce_scatter.cc`.

af38416

halfman510 force-pushed the feat/support-reducescatter branch from a1e30e7 to af38416 Compare May 19, 2026 06:35

Ziminli approved these changes May 19, 2026

Ziminli merged commit 75c184e into InfiniTensor:master May 19, 2026
