[cudax] Implement _this_ hierarchy groups#7603

Merged
davebayer merged 3 commits intoNVIDIA:mainfrom
davebayer:hierarchy_groups2
Feb 13, 2026

Conversation

Contributor

@davebayer davebayer commented Feb 10, 2026

Closes #7618.

@davebayer davebayer requested a review from a team as a code owner February 10, 2026 14:02
@davebayer davebayer requested a review from caugonnet February 10, 2026 14:02
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Feb 10, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Feb 10, 2026

Comment thread cudax/include/cuda/experimental/__hierarchy/grid_sync.cuh Outdated

_CCCL_TEMPLATE(class _HierarchyLike)
_CCCL_REQUIRES(__is_or_has_hierarchy_member_v<_HierarchyLike>)
[[nodiscard]] _CCCL_DEVICE_API auto this_thread(const _HierarchyLike& __hier_like) noexcept
Contributor


Stream of thoughts: Technically we don't have to carry over the choice to use this_X functions.

Options include:

  • this_group(level), level.group(), some other variations of similar things
  • Just constructors

But there is also no issue with it staying as is.

Contributor Author


Even though I'd love to use something more generic (as the options you listed), I find

auto warp = cuda::this_warp(hierarchy);

more readable than any of:

// sounds like we are grouping all warps in the hierarchy
auto warp = cuda::warp.group(hierarchy);

// I don't know what's going on at first glance 
auto warp = cuda::this_group(cuda::warp, hierarchy);

// it's not clear that it's the group of the current warp
cuda::warp_group warp{hierarchy};

I would love to make the level.group(h) work, but I don't find it easy to read and understand. Even level.this_group(h) doesn't seem better to me.

I would keep it as is for now :)

Comment thread cudax/include/cuda/experimental/__hierarchy/group.cuh
};

template <class _Hierarchy>
class thread_group<_Hierarchy, __this_hierarchy_group_kind> : __this_hierarchy_group_base<thread_level, _Hierarchy>
Contributor


Technically, thread_group in CG was a general polymorphic type covering all kinds of groups. Having it denote a single thread fits the naming scheme better, but it will be confusing. We don't need to resolve it right now, but I wanted to note it down.

In general, I don't know whether we still need that polymorphic type, or whether we need a single-thread group.

Contributor Author


That's a good point. But we will need the thread_group for cub::ThreadMeow algorithms in the future.

As you say, I would deal with this later :)


@github-actions
Contributor

🥳 CI Workflow Results

🟩 Finished in 14m 24s: Pass: 100%/48 | Total: 3h 24m | Max: 14m 03s | Hits: 99%/23529

See results here.


_CCCL_DEVICE_API inline void __grid_sync()
{
const auto __bar_ptr = &::cuda::experimental::__cg_imported::__grid_workspace_ptr()->__barrier_;
Contributor


Should we store this in the group?

CG was also validating it in case someone forgot to use the cooperative launch, so it wouldn't just fault on the device. We could also do validation, but maybe skip it if we construct the group from a config object with the cooperative launch option?

Contributor Author


I don't see a reason to store the pointer when it's available through the envregs.

Regarding the validation, I moved it to the place where we obtain the pointer from the envregs :)

Contributor


We would need to check the generated code, but I believe the compiler can't keep the result in registers and will generate a full constant-bank read every time you call a sync, and will also redo the validity check every time. I'm pretty sure storing it in a register will end up being faster, at the cost of extra registers.

@davebayer davebayer merged commit dbe518a into NVIDIA:main Feb 13, 2026
67 of 68 checks passed
@github-project-automation github-project-automation Bot moved this from In Review to Done in CCCL Feb 13, 2026
@davebayer davebayer self-assigned this Feb 17, 2026
@davebayer davebayer deleted the hierarchy_groups2 branch February 17, 2026 16:44
fbusato pushed a commit to fbusato/cccl that referenced this pull request Feb 19, 2026
* [cudax] Implement this hierarchy groups

* fix grid sync

* fixes


Development

Successfully merging this pull request may close these issues.

[FEA]: Initial hierarchy groups implementation

2 participants