smoke test to verify GPU memory allocation/deallocation by charan-003 · Pull Request #9195 · NVIDIA/cccl

charan-003 · 2026-05-30T21:45:00Z

Following up on #8859.

Add test for GPU memory allocation/deallocation.

copy-pr-bot · 2026-05-30T21:45:03Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-05-30T21:48:41Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ec6a2649-1b4c-4fd0-959b-b3d9ef82a75e

📥 Commits

Reviewing files that changed from the base of the PR and between 1691f63 and fd2a4cc.

📒 Files selected for processing (1)

test/cuda_smoke/cuda_runtime_smoke.cu

🚧 Files skipped from review as they are similar to previous changes (1)

test/cuda_smoke/cuda_runtime_smoke.cu

📝 Walkthrough

Summary by CodeRabbit

Tests
- Added a CUDA smoke test that validates a complete device-memory round-trip: device allocation and deallocation, host↔device transfers, kernel execution with synchronization, and result verification. The test confirms expected per-element computation and verifies no CUDA errors remain after the sequence of operations.

Walkthrough

Adds a fixed <<<4,64>>> kernel launch for the managed-memory smoke test and a new Catch2 test that exercises a device-memory round-trip: cudaMalloc, host→device copy, increment_kernel launch and synchronize, device→host copy, element-wise validation, cudaFree, and CUDA error-state check.

Changes

Device Memory Smoke Test

| Layer / File(s) | Summary |
| --- | --- |
| Fixed managed-memory kernel launch `test/cuda_smoke/cuda_runtime_smoke.cu` | Replaces computed `grid`/`block` with a fixed `<<<4, 64>>>` launch in the `cudaMallocManaged round-trip works` test. |
| cudaMalloc/cudaFree round-trip test `test/cuda_smoke/cuda_runtime_smoke.cu` | New TEST_CASE that allocates a device `int` buffer with `cudaMalloc`, copies host→device, runs `increment_kernel` and synchronizes, copies device→host, verifies each element equals `i + 1`, frees with `cudaFree`, and asserts the CUDA error state is clean. |
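
For readers without the diff at hand, the sequence summarized above can be sketched as a standalone program. This is only an illustration, not the merged test: the real test is a Catch2 TEST_CASE, and the `increment_kernel` signature, the buffer size of 256 elements, and the helper name `device_round_trip` are all assumptions made here.

```cuda
#include <cuda_runtime.h>

#include <cstddef>
#include <cstdio>
#include <vector>

// Assumed kernel shape: adds 1 to each element (the real increment_kernel lives in the test file).
__global__ void increment_kernel(int* data, int n)
{
  const int i = static_cast<int>(blockIdx.x * blockDim.x + threadIdx.x);
  if (i < n)
  {
    data[i] += 1;
  }
}

// Hypothetical helper: returns true only if every step of the round-trip succeeds.
bool device_round_trip()
{
  constexpr int n = 4 * 64; // assumed size: one element per thread of a <<<4, 64>>> launch
  constexpr std::size_t bytes = n * sizeof(int);

  std::vector<int> host(n);
  for (int i = 0; i < n; ++i)
  {
    host[i] = i;
  }

  int* device = nullptr;
  if (::cudaMalloc(reinterpret_cast<void**>(&device), bytes) != cudaSuccess)
  {
    return false;
  }

  // Host -> device copy, kernel launch, launch-error check, synchronize.
  bool ok = ::cudaMemcpy(device, host.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
  if (ok)
  {
    increment_kernel<<<4, 64>>>(device, n);
    ok = ::cudaGetLastError() == cudaSuccess && ::cudaDeviceSynchronize() == cudaSuccess;
  }

  // Device -> host copy and element-wise validation against i + 1.
  ok = ok && ::cudaMemcpy(host.data(), device, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
  for (int i = 0; ok && i < n; ++i)
  {
    ok = host[i] == i + 1;
  }

  // Always free the buffer, then confirm no CUDA error is left behind.
  ok = (::cudaFree(device) == cudaSuccess) && ok;
  return ok && ::cudaGetLastError() == cudaSuccess;
}

int main()
{
  const bool ok = device_round_trip();
  std::printf("%s\n", ok ? "round-trip ok" : "round-trip failed");
  return ok ? 0 : 1;
}
```

The runtime calls are written with a leading `::` so they resolve from the global namespace. In the actual test each boolean check would presumably be a Catch2 assertion rather than a returned flag.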

Possibly related PRs

NVIDIA/cccl#8859: Adds increment_kernel and prior CUDA runtime and managed-memory smoke tests to the same file.

Suggested reviewers

alliepiper

coderabbitai

Actionable comments posted: 2

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c99fe5ab-3d24-479e-9214-fd813e9002a9

📥 Commits

Reviewing files that changed from the base of the PR and between fb8629d and 9021f99.

📒 Files selected for processing (1)

test/cuda_smoke/cuda_runtime_smoke.cu

charan-003 · 2026-05-30T21:52:08Z

@bernhardmgruber @alliepiper added a smoke test for GPU memory allocation

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

test/cuda_smoke/cuda_runtime_smoke.cu (2)
85-109: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

important: Qualify CUDA runtime free-function calls from global scope.

In this block, use ::cudaMalloc, ::cudaMemcpy, ::cudaGetLastError, ::cudaDeviceSynchronize, and ::cudaFree to match the repository rule for free-function qualification.

As per coding guidelines: “All calls to free functions must be fully qualified starting from the global namespace, e.g., ::cuda::ceil_div, including calls to functions in the same namespace”.

78-86: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

important: Make this test self-contained for device readiness.

Line 85 allocates immediately, but this case does not verify device availability or set a device locally. Add a local cudaGetDeviceCount/SKIP + cudaSetDevice(0) in this test so it is independent of other test ordering.
#!/bin/bash
# Verify whether this test case has local device readiness guards.
rg -n -C3 'TEST_CASE\("cudaMalloc/cudaFree round-trip works"' test/cuda_smoke/cuda_runtime_smoke.cu
rg -n -C2 'cudaGetDeviceCount|cudaSetDevice|SKIP\(' test/cuda_smoke/cuda_runtime_smoke.cu
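
A minimal sketch of the kind of local readiness guard this comment asks for, written as a standalone helper rather than as the actual Catch2 code. The name `select_first_device` is hypothetical, and treating a failed device query as "skip" rather than "fail" is an assumption about the intended behavior.

```cuda
#include <cuda_runtime.h>

#include <cstdio>

// Hypothetical helper: true when at least one CUDA device exists and device 0 was selected.
bool select_first_device()
{
  int device_count = 0;
  if (::cudaGetDeviceCount(&device_count) != cudaSuccess || device_count == 0)
  {
    // Read the last error so a failed query does not leak into later error-state checks.
    (void) ::cudaGetLastError();
    return false;
  }
  return ::cudaSetDevice(0) == cudaSuccess;
}

int main()
{
  if (!select_first_device())
  {
    std::printf("no usable CUDA device, skipping\n");
    return 0;
  }
  std::printf("device 0 selected\n");
  return 0;
}
```

Inside a Catch2 test, the `false` branch would likely map to a `SKIP(...)` call and the `cudaSetDevice` result to a `REQUIRE`, which keeps the test independent of the order in which other test cases run.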

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3cca54e7-d47f-4bf2-aada-49eb9a5f6f38

📥 Commits

Reviewing files that changed from the base of the PR and between 9021f99 and 1691f63.

📒 Files selected for processing (1)

test/cuda_smoke/cuda_runtime_smoke.cu

bernhardmgruber · 2026-05-31T19:02:06Z

/ok to test 1691f63

davebayer · 2026-06-01T07:24:13Z

+
+// smoke test for GPU memory allocation/deallocation
+
+TEST_CASE("cudaMalloc/cudaFree round-trip works", "[cuda_smoke][device_memory]")


We should also add a test for async memory allocations from pools.

Yes, in a separate PR!

bernhardmgruber · 2026-06-01T08:07:50Z

/ok to test fd2a4cc

davebayer · 2026-06-01T12:15:18Z

/ok to test b79a156

github-actions · 2026-06-01T21:53:22Z

🥳 CI Workflow Results

🟩 Finished in 9h 34m: Pass: 100%/501 | Total: 3d 10h | Max: 49m 36s | Hits: 99%/635766

See results here.

bernhardmgruber · 2026-06-02T07:30:16Z

@charan-003 thank you for the contribution! This is great!

very gpu alloc/dealloc

9021f99

github-project-automation Bot added this to CCCL May 30, 2026

github-project-automation Bot moved this to Todo in CCCL May 30, 2026

cccl-authenticator-app Bot moved this from Todo to In Review in CCCL May 30, 2026

coderabbitai Bot reviewed May 30, 2026

rename

1691f63

coderabbitai Bot reviewed May 30, 2026

bernhardmgruber approved these changes May 31, 2026

hardcode grid and block

fd2a4cc

davebayer reviewed Jun 1, 2026

bernhardmgruber approved these changes Jun 1, 2026

Merge branch 'main' into gpu-memory-alloc

b79a156

bernhardmgruber enabled auto-merge (squash) June 1, 2026 21:39

bernhardmgruber merged commit 4fd733e into NVIDIA:main Jun 1, 2026
1035 of 1039 checks passed

charan-003 deleted the gpu-memory-alloc branch June 1, 2026 22:45

