
smoke test to verify GPU memory allocation/deallocation #9195

Merged
bernhardmgruber merged 4 commits into NVIDIA:main from charan-003:gpu-memory-alloc on Jun 1, 2026

@charan-003
Contributor

Following up on #8859.

Add a smoke test for GPU device-memory allocation/deallocation (cudaMalloc/cudaFree).

@copy-pr-bot
Contributor

copy-pr-bot Bot commented May 30, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

cccl-authenticator-app Bot moved this from Todo to In Review in CCCL May 30, 2026
@coderabbitai
Contributor

coderabbitai Bot commented May 30, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ec6a2649-1b4c-4fd0-959b-b3d9ef82a75e

📥 Commits

Reviewing files that changed from the base of the PR and between 1691f63 and fd2a4cc.

📒 Files selected for processing (1)
  • test/cuda_smoke/cuda_runtime_smoke.cu
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/cuda_smoke/cuda_runtime_smoke.cu

📝 Walkthrough

Summary by CodeRabbit

  • Tests
    • Added a CUDA smoke test that validates a complete device-memory round-trip: device allocation and deallocation, host↔device transfers, kernel execution with synchronization, and result verification. The test confirms expected per-element computation and verifies no CUDA errors remain after the sequence of operations.


Walkthrough

Adds a fixed <<<4,64>>> kernel launch for the managed-memory smoke test and a new Catch2 test that exercises a device-memory round-trip: cudaMalloc, host→device copy, increment_kernel launch and synchronize, device→host copy, element-wise validation, cudaFree, and CUDA error-state check.

Changes

Device Memory Smoke Test

| Layer / File(s) | Summary |
|---|---|
| Fixed managed-memory kernel launch (test/cuda_smoke/cuda_runtime_smoke.cu) | Replaces the computed grid/block dimensions with a fixed <<<4, 64>>> launch in the "cudaMallocManaged round-trip works" test. |
| cudaMalloc/cudaFree round-trip test (test/cuda_smoke/cuda_runtime_smoke.cu) | New TEST_CASE that allocates a device int buffer with cudaMalloc, copies host→device, runs increment_kernel and synchronizes, copies device→host, verifies that each element equals i + 1, frees the buffer with cudaFree, and asserts that the CUDA error state is clean. |
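The round-trip described in the walkthrough can be sketched as a standalone CUDA function. This is a minimal illustration, not the merged test: it assumes increment_kernel adds 1 to every element (its real definition comes from #8859 and is not shown here), substitutes a hypothetical increment_kernel_sketch and a hypothetical device_round_trip wrapper in place of the Catch2 TEST_CASE, and sizes the buffer at 4 * 64 = 256 ints on the assumption that the same fixed <<<4, 64>>> launch is used. Runtime calls are ::-qualified, following the repository rule quoted in the review below.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for increment_kernel from #8859: adds 1 to each element.
__global__ void increment_kernel_sketch(int* data, int n)
{
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
  {
    data[i] += 1;
  }
}

// Sketch of the round-trip: cudaMalloc, host->device copy, kernel + sync,
// device->host copy, verification, cudaFree, final error-state check.
// Returns true only if every step succeeds.
bool device_round_trip()
{
  constexpr int blocks        = 4;
  constexpr int threads       = 64;
  constexpr int n             = blocks * threads; // one element per thread of <<<4, 64>>>
  constexpr std::size_t bytes = n * sizeof(int);

  std::vector<int> host(n);
  for (int i = 0; i < n; ++i)
  {
    host[i] = i;
  }

  int* device = nullptr;
  if (::cudaMalloc(reinterpret_cast<void**>(&device), bytes) != cudaSuccess)
  {
    return false;
  }

  bool ok = ::cudaMemcpy(device, host.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
  if (ok)
  {
    increment_kernel_sketch<<<blocks, threads>>>(device, n);
    ok = ::cudaGetLastError() == cudaSuccess         // launch errors
      && ::cudaDeviceSynchronize() == cudaSuccess    // errors raised while the kernel ran
      && ::cudaMemcpy(host.data(), device, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
  }

  // Free unconditionally so a failed step does not leak the allocation.
  ok = (::cudaFree(device) == cudaSuccess) && ok;

  // Each element should have been incremented exactly once.
  for (int i = 0; ok && i < n; ++i)
  {
    ok = host[i] == i + 1;
  }

  // Final check that no CUDA error is left pending.
  return ok && ::cudaGetLastError() == cudaSuccess;
}
```

Freeing before verifying keeps the cleanup path identical on success and failure; the comparison only needs the host copy.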

Possibly related PRs

  • NVIDIA/cccl#8859: Adds increment_kernel and prior CUDA runtime and managed-memory smoke tests to the same file.

Suggested reviewers

  • alliepiper


Contributor

coderabbitai Bot left a comment


Actionable comments posted: 2


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c99fe5ab-3d24-479e-9214-fd813e9002a9

📥 Commits

Reviewing files that changed from the base of the PR and between fb8629d and 9021f99.

📒 Files selected for processing (1)
  • test/cuda_smoke/cuda_runtime_smoke.cu

Comment thread test/cuda_smoke/cuda_runtime_smoke.cu Outdated
Comment thread test/cuda_smoke/cuda_runtime_smoke.cu Outdated
@charan-003
Contributor Author

charan-003 commented May 30, 2026

@bernhardmgruber @alliepiper I added a smoke test for GPU memory allocation.

Contributor

coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
test/cuda_smoke/cuda_runtime_smoke.cu (2)

85-109: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

important: Qualify CUDA runtime free-function calls from global scope.

In this block, use ::cudaMalloc, ::cudaMemcpy, ::cudaGetLastError, ::cudaDeviceSynchronize, and ::cudaFree to match the repository rule for free-function qualification.

As per coding guidelines: “All calls to free functions must be fully qualified starting from the global namespace, e.g., ::cuda::ceil_div, including calls to functions in the same namespace”.


78-86: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

important: Make this test self-contained for device readiness.

Line 85 allocates immediately, but this test case neither verifies device availability nor sets a device locally. Add a local cudaGetDeviceCount/SKIP + cudaSetDevice(0) guard so the test does not depend on the order in which other tests run.

#!/bin/bash
# Verify whether this test case has local device readiness guards.
rg -n -C3 'TEST_CASE\("cudaMalloc/cudaFree round-trip works"' test/cuda_smoke/cuda_runtime_smoke.cu
rg -n -C2 'cudaGetDeviceCount|cudaSetDevice|SKIP\(' test/cuda_smoke/cuda_runtime_smoke.cu
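One way to implement the requested guard is a small helper that the test calls before allocating. This is a sketch under stated assumptions: the helper name select_first_device_if_available is hypothetical, and the SKIP macro in the usage comment requires Catch2 v3.3 or newer.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper for the guard requested in the review.
// Returns true only if at least one CUDA device exists and device 0 was made current;
// on false, the caller is expected to SKIP the test.
bool select_first_device_if_available()
{
  int device_count = 0;
  if (::cudaGetDeviceCount(&device_count) != cudaSuccess || device_count == 0)
  {
    // Clear the error the failed query may have recorded, so later checks start clean.
    (void) ::cudaGetLastError();
    return false;
  }
  return ::cudaSetDevice(0) == cudaSuccess;
}

// Intended use at the top of the Catch2 test (sketch only):
//
// TEST_CASE("cudaMalloc/cudaFree round-trip works", "[cuda_smoke][device_memory]")
// {
//   if (!select_first_device_if_available())
//   {
//     SKIP("no CUDA device available");
//   }
//   // ... cudaMalloc / copies / kernel / cudaFree as before ...
// }
```

Keeping the CUDA query in a plain bool helper separates it from the test framework, so several test cases can share it. A failed cudaSetDevice also yields false here; a stricter test might FAIL in that case instead of skipping.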

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3cca54e7-d47f-4bf2-aada-49eb9a5f6f38

📥 Commits

Reviewing files that changed from the base of the PR and between 9021f99 and 1691f63.

📒 Files selected for processing (1)
  • test/cuda_smoke/cuda_runtime_smoke.cu

Comment thread test/cuda_smoke/cuda_runtime_smoke.cu Outdated
@bernhardmgruber
Contributor

/ok to test 1691f63


// smoke test for GPU memory allocation/deallocation

TEST_CASE("cudaMalloc/cudaFree round-trip works", "[cuda_smoke][device_memory]")
Contributor


We should also add a test for async memory allocations from pools

Contributor


Yes, in a separate PR!
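The pool-based test was deferred to a separate PR. Purely as a hedged illustration of what it might exercise, the same round-trip can be made stream-ordered with cudaMallocAsync/cudaFreeAsync (CUDA 11.2 or newer), which allocate from the current memory pool of the stream's device (the default pool unless another has been set). The names add_one_kernel, pool_result, and async_pool_round_trip are hypothetical; this is not code from this PR or its follow-up.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to each element.
__global__ void add_one_kernel(int* data, int n)
{
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
  {
    data[i] += 1;
  }
}

enum class pool_result
{
  passed,
  unsupported, // device does not support stream-ordered memory pools
  failed
};

// Sketch: the round-trip with allocation and release enqueued on a stream,
// both served from the device's current memory pool.
pool_result async_pool_round_trip()
{
  constexpr int n             = 256;
  constexpr std::size_t bytes = n * sizeof(int);

  int device          = 0;
  int pools_supported = 0;
  if (::cudaGetDevice(&device) != cudaSuccess
      || ::cudaDeviceGetAttribute(&pools_supported, cudaDevAttrMemoryPoolsSupported, device) != cudaSuccess)
  {
    return pool_result::failed;
  }
  if (pools_supported == 0)
  {
    return pool_result::unsupported;
  }

  cudaStream_t stream = nullptr;
  if (::cudaStreamCreate(&stream) != cudaSuccess)
  {
    return pool_result::failed;
  }

  std::vector<int> host(n);
  for (int i = 0; i < n; ++i)
  {
    host[i] = i;
  }

  int* data = nullptr;
  bool ok   = ::cudaMallocAsync(reinterpret_cast<void**>(&data), bytes, stream) == cudaSuccess;
  if (ok)
  {
    // All work goes on one stream, so it executes in enqueue order.
    ok = ::cudaMemcpyAsync(data, host.data(), bytes, cudaMemcpyHostToDevice, stream) == cudaSuccess;
    if (ok)
    {
      add_one_kernel<<<4, 64, 0, stream>>>(data, n);
      ok = ::cudaGetLastError() == cudaSuccess
        && ::cudaMemcpyAsync(host.data(), data, bytes, cudaMemcpyDeviceToHost, stream) == cudaSuccess;
    }
    // Return the block to the pool in stream order, after the work above.
    ok = (::cudaFreeAsync(data, stream) == cudaSuccess) && ok;
  }

  // Wait for everything enqueued, then release the stream.
  ok = (::cudaStreamSynchronize(stream) == cudaSuccess) && ok;
  ok = (::cudaStreamDestroy(stream) == cudaSuccess) && ok;

  for (int i = 0; ok && i < n; ++i)
  {
    ok = host[i] == i + 1;
  }
  return ok ? pool_result::passed : pool_result::failed;
}
```

Checking cudaDevAttrMemoryPoolsSupported first lets a smoke test report "unsupported" separately from a genuine failure.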

@bernhardmgruber
Contributor

/ok to test fd2a4cc

@davebayer
Contributor

/ok to test b79a156

bernhardmgruber enabled auto-merge (squash) June 1, 2026 21:39
@github-actions
Contributor

github-actions Bot commented Jun 1, 2026

🥳 CI Workflow Results

🟩 Finished in 9h 34m: Pass: 100%/501 | Total: 3d 10h | Max: 49m 36s | Hits: 99%/635766


bernhardmgruber merged commit 4fd733e into NVIDIA:main Jun 1, 2026
1035 of 1039 checks passed
charan-003 deleted the gpu-memory-alloc branch June 1, 2026 22:45
@bernhardmgruber
Contributor

@charan-003 thank you for the contribution! This is great!


Labels

None yet

Projects

Archived in project

3 participants