WIP: Enable zero-copy for QNN GPU by qti-mattsinc · Pull Request #2105 · microsoft/onnxruntime-genai

qti-mattsinc · 2026-04-27T22:55:51Z

Use the newly added "enable_dx12_shared_memory_allocator" provider option to allocate the KV cache in CPU-accessible GPU memory.
This provides a large speedup by eliminating unnecessary copy overhead.

* Use the newly added "enable_dx12_shared_memory_allocator" provider option to allocate the KV cache in CPU-accessible GPU memory.
* This provides a large speedup by eliminating unnecessary copy overhead.

> Co-authored-by: qti-mattsinc <mattsinc@qti.qualcomm.com>

qti-mattsinc · 2026-04-27T22:57:52Z

I had to modify this file and onnxruntime_inline.h to be able to access the CreateMemoryInfo_V2 API. It looks like these files are checked in manually?

qti-mattsinc · 2026-04-27T23:43:12Z

+    }
+    if (use_dx12_shared_memory) {
+      provider_options_list.back().options.emplace_back("enable_dx12_shared_memory_allocator", "1");
+      provider_options_list.back().device_filtering_options = Config::DeviceFilteringOptions { OrtHardwareDeviceType_GPU };


Note to self: I would rather not touch device_filtering_options here if possible. This needs to be cleaned up in a way that still selects the correct allocator.
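One way to read the note above is to keep the option append separate from any device filtering. The following is a minimal sketch under stated assumptions: `ProviderOptions` and `AppendSharedMemoryOption` are hypothetical stand-ins written for illustration, not the actual genai types or functions. Only the option key and value are taken from the diff.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a provider entry in the config; not the real genai type.
struct ProviderOptions {
  std::string name;
  std::vector<std::pair<std::string, std::string>> options;
};

// Appends the key/value pair from the diff to the last provider entry only.
// It does not touch any device filter. Returns false when nothing was appended.
inline bool AppendSharedMemoryOption(std::vector<ProviderOptions>& list,
                                     bool use_dx12_shared_memory) {
  if (!use_dx12_shared_memory || list.empty()) {
    return false;
  }
  list.back().options.emplace_back("enable_dx12_shared_memory_allocator", "1");
  return true;
}
```

Whether the correct allocator is still selected without the device filter is exactly the open question in the note; this sketch does not answer it.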

johnpaultaken

This change does not look ideal to me; we need to discuss why it deviates from the norm.
Ideally, I would like to see no change to genai at all.
When the OrtMemoryInfo is of type QnnShared, can we just return the QnnGpuAllocator based on the device the user selected, i.e. GPU?
Everything else should work transparently, just as it does for the other EPs (CUDA, OpenVINO, etc.).
I don't see a need for the enable_dx12_shared_memory_allocator option. Why not always enable it in the EP?
Also, we cannot have variable names like use_dx12_shared_memory that turn out to be a QNN-specific option. Ideally, I want to avoid any EP-specific code in genai.
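The selection logic proposed in this comment could look roughly like the sketch below. Everything here is an assumption for illustration: `MemoryKind`, `DeviceKind`, `SelectAllocator`, and the "DefaultAllocator" name are hypothetical and are not ONNX Runtime or genai APIs. Only the names QnnShared and QnnGpuAllocator come from the comment.

```cpp
#include <string>

// Hypothetical enums for illustration; these are not ONNX Runtime types.
enum class MemoryKind { Default, QnnShared };
enum class DeviceKind { CPU, GPU, NPU };

// Picks an allocator name from the memory kind and the user's device choice,
// so the caller does not need an EP-specific flag.
inline std::string SelectAllocator(MemoryKind memory, DeviceKind device) {
  if (memory == MemoryKind::QnnShared && device == DeviceKind::GPU) {
    return "QnnGpuAllocator";
  }
  return "DefaultAllocator";  // placeholder name for any other combination
}
```

In this shape the decision would live where the memory info is resolved, which is what would let the calling code stay EP-agnostic.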


johnpaultaken suggested changes Apr 30, 2026

qti-mattsinc wants to merge 1 commit into microsoft:main from CodeLinaro:dev/mattsinc/gpu-zero-copy

3 participants
