
Fix Qwen3.5-0.8B WebGPU optimization config#388

Open
xiaofeihan1 wants to merge 4 commits into microsoft:main from xiaofeihan1:fix/qwen3.5-webgpu-config

Conversation

@xiaofeihan1

No description provided.

Copilot AI review requested due to automatic review settings April 29, 2026 08:13
@xiaofeihan1
Author

Hi @apsonawane PTAL, thanks!

Contributor

Copilot AI left a comment


Pull request overview

This PR updates the Qwen3.5-0.8B “builtin” WebGPU Olive workflow configs to explicitly target a WebGPU-enabled local system, and adjusts the optimization script’s genai_config patching.

Changes:

  • Add systems.local_system (WebGpuExecutionProvider) and set "target": "local_system" in the WebGPU embedding/text/vision Olive JSON configs.
  • Update optimize.py to use vision_session_options when setting the decoder session_options.
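The first bullet corresponds to an Olive config block along these lines (a sketch based on Olive's documented system schema; the exact keys in the PR's diff are not shown on this page):

```json
{
  "systems": {
    "local_system": {
      "type": "LocalSystem",
      "accelerators": [
        { "device": "gpu", "execution_providers": ["WebGpuExecutionProvider"] }
      ]
    }
  },
  "target": "local_system"
}
```

With `"target": "local_system"`, the workflow runs against the locally defined WebGPU system instead of Olive's default target.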

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File | Description
Qwen-Qwen3.5-0.8B/builtin/webgpu/vision.json | Adds WebGPU system definition and selects it via target.
Qwen-Qwen3.5-0.8B/builtin/webgpu/text.json | Adds WebGPU system definition and selects it via target.
Qwen-Qwen3.5-0.8B/builtin/webgpu/embedding.json | Adds WebGPU system definition and selects it via target.
Qwen-Qwen3.5-0.8B/builtin/optimize.py | Changes which session options are written into genai_config.json for the decoder.


Comment on lines 56 to +58
  vision_session_options = {"log_id": "onnxruntime-genai", "provider_options": vision_provider_options}

- config["model"]["decoder"]["session_options"] = session_options
+ config["model"]["decoder"]["session_options"] = vision_session_options

Copilot AI Apr 29, 2026


decoder.session_options is being set to vision_session_options. For device == "gpu", vision_provider_options explicitly disables CUDA graph capture, so this change unintentionally applies the vision/embedding workaround to the decoder too (and also hides the fact that session_options is otherwise undefined). Create a separate session_options based on provider_options and assign that to the decoder, keeping vision_session_options only for vision/embedding.
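Copilot's suggestion can be sketched as follows. This is a minimal illustration, not the PR's actual code: the contents of `provider_options` and `vision_provider_options` and the surrounding `config` structure are assumed, and the shown EP options are hypothetical placeholders.

```python
# Hypothetical provider options; the real script builds these per device
# (and for device == "gpu", vision_provider_options disables CUDA graph capture).
provider_options = [{"WebGpuExecutionProvider": {}}]
vision_provider_options = [{"WebGpuExecutionProvider": {}}]

# Give the decoder its own session options built from provider_options...
session_options = {"log_id": "onnxruntime-genai", "provider_options": provider_options}
# ...and keep vision_session_options scoped to the vision/embedding models only.
vision_session_options = {"log_id": "onnxruntime-genai", "provider_options": vision_provider_options}

config = {"model": {"decoder": {}, "vision": {}, "embedding": {}}}
config["model"]["decoder"]["session_options"] = session_options
config["model"]["vision"]["session_options"] = vision_session_options
config["model"]["embedding"]["session_options"] = vision_session_options
```

This way the vision/embedding workaround never leaks into the decoder's entry in genai_config.json.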

@apsonawane
Contributor

Why is this change required only for qwen3.5-0.8b model?

@xiaofeihan1
Author

> Why is this change required only for qwen3.5-0.8b model?

Thanks @apsonawane. Applied to the other Qwen3.5 models as well. A follow-up question:
why does Qwen3.5-2B use k_quant_mixed while Qwen3.5-4B uses rtn?

@apsonawane
Contributor

@xiaofeihan1 We tried a few combinations of quantization and selected the one that gives the best accuracy and performance; that's why there is a difference.
2B and 0.8B are very sensitive to int4 quantization, so we kept them mixed, while 4B and 9B perform better with int4 and give better performance.

@apsonawane apsonawane requested a review from xiaoyu-work May 4, 2026 19:29
