Add Gemma 4 E2B recipes (MobiusModelBuilder + INT4) #381
justinchuby wants to merge 8 commits into main
Conversation
Two Olive pipeline configs for google/gemma-4-E2B-it:
- gemma4_fp32_cpu.json: FP32 build for CPU
- gemma4_int4_cuda.json: FP16 build + OnnxBlockWiseRtnQuantization INT4

Requires olive-ai with the MobiusModelBuilder pass (Olive PR #2406) and the mobius-ai package.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Pull request overview
Adds Olive recipe configs for building and quantizing the HuggingFace google/gemma-4-E2B-it model using the MobiusModelBuilder pass, including an INT4 CUDA pipeline.
Changes:
- Added FP32 CPU build recipe (gemma4_fp32_cpu.json) using MobiusModelBuilder.
- Added FP16 build + INT4 RTN quantization CUDA recipe (gemma4_int4_cuda.json) using OnnxBlockWiseRtnQuantization.
- Added model-specific README documenting prerequisites and how to run both recipes.
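As a rough sketch of what the INT4 CUDA recipe could look like: the pass names come from this PR, but every other field (model type, precision key, output directory) is an illustrative assumption based on Olive's general recipe schema, not the actual file contents.

```json
{
  "input_model": {
    "type": "HfModel",
    "model_path": "google/gemma-4-E2B-it"
  },
  "passes": {
    "builder": {
      "type": "MobiusModelBuilder",
      "precision": "fp16"
    },
    "quantize": {
      "type": "OnnxBlockWiseRtnQuantization"
    }
  },
  "output_dir": "models/gemma4-int4-cuda"
}
```

The FP32 CPU recipe would follow the same shape with only the builder pass and an fp32 precision.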
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| google-gemma-4-E2B-it/gemma4_int4_cuda.json | New CUDA recipe: build FP16 with Mobius, then INT4 block-wise RTN quantization. |
| google-gemma-4-E2B-it/gemma4_fp32_cpu.json | New CPU recipe: build FP32 with Mobius. |
| google-gemma-4-E2B-it/README.md | Usage docs for running the two new recipes. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Please add info.yml and LICENSE file.

@copilot Add info.yml and LICENSE file.
Agent-Logs-Url: https://github.com/microsoft/olive-recipes/sessions/5a084083-4534-4971-82c9-91d806abcd01
Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
The MobiusModelBuilder pass now automatically generates ORT GenAI config files (genai_config.json, tokenizer.json, etc.) alongside the ONNX models. This commit updates the recipe documentation to reflect this feature.
Olive recipes for google/gemma-4-E2B-it using the MobiusModelBuilder pass (Olive PR microsoft/Olive#2406).
Recipes
- gemma4_fp32_cpu.json
- gemma4_int4_cuda.json
- gemma4_int4_kquant_cpu.json

Prerequisites
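A minimal sketch of installing the prerequisites and running a recipe. The package names (mobius-ai, olive-ai with the MobiusModelBuilder pass) come from the PR description; the exact Olive branch to install from is an assumption, since the pass is only in Olive PR #2406 at the time of this PR.

```shell
# Assumption: mobius-ai is installable from PyPI and the MobiusModelBuilder
# pass ships on Olive's main branch once PR #2406 is merged.
pip install mobius-ai
pip install "olive-ai @ git+https://github.com/microsoft/Olive.git@main"

# Run a recipe with Olive's CLI from the recipe directory.
olive run --config gemma4_fp32_cpu.json
olive run --config gemma4_int4_cuda.json
```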
Validated
INT4 CUDA pipeline tested end-to-end (~3.5 min).