
Add Gemma 4 E2B recipes (MobiusModelBuilder + INT4) #381

Open

justinchuby wants to merge 8 commits into main from justinchu/gemma4-mobius

Conversation

@justinchuby justinchuby commented Apr 23, 2026

Olive recipes for google/gemma-4-E2B-it using the MobiusModelBuilder pass (Olive PR microsoft/Olive#2406).

Recipes

| Config | Pipeline | Output |
| --- | --- | --- |
| gemma4_fp32_cpu.json | MobiusModelBuilder (fp32) | 4 ONNX components (~5GB) |
| gemma4_int4_cuda.json | MobiusModelBuilder (fp16) → OnnxBlockWiseRtnQuantization (int4) | 4 quantized ONNX components (~2.8GB) |
| gemma4_int4_kquant_cpu.json | MobiusModelBuilder (fp32) → OnnxKQuantQuantization (int4) | 4 quantized ONNX components (k-quant, CPU) |
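The INT4 pipelines rely on round-to-nearest (RTN) quantization applied block-wise over the weights. As an illustration of the idea (not Olive's implementation; the symmetric scheme, signed [-8, 7] range, and block size 32 here are assumptions), a minimal NumPy sketch:

```python
import numpy as np

def rtn_quantize_blockwise(w: np.ndarray, block_size: int = 32):
    """Symmetric round-to-nearest INT4 quantization over 1-D blocks.

    Returns (q, scales): q holds int4 values in [-8, 7], and scales
    has one fp32 scale per block.
    """
    flat = w.reshape(-1, block_size)  # assumes size divides evenly
    scales = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)  # avoid divide-by-zero
    q = np.clip(np.round(flat / scales), -8, 7).astype(np.int8)
    return q, scales.astype(np.float32)

def rtn_dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct approximate fp32 weights from int4 values and scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(128).astype(np.float32)
q, s = rtn_quantize_blockwise(w)
w_hat = rtn_dequantize(q, s)
```

Because each block is scaled by its own maximum, the per-element reconstruction error is bounded by half a quantization step for that block; this is what keeps block-wise RTN usable without calibration data.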

Prerequisites

```shell
pip install olive-ai[gpu] mobius-ai
```

Validated

INT4 CUDA pipeline tested end-to-end (~3.5 min):

  • 97-98% of weights quantized (MatMulNBits + GatherBlockQuantized)
  • 4 components: decoder (2.4G), audio (152M), embedding (199M), vision (89M)
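As a quick consistency check, the four component sizes listed above sum to the ~2.8GB total quoted for the INT4 CUDA build (treating the decoder's 2.4G as 2.4 × 1024 MB):

```python
# Reported component sizes in MB (decoder's 2.4G converted at 1024 MB/GB).
components = {"decoder": 2.4 * 1024, "audio": 152, "embedding": 199, "vision": 89}
total_gb = sum(components.values()) / 1024
print(round(total_gb, 2))  # ~2.83 GB, consistent with the ~2.8GB figure
```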

Two Olive pipeline configs for google/gemma-4-E2B-it:
- gemma4_fp32_cpu.json: FP32 build for CPU
- gemma4_int4_cuda.json: FP16 build + OnnxBlockWiseRtnQuantization INT4

Requires olive-ai with MobiusModelBuilder pass (Olive PR #2406)
and mobius-ai package.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 23, 2026 23:43
Contributor

Copilot AI left a comment


Pull request overview

Adds Olive recipe configs for building and quantizing the HuggingFace google/gemma-4-E2B-it model using the MobiusModelBuilder pass, including an INT4 CUDA pipeline.

Changes:

  • Added FP32 CPU build recipe (gemma4_fp32_cpu.json) using MobiusModelBuilder.
  • Added FP16 build + INT4 RTN quantization CUDA recipe (gemma4_int4_cuda.json) using OnnxBlockWiseRtnQuantization.
  • Added model-specific README documenting prerequisites and how to run both recipes.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| google-gemma-4-E2B-it/gemma4_int4_cuda.json | New CUDA recipe: build FP16 with Mobius, then INT4 block-wise RTN quantization. |
| google-gemma-4-E2B-it/gemma4_fp32_cpu.json | New CPU recipe: build FP32 with Mobius. |
| google-gemma-4-E2B-it/README.md | Usage docs for running the two new recipes. |


justinchuby and others added 2 commits April 23, 2026 16:52
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@devang-ml (Contributor) commented:

Please add info.yml and LICENSE file.

@justinchuby (Author) commented:

@copilot Add info.yml and LICENSE file.

Agent-Logs-Url: https://github.com/microsoft/olive-recipes/sessions/5a084083-4534-4971-82c9-91d806abcd01

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>

Copilot AI commented Apr 24, 2026

> @copilot Add info.yml and LICENSE file.

Added info.yml and LICENSE (Apache 2.0, matching the google-deepmind/gemma repo) to google-gemma-4-E2B-it/ in commit 58b231a.


Warning

Firewall rules blocked me from connecting to one or more addresses:

  • ai.google.dev (dns block)
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js
The MobiusModelBuilder pass now automatically generates ORT GenAI config
files (genai_config.json, tokenizer.json, etc.) alongside the ONNX models.
This commit updates the recipe documentation to reflect this feature.