
Add Gemma 4 E2B recipes (MobiusModelBuilder + INT4) #381

Open

justinchuby wants to merge 8 commits into main from justinchu/gemma4-mobius

Conversation

@justinchuby justinchuby commented Apr 23, 2026

Olive recipes for google/gemma-4-E2B-it using the MobiusModelBuilder pass (Olive PR microsoft/Olive#2406).

Recipes

| Config | Pipeline | Output |
| --- | --- | --- |
| gemma4_fp32_cpu.json | MobiusModelBuilder (fp32) | 4 ONNX components (~5GB) |
| gemma4_int4_cuda.json | MobiusModelBuilder (fp16) → OnnxBlockWiseRtnQuantization (int4) | 4 quantized ONNX components (~2.8GB) |
| gemma4_int4_kquant_cpu.json | MobiusModelBuilder (fp32) → OnnxKQuantQuantization (int4) | 4 quantized ONNX components (k-quant, CPU) |
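The INT4 pipelines rely on round-to-nearest (RTN) quantization applied block-wise over the weights. As an illustration of the idea (not Olive's implementation; the symmetric scheme, signed [-8, 7] range, and block size 32 here are assumptions), a minimal NumPy sketch:

```python
import numpy as np

def rtn_quantize_blockwise(w: np.ndarray, block_size: int = 32):
    """Symmetric round-to-nearest INT4 quantization over 1-D blocks.

    Returns (q, scales): q holds int4 values in [-8, 7], and scales
    has one fp32 scale per block.
    """
    flat = w.reshape(-1, block_size)  # assumes size divides evenly
    scales = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)  # avoid divide-by-zero
    q = np.clip(np.round(flat / scales), -8, 7).astype(np.int8)
    return q, scales.astype(np.float32)

def rtn_dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct approximate fp32 weights from int4 values and scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(128).astype(np.float32)
q, s = rtn_quantize_blockwise(w)
w_hat = rtn_dequantize(q, s)
```

Because each block is scaled by its own maximum, the per-element reconstruction error is bounded by half a quantization step for that block; this is what keeps block-wise RTN usable without calibration data.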

Prerequisites

```shell
pip install olive-ai[gpu] mobius-ai
```

Validated

INT4 CUDA pipeline tested end-to-end (~3.5 min):

  • 97-98% of weights quantized (MatMulNBits + GatherBlockQuantized)
  • 4 components: decoder (2.4G), audio (152M), embedding (199M), vision (89M)
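As a quick consistency check, the four component sizes listed above sum to the ~2.8GB total quoted for the INT4 CUDA build (treating the decoder's 2.4G as 2.4 × 1024 MB):

```python
# Reported component sizes in MB (decoder's 2.4G converted at 1024 MB/GB).
components = {"decoder": 2.4 * 1024, "audio": 152, "embedding": 199, "vision": 89}
total_gb = sum(components.values()) / 1024
print(round(total_gb, 2))  # ~2.83 GB, consistent with the ~2.8GB figure
```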

Two Olive pipeline configs for google/gemma-4-E2B-it:
- gemma4_fp32_cpu.json: FP32 build for CPU
- gemma4_int4_cuda.json: FP16 build + OnnxBlockWiseRtnQuantization INT4

Requires olive-ai with MobiusModelBuilder pass (Olive PR #2406)
and mobius-ai package.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 23, 2026 23:43
Contributor

Copilot AI left a comment


Pull request overview

Adds Olive recipe configs for building and quantizing the HuggingFace google/gemma-4-E2B-it model using the MobiusModelBuilder pass, including an INT4 CUDA pipeline.

Changes:

  • Added FP32 CPU build recipe (gemma4_fp32_cpu.json) using MobiusModelBuilder.
  • Added FP16 build + INT4 RTN quantization CUDA recipe (gemma4_int4_cuda.json) using OnnxBlockWiseRtnQuantization.
  • Added model-specific README documenting prerequisites and how to run both recipes.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| google-gemma-4-E2B-it/gemma4_int4_cuda.json | New CUDA recipe: build FP16 with Mobius, then INT4 block-wise RTN quantization. |
| google-gemma-4-E2B-it/gemma4_fp32_cpu.json | New CPU recipe: build FP32 with Mobius. |
| google-gemma-4-E2B-it/README.md | Usage docs for running the two new recipes. |


justinchuby and others added 2 commits April 23, 2026 16:52
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@devang-ml (Contributor) commented:

Please add info.yml and LICENSE file.

@justinchuby (Author) commented:

@copilot Add info.yml and LICENSE file.

Agent-Logs-Url: https://github.com/microsoft/olive-recipes/sessions/5a084083-4534-4971-82c9-91d806abcd01

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>

Copilot AI commented Apr 24, 2026

> @copilot Add info.yml and LICENSE file.

Added info.yml and LICENSE (Apache 2.0, matching the google-deepmind/gemma repo) to google-gemma-4-E2B-it/ in commit 58b231a.


Warning

Firewall rules blocked me from connecting to one or more addresses:

  • ai.google.dev (dns block)
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js
The MobiusModelBuilder pass now automatically generates ORT GenAI config
files (genai_config.json, tokenizer.json, etc.) alongside the ONNX models.
This commit updates the recipe documentation to reflect this feature.