Add Olmo-3-7B-Instruct, SmolLM3-3B, and Mistral-NeMo-12B-Instruct recipes #391

Open

hanbitmyths wants to merge 2 commits into microsoft:main
Conversation
…ipes

Adds Olive recipes for three new models with cpu/cuda/webgpu backends and a PyTorch baseline:

- allenai/Olmo-3-7B-Instruct
- HuggingFaceTB/SmolLM3-3B
- nvidia/Mistral-NeMo-12B-Instruct (mistralai/Mistral-Nemo-Instruct-2407)

Each backend dir includes fp32/fp16 and int4 configs (with and without MMLU eval).
Pull request overview
Adds Olive optimization + evaluation recipes for three new instruction-tuned LLMs (OLMo-3-7B-Instruct, SmolLM3-3B, and Mistral-NeMo-12B-Instruct) across CPU/CUDA/WebGPU backends, plus PyTorch baseline evaluation configs for MMLU comparison.
Changes:
- Introduces per-backend (cpu/cuda/webgpu) FP baseline and INT4 export workflows, including `_with_eval` MMLU evaluator variants (see the fragment after this list).
- Adds per-model baseline PyTorch+MMLU evaluation recipes for accuracy comparison.
- Adds per-model Apache-2.0 LICENSE files and per-backend documentation/requirements.
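Judging by the file pairs listed below, the `_with_eval` variants differ from their base workflows mainly by the evaluator wiring. The fragment is abridged from the configs excerpted later on this page (the `provider_options` key is omitted here):

```json
"evaluators": {
    "mmlu": {
        "type": "LMEvaluator",
        "tasks": [
            "mmlu"
        ],
        "batch_size": 1,
        "max_length": 2048
    }
},
"evaluator": "mmlu"
```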
Reviewed changes
Copilot reviewed 78 out of 78 changed files in this pull request and generated 21 comments.
| File | Description |
|---|---|
| nvidia-Mistral-NeMo-12B-Instruct/webgpu/requirements.txt | WebGPU dependency set for Mistral-NeMo recipes |
| nvidia-Mistral-NeMo-12B-Instruct/webgpu/nvidia-Mistral-NeMo-12B-Instruct_webgpu_int4_with_eval.json | WebGPU INT4 workflow with MMLU evaluator |
| nvidia-Mistral-NeMo-12B-Instruct/webgpu/nvidia-Mistral-NeMo-12B-Instruct_webgpu_int4.json | WebGPU INT4 workflow (no artifacts) |
| nvidia-Mistral-NeMo-12B-Instruct/webgpu/nvidia-Mistral-NeMo-12B-Instruct_webgpu_fp16_with_eval.json | WebGPU FP16 workflow with MMLU evaluator |
| nvidia-Mistral-NeMo-12B-Instruct/webgpu/nvidia-Mistral-NeMo-12B-Instruct_webgpu_fp16.json | WebGPU FP16 workflow (no artifacts) |
| nvidia-Mistral-NeMo-12B-Instruct/webgpu/info.yaml | Registers WebGPU recipes for discovery/metadata |
| nvidia-Mistral-NeMo-12B-Instruct/webgpu/README.md | WebGPU build/eval usage + notes/results |
| nvidia-Mistral-NeMo-12B-Instruct/cuda/requirements.txt | CUDA dependency set for Mistral-NeMo recipes |
| nvidia-Mistral-NeMo-12B-Instruct/cuda/nvidia-Mistral-NeMo-12B-Instruct_cuda_int4_with_eval.json | CUDA INT4 workflow with MMLU evaluator |
| nvidia-Mistral-NeMo-12B-Instruct/cuda/nvidia-Mistral-NeMo-12B-Instruct_cuda_int4.json | CUDA INT4 workflow (no artifacts) |
| nvidia-Mistral-NeMo-12B-Instruct/cuda/nvidia-Mistral-NeMo-12B-Instruct_cuda_fp16_with_eval.json | CUDA FP16 workflow with MMLU evaluator |
| nvidia-Mistral-NeMo-12B-Instruct/cuda/nvidia-Mistral-NeMo-12B-Instruct_cuda_fp16.json | CUDA FP16 workflow (no artifacts) |
| nvidia-Mistral-NeMo-12B-Instruct/cuda/info.yaml | Registers CUDA recipes for discovery/metadata |
| nvidia-Mistral-NeMo-12B-Instruct/cuda/README.md | CUDA build/eval usage + notes/results |
| nvidia-Mistral-NeMo-12B-Instruct/cpu/requirements.txt | CPU dependency set for Mistral-NeMo recipes |
| nvidia-Mistral-NeMo-12B-Instruct/cpu/nvidia-Mistral-NeMo-12B-Instruct_cpu_int4_with_eval.json | CPU INT4 workflow with MMLU evaluator |
| nvidia-Mistral-NeMo-12B-Instruct/cpu/nvidia-Mistral-NeMo-12B-Instruct_cpu_int4.json | CPU INT4 workflow (no artifacts) |
| nvidia-Mistral-NeMo-12B-Instruct/cpu/nvidia-Mistral-NeMo-12B-Instruct_cpu_fp32_with_eval.json | CPU FP32 workflow with MMLU evaluator |
| nvidia-Mistral-NeMo-12B-Instruct/cpu/nvidia-Mistral-NeMo-12B-Instruct_cpu_fp32.json | CPU FP32 workflow (no artifacts) |
| nvidia-Mistral-NeMo-12B-Instruct/cpu/info.yaml | Registers CPU recipes for discovery/metadata |
| nvidia-Mistral-NeMo-12B-Instruct/cpu/README.md | CPU build/eval usage + notes/results |
| nvidia-Mistral-NeMo-12B-Instruct/baseline/requirements.txt | Baseline (PyTorch eval) dependency set |
| nvidia-Mistral-NeMo-12B-Instruct/baseline/nvidia-Mistral-NeMo-12B-Instruct_pytorch_with_eval.json | PyTorch MMLU baseline evaluation config |
| nvidia-Mistral-NeMo-12B-Instruct/baseline/info.yaml | Registers baseline recipe for discovery/metadata |
| nvidia-Mistral-NeMo-12B-Instruct/baseline/README.md | Baseline eval usage + reported results |
| nvidia-Mistral-NeMo-12B-Instruct/LICENSE | Per-model Apache-2.0 license file |
| allenai-Olmo-3-7B-Instruct/webgpu/requirements.txt | WebGPU dependency set for OLMo recipes |
| allenai-Olmo-3-7B-Instruct/webgpu/info.yaml | Registers WebGPU recipes for discovery/metadata |
| allenai-Olmo-3-7B-Instruct/webgpu/allenai-Olmo-3-7B-Instruct_webgpu_int4_with_eval.json | WebGPU INT4 workflow with MMLU evaluator |
| allenai-Olmo-3-7B-Instruct/webgpu/allenai-Olmo-3-7B-Instruct_webgpu_int4.json | WebGPU INT4 workflow (no artifacts) |
| allenai-Olmo-3-7B-Instruct/webgpu/allenai-Olmo-3-7B-Instruct_webgpu_fp16_with_eval.json | WebGPU FP16 workflow with MMLU evaluator |
| allenai-Olmo-3-7B-Instruct/webgpu/allenai-Olmo-3-7B-Instruct_webgpu_fp16.json | WebGPU FP16 workflow (no artifacts) |
| allenai-Olmo-3-7B-Instruct/webgpu/README.md | WebGPU build/eval usage + notes/results |
| allenai-Olmo-3-7B-Instruct/cuda/requirements.txt | CUDA dependency set for OLMo recipes |
| allenai-Olmo-3-7B-Instruct/cuda/info.yaml | Registers CUDA recipes for discovery/metadata |
| allenai-Olmo-3-7B-Instruct/cuda/allenai-Olmo-3-7B-Instruct_cuda_int4_with_eval.json | CUDA INT4 workflow with MMLU evaluator |
| allenai-Olmo-3-7B-Instruct/cuda/allenai-Olmo-3-7B-Instruct_cuda_int4.json | CUDA INT4 workflow (no artifacts) |
| allenai-Olmo-3-7B-Instruct/cuda/allenai-Olmo-3-7B-Instruct_cuda_fp16_with_eval.json | CUDA FP16 workflow with MMLU evaluator |
| allenai-Olmo-3-7B-Instruct/cuda/allenai-Olmo-3-7B-Instruct_cuda_fp16.json | CUDA FP16 workflow (no artifacts) |
| allenai-Olmo-3-7B-Instruct/cuda/README.md | CUDA build/eval usage + notes/results |
| allenai-Olmo-3-7B-Instruct/cpu/requirements.txt | CPU dependency set for OLMo recipes |
| allenai-Olmo-3-7B-Instruct/cpu/info.yaml | Registers CPU recipes for discovery/metadata |
| allenai-Olmo-3-7B-Instruct/cpu/allenai-Olmo-3-7B-Instruct_cpu_int4_with_eval.json | CPU INT4 workflow with MMLU evaluator |
| allenai-Olmo-3-7B-Instruct/cpu/allenai-Olmo-3-7B-Instruct_cpu_int4.json | CPU INT4 workflow (no artifacts) |
| allenai-Olmo-3-7B-Instruct/cpu/allenai-Olmo-3-7B-Instruct_cpu_fp32_with_eval.json | CPU FP32 workflow with MMLU evaluator |
| allenai-Olmo-3-7B-Instruct/cpu/allenai-Olmo-3-7B-Instruct_cpu_fp32.json | CPU FP32 workflow (no artifacts) |
| allenai-Olmo-3-7B-Instruct/cpu/README.md | CPU build/eval usage + notes/results |
| allenai-Olmo-3-7B-Instruct/baseline/requirements.txt | Baseline (PyTorch eval) dependency set |
| allenai-Olmo-3-7B-Instruct/baseline/info.yaml | Registers baseline recipe for discovery/metadata |
| allenai-Olmo-3-7B-Instruct/baseline/allenai-Olmo-3-7B-Instruct_pytorch_with_eval.json | PyTorch MMLU baseline evaluation config |
| allenai-Olmo-3-7B-Instruct/baseline/README.md | Baseline eval usage + reported results |
| allenai-Olmo-3-7B-Instruct/LICENSE | Per-model Apache-2.0 license file |
| HuggingFaceTB-SmolLM3-3B/webgpu/requirements.txt | WebGPU dependency set for SmolLM3 recipes |
| HuggingFaceTB-SmolLM3-3B/webgpu/info.yaml | Registers WebGPU recipes for discovery/metadata |
| HuggingFaceTB-SmolLM3-3B/webgpu/README.md | WebGPU build/eval usage + notes/results |
| HuggingFaceTB-SmolLM3-3B/webgpu/HuggingFaceTB-SmolLM3-3B_webgpu_int4_with_eval.json | WebGPU INT4 workflow with MMLU evaluator |
| HuggingFaceTB-SmolLM3-3B/webgpu/HuggingFaceTB-SmolLM3-3B_webgpu_int4.json | WebGPU INT4 workflow (no artifacts) |
| HuggingFaceTB-SmolLM3-3B/webgpu/HuggingFaceTB-SmolLM3-3B_webgpu_fp16_with_eval.json | WebGPU FP16 workflow with MMLU evaluator |
| HuggingFaceTB-SmolLM3-3B/webgpu/HuggingFaceTB-SmolLM3-3B_webgpu_fp16.json | WebGPU FP16 workflow (no artifacts) |
| HuggingFaceTB-SmolLM3-3B/cuda/requirements.txt | CUDA dependency set for SmolLM3 recipes |
| HuggingFaceTB-SmolLM3-3B/cuda/info.yaml | Registers CUDA recipes for discovery/metadata |
| HuggingFaceTB-SmolLM3-3B/cuda/README.md | CUDA build/eval usage + notes/results |
| HuggingFaceTB-SmolLM3-3B/cuda/HuggingFaceTB-SmolLM3-3B_cuda_int4_with_eval.json | CUDA INT4 workflow with MMLU evaluator |
| HuggingFaceTB-SmolLM3-3B/cuda/HuggingFaceTB-SmolLM3-3B_cuda_int4.json | CUDA INT4 workflow (no artifacts) |
| HuggingFaceTB-SmolLM3-3B/cuda/HuggingFaceTB-SmolLM3-3B_cuda_fp16_with_eval.json | CUDA FP16 workflow with MMLU evaluator |
| HuggingFaceTB-SmolLM3-3B/cuda/HuggingFaceTB-SmolLM3-3B_cuda_fp16.json | CUDA FP16 workflow (no artifacts) |
| HuggingFaceTB-SmolLM3-3B/cpu/requirements.txt | CPU dependency set for SmolLM3 recipes |
| HuggingFaceTB-SmolLM3-3B/cpu/info.yaml | Registers CPU recipes for discovery/metadata |
| HuggingFaceTB-SmolLM3-3B/cpu/README.md | CPU build/eval usage + notes/results |
| HuggingFaceTB-SmolLM3-3B/cpu/HuggingFaceTB-SmolLM3-3B_cpu_int4_with_eval.json | CPU INT4 workflow with MMLU evaluator |
| HuggingFaceTB-SmolLM3-3B/cpu/HuggingFaceTB-SmolLM3-3B_cpu_int4.json | CPU INT4 workflow (no artifacts) |
| HuggingFaceTB-SmolLM3-3B/cpu/HuggingFaceTB-SmolLM3-3B_cpu_fp32_with_eval.json | CPU FP32 workflow with MMLU evaluator |
| HuggingFaceTB-SmolLM3-3B/cpu/HuggingFaceTB-SmolLM3-3B_cpu_fp32.json | CPU FP32 workflow (no artifacts) |
| HuggingFaceTB-SmolLM3-3B/baseline/requirements.txt | Baseline (PyTorch eval) dependency set |
| HuggingFaceTB-SmolLM3-3B/baseline/info.yaml | Registers baseline recipe for discovery/metadata |
| HuggingFaceTB-SmolLM3-3B/baseline/README.md | Baseline eval usage + reported results |
| HuggingFaceTB-SmolLM3-3B/baseline/HuggingFaceTB-SmolLM3-3B_pytorch_with_eval.json | PyTorch MMLU baseline evaluation config |
| HuggingFaceTB-SmolLM3-3B/LICENSE | Per-model Apache-2.0 license file |
requirements.txt (identical copies appear in several of the per-model recipe folders):

```
sentencepiece
tiktoken
torch
transformers>=4.57.1
```
Comment on lines +2 to +54 (the SmolLM3-3B CUDA FP16 `_with_eval` workflow):

```json
"input_model": {
    "type": "HfModel",
    "model_path": "HuggingFaceTB/SmolLM3-3B",
    "load_kwargs": {
        "torch_dtype": "float16"
    }
},
"systems": {
    "local_system": {
        "type": "LocalSystem",
        "accelerators": [
            {
                "device": "gpu",
                "execution_providers": [
                    "CUDAExecutionProvider"
                ]
            }
        ]
    }
},
"passes": {
    "m": {
        "type": "ModelBuilder",
        "precision": "fp16"
    },
    "t": {
        "type": "GraphSurgeries",
        "surgeries": [
            {
                "surgeon": "TieWordEmbeddings"
            }
        ]
    }
},
"target": "local_system",
"log_severity_level": 0,
"output_dir": "model_cuda_fp16",
"cache_dir": "cache_cuda_fp16",
"evaluators": {
    "mmlu": {
        "type": "LMEvaluator",
        "tasks": [
            "mmlu"
        ],
        "batch_size": 1,
        "max_length": 2048,
        "provider_options": {
            "enable_skip_layer_norm_strict_mode": "1"
        }
    }
},
"evaluator": "mmlu",
"evaluate_input_model": false
```
Comment on lines +2 to +54 (the SmolLM3-3B CPU FP32 `_with_eval` workflow):

```json
"input_model": {
    "type": "HfModel",
    "model_path": "HuggingFaceTB/SmolLM3-3B",
    "load_kwargs": {
        "torch_dtype": "float16"
    }
},
"systems": {
    "local_system": {
        "type": "LocalSystem",
        "accelerators": [
            {
                "device": "cpu",
                "execution_providers": [
                    "CPUExecutionProvider"
                ]
            }
        ]
    }
},
"passes": {
    "m": {
        "type": "ModelBuilder",
        "precision": "fp32"
    },
    "t": {
        "type": "GraphSurgeries",
        "surgeries": [
            {
                "surgeon": "TieWordEmbeddings"
            }
        ]
    }
},
"target": "local_system",
"log_severity_level": 0,
"output_dir": "model_cpu_fp32",
"cache_dir": "cache_cpu_fp32",
"evaluators": {
    "mmlu": {
        "type": "LMEvaluator",
        "tasks": [
            "mmlu"
        ],
        "batch_size": 1,
        "max_length": 2048,
        "provider_options": {
            "enable_skip_layer_norm_strict_mode": "1"
        }
    }
},
"evaluator": "mmlu",
"evaluate_input_model": false
```
Comment on lines +2 to +54 (the SmolLM3-3B WebGPU FP16 `_with_eval` workflow):

```json
"input_model": {
    "type": "HfModel",
    "model_path": "HuggingFaceTB/SmolLM3-3B",
    "load_kwargs": {
        "torch_dtype": "float16"
    }
},
"systems": {
    "local_system": {
        "type": "LocalSystem",
        "accelerators": [
            {
                "device": "gpu",
                "execution_providers": [
                    "WebGpuExecutionProvider"
                ]
            }
        ]
    }
},
"passes": {
    "m": {
        "type": "ModelBuilder",
        "precision": "fp16"
    },
    "t": {
        "type": "GraphSurgeries",
        "surgeries": [
            {
                "surgeon": "TieWordEmbeddings"
            }
        ]
    }
},
"target": "local_system",
"log_severity_level": 0,
"output_dir": "model_webgpu_fp16",
"cache_dir": "cache_webgpu_fp16",
"evaluators": {
    "mmlu": {
        "type": "LMEvaluator",
        "tasks": [
            "mmlu"
        ],
        "batch_size": 1,
        "max_length": 2048,
        "provider_options": {
            "enable_skip_layer_norm_strict_mode": "1"
        }
    }
},
"evaluator": "mmlu",
"evaluate_input_model": false
```
A WebGPU requirements.txt:

```
torch
transformers>=4.57.1
onnxruntime-genai
onnxruntime-webgpu
```
Adds Olive recipes for three new instruction-tuned LLMs, each with cpu / cuda / webgpu backends and a PyTorch baseline directory for MMLU comparison.
Models
- `allenai/Olmo-3-7B-Instruct` (arch: `olmo3`)
- `HuggingFaceTB/SmolLM3-3B` (arch: `smollm3`)
- `nvidia/Mistral-NeMo-12B-Instruct` (`mistralai/Mistral-Nemo-Instruct-2407`, arch: `mistral`)

Per-backend layout (matches existing Olive recipe convention)
Each `cpu/`, `cuda/`, `webgpu/` subfolder contains:

- `<model>_<backend>_fp32.json` / `_fp16.json` (FP baseline export)
- `<model>_<backend>_int4.json` (canonical INT4 export)
- `*_with_eval.json` variants that add an MMLU evaluator
- `info.yaml`, `README.md`, `requirements.txt`
A `baseline/` folder per model carries the PyTorch+MMLU evaluation recipe used to compare INT4 vs. FP baselines.
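The baseline JSON itself is not excerpted on this page, so the following is only a rough sketch of what such a recipe plausibly looks like, reusing the `HfModel` and `LMEvaluator` shapes from the ONNX configs above; the pass-free structure and `evaluate_input_model: true` are assumptions, not confirmed contents of this PR:

```json
{
    "_comment": "Illustrative sketch only, not the PR's actual baseline file",
    "input_model": {
        "type": "HfModel",
        "model_path": "HuggingFaceTB/SmolLM3-3B"
    },
    "evaluators": {
        "mmlu": {
            "type": "LMEvaluator",
            "tasks": [
                "mmlu"
            ],
            "batch_size": 1,
            "max_length": 2048
        }
    },
    "evaluator": "mmlu",
    "evaluate_input_model": true
}
```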
INT4 quantization pipeline

`SelectiveMixedPrecision` (`kld_gradient` for 3B/7B, `k_quant_last` for 12B-on-WebGPU to fit in 80 GB) → `GPTQ` → `RTN` (8-bit lm_head/embeds via overrides) → `ModelBuilder` (`precision=int4`). Group size 128 for cpu/cuda, 32 for webgpu.
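For concreteness, here is a hedged sketch of how that chain could be spelled in one of the cpu/cuda INT4 configs. The pass names come from the description above, but the exact `type` strings and option keys (`algorithm`, `group_size`, `overrides`) are assumptions, not this PR's verified schema:

```json
"passes": {
    "_comment": "Sketch only: option keys and type spellings are assumptions",
    "p": {
        "type": "SelectiveMixedPrecision",
        "algorithm": "kld_gradient"
    },
    "q": {
        "type": "Gptq",
        "group_size": 128
    },
    "r": {
        "type": "Rtn",
        "overrides": {
            "lm_head": { "bits": 8 },
            "embed_tokens": { "bits": 8 }
        }
    },
    "m": {
        "type": "ModelBuilder",
        "precision": "int4"
    }
}
```

Per the description, the webgpu variants would use group size 32, and the 12B WebGPU config would select `k_quant_last` instead of `kld_gradient`.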
All 9 INT4 artifacts (3 models × 3 backends) were rebuilt from clean caches and validated:
onnxruntime_genairepetition heuristic on Linux (cpu/cuda); WebGPU EP is browser-only so its repetition test on Linux is skipped.Top-level Apache 2.0
LICENSEadded per model, matching the Qwen3 recipe pattern.