An open-source, MLIR-based machine-learning compiler for TPUs.
English · 简体中文 · Quick Start · Docs · Issues
TPU-MLIR converts pre-trained neural networks from mainstream frameworks into bmodel files that run efficiently on TPUs. Built on top of MLIR, it provides a unified IR, a clean lowering pipeline, and a rich set of tools for quantization, calibration, and deployment.
```
┌───────────────────┐  model_transform.py  ┌─────────┐  model_deploy.py   ┌──────────┐
│ ONNX / PyTorch /  │ ───────────────────► │  MLIR   │ ─────────────────► │  bmodel  │
│ TFLite / Caffe /  │  (front-end import)  │ (TOP →  │ (lowering, quant,  │  on TPU  │
│ HuggingFace       │                      │  TPU)   │  layer-group, …)   │          │
└───────────────────┘                      └─────────┘                    └──────────┘
```
- Multi-framework front-ends — PyTorch, ONNX, TFLite, Caffe (other frameworks via ONNX).
- LLM-ready — one-shot conversion of HuggingFace LLMs (Qwen, MiniCPM-V, …) via `llm_convert.py`.
- Full quantization toolchain — F32 / BF16 / F16 / INT8 (symmetric & asymmetric), AWQ / GPTQ / AutoRound passthrough, calibration, QAT.
- MLIR-based pipeline — clean dialects (Top / Tpu), pattern rewrites, layer-group memory planning.
- Production tooling — `model_runner`, `model_tool`, accuracy validation, simulator, visualizer.
- Bilingual docs & active community — English / 中文 manuals, papers, and video tutorials.
- Installation
- Quick Start (LLM — Qwen)
- Quick Start (Vision — YOLOv5)
- Auxiliary Tools
- Resources
- Citation
- Contributing
- License
## Installation

TPU-MLIR runs inside a prebuilt Docker image. After the container is running you can either install the Python wheel or build from source.
```bash
docker pull sophgo/tpuc_dev:latest
```

If the pull fails, download the tarball and load it manually:
```bash
wget https://sophon-assets.sophon.cn/sophon-prod-s3/drive/25/04/15/16/tpuc_dev_v3.4.tar.gz
docker load -i tpuc_dev_v3.4.tar.gz
```

Create and enter the container:
```bash
docker run --privileged --name tpu-mlir -v $PWD:/workspace -it sophgo/tpuc_dev:latest
```

To install the Python wheel (requires Python ≥ 3.10 on Ubuntu 22.04, already satisfied inside the Docker image):

```bash
pip install tpu_mlir
```

To build from source instead:

```bash
cd /workspace/tpu-mlir
pip install -r requirements.txt
source ./envsetup.sh
./build.sh
```

## Quick Start (LLM — Qwen)

Convert and run a HuggingFace LLM (here: Qwen) on a TPU.
A pre-quantized AWQ / GPTQ / AutoRound build is recommended.
```bash
git lfs install
git clone https://huggingface.co/Intel/Qwen3.5-2B-int4-AutoRound
```

```bash
# If you encounter transformers/torch version issues:
# pip3 install transformers torchvision -U

# --max_input_length sets the max prefill length; if omitted it defaults to -s.
llm_convert.py \
  -m /workspace/Qwen3.5-2B-int4-AutoRound \
  --max_input_length 1024 \
  -s 2048 \
  -c bm1684x \
  --max_pixels 768,768 \
  -o qwen3.5_2b
```

Main arguments of `llm_convert.py`:
| Parameter | Short | Required | Description |
|---|---|---|---|
| `model_path` | `-m` | ✅ | Path to the model weights |
| `seq_length` | `-s` | ✅ | Maximum sequence length |
| `max_input_length` | — | — | Maximum input length; defaults to `seq_length` (`-s`) when omitted |
| `quantize` | `-q` | ✅ | Quantization type: `auto` / `w4bf16` / `w4f16` / `bf16` / `f16`, etc. (omit if the source model is already quantized) |
| `q_group_size` | `-g` | — | Group size for quantization (default 64) |
| `chip` | `-c` | ✅ | Target platform: `bm1684x` / `bm1688` / `cv186ah` |
| `max_pixels` | — | — | Multi-modal max resolution `width,height`. Defaults vary by `model_type` (`qwen2_5_vl`: 672,896; `minicpmv`: 980,980; otherwise 768,768) |
| `out_dir` | `-o` | ✅ | Output directory |
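If the source model is not pre-quantized, pass `--quantize` (and optionally `--q_group_size`) explicitly. A minimal sketch using only the arguments documented above; the checkpoint path is a hypothetical placeholder:

```bash
# Sketch: convert an unquantized checkpoint to 4-bit weights with BF16 activations.
llm_convert.py \
  -m /workspace/Qwen2.5-1.5B-Instruct \
  -s 2048 \
  -q w4bf16 \
  -g 64 \
  -c bm1684x \
  -o qwen2.5_1.5b
```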
Copy the `python_demo` folder onto your device and build it:
```bash
mkdir build && cd build
cmake ..
make
cp *cpython*.so ..
cd ..
```

Then run the bmodel:
```bash
python3 pipeline.py -m xxxx.bmodel -c config
```
## Quick Start (Vision — YOLOv5)

Compile and run `yolov5s.onnx` on the BM1684X TPU. The model is bundled at `regression/model/yolov5s.onnx`.
1. Prepare the working directory
```bash
mkdir model_yolov5s && cd model_yolov5s
cp ${REGRESSION_PATH}/model/yolov5s.onnx .
cp -rf ${REGRESSION_PATH}/dataset/COCO2017 .
cp -rf ${REGRESSION_PATH}/image .
mkdir workspace && cd workspace
```

2. Convert the model to MLIR
If the model takes images as input, the preprocessing must be specified. The preprocessing formula is:

`y = (x − mean) × scale`

where x is the raw pixel value. YOLOv5's official input is RGB scaled by 1/255, so mean = 0,0,0 and scale = 0.0039216,0.0039216,0.0039216 (0.0039216 ≈ 1/255; e.g. a raw pixel value of 128 maps to (128 − 0) × 0.0039216 ≈ 0.502).
```bash
model_transform.py \
  --model_name yolov5s \
  --model_def ../yolov5s.onnx \
  --input_shapes [[1,3,640,640]] \
  --mean 0.0,0.0,0.0 \
  --scale 0.0039216,0.0039216,0.0039216 \
  --keep_aspect_ratio \
  --pixel_format rgb \
  --output_names 350,498,646 \
  --test_input ../image/dog.jpg \
  --test_result yolov5s_top_outputs.npz \
  --mlir yolov5s.mlir
```

Main arguments of `model_transform.py`:
| Argument | Required | Description |
|---|---|---|
| `model_name` | ✅ | Model name |
| `model_def` | ✅ | Model definition file (`.onnx`, `.pt`, `.tflite`, `.prototxt`) |
| `model_data` | — | Caffe weight file (`.caffemodel`) |
| `input_shapes` | — | Input shape, e.g. `[[1,3,640,640]]`; supports multiple inputs |
| `resize_dims` | — | Image resize size before feeding into the model |
| `keep_aspect_ratio` | — | Keep aspect ratio (pads with 0). Off by default |
| `mean` | — | Per-channel mean (default 0,0,0) |
| `scale` | — | Per-channel scale (default 1,1,1) |
| `pixel_format` | — | `rgb` / `bgr` / `gray` / `rgbd` |
| `output_names` | — | Output tensor names. Defaults to model outputs |
| `test_input` | — | Validation input (image / npy / npz). Skipped if not specified |
| `test_result` | — | Output file for validation |
| `excepts` | — | Comma-separated list of layers excluded from validation |
| `debug` | — | Keep intermediate files |
| `mlir` | ✅ | Output MLIR file path |
A `${model_name}_in_f32.npz` file containing the preprocessed input (here `yolov5s_in_f32.npz`) is generated after this step.
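To sanity-check the preprocessed tensor before deploying, you can list its contents; a minimal sketch assuming numpy is available inside the container:

```bash
python3 -c "
import numpy as np
d = np.load('yolov5s_in_f32.npz')
for name in d.files:
    print(name, d[name].shape, d[name].dtype)  # expect (1, 3, 640, 640) float32
"
```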
3. MLIR → F16 bmodel
```bash
model_deploy.py \
  --mlir yolov5s.mlir \
  --quantize F16 \
  --processor bm1684x \
  --test_input yolov5s_in_f32.npz \
  --test_reference yolov5s_top_outputs.npz \
  --model yolov5s_1684x_f16.bmodel
```

Main arguments of `model_deploy.py`:
| Argument | Required | Description |
|---|---|---|
| `mlir` | ✅ | Input MLIR file |
| `quantize` | ✅ | `F32` / `BF16` / `F16` / `INT8` |
| `processor` | ✅ | Target chip |
| `calibration_table` | — | Calibration table (required for INT8) |
| `tolerance` | — | Min similarity between MLIR-quantized and MLIR-fp32 inference |
| `correctness` | — | Min similarity between simulator and MLIR-quantized inference (default 0.99,0.90) |
| `excepts` | — | Comma-separated layers excluded from validation |
| `debug` | — | Keep intermediate files |
| `model` | ✅ | Output bmodel path |
| `dynamic` | — | Dynamic codegen for dynamic shapes |
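The same invocation pattern covers the other precisions; only `--quantize` changes. A minimal F32 build, for instance (a sketch; the optional test inputs are omitted here, so no similarity check is run):

```bash
# Sketch: full-precision bmodel from the same MLIR file.
model_deploy.py \
  --mlir yolov5s.mlir \
  --quantize F32 \
  --processor bm1684x \
  --model yolov5s_1684x_f32.bmodel
```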
4. MLIR → INT8 bmodel
Run calibration first (typically 100–1000 images). Prefer symmetric quantization unless accuracy demands asymmetric.
```bash
run_calibration.py yolov5s.mlir \
  --dataset ../COCO2017 \
  --input_num 100 \
  -o yolov5s_cali_table
```

```bash
model_deploy.py \
  --mlir yolov5s.mlir \
  --quantize INT8 \
  --calibration_table yolov5s_cali_table \
  --processor bm1684x \
  --test_input yolov5s_in_f32.npz \
  --test_reference yolov5s_top_outputs.npz \
  --tolerance 0.85,0.45 \
  --model yolov5s_1684x_int8.bmodel
```

5. Verify the results
The sample script lives at `python/samples/detect_yolov5.py`.
```bash
# ONNX
detect_yolov5.py --input ../image/dog.jpg --model ../yolov5s.onnx --output dog_origin.jpg
# F16 bmodel
detect_yolov5.py --input ../image/dog.jpg --model yolov5s_1684x_f16.bmodel --output dog_f16.jpg
# INT8 bmodel
detect_yolov5.py --input ../image/dog.jpg --model yolov5s_1684x_int8.bmodel --output dog_int8.jpg
```

Compare `dog_origin.jpg`, `dog_f16.jpg`, and `dog_int8.jpg`; the detections should be nearly identical across the three.
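For a numeric check rather than a visual one, you can run the bmodel on the preprocessed input with `model_runner.py` (documented under Auxiliary Tools below) and compare its tensors against the reference dumped by `model_transform.py`. A sketch, assuming the bmodel keeps the same output names as the reference npz and that numpy is available:

```bash
# Run the F16 bmodel on the preprocessed input.
model_runner.py \
  --input yolov5s_in_f32.npz \
  --model yolov5s_1684x_f16.bmodel \
  --output yolov5s_f16_outputs.npz
# Cosine similarity per shared output tensor.
python3 -c "
import numpy as np
ref = np.load('yolov5s_top_outputs.npz')
out = np.load('yolov5s_f16_outputs.npz')
for name in out.files:
    if name in ref.files:
        a, b = ref[name].ravel(), out[name].ravel()
        cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
        print(name, 'cosine similarity:', round(cos, 4))
"
```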
## Auxiliary Tools

`model_runner.py` runs inference on bmodel / MLIR / PyTorch / ONNX / TFLite / Caffe models:
```bash
model_runner.py \
  --input resnet18_in_f32.npz \
  --model resnet18_1684x_f32.bmodel \
  --output resnet18_output.npz
```

`model_tool` inspects and manipulates bmodel files:
```
--info model_file                                  : show brief model info
--print model_file                                 : show detailed model info
--extract model_file                               : split a multi-net bmodel into single-net bmodels
--combine file1 .. fileN -o new_file               : merge bmodels by file path
--combine_dir dir1 .. dirN -o new_dir              : merge bmodels by directory
--dump model_file start_offset byte_size out_file  : dump raw bytes from a bmodel
```
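For example, to merge two single-net bmodels into one multi-net bmodel with `--combine` (a sketch; the file names are placeholders):

```bash
# Sketch: combine two bmodels produced by separate model_deploy.py runs.
model_tool --combine net_a.bmodel net_b.bmodel -o combined.bmodel
```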
To show brief info for a model:

```bash
model_tool --info resnet18_1684x_f32.bmodel
```

## Resources

| Type | Link |
|---|---|
| Paper | TPU-MLIR (arXiv 2210.15016) |
| Manual | Technical Reference Manual |
| Guide | Quick Start |
Video tutorials:
| # | Topic | Links |
|---|---|---|
| 01 | What is a Deep Learning Compiler? | Intro |
| 02 | MLIR Intro | Syntax 1 · Syntax 2 · Syntax 3 · Dialect Conversion · Pattern Rewriting |
| 03 | TPU-MLIR Intro | Overview · Front-end · Lowering |
| 04 | Quantization | Overview · Formula · Calibration · QAT |
| 05 | TPU Memory | Ep1 · Ep2 |
| 06 | TPU-MLIR Practice | To ONNX · Graph Optimization · Operator Support · Model Support · Fuse Preprocess · Accuracy Validation |
## Citation

If TPU-MLIR helps your research, please cite:
```bibtex
@misc{tpumlir2022,
  title         = {TPU-MLIR: A Compiler For TPU Using MLIR},
  author        = {Hu, Pengchao and Lu, Man and Wang, Lei and Jiang, Guoyue},
  year          = {2022},
  eprint        = {2210.15016},
  archivePrefix = {arXiv},
  primaryClass  = {cs.PL}
}
```

## Contributing

Bug reports, feature requests and pull requests are welcome! Before you start:
- Search existing issues to avoid duplicates.
- For non-trivial changes, open an issue first to discuss the design.
- Run the regression tests under `regression/` before sending a PR.
## License

This project is licensed under the terms of the LICENSE file in the root of this repository.


