Conversation

@Edwardf0t1 Edwardf0t1 commented Jan 16, 2026

What does this PR do?

Type of change: ?

Overview: Enable the GLM-4.7 PTQ workflow, including loading the standalone MTP modules and exporting them as-is.

Usage

python3 hf_ptq.py --pyt_ckpt_path /home/omniml_data_3/models/GLM-4.7 --qformat nvfp4_mlp_only --export_path /home/omniml_data_3/zhiyuc/checkpoints/GLM-4.7-NVFP4-0203 --trust_remote_code

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes

Additional Information

Summary by CodeRabbit

  • New Features

    • Added quantization support for GLM-4.7 model with automatic handling of specialized layer architecture.
    • Added image-text data calibration capabilities for Nemotron VL model quantization.
  • Documentation

    • Updated support matrix to reflect newly supported models and quantization features.


copy-pr-bot bot commented Jan 16, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.



codecov bot commented Jan 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.73%. Comparing base (e024097) to head (5e42017).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #792   +/-   ##
=======================================
  Coverage   73.73%   73.73%           
=======================================
  Files         196      196           
  Lines       20412    20412           
=======================================
  Hits        15050    15050           
  Misses       5362     5362           

☔ View full report in Codecov by Sentry.

calibration_only = True

# Load any missing weights from non-standard safetensors (handled in get_model for non-low-memory mode)
from example_utils import load_mtp_weights_if_needed
Collaborator:
please move to the top

Contributor Author:
done

return False

# Load the index to find all referenced safetensors files
with open(index_file) as f:
Collaborator:
nit: single line: index = json.loads(index_path.read_text())

Contributor Author:
This single-line version is not really “better” here: json.load(f) is idiomatic and streams from the file object, while json.loads(index_file.read_text()) reads the whole file into memory as a string first.
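A quick illustration of the distinction, using only the standard library (the index payload below is a made-up stand-in for a real safetensors index):

```python
import io
import json

index_payload = '{"weight_map": {"model.layers.0.weight": "model-00001.safetensors"}}'

# json.load consumes a file-like object directly, without requiring the
# caller to read the whole document into a string first.
with io.StringIO(index_payload) as f:
    index = json.load(f)

# json.loads parses a string that is already fully in memory.
index_again = json.loads(index_payload)

assert index == index_again
assert index["weight_map"]["model.layers.0.weight"] == "model-00001.safetensors"
```

For small index files the difference is negligible either way; the point is stylistic, not a performance concern.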

@Edwardf0t1 Edwardf0t1 force-pushed the zhiyu/glm-4.7-mtp-support branch from ac6b609 to 39c6195 Compare February 4, 2026 00:14

coderabbitai bot commented Feb 4, 2026

📝 Walkthrough

Adds PTQ support for GLM-4.7 models by implementing utilities to load MTP layer weights from separate files and automatically exclude these layers from quantization during the PTQ process. Includes documentation updates reflecting the new model support.

Changes

Cohort / File(s) — Summary

  • Documentation (CHANGELOG.rst, examples/llm_ptq/README.md): Added changelog entries and support matrix documentation for GLM-4.7 PTQ support, including footnotes describing MTP layer behavior and exclusion from quantization.
  • MTP Weight Loading & Integration (examples/llm_ptq/example_utils.py): Introduces the load_mtp_weights_if_needed() utility function to inspect safetensors indices, load non-standard weight shards, and identify MTP layer prefixes. Integrated into model initialization in get_model() to attach discovered prefixes to the model instance.
  • Quantization Config Exclusion (examples/llm_ptq/hf_ptq.py, modelopt/torch/export/unified_export_hf.py): Reads MTP layer prefixes and constructs exclusion patterns in the quantization configuration to prevent these layers from being quantized during PTQ processing.
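The prefix discovery and quantization exclusion described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual implementation: the function names, the prefix depth, the shard filename, and the config shape are all assumptions.

```python
def find_mtp_prefixes(weight_map: dict[str, str], mtp_shard: str = "mtp.safetensors") -> set[str]:
    """Collect module prefixes whose tensors live in the non-standard MTP shard (assumed name)."""
    prefixes = set()
    for tensor_name, shard in weight_map.items():
        if shard == mtp_shard:
            # keep e.g. "model.layers.92" from "model.layers.92.mlp.gate.weight"
            prefixes.add(".".join(tensor_name.split(".")[:3]))
    return prefixes

def exclude_from_quant(quant_cfg: dict, prefixes: set[str]) -> dict:
    """Disable quantization for each discovered MTP prefix via wildcard patterns."""
    for prefix in sorted(prefixes):
        quant_cfg["quant_cfg"][f"*{prefix}*"] = {"enable": False}
    return quant_cfg

# Toy weight map mixing standard shards with a separate MTP shard:
weight_map = {
    "model.layers.91.mlp.down_proj.weight": "model-00163.safetensors",
    "model.layers.92.mlp.down_proj.weight": "mtp.safetensors",
    "model.layers.92.self_attn.q_proj.weight": "mtp.safetensors",
}
cfg = exclude_from_quant({"quant_cfg": {}}, find_mtp_prefixes(weight_map))
assert cfg["quant_cfg"] == {"*model.layers.92*": {"enable": False}}
```

Wildcard-style exclusion keeps the MTP layers in the exported checkpoint while leaving their weights unquantized, which matches the "export as-is" behavior in the PR description.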

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 42.86%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed — Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed — The title 'GLM-4.7 MTP support' directly describes the main change: adding support for GLM-4.7's MTP (Multi-Token Prediction) modules in the PTQ workflow.


Signed-off-by: Zhiyu Cheng <[email protected]>

- Add ``--opset`` option to ONNX quantization CLI to specify the target opset version for the quantized model.
- Add support for context parallelism in Eagle speculative decoding for huggingface and megatron core models.
- Add PTQ support for GLM-4.7, including loading MTP layer weights from a separate ``mtp.safetensors`` file and export as-is.
- Add support for image-text data calibration in PTQ for Nemotron VL models.
Contributor Author:
This is added for this PR: #755

@Edwardf0t1 Edwardf0t1 marked this pull request as ready for review February 4, 2026 01:45
@Edwardf0t1 Edwardf0t1 requested review from a team as code owners February 4, 2026 01:45
break

# Load the weights
weights = load_file(str(filepath))
Collaborator:
then maybe do device="cpu" here?

@Edwardf0t1 Edwardf0t1 self-assigned this Feb 4, 2026