Conversation
Code Review
This pull request introduces the teacher_model_group argument and refactors the GKDTrainer to use a choose_teacher_model method for selecting teacher models during loss computation. Feedback highlights that the use_mopd flag is missing from the argument definitions, which will result in an AttributeError. Additionally, the choose_teacher_model implementation is currently incomplete and will cause crashes when MOPD is enabled, and the core logic for incorporating teacher weights into the JSD loss calculation is still pending.
```python
    remotely. When this is set, `teacher_model` is not required. Defaults to None.
    """
    teacher_model: Optional[str] = None
    teacher_model_group: List[str] = field(default_factory=list)
```
The `use_mopd` flag is referenced in `GKDTrainer` but is not declared in the argument definitions; it should be added here to avoid an `AttributeError`. It is also recommended to update the `TeacherModelArguments` docstring to describe `teacher_model_group` and `use_mopd`.
```diff
  teacher_model_group: List[str] = field(default_factory=list)
+ use_mopd: bool = False
```
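Putting the suggested field together with the updated docstring the comment asks for, the dataclass might look like the following sketch (the docstring wording and the class shape around the diffed lines are assumptions, not the PR's actual code):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TeacherModelArguments:
    """Arguments for the teacher model(s).

    teacher_model: model id or path of a single teacher. When a remote teacher
        is used, this is not required. Defaults to None.
    teacher_model_group: candidate teacher models to choose from when MOPD is
        enabled. Defaults to an empty list.
    use_mopd: enable Multi-Teacher On-Policy Distillation (MOPD).
        Defaults to False.
    """
    teacher_model: Optional[str] = None
    teacher_model_group: List[str] = field(default_factory=list)
    use_mopd: bool = False
```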
```python
def choose_teacher_model(self):
    if not self.args.use_mopd:
        return self.teacher_model
    # TODO: when MOPD is enabled, select the best model from the teacher model group
```
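As the review notes, the MOPD branch currently falls through and returns `None`, which will crash downstream. One plausible shape for the missing selection step is sketched below; `score_fn` is a hypothetical hook (e.g. a teacher's log-likelihood on the student's on-policy samples), since the PR leaves the actual selection criterion as a TODO:

```python
from typing import Callable, Dict, List

def choose_best_teacher(teacher_names: List[str],
                        score_fn: Callable[[str], float]) -> str:
    """Return the teacher that scores highest on the current batch.

    `score_fn` is an assumed interface, not part of the PR: it maps a
    teacher name to a scalar fitness score for the current on-policy data.
    """
    scores: Dict[str, float] = {name: score_fn(name) for name in teacher_names}
    # Pick the teacher with the maximum score for this batch.
    return max(scores, key=scores.get)
```

In the trainer, the `use_mopd` branch would call something like this per step instead of returning implicitly.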
```python
t_log_probs = F.log_softmax(t_chunk, dim=-1)
del s_chunk, t_chunk
```

```python
# TODO: MOPD loss computation that incorporates per-teacher weights
```
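The pending piece is folding teacher weights into the generalized JSD loss. A minimal, framework-free sketch of the math (not the trainer's tensor code) is shown below, assuming the teachers are combined into a weighted mixture before the usual student/teacher interpolation with `beta`; the weighting scheme itself is an assumption, since the PR leaves it as a TODO:

```python
import math
from typing import List, Sequence

def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL divergence KL(p || q) over discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def multi_teacher_jsd(student: Sequence[float],
                      teachers: List[Sequence[float]],
                      weights: List[float],
                      beta: float = 0.5) -> float:
    """Generalized JSD between the student and a weighted teacher mixture.

    `weights` (assumed to sum to 1) is the hypothetical per-teacher
    weighting the TODO refers to; `beta` interpolates between the
    teacher mixture and the student as in standard GKD.
    """
    # Weighted mixture of the teacher distributions.
    mix = [sum(w * t[i] for w, t in zip(weights, teachers))
           for i in range(len(student))]
    # Interpolated reference distribution m = beta * mix + (1 - beta) * student.
    m = [beta * mi + (1 - beta) * si for mi, si in zip(mix, student)]
    return beta * kl(mix, m) + (1 - beta) * kl(student, m)
```

With a single teacher and matching distributions the loss is zero, which is a quick sanity check on the mixture construction.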
PR type
PR information
https://github.com/XiaomiMiMo/MiMo-V2-Flash/blob/main/paper.pdf
小米团队推出的MiMo-V2中提出了一种新的后训练范式Multi-Teacher On-Policy Distillation (MOPD),用于解决能力不平衡的问题(例如:提升了数学,代码能力下降)。