[Feature] Support ThinkingBudget Logits processor to control thinking content length #6367
base: develop
Codecov Report
❌ Patch coverage is …
Additional details and impacted files
@@ Coverage Diff @@
## develop #6367 +/- ##
==========================================
Coverage ? 68.18%
==========================================
Files ? 392
Lines ? 52907
Branches ? 8300
==========================================
Hits ? 36076
Misses ? 14179
Partials ? 2652
Pull request overview
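This PR introduces `ThinkingBudgetLogitsProcessor`, which enforces a hard limit on the length of the generated `<think>...</think>` thinking segment during decoding. It also wires the related parameters through DataProcessor → Engine → Worker → ModelConfig, and adds Chinese and English documentation plus unit tests.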
Changes:
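- Add `ThinkingBudgetLogitsProcessor` and register it as a built-in logits processor. Once the budget is reached, it forces a line break followed by `</think>` (optionally forcing a stop sentence as well).
- On the `text_processor` / `v1/text_processor` side, precompute the prompt's `<think>` state and add the logic that converts `think_stop_sentence` to token ids.
- When the Engine launches workers, fetch and pass through `think_start_id` / `line_break_id`; add documentation and extensive unit-test coverage.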
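To make the budget behaviour in the list above concrete, here is a minimal pure-Python sketch of such a state machine. It is not the code in `thinking_budget.py`: the names `ThinkingBudgetState`, `apply_thinking_budget`, `force_token`, and `simulate`, as well as the toy token ids, are assumptions made for illustration, and the optional stop sentence is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NEG_INF = float("-inf")

# Toy token ids, assumed for this sketch only.
THINK_START, THINK_END, LINE_BREAK = 1, 2, 3


@dataclass
class ThinkingBudgetState:
    """Per-request state (hypothetical structure, not FastDeploy's)."""
    budget: int                      # max tokens allowed inside the thinking segment
    in_thinking: bool = False        # True while inside <think>...</think>
    thinking_ended: bool = False     # True once </think> has been produced
    tokens_used: int = 0             # thinking tokens generated so far
    forced_queue: List[int] = field(default_factory=list)  # tokens still to be forced


def force_token(logits: List[float], token_id: int) -> List[float]:
    """Mask every token except `token_id` so any sampler must pick it."""
    return [0.0 if i == token_id else NEG_INF for i in range(len(logits))]


def apply_thinking_budget(logits: List[float], last_token: Optional[int],
                          state: ThinkingBudgetState) -> List[float]:
    """Update the state from the previously sampled token, then mask if needed."""
    if last_token is not None:
        # A forced token that was actually emitted is removed from the queue.
        if state.forced_queue and last_token == state.forced_queue[0]:
            state.forced_queue.pop(0)
        if last_token == THINK_START and not state.thinking_ended:
            state.in_thinking = True
        elif last_token == THINK_END:
            state.in_thinking = False
            state.thinking_ended = True
            state.forced_queue.clear()
        elif state.in_thinking:
            state.tokens_used += 1

    # Budget reached: schedule a line break followed by </think>.
    if state.in_thinking and not state.forced_queue and state.tokens_used >= state.budget:
        state.forced_queue = [LINE_BREAK, THINK_END]

    if state.forced_queue:
        return force_token(logits, state.forced_queue[0])
    return logits


def simulate(budget: int, steps: int) -> List[int]:
    """Greedy-decode a toy model that always prefers token 5."""
    state = ThinkingBudgetState(budget=budget)
    base = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
    last, out = THINK_START, []      # pretend the prompt ended with <think>
    for _ in range(steps):
        logits = apply_thinking_budget(list(base), last, state)
        last = max(range(len(logits)), key=logits.__getitem__)
        out.append(last)
    return out


print(simulate(budget=3, steps=6))  # [5, 5, 5, 3, 2, 5]
```

In the simulated run, three unconstrained thinking tokens are followed by the forced line break and `</think>`, after which decoding is unconstrained again. A real processor would operate on batched tensors and per-request arguments rather than Python lists.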
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/model_executor/test_thinking_budget.py | Unit tests covering the key thinking-budget state machine and the related DataProcessor/Engine branches |
| mkdocs.yml | Adds the Thinking Budget docs to the mkdocs navigation |
| fastdeploy/worker/worker_process.py | Adds `--think_start_id` to worker argument parsing |
| fastdeploy/model_executor/logits_processor/thinking_budget.py | Core implementation of the new ThinkingBudgetLogitsProcessor |
| fastdeploy/model_executor/logits_processor/__init__.py | Registers and exports ThinkingBudgetLogitsProcessor |
| fastdeploy/input/v1/text_processor.py | v1 side: adds prompt thinking-state precomputation and stop-sentence tokenization (plus a new encoding cache) |
| fastdeploy/input/text_processor.py | Non-v1 side: adds the same capability (plus a new encoding cache and a `get_eos_token_id` import fallback) |
| fastdeploy/engine/engine.py | Fetches and passes through `think_start_id`; completes the `line_break_id` lookup logic |
| fastdeploy/engine/common_engine.py | Same worker-launch passthrough and `line_break_id` handling as engine.py |
| fastdeploy/config.py | Adds a `think_start_id` field to ModelConfig to receive the worker argument |
| docs/zh/features/thinking_budget.md | Chinese feature documentation |
| docs/features/thinking_budget.md | English feature documentation |
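The prompt thinking-state precomputation listed for the two `text_processor.py` files can be pictured with the sketch below. It is an illustration under assumptions rather than the PR's code: `PromptThinkState` and `scan_prompt_think_state` are hypothetical names, and the rule that a later `<think>` token resets the state is a design choice made here, not something the PR confirms.

```python
from typing import NamedTuple, Sequence


class PromptThinkState(NamedTuple):
    """Summary of the thinking segment found in a prompt (hypothetical)."""
    started: bool          # a <think> token appears in the prompt
    ended: bool            # the most recent <think> was closed by </think>
    tokens_in_think: int   # tokens seen inside the most recent thinking segment


def scan_prompt_think_state(prompt_ids: Sequence[int],
                            think_start_id: int,
                            think_end_id: int) -> PromptThinkState:
    """Scan the prompt once on the CPU so decoding can start from a known state."""
    started = ended = False
    count = 0
    for tok in prompt_ids:
        if tok == think_start_id:
            # Assumption: only the most recent thinking segment matters.
            started, ended, count = True, False, 0
        elif tok == think_end_id and started and not ended:
            ended = True
        elif started and not ended:
            count += 1
    return PromptThinkState(started, ended, count)


# Toy ids: 1 stands in for <think>, 2 for </think>.
print(scan_prompt_think_state([10, 1, 11, 12], think_start_id=1, think_end_id=2))
```

Carrying this small summary along with the request means the logits processor can initialise its per-request state without inspecting the prompt tokens during the first decoding step.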
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Motivation
Introduce the ThinkingBudget logits processor to limit the length of the `<think>` segment, and add Chinese and English usage documentation and unit tests.
Modifications
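- Add `ThinkingBudgetLogitsProcessor` and register it as a built-in logits processor to control thinking length.
- Precompute the prompt's `<think>` state so that the first decoding step does not need to scan the prompt on the GPU.
Usage or Command
Start the service:
Request example: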
Benchmark
Test environment:
- `--logits-processors ThinkingBudgetLogitsProcessor` (thinking_budget=20000, so the budget logic is never triggered)
Concurrency 64 / 500 requests
Concurrency 1 / 50 requests
Conclusion
Accuracy Tests
This PR does not change any kernel or model forward logic, so no accuracy regression testing is required for now.
Checklist
- [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- … `pre-commit` before commit.
- … `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.