[Feature] Support ThinkingBudget Logits processor to control thinking content length #6367

jackyYang6 · 2026-02-05T09:55:18Z

Motivation

Introduce a ThinkingBudget logits processor that limits the length of the <think> segment, and add Chinese and English usage docs and unit tests.

Modifications

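Add ThinkingBudgetLogitsProcessor and register it as a built-in logits processor to control thinking length.
Precompute the <think> prompt state in the DataProcessor stage, so the first decode step does not have to scan the prompt on the GPU.
Wire the parameters through Engine/SamplingParams.
Add unit tests for thinking budget behavior.
Add Chinese and English feature docs and update the mkdocs navigation.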
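
To illustrate the idea behind the first item, here is a minimal sketch of a per-request thinking-budget state machine. It is not the code added in this PR: the class name `ThinkingBudgetSketch`, the `apply`/`observe` methods, the toy token ids, and the choice to count the forced line break toward the thinking tokens are all assumptions made for illustration. It only shows the general technique of masking logits so that a line break and then `</think>` are forced once the budget is reached.

```python
NEG_INF = float("-inf")


class ThinkingBudgetSketch:
    """Toy per-request state machine (hypothetical, not FastDeploy's class)."""

    def __init__(self, think_end_id, line_break_id, thinking_budget):
        self.think_end_id = think_end_id
        self.line_break_id = line_break_id
        self.thinking_budget = thinking_budget
        self.in_thinking = True   # assumption: the prompt already opened <think>
        self.tokens_in_think = 0
        self.pending = []         # tokens that still have to be forced

    def apply(self, logits):
        """Return logits, masked to a single token while forcing is active."""
        if not self.in_thinking:
            return list(logits)
        if not self.pending and self.tokens_in_think >= self.thinking_budget:
            # Budget reached: force a line break, then the </think> token.
            self.pending = [self.line_break_id, self.think_end_id]
        if not self.pending:
            return list(logits)
        forced = self.pending[0]
        return [0.0 if i == forced else NEG_INF for i in range(len(logits))]

    def observe(self, token_id):
        """Update the state with the token that was just sampled."""
        if not self.in_thinking:
            return
        if self.pending and token_id == self.pending[0]:
            self.pending.pop(0)
        if token_id == self.think_end_id:
            self.in_thinking = False
            self.pending = []
        else:
            self.tokens_in_think += 1


def greedy_decode(processor, raw_logits, steps):
    """Greedy loop over a fixed logits vector, just to exercise the sketch."""
    out = []
    for _ in range(steps):
        masked = processor.apply(raw_logits)
        token = max(range(len(masked)), key=masked.__getitem__)
        processor.observe(token)
        out.append(token)
    return out


# Toy vocabulary of 5 ids: 3 stands in for </think>, 4 for the line break.
proc = ThinkingBudgetSketch(think_end_id=3, line_break_id=4, thinking_budget=2)
tokens = greedy_decode(proc, [1.0, 0.0, 0.0, 0.0, 0.0], steps=5)
print(tokens)
```

In this toy run the model always prefers token 0, so two thinking tokens are emitted freely, the next two steps are forced to the line break and the end-of-thinking token, and decoding then continues unconstrained.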

Usage or Command

Start the service:

python -m fastdeploy.entrypoints.openai.api_server \
  --model Qwen/Qwen3-0.6B \
  --port 8180 \
  --metrics-port 8181 \
  --engine-worker-queue-port 8182 \
  --max-model-len 32768 \
  --max-num-seqs 32 \
  --logits-processors ThinkingBudgetLogitsProcessor

Example request:

curl -X POST "http://0.0.0.0:8180/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "你好！"}],
    "max_completion_tokens": 30,
    "logits_processors_args": {
      "thinking_budget": 20,
      "think_stop_sentence": "思考已达上限，开始回复"
    }
  }'
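
For callers who prefer Python over curl, the same request can be sketched with the standard library. The endpoint path and the fields `max_completion_tokens`, `logits_processors_args`, `thinking_budget`, and `think_stop_sentence` are taken from the example above; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the sketch assumes the server from the start command is reachable on port 8180.

```python
import json
import urllib.request


def build_chat_request(base_url, prompt, thinking_budget,
                       think_stop_sentence=None, max_completion_tokens=30):
    """Build (but do not send) a chat completion request with a thinking budget."""
    processor_args = {"thinking_budget": thinking_budget}
    if think_stop_sentence is not None:
        processor_args["think_stop_sentence"] = think_stop_sentence
    body = {
        "messages": [{"role": "user", "content": prompt}],
        "max_completion_tokens": max_completion_tokens,
        "logits_processors_args": processor_args,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_chat_request(
    "http://0.0.0.0:8180",
    prompt="你好！",
    thinking_budget=20,
    think_stop_sentence="思考已达上限，开始回复",
)
# With the server running, the request could be sent like this:
# print(send_chat_request(request))
```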

Benchmark

Test environment:

GPU: A30 24GB (single card)
Model: Qwen3-0.6B
Dataset: filtered_sharedgpt_2000_input_1136_output_200_fd.json
Baseline: --logits-processors not enabled
ThinkingBudget: ThinkingBudgetLogitsProcessor enabled (thinking_budget=20000, so the budget logic is never triggered)

Concurrency 64 / 500 requests

Metric	Baseline	ThinkingBudget	Change
Output throughput (tok/s)	1947.53	1737.76	-10.77%
Mean Decode (tok/s)	45.64	44.50	-2.50%
Mean TTFT (ms)	279.41	286.99	+2.71%
Mean TPOT (ms)	21.99	22.56	+2.59%

Concurrency 1 / 50 requests

Metric	Baseline	ThinkingBudget	Change
Output throughput (tok/s)	212.18	202.43	-4.59%
Mean Decode (tok/s)	216.38	205.87	-4.86%
Mean TTFT (ms)	53.85	55.37	+2.82%
Mean TPOT (ms)	4.63	4.86	+4.97%
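
The Change column in both tables appears to be the relative difference against the baseline. The short check below recomputes it from the reported numbers; the function name `pct_change` is only an illustrative helper, and the formula is an assumption that happens to match the published values.

```python
def pct_change(baseline, candidate):
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# (baseline, ThinkingBudget) pairs copied from the two tables above.
concurrency_64 = {
    "Output throughput (tok/s)": (1947.53, 1737.76),
    "Mean Decode (tok/s)": (45.64, 44.50),
    "Mean TTFT (ms)": (279.41, 286.99),
    "Mean TPOT (ms)": (21.99, 22.56),
}
concurrency_1 = {
    "Output throughput (tok/s)": (212.18, 202.43),
    "Mean Decode (tok/s)": (216.38, 205.87),
    "Mean TTFT (ms)": (53.85, 55.37),
    "Mean TPOT (ms)": (4.63, 4.86),
}

for title, table in (("concurrency 64", concurrency_64),
                     ("concurrency 1", concurrency_1)):
    print(title)
    for metric, (base, cand) in table.items():
        print(f"  {metric}: {pct_change(base, cand):+.2f}%")
```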

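Conclusion

The latest single-concurrency results show a TTFT regression of about +2.8% and a TPOT/ITL regression of about +5%, which is acceptable but leaves room for optimization.
At high concurrency, throughput still regresses noticeably (Output/Total tok/s about -11% to -13%); this is the main performance cost of the current approach.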

Accuracy Tests

This PR does not change any kernel or model forward logic, so no accuracy regression testing is needed for now.

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-02-05T09:55:25Z

Thanks for your contribution!

codecov-commenter · 2026-02-05T14:51:18Z

Codecov Report

❌ Patch coverage is 84.73118% with 71 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@fd56d85). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
...model_executor/logits_processor/thinking_budget.py	84.61%	14 Missing and 22 partials ⚠️
fastdeploy/input/v1/text_processor.py	81.52%	8 Missing and 9 partials ⚠️
fastdeploy/engine/common_engine.py	65.00%	2 Missing and 5 partials ⚠️
fastdeploy/engine/engine.py	65.00%	2 Missing and 5 partials ⚠️
fastdeploy/input/text_processor.py	95.83%	2 Missing and 2 partials ⚠️
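
As a quick consistency check on the table above, the 71 uncovered lines in the headline equal the sum of the per-file "Missing" and "partials" counts. The sketch below also back-computes the implied number of changed lines from the patch percentage; that second figure is an inference that assumes partially covered lines count as not covered, and it is not stated in the report itself.

```python
# (file, missing, partials) copied from the Codecov table above.
rows = [
    ("...model_executor/logits_processor/thinking_budget.py", 14, 22),
    ("fastdeploy/input/v1/text_processor.py", 8, 9),
    ("fastdeploy/engine/common_engine.py", 2, 5),
    ("fastdeploy/engine/engine.py", 2, 5),
    ("fastdeploy/input/text_processor.py", 2, 2),
]

uncovered = sum(missing + partials for _, missing, partials in rows)

# Reported patch coverage; the total below is inferred, not reported.
patch_coverage_pct = 84.73118
implied_patch_lines = round(uncovered / (1 - patch_coverage_pct / 100))

print(f"uncovered lines: {uncovered}")
print(f"implied changed lines (inferred): {implied_patch_lines}")
```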

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #6367   +/-   ##
==========================================
  Coverage           ?   68.18%           
==========================================
  Files              ?      392           
  Lines              ?    52907           
  Branches           ?     8300           
==========================================
  Hits               ?    36076           
  Misses             ?    14179           
  Partials           ?     2652

Flag	Coverage Δ
GPU	`68.18% <84.73%> (?)`


Copilot

Pull request overview

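This PR introduces ThinkingBudgetLogitsProcessor, which enforces a hard limit on the generated length of the <think>...</think> segment during decoding. It also wires the related parameters through DataProcessor → Engine → Worker → ModelConfig, and adds Chinese and English docs and unit tests.

Changes:

Add ThinkingBudgetLogitsProcessor and register it as a built-in logits processor; once the budget is reached it forces a line break followed by </think> (optionally forcing a stop sentence first).
Precompute the prompt's <think> state in text_processor / v1/text_processor, and add the logic that converts think_stop_sentence into token ids.
When the Engine launches workers, obtain and pass through think_start_id / line_break_id; also add docs and extensive unit test coverage.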
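
The prompt-state precomputation described in the second item can be pictured as a single CPU-side pass over the prompt token ids. The sketch below is an assumption about the general shape of such a pass, not the code in text_processor.py: the function name `precompute_think_state`, the returned field names, and the toy token ids are all hypothetical.

```python
def precompute_think_state(prompt_ids, think_start_id, think_end_id):
    """Scan prompt token ids once and summarize the <think> state.

    Returns a dict saying whether a thinking block was opened, whether it
    was closed again, and how many tokens follow the last opening token
    inside the still-open or closed block.
    """
    started = False
    ended = False
    tokens_after_start = 0
    for token_id in prompt_ids:
        if token_id == think_start_id:
            # A new thinking block resets the counters.
            started, ended, tokens_after_start = True, False, 0
        elif token_id == think_end_id and started:
            ended = True
        elif started and not ended:
            tokens_after_start += 1
    return {
        "think_started": started,
        "think_ended": ended,
        "tokens_after_start": tokens_after_start,
    }


# Toy ids: 10 stands in for <think>, 11 for </think>.
open_state = precompute_think_state([1, 2, 10, 5, 6], 10, 11)
closed_state = precompute_think_state([1, 10, 5, 11, 7], 10, 11)
print(open_state)
print(closed_state)
```

A state like this can be attached to the request before it reaches the worker, so the logits processor starts from known counters instead of scanning the prompt at the first decode step.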

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
tests/model_executor/test_thinking_budget.py	Unit tests covering the key thinking budget state machine and the related DataProcessor/Engine branches
mkdocs.yml	Adds the Thinking Budget docs to the mkdocs navigation
fastdeploy/worker/worker_process.py	Adds `--think_start_id` to worker argument parsing
fastdeploy/model_executor/logits_processor/thinking_budget.py	Core implementation of the new ThinkingBudgetLogitsProcessor
fastdeploy/model_executor/logits_processor/__init__.py	Registers and exports ThinkingBudgetLogitsProcessor
fastdeploy/input/v1/text_processor.py	v1 side: adds prompt thinking-state precomputation and stop sentence tokenization (plus a new encoding cache)
fastdeploy/input/text_processor.py	Non-v1 side: adds the same capability (plus a new encoding cache and a get_eos_token_id import fallback)
fastdeploy/engine/engine.py	Obtains and passes through `think_start_id`, and completes the `line_break_id` lookup logic
fastdeploy/engine/common_engine.py	Same worker-launch pass-through and `line_break_id` handling as engine.py
fastdeploy/config.py	Adds a `think_start_id` field to ModelConfig to receive the worker argument
docs/zh/features/thinking_budget.md	Chinese feature documentation
docs/features/thinking_budget.md	English feature documentation

fastdeploy/input/text_processor.py

fastdeploy/input/v1/text_processor.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

jackyYang6 had a problem deploying to Metax_ci February 5, 2026 09:55 — with GitHub Actions Failure

EmmonsCurse temporarily deployed to Metax_ci February 5, 2026 10:09 — with GitHub Actions Inactive

jackyYang6 force-pushed the feat/thinking-budget branch from adc6dbd to 6934693 Compare February 5, 2026 11:40

jackyYang6 had a problem deploying to Metax_ci February 5, 2026 11:40 — with GitHub Actions Error

jackyYang6 temporarily deployed to Metax_ci February 5, 2026 11:40 — with GitHub Actions Inactive

jackyYang6 force-pushed the feat/thinking-budget branch from 1c8ee27 to f3741ce Compare February 5, 2026 13:12

jackyYang6 had a problem deploying to Metax_ci February 5, 2026 13:12 — with GitHub Actions Error

jackyYang6 temporarily deployed to Metax_ci February 5, 2026 13:13 — with GitHub Actions Inactive

jackyYang6 force-pushed the feat/thinking-budget branch from 466ad5d to 6f185d2 Compare February 5, 2026 17:29

jackyYang6 had a problem deploying to Metax_ci February 5, 2026 17:29 — with GitHub Actions Error

jackyYang6 force-pushed the feat/thinking-budget branch from 2e0d651 to 6f7ab95 Compare February 5, 2026 17:35

jackyYang6 had a problem deploying to Metax_ci February 5, 2026 17:35 — with GitHub Actions Error

jackyYang6 temporarily deployed to Metax_ci February 5, 2026 17:36 — with GitHub Actions Inactive

jackyYang6 added 4 commits February 9, 2026 12:01

feat: add thinking budget logits processor

1797d9a

add unittest

0124f9d

fix pre-commit

2f77572

add unittest

c2225f5

jackyYang6 force-pushed the feat/thinking-budget branch from dba8598 to c2225f5 Compare February 9, 2026 04:01

jackyYang6 temporarily deployed to Metax_ci February 9, 2026 04:02 — with GitHub Actions Inactive

Jiang-Jia-Jun requested a review from Copilot February 9, 2026 07:01

Copilot started reviewing on behalf of Jiang-Jia-Jun February 9, 2026 07:01

Copilot AI reviewed Feb 9, 2026


fastdeploy/input/text_processor.py (outdated, resolved)

fastdeploy/input/text_processor.py (outdated, resolved)

fastdeploy/input/v1/text_processor.py (outdated, resolved)

Apply suggestions from code review

86ea9e4

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

jackyYang6 had a problem deploying to Metax_ci February 9, 2026 07:47 — with GitHub Actions Error

Merge branch 'develop' into feat/thinking-budget

dce3c5c

jackyYang6 had a problem deploying to Metax_ci February 9, 2026 07:47 — with GitHub Actions Failure
