
[Feature] console print metrics add env #6413

Merged
Jiang-Jia-Jun merged 1 commit into PaddlePaddle:develop from CSWYF3634076:log-metrics-v2
Feb 10, 2026

@CSWYF3634076 (Collaborator) commented Feb 9, 2026

Motivation

  • Add an environment-variable switch for console printing
  • Control how often decode logs are printed
  • Make the cuda_graph field more accurate

Modifications

Added two environment variables that control whether the logs are printed and how often.
The cuda_graph field is now determined following PR #6196.

Usage or Command

no

Accuracy Tests

no

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If no unit tests are added, please explain why in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot (bot) commented Feb 9, 2026

Thanks for your contribution!

Copilot AI (Contributor) left a comment

Pull request overview

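This PR adds an environment-variable switch and a frequency control for the scheduler metrics logs printed to the console, and makes the decode log report more accurately whether the current step actually used CUDAGraph (accounting for the different graph_opt_config modes).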

Changes:

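  • Adds the environment variable FD_CONSOLE_SCHEDULER_METRICS, which controls whether the scheduler prefill/decode batch logs are printed
  • Adds the environment variable FD_CONSOLE_DECODE_LOG_INTERVAL, which controls the interval at which decode batch logs are printed; SchedulerMetricsLogger implements the interval-based printing
  • ResourceManagerV1.schedule() computes the use_cudagraph field in decode-phase logs more precisely (following the mode distinctions from #6196)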
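A minimal sketch of the switch-plus-interval behavior described in the changes above. `IntervalDecodeLogger` and `logger_from_env` are hypothetical names, not FastDeploy's `SchedulerMetricsLogger` API. The two environment variable names come from this PR, but their defaults (`"1"` for both) and the "print on the first call, then every N calls" rule are assumptions.

```python
import logging
import os
from dataclasses import dataclass, field
from typing import List


@dataclass
class IntervalDecodeLogger:
    """Hypothetical stand-in for the decode path of a scheduler metrics logger."""

    enabled: bool = True
    interval: int = 1
    _calls: int = field(default=0, init=False)
    emitted: List[str] = field(default_factory=list, init=False)

    def __post_init__(self) -> None:
        # Treat 0 or negative intervals as "every step" to avoid modulo by zero.
        self.interval = max(1, self.interval)

    def log_decode_batch(self, batch_size: int, use_cudagraph: bool) -> bool:
        """Return True if this call produced a log line."""
        if not self.enabled:
            return False
        self._calls += 1
        # Emit on call 1, 1 + interval, 1 + 2 * interval, ...
        if (self._calls - 1) % self.interval != 0:
            return False
        line = f"decode batch_size={batch_size} use_cudagraph={use_cudagraph}"
        self.emitted.append(line)
        logging.getLogger("scheduler").info(line)
        return True


def logger_from_env() -> IntervalDecodeLogger:
    # Variable names are from the PR; defaults and parsing are assumptions.
    enabled = os.getenv("FD_CONSOLE_SCHEDULER_METRICS", "1") == "1"
    interval = int(os.getenv("FD_CONSOLE_DECODE_LOG_INTERVAL", "1"))
    return IntervalDecodeLogger(enabled=enabled, interval=interval)
```

With `interval=3`, five consecutive decode steps produce log lines only on the first and fourth calls. Keeping the counter inside the logger means the scheduler itself does not need to track how many decode steps have run.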

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

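| File | Description |
|------|-------------|
| tests/engine/test_scheduler_metrics_logger.py | Adds unit-test coverage for the decode print interval and for reading the environment variables |
| fastdeploy/envs.py | Registers the two new environment variables with lazy reading |
| fastdeploy/engine/sched/scheduler_metrics_logger.py | Adds interval-based printing of decode logs, reading the interval from the environment |
| fastdeploy/engine/sched/resource_manager_v1.py | Adds the global log-switch check and refines how use_cudagraph is determined in decode logs |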

Comment on lines +960 to +964
```python
if (
    hasattr(self, "scheduler_metrics_logger")
    and self.scheduler_metrics_logger is not None
    and envs.FD_CONSOLE_SCHEDULER_METRICS
):
    ...  # body not shown in this review excerpt
```
Copilot AI commented Feb 9, 2026

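Consider adding unit tests for the new console-log switch and logic. This change adds a check on envs.FD_CONSOLE_SCHEDULER_METRICS in schedule(), and the subsequent computation of use_cudagraph introduces multi-branch conditions. However, the existing ResourceManagerV1 tests cover neither "the logger must not be called when the switch is 0" nor whether use_cudagraph takes the expected value under different graph_opt_config combinations. You could inject a mock scheduler_metrics_logger in tests/v1/test_schedule_output.py, or in new test cases, to assert the calls to log_prefill_batch/log_decode_batch and their arguments.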

Copilot generated this review using guidance from repository custom instructions.
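The test Copilot suggests could look roughly like this. `FakeResourceManager` is a hypothetical stand-in that copies only the guard from the excerpt above, `envs` is a `SimpleNamespace` substitute for `fastdeploy.envs`, and the keyword arguments passed to `log_decode_batch` are assumed, since the PR page does not show its signature.

```python
import types
import unittest
from unittest import mock

# Stand-in for fastdeploy.envs: only the attribute the guard reads.
envs = types.SimpleNamespace(FD_CONSOLE_SCHEDULER_METRICS=True)


class FakeResourceManager:
    """Hypothetical minimal object reproducing the guard in schedule()."""

    def __init__(self, scheduler_metrics_logger=None):
        self.scheduler_metrics_logger = scheduler_metrics_logger

    def schedule(self, batch_size: int, use_cudagraph: bool) -> None:
        if (
            hasattr(self, "scheduler_metrics_logger")
            and self.scheduler_metrics_logger is not None
            and envs.FD_CONSOLE_SCHEDULER_METRICS
        ):
            # Assumed call shape; the real logger's signature may differ.
            self.scheduler_metrics_logger.log_decode_batch(
                batch_size=batch_size, use_cudagraph=use_cudagraph
            )


class TestConsoleMetricsSwitch(unittest.TestCase):
    def test_switch_off_skips_logger(self):
        logger = mock.MagicMock()
        with mock.patch.object(envs, "FD_CONSOLE_SCHEDULER_METRICS", False):
            FakeResourceManager(logger).schedule(batch_size=4, use_cudagraph=True)
        logger.log_decode_batch.assert_not_called()

    def test_switch_on_calls_logger_with_args(self):
        logger = mock.MagicMock()
        with mock.patch.object(envs, "FD_CONSOLE_SCHEDULER_METRICS", True):
            FakeResourceManager(logger).schedule(batch_size=4, use_cudagraph=False)
        logger.log_decode_batch.assert_called_once_with(
            batch_size=4, use_cudagraph=False
        )
```

Run it with `python -m unittest`. Against the real `ResourceManagerV1`, the same assertions would apply after injecting the mock as `scheduler_metrics_logger`; if `envs` resolves values lazily from the environment, patching `os.environ` with `mock.patch.dict` is an alternative to patching the attribute. Covering the use_cudagraph branches would additionally require constructing the relevant graph_opt_config combinations.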
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@5bfc093).

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             develop    #6413   +/-   ##
==========================================
  Coverage           ?   67.93%           
==========================================
  Files              ?      391           
  Lines              ?    52558           
  Branches           ?     8202           
==========================================
  Hits               ?    35703           
  Misses             ?    14249           
  Partials           ?     2606           
```
| Flag | Coverage Δ |
|------|------------|
| GPU | 67.93% <100.00%> (?) |

Flags with carried forward coverage won't be shown.
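As a sanity check on the report, the snippet below recomputes the headline coverage from the diff's own counts, assuming Codecov's usual definition in which partially covered lines count as not covered.

```python
# Counts copied from the Codecov diff for #6413.
hits, misses, partials, lines = 35703, 14249, 2606, 52558

# Each coverable line is exactly one of hit, miss, or partial.
assert hits + misses + partials == lines

# Coverage = hits / (hits + misses + partials); partials are not credited.
coverage = hits / (hits + misses + partials)
print(f"{coverage:.2%}")  # → 67.93%
```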


@Jiang-Jia-Jun merged commit 335ab70 into PaddlePaddle:develop on Feb 10, 2026
26 of 29 checks passed
kesmeey pushed a commit to kesmeey/FastDeploy that referenced this pull request Feb 22, 2026
chang-wenbin pushed a commit to chang-wenbin/FastDeploy that referenced this pull request Mar 2, 2026

Labels

None yet

Projects

None yet


5 participants