[Feature] console print metrics add env #6413
Jiang-Jia-Jun merged 1 commit into PaddlePaddle:develop
Pull request overview
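This PR adds an environment-variable switch and frequency control for the metrics logs that the scheduler prints to the console. It also makes the decode log show more accurately whether the current step actually used CUDAGraph, taking the different graph_opt_config modes into account.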
Changes:
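- Add the environment variable FD_CONSOLE_SCHEDULER_METRICS, which controls whether the scheduler prefill/decode batch logs are printed.
- Add the environment variable FD_CONSOLE_DECODE_LOG_INTERVAL, which controls the interval between decode batch log lines; the interval-based printing is implemented inside SchedulerMetricsLogger.
- In ResourceManagerV1.schedule(), compute the use_cudagraph field of the decode-stage log more precisely (following the mode distinction from #6196).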
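The interval-based decode logging described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class name `IntervalDecodeLogger`, the `emit` callback, and the default interval of 1 are all assumptions, since the PR page does not show the implementation or the default value.

```python
import os


class IntervalDecodeLogger:
    """Hypothetical stand-in for the interval logic in SchedulerMetricsLogger."""

    def __init__(self, interval=None, emit=print):
        if interval is None:
            # Assumed default of 1 (log every decode batch); the real default
            # is not stated on the PR page.
            interval = int(os.getenv("FD_CONSOLE_DECODE_LOG_INTERVAL", "1"))
        # Guard against zero or negative values so the modulo below is safe.
        self.interval = max(1, interval)
        self.emit = emit
        self._decode_batches = 0

    def log_decode_batch(self, message):
        """Emit only every `interval`-th decode batch; return True if emitted."""
        self._decode_batches += 1
        if self._decode_batches % self.interval != 0:
            return False
        self.emit(message)
        return True


# Usage: with interval=3, only batches 3 and 6 are emitted.
lines = []
logger = IntervalDecodeLogger(interval=3, emit=lines.append)
results = [logger.log_decode_batch(f"decode batch {i}") for i in range(1, 7)]
```

Counting batches rather than measuring wall-clock time keeps the behavior deterministic, which makes it easy to cover in a unit test.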
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
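| tests/engine/test_scheduler_metrics_logger.py | Adds unit-test coverage for the decode print interval and for reading the environment variables |
| fastdeploy/envs.py | Registers the two new environment variables with lazy reading |
| fastdeploy/engine/sched/scheduler_metrics_logger.py | Adds interval-based printing of decode logs, with the interval read from the environment |
| fastdeploy/engine/sched/resource_manager_v1.py | Adds the global log-switch check and refines how use_cudagraph is determined in the decode log |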
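As a rough illustration of what "registered with lazy reading" could look like for the two variables in the table above, the sketch below maps each name to a zero-argument reader that runs on every attribute access. The `ENV_READERS` dict, the `LazyEnvs` class, and both default values are assumptions made for this example; the actual contents of fastdeploy/envs.py are not shown on the PR page.

```python
import os

# Name -> zero-argument reader. Each reader runs at access time, so a value
# changed in os.environ after import is still picked up.
# The defaults ("1") are assumptions for this sketch.
ENV_READERS = {
    "FD_CONSOLE_SCHEDULER_METRICS": lambda: int(
        os.getenv("FD_CONSOLE_SCHEDULER_METRICS", "1")
    ),
    "FD_CONSOLE_DECODE_LOG_INTERVAL": lambda: int(
        os.getenv("FD_CONSOLE_DECODE_LOG_INTERVAL", "1")
    ),
}


class LazyEnvs:
    """Hypothetical accessor that resolves registered variables on demand."""

    def __getattr__(self, name):
        try:
            reader = ENV_READERS[name]
        except KeyError:
            raise AttributeError(name) from None
        return reader()


envs = LazyEnvs()

# Usage: the value is read when the attribute is accessed, not at import time.
os.environ["FD_CONSOLE_DECODE_LOG_INTERVAL"] = "5"
interval = envs.FD_CONSOLE_DECODE_LOG_INTERVAL
```

Reading lazily is convenient in tests, because a test can patch the environment and see the new value without reloading any module.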
    if (
        hasattr(self, "scheduler_metrics_logger")
        and self.scheduler_metrics_logger is not None
        and envs.FD_CONSOLE_SCHEDULER_METRICS
    ):
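Consider adding unit tests for the new console-log switch and logic. This change adds a check of envs.FD_CONSOLE_SCHEDULER_METRICS in schedule(), and the subsequent computation of use_cudagraph introduces multi-branch conditions. The existing ResourceManagerV1 tests, however, do not cover two cases: that the logger must not be called when the switch is 0, and that use_cudagraph takes the expected value under different graph_opt_config combinations. In tests/v1/test_schedule_output.py, or in a new test case, you could inject a mock scheduler_metrics_logger and assert on the calls to log_prefill_batch/log_decode_batch and on their arguments.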
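A test along the lines suggested in this review comment might look like the sketch below. It does not exercise the real ResourceManagerV1: `log_decode_if_enabled` is a hypothetical extract of the guard condition, and passing `use_cudagraph` as a keyword argument to `log_decode_batch` is an assumption about the logger's signature.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def log_decode_if_enabled(manager, envs, **fields):
    """Hypothetical extract of the guard in schedule(); not the real method."""
    logger = getattr(manager, "scheduler_metrics_logger", None)
    if logger is not None and envs.FD_CONSOLE_SCHEDULER_METRICS:
        logger.log_decode_batch(**fields)


def run_case(switch_value):
    """Run the guard with a mock logger and return the mocked log method."""
    manager = SimpleNamespace(scheduler_metrics_logger=MagicMock())
    envs = SimpleNamespace(FD_CONSOLE_SCHEDULER_METRICS=switch_value)
    log_decode_if_enabled(manager, envs, use_cudagraph=True)
    return manager.scheduler_metrics_logger.log_decode_batch


# Switch off: the logger must not be called.
disabled_call = run_case(0)
disabled_call.assert_not_called()

# Switch on: the logger is called once with the expected field.
enabled_call = run_case(1)
enabled_call.assert_called_once_with(use_cudagraph=True)
```

The same pattern could be extended with one case per graph_opt_config combination, asserting the use_cudagraph value passed to the mock in each.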
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff

| Metric | develop | #6413 |
|---|---|---|
| Coverage | ? | 67.93% |
| Files | ? | 391 |
| Lines | ? | 52558 |
| Branches | ? | 8202 |
| Hits | ? | 35703 |
| Misses | ? | 14249 |
| Partials | ? | 2606 |
Motivation
Modifications
Add two environment variables that control whether the logs are printed and how often.
Determine the cuda_graph field following PR #6196.
Usage or Command
no
Accuracy Tests
no
Checklist
- Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- If the PR targets the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.