
[CI] CPU kernel benchmark for ngram_match — DO NOT MERGE #7203

Draft
cloudforge1 wants to merge 13 commits into PaddlePaddle:develop from CloudForge-Solutions:benchmark/049-ngram-cpu-nomerge

@cloudforge1
Contributor

Motivation

PR #7136 benchmarks the GPU ngram_match kernel, but its "CPU path" column
measures only D2H/H2D tensor copy overhead, not the C++ kernel computation
itself. The reported speedup (14×–1,700×) is therefore misleading: per
NKNaN's profiling data, the real GPU-vs-CPU compute speedup is much more
modest (~0.3×–5.8×).
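
To illustrate why the choice of baseline matters, here is a minimal sketch of the speedup arithmetic. The timings are hypothetical placeholders chosen only to show the effect; they are not measurements from PR #7136 or from this benchmark.

```python
def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if candidate_ms <= 0:
        raise ValueError("candidate_ms must be positive")
    return baseline_ms / candidate_ms


# Hypothetical timings in milliseconds (assumptions, not measured data).
copy_overhead_ms = 2.0   # D2H/H2D tensor copies only
cpu_compute_ms = 0.05    # C++ kernel computation on CPU tensors
gpu_kernel_ms = 0.1      # GPU kernel

# Using copy overhead as the "CPU" baseline inflates the ratio.
inflated = speedup(copy_overhead_ms, gpu_kernel_ms)

# Using the CPU kernel's compute time gives the compute-vs-compute ratio,
# which can even fall below 1x (GPU slower) for small inputs.
compute_only = speedup(cpu_compute_ms, gpu_kernel_ms)

print(f"copy-overhead baseline: {inflated:.1f}x")
print(f"compute baseline:       {compute_only:.1f}x")
```

With these made-up inputs the same GPU time yields very different ratios depending on which CPU-side number is used as the baseline, which is the distortion this PR is meant to expose.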

This PR adds a standalone CPU benchmark that calls the production C++ kernel
(ngram_match.cc / find_candidate_pred_tokens) with CPU-placed tensors, using
the same 5-group experiment dimensions so the numbers are directly
comparable.
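
For readers unfamiliar with the computation being timed, the following is a generic sketch of n-gram draft-token lookup as commonly used in speculative decoding. It is an assumption-laden illustration, not the logic of ngram_match.cc or find_candidate_pred_tokens: the function name `ngram_draft_tokens` and its parameters are hypothetical, and the production kernel may differ in search order, inputs, and thresholds.

```python
from typing import List, Sequence


def ngram_draft_tokens(
    context: Sequence[int],
    max_ngram_size: int = 3,
    max_draft_tokens: int = 4,
) -> List[int]:
    """Propose draft tokens by matching the context's trailing n-gram.

    Hypothetical illustration only. Tries the longest n-gram first, looks
    for its most recent earlier occurrence in the context, and returns the
    tokens that followed that occurrence. Returns [] when nothing matches.
    """
    for n in range(min(max_ngram_size, len(context) - 1), 0, -1):
        suffix = list(context[-n:])
        # Scan right to left so the most recent earlier match wins. The
        # start bound excludes the suffix itself and guarantees that at
        # least one token follows the match.
        for start in range(len(context) - n - 1, -1, -1):
            if list(context[start:start + n]) == suffix:
                end = start + n + max_draft_tokens
                return list(context[start + n:end])
    return []


# The trailing 3-gram [1, 2, 3] also appears at the start of the context,
# so the tokens that followed it there become the draft.
draft = ngram_draft_tokens([1, 2, 3, 4, 5, 1, 2, 3], max_ngram_size=3, max_draft_tokens=2)
print(draft)
```

The cost of a search like this grows with context length and batch size, which is why those appear among the benchmark's experiment dimensions.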

⚠️ NOT FOR MERGE: this PR provides reference data only. The .cc file
is deleted in the GPU kernel branch, so this benchmark targets develop,
where both .cc and .cu coexist.

Modifications

  • Added tests/spec_decode/test_benchmark_ngram_cpu.py (354 lines)
    • 5 groups matching GPU benchmark dimensions (seq_len, batch_size, hit type, threshold, threshold×batch)
    • 2 latency tests (standard + extreme) matching GPU benchmark configs
    • Adaptive run counts (100–1000) to stay within 3-minute total runtime
    • Uses paddle.CPUPlace() tensors → dispatches to .cc C++ kernel
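
The adaptive run count idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in test_benchmark_ngram_cpu.py: the helper names `adaptive_run_count` and `benchmark`, the probe-based strategy, and the per-case budget parameter are all hypothetical. Only the 100–1000 clamp range comes from the description above.

```python
import time
from typing import Callable, Dict


def adaptive_run_count(
    probe_seconds: float,
    budget_seconds: float,
    min_runs: int = 100,
    max_runs: int = 1000,
) -> int:
    """Pick a run count that fits the time budget, clamped to [min_runs, max_runs].

    Note: the lower clamp means a very slow case can still exceed its budget.
    """
    if probe_seconds <= 0:
        return max_runs
    affordable = int(budget_seconds / probe_seconds)
    return max(min_runs, min(max_runs, affordable))


def benchmark(fn: Callable[[], object], budget_seconds: float, warmup: int = 3) -> Dict[str, float]:
    """Time fn with a run count derived from a single probe call."""
    for _ in range(warmup):
        fn()

    # One timed probe call estimates the per-call cost.
    start = time.perf_counter()
    fn()
    probe = time.perf_counter() - start

    runs = adaptive_run_count(probe, budget_seconds)
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return {"runs": runs, "mean_ms": elapsed / runs * 1e3}


# Stand-in workload; a real benchmark would call the kernel here.
result = benchmark(lambda: sum(range(1000)), budget_seconds=0.05)
print(result)
```

Fast cases get up to 1000 runs for stable averages, while slow cases drop toward 100 runs so the whole suite stays inside its time limit.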

Usage or Command

cd FastDeploy
python tests/spec_decode/test_benchmark_ngram_cpu.py

Accuracy Tests

Not applicable — benchmark-only, no functional changes.

Checklist

  • pre-commit hooks pass (black, isort, flake8, ruff)
  • Same API signature as GPU benchmark for 1:1 comparison
  • Adaptive run counts to avoid CI timeout (est. ~2.2 min total)
  • NOT FOR MERGE — reference data only

@paddle-bot

paddle-bot bot commented Apr 4, 2026

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Apr 4, 2026
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@da3dfe1). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7203   +/-   ##
==========================================
  Coverage           ?   74.19%           
==========================================
  Files              ?      376           
  Lines              ?    52941           
  Branches           ?     8260           
==========================================
  Hits               ?    39279           
  Misses             ?    10910           
  Partials           ?     2752           
Flag Coverage Δ
GPU 74.19% <ø> (?)
