[CI] CPU kernel benchmark for ngram_match — DO NOT MERGE by cloudforge1 · Pull Request #7203 · PaddlePaddle/FastDeploy

cloudforge1 · 2026-04-04T18:06:23Z

Motivation

PR #7136 benchmarks the GPU ngram_match kernel, but its "CPU path" column
only measures D2H/H2D (device-to-host / host-to-device) tensor copy
overhead, not the actual C++ kernel computation. This makes the reported
speedup (14×–1,700×) misleading: the real GPU-vs-CPU-compute speedup is much
more modest (~0.3×–5.8× per NKNaN's profiling data), and any value below 1×
means the GPU kernel is slower than the CPU kernel for that configuration.

This PR adds a standalone CPU benchmark that calls the production C++
kernel (ngram_match.cc / find_candidate_pred_tokens) with CPU-placed
tensors, using the same 5-group experiment dimensions so the numbers are
directly comparable.
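
For readers unfamiliar with the operation being timed, here is a simplified pure-Python sketch of n-gram lookup in the prompt-lookup style of speculative decoding. It is not the code in ngram_match.cc: the function name `ngram_lookup`, its parameters, and the matching policy (longest n-gram first, earliest match wins) are assumptions made for illustration only.

```python
from typing import List, Sequence


def ngram_lookup(tokens: Sequence[int], max_ngram: int = 3, num_draft: int = 4) -> List[int]:
    """Hypothetical helper: propose draft tokens by matching the trailing n-gram.

    Assumed policy (not taken from the production kernel): try the longest
    n-gram first and return the continuation of the earliest match.
    """
    n_tokens = len(tokens)
    # Longest n-gram first, then fall back to shorter ones.
    for n in range(min(max_ngram, n_tokens - 1), 0, -1):
        suffix = list(tokens[n_tokens - n:])
        # Only consider windows that start before the trailing n-gram itself.
        for start in range(0, n_tokens - n):
            if list(tokens[start:start + n]) == suffix:
                # Tokens that followed the earlier occurrence become the draft.
                return list(tokens[start + n:start + n + num_draft])
    return []


if __name__ == "__main__":
    history = [1, 2, 3, 4, 5, 1, 2, 3]
    print(ngram_lookup(history, max_ngram=3, num_draft=2))
```

The work is a nested scan over the token history, which is why sequence length, batch size, and hit position are natural benchmark dimensions.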

⚠️ NOT FOR MERGE — this is a reference-data-only PR. The .cc file
is deleted in the GPU kernel branch; this benchmark exists on develop
where both .cc and .cu coexist.

Modifications

Added tests/spec_decode/test_benchmark_ngram_cpu.py (354 lines)
- 5 groups matching GPU benchmark dimensions (seq_len, batch_size, hit type, threshold, threshold×batch)
- 2 latency tests (standard + extreme) matching GPU benchmark configs
- Adaptive run counts (100–1000) to stay within 3-minute total runtime
- Uses paddle.CPUPlace() tensors → dispatches to .cc C++ kernel
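
The adaptive run count can be sketched as follows. This is a minimal illustration of the idea, not the code in test_benchmark_ngram_cpu.py: the helper names `adaptive_runs` and `bench_ms`, the pilot-run approach, and the per-case time budget are all assumptions. Only the 100–1000 clamp comes from the description above.

```python
import time
from typing import Callable


def adaptive_runs(
    fn: Callable[[], object],
    budget_s: float,
    min_runs: int = 100,
    max_runs: int = 1000,
    pilot_runs: int = 5,
) -> int:
    """Hypothetical helper: pick a run count in [min_runs, max_runs].

    A few pilot calls estimate the per-call cost, then the count is chosen
    so that the timed loop roughly fits within budget_s seconds.
    """
    start = time.perf_counter()
    for _ in range(pilot_runs):
        fn()
    per_call = (time.perf_counter() - start) / pilot_runs
    if per_call <= 0:
        # Timer resolution too coarse to measure; use the upper bound.
        return max_runs
    return max(min_runs, min(max_runs, int(budget_s / per_call)))


def bench_ms(fn: Callable[[], object], budget_s: float = 1.0) -> float:
    """Hypothetical helper: mean latency of fn in milliseconds."""
    runs = adaptive_runs(fn, budget_s)
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1e3 / runs
```

Because the lower clamp always wins, a very slow configuration still runs `min_runs` times and can exceed its budget; the budget is a target, not a hard limit.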

Usage or Command

cd FastDeploy
python tests/spec_decode/test_benchmark_ngram_cpu.py

Accuracy Tests

Not applicable — benchmark-only, no functional changes.

Checklist

- pre-commit hooks pass (black, isort, flake8, ruff)
- Same API signature as GPU benchmark for 1:1 comparison
- Adaptive run counts to avoid CI timeout (est. ~2.2 min total)
- NOT FOR MERGE: reference data only

Provides the missing 'CPU compute' column for ngram_match benchmarks. The GPU PR (PaddlePaddle#7136) only measured D2H/H2D transfer overhead, not actual CPU computation. Uses the same 5-group experiment dimensions so results are directly comparable. NOT FOR MERGE — benchmark-only PR for reference data.

paddle-bot · 2026-04-04T18:06:31Z

Thanks for your contribution!

codecov-commenter · 2026-04-04T19:25:50Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@da3dfe1).

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7203   +/-   ##
==========================================
  Coverage           ?   74.19%           
==========================================
  Files              ?      376           
  Lines              ?    52941           
  Branches           ?     8260           
==========================================
  Hits               ?    39279           
  Misses             ?    10910           
  Partials           ?     2752

Flag	Coverage Δ
GPU	`74.19% <ø> (?)`

cloudforge1 added 13 commits March 6, 2026 10:30

Merge remote-tracking branch 'upstream/develop' into develop

daf20d9

Merge remote-tracking branch 'upstream/develop' into develop

6f1e63c

Merge remote-tracking branch 'upstream/develop' into develop

4deb7a7

Merge remote-tracking branch 'upstream/develop' into develop

676daf6

Merge remote-tracking branch 'upstream/develop' into develop

9bcfdca

Merge remote-tracking branch 'upstream/develop' into develop

2bfa878

Merge remote-tracking branch 'upstream/develop' into develop

262c470

Merge remote-tracking branch 'upstream/develop' into develop

171b4d3

Merge remote-tracking branch 'upstream/develop' into develop

def0bd2

Merge remote-tracking branch 'upstream/develop' into develop

4fad5dc

Merge remote-tracking branch 'upstream/develop' into develop

99b9f88

Merge remote-tracking branch 'upstream/develop' into develop

8bc4081

cloudforge1 had a problem deploying to Metax_ci April 4, 2026 18:06 — with GitHub Actions Failure

paddle-bot bot added the contributor External developers label Apr 4, 2026

cloudforge1 changed the title ~~[CI] CPU baseline benchmark for ngram_match — DO NOT MERGE~~ [CI] CPU kernel benchmark for ngram_match — DO NOT MERGE Apr 5, 2026

cloudforge1 wants to merge 13 commits into PaddlePaddle:develop from CloudForge-Solutions:benchmark/049-ngram-cpu-nomerge
