Skip to content

feat(bench): add release performance comparison workflow#143

Merged
acgetchell merged 2 commits into
mainfrom
feat/138-release-performance-comparison
Jun 4, 2026
Merged

feat(bench): add release performance comparison workflow#143
acgetchell merged 2 commits into
mainfrom
feat/138-release-performance-comparison

Conversation

@acgetchell
Copy link
Copy Markdown
Owner

@acgetchell acgetchell commented Jun 4, 2026

  • Extend vs_linalg with LDLT/Cholesky benchmark rows and shared deterministic inputs.
  • Add smoke coverage that checks la-stack, nalgebra, and faer agree on benchmark inputs.
  • Expand bench-compare to support latest-vs-last reports, suite/scope selection, peer baseline context, and clearer malformed Criterion diagnostics.
  • Document the benchmark methodology, release baseline workflow, roadmap direction, and contributor guidance.

Summary by CodeRabbit

  • Documentation

    • Added comprehensive contribution guidelines and AI-assisted workflow
    • Updated README with feature flags and clearer usage guidance
    • Expanded benchmarking and releasing docs with updated workflows and methodology
  • Tests

    • Added deterministic smoke tests to validate results across math implementations
  • Chores / Tooling

    • Improved benchmark comparison tooling and CLI (suite/scope/snapshot support, latest-vs-last)
    • Enhanced benchmark report formatting and baseline workflows

- Extend vs_linalg with LDLT/Cholesky benchmark rows and shared deterministic inputs.
- Add smoke coverage that checks la-stack, nalgebra, and faer agree on benchmark inputs.
- Expand bench-compare to support latest-vs-last reports, suite/scope selection, peer baseline context, and clearer malformed Criterion diagnostics.
- Document the benchmark methodology, release baseline workflow, roadmap direction, and contributor guidance.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 4, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1297bb48-9533-4bba-bb43-a777fbb851ea

📥 Commits

Reviewing files that changed from the base of the PR and between 53b5fde and 1222c93.

📒 Files selected for processing (1)
  • scripts/tests/test_bench_compare.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/tests/test_bench_compare.py

📝 Walkthrough

Walkthrough

Adds deterministic benchmark helpers and bench-gating tests, refactors vs_linalg benches toward SPD (LDLT/Cholesky) flows, and extends scripts/bench_compare to be suite/scope-aware with peer-context reporting; updates just recipes and benchmarking/release/contributing docs.

Changes

Benchmark Infrastructure and Suite Comparison

Layer / File(s) Summary
Benchmark helpers and deterministic input generation
benches/common/vs_linalg.rs
Shared functions compute faer permutation sign and determinants (LU/LDLT), and deterministically generate diagonal-dominant matrices and linear-offset vectors for reproducible benchmarks.
Smoke tests for vs_linalg cross-validation
tests/vs_linalg_inputs.rs
Bench-gated integration tests deterministically construct small matrices and cross-check LU/LDLT factorization determinants and solve results across la_stack, nalgebra, and faer with tolerance-scaled componentwise comparisons.
vs_linalg benchmark refactoring: SPD-based comparisons
benches/vs_linalg.rs
Macro redesigned from direct Criterion invocation to function-generating style; benchmark coverage shifted from LU-based to SPD-based (LDLT/Cholesky) factorization, solve, and precomputed-factorization sections.
Benchmark comparison data models and parsing
scripts/bench_compare.py
Criterion estimates.json parsing hardened with type validation; introduced suite-aware BenchResult/Comparison models, ReportSettings metadata container, and vs_linalg-specific group/dimension extraction.
Suite-aware benchmark result collection and comparison
scripts/bench_compare.py
Collection pipeline routes exact and vs_linalg benches separately; vs_linalg comparisons optionally capture baseline peer times (nalgebra/faer) and support release-signal vs all-benches scoping.
Suite-aware report generation and grouping
scripts/bench_compare.py
Report generation groups results by suite with suite-specific headings; vs_linalg sections conditionally render baseline peer columns and use dimension-based group labels (D=N).
CLI argument parsing and main flow
scripts/bench_compare.py
Extended CLI with --snapshot, --suite (all/exact/vs_linalg), and --scope (release-signal/all-benches); main() validates directories, routes collection, and constructs ReportSettings.
Benchmark comparison test coverage
scripts/tests/test_bench_compare.py
Extended tests cover vs_linalg Criterion tree synthesis, estimate JSON error cases, vs_linalg collection with peer context, peer-column rendering, and CLI snapshot invocation.
Benchmark workflow recipes
justfile
Added bench-latest, bench-latest-vs-last, bench-save-last, and bench-vs-linalg-la-stack; extended bench-compare and bench-save-baseline to accept suite/scope parameters.
Benchmarking and releasing documentation
docs/BENCHMARKING.md, docs/RELEASING.md, scripts/README.md
Updated benchmarking methodology for vs_linalg deterministic inputs and peer comparisons; clarified release baseline saving (named baseline + "last" refresh); documented bench_compare CLI usage and output.
General documentation and roadmap updates
CONTRIBUTING.md, README.md, docs/roadmap.md
Added CONTRIBUTING.md with contribution workflow; updated README with use-case checklist and feature-flags subsection; clarified API stability through v1.0 and added v0.4.3 tooling milestone.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

  • acgetchell/la-stack#70: Overlapping bench_compare refactoring and report generation changes; this PR extends that work with vs_linalg suite/scope and peer-column logic.
  • acgetchell/la-stack#4: Introduced criterion_dim_plot/METRICS mapping used to drive the new vs_linalg suite ordering.
  • acgetchell/la-stack#127: Related API changes for fallible numeric operations referenced by updated benches/tests (det/dot/inf_norm unwrapping).

Suggested labels

documentation, enhancement, performance, testing, rust

Poem

🐰 I hop through benches, rows aligned,
deterministic seeds in kind,
helpers whisper, tests confirm,
reports assemble, graphs now firm.
hop — the comparisons sing, performance signed.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately summarizes the main objective: adding a release performance comparison workflow with extended benchmarking capabilities, which aligns with the substantial changes across benchmark files, documentation, and tooling.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/138-release-performance-comparison

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.61%. Comparing base (4b792c4) to head (1222c93).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #143   +/-   ##
=======================================
  Coverage   99.61%   99.61%           
=======================================
  Files           5        5           
  Lines        2846     2846           
=======================================
  Hits         2835     2835           
  Misses         11       11           
Flag Coverage Δ
unittests 99.61% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@acgetchell acgetchell enabled auto-merge June 4, 2026 18:41
Use a literal regex pattern for the malformed Criterion JSON diagnostic so
Windows paths with backslashes do not break pytest's match expression.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/bench_compare.py`:
- Around line 721-726: The hint generator _save_baseline_hint currently drops
the user's requested scope; modify _save_baseline_hint to accept a scope
argument (e.g., def _save_baseline_hint(suite: str, baseline: str, scope: str)
-> str) and include the scope flag in the returned command (append `--scope
{scope}` or only when scope is not the default) for all branches (the "exact",
"vs_linalg" and default cases) so the suggested `bench-save-baseline` command
preserves the original scope; apply the same change to the other similar helper
at the other occurrence mentioned (around the 749-751 block).

In `@scripts/tests/test_bench_compare.py`:
- Around line 153-158: The regex used in the pytest.raises matcher in
test_read_estimate_malformed_json_names_file is vulnerable because the Path
string {est} may contain regex metacharacters; update the test to escape the
path when building the match (e.g. use re.escape(str(est)) or escape only the
{est} portion) so the match is literal, and add the necessary import for re;
keep the assertion calling bench_compare._read_estimate unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5c930f6f-b3b5-407c-8c47-3e7e31324f1b

📥 Commits

Reviewing files that changed from the base of the PR and between 4b792c4 and 53b5fde.

📒 Files selected for processing (12)
  • CONTRIBUTING.md
  • README.md
  • benches/common/vs_linalg.rs
  • benches/vs_linalg.rs
  • docs/BENCHMARKING.md
  • docs/RELEASING.md
  • docs/roadmap.md
  • justfile
  • scripts/README.md
  • scripts/bench_compare.py
  • scripts/tests/test_bench_compare.py
  • tests/vs_linalg_inputs.rs

Comment thread scripts/bench_compare.py
Comment on lines +721 to +726
def _save_baseline_hint(suite: str, baseline: str) -> str:
if suite == "exact":
return f"just bench-save-baseline {baseline} exact"
if suite == "vs_linalg":
return f"just bench-save-baseline {baseline} vs_linalg"
return f"just bench-save-baseline {baseline}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Preserve scope in the baseline recovery hint.

When the user runs with --scope all-benches, this fallback still suggests the default bench-save-baseline flow. That drops the requested scope and can leave the peer baseline rows missing on the rerun.

Also applies to: 749-751

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/bench_compare.py` around lines 721 - 726, The hint generator
_save_baseline_hint currently drops the user's requested scope; modify
_save_baseline_hint to accept a scope argument (e.g., def
_save_baseline_hint(suite: str, baseline: str, scope: str) -> str) and include
the scope flag in the returned command (append `--scope {scope}` or only when
scope is not the default) for all branches (the "exact", "vs_linalg" and default
cases) so the suggested `bench-save-baseline` command preserves the original
scope; apply the same change to the other similar helper at the other occurrence
mentioned (around the 749-751 block).

Comment thread scripts/tests/test_bench_compare.py
@acgetchell acgetchell merged commit ad660ba into main Jun 4, 2026
18 checks passed
@acgetchell acgetchell deleted the feat/138-release-performance-comparison branch June 4, 2026 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant