feat(bench): add release performance comparison workflow by acgetchell · Pull Request #143 · acgetchell/la-stack

acgetchell · 2026-06-04T18:39:50Z

Extend vs_linalg with LDLT/Cholesky benchmark rows and shared deterministic inputs.
Add smoke coverage that checks la-stack, nalgebra, and faer agree on benchmark inputs.
Expand bench-compare to support latest-vs-last reports, suite/scope selection, peer baseline context, and clearer malformed Criterion diagnostics.
Document the benchmark methodology, release baseline workflow, roadmap direction, and contributor guidance.

Summary by CodeRabbit

Documentation
- Added comprehensive contribution guidelines and AI-assisted workflow
- Updated README with feature flags and clearer usage guidance
- Expanded benchmarking and releasing docs with updated workflows and methodology
Tests
- Added deterministic smoke tests to validate results across math implementations
Chores / Tooling
- Improved benchmark comparison tooling and CLI (suite/scope/snapshot support, latest-vs-last)
- Enhanced benchmark report formatting and baseline workflows

- Extend vs_linalg with LDLT/Cholesky benchmark rows and shared deterministic inputs. - Add smoke coverage that checks la-stack, nalgebra, and faer agree on benchmark inputs. - Expand bench-compare to support latest-vs-last reports, suite/scope selection, peer baseline context, and clearer malformed Criterion diagnostics. - Document the benchmark methodology, release baseline workflow, roadmap direction, and contributor guidance.

coderabbitai · 2026-06-04T18:40:01Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1297bb48-9533-4bba-bb43-a777fbb851ea

📥 Commits

Reviewing files that changed from the base of the PR and between 53b5fde and 1222c93.

📒 Files selected for processing (1)

scripts/tests/test_bench_compare.py

🚧 Files skipped from review as they are similar to previous changes (1)

scripts/tests/test_bench_compare.py

📝 Walkthrough

Walkthrough

Adds deterministic benchmark helpers and bench-gating tests, refactors vs_linalg benches toward SPD (LDLT/Cholesky) flows, and extends scripts/bench_compare to be suite/scope-aware with peer-context reporting; updates just recipes and benchmarking/release/contributing docs.

Changes

Benchmark Infrastructure and Suite Comparison

Layer / File(s)	Summary
Benchmark helpers and deterministic input generation `benches/common/vs_linalg.rs`	Shared functions compute faer permutation sign and determinants (LU/LDLT), and deterministically generate diagonal-dominant matrices and linear-offset vectors for reproducible benchmarks.
Smoke tests for vs_linalg cross-validation `tests/vs_linalg_inputs.rs`	Bench-gated integration tests deterministically construct small matrices and cross-check LU/LDLT factorization determinants and solve results across la_stack, nalgebra, and faer with tolerance-scaled componentwise comparisons.
vs_linalg benchmark refactoring: SPD-based comparisons `benches/vs_linalg.rs`	Macro redesigned from direct Criterion invocation to function-generating style; benchmark coverage shifted from LU-based to SPD-based (LDLT/Cholesky) factorization, solve, and precomputed-factorization sections.
Benchmark comparison data models and parsing `scripts/bench_compare.py`	Criterion `estimates.json` parsing hardened with type validation; introduced suite-aware `BenchResult`/`Comparison` models, `ReportSettings` metadata container, and vs_linalg-specific group/dimension extraction.
Suite-aware benchmark result collection and comparison `scripts/bench_compare.py`	Collection pipeline routes exact and vs_linalg benches separately; vs_linalg comparisons optionally capture baseline peer times (nalgebra/faer) and support release-signal vs all-benches scoping.
Suite-aware report generation and grouping `scripts/bench_compare.py`	Report generation groups results by suite with suite-specific headings; vs_linalg sections conditionally render baseline peer columns and use dimension-based group labels (D=N).
CLI argument parsing and main flow `scripts/bench_compare.py`	Extended CLI with `--snapshot`, `--suite` (all/exact/vs_linalg), and `--scope` (release-signal/all-benches); main() validates directories, routes collection, and constructs `ReportSettings`.
Benchmark comparison test coverage `scripts/tests/test_bench_compare.py`	Extended tests cover vs_linalg Criterion tree synthesis, estimate JSON error cases, vs_linalg collection with peer context, peer-column rendering, and CLI snapshot invocation.
Benchmark workflow recipes `justfile`	Added `bench-latest`, `bench-latest-vs-last`, `bench-save-last`, and `bench-vs-linalg-la-stack`; extended `bench-compare` and `bench-save-baseline` to accept suite/scope parameters.
Benchmarking and releasing documentation `docs/BENCHMARKING.md`, `docs/RELEASING.md`, `scripts/README.md`	Updated benchmarking methodology for vs_linalg deterministic inputs and peer comparisons; clarified release baseline saving (named baseline + "last" refresh); documented bench_compare CLI usage and output.
General documentation and roadmap updates `CONTRIBUTING.md`, `README.md`, `docs/roadmap.md`	Added CONTRIBUTING.md with contribution workflow; updated README with use-case checklist and feature-flags subsection; clarified API stability through v1.0 and added v0.4.3 tooling milestone.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Add first-class cross-release performance comparison tooling #138: PR implements cross-suite release-vs-current benchmark comparison with vs_linalg support, suite/scope flags, baseline management, documentation updates, and test coverage.

Possibly related PRs

acgetchell/la-stack#70: Overlapping bench_compare refactoring and report generation changes; this PR extends that work with vs_linalg suite/scope and peer-column logic.
acgetchell/la-stack#4: Introduced criterion_dim_plot/METRICS mapping used to drive the new vs_linalg suite ordering.
acgetchell/la-stack#127: Related API changes for fallible numeric operations referenced by updated benches/tests (det/dot/inf_norm unwrapping).

Suggested labels

documentation, enhancement, performance, testing, rust

Poem

🐰 I hop through benches, rows aligned,
deterministic seeds in kind,
helpers whisper, tests confirm,
reports assemble, graphs now firm.
hop — the comparisons sing, performance signed.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title accurately summarizes the main objective: adding a release performance comparison workflow with extended benchmarking capabilities, which aligns with the substantial changes across benchmark files, documentation, and tooling.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/138-release-performance-comparison

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-06-04T18:41:29Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.61%. Comparing base (4b792c4) to head (1222c93).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #143   +/-   ##
=======================================
  Coverage   99.61%   99.61%           
=======================================
  Files           5        5           
  Lines        2846     2846           
=======================================
  Hits         2835     2835           
  Misses         11       11

Flag	Coverage Δ
unittests	`99.61% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Use a literal regex pattern for the malformed Criterion JSON diagnostic so Windows paths with backslashes do not break pytest's match expression.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/bench_compare.py`:
- Around line 721-726: The hint generator _save_baseline_hint currently drops
the user's requested scope; modify _save_baseline_hint to accept a scope
argument (e.g., def _save_baseline_hint(suite: str, baseline: str, scope: str)
-> str) and include the scope flag in the returned command (append `--scope
{scope}` or only when scope is not the default) for all branches (the "exact",
"vs_linalg" and default cases) so the suggested `bench-save-baseline` command
preserves the original scope; apply the same change to the other similar helper
at the other occurrence mentioned (around the 749-751 block).

In `@scripts/tests/test_bench_compare.py`:
- Around line 153-158: The regex used in the pytest.raises matcher in
test_read_estimate_malformed_json_names_file is vulnerable because the Path
string {est} may contain regex metacharacters; update the test to escape the
path when building the match (e.g. use re.escape(str(est)) or escape only the
{est} portion) so the match is literal, and add the necessary import for re;
keep the assertion calling bench_compare._read_estimate unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5c930f6f-b3b5-407c-8c47-3e7e31324f1b

📥 Commits

Reviewing files that changed from the base of the PR and between 4b792c4 and 53b5fde.

📒 Files selected for processing (12)

CONTRIBUTING.md
README.md
benches/common/vs_linalg.rs
benches/vs_linalg.rs
docs/BENCHMARKING.md
docs/RELEASING.md
docs/roadmap.md
justfile
scripts/README.md
scripts/bench_compare.py
scripts/tests/test_bench_compare.py
tests/vs_linalg_inputs.rs

coderabbitai · 2026-06-04T18:49:58Z

+def _save_baseline_hint(suite: str, baseline: str) -> str:
+    if suite == "exact":
+        return f"just bench-save-baseline {baseline} exact"
+    if suite == "vs_linalg":
+        return f"just bench-save-baseline {baseline} vs_linalg"
+    return f"just bench-save-baseline {baseline}"


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Preserve scope in the baseline recovery hint.

When the user runs with --scope all-benches, this fallback still suggests the default bench-save-baseline flow. That drops the requested scope and can leave the peer baseline rows missing on the rerun.

Also applies to: 749-751

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/bench_compare.py` around lines 721 - 726, The hint generator _save_baseline_hint currently drops the user's requested scope; modify _save_baseline_hint to accept a scope argument (e.g., def _save_baseline_hint(suite: str, baseline: str, scope: str) -> str) and include the scope flag in the returned command (append `--scope {scope}` or only when scope is not the default) for all branches (the "exact", "vs_linalg" and default cases) so the suggested `bench-save-baseline` command preserves the original scope; apply the same change to the other similar helper at the other occurrence mentioned (around the 749-751 block).

acgetchell enabled auto-merge June 4, 2026 18:41

fix(testing): escape path regex in benchmark parser test

1222c93

Use a literal regex pattern for the malformed Criterion JSON diagnostic so Windows paths with backslashes do not break pytest's match expression.

coderabbitai Bot reviewed Jun 4, 2026

View reviewed changes

acgetchell merged commit ad660ba into main Jun 4, 2026
18 checks passed

acgetchell deleted the feat/138-release-performance-comparison branch June 4, 2026 18:53

This was referenced Jun 4, 2026

feat(bench): publish release benchmark baselines #144

Merged

feat(exact)!: make exact f64 conversions strict #149

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(bench): add release performance comparison workflow#143

feat(bench): add release performance comparison workflow#143
acgetchell merged 2 commits into
mainfrom
feat/138-release-performance-comparison

acgetchell commented Jun 4, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Poem

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented Jun 4, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 4, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

acgetchell commented Jun 4, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Poem

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

acgetchell commented Jun 4, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading

codecov Bot commented Jun 4, 2026 •

edited

Loading