Skip to content

feat(orchestrator): add concurrent sandbox creation benchmark#2280

Merged
arkamar merged 9 commits intomainfrom
experiment/sbx-concurency
Apr 7, 2026
Merged

feat(orchestrator): add concurrent sandbox creation benchmark#2280
arkamar merged 9 commits intomainfrom
experiment/sbx-concurency

Conversation

@arkamar
Copy link
Copy Markdown
Contributor

@arkamar arkamar commented Apr 1, 2026

This PR adds a benchmark to measure how many sandboxes can be effectively
resumed in parallel on a single node, to help tune
maxStartingInstancesPerNode for different machine types.

  • BenchmarkConcurrentResume launches N sandboxes simultaneously and
    reports latency percentiles (P50/P95/P99) and wall-clock time
  • Latencies aggregated across all iterations for stable stats at high
    iteration counts (e.g. -benchtime=100x)
  • Includes bench2md.awk helper to convert raw output to markdown tables

arkamar added 4 commits April 1, 2026 16:12
The current maxStartingInstancesPerNode limit of 3 was established from
non-rigorous historical measurements on n1 machines. To find the optimal
limit for n2 machines we need proper data.

This adds BenchmarkConcurrentResume which launches N sandboxes
simultaneously via barrier-synchronized goroutines and measures
per-sandbox creation latency (avg, P50, P95, P99, min, max),
wall-clock time, and success/failure counts.

Concurrency levels default to 1,2,3,4,5,6,7,8,9,10 and can be overridden
via the CONCURRENCY_LEVELS env var. Network and NBD pool sizes are
scaled to the max concurrency level to avoid measuring pool starvation
instead of actual creation overhead.
Replace cgroup.NewNoopManager() with the real cgroup.NewManager() so
sandboxes get proper CPU and memory isolation via cgroupv2, matching
production behavior.
Accumulate per-sandbox latencies and wall-clock times across all
benchmark iterations instead of reporting only the last iteration.
This makes percentiles (P50/P95/P99) statistically meaningful at
high iteration counts (e.g. -benchtime=100x at concurrency-5 gives
500 latency samples instead of 5).
@cursor
Copy link
Copy Markdown

cursor bot commented Apr 1, 2026

PR Summary

Low Risk
Changes are limited to benchmarks and a helper script, with no impact on production code paths. Main risk is benchmark flakiness or misleading results due to concurrency coordination and environment-dependent setup.

Overview
Adds BenchmarkConcurrentResume, which runs batches of parallel ResumeSandbox calls across configurable concurrency levels and reports aggregated latency percentiles plus wall-clock time, and includes a small AWK helper to convert the emitted Go benchmark metrics into a Markdown table; it also updates the existing benchmark tracer name and fixes local FIRECRACKER_VERSIONS_DIR/HOST_ENVD_PATH relative paths for running from the benchmarks package.

Reviewed by Cursor Bugbot for commit d86b43c. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread packages/orchestrator/benchmarks/concurrent_benchmark_test.go
@linear
Copy link
Copy Markdown

linear bot commented Apr 1, 2026

Comment thread packages/orchestrator/benchmarks/concurrent_benchmark_test.go
Comment thread packages/orchestrator/benchmarks/concurrent_benchmark_test.go
- Use max/min builtins instead of if-chains in percentile()
- Use WaitGroup.Go instead of manual Add/Done goroutine pattern
- Replace sort.Slice/sort.Ints with slices.Sort
- Add blank line before bare return (nlreturn)
- Rename min/max variables to fastest/slowest to avoid shadowing builtins
@arkamar arkamar marked this pull request as ready for review April 1, 2026 15:15
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5d98b658f8

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/orchestrator/benchmarks/concurrent_benchmark_test.go
Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — benchmark-only addition with no production code impact; the two nit-level issues are captured in inline comments.

Extended reasoning...

Overview

This PR adds BenchmarkConcurrentResume (a new _test.go file) and bench2md.awk (an auxiliary reporting script). No production code paths are modified.

Security risks

None. The benchmark requires root and is not part of any deployed artifact. The AWK script is a local post-processing helper.

Level of scrutiny

Low. This is developer tooling for tuning maxStartingInstancesPerNode. Correctness matters for drawing valid conclusions from the benchmark, but bugs here cannot affect production behaviour.

Other factors

My two prior inline comments (tracerProvider shutdown, misleading ns/op) were both reasonably dismissed by the author with valid explanations and marked resolved. The two new nits — a misleading docstring about pool-size scaling and throughput overcounting on failures — are flagged as inline comments. At the default concurrency levels (1–10) neither issue affects the benchmark results in practice. The overall design (barrier-based concurrent launch, aggregated percentiles, LIFO cleanup) is sound.

Comment thread packages/orchestrator/benchmarks/concurrent_benchmark_test.go
Comment thread packages/orchestrator/scripts/benchmark-to-markdown.awk
Comment thread packages/orchestrator/scripts/benchmark-to-markdown.awk
Comment thread packages/orchestrator/benchmarks/concurrent_benchmark_test.go
Copy link
Copy Markdown
Member

@sitole sitole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits, lgtm. Please resolve the comments from AI.

arkamar and others added 4 commits April 7, 2026 09:35
Use a more descriptive filename for the awk script that converts
Go benchmark output to markdown tables.
Keep benchmark files in a dedicated subdirectory to separate them
from the main orchestrator source. Update relative paths to account
for the extra directory level.
@arkamar arkamar merged commit 8cc230b into main Apr 7, 2026
36 checks passed
@arkamar arkamar deleted the experiment/sbx-concurency branch April 7, 2026 09:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants