feat(orchestrator): add concurrent sandbox creation benchmark #2280
Conversation
The current maxStartingInstancesPerNode limit of 3 was established from non-rigorous historical measurements on n1 machines. To find the optimal limit for n2 machines we need proper data. This adds BenchmarkConcurrentResume which launches N sandboxes simultaneously via barrier-synchronized goroutines and measures per-sandbox creation latency (avg, P50, P95, P99, min, max), wall-clock time, and success/failure counts. Concurrency levels default to 1,2,3,4,5,6,7,8,9,10 and can be overridden via the CONCURRENCY_LEVELS env var. Network and NBD pool sizes are scaled to the max concurrency level to avoid measuring pool starvation instead of actual creation overhead.
Replace cgroup.NewNoopManager() with the real cgroup.NewManager() so sandboxes get proper CPU and memory isolation via cgroupv2, matching production behavior.
Accumulate per-sandbox latencies and wall-clock times across all benchmark iterations instead of reporting only the last iteration. This makes percentiles (P50/P95/P99) statistically meaningful at high iteration counts (e.g. -benchtime=100x at concurrency-5 gives 500 latency samples instead of 5).
PR Summary (Low Risk). Reviewed by Cursor Bugbot for commit d86b43c.
- Use max/min builtins instead of if-chains in percentile()
- Use WaitGroup.Go instead of the manual Add/Done goroutine pattern
- Replace sort.Slice/sort.Ints with slices.Sort
- Add a blank line before bare return (nlreturn)
- Rename min/max variables to fastest/slowest to avoid shadowing the builtins
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5d98b658f8
LGTM — benchmark-only addition with no production code impact; the two nit-level issues are captured in inline comments.
Extended reasoning
Overview
This PR adds BenchmarkConcurrentResume (a new _test.go file) and bench2md.awk (an auxiliary reporting script). No production code paths are modified.
Security risks
None. The benchmark requires root and is not part of any deployed artifact. The AWK script is a local post-processing helper.
Level of scrutiny
Low. This is developer tooling for tuning maxStartingInstancesPerNode. Correctness matters for drawing valid conclusions from the benchmark, but bugs here cannot affect production behaviour.
Other factors
My two prior inline comments (tracerProvider shutdown, misleading ns/op) were both reasonably dismissed by the author with valid explanations and marked resolved. The two new nits — a misleading docstring about pool-size scaling and throughput overcounting on failures — are flagged as inline comments. At the default concurrency levels (1–10) neither issue affects the benchmark results in practice. The overall design (barrier-based concurrent launch, aggregated percentiles, LIFO cleanup) is sound.
sitole left a comment
Nits, lgtm. Please resolve the comments from AI.
Use a more descriptive filename for the awk script that converts Go benchmark output to markdown tables.
Keep benchmark files in a dedicated subdirectory to separate them from the main orchestrator source. Update relative paths to account for the extra directory level.
This PR adds a benchmark to measure how many sandboxes can be effectively
resumed in parallel on a single node, to help tune
maxStartingInstancesPerNode for different machine types.
- BenchmarkConcurrentResume launches N sandboxes simultaneously and reports latency percentiles (P50/P95/P99) and wall-clock time
- Latency samples are accumulated across iterations so percentiles stay meaningful at high iteration counts (e.g. -benchtime=100x)
- bench2md.awk helper to convert raw benchmark output to markdown tables