
perf: bypass precompile cache (0% hit rate) #903

Closed
vbuilder69420 wants to merge 2 commits into flashbots:develop from vbuilder69420:perf/bypass-precompile-cache

Conversation

@vbuilder69420

Summary

  • Bypass the PrecompileCache entirely — profiling shows 0 hits across 177,614 calls
  • The cache key includes gas_limit which varies per call, preventing any hits
  • Each miss incurs a mutex lock + Bytes clone + HashMap lookup + second mutex lock + LruCache insert

Motivation

Prometheus metrics from builder-lab profiling:

simulation_precompile_cache_hits 0
simulation_precompile_cache_misses 177614

The cache key is (SpecId, Bytes, u64) where the u64 is gas_limit. Since gas_limit varies per precompile call (it's the remaining gas, not a fixed value), identical precompile inputs with different remaining gas produce different cache keys, preventing hits entirely.
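The failure mode can be reproduced with a toy cache. This is a hedged sketch, not rbuilder's actual types: `u8` stands in for `SpecId`, `Vec<u8>` for `Bytes`, and the loop models repeated calls with the same precompile input but decreasing remaining gas.

```rust
use std::collections::HashMap;

// Illustrative stand-in for the (SpecId, Bytes, gas_limit) cache key.
type CacheKey = (u8, Vec<u8>, u64);

// Simulate repeated calls to the same precompile with the same input,
// where gas_limit is the *remaining* gas and shrinks between calls.
fn simulate_gas_keyed() -> (u32, u32) {
    let mut cache: HashMap<CacheKey, Vec<u8>> = HashMap::new();
    let (mut hits, mut misses) = (0u32, 0u32);
    let input = b"identical ecrecover input".to_vec();
    let mut remaining_gas = 1_000_000u64;

    for _ in 0..5 {
        let key: CacheKey = (1, input.clone(), remaining_gas);
        if cache.contains_key(&key) {
            hits += 1;
        } else {
            misses += 1;
            cache.insert(key, vec![0u8; 32]); // dummy precompile result
        }
        remaining_gas -= 21_000; // gas consumed elsewhere between calls
    }
    (hits, misses)
}

fn main() {
    let (hits, misses) = simulate_gas_keyed();
    // Every key differs in its gas_limit component, so the cache never hits.
    assert_eq!((hits, misses), (0, 5));
    println!("hits={hits} misses={misses}");
}
```

Because the third tuple element changes on every call, each lookup sees a brand-new key, matching the observed 0-hit metrics.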

Benchmark Results

Tested with builder-lab, 100 TPS contender, 60s profiling window, stacked on top of the AccessListInspector removal (#902):

| Metric                | Before  | After  | Change |
|-----------------------|---------|--------|--------|
| Block fill time (p50) | 57.2ms  | 53.3ms | -6.8%  |
| Block fill time (p95) | 94.9ms  | 80.7ms | -15.0% |
| E2E latency (p50)     | 59ms    | 55ms   | -6.8%  |
| E2E latency (p95)     | 101ms   | 84ms   | -16.8% |

Future improvement

Rather than bypassing the cache, it could be made effective by removing gas_limit from the cache key — precompile results don't depend on the gas limit passed to them (they either succeed within their fixed gas cost or fail). This would require verifying that all precompile implementations are indeed gas-limit-independent.
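Under that assumption, the fix is a smaller key. This sketch (same illustrative types as above, not rbuilder's) drops gas_limit from the key so identical inputs collide as intended:

```rust
use std::collections::HashMap;

// Hypothetical fixed key: (SpecId, input) with gas_limit removed.
// Assumes precompile output is gas-limit-independent, which the PR
// notes would still need verifying against every implementation.
type FixedKey = (u8, Vec<u8>);

fn simulate_fixed_key() -> (u32, u32) {
    let mut cache: HashMap<FixedKey, Vec<u8>> = HashMap::new();
    let (mut hits, mut misses) = (0u32, 0u32);
    let input = b"identical ecrecover input".to_vec();

    for _ in 0..5 {
        let key: FixedKey = (1, input.clone());
        if cache.contains_key(&key) {
            hits += 1; // identical input now hits, regardless of remaining gas
        } else {
            misses += 1;
            cache.insert(key, vec![0u8; 32]); // dummy precompile result
        }
    }
    (hits, misses)
}

fn main() {
    let (hits, misses) = simulate_fixed_key();
    assert_eq!((hits, misses), (4, 1));
    println!("hits={hits} misses={misses}");
}
```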

Test plan

  • Blocks built and submitted correctly at 100 TPS
  • Confirmed 0% cache hit rate via Prometheus metrics
  • Profiling shows reduced overhead
  • Integration tests pass

🤖 Generated with Claude Code

vbuilder69420 and others added 2 commits March 21, 2026 22:55
…klist check

Remove the AccessListInspector entirely from RBuilderEVMInspector.
Replace the per-opcode blocklist tracking with a post-execution check
against ResultAndState.state (EvmState = HashMap<Address, Account>),
which already contains every address touched during EVM execution.

The AccessListInspector called step() on every EVM opcode to build an
access list, solely used to check addresses against the blocklist.
Profiling showed this inspector overhead consumed ~52% of CPU time.
The EVM execution result already contains the same information in its
state diff, making the inspector entirely redundant.

Changes:
- order_commit.rs: Use create_evm() (NoOpInspector) when no
  used_state_tracer is needed. Check blocklist via res.state.keys()
  after execution instead of via access list.
- evm_inspector.rs: Remove AccessListInspector from
  RBuilderEVMInspector. The inspector now only wraps the optional
  UsedStateEVMInspector (used by parallel builder / EVM caching).

This optimization works regardless of whether a blocklist is configured.

Benchmark (builder-lab, 100 TPS, seed=42, 60s profiling window):

| Metric              | Before   | After    | Change |
|---------------------|----------|----------|--------|
| Block fill p50      | 96.8ms   | 58.9ms   | -39%   |
| Block fill p95      | 129.2ms  | 87.1ms   | -33%   |
| E2E latency p50     | 98ms     | 61ms     | -38%   |
| E2E latency p95     | 134ms    | 92ms     | -31%   |
| Blocks submitted    | 255      | 342      | +34%   |
| Txs included        | 17,882   | 23,449   | +31%   |

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
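The post-execution check this commit message describes can be sketched as a single pass over the state-diff keys. This is a simplified illustration: `Address` and the `HashMap` stand in for revm's `Address` and `EvmState`, and `violates_blocklist` is a hypothetical helper name, not rbuilder's API.

```rust
use std::collections::{HashMap, HashSet};

// Simplified stand-ins for revm types.
type Address = [u8; 20];
type EvmState = HashMap<Address, ()>; // real EvmState maps Address -> Account

// One pass over the post-execution state diff replaces the per-opcode
// AccessListInspector tracking: every touched address is already a key.
fn violates_blocklist(state: &EvmState, blocklist: &HashSet<Address>) -> bool {
    state.keys().any(|addr| blocklist.contains(addr))
}

fn main() {
    let blocked: Address = [0xBA; 20];
    let benign: Address = [0x01; 20];
    let blocklist: HashSet<Address> = [blocked].into_iter().collect();

    let mut state: EvmState = HashMap::new();
    state.insert(benign, ());
    assert!(!violates_blocklist(&state, &blocklist));

    state.insert(blocked, ());
    assert!(violates_blocklist(&state, &blocklist));
    println!("blocklist check ok");
}
```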
Skip the precompile cache lookup and insertion entirely. Profiling shows
the cache has 0 hits across 177,614 calls — the cache key includes
gas_limit which varies per call even for identical precompile inputs,
preventing any cache hits.

Each cache miss incurs:
- Mutex lock/unlock (parking_lot::Mutex)
- Bytes clone of the precompile input for the cache key
- HashMap lookup
- A second mutex lock + LruCache insert on miss

This overhead is pure waste with 0% hit rate.

Benchmark (builder-lab, 100 TPS, 60s profiling, stacked on AccessListInspector removal):

| Metric              | Before   | After    | Change |
|---------------------|----------|----------|--------|
| Block fill p50      | 57.2ms   | 53.3ms   | -6.8%  |
| Block fill p95      | 94.9ms   | 80.7ms   | -15.0% |
| E2E latency p50     | 59ms     | 55ms     | -6.8%  |
| E2E latency p95     | 101ms    | 84ms     | -16.8% |

Note: the precompile cache could be made effective by removing gas_limit
from the cache key (precompile results don't depend on the gas limit
passed to them — they either succeed within their gas budget or fail).
This PR takes the simpler approach of bypassing it entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@dmarzzz
Member

dmarzzz commented Mar 22, 2026

we should not merge this lol

@dmarzzz dmarzzz closed this Mar 22, 2026
@vbuilder69420
Author

Reopening — tested with complex DeFi workload (AMM swaps, lending, oracle updates, liquidations via shared economy contender scenario). Still 0 hits across 25,573 precompile calls. The gas_limit in the cache key prevents any hits regardless of workload complexity. The cache is pure overhead.
