receive: Add buffer pooling to reduce heap allocations and GC pressure#283

Merged
mkusumdb merged 7 commits into db_main from kusum-madarasu_data/heapopt
Feb 4, 2026
Conversation


@mkusumdb mkusumdb commented Jan 29, 2026

This PR significantly reduces memory allocations and GC pressure in the Thanos receive hot path by introducing buffer pooling and streaming size limits. Testing shows a 23% heap reduction, 27x faster GC max pauses, and an estimated 30% CPU reduction at production scale.

## Problem

The receive HTTP handler allocates new buffers for every remote write request, creating:
- High allocation rates (1.85 TB/hour in production)
- Large heap occupancy (19.3 GB)
- Long GC pauses (up to 4.34ms p100)
- High CPU overhead from GC

At production throughput (hundreds of millions of samples per hour), this results in significant resource consumption and tail-latency issues.

## Solution

### 1. Buffer Pooling

Introduced four `sync.Pool`s to reuse hot-path allocations:
- `compressedBufPool` (32KB default, for compressed request body)
- `decompressedBufPool` (128KB default, for decompressed payload)
- `writeRequestPool` (proto message reuse)
- `copyBufPool` (32KB for io.CopyBuffer)

**Pool ballooning prevention:** Buffers exceeding max capacities (1MB compressed, 4MB decompressed) are not returned to the pool, preventing a single large
request from permanently inflating RSS.
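The capacity guard can be sketched as follows. This is a minimal, self-contained illustration of the pattern described above, not the exact handler code: the helper name `putCompressedBuf` is hypothetical, while the constants mirror the ones listed later in this PR.

```go
package main

import (
	"fmt"
	"sync"
)

const (
	defaultCompressedBufCap = 32 * 1024 // 32KB initial capacity
	maxPooledCompressedCap  = 1 << 20   // 1MB: larger buffers are not pooled
)

var compressedBufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, defaultCompressedBufCap)
		return &b
	},
}

// putCompressedBuf returns a buffer to the pool only if its capacity is
// within the cap, so one oversized request cannot permanently inflate RSS.
func putCompressedBuf(b *[]byte) {
	if cap(*b) > maxPooledCompressedCap {
		return // drop it; the GC reclaims the oversized buffer
	}
	*b = (*b)[:0] // reset length, keep capacity for reuse
	compressedBufPool.Put(b)
}

func main() {
	// A normal-sized buffer is reset and pooled.
	b := compressedBufPool.Get().(*[]byte)
	*b = append(*b, make([]byte, 1024)...)
	putCompressedBuf(b)

	// An oversized buffer is dropped instead of pooled.
	big := make([]byte, 2<<20)
	putCompressedBuf(&big)

	got := compressedBufPool.Get().(*[]byte)
	fmt.Println(len(*got), cap(*got) <= maxPooledCompressedCap)
}
```

Whether `Get` returns the recycled buffer or a fresh one from `New`, the caller always sees a zero-length buffer whose capacity is bounded by the cap.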

### 2. Streaming Size Limits

Introduced `limitedBufferWriter` to enforce size limits during streaming, protecting against:
- Missing `Content-Length` headers
- Incorrect `Content-Length` values
- Malicious oversized requests

The limiter checks size incrementally during `io.Copy`, aborting early if the tenant limit is exceeded.
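A minimal sketch of that limiter, assuming a `bytes.Buffer` destination (field and error names here are illustrative, not the exact implementation): each `Write` checks the running total before appending, so `io.Copy` aborts on the first chunk that would exceed the limit.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

// errRequestTooLarge would map to an HTTP 413 in the real handler.
var errRequestTooLarge = errors.New("request body exceeds tenant limit")

// limitedBufferWriter enforces a byte limit during streaming instead of
// trusting the Content-Length header.
type limitedBufferWriter struct {
	buf   *bytes.Buffer
	limit int64
	n     int64
}

func (w *limitedBufferWriter) Write(p []byte) (int, error) {
	if w.n+int64(len(p)) > w.limit {
		return 0, errRequestTooLarge // io.Copy stops and propagates this
	}
	w.n += int64(len(p))
	return w.buf.Write(p)
}

func main() {
	var buf bytes.Buffer
	w := &limitedBufferWriter{buf: &buf, limit: 16}

	// Within the limit: the copy succeeds.
	_, err := io.Copy(w, strings.NewReader("hello"))
	fmt.Println(err == nil, buf.String())

	// Over the limit: io.Copy aborts early with the size error.
	buf.Reset()
	w = &limitedBufferWriter{buf: &buf, limit: 16}
	_, err = io.Copy(w, strings.NewReader(strings.Repeat("x", 64)))
	fmt.Println(errors.Is(err, errRequestTooLarge))
}
```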

## Changes

### Modified Files
- `pkg/receive/handler.go`: Buffer pooling, streaming limits, optimized buffer management

### Key Implementation Details

**Global pools** (lines 102-125):
```go
compressedBufPool    = sync.Pool{...}  // 32KB default
decompressedBufPool  = sync.Pool{...}  // 128KB default
writeRequestPool     = sync.Pool{...}  // Proto message reuse
copyBufPool          = sync.Pool{...}  // 32KB for io.CopyBuffer
```

**limitedBufferWriter** (lines 127-150):
- Enforces size limits during streaming
- Protects against unbounded memory growth

**receiveHTTP handler** (lines 642-700):
- Get buffers from pools with deferred returns
- Use `limitedBufferWriter` for streaming
- Guard pool returns with capacity checks
- Reset objects before returning to pool
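The get/defer/reset pattern from the handler bullets above can be sketched like this. The `writeRequest` type stands in for the real proto message, and `handleRequest` is a hypothetical stand-in for the handler body:

```go
package main

import (
	"fmt"
	"sync"
)

// writeRequest stands in for the remote-write proto message.
type writeRequest struct{ series []string }

// Reset clears request data so nothing is retained across reuses.
func (w *writeRequest) Reset() { w.series = w.series[:0] }

var writeRequestPool = sync.Pool{
	New: func() any { return &writeRequest{} },
}

// handleRequest sketches the handler pattern: get from the pool, defer a
// reset-then-return, then use the object for the life of the request.
func handleRequest(payload []string) int {
	wreq := writeRequestPool.Get().(*writeRequest)
	defer func() {
		wreq.Reset() // reset BEFORE returning to the pool
		writeRequestPool.Put(wreq)
	}()
	wreq.series = append(wreq.series, payload...)
	return len(wreq.series)
}

func main() {
	fmt.Println(handleRequest([]string{"up", "go_goroutines"}))
	fmt.Println(handleRequest([]string{"http_requests_total"}))
}
```

Resetting before `Put` (rather than after `Get`) ensures pooled objects never hold references to a finished request's data, which also lets the GC free the underlying payload.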

### Constants Added
```go
defaultCompressedBufCap   = 32 * 1024   // 32KB
defaultDecompressedBufCap = 128 * 1024  // 128KB
maxPooledCompressedCap    = 1 << 20     // 1MB
maxPooledDecompressedCap  = 4 << 20     // 4MB
copyBufSize               = 32 * 1024   // 32KB
```

## Testing

### Integration Testing
- ✅ Deployed to dev-sut
- ✅ Running for 11+ hours with 127K series, 169M samples
- ✅ No errors or performance degradation
- ✅ Memory and GC metrics as expected

### Comparison Testing
- ✅ Compared against production (dev-obs1)
- ✅ Normalized for workload differences (20.3x throughput, 34.6x series)
- ✅ Validated improvements are consistent across scale

### Edge Cases Tested
- ✅ Requests with `Content-Length` header
- ✅ Requests without `Content-Length` header (streaming)
- ✅ Oversized requests (rejected with 413)
- ✅ Multi-tenant workload (22 tenants in the integration test, 44 in obs1)


### Key Observations

1. **Allocation rate:** Still high (5.5 GB per 1M samples) because proto unmarshaling is intrinsic to the Prometheus remote write protocol. However, pooling
keeps allocations **short-lived** rather than accumulating on the heap.

2. **GC efficiency:** More frequent GCs (10.5 vs 0.63 per 1M samples) but each completes much faster. The **27x reduction in max pause** dramatically improves
tail latency.

3. **RSS unchanged:** Expected behavior. RSS is dominated by TSDB head structures which scale with series count, not buffer pooling.

## Metrics to Monitor Post-Deployment

### Primary Success Metrics
- ✅ Heap occupancy: 19.3 GB → 14.9 GB (-23%)
- ✅ GC p100: 4.34ms → 0.16ms (27x improvement)
- ✅ CPU: 3.5 cores → 2.5 cores (-30%)

### Secondary Monitoring
- RSS: Expect minimal change (~32 GB, dominated by TSDB)
- Request latency: Should improve (better GC pauses)
- Error rates: Should remain unchanged
- Goroutine count: Expect slight decrease

mkusumdb and others added 7 commits January 27, 2026 19:26
This commit implements memory optimizations for the receive handler:

- Add sync.Pool for compressed/decompressed buffers and WriteRequest
- Implement pool size caps (1MB/4MB) to prevent RSS inflation
- Add limitedBufferWriter to enforce size limits during streaming
- Add zlabelsGet helper to avoid ZLabels->PromLabels conversions
- Add tenantKeyForDistribution for consistent tenant routing
- Improve errorSeries counting with precomputed tenant mappings

These changes reduce memory allocations in the hot path and prevent
memory retention from oversized requests.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@mkusumdb mkusumdb requested a review from yuchen-db February 4, 2026 00:39
Comment on lines +762 to +768
// Deep copy all label strings to detach them from pooled decode buffer.
// Required for correctness when pooled buffers are reused and for preventing
// retention of the whole request buffer via zero-copy label references.
for i := range wreq.Timeseries {
labelpb.ReAllocZLabelsStrings(&wreq.Timeseries[i].Labels, h.writer.opts.Intern)
}

Collaborator

nit: shall we detach exemplar labels from the pool as well?

Collaborator Author

I'll do this in a follow up PR.

@mkusumdb mkusumdb changed the title from "Heap optimizations." to "receive: Add buffer pooling to reduce heap allocations and GC pressure" Feb 4, 2026
@mkusumdb mkusumdb merged commit f95c3e0 into db_main Feb 4, 2026
13 of 14 checks passed
@mkusumdb mkusumdb deleted the kusum-madarasu_data/heapopt branch February 4, 2026 00:50