feat(heap-profiling): add initial heap profiling primitives [PROF-15190]#2166
feat(heap-profiling): add initial heap profiling primitives [PROF-15190]#2166scottgerring wants to merge 16 commits into
Conversation
|
b6aab83 to
72d9f77
Compare
📚 Documentation Check Results📦
|
Clippy Allow Annotation ReportComparing clippy allow annotations between branches:
Summary by Rule
Annotation Counts by File
Annotation Stats by Crate
About This ReportThis report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality. |
72d9f77 to
fc5c074
Compare
🔒 Cargo Deny Results📦
|
b80ec4a to
11cf066
Compare
11cf066 to
f40cfdf
Compare
6fe0523 to
d10a58f
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: be77a1cb93
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
ad0beee to
06d136f
Compare
06d136f to
d46e2c4
Compare
There was a problem hiding this comment.
Pull request overview
Adds the first set of heap-profiling building blocks to libdatadog: a Linux-focused sampler that emits ddheap:* USDT probes, plus Rust and runtime-injection frontends (GlobalAlloc wrapper + GOT interposition) and CI to keep the checked-in bindgen artifacts in sync.
Changes:
- Introduces
libdd-heap-sampler(C implementation + checked-in bindgen outputs) with vendoredusdt.hand a workflow to verify regenerated bindings match the committed artifacts. - Adds
libdd-heap-allocator(RustGlobalAllocwrapper) andlibdd-heap-gotter(ELF GOT-patching injector) with demos and tests. - Adds
libdd-heap-gotter-ffito expose install/update/restore over a small C ABI for use by other runtimes.
Reviewed changes
Copilot reviewed 42 out of 47 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/docker/Dockerfile.build | Adds new heap crates to Docker build caching/stubs. |
| NOTICE | Notes bundled usdt.h license attribution. |
| Cargo.toml | Adds new heap crates to workspace members. |
| Cargo.lock | Records new crate deps (incl. bindgen split versions). |
| .github/workflows/verify-heap-sampler-bindings.yml | CI job to regenerate/verify checked-in bindgen artifacts. |
| .github/CODEOWNERS | Assigns heap crates to profiling team. |
| libdd-heap-sampler/Cargo.toml | New sampler crate metadata + build deps. |
| libdd-heap-sampler/build.rs | Builds C sources on Linux; optional bindgen regen via env var. |
| libdd-heap-sampler/README.md | High-level design and binding-regeneration instructions. |
| libdd-heap-sampler/vendor/usdt.h | Vendored libbpf/usdt single-header implementation. |
| libdd-heap-sampler/vendor/README.md | Documents vendoring source/license and refresh procedure. |
| libdd-heap-sampler/src/lib.rs | Rust façade over generated bindings + basic layout/smoke tests. |
| libdd-heap-sampler/src/generated/bindings.rs | Checked-in bindgen Rust declarations. |
| libdd-heap-sampler/src/generated/dd_heap_sampler_static_wrappers.c | Checked-in bindgen static-inline wrappers. |
| libdd-heap-sampler/src/tl_state.c | TLS state init/seed logic for per-thread sampling. |
| libdd-heap-sampler/src/sample_flag.c | Arch-specific sampled-pointer marking (x86 header, arm64 TBI). |
| libdd-heap-sampler/src/probes.c | Non-inline USDT probe emission functions. |
| libdd-heap-sampler/src/allocation_requested.c | Poisson sampler slow-path + size bumping logic. |
| libdd-heap-sampler/src/allocation_created.c | Applies sample flag + emits alloc probe + closes reentry guard. |
| libdd-heap-sampler/src/allocation_freed.c | Emits free probe and returns raw pointer/size adjustments. |
| libdd-heap-sampler/src/allocation_realloc.c | Realloc helpers for sampled-old teardown behavior. |
| libdd-heap-sampler/include/datadog/heap/tl_state.h | Public TLS state struct + fast getters. |
| libdd-heap-sampler/include/datadog/heap/sample_flag.h | Public flagging API + x86/arm64 fast check helpers. |
| libdd-heap-sampler/include/datadog/heap/probes.h | Public probe API + Linux USDT include. |
| libdd-heap-sampler/include/datadog/heap/allocation_requested.h | Public pre-alloc hook API + fast path inline. |
| libdd-heap-sampler/include/datadog/heap/allocation_created.h | Public post-alloc hook API + fast path inline. |
| libdd-heap-sampler/include/datadog/heap/allocation_freed.h | Public pre-free hook API + fast path inline. |
| libdd-heap-sampler/include/datadog/heap/allocation_realloc.h | Public realloc helper API. |
| libdd-heap-allocator/Cargo.toml | New Rust allocator wrapper crate. |
| libdd-heap-allocator/src/lib.rs | Exposes SampledAllocator. |
| libdd-heap-allocator/src/allocator.rs | Implements GlobalAlloc wrapper + tests. |
| libdd-heap-allocator/README.md | Usage docs + example pointer. |
| libdd-heap-allocator/examples/usdt_demo.rs | Demo binary emitting USDT alloc samples. |
| libdd-heap-gotter/Cargo.toml | New GOT interposition crate. |
| libdd-heap-gotter/src/lib.rs | Public API + Linux/no-op cross-platform surface. |
| libdd-heap-gotter/src/hooks.rs | Allocator/dlopen/pthread_create hook implementations. |
| libdd-heap-gotter/src/elf.rs | ELF/DYNAMIC parsing + relocation patching + mprotect guard. |
| libdd-heap-gotter/README.md | GOTter overview documentation. |
| libdd-heap-gotter/examples/gotter_usdt_demo.rs | Demo using GOT overrides to drive sampler via Rust allocations. |
| libdd-heap-gotter/tests/install.rs | Linux integration smoke test for install/restore. |
| libdd-heap-gotter-ffi/Cargo.toml | New C-ABI wrapper crate around gotter. |
| libdd-heap-gotter-ffi/build.rs | Generates C header via cbindgen. |
| libdd-heap-gotter-ffi/cbindgen.toml | C header generation configuration. |
| libdd-heap-gotter-ffi/src/lib.rs | Exposes install/update/restore/is-installed over C ABI. |
| libdd-heap-gotter-ffi/README.md | Documents C ABI and lifetime constraints. |
| libdd-heap-gotter-ffi/examples/cdylib_demo.rs | Demo dynamic-loading and calling the C ABI. |
| libdd-heap-gotter-ffi/tests/install.rs | FFI smoke test for install/restore roundtrip. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
17c7bba to
38fba33
Compare
38fba33 to
c051bd7
Compare
0caaaad to
f0a58df
Compare
Adds a Criterion benchmark comparing direct System allocation, the SampledAllocator hot path, and direct sampler calls, plus a small bench.sh helper for libdd-heap-sampler. Kept on a separate change because it pulls criterion (and thus bindgen via libdd-heap-sampler's build) into the bench compile set, which the libdatadog benchmarking-platform image cannot build without libclang-dev.
af47843 to
00e15fd
Compare
yannham
left a comment
There was a problem hiding this comment.
Partial review: libdd-heap-allocator, libdd-heap-gotter-ffi.
| To wrap a custom allocator instead; note that this is kind of ill-advised; we want to see _all_ allocations for the process: | ||
|
|
||
| ```rust | ||
| #[global_allocator] | ||
| static ALLOC: SampledAllocator<MyAllocator> = SampledAllocator::new(MyAllocator::new()); | ||
| ``` |
There was a problem hiding this comment.
Also, I'm not sure to understand this mention. Why wrapping a custom allocator is ill-advised? Doesn't the wrapping reach the same blast radius anyway, which is the current project/compilation unit/whatever (This is not a rhetorical question, I don't remember how custom allocators are scoped) ?
| To wrap a custom allocator instead; note that this is kind of ill-advised; we want to see _all_ allocations for the process: | |
| ```rust | |
| #[global_allocator] | |
| static ALLOC: SampledAllocator<MyAllocator> = SampledAllocator::new(MyAllocator::new()); | |
| ``` | |
| To wrap a custom allocator instead: | |
| ```rust | |
| #[global_allocator] | |
| static ALLOC: SampledAllocator<MyAllocator> = SampledAllocator::new(MyAllocator::new()); | |
| ``` | |
| Note that this is kind of ill-advised: we want to see _all_ allocations for the process. |
There was a problem hiding this comment.
You could foreseeably use a custom allocator that doesn't go through the global allocator and isn't used everywhere, so our wrapper doesn't see all the heap allocations for the process that don't go through that wrapper. I think you need nightly to do this, but you can use Vec::new_in to allocate from a custom allocator.
What we want is that all allocations go through one point that we wrap so we have complete visibility. I can see the wording here needs to be improved!
There was a problem hiding this comment.
Cleaned up the note to make this clearer
There was a problem hiding this comment.
What I mean is that, when you do:
#[global_allocator]
static ALLOC: SampledAllocator<System> = SampledAllocator::<System>::DEFAULT;
It only uses SampledAllocator for whatever is the scope of static ALLOC. If this is only valid for say a crate (I have no idea), or a specific module, then the wrapper will also only be valid for this scope. Code outside of this will use the global allocator unwrapped, won't it? It looks like whether you put the default here or a custom allocator, or put differenty whatever is on the right of the = should not change the scope of where ALLOC is used as the allocator, and thus which allocations are tracked. Or at least that's what I would naively think.
| fn assert_ok(result: VoidResult, what: &str) { | ||
| match result { | ||
| VoidResult::Ok => {} | ||
| VoidResult::Err(err) => panic!("{what} failed: {err}"), | ||
| } | ||
| } |
There was a problem hiding this comment.
Nit: I think you can just do assert_eq!(foo, Ok(_)) to the same effect instead of defining this helper. It should print foo and Err(e) in debug mode.
Co-authored-by: Yann Hamdaoui <yann.hamdaoui@gmail.com>
yannham
left a comment
There was a problem hiding this comment.
Partial review of libdd-heap-gotter
| unsafe impl Send for LiveAlloc {} | ||
| unsafe impl Sync for LiveAlloc {} | ||
| impl LiveAlloc { | ||
| fn as_slice_mut(&self) -> &'static mut [u8] { |
There was a problem hiding this comment.
I think this function is unsafe, because you can just call it two times in a row and get aliasing mutable reference to self.ptr. So either it should take &mut self to avoid that, plus ensuring that nobody else can create mutable reference from another alias, or just mark this function as unsafe and let the caller enforce the invariants.
| fn new(seed: u64) -> Self { | ||
| Rng(seed) | ||
| } | ||
| fn next(&mut self) -> u64 { | ||
| let mut z = self.0.wrapping_add(0x9e37_79b9_7f4a_7c15); | ||
| self.0 = z; | ||
| z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9); | ||
| z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb); | ||
| z ^ (z >> 31) | ||
| } | ||
| fn range(&mut self, lo: usize, hi: usize) -> usize { | ||
| lo + (self.next() as usize) % (hi - lo).max(1) | ||
| } | ||
| fn choice<'a, T>(&mut self, xs: &'a [T]) -> &'a T { | ||
| &xs[(self.next() as usize) % xs.len()] | ||
| } |
There was a problem hiding this comment.
Nit: spacing
| fn new(seed: u64) -> Self { | |
| Rng(seed) | |
| } | |
| fn next(&mut self) -> u64 { | |
| let mut z = self.0.wrapping_add(0x9e37_79b9_7f4a_7c15); | |
| self.0 = z; | |
| z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9); | |
| z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb); | |
| z ^ (z >> 31) | |
| } | |
| fn range(&mut self, lo: usize, hi: usize) -> usize { | |
| lo + (self.next() as usize) % (hi - lo).max(1) | |
| } | |
| fn choice<'a, T>(&mut self, xs: &'a [T]) -> &'a T { | |
| &xs[(self.next() as usize) % xs.len()] | |
| } | |
| fn new(seed: u64) -> Self { | |
| Rng(seed) | |
| } | |
| fn next(&mut self) -> u64 { | |
| let mut z = self.0.wrapping_add(0x9e37_79b9_7f4a_7c15); | |
| self.0 = z; | |
| z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9); | |
| z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb); | |
| z ^ (z >> 31) | |
| } | |
| fn range(&mut self, lo: usize, hi: usize) -> usize { | |
| lo + (self.next() as usize) % (hi - lo).max(1) | |
| } | |
| fn choice<'a, T>(&mut self, xs: &'a [T]) -> &'a T { | |
| &xs[(self.next() as usize) % xs.len()] | |
| } |
| @@ -0,0 +1,960 @@ | |||
| // Copyright 2025-Present Datadog, Inc. https://www.datadoghq.com/ | |||
There was a problem hiding this comment.
Maybe not for everything, but could this file re-use existing Rust infrastructure for elf parsing, such as the elf crate for example?
There was a problem hiding this comment.
good idea - i've not looked into it tbh! It'll need to work from dl_iterate_phdr in the running process but potentially. I will leave this open and have a look next week
| // Fallback for degenerate .gnu.hash (e.g. executables with only | ||
| // undefined imports): estimate dynsym entry count from the | ||
| // distance between DT_SYMTAB and DT_STRTAB. This works because | ||
| // the linker always places .dynsym immediately before .dynstr. |
There was a problem hiding this comment.
Is it true for all linkers (mold, lld, ld) or is there any guarantee of that, or does it just happen to be the case right now?
There was a problem hiding this comment.
Good catch. The GNU hash walk itself is ported from ddprof, but this DT_STRTAB - DT_SYMTAB fallback i've added. It's already a fallback for malformed binaries, so we should generally not get here, so I don't think we need to dig into mold etc. to validate - at least for the moment.
If it underestimates, the likely failure mode is that we skip patching some relocations. If it overestimates, valid relocation symbol indexes in normal loaded objects should still keep us safe. I've rewritten the comment to make this a bit clearer.
| None | ||
| } | ||
|
|
||
| /// Mirror of ddprof's `check`: defining symbol, function/object/notype. |
There was a problem hiding this comment.
I know clankers love to do that when you translate from existing code (here I guess ddprof), but I would clean those references to ddprof (unless they help understanding or give proper attribution): if this crate is supposed to be a stand-alone implementation in the future, it's a bit confusing for the reader IMHO.
There was a problem hiding this comment.
I actively left it around to help reviewers, but if it janks you out then it's obviously not being very helpful. I'll remove it!
|
|
||
| /// Resolved address of the real `malloc`; filled by `install_heap_overrides`. | ||
| /// The bare `usize` payload is a function pointer. | ||
| pub(crate) static ORIG_MALLOC: AtomicUsize = AtomicUsize::new(0); |
There was a problem hiding this comment.
Any reason to use usize here instead of e.g. AtomicPointer<c_void>?
There was a problem hiding this comment.
Well I'm not sure it would be more ergonomic given what we do with it, so feel free to ignore. I didn't know you couldn't convert pointers to function pointers freely.
| // SAFETY: caller guarantees T is the right `extern "C" fn(...)` | ||
| // type and that `v` was written by `apply_overrides` with a | ||
| // function of that signature. | ||
| Some(core::mem::transmute_copy::<usize, T>(&v)) |
There was a problem hiding this comment.
Why do we need transmute_copy here, instead of just transmute(v)?
| *const libc::pthread_attr_t, | ||
| StartRoutine, | ||
| *mut c_void, | ||
| ) -> c_int; |
There was a problem hiding this comment.
Do you know how stable those signatures are, or if there's any way to avoid drift as much as possible (possibly with runtime checks)?
| return; | ||
| }; | ||
| if ptr.is_null() { | ||
| return; |
There was a problem hiding this comment.
I don't know what is the behavior of free(NULL), but shouldn't we forward the call anyway just to make sure to preserve behavior? Maybe it sets errno somewhere or something and some crazy people rely on that?
| #[no_mangle] | ||
| pub unsafe extern "C" fn gotter_calloc(nmemb: usize, size: usize) -> *mut c_void { | ||
| let Some(real): Option<CallocFn> = load_fn(&ORIG_CALLOC) else { | ||
| return std::ptr::null_mut(); |
There was a problem hiding this comment.
Same remark about null handling.
What does this PR do?
Adds initial support for eBPF heap profiling to libdatadog for the ebpf heap profiling project. These changes provide the ability for libdatadog users to introduce a USDT into consuming applications allocation and deallocation paths which is fired behind a poisson distributed sampling path every ~0.5 MiB of allocation. This will give us something to hook in user applications from the eBPF side to sample allocations for allocation tracking and sample allocations+frees for live heap.
I expect we will make more changes to this as we go along, but as this is a standalone feature and is largely based on prior work in ddprof it would be nice to get things in now.
A brief description of the change being made with this pull request.
Interesting Details
usdt.h; it is not consistently available on all our build environments (read: old alpine) and this seems to be the common pattern. The API is stable and I see no concerns with this and have updated the license stuff appropriatelyMotivation
An enthusiasm for heap profiling
Additional Notes
Anything else we should know when reviewing?
How to test the change?
Describe here in detail how the change can be validated.