Skip to content

feat(heap-profiling): add initial heap profiling primitives [PROF-15190]#2166

Open
scottgerring wants to merge 16 commits into
mainfrom
sgg/heap-prof-poc
Open

feat(heap-profiling): add initial heap profiling primitives [PROF-15190]#2166
scottgerring wants to merge 16 commits into
mainfrom
sgg/heap-prof-poc

Conversation

@scottgerring

@scottgerring scottgerring commented Jun 26, 2026

Copy link
Copy Markdown
Member

What does this PR do?

Adds initial support for eBPF heap profiling to libdatadog for the ebpf heap profiling project. These changes provide the ability for libdatadog users to introduce a USDT into consuming applications allocation and deallocation paths which is fired behind a poisson distributed sampling path every ~0.5 MiB of allocation. This will give us something to hook in user applications from the eBPF side to sample allocations for allocation tracking and sample allocations+frees for live heap.

I expect we will make more changes to this as we go along, but as this is a standalone feature and is largely based on prior work in ddprof it would be nice to get things in now.

A brief description of the change being made with this pull request.

Interesting Details

  • I have vendored usdt.h; it is not consistently available on all our build environments (read: old alpine) and this seems to be the common pattern. The API is stable and I see no concerns with this and have updated the license stuff appropriately
  • I have gated the binding generation behind an env var and committed the bindings. This is because the generation imposes a fair bit of extra stuff (llvm and supporting bits) on every build job and developer. I've added a CI job to validate that the bindings generated reflect what's checked in to avoid the footgun.

Motivation

An enthusiasm for heap profiling

Additional Notes

Anything else we should know when reviewing?

How to test the change?

Describe here in detail how the change can be validated.

@datadog-prod-us1-4

datadog-prod-us1-4 Bot commented Jun 26, 2026

Copy link
Copy Markdown

Pipelines  Tests

Fix all issues with BitsAI

⚠️ Warnings

🚦 1 Pipeline job failed

DataDog/apm-reliability/libdatadog | benchmarks   View in Datadog   GitLab

ℹ️ Info

No other issues found (see more)

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage (details)
Patch Coverage: 85.77%
Overall Coverage: 74.56% (+0.10%)

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: bebaa38 | Docs | Datadog PR Page | Give us feedback!

@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

📚 Documentation Check Results

⚠️ 627 documentation warning(s) found

📦 libdd-heap-allocator - 44 warning(s)

📦 libdd-heap-gotter-ffi - 322 warning(s)

📦 libdd-heap-gotter - 44 warning(s)

📦 libdd-heap-sampler - 44 warning(s)

📦 tools - 173 warning(s)


Updated: 2026-07-03 17:32:41 UTC | Commit: ae516cc | missing-docs job results

@github-actions

Copy link
Copy Markdown
Contributor

Clippy Allow Annotation Report

Comparing clippy allow annotations between branches:

  • Base Branch: origin/main
  • PR Branch: origin/sgg/heap-prof-poc

Summary by Rule

Rule Base Branch PR Branch Change

Annotation Counts by File

File Base Branch PR Branch Change

Annotation Stats by Crate

Crate Base Branch PR Branch Change
clippy-annotation-reporter 5 5 No change (0%)
datadog-ffe-ffi 1 1 No change (0%)
datadog-ipc 22 22 No change (0%)
datadog-live-debugger 4 4 No change (0%)
datadog-live-debugger-ffi 10 10 No change (0%)
datadog-profiling-replayer 4 4 No change (0%)
datadog-sidecar 45 45 No change (0%)
libdd-common 13 13 No change (0%)
libdd-common-ffi 12 12 No change (0%)
libdd-data-pipeline 6 6 No change (0%)
libdd-ddsketch 2 2 No change (0%)
libdd-dogstatsd-client 1 1 No change (0%)
libdd-profiling 13 13 No change (0%)
libdd-remote-config 3 3 No change (0%)
libdd-telemetry 20 20 No change (0%)
libdd-tinybytes 4 4 No change (0%)
libdd-trace-normalization 2 2 No change (0%)
libdd-trace-obfuscation 3 3 No change (0%)
libdd-trace-stats 1 1 No change (0%)
libdd-trace-utils 11 11 No change (0%)
Total 182 182 No change (0%)

About This Report

This report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality.

@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

🔒 Cargo Deny Results

⚠️ 4 issue(s) found, showing only errors (advisories, bans, sources)

📦 libdd-heap-allocator - ✅ No issues

📦 libdd-heap-gotter-ffi - 1 error(s)

Show output
error[unsound]: Rand is unsound with a custom logger using `rand::rng()`
    ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:121:1
    │
121 │ rand 0.8.5 registry+https://github.com/rust-lang/crates.io-index
    │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ unsound advisory detected
    │
    ├ ID: RUSTSEC-2026-0097
    ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0097
    ├ It has been reported (by @lopopolo) that the `rand` library is [unsound](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#soundness-of-code--of-a-library) (i.e. that safe code using the public API can cause Undefined Behaviour) when all the following conditions are met:
      
      - The `log` and `thread_rng` features are enabled
      - A [custom logger](https://docs.rs/log/latest/log/#implementing-a-logger) is defined
      - The custom logger accesses `rand::rng()` (previously `rand::thread_rng()`) and calls any `TryRng` (previously `RngCore`) methods on `ThreadRng`
      - The `ThreadRng` (attempts to) reseed while called from the custom logger (this happens every 64 kB of generated data)
      - Trace-level logging is enabled or warn-level logging is enabled and the random source (the `getrandom` crate) is unable to provide a new seed
      
      `TryRng` (previously `RngCore`) methods for `ThreadRng` use `unsafe` code to cast `*mut BlockRng<ReseedingCore>` to `&mut BlockRng<ReseedingCore>`. When all the above conditions are met this results in an aliased mutable reference, violating the Stacked Borrows rules. Miri is able to detect this violation in sample code. Since construction of [aliased mutable references is Undefined Behaviour](https://doc.rust-lang.org/stable/nomicon/references.html), the behaviour of optimized builds is hard to predict.
    ├ Announcement: https://github.com/rust-random/rand/pull/1763
    ├ Solution: Upgrade to >=0.10.1 OR <0.10.0, >=0.9.3 OR <0.9.0, >=0.8.6 (try `cargo update -p rand`)
    ├ rand v0.8.5
      └── (dev) libdd-common v5.0.0
          └── libdd-common-ffi v37.0.0
              └── libdd-heap-gotter-ffi v37.0.0

advisories FAILED, bans ok, sources ok

📦 libdd-heap-gotter - ✅ No issues

📦 libdd-heap-sampler - ✅ No issues

📦 tools - 3 error(s)

Show output
error[vulnerability]: Quadratic run time when checking a start tag for duplicate attribute names
   ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:73:1
   │
73 │ quick-xml 0.37.5 registry+https://github.com/rust-lang/crates.io-index
   │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
   │
   ├ ID: RUSTSEC-2026-0194
   ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0194
   ├ `BytesStart::attributes()` returns an `Attributes` iterator which, by default
     (`with_checks(true)`), rejects a start tag that repeats an attribute name. For
     each attribute yielded, the iterator compared the new name against every name
     seen so far in the same tag using a linear scan, so a start tag with `N`
     distinct attribute names cost `O(N²)` byte comparisons. There was no bound on
     `N` other than the size of the buffered start tag.
     
     ## Impact
     
     Any code that parses untrusted XML and iterates a start tag's attributes with
     the default duplicate check enabled can be made to spend CPU time quadratic in
     the number of attributes on a single tag. Because the check is pure computation
     with no `.await`/I/O, an I/O-based timeout on the consumer (for example a read
     or request timeout) cannot interrupt it while it runs.
     
     Measured cost of a single start tag, release build:
     
     | Attributes on one tag | Time |
     |---|---|
     | 80,000  | ~6 s   |
     | 800,000 | ~10 min |
     
     The cost grows with the square of the attribute count, so a start tag of a few
     tens of megabytes can stall a parsing thread for hours. No memory is exhausted
     and the parser does not crash; the effect is CPU exhaustion on the thread doing
     the parsing: a single crafted start tag can pin a CPU core for minutes to hours,
     denying service to that worker. A deployment that places a wall-clock bound on
     parsing, or confines it to a non-critical thread, may consider the availability
     impact lower.
     
     ## Affected code paths
     
     * `BytesStart::attributes()` / `Attributes` iterated with checks enabled (the
       default), and `BytesStart::try_get_attribute`.
     * `NsReader`, which resolves namespaces by iterating a tag's attributes and so
       reaches the same check internally.
     
     Consumers that iterate attributes with `.attributes().with_checks(false)` and do
     not use `NsReader` are not affected.
     
     This was reported as reachable by a remote, unauthenticated attacker in a
     real-world RPKI relying party (NLnet Labs Routinator) via a crafted RRDP
     `snapshot.xml`.
     
     ## Remediation
     
     Upgrade to `quick-xml >= 0.41.0`, where the duplicate check keeps the linear
     scan for start tags with a small number of attributes and switches to an `O(1)`
     hash pre-filter above a threshold, making the whole tag `O(N)`. The reported
     `AttrError::Duplicated` positions are unchanged.
     
     If upgrading is not possible and duplicate-name detection is not required,
     disable it with `.attributes().with_checks(false)` (this does not help
     `NsReader` consumers, which have no equivalent opt-out before 0.41.0).
   ├ Announcement: https://github.com/tafia/quick-xml/issues/969
   ├ Solution: Upgrade to >=0.41.0 (try `cargo update -p quick-xml`)
   ├ quick-xml v0.37.5
     └── tools v37.0.0

error[vulnerability]: Unbounded namespace-declaration allocation in `NsReader` enables memory-exhaustion denial of service
   ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:73:1
   │
73 │ quick-xml 0.37.5 registry+https://github.com/rust-lang/crates.io-index
   │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ security vulnerability detected
   │
   ├ ID: RUSTSEC-2026-0195
   ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0195
   ├ `NsReader` resolves namespaces by calling `NamespaceResolver::push` for every
     `Start`/`Empty` event *before* the event is returned to the caller. `push`
     iterated all `xmlns` / `xmlns:*` attributes on the start tag and, for each one,
     appended the prefix bytes to an internal buffer and pushed a `NamespaceBinding`
     (32 bytes on 64-bit) to an internal `Vec`, with no upper bound on the number of
     declarations.
     
     ## Impact
     
     A start tag with `N` namespace declarations drove roughly `3×` the tag's byte
     size in `NamespaceResolver` heap, allocated *inside* `quick-xml` before the
     `NsReader` consumer ever received the event and could inspect or reject it. A
     consumer that bounds its *input* size therefore still cannot bound this
     allocation: an `M`-byte start tag yields on the order of `3 × M` bytes of
     resolver heap the caller never sees.
     
     On untrusted XML this lets a remote, unauthenticated attacker force large heap
     allocations with a single start tag. With several `NsReader`s running
     concurrently on independent inputs (a common server pattern), the allocations
     stack and can exhaust process memory, causing the operating system to kill the
     process (OOM). This was confirmed against a real-world RPKI relying party (NLnet
     Labs Routinator), where concurrent RRDP validation workers parsing a crafted
     `snapshot.xml` exceeded the memory limit and the process was OOM-killed.
     
     ## Affected code paths
     
     Consumers using `NsReader` (which always calls `NamespaceResolver::push` before
     yielding `Start`/`Empty`), or calling `NamespaceResolver::push` directly. A plain
     `Reader` that does not perform namespace resolution is not affected.
     
     ## Remediation
     
     Upgrade to `quick-xml >= 0.41.0`. `NamespaceResolver::push` now rejects a start
     tag that declares more than `DEFAULT_MAX_DECLARATIONS_PER_ELEMENT` (256)
     namespace bindings, returning the new `NamespaceError::TooManyDeclarations`
     instead of allocating without limit. The limit is configurable via
     `NamespaceResolver::set_max_declarations_per_element` (use `usize::MAX` to
     restore the previous unbounded behavior), and `NsReader::resolver_mut()` is
     provided to reach it.
     
     There is no clean workaround for `NsReader` consumers before 0.41.0, as the
     allocation happens inside the reader with no configuration knob to cap it.
   ├ Announcement: https://github.com/tafia/quick-xml/issues/970
   ├ Solution: Upgrade to >=0.41.0 (try `cargo update -p quick-xml`)
   ├ quick-xml v0.37.5
     └── tools v37.0.0

error[unsound]: Rand is unsound with a custom logger using `rand::rng()`
   ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:76:1
   │
76 │ rand 0.8.5 registry+https://github.com/rust-lang/crates.io-index
   │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ unsound advisory detected
   │
   ├ ID: RUSTSEC-2026-0097
   ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0097
   ├ It has been reported (by @lopopolo) that the `rand` library is [unsound](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#soundness-of-code--of-a-library) (i.e. that safe code using the public API can cause Undefined Behaviour) when all the following conditions are met:
     
     - The `log` and `thread_rng` features are enabled
     - A [custom logger](https://docs.rs/log/latest/log/#implementing-a-logger) is defined
     - The custom logger accesses `rand::rng()` (previously `rand::thread_rng()`) and calls any `TryRng` (previously `RngCore`) methods on `ThreadRng`
     - The `ThreadRng` (attempts to) reseed while called from the custom logger (this happens every 64 kB of generated data)
     - Trace-level logging is enabled or warn-level logging is enabled and the random source (the `getrandom` crate) is unable to provide a new seed
     
     `TryRng` (previously `RngCore`) methods for `ThreadRng` use `unsafe` code to cast `*mut BlockRng<ReseedingCore>` to `&mut BlockRng<ReseedingCore>`. When all the above conditions are met this results in an aliased mutable reference, violating the Stacked Borrows rules. Miri is able to detect this violation in sample code. Since construction of [aliased mutable references is Undefined Behaviour](https://doc.rust-lang.org/stable/nomicon/references.html), the behaviour of optimized builds is hard to predict.
   ├ Announcement: https://github.com/rust-random/rand/pull/1763
   ├ Solution: Upgrade to >=0.10.1 OR <0.10.0, >=0.9.3 OR <0.9.0, >=0.8.6 (try `cargo update -p rand`)
   ├ rand v0.8.5
     └── (dev) libdd-common v5.0.0
         └── tools v37.0.0

advisories FAILED, bans ok, sources ok

Updated: 2026-07-03 17:33:59 UTC | Commit: ae516cc | dependency-check job results

@scottgerring scottgerring force-pushed the sgg/heap-prof-poc branch 2 times, most recently from b80ec4a to 11cf066 Compare June 26, 2026 14:37
@scottgerring scottgerring changed the title [PROF-15190]: DRAFT - add initial heap profiling crates feat(heap-profiling): add initial heap profiling primitives [PROF-15190] Jun 29, 2026
@scottgerring scottgerring force-pushed the sgg/heap-prof-poc branch 19 times, most recently from 6fe0523 to d10a58f Compare July 1, 2026 07:01

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be77a1cb93

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread libdd-heap-sampler/src/allocation_requested.c
Comment thread libdd-heap-sampler/src/allocation_requested.c Outdated
Comment thread libdd-heap-gotter/src/hooks.rs Outdated
Comment thread libdd-heap-gotter/src/hooks.rs
Comment thread libdd-heap-sampler/src/allocation_created.c Outdated
Comment thread libdd-heap-gotter/src/elf.rs Outdated
Comment thread libdd-heap-gotter/src/elf.rs Outdated
Comment thread libdd-heap-sampler/src/sample_flag.c Outdated
Comment thread libdd-heap-sampler/build.rs
Comment thread libdd-heap-gotter/src/lib.rs Outdated

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds the first set of heap-profiling building blocks to libdatadog: a Linux-focused sampler that emits ddheap:* USDT probes, plus Rust and runtime-injection frontends (GlobalAlloc wrapper + GOT interposition) and CI to keep the checked-in bindgen artifacts in sync.

Changes:

  • Introduces libdd-heap-sampler (C implementation + checked-in bindgen outputs) with vendored usdt.h and a workflow to verify regenerated bindings match the committed artifacts.
  • Adds libdd-heap-allocator (Rust GlobalAlloc wrapper) and libdd-heap-gotter (ELF GOT-patching injector) with demos and tests.
  • Adds libdd-heap-gotter-ffi to expose install/update/restore over a small C ABI for use by other runtimes.

Reviewed changes

Copilot reviewed 42 out of 47 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
tools/docker/Dockerfile.build Adds new heap crates to Docker build caching/stubs.
NOTICE Notes bundled usdt.h license attribution.
Cargo.toml Adds new heap crates to workspace members.
Cargo.lock Records new crate deps (incl. bindgen split versions).
.github/workflows/verify-heap-sampler-bindings.yml CI job to regenerate/verify checked-in bindgen artifacts.
.github/CODEOWNERS Assigns heap crates to profiling team.
libdd-heap-sampler/Cargo.toml New sampler crate metadata + build deps.
libdd-heap-sampler/build.rs Builds C sources on Linux; optional bindgen regen via env var.
libdd-heap-sampler/README.md High-level design and binding-regeneration instructions.
libdd-heap-sampler/vendor/usdt.h Vendored libbpf/usdt single-header implementation.
libdd-heap-sampler/vendor/README.md Documents vendoring source/license and refresh procedure.
libdd-heap-sampler/src/lib.rs Rust façade over generated bindings + basic layout/smoke tests.
libdd-heap-sampler/src/generated/bindings.rs Checked-in bindgen Rust declarations.
libdd-heap-sampler/src/generated/dd_heap_sampler_static_wrappers.c Checked-in bindgen static-inline wrappers.
libdd-heap-sampler/src/tl_state.c TLS state init/seed logic for per-thread sampling.
libdd-heap-sampler/src/sample_flag.c Arch-specific sampled-pointer marking (x86 header, arm64 TBI).
libdd-heap-sampler/src/probes.c Non-inline USDT probe emission functions.
libdd-heap-sampler/src/allocation_requested.c Poisson sampler slow-path + size bumping logic.
libdd-heap-sampler/src/allocation_created.c Applies sample flag + emits alloc probe + closes reentry guard.
libdd-heap-sampler/src/allocation_freed.c Emits free probe and returns raw pointer/size adjustments.
libdd-heap-sampler/src/allocation_realloc.c Realloc helpers for sampled-old teardown behavior.
libdd-heap-sampler/include/datadog/heap/tl_state.h Public TLS state struct + fast getters.
libdd-heap-sampler/include/datadog/heap/sample_flag.h Public flagging API + x86/arm64 fast check helpers.
libdd-heap-sampler/include/datadog/heap/probes.h Public probe API + Linux USDT include.
libdd-heap-sampler/include/datadog/heap/allocation_requested.h Public pre-alloc hook API + fast path inline.
libdd-heap-sampler/include/datadog/heap/allocation_created.h Public post-alloc hook API + fast path inline.
libdd-heap-sampler/include/datadog/heap/allocation_freed.h Public pre-free hook API + fast path inline.
libdd-heap-sampler/include/datadog/heap/allocation_realloc.h Public realloc helper API.
libdd-heap-allocator/Cargo.toml New Rust allocator wrapper crate.
libdd-heap-allocator/src/lib.rs Exposes SampledAllocator.
libdd-heap-allocator/src/allocator.rs Implements GlobalAlloc wrapper + tests.
libdd-heap-allocator/README.md Usage docs + example pointer.
libdd-heap-allocator/examples/usdt_demo.rs Demo binary emitting USDT alloc samples.
libdd-heap-gotter/Cargo.toml New GOT interposition crate.
libdd-heap-gotter/src/lib.rs Public API + Linux/no-op cross-platform surface.
libdd-heap-gotter/src/hooks.rs Allocator/dlopen/pthread_create hook implementations.
libdd-heap-gotter/src/elf.rs ELF/DYNAMIC parsing + relocation patching + mprotect guard.
libdd-heap-gotter/README.md GOTter overview documentation.
libdd-heap-gotter/examples/gotter_usdt_demo.rs Demo using GOT overrides to drive sampler via Rust allocations.
libdd-heap-gotter/tests/install.rs Linux integration smoke test for install/restore.
libdd-heap-gotter-ffi/Cargo.toml New C-ABI wrapper crate around gotter.
libdd-heap-gotter-ffi/build.rs Generates C header via cbindgen.
libdd-heap-gotter-ffi/cbindgen.toml C header generation configuration.
libdd-heap-gotter-ffi/src/lib.rs Exposes install/update/restore/is-installed over C ABI.
libdd-heap-gotter-ffi/README.md Documents C ABI and lifetime constraints.
libdd-heap-gotter-ffi/examples/cdylib_demo.rs Demo dynamic-loading and calling the C ABI.
libdd-heap-gotter-ffi/tests/install.rs FFI smoke test for install/restore roundtrip.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread libdd-heap-gotter/src/elf.rs Outdated
Comment thread libdd-heap-gotter/src/elf.rs Outdated
Comment thread libdd-heap-gotter/src/elf.rs
Comment thread libdd-heap-sampler/src/allocation_requested.c
Comment thread libdd-heap-gotter/README.md
Comment thread libdd-heap-sampler/README.md Outdated
Comment thread libdd-heap-sampler/README.md Outdated
Comment thread libdd-heap-sampler/README.md Outdated
Comment thread libdd-heap-gotter/src/hooks.rs Outdated
@jbachorik jbachorik added the sphinx:critical Sphinx: critical — human review required label Jul 2, 2026
@scottgerring scottgerring force-pushed the sgg/heap-prof-poc branch 2 times, most recently from 17c7bba to 38fba33 Compare July 2, 2026 14:47
Adds a Criterion benchmark comparing direct System allocation, the
SampledAllocator hot path, and direct sampler calls, plus a small
bench.sh helper for libdd-heap-sampler.

Kept on a separate change because it pulls criterion (and thus
bindgen via libdd-heap-sampler's build) into the bench compile set,
which the libdatadog benchmarking-platform image cannot build
without libclang-dev.

@yannham yannham left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review: libdd-heap-allocator, libdd-heap-gotter-ffi.

Comment thread libdd-heap-allocator/examples/usdt_demo.rs Outdated
Comment thread libdd-heap-allocator/README.md Outdated
Comment on lines +18 to +23
To wrap a custom allocator instead; note that this is kind of ill-advised; we want to see _all_ allocations for the process:

```rust
#[global_allocator]
static ALLOC: SampledAllocator<MyAllocator> = SampledAllocator::new(MyAllocator::new());
```

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm not sure to understand this mention. Why wrapping a custom allocator is ill-advised? Doesn't the wrapping reach the same blast radius anyway, which is the current project/compilation unit/whatever (This is not a rhetorical question, I don't remember how custom allocators are scoped) ?

Suggested change
To wrap a custom allocator instead; note that this is kind of ill-advised; we want to see _all_ allocations for the process:
```rust
#[global_allocator]
static ALLOC: SampledAllocator<MyAllocator> = SampledAllocator::new(MyAllocator::new());
```
To wrap a custom allocator instead:
```rust
#[global_allocator]
static ALLOC: SampledAllocator<MyAllocator> = SampledAllocator::new(MyAllocator::new());
```
Note that this is kind of ill-advised: we want to see _all_ allocations for the process.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could foreseeably use a custom allocator that doesn't go through the global allocator and isn't used everywhere, so our wrapper doesn't see all the heap allocations for the process that don't go through that wrapper. I think you need nightly to do this, but you can use Vec::new_in to allocate from a custom allocator.

What we want is that all allocations go through one point that we wrap so we have complete visibility. I can see the wording here needs to be improved!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned up the note to make this clearer

@yannham yannham Jul 3, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that, when you do:

#[global_allocator]
static ALLOC: SampledAllocator<System> = SampledAllocator::<System>::DEFAULT;

It only uses SampledAllocator for whatever is the scope of static ALLOC. If this is only valid for say a crate (I have no idea), or a specific module, then the wrapper will also only be valid for this scope. Code outside of this will use the global allocator unwrapped, won't it? It looks like whether you put the default here or a custom allocator, or put differenty whatever is on the right of the = should not change the scope of where ALLOC is used as the allocator, and thus which allocations are tracked. Or at least that's what I would naively think.

Comment thread libdd-heap-gotter-ffi/src/lib.rs
Comment thread libdd-heap-gotter-ffi/src/lib.rs
Comment on lines +17 to +22
fn assert_ok(result: VoidResult, what: &str) {
match result {
VoidResult::Ok => {}
VoidResult::Err(err) => panic!("{what} failed: {err}"),
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think you can just do assert_eq!(foo, Ok(_)) to the same effect instead of defining this helper. It should print foo and Err(e) in debug mode.

Comment thread libdd-heap-gotter-ffi/README.md Outdated
Comment thread libdd-heap-allocator/src/allocator.rs
Comment thread libdd-heap-allocator/src/allocator.rs
Comment thread libdd-heap-allocator/src/allocator.rs Outdated
Comment thread libdd-heap-allocator/src/allocator.rs
scottgerring and others added 3 commits July 3, 2026 15:55
Co-authored-by: Yann Hamdaoui <yann.hamdaoui@gmail.com>

@yannham yannham left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review of libdd-heap-gotter

unsafe impl Send for LiveAlloc {}
unsafe impl Sync for LiveAlloc {}
impl LiveAlloc {
fn as_slice_mut(&self) -> &'static mut [u8] {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function is unsafe, because you can just call it two times in a row and get aliasing mutable reference to self.ptr. So either it should take &mut self to avoid that, plus ensuring that nobody else can create mutable reference from another alias, or just mark this function as unsafe and let the caller enforce the invariants.

Comment on lines +80 to +95
fn new(seed: u64) -> Self {
Rng(seed)
}
fn next(&mut self) -> u64 {
let mut z = self.0.wrapping_add(0x9e37_79b9_7f4a_7c15);
self.0 = z;
z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
z ^ (z >> 31)
}
fn range(&mut self, lo: usize, hi: usize) -> usize {
lo + (self.next() as usize) % (hi - lo).max(1)
}
fn choice<'a, T>(&mut self, xs: &'a [T]) -> &'a T {
&xs[(self.next() as usize) % xs.len()]
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: spacing

Suggested change
fn new(seed: u64) -> Self {
Rng(seed)
}
fn next(&mut self) -> u64 {
let mut z = self.0.wrapping_add(0x9e37_79b9_7f4a_7c15);
self.0 = z;
z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
z ^ (z >> 31)
}
fn range(&mut self, lo: usize, hi: usize) -> usize {
lo + (self.next() as usize) % (hi - lo).max(1)
}
fn choice<'a, T>(&mut self, xs: &'a [T]) -> &'a T {
&xs[(self.next() as usize) % xs.len()]
}
fn new(seed: u64) -> Self {
Rng(seed)
}
fn next(&mut self) -> u64 {
let mut z = self.0.wrapping_add(0x9e37_79b9_7f4a_7c15);
self.0 = z;
z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
z ^ (z >> 31)
}
fn range(&mut self, lo: usize, hi: usize) -> usize {
lo + (self.next() as usize) % (hi - lo).max(1)
}
fn choice<'a, T>(&mut self, xs: &'a [T]) -> &'a T {
&xs[(self.next() as usize) % xs.len()]
}

@@ -0,0 +1,960 @@
// Copyright 2025-Present Datadog, Inc. https://www.datadoghq.com/

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not for everything, but could this file re-use existing Rust infrastructure for elf parsing, such as the elf crate for example?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea - i've not looked into it tbh! It'll need to work from dl_iterate_phdr in the running process but potentially. I will leave this open and have a look next week

Comment thread libdd-heap-gotter/src/elf.rs Outdated
// Fallback for degenerate .gnu.hash (e.g. executables with only
// undefined imports): estimate dynsym entry count from the
// distance between DT_SYMTAB and DT_STRTAB. This works because
// the linker always places .dynsym immediately before .dynstr.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true for all linkers (mold, lld, ld) or is there any guarantee of that, or does it just happen to be the case right now?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. The GNU hash walk itself is ported from ddprof, but this DT_STRTAB - DT_SYMTAB fallback i've added. It's already a fallback for malformed binaries, so we should generally not get here, so I don't think we need to dig into mold etc. to validate - at least for the moment.

If it underestimates, the likely failure mode is that we skip patching some relocations. If it overestimates, valid relocation symbol indexes in normal loaded objects should still keep us safe. I've rewritten the comment to make this a bit clearer.

Comment thread libdd-heap-gotter/src/elf.rs Outdated
None
}

/// Mirror of ddprof's `check`: defining symbol, function/object/notype.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know clankers love to do that when you translate from existing code (here I guess ddprof), but I would clean those references to ddprof (unless they help understanding or give proper attribution): if this crate is supposed to be a stand-alone implementation in the future, it's a bit confusing for the reader IMHO.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actively left it around to help reviewers, but if it janks you out then it's obviously not being very helpful. I'll remove it!


/// Resolved address of the real `malloc`; filled by `install_heap_overrides`.
/// The bare `usize` payload is a function pointer.
pub(crate) static ORIG_MALLOC: AtomicUsize = AtomicUsize::new(0);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to use usize here instead of e.g. AtomicPointer<c_void>?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I'm not sure it would be more ergonomic given what we do with it, so feel free to ignore. I didn't know you couldn't convert pointers to function pointers freely.

// SAFETY: caller guarantees T is the right `extern "C" fn(...)`
// type and that `v` was written by `apply_overrides` with a
// function of that signature.
Some(core::mem::transmute_copy::<usize, T>(&v))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need transmute_copy here, instead of just transmute(v)?

*const libc::pthread_attr_t,
StartRoutine,
*mut c_void,
) -> c_int;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know how stable those signatures are, or if there's any way to avoid drift as much as possible (possibly with runtime checks)?

return;
};
if ptr.is_null() {
return;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what is the behavior of free(NULL), but shouldn't we forward the call anyway just to make sure to preserve behavior? Maybe it sets errno somewhere or something and some crazy people rely on that?

#[no_mangle]
pub unsafe extern "C" fn gotter_calloc(nmemb: usize, size: usize) -> *mut c_void {
let Some(real): Option<CallocFn> = load_fn(&ORIG_CALLOC) else {
return std::ptr::null_mut();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark about null handling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-build sphinx:critical Sphinx: critical — human review required

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants