
feat: tiered data storage #7

Open
enigbe wants to merge 111 commits into main from
2025-10-tiered-data-storage

@enigbe enigbe commented Oct 21, 2025

What this PR does:

We introduce TierStore, a KVStore implementation that manages data across
three distinct storage layers.

The layers are:

  1. Primary: The main/remote data store.
  2. Ephemeral: A secondary store for non-critical, easily-rebuildable data
    (e.g., network graph). This tier aims to improve latency by leveraging a
    local KVStore designed for fast/local access.
  3. Backup: A tertiary store for disaster recovery. Backup operations are sent
    asynchronously/lazily to avoid blocking primary store operations.
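The routing between the three layers can be sketched roughly as follows. This is a minimal, synchronous illustration and not the PR's `TierStore`: `MemStore`, `TieredSketch`, the `is_ephemeral` rule (only the key `network_graph`), and the choice to keep ephemeral data out of the backup are all assumptions made for the example. The asynchronous backup is modeled here by a queue that is drained later.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical, simplified in-memory store; the real `KVStore` trait is
/// async and namespaced.
#[derive(Default)]
struct MemStore {
    data: HashMap<String, Vec<u8>>,
}

impl MemStore {
    fn write(&mut self, key: &str, value: &[u8]) {
        self.data.insert(key.to_string(), value.to_vec());
    }

    fn read(&self, key: &str) -> Option<Vec<u8>> {
        self.data.get(key).cloned()
    }
}

/// Illustrative tiered store (not the PR's `TierStore`).
#[derive(Default)]
struct TieredSketch {
    primary: MemStore,
    ephemeral: MemStore,
    backup: MemStore,
    /// Writes waiting to be mirrored to the backup tier.
    pending_backup: VecDeque<(String, Vec<u8>)>,
}

impl TieredSketch {
    /// Assumed rule: only the network graph counts as rebuildable data.
    fn is_ephemeral(key: &str) -> bool {
        key == "network_graph"
    }

    fn write(&mut self, key: &str, value: &[u8]) {
        if Self::is_ephemeral(key) {
            // Rebuildable data stays in the local tier only (assumption).
            self.ephemeral.write(key, value);
        } else {
            self.primary.write(key, value);
            // Queue the backup write instead of performing it inline.
            self.pending_backup.push_back((key.to_string(), value.to_vec()));
        }
    }

    fn read(&self, key: &str) -> Option<Vec<u8>> {
        if Self::is_ephemeral(key) {
            self.ephemeral.read(key)
        } else {
            self.primary.read(key)
        }
    }

    /// Drain queued writes into the backup tier.
    fn flush_backup(&mut self) {
        while let Some((key, value)) = self.pending_backup.pop_front() {
            self.backup.write(&key, &value);
        }
    }
}
```

In this sketch a primary write returns before the backup has seen the data; the backup only catches up once `flush_backup` runs.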

We also allow Node to be configured with these stores: callers can set
exponential back-off parameters and backup and ephemeral stores, and can build
the Node with a store that becomes TierStore's primary store. These configuration
options also extend to our foreign interface, allowing binding targets to build the
Node with their own ffi::KVStore implementations.

A sample Python implementation is added and tested.

Additionally, we add comprehensive testing for TierStore by introducing

  1. Unit tests for TierStore core functionality.
  2. Integration tests for Node built with tiered storage.
  3. Python FFI tests for foreign ffi::KVStore implementations.

Concerns

The way retry logic is handled deserves scrutiny, particularly because retries can
nest. TierStore ships with basic retry logic by default, but some KVStore implementations
(e.g. VssStore) already have their own retries built in and thus have no need for
the wrapper store's logic.
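To make the nesting concern concrete, here is a small sketch of how attempts multiply when a retrying wrapper sits on top of a store that already retries. The `Backoff` struct, `with_retry`, and `nested_attempts` are hypothetical names for illustration; they are not TierStore's or VssStore's actual retry code.

```rust
use std::time::Duration;

/// Hypothetical back-off parameters; names are illustrative only.
struct Backoff {
    max_attempts: u32,
    base_delay: Duration,
}

impl Backoff {
    /// Delay before retry number `retry` (0-based): base * 2^retry.
    fn delay(&self, retry: u32) -> Duration {
        self.base_delay * 2u32.pow(retry)
    }
}

/// Run `op` until it succeeds or `max_attempts` is exhausted.
/// `op` is always called at least once.
fn with_retry<T, E>(policy: &Backoff, mut op: impl FnMut() -> Result<T, E>) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt + 1 >= policy.max_attempts => return Err(err),
            Err(_) => {
                std::thread::sleep(policy.delay(attempt));
                attempt += 1;
            }
        }
    }
}

/// Count how often an always-failing backend is hit when an outer retry
/// wraps an operation that already retries internally.
fn nested_attempts(outer: &Backoff, inner: &Backoff) -> u32 {
    let mut calls = 0;
    let _ = with_retry(outer, || {
        with_retry(inner, || -> Result<(), ()> {
            calls += 1;
            Err(())
        })
    });
    calls
}
```

With three attempts at each level, a failing backend is hit nine times, and the worst-case latency grows with the product of both back-off schedules. One possible mitigation would be letting callers disable the wrapper's retries when the wrapped store already retries.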

@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch 9 times, most recently from 29f47f3 to 264aa7f Compare November 4, 2025 22:07
@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch 3 times, most recently from a30cbfb to 1e7bdbc Compare December 4, 2025 23:30
benthecarman and others added 18 commits December 9, 2025 13:06
…lices

LDK gives us the actual funding output so we no longer need to create a dummy one with fake pubkeys
We insert a channel's funding utxo into our wallet so we can later
calculate the fees for the transaction, otherwise our wallet would have
incomplete information. We do it before the splice as we only really
need this information for splices and not for all channels.
Exposes the funding_redeem_script that LDK already exposes
…wallet

Insert channel funding outputs into Wallet
Refactor the unified_qr.rs module into unified.rs to provide a single
API for sending payments to BIP 21/321 URIs and BIP 353 HRNs. This
change simplifies the user interface by leveraging the
bitcoin-payment-instructions library for parsing.

Key changes:
- Rename UnifiedQrPayment to UnifiedPayment.
- Rename QRPaymentResult to UnifiedPaymentResult.
- Update the send method to support both URIs and HRNs.
- Update integration tests to match the new unified flow.
…tcoin-payment-instructions

Refactor unified_qr.rs to use bitcoin-payment-instructions
Rather than using `KVStoreSync` we now use the async `KVStore`
implementation for most `read_X` util methods used during node building.

This is a first step towards making node building/startup entirely async
eventually.
Previously, we would read entries of our payment store sequentially.
This is more or less fine when we read from a local store, but when we
read from a remote (e.g., VSS) store, all the latency could result in
considerable slowdown during startup. Here, we opt to read store entries
in batches.
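The idea of batched reads can be illustrated with the standard library alone. The sketch below uses scoped threads rather than the async machinery the actual change presumably relies on, and `read_in_batches` and its `read_one` callback are hypothetical names standing in for a (possibly remote) store read.

```rust
use std::thread;

/// Read all `keys` in batches of `batch_size`, issuing the reads within a
/// batch concurrently. Results are returned in the same order as `keys`.
fn read_in_batches<F>(keys: &[String], batch_size: usize, read_one: F) -> Vec<Vec<u8>>
where
    F: Fn(&str) -> Vec<u8> + Sync,
{
    // Share the callback by reference so every thread can copy the pointer.
    let read_one = &read_one;
    let mut results = Vec::with_capacity(keys.len());
    for batch in keys.chunks(batch_size.max(1)) {
        let batch_results: Vec<Vec<u8>> = thread::scope(|scope| {
            // Spawn one read per key in this batch.
            let handles: Vec<_> = batch
                .iter()
                .map(|key| scope.spawn(move || read_one(key.as_str())))
                .collect();
            // Joining in spawn order keeps results aligned with `keys`.
            handles
                .into_iter()
                .map(|handle| handle.join().expect("read thread panicked"))
                .collect()
        });
        results.extend(batch_results);
    }
    results
}
```

With a per-read latency of `t` and `n` entries, sequential reads take roughly `n * t`, while batches of size `b` take roughly `ceil(n / b) * t`, which is where the startup improvement against a remote store would come from.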
Previously, we consistently handed around `Arc` references for most
objects to avoid unnecessary refactoring work. This approach however
introduced a bunch of unnecessary allocations through `Arc::clone`.

Here we opt to rather use plain references in a bunch of places,
reducing the usage of `Arc`s.
Add integration test that verifies 200 payments are correctly
persisted and retrievable via `list_payments` after restarting
a node.

Co-Authored-By: Claude AI
…-payment-reads

Parallelize `read_payments`
.. we bump to the most recent `rust-lightning` commit and fix some minor
test code changes.
.. as this is now done by the background processor.
LDK 0.2 added a method to load `ChannelMonitor`s on startup without
resilvering them, avoiding the startup latency of persistence for
each `ChannelMonitor`. Here we start using it.
…silver

Avoid resilvering `ChannelMonitor`s on startup
`LiquiditySource` takes a reference to our `PeerManager` but the
`PeerManager` holds an indirect reference to the `LiquiditySource`.
As a result, after our `Node` instance is `stop`ped and the `Node`
`drop`ped, much of the node's memory will stick around, including
the `NetworkGraph`.

Here we fix this issue by using `Weak` pointers, though note that
there is another issue caused by LDK's gossip validation API.
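The reference cycle and its fix can be sketched with two stand-in types. `PeerManagerSketch` and `LiquiditySketch` are hypothetical simplifications, and which side of the cycle holds the `Weak` pointer is an assumption made for this example.

```rust
use std::sync::{Arc, Mutex, Weak};

/// Stand-in for `PeerManager`: owns a strong reference to the liquidity source.
struct PeerManagerSketch {
    liquidity: Arc<LiquiditySketch>,
}

/// Stand-in for `LiquiditySource`: holds only a weak back-reference, so it
/// does not keep the peer manager (and everything it owns) alive.
struct LiquiditySketch {
    peer_manager: Mutex<Weak<PeerManagerSketch>>,
}

impl LiquiditySketch {
    /// Returns `true` if the peer manager is still alive.
    fn peer_manager_alive(&self) -> bool {
        self.peer_manager.lock().unwrap().upgrade().is_some()
    }
}

/// Build both objects and wire up the weak back-reference.
fn build() -> (Arc<PeerManagerSketch>, Arc<LiquiditySketch>) {
    let liquidity = Arc::new(LiquiditySketch { peer_manager: Mutex::new(Weak::new()) });
    let peer_manager = Arc::new(PeerManagerSketch { liquidity: Arc::clone(&liquidity) });
    *liquidity.peer_manager.lock().unwrap() = Arc::downgrade(&peer_manager);
    (peer_manager, liquidity)
}
```

Had `LiquiditySketch` stored an `Arc<PeerManagerSketch>` instead, each object would keep the other's strong count above zero and neither would ever be freed after the last external handle is dropped.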
tnull and others added 10 commits February 11, 2026 15:36
Change the `update` method on `StorableObject` to take the update by
value rather than by reference. This avoids unnecessary clones when
applying updates, since the caller typically constructs a fresh update
struct that can simply be moved.

Co-Authored-By: HAL 9000
Signed-off-by: Elias Rohrer <dev@tnull.de>
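As a rough illustration of why taking the update by value avoids clones, consider the following sketch. `PaymentSketch` and `PaymentUpdateSketch` are hypothetical, trimmed-down types and not the actual `StorableObject` implementations.

```rust
/// Hypothetical, trimmed-down stored object.
#[derive(Debug, PartialEq)]
struct PaymentSketch {
    memo: String,
    amount_msat: Option<u64>,
}

/// Hypothetical update struct; `None` means "leave the field unchanged".
struct PaymentUpdateSketch {
    memo: Option<String>,
    amount_msat: Option<u64>,
}

impl PaymentSketch {
    /// Taking `update` by value lets owned fields be moved into `self`
    /// without cloning. Returns `true` if anything changed.
    fn update(&mut self, update: PaymentUpdateSketch) -> bool {
        let mut changed = false;
        if let Some(memo) = update.memo {
            if self.memo != memo {
                self.memo = memo; // moved, not cloned
                changed = true;
            }
        }
        if let Some(amount) = update.amount_msat {
            if self.amount_msat != Some(amount) {
                self.amount_msat = Some(amount);
                changed = true;
            }
        }
        changed
    }
}
```

With a `&PaymentUpdateSketch` parameter, the `memo` assignment would need `memo.clone()` even though the caller usually discards the update right after the call.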
…alue

Take `StorableObjectUpdate` by value in `StorableObject::update`
It's weird to have a special intermediary `setup_node` method if we have
`TestConfig` for exactly that reason by now. So we move
`async_payment_role` over.
.. all of our tests should be robust against switching chain sources. We
here opt to pick a random one each time to considerably extend our test
coverage, instead of just running some cases against non-Esplora chain
sources.

Signed-off-by: Elias Rohrer <dev@tnull.de>
When we initially implemented `bitcoind` syncing, polling the mempool was
very frequent and rather inefficient so we made a choice not to
unnecessarily update the payment store for mempool changes, especially
since we only consider transactions `Succeeded` after
`ANTI_REORG_DELAY` anyways.

However, since then we made quite a few performance improvements to the
mempool syncing, and by now we should just update the payment store as
not doing so will lead to rather unexpected behavior, making some tests
fail for `TestChainSource::Bitcoind`, e.g., `channel_full_cycle_0conf`,
which we fix here.

As we recently switched to updating the payment store based on BDK's
`WalletEvent`, but they currently don't offer an API returning such
events when applying mempool transactions, we copy over the respective
method for generating events from `bdk_wallet`, with the intention of
dropping it again once they do.

Signed-off-by: Elias Rohrer <dev@tnull.de>
Previously, we fixed that a fresh node syncing via `bitcoind` RPC would
resync all chain data back to genesis. However, while introducing a
wallet birthday is great, it prevented discovery of historical funds if
a wallet was imported from seed. Here, we add a recovery mode flag
to the builder that explicitly allows re-enabling resyncing from genesis
in such a scenario. Going forward, we intend to reuse that API for an
upcoming Lightning recovery flow, too.
Previously, we'd selectively insert the funding outputs into the onchain
wallet to later allow calculating `fees_paid` when creating payment
store entries (only for splicing mostly). However, this didn't always work, and we might for
example end up with a missing funding output (and hence would fall back
to `fees_paid: Some(0)`) if it was a counterparty-initiated channel and
we synced via `bitcoind` RPC.

Here, we fix this by tracking all LDK-registered `txids` in
`ChainSource` and then in the `Wallet`'s `Listen` implementation insert
all outputs of all registered transactions into the `Wallet`, ensuring
we'd always have sufficient data for `calculate_fee` available.

Thereby we also fix the `onchain_send_receive` test which previously
failed when using `TestChainSource::Bitcoind`.

Signed-off-by: Elias Rohrer <dev@tnull.de>
Previously, we'd update the payment store after persisting the wallet in
some cases. This was fine as long as we iterated all wallet transactions
anyways (hence idempotent). However, now that we use the event-based
flow we should persist the payment store(s) first, so that wallet events
get replayed if there was a crash in-between some of the persistence
operations.
Randomize chain source selection in tests
Add automatic rebroadcasting of unconfirmed transactions triggered by
the `ChainTipChanged` event from BDK. This ensures pending transactions
remain in mempools.
@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch 2 times, most recently from 95285b0 to 4b2d345 Compare February 18, 2026 11:23
Camillarhi and others added 15 commits February 18, 2026 17:51
Add `Replace-by-Fee` functionality to allow users to increase fees on
pending outbound transactions, improving confirmation likelihood during
network congestion.

- Uses BDK's `build_fee_bump` for transaction replacement
- Validates transaction eligibility: must be outbound and unconfirmed
- Maintains payment history consistency across wallet updates
- Includes integration tests for various RBF scenarios
Signed-off-by: Elias Rohrer <dev@tnull.de>
…dcast-fee-bumping

Enhance onchain transaction management
node_sender_lsp and node_receiver_lsp each have two channels, so they
receive two ChannelReady events whose order depends on timing. The test
previously consumed these events in a fixed order, which could fail when
the events arrive in the opposite order.

Reproduced locally by swapping the assertion order at lines 1346-1347,
which fails deterministically since the "normal" local ordering is the
opposite of the one expected by the swapped assertions.

Add an expect_channel_ready_events! macro that consumes two ChannelReady
events and asserts both expected counterparties are present regardless of
arrival order.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
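The order-independent check can be sketched as a plain function. This is not the `expect_channel_ready_events!` macro itself: `EventSketch` is a hypothetical, trimmed-down event type, and the helper returns a `bool` instead of asserting so that it stays self-contained.

```rust
use std::collections::VecDeque;

/// Hypothetical, trimmed-down event type; the real event carries more fields.
#[derive(Debug, Clone, PartialEq)]
enum EventSketch {
    ChannelReady { counterparty: String },
    Other,
}

/// Consume the next two events from `queue` and check that they are
/// `ChannelReady` events for exactly the two expected counterparties,
/// in either order.
fn two_channel_ready_in_any_order(
    queue: &mut VecDeque<EventSketch>,
    expected_a: &str,
    expected_b: &str,
) -> bool {
    let mut seen = Vec::new();
    for _ in 0..2 {
        match queue.pop_front() {
            Some(EventSketch::ChannelReady { counterparty }) => seen.push(counterparty),
            // Any other event, or an empty queue, fails the check.
            _ => return false,
        }
    }
    // Compare as sorted lists so arrival order does not matter.
    let mut expected = vec![expected_a.to_string(), expected_b.to_string()];
    seen.sort();
    expected.sort();
    seen == expected
}
```

Sorting both lists before comparing also handles the case where both expected counterparties are the same node, which a simple "contains" check would accept too eagerly.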
…ent-test

Fix flaky async_payment test due to non-deterministic event ordering
Introduces TierStore, a KVStore implementation that manages data
across three storage layers:

- Primary: Main data store for critical node data
- Ephemeral: Secondary store for non-critical, easily-rebuildable
data (e.g., network graph) with fast local access
- Backup: Tertiary store for disaster recovery with async/lazy
operations to avoid blocking primary store
- Unit tests for TierStore core functionality
Adds TierStoreConfig and two configuration methods to NodeBuilder:

- set_backup_store: Configure backup store for disaster recovery
- set_ephemeral_store: Configure ephemeral store for non-critical data

Modifies build_with_store to wrap the provided store in a TierStore,
as the primary store, applying any configured ephemeral and backup stores.

Note: The temporary dead-code allowance will be removed in the test commit.
Introduce FFI-safe abstractions and builder APIs to allow foreign
language targets to configure custom backup and ephemeral stores when
constructing nodes with a custom store.

Major changes include:

- Addition of FfiDynStoreTrait, an FFI-safe equivalent of DynStoreTrait,
  working around uniffi's lack of support for Pin<Box<T>>
- Addition of FfiDynStore, a concrete wrapper for foreign language store
  implementations
- Provision of FfiDynStoreTrait implementation for DynStoreWrapper to bridge
  native Rust stores to FFI layer (useful in testing)
- Extension of ArcedNodeBuilder with methods for configuring backup and
  ephemeral stores
- Exposure of build_with_store so foreign targets can build nodes with
  custom store implementations
- Addition of build_node_with_store test helper to abstract uniffi-gated
  store wrapping at build_with_store call sites
- Add Rust integration test verifying correct routing to storage tiers
- Add Python in-memory KV store and FFI test for tiered storage
Introduce FfiDynStoreInner with per-key write version locks that
ensure write ordering and skip stale versions in both sync and async
code paths.
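The stale-write check can be sketched as below. This is not `FfiDynStoreInner`: `VersionedStoreSketch`, the single global version counter, and the single map-wide lock are simplifying assumptions, whereas the commit describes per-key version locks.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Illustrative versioned store (not the PR's `FfiDynStoreInner`).
#[derive(Default)]
struct VersionedStoreSketch {
    /// Last version handed out (assumed to be one global counter).
    version_counter: Mutex<u64>,
    /// Per key: the last applied version and its value.
    entries: Mutex<HashMap<String, (u64, Vec<u8>)>>,
}

impl VersionedStoreSketch {
    /// Reserve a version at the time the write is issued.
    fn reserve_version(&self) -> u64 {
        let mut counter = self.version_counter.lock().unwrap();
        *counter += 1;
        *counter
    }

    /// Apply a write unless an equal or newer version for `key` already
    /// landed. Returns `true` if the value was stored.
    fn write_versioned(&self, key: &str, version: u64, value: Vec<u8>) -> bool {
        let mut entries = self.entries.lock().unwrap();
        let is_stale =
            matches!(entries.get(key), Some((applied, _)) if *applied >= version);
        if is_stale {
            return false; // a later write already won; skip this one
        }
        entries.insert(key.to_string(), (version, value));
        true
    }

    fn read(&self, key: &str) -> Option<Vec<u8>> {
        self.entries.lock().unwrap().get(key).map(|(_, v)| v.clone())
    }
}
```

Because the version is reserved when the write is issued rather than when it completes, a slow older write that arrives late cannot overwrite the result of a newer one.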

Test changes:
- Unify tier store test helpers to use TestSyncStore for all tiers,
  replacing mixed SqliteStore/FilesystemStore/TestStore usage that
  caused test hangs due to TestStore's async write blocking
- Split build_node_with_store into cfg-gated versions for uniffi
  vs non-uniffi feature flags
These were created to test that our backup store does
not impact the primary store writes, but they require too much
boilerplate for the functionality being tested.
@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch from 4b2d345 to cba29a3 Compare February 24, 2026 22:10
- Restrict `TierStoreInner` visibility from `pub` to `pub(crate)`
- Primary store can be either local or remote
- Extract repeated ephemeral key matching into a standalone
  `is_ephemerally_cached_key` helper to DRY up `read_internal`,
  `write_internal`, and `remove_internal`
- Replace `KVStoreSync::list` with async `KVStore::list` in
  `list_internal` to avoid blocking the async runtime
@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch from cba29a3 to db1fe83 Compare February 24, 2026 23:03