Skip to content

feat(ocap-kernel): leverage libp2p v3 features in remote communications#915

Open
sirtimid wants to merge 4 commits intomainfrom
sirtimid/leverage-libp2p-v3-features
Open

feat(ocap-kernel): leverage libp2p v3 features in remote communications#915
sirtimid wants to merge 4 commits intomainfrom
sirtimid/leverage-libp2p-v3-features

Conversation

@sirtimid
Copy link
Copy Markdown
Contributor

@sirtimid sirtimid commented Apr 2, 2026

Closes #903

Summary

Adopts several libp2p v3 features to improve error handling, observability, and robustness of the remote communications layer:

  • Async protocol handlers with auto-abort — inbound connection handler is now async, so handler rejections automatically abort the stream instead of silently dropping errors
  • Fine-grained stream close events — v3 streams emit close events with { error, local } properties for better diagnostics distinguishing local vs remote and clean vs error closes
  • Stream inactivity timeout — configurable streamInactivityTimeoutMs (default 120s) detects dead streams via bidirectional silence, with a 5s minimum floor to prevent self-DoS
  • peer:disconnect safety net — listens for peer-level disconnect events as a fallback when readChannel cleanup misses a disconnection, with signal.aborted guard to avoid spurious reconnection during shutdown
  • StreamResetError detection — recognizes remote stream resets without permanently marking the peer as intentionally closed (prevents malicious peers from suppressing reconnection)
  • Graceful closeChannel rewrite — uses channel.stream.close() directly instead of the fragile unwrap() pattern, with abort fallback
  • Graceful shutdownstop() sends FIN to remote peers so they see clean stream ends rather than StreamResetError
  • Channel type gains stream — direct access to stream lifecycle APIs, avoiding unwrap()
  • TooManyOutboundProtocolStreamsError handling — re-throws immediately since trying other addresses won't help
  • writeWithTimeout stream status guard — fast-fails on already-closed streams before attempting a write
  • RemoteHandle.giveUp() method — transport-level give-up now properly cleans up (stops ACK retransmission + rejects pending messages + rejects pending URL redemptions)
  • Flaky multi-peer reconnection E2E test — fixed by adding maxConnectionAttemptsPerMinute: 500 (rate limiter was the root cause), parallel restarts, and ackTimeoutMs: 5000

Test plan

New unit tests cover stream status fast-fail (it.each over 4 statuses), peer:disconnect handler (registration, forwarding, relay filtering, shutdown guard), stream close event diagnostics (4 scenarios via it.each), TooManyOutboundProtocolStreamsError rethrow, StreamResetError reconnection behavior, closeChannel graceful/abort/double-fail paths, stop() stream closing, RemoteHandle.giveUp(), and streamInactivityTimeoutMs query string round-trip. New E2E test file (libp2p-v3-features.test.ts) covers stream inactivity timeout recovery end-to-end.

Suggested review order

  1. packages/ocap-kernel/src/remotes/types.tsChannel type gains stream, new PeerDisconnectHandler and streamInactivityTimeoutMs option. Read this first to understand the type changes everything else builds on.
  2. packages/ocap-kernel/src/remotes/platform/constants.ts — new STREAM_INACTIVITY_TIMEOUT_MS and MIN_STREAM_INACTIVITY_TIMEOUT_MS constants.
  3. packages/ocap-kernel/src/remotes/platform/channel-utils.ts + test — writeWithTimeout stream status guard (small, self-contained).
  4. packages/ocap-kernel/src/remotes/platform/connection-factory.ts + test — peer:disconnect listener, async inbound handler, closeChannel rewrite, TooManyOutboundProtocolStreamsError. Core plumbing.
  5. packages/ocap-kernel/src/remotes/platform/transport.ts + test — biggest file: StreamResetError handling, registerChannel stream events/timeout, peer:disconnect safety net, graceful stop(). Builds on 3 and 4.
  6. packages/ocap-kernel/src/remotes/kernel/RemoteHandle.ts + test — new giveUp() method and #handleAckTimeout cleanup path.
  7. packages/ocap-kernel/src/remotes/kernel/RemoteManager.ts + test — #handleRemoteGiveUp wiring to giveUp().
  8. packages/kernel-node-runtime/test/e2e/remote-comms.test.ts — flaky multi-peer reconnection fix (rate limit + parallel restarts).
  9. packages/kernel-node-runtime/test/e2e/libp2p-v3-features.test.ts — new E2E tests for the v3 features.
  10. packages/kernel-browser-runtime/src/utils/comms-query-string.ts + test — streamInactivityTimeoutMs query string support (trivial).

🤖 Generated with Claude Code

@sirtimid sirtimid requested a review from a team as a code owner April 2, 2026 13:36
sirtimid and others added 2 commits April 2, 2026 15:38
The "handles multiple simultaneous reconnections to different peers"
test was flaky because the fast test backoff (10-50ms) exhausted the
default connection rate limit (10/min) in <1s, blocking reconnection
for ~60s — right at the edge of the redemption timeout.

Fix the test with three changes:
- Add maxConnectionAttemptsPerMinute: 500 to prevent rate limiting
- Restart both peers in parallel (Promise.all) to halve wall-clock time
- Use ackTimeoutMs: 5000 for a 20s redemption window

Also add RemoteHandle.giveUp() method so the transport-level give-up
(via onRemoteGiveUp) properly cleans up: stops ACK retransmission,
rejects pending messages, and rejects pending URL redemptions. The
previous code path only rejected redemptions, missing timer cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- StreamResetError no longer marks peer as intentionally closed —
  prevents a malicious peer from permanently suppressing reconnection
- Add signal.aborted guard to peer:disconnect handler to avoid
  spurious reconnection during shutdown
- Add MIN_STREAM_INACTIVITY_TIMEOUT_MS (5s) floor to prevent self-DoS
  via extremely small timeout values
- Clean up double-invoke in #handleAckTimeout — delegate to giveUp()
- Add unit test for RemoteHandle.giveUp()
- Add test for stop() calling channel.stream.close()
- Add test for closeChannel when abort() throws
- Add streamInactivityTimeoutMs to comms-query-string round-trip test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sirtimid sirtimid force-pushed the sirtimid/leverage-libp2p-v3-features branch from 6ade570 to 877fb80 Compare April 2, 2026 13:39
- Set streamInactivityTimeoutMs to 6s (above the 5s floor) in the
  E2E test so the inactivity timeout actually fires
- Remove redundant giveUp() call in #handleAckTimeout — RemoteManager
  is the sole caller via #onGiveUp callback

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

- Call giveUp() before #onGiveUp callback in #handleAckTimeout so
  cleanup happens even when the callback is unset
- StreamResetError now logs "stream reset by remote, reconnecting"
  instead of the misleading "remote intentionally disconnected"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 2, 2026

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 78.35%
⬆️ +0.06%
8519 / 10873
🔵 Statements 78.17%
⬆️ +0.06%
8655 / 11072
🔵 Functions 75.89%
⬆️ +0.04%
1987 / 2618
🔵 Branches 76.14%
⬆️ +0.07%
3661 / 4808
File Coverage
File Stmts Branches Functions Lines Uncovered Lines
Changed Files
packages/kernel-browser-runtime/src/utils/comms-query-string.ts 100%
🟰 ±0%
100%
🟰 ±0%
100%
🟰 ±0%
100%
🟰 ±0%
packages/ocap-kernel/src/remotes/types.ts 100%
🟰 ±0%
100%
🟰 ±0%
100%
🟰 ±0%
100%
🟰 ±0%
packages/ocap-kernel/src/remotes/kernel/RemoteHandle.ts 89%
⬆️ +0.42%
82.88%
🟰 ±0%
87.75%
⬆️ +0.25%
89.27%
⬆️ +0.42%
354, 373-415, 468, 511, 521-523, 564-577, 916, 989, 1040
packages/ocap-kernel/src/remotes/kernel/RemoteManager.ts 97.8%
⬆️ +0.03%
100%
🟰 ±0%
95.65%
🟰 ±0%
97.8%
⬆️ +0.03%
188, 374-376
packages/ocap-kernel/src/remotes/platform/channel-utils.ts 100%
🟰 ±0%
85.71%
⬆️ +5.71%
100%
🟰 ±0%
100%
🟰 ±0%
packages/ocap-kernel/src/remotes/platform/connection-factory.ts 94.56%
⬆️ +0.71%
88.75%
⬆️ +0.60%
97.29%
⬆️ +0.33%
94.5%
⬆️ +0.72%
112-114, 124, 365, 550-551, 556-560, 619, 632
packages/ocap-kernel/src/remotes/platform/constants.ts 100%
🟰 ±0%
100%
🟰 ±0%
100%
🟰 ±0%
100%
🟰 ±0%
packages/ocap-kernel/src/remotes/platform/transport.ts 82.14%
⬆️ +0.25%
80.95%
⬆️ +1.18%
74.28%
⬆️ +0.09%
82.14%
⬆️ +0.25%
59, 134, 137, 158, 202-203, 208-212, 254-263, 322, 356-374, 398, 482, 526-529, 553, 577-582, 585-586, 590-593, 634, 664, 683-685, 694, 707, 787, 816
Generated in workflow #4169 for commit 0d83bbb by the Vitest Coverage Report Action

@sirtimid sirtimid enabled auto-merge April 2, 2026 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: leverage libp2p v3 features in remote communications

1 participant