feat(ocap-kernel): leverage libp2p v3 features in remote communications by sirtimid · Pull Request #914 · MetaMask/ocap-kernel

sirtimid · 2026-04-02T13:34:14Z

Closes #903

Summary

Adopts several libp2p v3 features to improve error handling, observability, and robustness of the remote communications layer:

Async protocol handlers with auto-abort — inbound connection handler is now async, so handler rejections automatically abort the stream instead of silently dropping errors
Fine-grained stream close events — v3 streams emit close events with { error, local } properties for better diagnostics distinguishing local vs remote and clean vs error closes
Stream inactivity timeout — configurable streamInactivityTimeoutMs (default 120s) detects dead streams via bidirectional silence, with a 5s minimum floor to prevent self-DoS
peer:disconnect safety net — listens for peer-level disconnect events as a fallback when readChannel cleanup misses a disconnection
StreamResetError detection — recognizes remote stream resets without permanently marking the peer as intentionally closed (prevents malicious peers from suppressing reconnection)
Graceful closeChannel rewrite — uses channel.stream.close() directly instead of the fragile unwrap() pattern, with abort fallback
Graceful shutdown — stop() sends FIN to remote peers so they see clean stream ends rather than StreamResetError
Channel type gains stream — direct access to stream lifecycle APIs, avoiding unwrap()
TooManyOutboundProtocolStreamsError handling — re-throws immediately since trying other addresses won't help
writeWithTimeout stream status guard — fast-fails on already-closed streams before attempting a write

Also fixes:

Flaky multi-peer reconnection E2E test — root cause was the fast test backoff (10-50ms) exhausting the default connection rate limit (10/min) in <1s. Fixed with maxConnectionAttemptsPerMinute: 500, parallel restarts, and ackTimeoutMs: 5000
RemoteHandle.giveUp() method — transport-level give-up now properly cleans up (stops ACK retransmission + rejects pending messages + rejects pending URL redemptions)
#handleAckTimeout double-invoke — delegates to giveUp() instead of duplicating rejection logic

Test plan

New unit tests cover: stream status fast-fail (it.each over 4 statuses), peer:disconnect handler (registration, forwarding, relay filtering, shutdown guard), stream close event diagnostics (4 scenarios via it.each), TooManyOutboundProtocolStreamsError rethrow, StreamResetError reconnection behavior, closeChannel graceful/abort/double-fail paths, stop() stream closing, RemoteHandle.giveUp(), and streamInactivityTimeoutMs query string round-trip. New E2E test file (libp2p-v3-features.test.ts) covers stream inactivity timeout recovery end-to-end.

🤖 Generated with Claude Code

The "handles multiple simultaneous reconnections to different peers" test was flaky because the fast test backoff (10-50ms) exhausted the default connection rate limit (10/min) in <1s, blocking reconnection for ~60s — right at the edge of the redemption timeout. Fix the test with three changes: - Add maxConnectionAttemptsPerMinute: 500 to prevent rate limiting - Restart both peers in parallel (Promise.all) to halve wall-clock time - Use ackTimeoutMs: 5000 for a 20s redemption window Also add RemoteHandle.giveUp() method so the transport-level give-up (via onRemoteGiveUp) properly cleans up: stops ACK retransmission, rejects pending messages, and rejects pending URL redemptions. The previous code path only rejected redemptions, missing timer cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- StreamResetError no longer marks peer as intentionally closed — prevents a malicious peer from permanently suppressing reconnection - Add signal.aborted guard to peer:disconnect handler to avoid spurious reconnection during shutdown - Add MIN_STREAM_INACTIVITY_TIMEOUT_MS (5s) floor to prevent self-DoS via extremely small timeout values - Clean up double-invoke in #handleAckTimeout — delegate to giveUp() - Add unit test for RemoteHandle.giveUp() - Add test for stop() calling channel.stream.close() - Add test for closeChannel when abort() throws - Add streamInactivityTimeoutMs to comms-query-string round-trip test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-02T13:40:09Z

Coverage Report

Status	Category	Percentage	Covered / Total
🔵	Lines	78.35% ⬆️ +0.06%	8519 / 10872
🔵	Statements	78.17% ⬆️ +0.06%	8655 / 11071
🔵	Functions	75.89% ⬆️ +0.04%	1987 / 2618
🔵	Branches	76.16% ⬆️ +0.09%	3662 / 4808

File Coverage

File	Stmts	Branches	Functions	Lines	Uncovered Lines
Changed Files
packages/kernel-browser-runtime/src/utils/comms-query-string.ts	100% 🟰 ±0%	100% 🟰 ±0%	100% 🟰 ±0%	100% 🟰 ±0%
packages/ocap-kernel/src/remotes/types.ts	100% 🟰 ±0%	100% 🟰 ±0%	100% 🟰 ±0%	100% 🟰 ±0%
packages/ocap-kernel/src/remotes/kernel/RemoteHandle.ts	89% ⬆️ +0.42%	82.88% 🟰 ±0%	87.75% ⬆️ +0.25%	89.27% ⬆️ +0.42%	354, 373-414, 467, 510, 520-522, 563-576, 915, 988, 1039
packages/ocap-kernel/src/remotes/kernel/RemoteManager.ts	97.8% ⬆️ +0.03%	100% 🟰 ±0%	95.65% 🟰 ±0%	97.8% ⬆️ +0.03%	185, 371-373
packages/ocap-kernel/src/remotes/platform/channel-utils.ts	100% 🟰 ±0%	85.71% ⬆️ +5.71%	100% 🟰 ±0%	100% 🟰 ±0%
packages/ocap-kernel/src/remotes/platform/connection-factory.ts	94.56% ⬆️ +0.71%	88.75% ⬆️ +0.60%	97.29% ⬆️ +0.33%	94.5% ⬆️ +0.72%	112-114, 124, 365, 550-551, 556-560, 619, 632
packages/ocap-kernel/src/remotes/platform/constants.ts	100% 🟰 ±0%	100% 🟰 ±0%	100% 🟰 ±0%	100% 🟰 ±0%
packages/ocap-kernel/src/remotes/platform/transport.ts	82.47% ⬆️ +0.58%	81.9% ⬆️ +2.13%	74.28% ⬆️ +0.09%	82.47% ⬆️ +0.58%	134, 137, 158, 202-203, 208-212, 254-263, 322, 356-374, 398, 481, 525-528, 552, 576-581, 584-585, 589-592, 633, 663, 682-684, 693, 706, 786, 815

Generated in workflow #4165 for commit 6ade570 by the Vitest Coverage Report Action

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

cursor · 2026-04-02T13:41:12Z

packages/kernel-node-runtime/test/e2e/libp2p-v3-features.test.ts

+        // Wait longer than the inactivity timeout (2s + buffer).
+        // The stream should auto-abort due to inactivityTimeout,
+        // triggering connection loss handling and reconnection.
+        await delay(4_000);


E2E test delay shorter than clamped inactivity timeout

Medium Severity

The test sets streamInactivityTimeoutMs: 2_000 and waits delay(4_000), expecting the stream to have auto-aborted. However, registerChannel clamps the value via Math.max(streamInactivityTimeoutMs, MIN_STREAM_INACTIVITY_TIMEOUT_MS), which evaluates to Math.max(2000, 5000) = 5000. Since 4s < 5s, the stream hasn't timed out yet, so the test succeeds on the existing connection without ever triggering inactivity-based reconnection. The test passes but doesn't exercise the intended code path.

Additional Locations (2)

packages/ocap-kernel/src/remotes/platform/transport.ts#L287-L291

packages/ocap-kernel/src/remotes/platform/constants.ts#L46-L47

cursor · 2026-04-02T13:41:12Z

packages/ocap-kernel/src/remotes/kernel/RemoteHandle.ts

      );
-      this.#rejectAllPending(`not acknowledged after ${MAX_RETRIES} retries`);
-      this.rejectPendingRedemptions(
-        `Remote connection lost after ${MAX_RETRIES} failed retries`,


Double giveUp() call in ACK timeout path

Low Severity

#handleAckTimeout calls this.giveUp(reason) on line 384, then this.#onGiveUp?.(this.#peerId) on line 385 triggers #handleRemoteGiveUp, which calls remote.giveUp(reason) again with a different reason string. The second invocation is a no-op since collections were already cleared, but it contradicts the comment stating "notify RemoteManager which calls giveUp()" — implying RemoteManager is the intended caller. The redundant path may confuse future maintainers.

Additional Locations (1)

packages/ocap-kernel/src/remotes/kernel/RemoteManager.ts#L179-L181

sirtimid and others added 2 commits April 2, 2026 15:26

sirtimid requested a review from a team as a code owner April 2, 2026 13:34

sirtimid changed the title ~~fix(ocap-kernel): fix flaky multi-peer reconnection E2E test~~ feat(ocap-kernel): leverage libp2p v3 features in remote communications Apr 2, 2026

sirtimid closed this Apr 2, 2026

sirtimid deleted the sirtimid/fix-flaky-multi-peer-reconnection-test branch April 2, 2026 13:36

cursor bot reviewed Apr 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ocap-kernel): leverage libp2p v3 features in remote communications#914

feat(ocap-kernel): leverage libp2p v3 features in remote communications#914
sirtimid wants to merge 2 commits intomainfrom
sirtimid/fix-flaky-multi-peer-reconnection-test

sirtimid commented Apr 2, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Apr 2, 2026

Uh oh!

cursor bot left a comment

Uh oh!

cursor bot Apr 2, 2026

Uh oh!

cursor bot Apr 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

sirtimid commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

github-actions bot commented Apr 2, 2026

Coverage Report

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor bot Apr 2, 2026

Choose a reason for hiding this comment

E2E test delay shorter than clamped inactivity timeout

Uh oh!

cursor bot Apr 2, 2026

Choose a reason for hiding this comment

Double giveUp() call in ACK timeout path

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

sirtimid commented Apr 2, 2026 •

edited

Loading

Double `giveUp()` call in ACK timeout path