feat(ocap-kernel): leverage libp2p v3 features in remote communications#915
Open
feat(ocap-kernel): leverage libp2p v3 features in remote communications#915
Conversation
The "handles multiple simultaneous reconnections to different peers" test was flaky because the fast test backoff (10-50ms) exhausted the default connection rate limit (10/min) in <1s, blocking reconnection for ~60s — right at the edge of the redemption timeout. Fix the test with three changes: - Add maxConnectionAttemptsPerMinute: 500 to prevent rate limiting - Restart both peers in parallel (Promise.all) to halve wall-clock time - Use ackTimeoutMs: 5000 for a 20s redemption window Also add RemoteHandle.giveUp() method so the transport-level give-up (via onRemoteGiveUp) properly cleans up: stops ACK retransmission, rejects pending messages, and rejects pending URL redemptions. The previous code path only rejected redemptions, missing timer cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- StreamResetError no longer marks peer as intentionally closed — prevents a malicious peer from permanently suppressing reconnection - Add signal.aborted guard to peer:disconnect handler to avoid spurious reconnection during shutdown - Add MIN_STREAM_INACTIVITY_TIMEOUT_MS (5s) floor to prevent self-DoS via extremely small timeout values - Clean up double-invoke in #handleAckTimeout — delegate to giveUp() - Add unit test for RemoteHandle.giveUp() - Add test for stop() calling channel.stream.close() - Add test for closeChannel when abort() throws - Add streamInactivityTimeoutMs to comms-query-string round-trip test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6ade570 to
877fb80
Compare
- Set streamInactivityTimeoutMs to 6s (above the 5s floor) in the E2E test so the inactivity timeout actually fires - Remove redundant giveUp() call in #handleAckTimeout — RemoteManager is the sole caller via #onGiveUp callback Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
- Call giveUp() before #onGiveUp callback in #handleAckTimeout so cleanup happens even when the callback is unset - StreamResetError now logs "stream reset by remote, reconnecting" instead of the misleading "remote intentionally disconnected" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
Coverage Report
File Coverage |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Closes #903
Summary
Adopts several libp2p v3 features to improve error handling, observability, and robustness of the remote communications layer:
async, so handler rejections automatically abort the stream instead of silently dropping errors{ error, local }properties for better diagnostics distinguishing local vs remote and clean vs error closesstreamInactivityTimeoutMs(default 120s) detects dead streams via bidirectional silence, with a 5s minimum floor to prevent self-DoSpeer:disconnectsafety net — listens for peer-level disconnect events as a fallback whenreadChannelcleanup misses a disconnection, withsignal.abortedguard to avoid spurious reconnection during shutdownStreamResetErrordetection — recognizes remote stream resets without permanently marking the peer as intentionally closed (prevents malicious peers from suppressing reconnection)closeChannelrewrite — useschannel.stream.close()directly instead of the fragileunwrap()pattern, with abort fallbackstop()sends FIN to remote peers so they see clean stream ends rather thanStreamResetErrorChanneltype gainsstream— direct access to stream lifecycle APIs, avoidingunwrap()TooManyOutboundProtocolStreamsErrorhandling — re-throws immediately since trying other addresses won't helpwriteWithTimeoutstream status guard — fast-fails on already-closed streams before attempting a writeRemoteHandle.giveUp()method — transport-level give-up now properly cleans up (stops ACK retransmission + rejects pending messages + rejects pending URL redemptions)maxConnectionAttemptsPerMinute: 500(rate limiter was the root cause), parallel restarts, andackTimeoutMs: 5000Test plan
New unit tests cover stream status fast-fail (
it.eachover 4 statuses),peer:disconnecthandler (registration, forwarding, relay filtering, shutdown guard), stream close event diagnostics (4 scenarios viait.each),TooManyOutboundProtocolStreamsErrorrethrow,StreamResetErrorreconnection behavior,closeChannelgraceful/abort/double-fail paths,stop()stream closing,RemoteHandle.giveUp(), andstreamInactivityTimeoutMsquery string round-trip. New E2E test file (libp2p-v3-features.test.ts) covers stream inactivity timeout recovery end-to-end.Suggested review order
packages/ocap-kernel/src/remotes/types.ts—Channeltype gainsstream, newPeerDisconnectHandlerandstreamInactivityTimeoutMsoption. Read this first to understand the type changes everything else builds on.packages/ocap-kernel/src/remotes/platform/constants.ts— newSTREAM_INACTIVITY_TIMEOUT_MSandMIN_STREAM_INACTIVITY_TIMEOUT_MSconstants.packages/ocap-kernel/src/remotes/platform/channel-utils.ts+ test —writeWithTimeoutstream status guard (small, self-contained).packages/ocap-kernel/src/remotes/platform/connection-factory.ts+ test —peer:disconnectlistener, async inbound handler,closeChannelrewrite,TooManyOutboundProtocolStreamsError. Core plumbing.packages/ocap-kernel/src/remotes/platform/transport.ts+ test — biggest file:StreamResetErrorhandling,registerChannelstream events/timeout,peer:disconnectsafety net, gracefulstop(). Builds on 3 and 4.packages/ocap-kernel/src/remotes/kernel/RemoteHandle.ts+ test — newgiveUp()method and#handleAckTimeoutcleanup path.packages/ocap-kernel/src/remotes/kernel/RemoteManager.ts+ test —#handleRemoteGiveUpwiring togiveUp().packages/kernel-node-runtime/test/e2e/remote-comms.test.ts— flaky multi-peer reconnection fix (rate limit + parallel restarts).packages/kernel-node-runtime/test/e2e/libp2p-v3-features.test.ts— new E2E tests for the v3 features.packages/kernel-browser-runtime/src/utils/comms-query-string.ts+ test —streamInactivityTimeoutMsquery string support (trivial).🤖 Generated with Claude Code