Fix #6169: elect call offerer using same-namespace session IDs#6375
Fix #6169: elect call offerer using same-namespace session IDs#6375alauzon wants to merge 1 commit into
Conversation
In HPB (external signaling) mode the remote participant sessionId passed to PeerConnectionWrapper is a signaling-layer session ID, while the local session used for the offerer election (callSession) is the Nextcloud session ID. The two peers then compare unrelated strings and can both compute the same role: both "not offerer" (no offer is created — the call deadlocks) or both "offerer" (offer collision). The requestoffer fallback only applies in MCU mode, so direct 2-party calls deadlock outright. Pass the local signaling session (webSocketClient.sessionId when external signaling is used, callSession otherwise — matching the existing currentSessionId resolution) so both peers compare the same namespace. Also extract the election into PeerConnectionWrapper.isOfferer() and add PeerConnectionWrapperOffererRoleTest covering the contract and the cross-namespace failure mode. Signed-off-by: Alain Lauzon <alauzon@alainlauzon.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Since this fixes a call-breaking regression (2-party Android calls deadlock intermittently in HPB mode, no media at all), would it be possible to backport it to the 24.0.2 patch (#111) rather than waiting for 24.1.0? The change is small and self-contained (just the offerer election at the |
|
Thank you for contributing @alauzon i will review this soon! |
|
Please hold this — I'm converting it back to draft. On real-device testing (two phones on different networks), with this change the call now correctly rings and an offer is created where previously the offerer election deadlocked — so the cross-namespace offerer bug itself is addressed. However, after the callee answers, the call does not establish media (it keeps "connecting"). So there is a separate problem in the post-answer ICE/media negotiation that this change does not resolve, and I don't want to give the impression #6169 is fully fixed by this commit alone. My earlier "connects 6/6" validation was on emulators sharing the same host network, which isn't representative of two real-world NAT'd devices — that's on me. I'll characterize the remaining media-connection issue (TURN relay / answer handling) on real devices before marking this ready again. |
|
Thanks to let me know and for the further testing @alauzon 👍 |
Fix #6169: elect call offerer using same-namespace session IDs
Fixes #6169.
Problem
In High-Performance Backend (external signaling) mode, 2-party calls can deadlock:
neither client creates the WebRTC offer, no media flows.
Root cause: the offerer is elected by a lexicographic comparison of the local and remote
session IDs (
remoteSession.compareTo(localSession) < 0). This yields exactly one offereronly if both peers compare the same ordered pair of strings. But at the
PeerConnectionWrapperconstruction site, the remotesessionIdis a signaling-layersession ID while the local session passed in (
callSession) is the Nextcloud session ID.The two peers therefore compare unrelated strings and can both compute the same role:
The
requestofferfallback only fires in MCU mode, so a direct (no-MCU) 2-party call deadlocksoutright. Because the session IDs are effectively random per call, the failure is intermittent.
Fix
Pass the local signaling session to the wrapper (the same
webSocketClient.sessionIdthat isalready used for
currentSessionIda few lines above), so both peers compare IDs from the samenamespace. In internal signaling mode
webSocketClient == null, socallSessionis used asbefore (unchanged behaviour).
The election is also extracted into
PeerConnectionWrapper.isOfferer()to make it unit-testable.How it was verified
Measured directly via a debug build logging the two compared session IDs, on a test NC instance
using the same HPB (no MCU):
localSessionis a Nextcloud session ID (omX5MFK/…, base64),remoteSessionis an HPB session ID (
73gsILAp…-g_…). A↔A automated calls connected 1 / 3 (deadlock 2/3).localSessionis now an HPB session ID, same namespace asremoteSession; bothpeers compare the same pair. A↔A automated calls connected 6 / 6 (0 deadlock).
New unit tests (
PeerConnectionWrapperOffererRoleTest) pin the contract (exactly one offerer for aconsistent pair) and the cross-namespace failure mode as a regression guard. They are
mutation-verified (breaking
isOffererturns them red).Not covered / scope