Native chat: cross-turn overwrite + reasoning bubbles + exec.approval delivery + multi-segment dedup#676
Native chat: cross-turn overwrite + reasoning bubbles + exec.approval delivery + multi-segment dedup#676RBrid wants to merge 5 commits into
Conversation
Three related fixes for chat timeline rendering, validated by three rounds of adversarial dual-model code review. 1. Cross-turn assistant overwrite (the original repro) In a user -> reply1 -> tool-call -> tool-output -> reply2 flow, reply2 was silently overwriting reply1's text in place instead of appending a new bubble. Cause: the chat.message final for reply2 carried reconcilePrevious=true, and UpsertAssistant's fast-path merged into the most recent assistant entry unconditionally - including a previously finalised one. Fix in ChatTimelineReducer.UpsertAssistant: narrow reconcile to only merge into entries that are still IsStreaming=true. A finalised assistant entry (IsStreaming=false) belongs to a completed turn and must not be overwritten. The byte-equal duplicate safety net remains unconditional so genuine duplicate emissions still collapse. Defense-in-depth: extract ClearStreamingAtTurnBoundary helper and call from both AddLocalUser (production typed-input path) and ApplyUserMessage (gateway-injected path), hoisted above the nonce early-return. This handles the dropped-ChatTurnEndEvent scenario where a stale ActiveAssistantId could otherwise leak across turns. 2. Concatenated thinking prose (Option B) The gateway emits a stream:"item", kind:"reasoning", phase:"end" bracket marker between distinct thinking passes within a single turn. Map it to a new ChatReasoningEndEvent that nulls ActiveReasoningId so the next reasoning chunk starts a fresh bubble instead of appending to / replacing the previous one. Also tag delta-state ChatMessageEvents with IsStreaming=true so the reducer's new reconcile guard distinguishes them from finals. 3. Trace logging plumbing Add IOpenClawLogger.Trace verbose channel with a default no-op so existing implementers (tests, console logger, setup logger) don't need updating. Logger.Trace in the tray is gated by OPENCLAW_TRAY_TRACE=1|true. Forward Trace through DiagnosticTeeLogger and AppLogger; SetupOpenClawLogger inherits the default no-op (no opt-in gate available in setup engine). OpenClawGatewayClient now surfaces item-event kind+phase metadata (no payload content) at trace level for shape diagnostics. Test coverage added in ChatTimelineReducerTests: - StaleStreamingPreview_DoesNotMergeAcrossUserBoundary - UserMessage_AsTurnBoundary_PreventsCrossTurnOverwrite - AddLocalUser_AsTurnBoundary_PreventsCrossTurnOverwrite - AddLocalUser_ClearsStaleStreamingPreviewAcrossTurns Validation: - ./build.ps1: succeeds - OpenClaw.Tray.Tests: 944 passed - OpenClaw.Shared.Tests: 2045 passed / 29 skipped Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Codex review: needs real behavior proof before merge. Reviewed June 8, 2026, 4:43 PM ET / 20:43 UTC. Summary Reproducibility: yes. from source for the old behavior: current main still uses slash-command approval responses and lacks the PR's reducer safeguards. I did not run a live gateway/native chat reproduction in this read-only review. Review metrics: 2 noteworthy metrics.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Proof guidance:
Mantis proof suggestion Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land the current response-aware RPC and reducer implementation after maintainers either see redacted live proof of the rejected approval retry path or explicitly accept the targeted test coverage as sufficient. Do we have a high-confidence way to reproduce the issue? Yes from source for the old behavior: current main still uses slash-command approval responses and lacks the PR's reducer safeguards. I did not run a live gateway/native chat reproduction in this read-only review. Is this the best way to solve the issue? Yes on the code direction: response-aware exec.approval.resolve is narrower and safer than slash-command delivery, and the reducer changes are covered by focused regression tests. The remaining blocker is real behavior proof, not a known code defect. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 37b0ea672037. Label changesLabel justifications:
Evidence reviewedWhat I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
Native WinUI chat was silently dropping exec.approval.requested/resolved because the gateway client only forwarded the agent stream and the tray responder sent /approve as a slash command (which deadlocks while the agent is blocked on the approval). Result: Allow/Deny banner never appeared, or appeared and did nothing. Fixes: - Gateway client now translates top-level exec.approval.requested and exec.approval.resolved into synthetic AgentEventInfo (Stream=approval, phase derived locally from the event name). MultiSubscriberInvoke isolates per-subscriber exceptions so one throwing handler can't suppress others. - Add IChatGatewayBridge.ResolveExecApprovalAsync calling the exec.approval.resolve RPC. Decision is allowlisted at the send site (allow-once|allow-always|deny); IsConnected is checked both before send and after the await to narrow the disconnect-while-sending race (SendRawAsync silently swallows mid-flight cancellation/dispose, so this is acknowledged best-effort). - Tray RespondToPermissionAsync swaps the slash command for the RPC. On RPC failure the pending-permission banner is preserved so the user can retry instead of silently losing the prompt. Tests: - OpenClawGatewayClientApprovalTranslationTests (new, Shared): 8 tests covering phase mapping, decision validation, disconnect guard, slug/UUID alt-id wiring, and multi-subscriber isolation under throw. - OpenClawChatDataProviderTests (Tray): 3 new tests pinning RespondToPermissionAsync to the RPC path (allow->allow-once, deny->deny, throw->banner-preserved). Validation: build green; Shared 2054 passed / 29 skipped; Tray 947 passed. Reviewed via 3 rounds of dual-model adversarial review (Opus 1M + GPT-5.3-Codex) until both converged on no remaining actionable issues. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When a turn produces 3+ assistant segments split by tool calls (A1 -> Tool -> A2 -> Tool -> A3), the gateway emits cumulative-for-turn chat.message frames containing every prior segment. The fast-path strip already handled this while ActiveAssistantId was live, but late frames arriving AFTER ApplyTurnEnd cleared ActiveAssistantId came through the scan-back reconcile branch, which wrote the raw cumulative text into the last bubble verbatim. Users saw A3 render correctly first, then mutate to repeat A1+A2's content. Apply StripCumulativeTurnPrefix(excludeIndex=li) on the scan-back reconcile merge target so the bubble retains only its segment tail. Add a symmetric empty-result guard mirroring the new-bubble path's duplicate-resend skip, so a gateway regression / stale cumulative frame that's fully consumed by other priors leaves the streaming candidate intact rather than blanking it. Tests: 2 new regression tests cover the in-place scan-back merge path (with id-equality assertion locking the test to that path) and the defensive blank-guard. Document the StripCumulativeTurnPrefix excludeIndex caller contract so future callers understand the self-peel hazard. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…eset2 # Conflicts: # tests/OpenClaw.Tray.Tests/OpenClawChatDataProviderTests.cs
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
…er P1) ResolveExecApprovalAsync used SendTrackedRequestAsync, which returns once the WS frame is sent. ok:false rejections were logged and dropped while the chat banner silently cleared, leaving the agent blocked with no retry path. Register a TCS per requestId in _pendingApprovalResolves, route ok:true / ok:false in HandleResponse, and fault pending TCSes on disconnect so the chat banner is preserved for retry. Round-1 adversarial review fixes (Opus + Codex): - A: RemovePendingRequest in timeout / send-fail branches (no _pendingRequestMethods leak; happy path is already cleaned via TakePendingRequestMethod in HandleResponse). - B: Disconnect cleanup uses OperationCanceledException to match the _pendingWizardResponses taxonomy. - D: Timeout 5s -> 15s; approval-resolve traverses gateway -> operator -> optional UI confirm -> ack. - E: Check completion.Task.IsCompleted directly after Task.WhenAny instead of comparing returned task identity (avoids same-tick race that would discard a real ok:true / ok:false response as a spurious TimeoutException). Round-2 follow-up fixes: - H: Re-added post-send !IsConnected re-check using TryRemove-first idiom. Closes the race where ClearPendingRequests fires on an empty snapshot before the TCS is registered (Fix G left a 15s UX hang in this window). If we win TryRemove, throw immediately — no exception is set on the discarded TCS, so no UnobservedTaskException. If cleanup already claimed the entry, fall through to WhenAny so the OCE it set is observed by the caller. - I: Explicit contract comment near the cleanup-loop OCE: callers MUST catch Exception (not narrow OCE) to preserve the banner-preserve UX. RunFireAndForget swallows OCE silently — a future caller refactor that lets OCE bubble to RunFireAndForget would silently clear the banner. Deferred (tracked as PR follow-ups, not in this commit): - C: Deeper integration test coverage exercising SerializeRequest, registration ordering, real timeout, and real disconnect paths. Current tests use reflection seams. - F: Sibling-wide hardening of the permissive ok:false pattern across _pendingChatSendRequests / _pendingWizardResponses (would create inconsistency if applied only to approval-resolve). Validation: ./build.ps1; Shared.Tests 2142 passed / 29 skipped; Tray.Tests 1004 passed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|


Five related fixes for chat timeline rendering and exec-approval delivery, plus a follow-up fix for the rejected
exec.approval.resolveresponse path, all validated by adversarial dual-model code review (Opus 1M + GPT-5.3-Codex) until both converged on no remaining actionable issues.In a user → reply1 → tool-call → tool-output → reply2 flow, reply2 was silently overwriting reply1's text in place instead of appending a new bubble. Cause: the chat.message final for reply2 carried reconcilePrevious=true, and UpsertAssistant's fast-path merged into the most recent assistant entry unconditionally — including a previously finalised one.
Fix in ChatTimelineReducer.UpsertAssistant: narrow reconcile to only merge into entries that are still IsStreaming=true. A finalised assistant entry (IsStreaming=false) belongs to a completed turn and must not be overwritten. The byte-equal duplicate safety net remains unconditional so genuine duplicate emissions still collapse.
Defense-in-depth: extract ClearStreamingAtTurnBoundary helper and call from both AddLocalUser (production typed-input path) and ApplyUserMessage (gateway-injected path), hoisted above the nonce early-return. This handles the dropped-ChatTurnEndEvent scenario where a stale ActiveAssistantId could otherwise leak across turns.
The gateway emits a stream:"item", kind:"reasoning", phase:"end" bracket marker between distinct thinking passes within a single turn. Map it to a new ChatReasoningEndEvent that nulls ActiveReasoningId so the next reasoning chunk starts a fresh bubble instead of appending to / replacing the previous one. Also tag delta-state ChatMessageEvents with IsStreaming=true so the reducer's new reconcile guard distinguishes them from finals.
Native WinUI chat was silently dropping exec.approval.requested / exec.approval.resolved because the gateway client only forwarded the agent stream, and the tray responder sent /approve as a slash command — which deadlocks while the agent is blocked waiting for that very approval. Net effect: the Allow/Deny banner never appeared, or appeared and did nothing.
Add IOpenClawLogger.Trace verbose channel with a default no-op so existing implementers (tests, console logger, setup logger) don't need updating. Logger.Trace in the tray is gated by OPENCLAW_TRAY_TRACE=1|true. Forward Trace through DiagnosticTeeLogger and AppLogger; SetupOpenClawLogger inherits the default no-op (no opt-in gate available in setup engine). OpenClawGatewayClient now surfaces item-event kind+phase metadata (no payload content) at trace level for shape diagnostics.
When a turn produced 3+ assistant segments split by tool calls (A1 → Tool → A2 → Tool → A3), the gateway emits cumulative-for-turn chat.message frames containing every prior segment. The fast-path strip already handled this while ActiveAssistantId was live, but late frames arriving AFTER ApplyTurnEnd cleared ActiveAssistantId came through the scan-back reconcile branch, which wrote the raw cumulative text into the last bubble verbatim. Users observed A3 render correctly first, then mutate to repeat A1+A2's content.
Round 5 of the ClawSweeper review flagged that ResolveExecApprovalAsync used SendTrackedRequestAsync, which returns once the WS frame leaves the socket. An ok:false rejection from the gateway was logged and dropped while the chat banner silently cleared, leaving the agent blocked with no retry path.
Coverage matrix for the rejected-resolve / banner-preserve contract
Test coverage added (cumulative across the PR)
ChatTimelineReducerTests (Shared)
OpenClawGatewayClientApprovalTranslationTests (Shared, 12 tests)
OpenClawChatDataProviderTests (Tray, 3 tests)
Validation (against commit 77a707f)
Review
Five rounds of adversarial dual-model review (Claude Opus 4.7 1M + GPT-5.3-Codex). Rounds 1–3 covered fixes 1–4. Round 4 on the multi-segment dedup surfaced 3 applied fixes (empty-result guard on the new strip site, id-equality assertion in the regression test, excludeIndex caller-contract xml-doc). Round 5 covered fix 6's ClawSweeper P1 follow-up and surfaced 5 applied fixes (Fix A leak in timeout branch, Fix B OCE taxonomy, Fix D 15s timeout, Fix E IsCompleted check, Fix G→H post-send re-check rework using TryRemove-first idiom) plus 1 documentation fix (Fix I contract comment on the cleanup-loop OCE). Each round converged on both models reporting "ready to commit."
Proof status (per ClawSweeper)
Code review labels: 🐚 platinum hermit (patch quality), 🦪 silver shellfish (overall, capped by proof). Test-only coverage for the rejected-resolve banner-preserve path is pinned by the coverage matrix above; live native-chat proof of a denied / rejected exec.approval.resolve keeping the banner visible is pending Mantis or manual QA. Maintainers may either request live proof, ask Mantis to capture it, or accept test-only coverage and re-review.
Deferred follow-ups (intentional, tracked outside this PR)