Skip to content

Native chat: cross-turn overwrite + reasoning bubbles + exec.approval delivery + multi-segment dedup#676

Open
RBrid wants to merge 5 commits into
openclaw:mainfrom
RBrid:user/rbrid/ReasoningReset2
Open

Native chat: cross-turn overwrite + reasoning bubbles + exec.approval delivery + multi-segment dedup#676
RBrid wants to merge 5 commits into
openclaw:mainfrom
RBrid:user/rbrid/ReasoningReset2

Conversation

@RBrid
Copy link
Copy Markdown
Contributor

@RBrid RBrid commented Jun 3, 2026

Five related fixes for chat timeline rendering and exec-approval delivery, plus a follow-up fix for the rejected exec.approval.resolve response path, all validated by adversarial dual-model code review (Opus 1M + GPT-5.3-Codex) until both converged on no remaining actionable issues.

  1. Cross-turn assistant overwrite (the original repro)

In a user → reply1 → tool-call → tool-output → reply2 flow, reply2 was silently overwriting reply1's text in place instead of appending a new bubble. Cause: the chat.message final for reply2 carried reconcilePrevious=true, and UpsertAssistant's fast-path merged into the most recent assistant entry unconditionally — including a previously finalised one.

Fix in ChatTimelineReducer.UpsertAssistant: narrow reconcile to only merge into entries that are still IsStreaming=true. A finalised assistant entry (IsStreaming=false) belongs to a completed turn and must not be overwritten. The byte-equal duplicate safety net remains unconditional so genuine duplicate emissions still collapse.

Defense-in-depth: extract ClearStreamingAtTurnBoundary helper and call from both AddLocalUser (production typed-input path) and ApplyUserMessage (gateway-injected path), hoisted above the nonce early-return. This handles the dropped-ChatTurnEndEvent scenario where a stale ActiveAssistantId could otherwise leak across turns.

  1. Concatenated thinking prose

The gateway emits a stream:"item", kind:"reasoning", phase:"end" bracket marker between distinct thinking passes within a single turn. Map it to a new ChatReasoningEndEvent that nulls ActiveReasoningId so the next reasoning chunk starts a fresh bubble instead of appending to / replacing the previous one. Also tag delta-state ChatMessageEvents with IsStreaming=true so the reducer's new reconcile guard distinguishes them from finals.

  1. Exec-approval delivery in native chat

Native WinUI chat was silently dropping exec.approval.requested / exec.approval.resolved because the gateway client only forwarded the agent stream, and the tray responder sent /approve as a slash command — which deadlocks while the agent is blocked waiting for that very approval. Net effect: the Allow/Deny banner never appeared, or appeared and did nothing.

  • Gateway client: HandleEvent now translates top-level exec.approval.requested / exec.approval.resolved events into synthetic AgentEventInfo (Stream="approval", phase derived locally from the event name — payload.phase is intentionally ignored). Multi-subscriber invoke isolates per-subscriber exceptions so one throwing handler can't suppress others.
  • New RPC path: IChatGatewayBridge.ResolveExecApprovalAsync calls the exec.approval.resolve RPC. Decision is allowlisted at the send site (allow-once | allow-always | deny).
  • Tray responder: RespondToPermissionAsync swaps the slash command for the RPC. On RPC failure the pending-permission banner is preserved so the user can retry instead of silently losing the prompt.
  1. Trace logging plumbing

Add IOpenClawLogger.Trace verbose channel with a default no-op so existing implementers (tests, console logger, setup logger) don't need updating. Logger.Trace in the tray is gated by OPENCLAW_TRAY_TRACE=1|true. Forward Trace through DiagnosticTeeLogger and AppLogger; SetupOpenClawLogger inherits the default no-op (no opt-in gate available in setup engine). OpenClawGatewayClient now surfaces item-event kind+phase metadata (no payload content) at trace level for shape diagnostics.

  1. Multi-segment chat-bubble duplication (last bubble repeats prior segments)

When a turn produced 3+ assistant segments split by tool calls (A1 → Tool → A2 → Tool → A3), the gateway emits cumulative-for-turn chat.message frames containing every prior segment. The fast-path strip already handled this while ActiveAssistantId was live, but late frames arriving AFTER ApplyTurnEnd cleared ActiveAssistantId came through the scan-back reconcile branch, which wrote the raw cumulative text into the last bubble verbatim. Users observed A3 render correctly first, then mutate to repeat A1+A2's content.

  • ChatTimelineReducer scan-back reconcile merge now calls StripCumulativeTurnPrefix (with excludeIndex=li) so the merge target retains only its own segment tail.
  • Symmetric empty-result guard mirroring the new-bubble path's duplicate-resend skip: if a stale / duplicate cumulative frame is fully consumed by other priors, leave the streaming candidate intact instead of blanking it.
  • StripCumulativeTurnPrefix excludeIndex caller contract documented in xml-doc (self-peel hazard).
  1. exec.approval.resolve response-await + rejected-resolve banner-preserve (ClawSweeper P1 follow-up, commit 77a707f)

Round 5 of the ClawSweeper review flagged that ResolveExecApprovalAsync used SendTrackedRequestAsync, which returns once the WS frame leaves the socket. An ok:false rejection from the gateway was logged and dropped while the chat banner silently cleared, leaving the agent blocked with no retry path.

  • Register a per-requestId TaskCompletionSource in _pendingApprovalResolves and route ok:true / ok:false in HandleResponse so the caller awaits the actual gateway verdict.
  • Disconnect cleanup faults pending TCSes with OperationCanceledException (matches the _pendingWizardResponses cleanup taxonomy). Explicit contract comment notes callers must catch System.Exception (not narrow OCE) to preserve banner-preserve UX, because RunFireAndForget swallows OCE silently.
  • 15s bounded timeout (approval-resolve traverses gateway → operator → optional UI confirm → ack; longer budget than the 5s chat.send sibling). Timeout / send-fail branches call RemovePendingRequest so _pendingRequestMethods does not leak.
  • Task.WhenAny followed by an IsCompleted check rather than a returned-task identity compare, to avoid a same-tick race that would surface a real ok:true / ok:false response as a spurious TimeoutException.
  • Post-send !IsConnected re-check using a TryRemove-first idiom: if we win the race and pull the TCS out of the dict, throw InvalidOperationException immediately (no exception is ever set on the discarded TCS, so no UnobservedTaskException risk); if the cleanup loop already claimed the entry and faulted it with OCE, fall through to the WhenAny path so the cleanup-set exception is observed by the caller. Closes a narrow window where a disconnect between the up-front IsConnected guard and TCS registration would otherwise have stranded the caller for the full 15s timeout.

Coverage matrix for the rejected-resolve / banner-preserve contract

Rejected-resolve path Pinned by test Test file
Gateway returns ok:false with error.message ResolveExecApproval_OkFalseResponse_SurfacesAsException tests/OpenClaw.Shared.Tests/OpenClawGatewayClientApprovalTranslationTests.cs
Gateway returns ok:false with no error.message ResolveExecApproval_OkFalseWithoutErrorMessage_UsesFallback tests/OpenClaw.Shared.Tests/OpenClawGatewayClientApprovalTranslationTests.cs
Gateway returns ok:true (happy path completes await) ResolveExecApproval_OkTrueResponse_CompletesCaller tests/OpenClaw.Shared.Tests/OpenClawGatewayClientApprovalTranslationTests.cs
Disconnected at call time ResolveExecApprovalAsync_ThrowsWhenDisconnected tests/OpenClaw.Shared.Tests/OpenClawGatewayClientApprovalTranslationTests.cs
Unknown decision value ResolveExecApprovalAsync_RejectsUnknownDecision tests/OpenClaw.Shared.Tests/OpenClawGatewayClientApprovalTranslationTests.cs
Tray UX: RPC throws → banner stays visible RespondToPermissionAsync_RpcThrows_BannerPreservedForRetry tests/OpenClaw.Tray.Tests/OpenClawChatDataProviderTests.cs
Tray UX: allow routes RPC + clears banner RespondToPermissionAsync_AllowRoutesAllowOnceThroughRpcAndClearsBanner tests/OpenClaw.Tray.Tests/OpenClawChatDataProviderTests.cs
Tray UX: deny routes RPC + clears banner RespondToPermissionAsync_DenyRoutesDenyThroughRpcAndClearsBanner tests/OpenClaw.Tray.Tests/OpenClawChatDataProviderTests.cs

Test coverage added (cumulative across the PR)

ChatTimelineReducerTests (Shared)

  • StaleStreamingPreview_DoesNotMergeAcrossUserBoundary
  • UserMessage_AsTurnBoundary_PreventsCrossTurnOverwrite
  • AddLocalUser_AsTurnBoundary_PreventsCrossTurnOverwrite
  • AddLocalUser_ClearsStaleStreamingPreviewAcrossTurns
  • IntraTurnLateReconcile_AfterTurnEnd_DoesNotPrependPriorSegments (id-equality assertion locks the test to the in-place merge path)
  • IntraTurnLateReconcile_RegressionCumulative_DoesNotBlankCandidate (defensive blank-guard)

OpenClawGatewayClientApprovalTranslationTests (Shared, 12 tests)

  • Phase mapping for exec.approval.requested / resolved
  • Decision allowlist (IsValidApprovalDecision) — rejects unknown decisions
  • IsConnected precheck — throws when disconnected before send
  • Slug / UUID alt-id wiring for terminal-event matching
  • Multi-subscriber isolation: first subscriber throws, second still invoked
  • Response routing: ok:true completes, ok:false surfaces InvalidOperationException, ok:false without error.message uses fallback

OpenClawChatDataProviderTests (Tray, 3 tests)

  • RespondToPermissionAsync_AllowRoutesAllowOnceThroughRpcAndClearsBanner
  • RespondToPermissionAsync_DenyRoutesDenyThroughRpcAndClearsBanner
  • RespondToPermissionAsync_RpcThrows_BannerPreservedForRetry

Validation (against commit 77a707f)

  • ./build.ps1: succeeds
  • OpenClaw.Shared.Tests: 2142 passed / 29 skipped
  • OpenClaw.Tray.Tests: 1004 passed

Review

Five rounds of adversarial dual-model review (Claude Opus 4.7 1M + GPT-5.3-Codex). Rounds 1–3 covered fixes 1–4. Round 4 on the multi-segment dedup surfaced 3 applied fixes (empty-result guard on the new strip site, id-equality assertion in the regression test, excludeIndex caller-contract xml-doc). Round 5 covered fix 6's ClawSweeper P1 follow-up and surfaced 5 applied fixes (Fix A leak in timeout branch, Fix B OCE taxonomy, Fix D 15s timeout, Fix E IsCompleted check, Fix G→H post-send re-check rework using TryRemove-first idiom) plus 1 documentation fix (Fix I contract comment on the cleanup-loop OCE). Each round converged on both models reporting "ready to commit."

Proof status (per ClawSweeper)

Code review labels: 🐚 platinum hermit (patch quality), 🦪 silver shellfish (overall, capped by proof). Test-only coverage for the rejected-resolve banner-preserve path is pinned by the coverage matrix above; live native-chat proof of a denied / rejected exec.approval.resolve keeping the banner visible is pending Mantis or manual QA. Maintainers may either request live proof, ask Mantis to capture it, or accept test-only coverage and re-review.

Deferred follow-ups (intentional, tracked outside this PR)

  • True bubble-splitting in native chat (rendering N visually separate bubbles per multi-segment turn) — this PR only ensures the merged final bubble contains the correct (last) segment's text without duplicating prior segments.
  • Plugin approvals (plugin.approval.*) — same shape, not exercised by current scenarios.
  • Full strict-send infrastructure replacing the silent-swallow behavior of SendRawAsync — acknowledged in code comments as best-effort.
  • Deeper integration tests for ResolveExecApprovalAsync exercising SerializeRequest, registration ordering, real timeout, and real disconnect paths (current tests use reflection seams against HandleResponse / _pendingApprovalResolves).
  • Sibling-wide hardening of the permissive ok:false detection pattern across _pendingChatSendRequests / _pendingWizardResponses — applying it only to exec.approval.resolve would create inconsistency; needs a focused PR.

Three related fixes for chat timeline rendering, validated by three rounds
of adversarial dual-model code review.

1. Cross-turn assistant overwrite (the original repro)
   In a user -> reply1 -> tool-call -> tool-output -> reply2 flow, reply2
   was silently overwriting reply1's text in place instead of appending a
   new bubble. Cause: the chat.message final for reply2 carried
   reconcilePrevious=true, and UpsertAssistant's fast-path merged into the
   most recent assistant entry unconditionally - including a previously
   finalised one.

   Fix in ChatTimelineReducer.UpsertAssistant: narrow reconcile to only
   merge into entries that are still IsStreaming=true. A finalised
   assistant entry (IsStreaming=false) belongs to a completed turn and
   must not be overwritten. The byte-equal duplicate safety net remains
   unconditional so genuine duplicate emissions still collapse.

   Defense-in-depth: extract ClearStreamingAtTurnBoundary helper and
   call from both AddLocalUser (production typed-input path) and
   ApplyUserMessage (gateway-injected path), hoisted above the nonce
   early-return. This handles the dropped-ChatTurnEndEvent scenario
   where a stale ActiveAssistantId could otherwise leak across turns.

2. Concatenated thinking prose (Option B)
   The gateway emits a stream:"item", kind:"reasoning", phase:"end"
   bracket marker between distinct thinking passes within a single
   turn. Map it to a new ChatReasoningEndEvent that nulls
   ActiveReasoningId so the next reasoning chunk starts a fresh bubble
   instead of appending to / replacing the previous one. Also tag
   delta-state ChatMessageEvents with IsStreaming=true so the reducer's
   new reconcile guard distinguishes them from finals.

3. Trace logging plumbing
   Add IOpenClawLogger.Trace verbose channel with a default no-op so
   existing implementers (tests, console logger, setup logger) don't
   need updating. Logger.Trace in the tray is gated by
   OPENCLAW_TRAY_TRACE=1|true. Forward Trace through DiagnosticTeeLogger
   and AppLogger; SetupOpenClawLogger inherits the default no-op (no
   opt-in gate available in setup engine). OpenClawGatewayClient now
   surfaces item-event kind+phase metadata (no payload content) at
   trace level for shape diagnostics.

Test coverage added in ChatTimelineReducerTests:
- StaleStreamingPreview_DoesNotMergeAcrossUserBoundary
- UserMessage_AsTurnBoundary_PreventsCrossTurnOverwrite
- AddLocalUser_AsTurnBoundary_PreventsCrossTurnOverwrite
- AddLocalUser_ClearsStaleStreamingPreviewAcrossTurns

Validation:
- ./build.ps1: succeeds
- OpenClaw.Tray.Tests: 944 passed
- OpenClaw.Shared.Tests: 2045 passed / 29 skipped

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@clawsweeper
Copy link
Copy Markdown

clawsweeper Bot commented Jun 3, 2026

Codex review: needs real behavior proof before merge. Reviewed June 8, 2026, 4:43 PM ET / 20:43 UTC.

Summary
The PR updates native chat timeline reconciliation, reasoning boundaries, top-level exec approval event translation and RPC response handling, trace logging, and shared/tray regression tests.

Reproducibility: yes. from source for the old behavior: current main still uses slash-command approval responses and lacks the PR's reducer safeguards. I did not run a live gateway/native chat reproduction in this read-only review.

Review metrics: 2 noteworthy metrics.

  • Changed surface: 14 files, +2104/-54. The PR spans shared gateway protocol handling, native chat reducer/provider behavior, logging, and tests, so live proof matters beyond unit coverage.
  • Approval regression tests: 12 shared approval tests, 3 tray approval tests. The latest head adds targeted coverage for translation, success, thrown-send failures, and ok:false response routing.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🦞 diamond lobster
Result: blocked until stronger real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Post redacted live proof that a denied or rejected exec.approval.resolve keeps the native retry banner visible; update the PR body afterward so ClawSweeper re-reviews automatically, or ask a maintainer to comment @clawsweeper re-review.

Proof guidance:

  • [P1] Needs stronger real behavior proof before merge: Screenshots provide partial UI proof for multi-bubble/multi-tool rendering and allowed approval entries, but not the denied or rejected exec.approval.resolve retry path. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Mantis proof suggestion
A native UI proof would materially verify approval success and rejected-resolve retry behavior that unit tests cannot show end to end. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

visual task: verify native chat exec approval Allow/Deny success and a rejected resolve keeps the retry banner visible.

Risk before merge

  • [P1] The posted screenshots are useful partial proof, but they do not show a denied or rejected exec.approval.resolve leaving the native retry banner visible.
  • [P1] The diff changes shared gateway event translation and native timeline reconciliation, so a missed live edge case could still suppress, duplicate, overwrite, or prematurely clear visible chat/approval entries.
  • [P1] I did not run build or tests under the read-only review contract; the PR body reports validation against 77a707f, but that remains contributor-reported here.

Maintainer options:

  1. Require rejected-resolve proof before merge (recommended)
    Ask for redacted screenshot, recording, terminal/live output, logs, or a linked artifact showing a rejected or failed exec.approval.resolve leaves the native retry banner visible.
  2. Accept test-only coverage for the rejection path
    Maintainers may intentionally rely on the new ok:false regression tests and merge with only the existing successful native-chat screenshots as live proof.
  3. Pause for native approval QA
    If the gateway rejection contract is hard to trigger locally, pause the PR until Mantis or a maintainer can capture the native approval retry flow.

Next step before merge

  • [P1] The remaining action is human proof or maintainer acceptance of test-only coverage, not a mechanical code repair.

Security
Cleared: No concrete security or supply-chain issue was found; the diff adds no dependencies or workflow execution, and the approval RPC path allowlists decisions and waits for gateway success or failure.

Review details

Best possible solution:

Land the current response-aware RPC and reducer implementation after maintainers either see redacted live proof of the rejected approval retry path or explicitly accept the targeted test coverage as sufficient.

Do we have a high-confidence way to reproduce the issue?

Yes from source for the old behavior: current main still uses slash-command approval responses and lacks the PR's reducer safeguards. I did not run a live gateway/native chat reproduction in this read-only review.

Is this the best way to solve the issue?

Yes on the code direction: response-aware exec.approval.resolve is narrower and safer than slash-command delivery, and the reducer changes are covered by focused regression tests. The remaining blocker is real behavior proof, not a known code defect.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 37b0ea672037.

Label changes

Label justifications:

  • P1: The PR targets native chat approval delivery and timeline corruption in active agent workflows, including cases that can leave users without a visible approval retry path.
  • merge-risk: 🚨 message-delivery: The diff changes chat timeline reconciliation and approval event delivery, where regressions could drop, duplicate, overwrite, or prematurely clear visible messages or approval prompts.
  • rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🦞 diamond lobster.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs stronger real behavior proof before merge: Screenshots provide partial UI proof for multi-bubble/multi-tool rendering and allowed approval entries, but not the denied or rejected exec.approval.resolve retry path. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
  • proof: 📸 screenshot: Contributor real behavior proof includes screenshot evidence. Screenshots provide partial UI proof for multi-bubble/multi-tool rendering and allowed approval entries, but not the denied or rejected exec.approval.resolve retry path.
Evidence reviewed

What I checked:

Likely related people:

  • RBrid: History shows the native FunctionalUI chat experience and in-bubble approval banner were introduced in commits 08cab69 and 7d9152f, covering the same provider, bridge, timeline, and approval behavior. (role: native chat feature owner and recent area contributor; confidence: high; commits: 08cab69a038c, 7d9152f427a3; files: src/OpenClaw.Tray.WinUI/Chat/OpenClawChatDataProvider.cs, src/OpenClaw.Tray.WinUI/Chat/IChatGatewayBridge.cs, src/OpenClaw.Chat/ChatTimelineReducer.cs)
  • Christine Yan: Current-main blame for UpsertAssistant, RespondToPermissionAsync, and HandleResponse points to commit 85445c7. (role: introduced current-main baseline behavior; confidence: high; commits: 85445c78066b; files: src/OpenClaw.Chat/ChatTimelineReducer.cs, src/OpenClaw.Tray.WinUI/Chat/OpenClawChatDataProvider.cs, src/OpenClaw.Shared/OpenClawGatewayClient.cs)
  • kenehong: Earlier reducer history added duplicate assistant bubble deduplication and continuation-bubble fixes adjacent to the cumulative-message logic touched here. (role: adjacent reducer contributor; confidence: medium; commits: 4231feab9c0e, 96d33d2da597; files: src/OpenClaw.Chat/ChatTimelineReducer.cs, tests/OpenClaw.Tray.Tests/ChatTimelineReducerTests.cs)
  • Scott Hanselman: Recent current-main commits touched OpenClawGatewayClient and OpenClawChatDataProvider in adjacent hardening and model-filtering work. (role: recent area contributor; confidence: medium; commits: d23f8ca50013, d1b136347e95; files: src/OpenClaw.Shared/OpenClawGatewayClient.cs, src/OpenClaw.Tray.WinUI/Chat/OpenClawChatDataProvider.cs)
  • Ranjesh: Recent history touched native chat provider state and timeline-adjacent behavior, including attachment metadata and tool-call tracking. (role: recent adjacent contributor; confidence: medium; commits: 9a3a7a6131a8, ca09b2132a0f; files: src/OpenClaw.Tray.WinUI/Chat/OpenClawChatDataProvider.cs, tests/OpenClaw.Tray.Tests/ChatTimelineReducerTests.cs)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P1 Urgent regression or broken agent/channel workflow affecting real users now. labels Jun 3, 2026
@RBrid RBrid marked this pull request as ready for review June 3, 2026 20:21
@clawsweeper clawsweeper Bot added the merge-risk: 🚨 message-delivery 🚨 Merging this PR could drop, duplicate, misroute, suppress, or wrongly target messages. label Jun 3, 2026
@RBrid RBrid marked this pull request as draft June 4, 2026 23:29
Native WinUI chat was silently dropping exec.approval.requested/resolved
because the gateway client only forwarded the agent stream and the tray
responder sent /approve as a slash command (which deadlocks while the
agent is blocked on the approval). Result: Allow/Deny banner never
appeared, or appeared and did nothing.

Fixes:
- Gateway client now translates top-level exec.approval.requested and
  exec.approval.resolved into synthetic AgentEventInfo (Stream=approval,
  phase derived locally from the event name). MultiSubscriberInvoke
  isolates per-subscriber exceptions so one throwing handler can't
  suppress others.
- Add IChatGatewayBridge.ResolveExecApprovalAsync calling the
  exec.approval.resolve RPC. Decision is allowlisted at the send site
  (allow-once|allow-always|deny); IsConnected is checked both before
  send and after the await to narrow the disconnect-while-sending race
  (SendRawAsync silently swallows mid-flight cancellation/dispose, so
  this is acknowledged best-effort).
- Tray RespondToPermissionAsync swaps the slash command for the RPC.
  On RPC failure the pending-permission banner is preserved so the user
  can retry instead of silently losing the prompt.

Tests:
- OpenClawGatewayClientApprovalTranslationTests (new, Shared): 8 tests
  covering phase mapping, decision validation, disconnect guard,
  slug/UUID alt-id wiring, and multi-subscriber isolation under throw.
- OpenClawChatDataProviderTests (Tray): 3 new tests pinning
  RespondToPermissionAsync to the RPC path (allow->allow-once,
  deny->deny, throw->banner-preserved).

Validation: build green; Shared 2054 passed / 29 skipped;
Tray 947 passed. Reviewed via 3 rounds of dual-model adversarial
review (Opus 1M + GPT-5.3-Codex) until both converged on no remaining
actionable issues.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@RBrid RBrid changed the title Fix cross-turn assistant overwrite + separate reasoning bubbles Native chat: cross-turn overwrite + reasoning bubbles + exec.approval delivery Jun 5, 2026
When a turn produces 3+ assistant segments split by tool calls (A1 -> Tool -> A2 -> Tool -> A3), the gateway emits cumulative-for-turn chat.message frames containing every prior segment. The fast-path strip already handled this while ActiveAssistantId was live, but late frames arriving AFTER ApplyTurnEnd cleared ActiveAssistantId came through the scan-back reconcile branch, which wrote the raw cumulative text into the last bubble verbatim. Users saw A3 render correctly first, then mutate to repeat A1+A2's content.

Apply StripCumulativeTurnPrefix(excludeIndex=li) on the scan-back reconcile merge target so the bubble retains only its segment tail. Add a symmetric empty-result guard mirroring the new-bubble path's duplicate-resend skip, so a gateway regression / stale cumulative frame that's fully consumed by other priors leaves the streaming candidate intact rather than blanking it.

Tests: 2 new regression tests cover the in-place scan-back merge path (with id-equality assertion locking the test to that path) and the defensive blank-guard. Document the StripCumulativeTurnPrefix excludeIndex caller contract so future callers understand the self-peel hazard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@RBrid RBrid changed the title Native chat: cross-turn overwrite + reasoning bubbles + exec.approval delivery Native chat: cross-turn overwrite + reasoning bubbles + exec.approval delivery + multi-segment dedup Jun 6, 2026
@RBrid
Copy link
Copy Markdown
Contributor Author

RBrid commented Jun 6, 2026

Screenshots of multi-bubbles / multi-tool-calls:

image image

@RBrid RBrid marked this pull request as ready for review June 6, 2026 02:03
@clawsweeper clawsweeper Bot added the proof: 📸 screenshot Contributor real behavior proof includes screenshot evidence. label Jun 6, 2026
…eset2

# Conflicts:
#	tests/OpenClaw.Tray.Tests/OpenClawChatDataProviderTests.cs
@RBrid
Copy link
Copy Markdown
Contributor Author

RBrid commented Jun 8, 2026

@clawsweeper re-review

@clawsweeper
Copy link
Copy Markdown

clawsweeper Bot commented Jun 8, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

…er P1)

ResolveExecApprovalAsync used SendTrackedRequestAsync, which returns once the WS frame is sent. ok:false rejections were logged and dropped while the chat banner silently cleared, leaving the agent blocked with no retry path.

Register a TCS per requestId in _pendingApprovalResolves, route ok:true / ok:false in HandleResponse, and fault pending TCSes on disconnect so the chat banner is preserved for retry.

Round-1 adversarial review fixes (Opus + Codex):

- A: RemovePendingRequest in timeout / send-fail branches (no _pendingRequestMethods leak; happy path is already cleaned via TakePendingRequestMethod in HandleResponse).

- B: Disconnect cleanup uses OperationCanceledException to match the _pendingWizardResponses taxonomy.

- D: Timeout 5s -> 15s; approval-resolve traverses gateway -> operator -> optional UI confirm -> ack.

- E: Check completion.Task.IsCompleted directly after Task.WhenAny instead of comparing returned task identity (avoids same-tick race that would discard a real ok:true / ok:false response as a spurious TimeoutException).

Round-2 follow-up fixes:

- H: Re-added post-send !IsConnected re-check using TryRemove-first idiom. Closes the race where ClearPendingRequests fires on an empty snapshot before the TCS is registered (Fix G left a 15s UX hang in this window). If we win TryRemove, throw immediately — no exception is set on the discarded TCS, so no UnobservedTaskException. If cleanup already claimed the entry, fall through to WhenAny so the OCE it set is observed by the caller.

- I: Explicit contract comment near the cleanup-loop OCE: callers MUST catch Exception (not narrow OCE) to preserve the banner-preserve UX. RunFireAndForget swallows OCE silently — a future caller refactor that lets OCE bubble to RunFireAndForget would silently clear the banner.

Deferred (tracked as PR follow-ups, not in this commit):

- C: Deeper integration test coverage exercising SerializeRequest, registration ordering, real timeout, and real disconnect paths. Current tests use reflection seams.

- F: Sibling-wide hardening of the permissive ok:false pattern across _pendingChatSendRequests / _pendingWizardResponses (would create inconsistency if applied only to approval-resolve).

Validation: ./build.ps1; Shared.Tests 2142 passed / 29 skipped; Tray.Tests 1004 passed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. labels Jun 8, 2026
@RBrid
Copy link
Copy Markdown
Contributor Author

RBrid commented Jun 8, 2026

@clawsweeper re-review

@clawsweeper
Copy link
Copy Markdown

clawsweeper Bot commented Jun 8, 2026

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 message-delivery 🚨 Merging this PR could drop, duplicate, misroute, suppress, or wrongly target messages. P1 Urgent regression or broken agent/channel workflow affecting real users now. proof: 📸 screenshot Contributor real behavior proof includes screenshot evidence. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant