fix: prevent infinite retry loop from orphaned tool_use blocks #15152
ryankass-cb wants to merge 1 commit into anomalyco:dev
Three bugs combined to cause unrecoverable sessions when tool execution was interrupted:

1. Retry loop skips orphan cleanup: When processor.ts retries after an API error, the `continue` statement at the retry path jumps back to the top of the while loop, bypassing the orphan cleanup code that marks pending/running tools as errors. This leaves orphaned tool_use blocks in the message history.
2. Stale messages on retry: The streamInput.messages array is built once before the retry loop starts and never refreshed. Even if orphan cleanup ran, the retry would send the same stale messages with orphaned tool_use blocks.
3. invalid_request_error incorrectly retried: The catch-all `return JSON.stringify(json)` in retry.ts makes ALL JSON error bodies retryable, including `invalid_request_error` which is a structural issue that will fail identically on every attempt, creating an infinite loop.

Fixes:

- Move orphan cleanup before the retry `continue` in processor.ts
- Add `rebuildMessages` callback to StreamInput so processor can refresh messages from DB after orphan cleanup on retry
- Mark `invalid_request_error` as non-retryable in retry.ts
- Remove catch-all retry classification for unrecognized error types
- Add defensive `repairOrphanedToolUse()` in transform.ts as a last-resort validation that injects synthetic error tool_results for any orphaned tool_use blocks before sending to the API
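The retry classification change in the fix list can be sketched as follows. This is a minimal illustration, not the code in retry.ts: the `retryableReason` name, the `ApiErrorBody` shape, and the allow-list of transient error types are all assumptions made for the example. The point it shows is that unrecognized JSON error bodies fall through to `undefined` (not retryable) instead of being stringified and retried.

```typescript
// Hypothetical shape of an Anthropic-style error body; illustrative only.
interface ApiErrorBody {
  type?: string;
  error?: { type?: string; message?: string };
}

// Assumed allow-list of transient error types (not taken from retry.ts).
const TRANSIENT_ERROR_TYPES = new Set<string>([
  "overloaded_error",
  "rate_limit_error",
  "api_error",
]);

// Returns a reason string when the error is worth retrying,
// or undefined when retrying cannot help.
function retryableReason(body: string): string | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return undefined; // not JSON: this sketch does not classify it
  }
  const errorType = (parsed as ApiErrorBody | null)?.error?.type;

  // Structural request problems fail identically on every attempt.
  if (errorType === "invalid_request_error") return undefined;

  // Only explicitly recognized transient errors are retried.
  if (errorType !== undefined && TRANSIENT_ERROR_TYPES.has(errorType)) {
    return (parsed as ApiErrorBody).error?.message ?? errorType;
  }

  // No catch-all: unrecognized error bodies are treated as non-retryable.
  return undefined;
}
```

Using an allow-list rather than a deny-list means a new, unknown error type stops the loop by default, which is the safer failure mode for the bug described here.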
Thanks for your contribution! This PR doesn't have a linked issue. All PRs must reference an existing issue. Please:
See CONTRIBUTING.md for details.
This PR doesn't fully meet our contributing guidelines and PR template. What needs to be fixed:
Please edit this PR description to address the above within 2 hours, or it will be automatically closed. If you believe this was flagged incorrectly, please let a maintainer know.
The following comment was made by an LLM, it may be inaccurate:
Based on my search, I found potentially related PRs:
- PR #8497: "fix: handle dangling tool_use blocks for LiteLLM proxy compatibility"
- PR #14456: "fix(core): repair interleaved text/tool-call parts in assistant messages"

However, PR #15152 (the current PR) appears to be the primary/newest fix addressing the specific infinite retry loop issue with orphaned tool_use blocks, while the older PRs address related but distinct aspects of tool_use handling.
This pull request has been automatically closed because it was not updated to meet our contributing guidelines within the 2-hour window. Feel free to open a new pull request that follows our guidelines.
Summary
Fixes an unrecoverable session crash caused by orphaned `tool_use` blocks triggering an infinite retry loop against the Anthropic API.

Error:
`tool_use ids were found without tool_result blocks immediately after: toolu_vrtx_...`

Root Cause
Three bugs combine to create the infinite loop:

1. Retry loop skips orphan cleanup (`processor.ts`): The `continue` statement in the retry path jumps back to `while(true)`, bypassing the orphan cleanup code (lines 393-409) that marks pending/running tools as errors. Orphaned `tool_use` blocks persist in message history.
2. Stale messages on retry (`processor.ts`): `streamInput.messages` is built once before the retry loop and never refreshed from the database. Even if orphan cleanup ran, retries send the same broken messages.
3. `invalid_request_error` incorrectly classified as retryable (`retry.ts`): The catch-all `return JSON.stringify(json)` makes ALL JSON error bodies retryable, including `invalid_request_error`, a structural issue that fails identically on every attempt.

Fixes
| File | Change |
| --- | --- |
| `processor.ts` | Move orphan cleanup before the retry `continue`; rebuild messages from DB on retry via new `rebuildMessages` callback |
| `llm.ts` | Add `rebuildMessages` callback to `StreamInput` interface |
| `prompt.ts` | Provide `rebuildMessages` to re-read messages from DB and convert via `toModelMessages()` |
| `retry.ts` | Mark `invalid_request_error` as non-retryable; replace catch-all `return JSON.stringify(json)` with `return undefined` |
| `transform.ts` | Add `repairOrphanedToolUse()` validation as last-resort before API calls; injects synthetic error `tool_result` for any orphaned `tool_use` blocks |

Defense in Depth
The fix operates at three layers:
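
1. Orphan cleanup runs before the retry `continue`, and messages are rebuilt from the DB on retry
2. `invalid_request_error` stops the retry loop immediately
3. `repairOrphanedToolUse()` in the transform layer catches any orphans that slip through other defenses

Reproduction
`invalid_request_error` repeatedly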
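
As a rough illustration of the last-resort repair in the transform layer, the sketch below injects a synthetic error `tool_result` for every `tool_use` block that the following message does not answer. It assumes a simplified Anthropic-style message shape; the `Block` and `Message` types, the placeholder error text, and the function body are assumptions for the example, not the PR's actual `repairOrphanedToolUse()` implementation or the project's message types.

```typescript
// Simplified, assumed message shape (Anthropic-style content blocks).
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };

interface Message {
  role: "user" | "assistant";
  content: Block[];
}

// Returns a new array in which every assistant tool_use block is answered
// by a tool_result in the immediately following user message.
function repairOrphanedToolUse(messages: Message[]): Message[] {
  const out: Message[] = [];
  for (let i = 0; i < messages.length; i++) {
    const msg = messages[i];
    out.push(msg);
    if (msg.role !== "assistant") continue;

    // Collect the ids of all tool calls made in this assistant message.
    const ids: string[] = [];
    for (const block of msg.content) {
      if (block.type === "tool_use") ids.push(block.id);
    }
    if (ids.length === 0) continue;

    // Collect the ids already answered by the next message, if it is a user message.
    const next: Message | undefined = messages[i + 1];
    const nextIsUser = next !== undefined && next.role === "user";
    const answered = new Set<string>();
    if (next !== undefined && nextIsUser) {
      for (const block of next.content) {
        if (block.type === "tool_result") answered.add(block.tool_use_id);
      }
    }

    const missing = ids.filter((id) => !answered.has(id));
    if (missing.length === 0) continue;

    // Synthetic error results for the orphaned calls (placeholder wording).
    const synthetic = missing.map((id): Block => ({
      type: "tool_result",
      tool_use_id: id,
      content: "Tool execution was interrupted",
      is_error: true,
    }));

    if (next !== undefined && nextIsUser) {
      // Merge into the existing user message, results first, and skip it.
      out.push({ role: "user", content: [...synthetic, ...next.content] });
      i++;
    } else {
      out.push({ role: "user", content: synthetic });
    }
  }
  return out;
}
```

The function returns a new array rather than mutating its input, so it can run on every request as a validation pass; when no orphans exist the output has the same messages in the same order.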