fix(kosong): treat think-only model responses as incomplete response errors (#1801)
Conversation
…loop

When extended thinking exhausts the max_tokens budget, the model may return a response containing only ThinkPart content with no text or tool calls. Previously this caused the agent loop to silently stop with a normal TurnEnd, leaving users with no output and no error.

Now _step() detects think-only responses (checking for a non-empty ThinkPart with no substantive TextPart) and:

1. Logs a warning with the thinking character count
2. Injects a user continuation message to maintain proper user/assistant message alternation (preventing API rejection)
3. Returns None to continue the agent loop, letting the model see its own thinking and complete its response

The continuation message write is protected with asyncio.shield() for consistency with _grow_context. The has_text check uses p.text.strip() to handle empty/whitespace-only TextPart edge cases.
Signed-off-by: Kai <me@kaiyi.cool>
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR prevents the agent loop from silently stopping when the model returns a “think-only” response (thinking content with no visible text or tool calls) by detecting that condition in _step() and auto-continuing with an injected user continuation message.
Changes:
- Add think-only response detection in `KimiSoul._step()` and inject a user continuation message to preserve message alternation.
- Add a dedicated test suite covering think-only, empty/whitespace text, consecutive think-only responses, and loop continuation.
- Document the behavior change in the English and Chinese changelogs (plus the root `CHANGELOG.md`).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/kimi_cli/soul/kimisoul.py` | Detect think-only assistant responses and auto-continue the loop via an injected user continuation message. |
| `tests/core/test_kimisoul_think_only.py` | New tests validating think-only detection, alternation safety, and agent-loop continuation behavior. |
| `docs/en/release-notes/changelog.md` | Release note entry describing the fix. |
| `docs/zh/release-notes/changelog.md` | Chinese release note entry describing the fix. |
| `CHANGELOG.md` | Root changelog entry describing the fix. |
```python
# The last message must be the injected user continuation message
assert history[-1].role == "user"
assert len(history[-1].content) == 1
assert isinstance(history[-1].content[0], TextPart)
assert "<system>" in history[-1].content[0].text
assert "continue" in history[-1].content[0].text.lower()
```
These assertions are tightly coupled to the exact wording/format of the injected continuation message (including the <system> rendering detail). This will make the tests fragile to harmless copy changes or any refactor of system() formatting. Consider centralizing the continuation text into a constant/helper (used by both production code and tests), and asserting equivalence against that constant (or asserting a more structural property, e.g., that the message is a system(...)-generated continuation marker).
Suggested change:

```python
# The last message must be the injected user continuation message.
# Assert its structure rather than exact wording/formatting so the test
# remains stable across harmless copy or rendering changes.
assert history[-1].role == "user"
assert len(history[-1].content) == 1
assert isinstance(history[-1].content[0], TextPart)
assert history[-1].content[0].text.strip()
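One way to act on this suggestion is a shared module-level constant that both production code and tests reference. This is a minimal sketch; `THINK_ONLY_CONTINUATION` and `is_think_only_continuation` are illustrative names, not the project's actual API:

```python
# Hypothetical shared constant: production code injects this text, and tests
# assert against it instead of duplicating the exact wording.
THINK_ONLY_CONTINUATION = (
    "Your previous response contained only thinking content "
    "and was likely truncated. Please continue and provide your response."
)

def is_think_only_continuation(text: str) -> bool:
    """Structural check a test can use instead of matching exact copy."""
    return THINK_ONLY_CONTINUATION in text
```

A copy change then only touches the constant, and the tests keep passing.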
src/kimi_cli/soul/kimisoul.py
Outdated
```python
# Detect think-only responses: the model produced thinking content but no
# text and no tool calls. This typically happens when extended thinking
# exhausts the max_tokens budget before the model can emit visible output.
# Treat this as an incomplete response and let the agent loop continue so
# the model can see its own thinking and finish generating.
has_think = any(isinstance(p, ThinkPart) for p in result.message.content)
has_text = any(isinstance(p, TextPart) and p.text.strip() for p in result.message.content)
if has_think and not has_text:
    think_chars = sum(
        len(p.think) for p in result.message.content if isinstance(p, ThinkPart)
    )
    logger.warning(
        "Model response contains only thinking content ({n_chars} chars) "
        "without text or tool calls — likely truncated by max_tokens. "
        "Continuing agent loop to let the model complete its response.",
        n_chars=think_chars,
    )
    # Inject a user continuation message to maintain proper
    # user→assistant message alternation in the context. Without
    # this, the think-only assistant message followed by the next
    # assistant response would create consecutive assistant messages,
    # which can be rejected by some APIs (and becomes problematic
    # after compaction strips ThinkParts).
    await asyncio.shield(
        self._context.append_message(
            Message(
                role="user",
                content=[
                    system(
                        "Your previous response contained only thinking content "
                        "and was likely truncated. Please continue and provide "
                        "your response."
                    )
                ],
            )
        )
    )
    return None
```
If the model repeatedly returns think-only responses (e.g., due to an overly small output budget), this will spin until max_steps_per_turn is hit, potentially consuming significant tokens/time and then failing with a generic MaxStepsReached. Consider adding a dedicated safeguard for consecutive think-only continuations (e.g., count consecutive think-only events within a turn, and after a small threshold stop with a clearer, actionable error/stop_reason such as instructing the user to increase max tokens / reduce thinking effort).
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4cfb0c0188
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/kimi_cli/soul/kimisoul.py
Outdated
```python
role="user",
content=[
    system(
        "Your previous response contained only thinking content "
```
Mark continuation prompt as internal bookkeeping
This injected continuation is persisted as a normal user message wrapped in <system>...</system>, but downstream readers only treat checkpoints (<system>CHECKPOINT...), system-reminders, and notifications as internal (see src/kimi_cli/ui/shell/replay.py _is_user_message at lines 159-167, src/kimi_cli/utils/export.py _is_internal_user_message at lines 47-53, and src/kimi_cli/session_fork.py turn detection at lines 153-200). As a result, every think-only recovery can appear as a fake user turn in replay/export/fork flows, which is user-visible and distorts turn indexing.
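A minimal sketch of this point, with all names illustrative (the real predicates live in the files the review cites): give the injected continuation a marker that the internal-message checks already recognize, so replay/export/fork flows can filter it out instead of rendering it as a real user turn.

```python
# Hypothetical prefixes; the review says only checkpoints, system-reminders,
# and notifications are currently treated as internal.
INTERNAL_PREFIXES = (
    "<system>CHECKPOINT",    # checkpoints (already internal)
    "<system-reminder>",     # system reminders (already internal)
    "<system>CONTINUATION",  # assumed marker for think-only recovery
)

def is_internal_user_message(text: str) -> bool:
    """Return True for bookkeeping messages that should not count as user turns."""
    return text.lstrip().startswith(INTERNAL_PREFIXES)
```

The production code would then prepend the same marker to the injected continuation, keeping turn indexing stable.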
src/kimi_cli/soul/kimisoul.py
Outdated
```python
        )
    )
)
return None
```
Cap think-only auto-continue retries per turn
This branch always continues the loop for think-only outputs without any dedicated retry limit, so if a provider keeps returning think-only responses (for example, repeated reasoning truncation), the turn will keep issuing LLM calls until the global max_steps_per_turn threshold is hit and then fail with MaxStepsReached. That is a costly/slow failure mode introduced by this change; adding a small think-only retry cap or fallback stop reason would prevent runaway retries.
Add max_think_only_continuations config (default 3) to LoopControl.
When the model exceeds this limit of consecutive think-only responses,
stop the turn with a clear error message instead of spinning until
max_steps_per_turn (100) — saving tokens and giving actionable feedback
("increase max output tokens or reduce thinking effort").
The streak counter resets on any non-think-only response (text or
tool_calls), so intermittent think-only responses are handled normally.
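The streak logic described above can be sketched as follows. This is an illustrative outline under the commit's stated design (`max_think_only_continuations` default 3, reset on any non-think-only response); class and method names are assumptions, not the project's real API:

```python
from dataclasses import dataclass

@dataclass
class LoopControl:
    # Mirrors the config field named in the commit message.
    max_think_only_continuations: int = 3

class ThinkOnlyGuard:
    """Counts consecutive think-only responses within a turn."""

    def __init__(self, control: LoopControl) -> None:
        self._control = control
        self._streak = 0

    def reset(self) -> None:
        # Called on any non-think-only response (text or tool_calls)
        # and at turn boundaries, so state never leaks across turns.
        self._streak = 0

    def on_think_only(self) -> bool:
        """Return True to continue the loop, False to stop with a clear error."""
        self._streak += 1
        return self._streak <= self._control.max_think_only_continuations
```

When `on_think_only()` returns False, the turn would stop with the actionable message ("increase max output tokens or reduce thinking effort") instead of spinning to `max_steps_per_turn`.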
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1053d61a5c
```python
if result.tool_calls:
    self._think_only_streak = 0
    return None
```
Reset think-only streak on every turn-ending path
_think_only_streak is only reset in the result.tool_calls/normal-text paths, but _step() can also end a turn earlier via the tool_rejected return above this block. If a turn has think-only retries and then exits through that rejection path, the stale streak carries into the next user turn and can trigger the "giving up" branch too early (fewer continuations than configured), which is a user-visible regression in retry behavior.
Signed-off-by: Kai <me@kaiyi.cool>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b36a222536
src/kimi_cli/soul/kimisoul.py
Outdated
```python
has_think = any(isinstance(p, ThinkPart) for p in result.message.content)
has_text = any(isinstance(p, TextPart) and p.text.strip() for p in result.message.content)
if has_think and not has_text:
    self._think_only_streak += 1
```
Reset think-only streak at each new turn
_think_only_streak is incremented here but not cleared at turn boundaries, so if a turn ends via a non-text/non-tool path (e.g. interruption or early stop before the reset branches run), the next user turn inherits the old streak and can hit the continuation cap prematurely. That makes later turns fail faster than max_think_only_continuations intends and is user-visible after any aborted turn; resetting this counter at the start of _turn/_agent_loop avoids cross-turn state leakage.
…errors

A response containing only ThinkPart content (no text, no tool calls) is an abnormal condition — typically a stream interruption or output token budget exhaustion during reasoning. The model should always produce visible output after thinking.

Detect this at the generate layer and raise APIEmptyResponseError, which is already retryable through the standard retry mechanism (max_retries_per_step=3). This replaces the previous approach of auto-continuing the agent loop, which masked the underlying error.

Reverts kimisoul.py changes (think-only continuation, streak counter, user message injection) and config.py changes (max_think_only_continuations) from the previous commits in favor of this simpler, correct-by-design fix.
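The generate-layer check this commit describes can be sketched as below. The content-part types and `Message` are minimal stand-ins for the project's real types (assumptions), and only `APIEmptyResponseError`'s role as a retryable error comes from the commit text:

```python
from dataclasses import dataclass, field

# Minimal stand-ins for the project's content-part types.
@dataclass
class ThinkPart:
    think: str

@dataclass
class TextPart:
    text: str

@dataclass
class Message:
    content: list
    tool_calls: list = field(default_factory=list)

class APIEmptyResponseError(Exception):
    """Already retryable via the standard retry mechanism."""

def check_visible_output(message: Message) -> None:
    """Raise if the response is think-only (no visible text, no tool calls)."""
    has_think = any(isinstance(p, ThinkPart) for p in message.content)
    has_text = any(isinstance(p, TextPart) and p.text.strip() for p in message.content)
    if has_think and not has_text and not message.tool_calls:
        raise APIEmptyResponseError("model returned only thinking content")
```

Because the error is raised where the response is produced, the existing per-step retry loop handles it without any agent-loop special casing.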
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3f852f858b
```python
has_think = any(isinstance(p, ThinkPart) for p in message.content)
has_text = any(isinstance(p, TextPart) and p.text.strip() for p in message.content)
if has_think and not has_text and not message.tool_calls:
```
Treat non-text assistant parts as valid non-empty output
The new guard classifies responses as "think-only" whenever there is no non-whitespace TextPart and no tool call, but that also rejects valid assistant outputs that use other supported content parts (for example ImageURLPart/AudioURLPart/VideoURLPart). In that case a response like ThinkPart + ImageURLPart now raises APIEmptyResponseError, so callers retry/fail instead of receiving the model output; this is a regression from the previous behavior where any non-empty message.content was accepted.
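A multimodal-safe variant of the guard could treat any non-think part as visible output, so `ThinkPart + ImageURLPart` is accepted. This is a sketch only; the part types are minimal stand-ins for those named in the review:

```python
from dataclasses import dataclass, field

@dataclass
class ThinkPart:
    think: str

@dataclass
class TextPart:
    text: str

@dataclass
class ImageURLPart:
    url: str

@dataclass
class Message:
    content: list
    tool_calls: list = field(default_factory=list)

def is_think_only(message: Message) -> bool:
    """Only flag responses whose every part is thinking or blank text."""
    has_think = any(isinstance(p, ThinkPart) for p in message.content)
    has_visible = any(
        not isinstance(p, ThinkPart)
        and (not isinstance(p, TextPart) or p.text.strip())
        for p in message.content
    )
    return has_think and not has_visible and not message.tool_calls
```

This preserves the prior acceptance of non-text parts while still catching the truncated think-only case.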
Signed-off-by: Kai <me@kaiyi.cool>
The test_fetch_url_basic_functionality test was hitting a real GitHub issue page and comparing against a snapshot. GitHub page HTML changes over time, causing flaky CI failures unrelated to code changes.

Replace with a mock HTTP server serving static HTML to test the same trafilatura extraction pipeline deterministically. Assertions verify:
- HTML tags are stripped (not returning raw HTML)
- Meaningful text content is extracted
- Metadata (title, description) is present
- Correct result message for HTML extraction path
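The mock-server setup described above can be done with the standard library alone. This sketch shows only the deterministic serving side; the page content and handler names are illustrative, and the real test would point the project's fetch tool at the local URL:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Static page served on every request, so snapshots never drift.
PAGE = b"<html><head><title>Demo</title></head><body><p>Hello, extraction.</p></body></html>"

class StaticHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        # Keep test output quiet.
        pass

# Bind to port 0 so the OS picks a free port; serve in a daemon thread.
server = HTTPServer(("127.0.0.1", 0), StaticHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/"
html = urllib.request.urlopen(url).read().decode()
server.shutdown()
```

The extraction pipeline then runs against `html` (or `url`) exactly as it would against a live page, but with byte-for-byte stable input.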
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2ae3b3e2be
```python
has_think = any(isinstance(p, ThinkPart) for p in message.content)
has_text = any(isinstance(p, TextPart) and p.text.strip() for p in message.content)
if has_think and not has_text and not message.tool_calls:
```
Allow multimodal outputs in think-only response check
The new has_think && !has_text && !tool_calls guard treats any response without non-empty TextPart as invalid, which incorrectly rejects valid assistant outputs that contain other ContentPart types (for example ImageURLPart, AudioURLPart, or VideoURLPart) alongside ThinkPart. This is a behavioral regression from the prior not message.content check and can turn legitimate multimodal replies into APIEmptyResponseError retries/failures whenever a provider emits reasoning plus non-text content.
Summary
- The `generate()` layer detects think-only responses and raises `APIEmptyResponseError`, which is automatically retried through the existing retry mechanism (`max_retries_per_step=3`). If all retries fail, the user sees a clear error message.
- Empty/whitespace-only text is handled via `p.text.strip()`.

Test plan
- `test_generate_think_only_raises_error` — think-only response raises APIEmptyResponseError
- `test_generate_think_with_empty_text_raises_error` — think + empty/whitespace text also raises
- `test_generate_think_with_text_succeeds` — think + real text succeeds normally
- `test_generate_think_with_tool_calls_succeeds` — think + tool calls succeeds normally
- `test_think_only_error_is_retryable` — APIEmptyResponseError is retryable in kimisoul