fix(kosong): treat think-only model responses as incomplete response errors (#1801)
Conversation
…loop

When extended thinking exhausts the max_tokens budget, the model may return a response containing only ThinkPart content with no text or tool calls. Previously this caused the agent loop to silently stop with a normal TurnEnd, leaving users with no output and no error.

Now _step() detects think-only responses (checking for a non-empty ThinkPart with no substantive TextPart) and:

1. Logs a warning with the thinking character count
2. Injects a user continuation message to maintain proper user/assistant message alternation (preventing API rejection)
3. Returns None to continue the agent loop, letting the model see its own thinking and complete its response

The continuation message write is protected with asyncio.shield() for consistency with _grow_context. The has_text check uses p.text.strip() to handle empty/whitespace-only TextPart edge cases.
Signed-off-by: Kai <me@kaiyi.cool>
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR prevents the agent loop from silently stopping when the model returns a “think-only” response (thinking content with no visible text or tool calls) by detecting that condition in _step() and auto-continuing with an injected user continuation message.
Changes:
- Add think-only response detection in `KimiSoul._step()` and inject a user continuation message to preserve message alternation.
- Add a dedicated test suite covering think-only, empty/whitespace text, consecutive think-only responses, and loop continuation.
- Document the behavior change in the English and Chinese changelogs (plus the root `CHANGELOG.md`).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/kimi_cli/soul/kimisoul.py` | Detect think-only assistant responses and auto-continue the loop via an injected user continuation message. |
| `tests/core/test_kimisoul_think_only.py` | New tests validating think-only detection, alternation safety, and agent-loop continuation behavior. |
| `docs/en/release-notes/changelog.md` | Release note entry describing the fix. |
| `docs/zh/release-notes/changelog.md` | Chinese release note entry describing the fix. |
| `CHANGELOG.md` | Root changelog entry describing the fix. |
```python
# The last message must be the injected user continuation message
assert history[-1].role == "user"
assert len(history[-1].content) == 1
assert isinstance(history[-1].content[0], TextPart)
assert "<system>" in history[-1].content[0].text
assert "continue" in history[-1].content[0].text.lower()
```
These assertions are tightly coupled to the exact wording/format of the injected continuation message (including the <system> rendering detail). This will make the tests fragile to harmless copy changes or any refactor of system() formatting. Consider centralizing the continuation text into a constant/helper (used by both production code and tests), and asserting equivalence against that constant (or asserting a more structural property, e.g., that the message is a system(...)-generated continuation marker).
Suggested change:

```python
# The last message must be the injected user continuation message.
# Assert its structure rather than exact wording/formatting so the test
# remains stable across harmless copy or rendering changes.
assert history[-1].role == "user"
assert len(history[-1].content) == 1
assert isinstance(history[-1].content[0], TextPart)
assert history[-1].content[0].text.strip()
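One way to act on this suggestion is a shared module-level constant that both production code and tests reference. This is a minimal sketch; `THINK_ONLY_CONTINUATION` and `is_think_only_continuation` are illustrative names, not the project's actual API:

```python
# Hypothetical shared constant: production code injects this text, and tests
# assert against it instead of duplicating the exact wording.
THINK_ONLY_CONTINUATION = (
    "Your previous response contained only thinking content "
    "and was likely truncated. Please continue and provide your response."
)

def is_think_only_continuation(text: str) -> bool:
    """Structural check a test can use instead of matching exact copy."""
    return THINK_ONLY_CONTINUATION in text
```

A copy change then only touches the constant, and the tests keep passing.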
src/kimi_cli/soul/kimisoul.py
Outdated
```python
# Detect think-only responses: the model produced thinking content but no
# text and no tool calls. This typically happens when extended thinking
# exhausts the max_tokens budget before the model can emit visible output.
# Treat this as an incomplete response and let the agent loop continue so
# the model can see its own thinking and finish generating.
has_think = any(isinstance(p, ThinkPart) for p in result.message.content)
has_text = any(isinstance(p, TextPart) and p.text.strip() for p in result.message.content)
if has_think and not has_text:
    think_chars = sum(
        len(p.think) for p in result.message.content if isinstance(p, ThinkPart)
    )
    logger.warning(
        "Model response contains only thinking content ({n_chars} chars) "
        "without text or tool calls — likely truncated by max_tokens. "
        "Continuing agent loop to let the model complete its response.",
        n_chars=think_chars,
    )
    # Inject a user continuation message to maintain proper
    # user→assistant message alternation in the context. Without
    # this, the think-only assistant message followed by the next
    # assistant response would create consecutive assistant messages,
    # which can be rejected by some APIs (and becomes problematic
    # after compaction strips ThinkParts).
    await asyncio.shield(
        self._context.append_message(
            Message(
                role="user",
                content=[
                    system(
                        "Your previous response contained only thinking content "
                        "and was likely truncated. Please continue and provide "
                        "your response."
                    )
                ],
            )
        )
    )
    return None
```
If the model repeatedly returns think-only responses (e.g., due to an overly small output budget), this will spin until max_steps_per_turn is hit, potentially consuming significant tokens/time and then failing with a generic MaxStepsReached. Consider adding a dedicated safeguard for consecutive think-only continuations (e.g., count consecutive think-only events within a turn, and after a small threshold stop with a clearer, actionable error/stop_reason such as instructing the user to increase max tokens / reduce thinking effort).
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4cfb0c0188
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/kimi_cli/soul/kimisoul.py
Outdated
```python
role="user",
content=[
    system(
        "Your previous response contained only thinking content "
```
Mark continuation prompt as internal bookkeeping
This injected continuation is persisted as a normal user message wrapped in <system>...</system>, but downstream readers only treat checkpoints (<system>CHECKPOINT...), system-reminders, and notifications as internal (see src/kimi_cli/ui/shell/replay.py _is_user_message at lines 159-167, src/kimi_cli/utils/export.py _is_internal_user_message at lines 47-53, and src/kimi_cli/session_fork.py turn detection at lines 153-200). As a result, every think-only recovery can appear as a fake user turn in replay/export/fork flows, which is user-visible and distorts turn indexing.
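A minimal sketch of this point, with all names illustrative (the real predicates live in the files the review cites): give the injected continuation a marker that the internal-message checks already recognize, so replay/export/fork flows can filter it out instead of rendering it as a real user turn.

```python
# Hypothetical prefixes; the review says only checkpoints, system-reminders,
# and notifications are currently treated as internal.
INTERNAL_PREFIXES = (
    "<system>CHECKPOINT",    # checkpoints (already internal)
    "<system-reminder>",     # system reminders (already internal)
    "<system>CONTINUATION",  # assumed marker for think-only recovery
)

def is_internal_user_message(text: str) -> bool:
    """Return True for bookkeeping messages that should not count as user turns."""
    return text.lstrip().startswith(INTERNAL_PREFIXES)
```

The production code would then prepend the same marker to the injected continuation, keeping turn indexing stable.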
src/kimi_cli/soul/kimisoul.py
Outdated
```python
        )
    )
)
return None
```
Cap think-only auto-continue retries per turn
This branch always continues the loop for think-only outputs without any dedicated retry limit, so if a provider keeps returning think-only responses (for example, repeated reasoning truncation), the turn will keep issuing LLM calls until the global max_steps_per_turn threshold is hit and then fail with MaxStepsReached. That is a costly/slow failure mode introduced by this change; adding a small think-only retry cap or fallback stop reason would prevent runaway retries.
Add max_think_only_continuations config (default 3) to LoopControl.
When the model exceeds this limit of consecutive think-only responses,
stop the turn with a clear error message instead of spinning until
max_steps_per_turn (100) — saving tokens and giving actionable feedback
("increase max output tokens or reduce thinking effort").
The streak counter resets on any non-think-only response (text or
tool_calls), so intermittent think-only responses are handled normally.
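The streak logic described above can be sketched as follows. This is an illustrative outline under the commit's stated design (`max_think_only_continuations` default 3, reset on any non-think-only response); class and method names are assumptions, not the project's real API:

```python
from dataclasses import dataclass

@dataclass
class LoopControl:
    # Mirrors the config field named in the commit message.
    max_think_only_continuations: int = 3

class ThinkOnlyGuard:
    """Counts consecutive think-only responses within a turn."""

    def __init__(self, control: LoopControl) -> None:
        self._control = control
        self._streak = 0

    def reset(self) -> None:
        # Called on any non-think-only response (text or tool_calls)
        # and at turn boundaries, so state never leaks across turns.
        self._streak = 0

    def on_think_only(self) -> bool:
        """Return True to continue the loop, False to stop with a clear error."""
        self._streak += 1
        return self._streak <= self._control.max_think_only_continuations
```

When `on_think_only()` returns False, the turn would stop with the actionable message ("increase max output tokens or reduce thinking effort") instead of spinning to `max_steps_per_turn`.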
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1053d61a5c
```python
if result.tool_calls:
    self._think_only_streak = 0
    return None
```
Reset think-only streak on every turn-ending path
_think_only_streak is only reset in the result.tool_calls/normal-text paths, but _step() can also end a turn earlier via the tool_rejected return above this block. If a turn has think-only retries and then exits through that rejection path, the stale streak carries into the next user turn and can trigger the "giving up" branch too early (fewer continuations than configured), which is a user-visible regression in retry behavior.
Signed-off-by: Kai <me@kaiyi.cool>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b36a222536
src/kimi_cli/soul/kimisoul.py
Outdated
```python
has_think = any(isinstance(p, ThinkPart) for p in result.message.content)
has_text = any(isinstance(p, TextPart) and p.text.strip() for p in result.message.content)
if has_think and not has_text:
    self._think_only_streak += 1
```
Reset think-only streak at each new turn
_think_only_streak is incremented here but not cleared at turn boundaries, so if a turn ends via a non-text/non-tool path (e.g. interruption or early stop before the reset branches run), the next user turn inherits the old streak and can hit the continuation cap prematurely. That makes later turns fail faster than max_think_only_continuations intends and is user-visible after any aborted turn; resetting this counter at the start of _turn/_agent_loop avoids cross-turn state leakage.
…errors

A response containing only ThinkPart content (no text, no tool calls) is an abnormal condition — typically a stream interruption or output token budget exhaustion during reasoning. The model should always produce visible output after thinking.

Detect this at the generate layer and raise APIEmptyResponseError, which is already retryable through the standard retry mechanism (max_retries_per_step=3). This replaces the previous approach of auto-continuing the agent loop, which masked the underlying error.

Reverts kimisoul.py changes (think-only continuation, streak counter, user message injection) and config.py changes (max_think_only_continuations) from the previous commits in favor of this simpler, correct-by-design fix.
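The generate-layer check this commit describes can be sketched as below. The content-part types and `Message` are minimal stand-ins for the project's real types (assumptions), and only `APIEmptyResponseError`'s role as a retryable error comes from the commit text:

```python
from dataclasses import dataclass, field

# Minimal stand-ins for the project's content-part types.
@dataclass
class ThinkPart:
    think: str

@dataclass
class TextPart:
    text: str

@dataclass
class Message:
    content: list
    tool_calls: list = field(default_factory=list)

class APIEmptyResponseError(Exception):
    """Already retryable via the standard retry mechanism."""

def check_visible_output(message: Message) -> None:
    """Raise if the response is think-only (no visible text, no tool calls)."""
    has_think = any(isinstance(p, ThinkPart) for p in message.content)
    has_text = any(isinstance(p, TextPart) and p.text.strip() for p in message.content)
    if has_think and not has_text and not message.tool_calls:
        raise APIEmptyResponseError("model returned only thinking content")
```

Because the error is raised where the response is produced, the existing per-step retry loop handles it without any agent-loop special casing.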
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3f852f858b
```python
has_think = any(isinstance(p, ThinkPart) for p in message.content)
has_text = any(isinstance(p, TextPart) and p.text.strip() for p in message.content)
if has_think and not has_text and not message.tool_calls:
```
Treat non-text assistant parts as valid non-empty output
The new guard classifies responses as "think-only" whenever there is no non-whitespace TextPart and no tool call, but that also rejects valid assistant outputs that use other supported content parts (for example ImageURLPart/AudioURLPart/VideoURLPart). In that case a response like ThinkPart + ImageURLPart now raises APIEmptyResponseError, so callers retry/fail instead of receiving the model output; this is a regression from the previous behavior where any non-empty message.content was accepted.
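A multimodal-safe variant of the guard could treat any non-think part as visible output, so `ThinkPart + ImageURLPart` is accepted. This is a sketch only; the part types are minimal stand-ins for those named in the review:

```python
from dataclasses import dataclass, field

@dataclass
class ThinkPart:
    think: str

@dataclass
class TextPart:
    text: str

@dataclass
class ImageURLPart:
    url: str

@dataclass
class Message:
    content: list
    tool_calls: list = field(default_factory=list)

def is_think_only(message: Message) -> bool:
    """Only flag responses whose every part is thinking or blank text."""
    has_think = any(isinstance(p, ThinkPart) for p in message.content)
    has_visible = any(
        not isinstance(p, ThinkPart)
        and (not isinstance(p, TextPart) or p.text.strip())
        for p in message.content
    )
    return has_think and not has_visible and not message.tool_calls
```

This preserves the prior acceptance of non-text parts while still catching the truncated think-only case.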
Signed-off-by: Kai <me@kaiyi.cool>
The test_fetch_url_basic_functionality test was hitting a real GitHub issue page and comparing against a snapshot. GitHub page HTML changes over time, causing flaky CI failures unrelated to code changes.

Replace with a mock HTTP server serving static HTML to test the same trafilatura extraction pipeline deterministically. Assertions verify:
- HTML tags are stripped (not returning raw HTML)
- Meaningful text content is extracted
- Metadata (title, description) is present
- Correct result message for HTML extraction path
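The mock-server setup described above can be done with the standard library alone. This sketch shows only the deterministic serving side; the page content and handler names are illustrative, and the real test would point the project's fetch tool at the local URL:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Static page served on every request, so snapshots never drift.
PAGE = b"<html><head><title>Demo</title></head><body><p>Hello, extraction.</p></body></html>"

class StaticHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        # Keep test output quiet.
        pass

# Bind to port 0 so the OS picks a free port; serve in a daemon thread.
server = HTTPServer(("127.0.0.1", 0), StaticHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/"
html = urllib.request.urlopen(url).read().decode()
server.shutdown()
```

The extraction pipeline then runs against `html` (or `url`) exactly as it would against a live page, but with byte-for-byte stable input.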
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2ae3b3e2be
```python
has_think = any(isinstance(p, ThinkPart) for p in message.content)
has_text = any(isinstance(p, TextPart) and p.text.strip() for p in message.content)
if has_think and not has_text and not message.tool_calls:
```
Allow multimodal outputs in think-only response check
The new has_think && !has_text && !tool_calls guard treats any response without non-empty TextPart as invalid, which incorrectly rejects valid assistant outputs that contain other ContentPart types (for example ImageURLPart, AudioURLPart, or VideoURLPart) alongside ThinkPart. This is a behavioral regression from the prior not message.content check and can turn legitimate multimodal replies into APIEmptyResponseError retries/failures whenever a provider emits reasoning plus non-text content.
Summary
- The `generate()` layer detects think-only responses and raises `APIEmptyResponseError`, which is automatically retried through the existing retry mechanism (`max_retries_per_step=3`). If all retries fail, the user sees a clear error message.
- Empty/whitespace-only text is handled via `p.text.strip()`.

Test plan
- `test_generate_think_only_raises_error` — think-only response raises APIEmptyResponseError
- `test_generate_think_with_empty_text_raises_error` — think + empty/whitespace text also raises
- `test_generate_think_with_text_succeeds` — think + real text succeeds normally
- `test_generate_think_with_tool_calls_succeeds` — think + tool calls succeeds normally
- `test_think_only_error_is_retryable` — APIEmptyResponseError is retryable in kimisoul