Improve reasoning and tool-call parsers #4468
Conversation
Pull request overview
This PR refactors streaming reasoning/tool-call parsing by centralizing “previous/current text + token ids” into a shared, request-attached streaming state and updates the OpenAI API server + parsers to use the new streaming interfaces.
Changes:
- Introduces `StreamingParserState` + `get_streaming_state(request)` and a shared `ThinkingReasoningParser` base for `<think>...</think>`-style models.
- Simplifies streaming parser method signatures (tool + reasoning) to accept deltas only and read previous/current context from request-attached state.
- Renames/reshuffles parser modules (`*_tool_parser.py`, `qwen_reasoning_parser.py`) and updates imports accordingly.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tests/test_lmdeploy/test_qwen3coder_parser.py | Updates tool parser import path (but streaming call sites still need updating to new API). |
| tests/test_lmdeploy/test_qwen3_parser.py | Updates reasoning/tool parser import paths (but streaming call sites still need updating to new API). |
| lmdeploy/serve/openai/tool_parser/tool_parser.py | Updates abstract streaming tool parser signature + docs to rely on request-attached streaming state. |
| lmdeploy/serve/openai/tool_parser/qwen3coder_tool_parser.py | Adapts Qwen3Coder streaming parsing to read current_text via get_streaming_state. |
| lmdeploy/serve/openai/tool_parser/qwen3_tool_parser.py | Adapts Qwen3 streaming parsing to read current_text via get_streaming_state. |
| lmdeploy/serve/openai/tool_parser/qwen2d5_tool_parser.py | Adapts streaming parsing to shared state (but still uses parser-instance state). |
| lmdeploy/serve/openai/tool_parser/llama3_tool_parser.py | Adapts streaming parsing to shared state (but still uses parser-instance state). |
| lmdeploy/serve/openai/tool_parser/internlm2_tool_parser.py | Adapts streaming parsing to shared state (but still uses parser-instance state). |
| `lmdeploy/serve/openai/tool_parser/__init__.py` | Updates exports to new `*_tool_parser.py` module names. |
| lmdeploy/serve/openai/reasoning_parser/reasoning_parser.py | Adds shared streaming state + ThinkingReasoningParser base; updates reasoning parser streaming API. |
| lmdeploy/serve/openai/reasoning_parser/qwen_reasoning_parser.py | New Qwen QwQ / Intern-S1 reasoning parser based on ThinkingReasoningParser. |
| lmdeploy/serve/openai/reasoning_parser/qwen_qwq_reasoning_parser.py | Removes the prior Qwen QwQ reasoning parser implementation. |
| lmdeploy/serve/openai/reasoning_parser/deepseek_r1_reasoning_parser.py | Refactors DeepSeek-R1 reasoning parser onto ThinkingReasoningParser. |
| `lmdeploy/serve/openai/reasoning_parser/__init__.py` | Updates exports to include new shared state utilities + new module path. |
| lmdeploy/serve/openai/api_server.py | Uses StreamingParserState in the streaming loop and calls updated parser streaming APIs. |
Comments suppressed due to low confidence (3)
lmdeploy/serve/openai/tool_parser/qwen2d5_tool_parser.py:52
`Qwen2d5ToolParser` stores streaming parse state (`self.position`, `self.current_tool_id`, `self.current_tool_name_sent`, etc.) on the parser instance, but the server uses a single global parser instance across requests. This is not safe under concurrent streaming requests and can lead to mixed tool-call parsing. Please move per-request state to the request object (e.g., `request._tool_parser_state`) or another request-scoped structure.
lmdeploy/serve/openai/tool_parser/llama3_tool_parser.py:83
`Llama3JsonToolParser` relies on instance attributes (`self.current_tool_id`, `self.current_tool_name_sent`, `self.prev_tool_call_arr`, etc.) for streaming state. Since the API server stores a single global parser instance, concurrent streaming requests can race and corrupt this state. Please migrate the streaming state to be request-scoped (e.g., attach a parser-state object to `request`).
lmdeploy/serve/openai/tool_parser/internlm2_tool_parser.py:56
`Internlm2ToolParser` keeps streaming parse state (`self.position`, `self.current_tool_id`, `self.current_tool_name_sent`, etc.) on the parser instance, but `VariableInterface.tool_parser` is a single global instance shared across requests. With concurrent streaming requests, this state can interleave and corrupt parsing. Consider moving per-request state onto the request object (e.g., `request._tool_parser_state`) similar to `Qwen3ToolParser`, or into the shared `StreamingParserState`.
```python
"""Base class for reasoning parsers that use <think>...</think> style tags.

Subclasses only need to set `start_token`, `end_token`, and optionally
override `strip_newlines` and `on_missing_start_tag` to customize behavior.
```
Docstring mentions overriding `on_missing_start_tag`, but the actual configurable attribute is `on_missing_end_tag`. Please fix the docstring to avoid confusing subclass authors.
```diff
-override `strip_newlines` and `on_missing_start_tag` to customize behavior.
+override `strip_newlines` and `on_missing_end_tag` to customize behavior.
```
```diff
 from lmdeploy.serve.openai.protocol import (ChatCompletionRequest, ChatCompletionResponse, ChatCompletionResponseChoice,
                                             ChatCompletionResponseStreamChoice, ChatCompletionStreamResponse,
                                             ChatMessage, DeltaMessage, DeltaToolCall, UsageInfo)
-from lmdeploy.serve.openai.reasoning_parser.qwen_qwq_reasoning_parser import QwenQwQReasoningParser
-from lmdeploy.serve.openai.tool_parser.qwen3_parser import Qwen3ToolParser
+from lmdeploy.serve.openai.reasoning_parser.qwen_reasoning_parser import QwenQwQReasoningParser
+from lmdeploy.serve.openai.tool_parser.qwen3_tool_parser import Qwen3ToolParser
```
This test file still calls `extract_tool_calls_streaming(...)` / `extract_reasoning_content_streaming(...)` with the removed `previous_text`/`current_text` and `previous_token_ids`/`current_token_ids` parameters (see `_chat_completion_v1` below). Update the test harness to match the new parser API (and to update/step the shared `StreamingParserState` via `get_streaming_state(request)` similarly to api_server.py).
```diff
@@ -10,7 +10,7 @@
 from lmdeploy.serve.openai.protocol import (ChatCompletionRequest, ChatCompletionResponse, ChatCompletionResponseChoice,
                                             ChatCompletionResponseStreamChoice, ChatCompletionStreamResponse,
                                             ChatMessage, DeltaMessage, DeltaToolCall, UsageInfo)
-from lmdeploy.serve.openai.tool_parser.qwen3coder_parser import Qwen3CoderToolParser
+from lmdeploy.serve.openai.tool_parser.qwen3coder_tool_parser import Qwen3CoderToolParser
```
This test file still calls `extract_tool_calls_streaming(...)` / `extract_reasoning_content_streaming(...)` using the old signature with `previous_text`/`current_text` and token id args (see `_chat_completion_v1` below). Update the test harness to the new streaming parser API and maintain the shared `StreamingParserState` on the request (as api_server.py now does).
```python
previous_text: str = ''
current_text: str = ''
previous_token_ids: list[int] = []
current_token_ids: list[int] = []
```
`StreamingParserState` uses mutable list defaults (`previous_token_ids` / `current_token_ids`). In a dataclass these lists are shared across all instances, which can leak token ids between requests. Use `field(default_factory=list)` for these fields.
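A small sketch of the suggested fix. Note that CPython's `dataclasses` actually rejects a bare mutable list default with a `ValueError` at class-definition time, so the shared-list symptom depends on how the state class is really declared; either way, `field(default_factory=list)` gives each instance its own fresh list:

```python
from dataclasses import dataclass, field


@dataclass
class SafeState:
    """Each instance gets independent lists via default_factory."""
    previous_token_ids: list[int] = field(default_factory=list)
    current_token_ids: list[int] = field(default_factory=list)


a, b = SafeState(), SafeState()
a.current_token_ids.append(1)
print(b.current_token_ids)  # b is unaffected: []
```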
```python
def step(self) -> None:
    """Advance: copy current -> previous (call at end of each iteration)."""
    self.previous_text = self.current_text
    self.previous_token_ids = self.current_token_ids
```
`StreamingParserState.step()` assigns `previous_token_ids = current_token_ids`, which aliases the same list. Because `update()` mutates `current_token_ids` in place, `previous_token_ids` will also change, breaking any logic that compares previous vs current tokens (and can cause incorrect detection of start/end tags). Snapshot via `.copy()` (or store lengths / use an immutable sequence).
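The aliasing hazard can be reproduced in isolation; `DemoState`, `step_buggy`, and `step_fixed` below are illustrative names, not the PR's code:

```python
class DemoState:
    """Minimal reproduction of the list-aliasing bug flagged above."""

    def __init__(self) -> None:
        self.previous_token_ids: list[int] = []
        self.current_token_ids: list[int] = []

    def step_buggy(self) -> None:
        # Both names now point at the SAME list object.
        self.previous_token_ids = self.current_token_ids

    def step_fixed(self) -> None:
        # Snapshot: previous keeps its own copy of the tokens.
        self.previous_token_ids = self.current_token_ids.copy()


s = DemoState()
s.current_token_ids.extend([1, 2])
s.step_buggy()
s.current_token_ids.append(3)          # in-place update mutates "previous" too
assert s.previous_token_ids == [1, 2, 3]

s2 = DemoState()
s2.current_token_ids.extend([1, 2])
s2.step_fixed()
s2.current_token_ids.append(3)         # snapshot is unaffected
assert s2.previous_token_ids == [1, 2]
```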
```diff
 def extract_reasoning_content(self, model_output: str, request: ChatCompletionRequest, **kwargs) -> tuple[str, str]:
     """Extract reasoning content from a complete model-generated string.

     Used for non-streaming responses where we have the entire model response
     available before sending to the client.

     Args:
-        model_output (str): The model-generated string to extract reasoning content from.
-        request (ChatCompletionRequest): he request object that was used to generate the model_output.
+        model_output: The model-generated string to extract reasoning content from.
+        request: The request object that was used to generate the model_output.

     Returns:
-        reasoning_content (str | None): The reasoning content.
-        final_output (str | None): The content.
+        A tuple of (reasoning_content, final_output). Either may be None.
     """
```
The return type annotation for `extract_reasoning_content` is `tuple[str, str]`, but the docstring and implementations return `None` for either element (e.g., `ThinkingReasoningParser.extract_reasoning_content` returns `(None, model_output)` and vice versa). Update the annotation to allow `None` (e.g., `tuple[str | None, str | None]`) to match behavior.
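An illustration of why both tuple slots must admit `None`; the body is a hypothetical minimal extractor, not the PR's implementation:

```python
from typing import Optional


def extract_reasoning_content(model_output: str) -> tuple[Optional[str], Optional[str]]:
    """Hypothetical <think>...</think> extractor: the no-tag path returns
    (None, model_output) and an all-reasoning output returns (reasoning, None),
    so tuple[str, str] would misdescribe both paths."""
    start, end = '<think>', '</think>'
    if start not in model_output:
        return None, model_output            # no reasoning present
    after = model_output.partition(start)[2]
    reasoning, _, rest = after.partition(end)
    return reasoning, rest or None           # no final text after the tag
```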
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it receive feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.
Motivation
Please describe the motivation of this PR and the goal you want to achieve through this PR.
Modification
Please briefly describe what modification is made in this PR.
BC-breaking (Optional)
Does the modification introduce changes that break the backward-compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.
Checklist