[Feature] Support n parameter in /v1/chat/completions and /v1/completions#4419
ziyangliu-666 wants to merge 4 commits into InternLM:main from
Conversation
Pull request overview
Adds functional support for the OpenAI-compatible n parameter so /v1/chat/completions and /v1/completions can return multiple choices per request, aligning server behavior with the existing protocol fields/validation.
Changes:
- Implement multi-choice generation for `chat_completions_v1` (multiple sessions/generators, per-choice indices, seed offsetting, concurrent non-stream aggregation).
- Implement multi-choice generation for `completions_v1` (expand generators to `len(prompts) * n`, fix streaming `choice.index` tagging).
- Add a new unit test suite to validate `n` behavior and indexing/usage aggregation with a mocked engine.
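As a rough illustration of what such a mocked-engine test can exercise (a minimal sketch; `fake_generate` and `serve_n` are hypothetical stand-ins, not the actual code under review):

```python
import asyncio

async def fake_generate(prompt: str, seed):
    # Mocked engine call: deterministic output instead of a real model.
    return f"out({seed})"

async def serve_n(prompt: str, n: int, seed=None):
    """Mimic the server-side fan-out: one generator per choice, seed offset by i."""
    outs = await asyncio.gather(
        *(fake_generate(prompt, None if seed is None else seed + i) for i in range(n))
    )
    return [{"index": i, "text": t} for i, t in enumerate(outs)]

choices = asyncio.run(serve_n("hi", n=3, seed=100))
print([c["index"] for c in choices])  # [0, 1, 2]
print(choices[2]["text"])             # out(102)
```

A mock like this lets the tests assert choice counts, index ordering, and seed offsetting without loading a model.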
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `lmdeploy/serve/openai/api_server.py` | Implements n>1 session/generator fan-out for chat/completions; updates streaming/non-streaming response construction. |
| `tests/test_lmdeploy/test_n_parameter.py` | Adds mocked async-engine tests validating n defaults, rejection, indexing, seed offsetting, and basic aggregation behavior. |
Follow-up: Copilot review fixes + re-verified self-test

Addressed all 5 issues raised by the Copilot review (commit 99b77cb).

Self-test results after fixes (Qwen3-0.6B, PyTorch backend): all four scenarios (streaming and non-streaming) produce the correct number of choices with correct indexing. Pre-commit hooks pass.
Furthermore, the unit tests your PR added have been skipped due to missing
Fixed in 05f1057 — added
Rebased on the recent main branch, resolved the conflict, and force-pushed.
6a470f0 to 69a90fd
…ions

- For n>1 in chat completions, create n independent sessions and generators, stream each output sequentially with the correct choice index, and gather all n results concurrently in non-streaming mode
- For n>1 in completions, expand session/generator creation to len(prompts)*n and fix the pre-existing streaming bug where index was hardcoded to 0
- When seed is set, use seed+i for the i-th generator to produce distinct but reproducible outputs
- usage.completion_tokens is correctly summed across all n choices; usage.prompt_tokens is counted once (shared input)
- Add 16 unit tests covering validation, choice count, usage aggregation, session assignment, seed offset, and multi-prompt × n combinations

Closes InternLM#4073
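The seed-offset and usage rules in this commit message can be sketched as follows (hypothetical helper names; the token counts are illustrative, not measurements):

```python
def choice_seeds(seed, n: int):
    """Generator i uses seed + i: the n choices are distinct but reproducible."""
    return [None] * n if seed is None else [seed + i for i in range(n)]

def aggregate_usage(prompt_tokens: int, completion_counts):
    """prompt_tokens is counted once (shared input); completion_tokens is
    summed over all n choices."""
    completion_tokens = sum(completion_counts)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }

print(choice_seeds(100, 3))            # [100, 101, 102]
print(aggregate_usage(12, [7, 9, 8]))  # completion_tokens=24, total_tokens=36
```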
- Fix TypeError in chat streaming path: create_stream_response_json was returning model_dump_json() (str) but cache_block_ids injection subscripted it as a dict; switch to model_dump() + json.dumps()
- Fix stateful GptOssChatParser shared across concurrent asyncio.gather calls in non-streaming n>1 path; create a fresh instance per choice, consistent with the streaming path
- Fix tool-call parse exceptions being swallowed and misreported as "Client disconnected"; re-raise so asyncio.gather propagates them, wrap gather in try/except to return INTERNAL_SERVER_ERROR
- Add missing test_completion_n_negative_rejected to match the existing test_chat_n_negative_rejected
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
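The first fix above hinges on the difference between pydantic v2's `model_dump_json()` (returns a `str`, which cannot be subscripted to inject extra keys) and `model_dump()` + `json.dumps()`. A minimal self-contained sketch, with a hypothetical `ChatChunk` shim standing in for the real response model:

```python
import json

# Hypothetical shim mirroring pydantic v2's model_dump()/model_dump_json(),
# so the sketch runs without the real response model.
class ChatChunk:
    def __init__(self, id: str, index: int):
        self.id, self.index = id, index

    def model_dump(self) -> dict:
        return {"id": self.id, "index": self.index}

    def model_dump_json(self) -> str:
        return json.dumps(self.model_dump())

chunk = ChatChunk("cmpl-1", 0)

# Buggy pattern: model_dump_json() returns a str, so item assignment raises TypeError.
payload = chunk.model_dump_json()
try:
    payload["cache_block_ids"] = [1, 2]  # type: ignore[index]
except TypeError:
    pass  # 'str' object does not support item assignment

# Fixed pattern: dump to a dict first, inject the extra field, then serialize once.
data = chunk.model_dump()
data["cache_block_ids"] = [1, 2]
payload = json.dumps(data)
print(payload)
```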
69a90fd to 5d433fb
Resolve conflict in api_server.py:

- Incorporate tool_parser.adjust_request() from main before the n parameter session logic
- Add the media_io_kwargs parameter to generate() calls in the n parameter loop

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolved merge conflict with main in eef2c64: incorporated
Hi, @ziyangliu-666 Thank you for your contribution. I really appreciate the effort you put into it.

While reviewing, I took a closer look at the current state of our serving code. Because your PR interacts with that part of the codebase, I'd like to align its merge with our refactoring plan.

Specifically, I want to first estimate the scope of the refactoring work. If it turns out to be significant (meaning it'll take a while), then it makes sense to merge your PR sooner, before any larger restructuring. If the refactoring ends up being relatively light, it may be better to merge after we clean things up.

So for now, I'd like to keep this PR open while we do that quick assessment. I'll follow up with you once I have a clearer picture. Hope that sounds reasonable, and thank you again for your contribution.
@lvhan028 Thanks for the detailed explanation! That sounds totally reasonable. Happy to wait while you assess the refactoring scope. Let me know if there's anything I can do to help with the alignment, or if any changes are needed on this PR in the meantime :)
Summary
This PR adds real support for the `n` parameter in the OpenAI-compatible API server.

Previously, `n` was already present in the protocol (`ChatCompletionRequest.n` and `CompletionRequest.n`) and validated (`n > 0`), but it was never actually respected at runtime. The server always generated exactly one output, regardless of the value passed.

With this change:

- `/v1/chat/completions` now returns `n` choices when `n > 1`
- `/v1/completions` now returns `len(prompts) * n` choices
- each choice is tagged with the correct `choice.index`

Closes #4073
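For illustration, a multi-choice chat response carries one entry per choice with ascending indices. A shape-only sketch (the id and contents are hypothetical, following the OpenAI chat-completions schema):

```python
# Hypothetical n=3 response body; only the structure matters here.
response = {
    "id": "chatcmpl-example",
    "object": "chat.completion",
    "choices": [
        {
            "index": i,  # each choice carries its own index, 0..n-1
            "message": {"role": "assistant", "content": f"candidate {i}"},
            "finish_reason": "stop",
        }
        for i in range(3)
    ],
}

indices = [c["index"] for c in response["choices"]]
print(indices)  # [0, 1, 2]
```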
Why
The OpenAI API's `n` parameter is useful for generating multiple candidate completions in a single request instead of making `n` separate calls.

Without this patch, users could pass `n > 1`, but the server still produced only one output, which was both misleading and incompatible with expected OpenAI API behavior.

What changed
`lmdeploy/serve/openai/api_server.py`

`/v1/chat/completions`:

- `n=1` keeps the existing behavior unchanged and uses `request.session_id`
- `n>1` creates `n` independent sessions via `get_session(-1)` and `n` independent generators

Streaming path:

- a fresh `GptOssChatParser` instance per generator
- each chunk is tagged with the correct `choice.index`

Non-streaming path:

- all generators are awaited concurrently via `asyncio.gather`
- `usage.prompt_tokens` is counted once, since the prompt is shared
- `usage.completion_tokens` is summed across all generated choices

Seed handling:

- when `seed` is provided, generator `i` uses `seed + i`

`/v1/completions`:

- one session/generator per `(prompt, choice)` pair

This means total generators = `len(prompts) * n`.

Also fixes a pre-existing issue in the streaming path: `choice.index` was hardcoded to `index=0`.

Tests
Added a new test file: `tests/test_lmdeploy/test_n_parameter.py`

This includes 16 unit tests with mocked engine behavior, covering:

- `n=0` and `n<0` are rejected
- `n=1` returns exactly one choice
- `n=3` returns three choices with indices `[0, 1, 2]`
- `usage.completion_tokens` is summed correctly across all choices
- `n=1` preserves `request.session_id`; `n>1` uses auto-assigned sessions
- `seed=100`, `n=3` -> generator seeds `[100, 101, 102]`
- `/v1/completions` with `n=3` -> 3 choices; two prompts with `n=2` -> 4 choices

Backward compatibility
No breaking change.

- `n` still defaults to `1`
- behavior only changes when a client explicitly requests `n > 1`

Example
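A hypothetical request body for this feature (the model name, prompt, and values are placeholders, not from the PR):

```python
import json

# Body for POST /v1/chat/completions on an lmdeploy server (illustrative).
request = {
    "model": "Qwen3-0.6B",
    "messages": [{"role": "user", "content": "Write a haiku about autumn."}],
    "n": 3,       # ask for three candidate completions in one call
    "seed": 100,  # reproducible: the server uses seed + i for choice i
}
print(json.dumps(request, indent=2))
```

The response then contains three `choices` entries with indices 0, 1, and 2, and a single shared `usage.prompt_tokens` count.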