
docs: add Qwen3.5 deployment cookbook (EN/CN) #1248

Open
sufubao wants to merge 4 commits into main from qw35_cookbook

Conversation

Collaborator

@sufubao sufubao commented Apr 1, 2026

Summary

  • Add bilingual (English + Chinese) Qwen3.5 deployment cookbook covering model variants (qwen3_5, qwen3_5_moe)
  • Include launch scripts for text-only dense, multimodal, MoE, and high-performance H200 configurations
  • Document thinking/reasoning mode (--reasoning_parser qwen3), FP8 KV quantization, multimodal image input, and hardware requirements
  • Register new cookbooks in both EN and CN documentation index

Test plan

  • Verify RST renders correctly with sphinx-build
  • Confirm all launch command parameters are valid against api_server args
  • Test example curl commands against a running Qwen3.5 instance
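The last step of the test plan could be exercised with a minimal smoke test like the sketch below. The endpoint path, port, and request schema are assumptions for illustration, not taken from this PR; check the LightLLM api_server documentation for the exact request format.

```bash
# Hypothetical smoke test against a locally running Qwen3.5 instance.
# Endpoint path and parameter names are assumptions; verify them against
# the LightLLM api_server docs before use.
curl -s http://localhost:8000/generate \
  -H 'Content-Type: application/json' \
  -d '{
        "inputs": "What is the capital of France?",
        "parameters": {"max_new_tokens": 64}
      }'
```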

Add bilingual cookbook covering text-only, multimodal, MoE, and
high-performance launch configurations, thinking/reasoning mode,
FP8 KV quantization, and hardware recommendations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces deployment guides for the Qwen3.5 model family, including dense and Mixture-of-Experts (MoE) variants, in both Chinese and English. The documentation covers model features, recommended launch scripts for different hardware setups, reasoning mode support, and KV cache quantization. Feedback from the review points out that the model types qwen3_5_text and qwen3_5_moe_text are not registered in the codebase and should be replaced with base model names. Additionally, the review suggests correcting the KV cache quantization parameter from --data_type to --kv_cache_quant_type and clarifying the usage of calibration configurations.

Comment on lines +36 to +41
* - ``qwen3_5_text``
  - Dense + text-only
  - Dense MLP, no vision encoder
* - ``qwen3_5_moe_text``
  - MoE + text-only
  - Mixture-of-Experts, no vision encoder
Contributor


medium

The model types qwen3_5_text and qwen3_5_moe_text are not registered in the current codebase (see lightllm/models/qwen3_5/model.py and lightllm/models/qwen3_5_moe/model.py). Passing these names as the --model_type argument in the launch commands will result in an error. It is recommended to clarify in the doc that for text-only mode the base model names (qwen3_5, qwen3_5_moe) should be used, without enabling the --enable_multimodal flag.


.. code-block:: bash

   --data_type fp8_e4m3
Contributor


medium

In LightLLM, the parameter that enables KV cache quantization is typically --kv_cache_quant_type, while --data_type is generally used to specify the precision of model weights and activations (for example, loading the whole model in FP8). Since this section is specifically about KV cache quantization, --kv_cache_quant_type should be used. In addition, the JSON calibration configuration mentioned in the doc should explain how it is passed to the server (for example, via the --quant_config_path argument).

Suggested change
--data_type fp8_e4m3
--kv_cache_quant_type fp8_e4m3

Comment on lines +36 to +41
* - ``qwen3_5_text``
- Dense + Text-only
- Dense MLP without vision encoder
* - ``qwen3_5_moe_text``
- MoE + Text-only
- Mixture-of-Experts without vision encoder
Contributor


medium

The model types qwen3_5_text and qwen3_5_moe_text are not registered in the current codebase (see lightllm/models/qwen3_5/model.py and lightllm/models/qwen3_5_moe/model.py). Using these names in the --model_type argument will result in an error. It is recommended to clarify that for text-only mode, the base model names (qwen3_5, qwen3_5_moe) should be used without the --enable_multimodal flag.


.. code-block:: bash

   --data_type fp8_e4m3
Contributor


medium

In LightLLM, the parameter for enabling KV cache quantization is typically --kv_cache_quant_type. The --data_type flag is generally used to specify the precision of the model weights and activations. Since this section is specifically about KV cache quantization, --kv_cache_quant_type should be used instead. Also, please clarify how the calibration JSON configuration should be passed to the server (e.g., via --quant_config_path).

Suggested change
--data_type fp8_e4m3
--kv_cache_quant_type fp8_e4m3
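Putting the reviewer's suggestion together, a launch with FP8 KV cache quantization might look like the following sketch. The model path and port are placeholders, and the flag names follow the review comment rather than a verified invocation.

```bash
# Hypothetical launch command: --kv_cache_quant_type follows the
# reviewer's suggestion; this is a sketch, not a tested configuration.
python -m lightllm.server.api_server \
  --model_dir /path/to/Qwen3.5 \
  --port 8000 \
  --kv_cache_quant_type fp8_e4m3
```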

sufubao and others added 3 commits April 1, 2026 06:21
Remove qwen3_5_text and qwen3_5_moe_text from the supported model
types table since they are not registered via @ModelRegistry. Clarify
that text-only mode uses the same model type without --enable_multimodal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…okbook

Qwen3.5 models are registered as multimodal by default, so
--enable_multimodal is not a user-facing CLI flag. For text-only
deployment, use --disable_vision instead. For multimodal deployment,
no extra flag is needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
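Under the convention described in this commit, the two deployment modes would differ only in a single flag. The sketch below assumes the --disable_vision flag as stated above; model paths are placeholders.

```bash
# Text-only deployment: disable the vision tower (per this commit).
python -m lightllm.server.api_server --model_dir /path/to/Qwen3.5 --disable_vision

# Multimodal deployment: no extra flag needed; multimodal is the default.
python -m lightllm.server.api_server --model_dir /path/to/Qwen3.5
```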
Rewrite both EN/CN cookbooks to use the real model Qwen3.5-397B-A17B
(397B total / 17B active MoE) instead of fictional model names like
Qwen3.5-VL or Qwen3.5-MoE. Add HuggingFace link, accurate architecture
details (512 experts, 60-layer hybrid layout), recommended sampling
parameters for thinking/non-thinking modes, and proper 8×H200 setup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
