Conversation
Add bilingual cookbook covering text-only, multimodal, MoE, and high-performance launch configurations, thinking/reasoning mode, FP8 KV quantization, and hardware recommendations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code Review
This pull request introduces deployment guides for the Qwen3.5 model family, including dense and Mixture-of-Experts (MoE) variants, in both Chinese and English. The documentation covers model features, recommended launch scripts for different hardware setups, reasoning mode support, and KV cache quantization. Feedback from the review points out that the model types qwen3_5_text and qwen3_5_moe_text are not registered in the codebase and should be replaced with base model names. Additionally, the review suggests correcting the KV cache quantization parameter from --data_type to --kv_cache_quant_type and clarifying the usage of calibration configurations.
* - ``qwen3_5_text``
  - Dense + Text-only
  - Dense MLP, no vision encoder
* - ``qwen3_5_moe_text``
  - MoE + Text-only
  - Mixture-of-Experts, no vision encoder
.. code-block:: bash

   --data_type fp8_e4m3
* - ``qwen3_5_text``
  - Dense + Text-only
  - Dense MLP without vision encoder
* - ``qwen3_5_moe_text``
  - MoE + Text-only
  - Mixture-of-Experts without vision encoder
The model types qwen3_5_text and qwen3_5_moe_text are not registered in the current codebase (see lightllm/models/qwen3_5/model.py and lightllm/models/qwen3_5_moe/model.py). Using these names in the --model_type argument will result in an error. It is recommended to clarify that for text-only mode, the base model names (qwen3_5, qwen3_5_moe) should be used without the --enable_multimodal flag.
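To make the recommendation concrete, here is a minimal sketch of a text-only launch using the base model name, per this review comment. The `--model_dir` and `--tp` flags and the paths are assumptions about the standard `api_server` interface, not taken from the cookbook under review; adjust them to your installation.

```shell
# Hypothetical text-only launch: use the registered base model type
# (qwen3_5 / qwen3_5_moe) and simply omit --enable_multimodal.
# --model_dir and --tp are assumed api_server flags; paths are placeholders.
python -m lightllm.server.api_server \
    --model_dir /path/to/Qwen3.5-MoE \
    --model_type qwen3_5_moe \
    --tp 8 \
    --port 8000
```

The point of the sketch is only the model-type choice: `qwen3_5_moe_text` would fail to resolve in the model registry, while the base name works in both modes.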
.. code-block:: bash

   --data_type fp8_e4m3
In LightLLM, the parameter for enabling KV cache quantization is typically --kv_cache_quant_type. The --data_type flag is generally used to specify the precision of the model weights and activations. Since this section is specifically about KV cache quantization, --kv_cache_quant_type should be used instead. Also, please clarify how the calibration JSON configuration should be passed to the server (e.g., via --quant_config_path).
- --data_type fp8_e4m3
+ --kv_cache_quant_type fp8_e4m3
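Applying the suggested change, a full launch line might look like the following sketch. Only `--kv_cache_quant_type` and `--quant_config_path` come from this review; `--model_dir` is an assumed standard flag, and the calibration file name is a placeholder.

```shell
# Sketch: enable FP8 KV-cache quantization with the flag suggested above,
# passing the calibration JSON via --quant_config_path as the review asks
# the docs to clarify. Paths and file names are illustrative only.
python -m lightllm.server.api_server \
    --model_dir /path/to/Qwen3.5 \
    --kv_cache_quant_type fp8_e4m3 \
    --quant_config_path ./kv_calib.json
```

Note that weight precision would still be controlled separately by `--data_type`, which is why the review distinguishes the two flags.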
Remove qwen3_5_text and qwen3_5_moe_text from the supported model types table since they are not registered via @ModelRegistry. Clarify that text-only mode uses the same model type without --enable_multimodal. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…okbook Qwen3.5 models are registered as multimodal by default, so --enable_multimodal is not a user-facing CLI flag. For text-only deployment, use --disable_vision instead. For multimodal deployment, no extra flag is needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
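Since this commit reverses the earlier guidance (Qwen3.5 registers as multimodal by default, so text-only mode is selected with `--disable_vision` rather than by omitting `--enable_multimodal`), a hedged sketch of the two launch variants may help. `--model_dir` and `--tp` are assumed flags; paths are placeholders.

```shell
# Text-only deployment: vision tower disabled explicitly.
python -m lightllm.server.api_server \
    --model_dir /path/to/Qwen3.5 \
    --disable_vision \
    --tp 8

# Multimodal deployment: no extra flag needed, per this commit.
python -m lightllm.server.api_server \
    --model_dir /path/to/Qwen3.5 \
    --tp 8
```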
Rewrite both EN/CN cookbooks to use the real model Qwen3.5-397B-A17B (397B total / 17B active MoE) instead of fictional model names like Qwen3.5-VL or Qwen3.5-MoE. Add HuggingFace link, accurate architecture details (512 experts, 60-layer hybrid layout), recommended sampling parameters for thinking/non-thinking modes, and proper 8×H200 setup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
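For the "recommended sampling parameters for thinking/non-thinking modes" mentioned in this commit, an illustrative client request shows where those parameters are applied. The endpoint path, model name string, and the specific temperature/top_p values here are assumptions for illustration, not values quoted from the cookbook.

```shell
# Illustrative thinking-mode request against a locally launched server.
# Parameter values are examples only; consult the cookbook for the
# recommended settings per mode.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3.5-397B-A17B",
    "messages": [{"role": "user", "content": "Explain MoE routing briefly."}],
    "temperature": 0.6,
    "top_p": 0.95
  }'
```

Non-thinking mode would use the same request shape with the alternate sampling values the cookbook recommends.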
Summary
Add bilingual (EN/CN) Qwen3.5 cookbooks covering launch with the base model types (qwen3_5, qwen3_5_moe), reasoning mode (--reasoning_parser qwen3), FP8 KV quantization, multimodal image input, and hardware requirements.

Test plan
- Build the documentation with sphinx-build.
- Verify the api_server args used in the cookbooks.