feat: implement Turbomind vision encoder support for Qwen3VL/3.5 families#4460

Open

lapy wants to merge 1 commit into InternLM:main from lapy:split/qwen-vision-backend

Conversation

@lapy (Contributor) commented Mar 24, 2026

Add TurboMind vision support for the Qwen3-VL and Qwen3.5 vision encoders.

PR Testing

Scope

This change was validated on the TurboMind path only. The PyTorch backend was not modified or tested.

Regression Test

Ran the TurboMind-focused regression test:

```shell
PYTHONPATH=/root/lmdeploy-prs/split-qwen-vision-backend \
pytest -q tests/test_lmdeploy/test_vl/test_qwen_vl_family.py
```

Result:

8 passed

Optional Qwen3.5 Fast Path Dependency

Installed the matching prebuilt causal-conv1d wheel for this environment:

```shell
python -m pip install --break-system-packages \
  'https://github.com/Dao-AILab/causal-conv1d/releases/download/v1.6.1.post4/causal_conv1d-1.6.1%2Bcu12torch2.10cxx11abiTRUE-cp312-cp312-linux_x86_64.whl'
```

Verified the optional Qwen3.5 fast path is available after install:

```shell
python - <<'PY'
from transformers.models.qwen3_5 import modeling_qwen3_5 as m
print('is_fast_path_available', m.is_fast_path_available)
print('causal_conv1d_fn', m.causal_conv1d_fn is not None)
print('causal_conv1d_update', m.causal_conv1d_update is not None)
print('chunk_gated_delta_rule', m.chunk_gated_delta_rule is not None)
print('fused_recurrent_gated_delta_rule', m.fused_recurrent_gated_delta_rule is not None)
PY
```

End-to-End TurboMind Inference

Validated image inference with the smallest Qwen3-VL model:

```shell
CUDA_VISIBLE_DEVICES=0 \
PYTHONPATH=/root/lmdeploy-prs/split-qwen-vision-backend:/root/lmdeploy/lmdeploy/lib \
TM_LOG_LEVEL=ERROR \
python - <<'PY'
from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig
from lmdeploy.vl import load_image

model = 'Qwen/Qwen3-VL-2B-Instruct'
backend_config = TurbomindEngineConfig(
    tp=1,
    max_batch_size=1,
    session_len=4096,
    cache_max_entry_count=0.05,
)
gen_config = GenerationConfig(max_new_tokens=48, do_sample=False, temperature=0.0)

img = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
pipe = pipeline(model, backend_config=backend_config, log_level='ERROR')
print('backend', pipe.async_engine.backend)
out = pipe(('Describe the image in one sentence.', img), gen_config=gen_config)
print(out.text)
pipe.close()
PY
```

Observed:

backend turbomind
A majestic tiger with a striking orange coat and black stripes rests peacefully on a vibrant green lawn, its gaze fixed directly at the camera.

Validated image inference with the smallest Qwen3.5 VL model:

```shell
CUDA_VISIBLE_DEVICES=0 \
PYTHONPATH=/root/lmdeploy-prs/split-qwen-vision-backend:/root/lmdeploy/lmdeploy/lib \
TM_LOG_LEVEL=ERROR \
python - <<'PY'
from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig
from lmdeploy.vl import load_image

model = 'Qwen/Qwen3.5-0.8B'
backend_config = TurbomindEngineConfig(
    tp=1,
    max_batch_size=1,
    session_len=4096,
    cache_max_entry_count=0.05,
)
gen_config = GenerationConfig(max_new_tokens=48, do_sample=False, temperature=0.0)

img = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
pipe = pipeline(model, backend_config=backend_config, log_level='ERROR')
print('backend', pipe.async_engine.backend)
out = pipe(('Describe the image in one sentence.', img), gen_config=gen_config)
print(out.text)
pipe.close()
PY
```

Observed:

backend turbomind
A majestic tiger lies peacefully on a sunlit grassy field, its powerful eyes fixed forward and its powerful body relaxed.

@lapy lapy changed the title feat: implement Turbomind vision encoder support for Qwen3VL/3.5 fami… feat: implement Turbomind vision encoder support for Qwen3VL/3.5 families Mar 24, 2026
@lapy lapy force-pushed the split/qwen-vision-backend branch from b33b54e to 539e55c Compare March 24, 2026 22:11
@lapy lapy force-pushed the split/qwen-vision-backend branch from f177f41 to 13f0ae2 Compare March 24, 2026 23:14
@lvhan028 (Collaborator) commented:

@lapy you are on fire!

Copilot AI left a comment

Pull request overview

This PR adds TurboMind vision-encoder support for the Qwen3-VL and Qwen3.5 VL families by implementing split-vision loading/forwarding for TurboMind, updating model/arch routing, and extending export support for VL checkpoint key layouts.

Changes:

  • Implement TurboMind split-vision loading, vision forward, and mRoPE metadata packing for Qwen3-VL family (and reuse for Qwen3.5).
  • Update task routing (get_task) and call sites to use backend-config-aware VL/LLM engine selection, including a new disable_vision_encoder flag on TurbomindEngineConfig.
  • Extend TurboMind model/export support to recognize Qwen3-VL architectures and VL checkpoints whose text weights live under model.language_model.*, plus add regression tests.
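To make the new routing behavior concrete, here is a minimal, self-contained sketch of the backend-config-aware selection described above. The names mirror the PR description (`get_task`, `disable_vision_encoder`, `TurbomindEngineConfig`), but the bodies are simplified stand-ins, not the actual lmdeploy implementation, and the exact semantics (text-only routing vs. a hard error) are still under review below.

```python
from dataclasses import dataclass

@dataclass
class TurbomindEngineConfig:
    # Simplified stand-in for lmdeploy.messages.TurbomindEngineConfig;
    # the PR adds the disable_vision_encoder flag.
    tp: int = 1
    disable_vision_encoder: bool = False

def get_task(is_vl_model: bool, backend_config=None) -> str:
    """Backend-config-aware VL/LLM selection, per the PR description.

    When disable_vision_encoder is set, a VL model is routed to the
    plain LLM engine instead of the VL pipeline.
    """
    if backend_config and getattr(backend_config, 'disable_vision_encoder', False):
        return 'llm'
    return 'vlm' if is_vl_model else 'llm'
```

For example, a Qwen3-VL checkpoint with `disable_vision_encoder=True` would be served by the text-only engine, while the default config keeps the vision path.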

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| lmdeploy/vl/model/qwen3.py | Adds dependency checks, HF arch→AutoModel mapping, split-vision loader, TurboMind forward/to_turbomind with mRoPE meta. |
| lmdeploy/vl/model/qwen3_5.py | Reuses Qwen3-VL TurboMind vision path for Qwen3.5 and aligns preprocessor dep checks. |
| lmdeploy/archs.py | Changes get_task signature and adds disable_vision_encoder routing behavior. |
| lmdeploy/pipeline.py | Updates get_task call to new signature. |
| lmdeploy/serve/openai/api_server.py | Updates get_task call to new signature. |
| lmdeploy/lite/apis/calibrate.py | Updates get_task call to new signature. |
| lmdeploy/messages.py | Adds disable_vision_encoder to TurbomindEngineConfig. |
| lmdeploy/cli/serve.py | Wires CLI flag through to TurbomindEngineConfig. |
| lmdeploy/turbomind/supported_models.py | Marks Qwen3-VL architectures as TurboMind-supported. |
| lmdeploy/turbomind/deploy/source_model/qwen.py | Supports nested model.language_model.* prefixes and layer-pattern matching for VL checkpoints. |
| tests/test_lmdeploy/test_vl/test_qwen_vl_family.py | Adds unit tests for arch resolution, task routing, and Qwen3-VL TurboMind packing behavior. |
| tests/test_lmdeploy/test_pytorch/test_engine_disable_vision.py | Adds a regression test for expected behavior when vision is disabled in the PyTorch engine. |


Comment on lines 25 to +31

```diff
 try:
-    from transformers import Qwen3VLForConditionalGeneration, Qwen3VLMoeForConditionalGeneration  # noqa: F401
+    from transformers import (  # noqa: F401
+        Qwen3_5ForConditionalGeneration,
+        Qwen3_5MoeForConditionalGeneration,
+        Qwen3VLForConditionalGeneration,
+        Qwen3VLMoeForConditionalGeneration,
+    )
```
Copilot AI commented Mar 27, 2026
check_qwen3_vl_deps_install() currently requires both Qwen3-VL and Qwen3.5 classes to be importable from transformers. This can regress Qwen3-VL usage in environments where transformers has Qwen3-VL but not yet Qwen3.5 (or vice-versa). Consider checking only the architecture actually being loaded (e.g., based on self.hf_config.architectures[0]), or making the Qwen3.5 import optional unless a Qwen3.5 arch is detected.

Comment on lines +40 to +45

```python
with patch('lmdeploy.pytorch.engine.engine.response_reqs', side_effect=capture_response):
    engine._on_add_message([req])

assert len(captured) == 1
assert captured[0][0] == ResponseType.INTERNAL_ENGINE_ERROR
assert 'disable_vision_encoder=True' in captured[0][1]
```
Copilot AI commented Mar 27, 2026
This test expects disable_vision_encoder=True to reject multimodal inputs with an error, but the current Engine implementation (lmdeploy/pytorch/engine/engine.py:_on_add_message) sets input_multimodals=None and continues (warning only). Either update the engine behavior to match this test, or adjust the assertions to reflect the current contract; otherwise this test will fail consistently.
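A sketch of the engine-side contract the test assumes, for contrast with the current warn-and-continue behavior: reject multimodal requests with an error response when the vision encoder is disabled. `ResponseType` is a stand-in enum and `on_add_message` a simplified hypothetical, not the actual `Engine._on_add_message`.

```python
from enum import Enum, auto

class ResponseType(Enum):
    # Stand-in for lmdeploy.pytorch's response-type enum.
    SUCCESS = auto()
    INTERNAL_ENGINE_ERROR = auto()

def on_add_message(req: dict, disable_vision_encoder: bool, respond) -> bool:
    """Reject multimodal input up front instead of silently dropping it.

    Returns False (and emits an error response) when the request carries
    multimodal data but the vision encoder is disabled; True otherwise.
    """
    if disable_vision_encoder and req.get('input_multimodals'):
        respond(ResponseType.INTERNAL_ENGINE_ERROR,
                'multimodal input rejected: disable_vision_encoder=True')
        return False
    return True
```

Under this contract the regression test's assertions (one captured error, `INTERNAL_ENGINE_ERROR`, message mentioning the flag) would pass; the alternative is to relax the test to match the current warn-and-continue behavior.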

Comment on lines +138 to +139

```python
if backend_config and backend_config.disable_vision_encoder:
    return 'llm', AsyncEngine
```
Copilot AI commented Mar 27, 2026
When disable_vision_encoder is set, get_task() routes VL architectures to the plain AsyncEngine. AsyncEngine constructs MultimodalProcessor without a vl_encoder, and MultimodalProcessor.get_prompt_input() will then treat multimodal messages as text-only and silently drop image/video blocks (it only joins type=='text'). If the intended contract is to reject multimodal inputs when vision is disabled, this needs an explicit error path (or a different engine selection) to avoid silent data loss.

Suggested change

```diff
-if backend_config and backend_config.disable_vision_encoder:
-    return 'llm', AsyncEngine
+if backend_config and getattr(backend_config, 'disable_vision_encoder', False):
+    raise ValueError(
+        'Invalid configuration: disable_vision_encoder is True for a vision-language '
+        'model. This would route the model to a text-only engine and silently drop '
+        'image/video inputs. Please use a pure language model or enable the vision '
+        'encoder.'
+    )
```


Labels: enhancement (New feature or request)

3 participants