OpenWebUI update - new features and gpt as a main model #4102
Conversation
Pull request overview
Updates the OpenWebUI integration demo documentation to use newer default models (including a VLM) and adds guidance for newer “agentic” OpenWebUI features like Web Search, Memory, and Code Interpreter.
Changes:
- Switched the primary chat model example to `OpenVINO/gpt-oss-20b-int4-ov` and standardized the OpenWebUI Model ID to `ovms-model`.
- Replaced the VLM example model with `Junrui2021/Qwen3-VL-8B-Instruct-int4` and added a new screenshot for image upload.
- Added new documentation sections for Web Search, Memory/context, and Code Interpreter configuration in OpenWebUI.
Reviewed changes
Copilot reviewed 1 out of 13 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| demos/integration_with_OpenWebUI/README.md | Updates model pull/config instructions and adds new OpenWebUI feature sections (Web Search, Memory, Code Interpreter). |
| demos/integration_with_OpenWebUI/upload_images.png | Adds/updates a screenshot used by the VLM “upload images” step. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
```bash
mkdir models
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --pull --source_model Godreign/llama-3.2-3b-instruct-openvino-int4-model --model_repository_path /models --task text_generation
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --add_to_config --config_path /models/config.json --model_path Godreign/llama-3.2-3b-instruct-openvino-int4-model --model_name Godreign/llama-3.2-3b-instruct-openvino-int4-model
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --pull --source_model OpenVINO/gpt-oss-20b-int4-ov --model_repository_path /models --task text_generation --tool_parser gptoss --reasoning_parser gptoss
```
There's trailing whitespace at the end of this Docker command line. Trimming it avoids noisy diffs and occasional copy/paste quirks.
Suggested change:
```bash
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --pull --source_model OpenVINO/gpt-oss-20b-int4-ov --model_repository_path /models --task text_generation --tool_parser gptoss --reasoning_parser gptoss
```
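Once the server is running with the pulled gpt-oss model, a quick way to sanity-check it is to POST to the OpenAI-compatible chat endpoint. A minimal sketch that only builds and prints the request body (the port and the `/v3/chat/completions` path are assumptions based on a typical OVMS REST setup; the actual HTTP call is left commented out so the snippet runs without a server):

```python
import json

# Assumed endpoint: OVMS exposes an OpenAI-compatible REST API; the port
# depends on how the server container was started (e.g. --rest_port 8000).
OVMS_URL = "http://localhost:8000/v3/chat/completions"

# The model name must match the name the model was pulled/registered under.
payload = {
    "model": "OpenVINO/gpt-oss-20b-int4-ov",
    "messages": [{"role": "user", "content": "Reply with the single word: pong"}],
    "stream": False,
}

body = json.dumps(payload)
print(body)

# To actually send it (requires the server from the command above to be running):
#   import urllib.request
#   req = urllib.request.Request(
#       OVMS_URL, data=body.encode(), headers={"Content-Type": "application/json"}
#   )
#   print(urllib.request.urlopen(req).read().decode())
```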
@@ -73,4 +73,4 @@
> **Important Note**: While using NPU device for acceleration or model gpt-oss-20b with GPU, it is recommended to disable `Follow-Up Auto-Generation` in `Settings > Interface` menu. It will improve response time and avoid queuing requests. For gpt-oss model it will avoid concurrent execution which in version 2026.0 has an accuracy issue.
This workaround is needed only for NPU; gpt-oss is fixed.
@@ -17,4 +17,4 @@
* [Docker Engine](https://docs.docker.com/engine/) installed
* Host with x86_64 architecture
* Linux, macOS, or Windows
* Python 3.11 with pip
While the Open WebUI pip package allows Python >=3.11, <3.13.0a1, the install instructions at https://pypi.org/project/open-webui/ recommend using 3.11.
### Prerequisites

In this demo, OpenVINO Model Server is deployed on Linux with CPU using Docker and Open WebUI is installed via Python pip. Requirements to follow this demo:
Let's make it GPU by default, with an option to switch to CPU.
> **Important Note**: While using NPU device for acceleration or model gpt-oss-20b with GPU, it is recommended to disable `Follow-Up Auto-Generation` in `Settings > Interface` menu. It will improve response time and avoid queuing requests. For gpt-oss model it will avoid concurrent execution which in version 2026.0 has an accuracy issue.

### References
[https://docs.openvino.ai/2026/model-server/ovms_demos_continuous_batching.html](https://docs.openvino.ai/2026/model-server/ovms_demos_continuous_batching.html#model-preparation)
Is this reference still relevant?
Would you prefer to drop it, or replace it, possibly with https://docs.openvino.ai/2026/model-server/ovms_demos_continuous_batching_agent.html#export-llm-model?
Add info about Native Tool Calling.
For gpt-oss it will be `"reasoning_effort": "low"`.
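Both points can be expressed in the request body sent to the model. A hedged sketch of an OpenAI-style chat payload combining a native tool definition with `reasoning_effort` (the `get_weather` function is purely hypothetical, added only to illustrate the schema; OVMS is expected to parse gpt-oss tool calls when started with `--tool_parser gptoss`):

```python
import json

payload = {
    "model": "OpenVINO/gpt-oss-20b-int4-ov",
    "reasoning_effort": "low",  # "low" recommended for gpt-oss, per the comment above
    "messages": [{"role": "user", "content": "What's the weather in Gdansk?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                # Hypothetical tool, for illustration only.
                "name": "get_weather",
                "description": "Get current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

print(json.dumps(payload, indent=2))
```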
```bash
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --pull --source_model OpenVINO/InternVL2-2B-int4-ov --model_repository_path models --model_name OpenVINO/InternVL2-2B-int4-ov --task text_generation
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --add_to_config --config_path /models/config.json --model_path OpenVINO/InternVL2-2B-int4-ov --model_name OpenVINO/InternVL2-2B-int4-ov
docker run --rm -u $(id -u):$(id -g) -v $PWD/models:/models openvino/model_server:weekly --pull --source_model Junrui2021/Qwen3-VL-8B-Instruct-int4 --model_repository_path /models --model_name ovms-model-vl --task text_generation --pipeline_type VLM_CB
```
Damian used it in his demos; I assumed this model works better with that.
🛠 Summary
CVS-183785
Changing models used in OpenWebUI, adding sections about new agentic features.
Done [todo]: Update screenshots to use `ovms-model` model name instead of `Godreign/llama-3.2-3b-instruct-openvino-int4-model`.
🧪 Checklist