Add gemini-3.5-flash model configuration #3315
Conversation
Follows ADDINGMODEL.md to add the gemini-3.5-flash preview model released by Google.
- Add gemini-3.5-flash entry to MODELS in resolve_model_config.py (litellm_proxy/gemini-3.5-flash-preview, temperature 0.0), following the existing gemini-3-flash / gemini-3.1-pro pattern.
- Register gemini-3.5-flash and gemini-3.5-flash-preview in VERIFIED_GEMINI_MODELS and VERIFIED_OPENHANDS_MODELS so the model appears under the Gemini provider in the UI.
- Add test_gemini_3_5_flash_config to tests/cross/test_resolve_model_config.py.
PROMPT_CACHE_MODELS already covers any 'gemini-3' prefix, and reasoning_effort support is auto-detected from LiteLLM, so no changes are needed in model_features.py.
Fixes #3312
Co-authored-by: openhands <openhands@all-hands.dev>
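The claim that a 'gemini-3' prefix already covers the new model can be illustrated with a minimal sketch. This assumes plain prefix matching on the bare model name; `PROMPT_CACHE_PREFIXES`, `base_model_name`, and `supports_prompt_cache` are hypothetical names for illustration, and the actual matching rule in model_features.py may differ.

```python
# Hypothetical stand-in: the real PROMPT_CACHE_MODELS list and its matching
# rule live in model_features.py and may differ from this sketch.
PROMPT_CACHE_PREFIXES = ["gemini-3"]


def base_model_name(model: str) -> str:
    """Drop any provider prefix such as 'litellm_proxy/' from a model string."""
    return model.rsplit("/", 1)[-1]


def supports_prompt_cache(model: str) -> bool:
    """Return True if the bare model name starts with a listed prefix."""
    name = base_model_name(model)
    return any(name.startswith(prefix) for prefix in PROMPT_CACHE_PREFIXES)


if __name__ == "__main__":
    for candidate in ("litellm_proxy/gemini-3.5-flash", "litellm_proxy/other-model"):
        print(candidate, supports_prompt_cache(candidate))
```

Under this assumption, any name beginning with gemini-3 matches whether or not it carries a -preview suffix, which is why a new list entry would not be needed.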
Python API breakage checks: ✅ PASSED
Result: ✅ PASSED
Behavioral default changes detected
These public
Integration tests triggered against this PR's branch (
gh workflow run integration-runner.yml \
  -f model_ids=gemini-3.5-flash \
  -f reason="Testing new gemini-3.5-flash model from PR #3315" \
  -f issue_number=3315 \
  --ref openhands/add-gemini-3.5-flash
Results (8 tests: basic commands, file ops, code editing, reasoning, errors, tools, context, vision) will be posted back here automatically once the run completes (~5-10 min). I will mark the PR ready for review only after a successful run.
Comment posted by an AI agent (OpenHands).
REST API breakage checks (OpenAPI): ✅ PASSED
Result: ✅ PASSED
🧪 Integration Tests Results
Overall Success Rate: 0.0%
📊 Summary
📋 Detailed Results
litellm_proxy_gemini_3.5_flash_preview
Skipped Tests:
Failed Tests:
all-hands-bot
left a comment
Review generated by an AI agent (OpenHands) on behalf of the user.
🟡 Taste Rating: Acceptable — simple config-only addition, but the new Gemini 3.5 Flash preview identifier appears unsupported by the referenced docs.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM
Selecting this model could fail at runtime if the configured LiteLLM/proxy model ID is not actually available.
VERDICT: ❌ Needs follow-up: correct or justify the Gemini 3.5 Flash preview identifier before approval.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26139047185
    "id": "gemini-3.5-flash",
    "display_name": "Gemini 3.5 Flash",
    "llm_config": {
        "model": "litellm_proxy/gemini-3.5-flash-preview",
🟠 Important: The linked Gemini 3.5 Flash docs list the model code as gemini-3.5-flash; I could not find gemini-3.5-flash-preview there, and local LiteLLM feature detection recognizes gemini-3.5-flash but not this preview name. Unless the proxy has a documented custom alias, this should route to litellm_proxy/gemini-3.5-flash and the test should assert that stable ID.
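A minimal sketch of what the suggested change could look like, assuming the entry keeps the shape shown in the diff context above. The `MODELS` dictionary here is a stand-in containing only this one entry, and the test body is illustrative rather than the repository's actual test_gemini_3_5_flash_config.

```python
# Stand-in for the MODELS mapping in resolve_model_config.py; only the entry
# discussed in this review is reproduced, with the stable ID the review asks for.
MODELS = {
    "gemini-3.5-flash": {
        "id": "gemini-3.5-flash",
        "display_name": "Gemini 3.5 Flash",
        "llm_config": {
            "model": "litellm_proxy/gemini-3.5-flash",
            "temperature": 0.0,
        },
    },
}


def test_gemini_3_5_flash_config() -> None:
    """Assert the entry routes to the stable ID, not the preview name."""
    entry = MODELS["gemini-3.5-flash"]
    assert entry["id"] == "gemini-3.5-flash"
    assert entry["display_name"] == "Gemini 3.5 Flash"
    assert entry["llm_config"]["model"] == "litellm_proxy/gemini-3.5-flash"
    assert entry["llm_config"]["temperature"] == 0.0


if __name__ == "__main__":
    test_gemini_3_5_flash_config()
    print("ok")
```

Asserting the full proxy string in the test means a later accidental switch back to the preview name would fail locally, before any proxy call is made.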
VERIFIED_GEMINI_MODELS = [
    "gemini-3.1-pro-preview",
    "gemini-3.1-pro",
    "gemini-3.5-flash-preview",
🟠 Important: Same identifier issue: adding gemini-3.5-flash-preview to the verified model lists exposes a selectable model name that the referenced Google docs do not list. Please remove the preview entry from both VERIFIED_GEMINI_MODELS and VERIFIED_OPENHANDS_MODELS unless there is a verified proxy/provider alias for it.
all-hands-bot
left a comment
❌ QA Report: FAIL
The SDK model-list portion works, but the real evaluation resolver path aborts for gemini-3.5-flash because the configured proxy model is rejected as invalid.
Does this PR achieve its stated goal?
No. The PR set out to make gemini-3.5-flash usable in evaluations and selectable through the verified-models surface. After applying the PR, the verified model lists contain both new Gemini entries, but running the actual eval model resolver with MODEL_IDS=gemini-3.5-flash reaches the proxy preflight and fails with Invalid model name passed in model=gemini-3.5-flash-preview, so the evaluation use case is not functional.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed and installed the uv environment successfully. |
| CI Status | ✅ 31 checks passing; only the current agent review/QA checks were pending, with 2 skipped. |
| Functional Verification | ❌ Eval resolver fails on the PR; SDK verified-model membership succeeds. |
Functional Verification
Test 1: Evaluation resolver can resolve and preflight gemini-3.5-flash
Step 1 — Establish baseline without the PR:
Checked out origin/main and ran:
cd .github/run-eval
MODEL_IDS='gemini-3.5-flash' GITHUB_OUTPUT=/tmp/model-output-main.txt uv run python resolve_model_config.py
Observed:
BASE_RESOLVER_EXIT=1
ERROR: Model ID 'gemini-3.5-flash' not found. Available models: ... gemini-3-flash, gemini-3.1-pro, ...
--- output file ---
<missing>
This confirms the baseline does not know about the new model.
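The baseline failure above is a plain ID lookup miss. A minimal sketch of that kind of lookup follows, assuming MODEL_IDS is a comma-separated list (an assumption; the real parsing in resolve_model_config.py is not shown in this PR). `parse_model_ids` and `resolve_models` are hypothetical helper names, and the `MODELS` values are placeholders.

```python
import os

# Stand-in for the MODELS mapping; only the IDs named in the error output
# above are listed, and their values are placeholders.
MODELS = {
    "gemini-3-flash": {"id": "gemini-3-flash"},
    "gemini-3.1-pro": {"id": "gemini-3.1-pro"},
}


def parse_model_ids(raw: str) -> list[str]:
    """Split a MODEL_IDS value into IDs (comma separation is an assumption)."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def resolve_models(model_ids: list[str], models: dict[str, dict]) -> list[dict]:
    """Look up each ID, failing with the list of known IDs if one is missing."""
    resolved = []
    for model_id in model_ids:
        if model_id not in models:
            available = ", ".join(sorted(models))
            raise ValueError(
                f"Model ID '{model_id}' not found. Available models: {available}"
            )
        resolved.append(models[model_id])
    return resolved


if __name__ == "__main__":
    ids = parse_model_ids(os.environ.get("MODEL_IDS", "gemini-3.5-flash"))
    try:
        resolve_models(ids, MODELS)
    except ValueError as exc:
        print(f"ERROR: {exc}")
```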
Step 2 — Apply the PR's changes:
Checked out commit 5ca143c46cf435aae831d37ddf67acba7d3b837e.
Step 3 — Re-run with the PR:
Ran the same command:
cd .github/run-eval
MODEL_IDS='gemini-3.5-flash' GITHUB_OUTPUT=/tmp/model-output-pr.txt uv run python resolve_model_config.py
Observed:
PR_RESOLVER_EXIT=1
Resolved 1 model(s): gemini-3.5-flash
Checking proxy connectivity: https://llm-proxy.app.all-hands.dev
✓ Proxy reachable at https://llm-proxy.app.all-hands.dev
Preflight LLM check for 1 model(s)...
Checking Gemini 3.5 Flash... (2.0s)
✗ Gemini 3.5 Flash: Bad request - litellm.BadRequestError: Litellm_proxyException - /chat/completions: Invalid model name passed in model=gemini-3.5-flash-preview. Call `/v1/models` to view available models for your key.
✗ Some models failed preflight check
Evaluation aborted to avoid wasting compute resources.
ERROR: Preflight LLM check failed
--- output file ---
<missing>
This shows the PR adds the ID far enough for resolution, but the real user-facing eval path still aborts before producing a GitHub Actions output because the configured proxy model name is unavailable/invalid.
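The abort-before-output behaviour can be sketched as follows. This is an offline illustration, not the resolver's implementation: the set of accepted names stands in for whatever the proxy's `/v1/models` endpoint would return, the output key `models_json` is a hypothetical name, and `failed_preflight` and `resolve_and_write` are made-up helpers.

```python
import json
import tempfile
from pathlib import Path

PROXY_PREFIX = "litellm_proxy/"


def failed_preflight(models: list[dict], accepted_names: set[str]) -> list[str]:
    """Return display names of models whose proxy name is not accepted."""
    failures = []
    for model in models:
        name = model["llm_config"]["model"].removeprefix(PROXY_PREFIX)
        if name not in accepted_names:
            failures.append(model["display_name"])
    return failures


def resolve_and_write(
    models: list[dict], accepted_names: set[str], output_path: Path
) -> int:
    """Write the output file only if every model passes preflight."""
    if failed_preflight(models, accepted_names):
        return 1  # abort: nothing is written, so the output file stays missing
    with output_path.open("a", encoding="utf-8") as handle:
        # GitHub Actions step outputs are appended as name=value lines.
        handle.write(f"models_json={json.dumps(models)}\n")
    return 0


if __name__ == "__main__":
    preview_entry = {
        "display_name": "Gemini 3.5 Flash",
        "llm_config": {"model": "litellm_proxy/gemini-3.5-flash-preview"},
    }
    accepted = {"gemini-3-flash", "gemini-3.1-pro"}  # hypothetical proxy list
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "model-output.txt"
        print("exit:", resolve_and_write([preview_entry], accepted, out))
        print("output file:", out.read_text() if out.exists() else "<missing>")
```

Gating the write on a clean preflight is what leaves the output file missing in the failing run, so downstream workflow steps have nothing to consume.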
Test 2: SDK verified-model surface includes the new Gemini entries
Step 1 — Establish baseline without the PR:
On origin/main, ran a short SDK import script and observed:
gemini-3.5-flash in VERIFIED_GEMINI_MODELS: False
gemini-3.5-flash in VERIFIED_OPENHANDS_MODELS: False
gemini-3.5-flash-preview in VERIFIED_GEMINI_MODELS: False
gemini-3.5-flash-preview in VERIFIED_OPENHANDS_MODELS: False
This confirms the baseline lists did not expose the new Gemini entries.
Step 2 — Apply the PR's changes:
Checked out commit 5ca143c46cf435aae831d37ddf67acba7d3b837e.
Step 3 — Re-run with the PR:
Ran the same import check and observed:
gemini-3.5-flash in VERIFIED_GEMINI_MODELS: True
gemini-3.5-flash in VERIFIED_OPENHANDS_MODELS: True
gemini-3.5-flash-preview in VERIFIED_GEMINI_MODELS: True
gemini-3.5-flash-preview in VERIFIED_OPENHANDS_MODELS: True
This confirms the SDK list backing the verified-models surface now exposes the new Gemini entries.
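The import check used in this test is not included in the report. A minimal sketch of a script producing output in the same format is shown below; the two lists are stand-ins defined inline (in the repository they would come from verified_models.py), and `membership_report` is a hypothetical helper.

```python
# Stand-ins for the lists in verified_models.py, reflecting the PR state
# described above; in the repository they would be imported instead.
VERIFIED_GEMINI_MODELS = [
    "gemini-3.1-pro-preview",
    "gemini-3.1-pro",
    "gemini-3.5-flash-preview",
    "gemini-3.5-flash",
]
VERIFIED_OPENHANDS_MODELS = [
    "gemini-3.5-flash-preview",
    "gemini-3.5-flash",
]


def membership_report(names: list[str], lists: dict[str, list[str]]) -> list[str]:
    """Build one '<name> in <list>: <bool>' line per name and list."""
    lines = []
    for name in names:
        for list_name, members in lists.items():
            lines.append(f"{name} in {list_name}: {name in members}")
    return lines


if __name__ == "__main__":
    report = membership_report(
        ["gemini-3.5-flash", "gemini-3.5-flash-preview"],
        {
            "VERIFIED_GEMINI_MODELS": VERIFIED_GEMINI_MODELS,
            "VERIFIED_OPENHANDS_MODELS": VERIFIED_OPENHANDS_MODELS,
        },
    )
    print("\n".join(report))
```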
Issues Found
- 🔴 Blocker: gemini-3.5-flash cannot currently be used by the eval resolver because preflight rejects the configured proxy model name (gemini-3.5-flash-preview) and aborts without writing GITHUB_OUTPUT.
This QA review was created by an AI agent (OpenHands) on behalf of the user.
    "id": "gemini-3.5-flash",
    "display_name": "Gemini 3.5 Flash",
    "llm_config": {
        "model": "litellm_proxy/gemini-3.5-flash-preview",
🔴 Critical: I ran the actual resolver path with MODEL_IDS='gemini-3.5-flash' GITHUB_OUTPUT=/tmp/model-output-pr.txt uv run python resolve_model_config.py, and it resolved this entry but then failed preflight with Invalid model name passed in model=gemini-3.5-flash-preview. Because the resolver aborts without writing GITHUB_OUTPUT, this new model is not usable for evaluations yet; please align this proxy model name/configuration with a model currently accepted by the LiteLLM proxy.
This review comment was created by an AI agent (OpenHands) on behalf of the user.
I'm on it! juanmichelini can track my progress at all-hands.dev
1) Fix the LiteLLM model name: per the reviewer, the model is named
'gemini-3.5-flash', not 'gemini-3.5-flash-preview'. Update the entry
in resolve_model_config.py and the matching assertion in
tests/cross/test_resolve_model_config.py.
2) Remove gemini-3.5-flash{,-preview} from VERIFIED_GEMINI_MODELS and
VERIFIED_OPENHANDS_MODELS. The model has not been verified yet --
integration tests have not passed -- so it does not belong in the
verified list.
3) Update ADDINGMODEL.md with explicit guidance that models should not
be added to verified_models.py unless explicitly asked for by a
maintainer; passing integration tests is necessary but not sufficient.
Co-authored-by: openhands <openhands@all-hands.dev>
Thanks for the catch, @juanmichelini. Addressed both points in dcea291:
1) Renamed the LiteLLM model to drop the
"gemini-3.5-flash": {
    "id": "gemini-3.5-flash",
    "display_name": "Gemini 3.5 Flash",
    "llm_config": {
-       "model": "litellm_proxy/gemini-3.5-flash-preview",
+       "model": "litellm_proxy/gemini-3.5-flash",
        "temperature": 0.0,
    },
},
…and updated the assertion in 2) Removed Pulled both Also added explicit guidance to
Integration tests re-triggered against the updated branch:
Local checks (
This comment was created by an AI agent (OpenHands) on behalf of @juanmichelini.
Summary
Both review points from @juanmichelini were addressed in a single commit (
Checklist
Conciseness
Diff is 4 files, +16/-11 lines. All changes are directly tied to the two review points; no extraneous edits. The removal of the two
Notes for the reviewer
🧪 Integration Tests Results
Overall Success Rate: 0.0%
📊 Summary
📋 Detailed Results
litellm_proxy_gemini_3.5_flash
Skipped Tests:
Failed Tests:
Why
Add the newly released Gemini 3.5 Flash model so it can be used in evaluations and selected from the verified-models UI under the Gemini provider. See Gemini 3.5 Flash docs.
Summary
- Add gemini-3.5-flash entry to MODELS in .github/run-eval/resolve_model_config.py, routed through litellm_proxy/gemini-3.5-flash-preview with temperature=0.0, the same shape as the existing gemini-3-flash and gemini-3.1-pro entries.
- Register gemini-3.5-flash and gemini-3.5-flash-preview in VERIFIED_GEMINI_MODELS and VERIFIED_OPENHANDS_MODELS in openhands-sdk/openhands/sdk/llm/utils/verified_models.py so the model surfaces under the Gemini provider in the UI (per ADDINGMODEL.md guidance for verified_models.py).
- Add test_gemini_3_5_flash_config to tests/cross/test_resolve_model_config.py.
No changes needed in model_features.py: PROMPT_CACHE_MODELS already covers the gemini-3 prefix, and reasoning_effort support is auto-detected from LiteLLM via _supports_reasoning_effort.
Issue Number
Fixes #3312
How to Test
Local unit tests:
uv run pytest tests/cross/test_resolve_model_config.py::test_gemini_3_5_flash_config \
  tests/cross/test_resolve_model_config.py::test_all_models_valid_with_pydantic \
  tests/cross/test_resolve_model_config.py::test_find_all_models -v
uv run pytest tests/sdk/llm/test_model_features.py
uv run pre-commit run --files \
  .github/run-eval/resolve_model_config.py \
  openhands-sdk/openhands/sdk/llm/utils/verified_models.py \
  tests/cross/test_resolve_model_config.py
Manual verification of the config resolver:
Integration tests (per ADDINGMODEL.md Step 7) — run against this PR's branch:
gh workflow run integration-runner.yml \
  -f model_ids=gemini-3.5-flash \
  -f reason="Testing new model from PR" \
  --ref openhands/add-gemini-3.5-flash
The integration-test run will be linked here once it completes.
Video/Screenshots
N/A — config-only change. Local test output:
Type
Notes
This PR was created by an AI agent (OpenHands) on behalf of the issue reporter.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdk
nikolaik/python-nodejs:python3.13-nodejs22-slim
golang:1.21-bookworm
Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:dcea291-python
Run
All tags pushed for this build
About Multi-Architecture Support
dcea291-python) is a multi-arch manifest supporting both amd64 and arm64
dcea291-python-amd64) are also available if needed