Summary
The `rag.clarify_query` intrinsic enters a degenerate repetition loop when run against
`ibm-granite/granite-switch-4.1-3b-preview` (ALORA adapter embedded in the checkpoint, served via the vLLM/OpenAI-compatible backend).
The model emits a valid JSON opening, then repeats program names indefinitely until it hits the max-token limit, so the JSON is never closed.
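To classify runs as PASS or FAIL consistently across backends, a heuristic check for this failure mode can help. The sketch below is hypothetical and not part of any library: `looks_degenerate` and its `min_repeats` threshold are illustrative assumptions, and it treats output as degenerate when it fails to parse as JSON and some quoted string repeats many times. The real output schema of `rag.clarify_query` is not given in this issue, so the field names in the usage example are made up.

```python
import json
import re
from collections import Counter


def looks_degenerate(text: str, min_repeats: int = 5) -> bool:
    """Hypothetical heuristic: flag output that never forms valid JSON and
    repeats a single quoted value at least `min_repeats` times."""
    try:
        json.loads(text)
        return False  # Well-formed JSON is not the failure mode described here.
    except json.JSONDecodeError:
        pass
    # Collect every double-quoted string (keys and values, escapes allowed).
    strings = re.findall(r'"((?:[^"\\]|\\.)*)"', text)
    counts = Counter(strings).most_common(1)
    # A repetition loop shows up as one string dominating the output.
    return bool(counts) and counts[0][1] >= min_repeats


# Illustrative outputs only; field names are assumptions.
good = '{"clarification": "Which program?", "programs": ["Alpha", "Beta"]}'
bad = '{"programs": ["Alpha", "Beta"' + ', "Alpha", "Beta"' * 10
print(looks_degenerate(good), looks_degenerate(bad))
```

Unterminated JSON alone is not flagged, since a short truncation without repetition is a different problem from the loop described above.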
Controlled experiments
| Backend | Model | Adapter mechanism | Result |
|---|---|---|---|
| vLLM | `granite-switch-4.1-3b-preview` | ALORA embedded in switch checkpoint | FAIL |
| vLLM (`enforce_eager`) | `granite-switch-4.1-3b-preview` | ALORA embedded in switch checkpoint | FAIL |
| vLLM | `GrizleeBer/gs-test-2` | LORA embedded in switch checkpoint | PASS |
| HF | `granite-4.1-3b` | ALORA loaded via PEFT from `ibm-granite/granitelib-rag-r1.0` | PASS |
| HF | `granite-4.1-3b` | LORA loaded via PEFT from `ibm-granite/granitelib-rag-r1.0` | PASS |
The same ALORA adapter weights work correctly when loaded via PEFT on the HF backend, and a LORA adapter embedded in a switch checkpoint passes on vLLM. The failure is therefore specific to the ALORA adapter embedded in the switch checkpoint.