Skip to content

clarify_query intrinsic enters repetition loop on embedded ALORA switch model #9

@yairallouche

Description

@yairallouche

Summary

The rag.clarify_query intrinsic enters a degenerate repetition loop when run against
ibm-granite/granite-switch-4.1-3b-preview (ALORA embedded adapter, via vLLM/OpenAI backend).
The model produces a valid JSON opening but then repeats program names indefinitely until hitting max tokens, never closing the JSON.

Controlled experiments

Backend Model Adapter mechanism Result
vLLM granite-switch-4.1-3b-preview ALORA embedded in switch checkpoint FAIL
vLLM (enforce_eager) granite-switch-4.1-3b-preview ALORA embedded in switch checkpoint FAIL
vLLM GrizleeBer/gs-test-2 LORA embedded in switch checkpoint PASS
HF granite-4.1-3b ALORA loaded via PEFT from ibm-granite/granitelib-rag-r1.0 PASS
HF granite-4.1-3b LORA loaded via PEFT from ibm-granite/granitelib-rag-r1.0 PASS

The same ALORA adapter weights work correctly when loaded via PEFT on the HF backend. The failure is specific to the embedded ALORA in the switch checkpoint.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions