
feat(ollama): add max_retries to chat generator #2899

Open
Keyur-S-Patel wants to merge 2 commits into deepset-ai:main from Keyur-S-Patel:feat/ollama-chat-max-retries

Conversation

Contributor

@Keyur-S-Patel Keyur-S-Patel commented Mar 1, 2026

PR: feat/ollama-chat-max-retries

Related Issues

  • partially addresses #9309

Proposed Changes:

  • Added max_retries support to OllamaChatGenerator.
  • Added tenacity-based retries for both sync and async chat calls.
  • Added exponential backoff via wait_exponential().
  • Kept retry handling local to run() / run_async() using @retry-decorated callables instead of separate helper methods.
  • Added tenacity to the Ollama integration package dependencies.
  • Updated unit tests for retry behavior, including the success-after-retry path and retry exhaustion path.
  • Adjusted the retry success test fixture to include valid Ollama token count metadata.

How did you test it?

  • hatch run fmt-check
  • hatch run test:types
  • hatch run test:unit

Result:

  • formatting checks passed
  • type checks passed
  • Ollama unit test suite passed on this branch (34 passed)

Notes for the reviewer

  • tenacity.stop_after_attempt(...) uses total attempts, so the implementation uses max_retries + 1 to preserve the expected semantics of “initial call + N retries”.
  • Retry policy currently retries generic Exception from Ollama chat calls, matching the previous broad retry behavior while switching to a standard retry library.
  • Full package test:all still depends on a running local Ollama instance for integration tests; only unit tests were used for branch validation.
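The "initial call + N retries" semantics described above can be illustrated with a minimal, stdlib-only sketch (all names here are illustrative, not the PR's actual code; the real implementation uses tenacity's stop_after_attempt(max_retries + 1) and wait_exponential()):

```python
import time

def call_with_retries(fn, max_retries: int, base_delay: float = 1.0):
    """Run fn, retrying up to max_retries times after the initial attempt.

    Total attempts = max_retries + 1, mirroring tenacity's
    stop_after_attempt(max_retries + 1) with exponential backoff.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
```

With max_retries=0 this makes exactly one attempt, which matches the "retries disabled by default" behavior discussed later in the review.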

Checklist

@Keyur-S-Patel Keyur-S-Patel requested a review from a team as a code owner March 1, 2026 02:48
@Keyur-S-Patel Keyur-S-Patel requested review from bogdankostic and removed request for a team March 1, 2026 02:48
@github-actions github-actions bot added integration:ollama type:documentation Improvements or additions to documentation labels Mar 1, 2026
@Keyur-S-Patel Keyur-S-Patel force-pushed the feat/ollama-chat-max-retries branch from ae3c423 to 83278a1 Compare March 1, 2026 02:58
@Keyur-S-Patel
Contributor Author

@anakin87 Can you help me close this PR? It's similar to the one you reviewed here

@Keyur-S-Patel Keyur-S-Patel changed the title add max_retries to chat generator feat(ollama): add max_retries to chat generator Mar 1, 2026
Contributor Author

@Keyur-S-Patel Keyur-S-Patel left a comment

Similar to this one #2875

Contributor

@bogdankostic bogdankostic left a comment

Thanks for your PR @Keyur-S-Patel! I left two comments regarding the retry logic. Also, it would be great if we could add tests for run_async.

@retry(
reraise=True,
stop=stop_after_attempt(self.max_retries + 1),
retry=retry_if_exception_type(Exception),
Contributor

Retrying on all kinds of exceptions is too broad.

Contributor Author

@Keyur-S-Patel Keyur-S-Patel Mar 5, 2026

Retry on 429 and 5xx?

Let me know if you want to retry on anything else as well.

Comment on lines +527 to +545
@retry(
reraise=True,
stop=stop_after_attempt(self.max_retries + 1),
retry=retry_if_exception_type(Exception),
wait=wait_exponential(),
)
def chat_with_retry() -> ChatResponse | Iterator[ChatResponse]:
return self._client.chat(
model=self.model,
messages=ollama_messages,
tools=ollama_tools,
stream=is_stream, # type: ignore[call-overload] # Ollama expects Literal[True] or Literal[False], not bool
keep_alive=self.keep_alive,
options=generation_kwargs,
format=self.response_format,
think=self.think,
)

response = chat_with_retry()
Contributor

Can we have this not nested inside of run?

Contributor Author

The @retry decorator requires a function, so we can either:

  1. use a wrapper function, or
  2. build a new method and apply the retry decorator there.

Contributor Author

Will separate the function out in the next commit.

Contributor Author

@Keyur-S-Patel Keyur-S-Patel left a comment

Updated in a new commit with the following:

  • retry logic now retries only on specific status codes
  • pulled the retry logic out of run
  • added unit tests for run_async
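The "pull the retry out of run" change can be sketched stdlib-only (all names here are illustrative, not the PR's actual code; the real implementation decorates a separate method with tenacity instead of building a @retry-decorated closure inside run()):

```python
import functools

def with_retries(func):
    """Illustrative decorator: reads max_retries from the instance (self),
    so run() no longer has to contain any retry plumbing."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        for attempt in range(self.max_retries + 1):
            try:
                return func(self, *args, **kwargs)
            except Exception:
                if attempt == self.max_retries:
                    raise  # retries exhausted
                # the real implementation sleeps with exponential backoff here
        return None  # unreachable
    return wrapper

class FakeChatGenerator:
    """Stand-in for the generator, just to show the un-nested structure."""
    def __init__(self, max_retries: int = 0):
        self.max_retries = max_retries
        self.calls = 0

    @with_retries
    def _chat(self):
        self.calls += 1
        if self.calls < 2:
            raise RuntimeError("transient error")
        return "response"

    def run(self):
        return self._chat()  # run() stays free of retry logic
```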

Contributor

@bogdankostic bogdankostic left a comment

We're almost there, I just have a minor comment regarding the type of retry_state and a question on tenacity version.

HTTP_STATUS_SERVER_ERROR_MAX_EXCLUSIVE = 600


def _stop_after_instance_max_retries(retry_state: Any) -> bool:
Contributor

Let's not use type Any here

Suggested change
def _stop_after_instance_max_retries(retry_state: Any) -> bool:
def _stop_after_instance_max_retries(retry_state: RetryCallState) -> bool:

Comment on lines +266 to +267
:param max_retries:
Maximum number of retries to attempt for failed requests.
Contributor

Let's make this parameter more descriptive.

Suggested change
:param max_retries:
Maximum number of retries to attempt for failed requests.
:param max_retries:
Maximum number of retries to attempt for failed requests (HTTP 429, 5xx, connection/timeout errors).
Uses exponential backoff between attempts. Set to 0 (default) to disable retries.

"Programming Language :: Python :: Implementation :: PyPy",
]
dependencies = ["haystack-ai>=2.22.0", "ollama>=0.5.0", "pydantic"]
dependencies = ["haystack-ai>=2.22.0", "ollama>=0.5.0", "pydantic", "tenacity>=8.2.3"]
Contributor

Is there a specific reason for >=8.2.3 for tenacity?
