Add HTTP resilience with retry policy for Azure OpenAI rate limiting #853
base: main
Conversation
- Add Microsoft.Extensions.Http.Resilience package
- Configure standard resilience handler with exponential backoff
- Handle HTTP 429 (rate limit) with automatic retry
- Respect Retry-After header from Azure OpenAI
- Add circuit breaker and timeout configuration
- Document resilience implementation

Co-authored-by: BenjaminMichaelis <[email protected]>
- Add configureResilience parameter to allow selective resilience
- Update documentation with usage examples for mixed-use applications
- Improve code comments explaining when to use which approach
- Keep default behavior (resilience enabled) for Chat app

Co-authored-by: BenjaminMichaelis <[email protected]>
- Explain Semantic Kernel limitation requiring ConfigureHttpClientDefaults
- Document why named/typed client approach is not possible
- Clarify usage scenarios for mixed-use vs dedicated applications
- Improve inline code comments with technical justification

Co-authored-by: BenjaminMichaelis <[email protected]>
@copilot should we set the HttpClient in the AzureOpenAIClientOptions? Also, is there a reason we wouldn't have resilience on by default? And are there build errors? https://github.com/IntelliTect/EssentialCSharp.Web/actions/runs/21854540825/job/63068411734?pr=853 Research, then fix, then validate.
- Remove configureResilience parameter - resilience is now always enabled
- Fix pre-existing build error in LoggerExtensions.cs (LOGGEN036)
- Simplify ConfigureAzureOpenAIResilience comments
- All Chat tests passing (4/4)

Co-authored-by: BenjaminMichaelis <[email protected]>
I researched all three questions:

1. Should we set the HttpClient in AzureOpenAIClientOptions?
2. Is there a reason we wouldn't have resilience on by default?
3. Build errors?

Changes in commit 4d1499d:
Warning: Firewall rules blocked me from connecting to one or more addresses during this run.
Azure OpenAI embedding generation crashes on HTTP 429 (rate limit exceeded) without retry attempts, despite Azure providing explicit Retry-After headers.
Changes
- ConfigureHttpClientDefaults - required approach, as Semantic Kernel's AddAzureOpenAI* methods don't expose named/typed HttpClient configuration
- Fix pre-existing build error in LoggerExtensions.cs (LOGGEN036)

Configuration
Retry: 5 attempts, exponential backoff from 2s, with jitter. Handles 429/408/5xx.
Circuit Breaker: Opens at 20% failure rate over 30s window.
Retry-After: Standard resilience handler automatically respects Azure's specified delays.
Usage
All applications (resilience always enabled):
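A minimal sketch of what this registration could look like, assuming the `Microsoft.Extensions.Http.Resilience` package added by this PR. The values shown simply mirror the configuration described above (5 retries, 2s exponential backoff with jitter, 20% failure ratio over a 30s window); the exact code in the PR may differ.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http.Resilience;
using Polly;

var builder = WebApplication.CreateBuilder(args);

// Applies the standard resilience pipeline (rate-limiter, timeout, retry,
// circuit breaker) to every HttpClient the container creates - including
// the one Semantic Kernel's AddAzureOpenAI* registrations resolve
// internally, which is why ConfigureHttpClientDefaults is used instead of
// a named/typed client.
builder.Services.ConfigureHttpClientDefaults(http =>
    http.AddStandardResilienceHandler(options =>
    {
        // Retry: 5 attempts, exponential backoff starting at 2s, with jitter.
        options.Retry.MaxRetryAttempts = 5;
        options.Retry.Delay = TimeSpan.FromSeconds(2);
        options.Retry.BackoffType = DelayBackoffType.Exponential;
        options.Retry.UseJitter = true;

        // Circuit breaker: opens at a 20% failure rate over a 30s window.
        options.CircuitBreaker.FailureRatio = 0.2;
        options.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(30);
    }));
```

The standard handler's retry strategy honors `Retry-After` response headers out of the box, which is what lets it wait the delay Azure OpenAI specifies on 429 responses rather than hammering the endpoint.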
The retry policies are appropriate for most HTTP APIs and won't negatively impact other HTTP clients like hCaptcha or Mailjet.
Documentation
Added
docs/AZURE_OPENAI_RESILIENCE.md with configuration details, troubleshooting guide, and monitoring recommendations.

Original prompt
```
06:09:39 fail: Microsoft.Extensions.AI.LoggingEmbeddingGenerator[1784604714]
      GenerateAsync failed.
      System.ClientModel.ClientResultException: HTTP 429 (: RateLimitReached)
      Your requests to text-embedding-3-small for text-embedding-3-small have exceeded the call rate limit for your current AIServices pricing tier. This request was for Embeddings_Create under Azure OpenAI API version 2025-03-01-preview. Please retry after 4 seconds. To increase your default rate limit, visit: https://aka.ms/oai/quotaincrease.
         at OpenAI.ClientPipelineExtensions.ProcessMessageAsync(ClientPipeline pipeline, PipelineMessage message, RequestOptions options)
         at OpenAI.Embeddings.EmbeddingClient.GenerateEmbeddingsAsync(BinaryContent content, RequestOptions options)
         at OpenAI.Embeddings.EmbeddingClient.GenerateEmbeddingsAsync(IEnumerable`1 inputs, EmbeddingGenerationOptions options, CancellationToken cancellationToken)
         at Microsoft.Extensions.AI.OpenAIEmbeddingGenerator.GenerateAsync(IEnumerable`1 values, EmbeddingGenerationOptions options, CancellationToken cancellationToken)

Unhandled exception: System.ClientModel.ClientResultException: HTTP 429 (: RateLimitReached)
Your requests to text-embedding-3-small-v1 for text-embedding-3-small in East US 2 have exceeded the call rate limit for your current AIServices S0 pricing tier. This request was for Embeddings_Create under Azure OpenAI API version 2025-03-01-preview. Please retry after 4 seconds. To increase your default rate limit, visit: https://aka.ms/oai/quotaincrease.
   at OpenAI.ClientPipelineExtensions.ProcessMessageAsync(ClientPipeline pipeline, PipelineMessage message, RequestOptions options)
   at OpenAI.Embeddings.EmbeddingClient.GenerateEmbeddingsAsync(BinaryContent content, RequestOptions options)
   at OpenAI.Embeddings.EmbeddingClient.GenerateEmbeddingsAsync(IEnumerable`1 inputs, EmbeddingGenerationOptions options, CancellationToken cancellationToken)
   at Microsoft.Extensions.AI.OpenAIEmbeddingGenerator.GenerateAsync(IEnumerable`1 values, EmbeddingGenerationOptions options, CancellationToken cancellationToken)
   at Microsoft.Extensions.AI.LoggingEmbeddingGenerator`2.GenerateAsync(IEnumerable`1 values, EmbeddingGenerationOptions options, CancellationToken cancellationToken)
   at Microsoft.Extensions.AI.OpenTelemetryEmbeddingGenerator`2.GenerateAsync(IEnumerable`1 values, EmbeddingGenerationOptions options, CancellationToken cancellationToken)
   at Microsoft.Extensions.AI.EmbeddingGeneratorExtensions.GenerateAsync[TInput,TEmbedding](IEmbeddingGenerator`2 generator, TInput value, EmbeddingGenerationOptions options, CancellationToken cancellationToken)
   at EssentialCSharp.Chat.Common.Services.EmbeddingService.GenerateEmbeddingAsync(String text, CancellationToken cancellationToken) in /home/runner/work/EssentialCSharp.AzureResourceManagement/EssentialCSharp.AzureResourceManagement/web-repo/EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs:line 22
   at EssentialCSharp.Chat.Common.Services.EmbeddingService.<>c__DisplayClass7_0.<<GenerateBookContentEmbeddingsAndUploadToVectorStore>b__0>d.MoveNext() in /home/runner/work/EssentialCSharp.AzureResourceManagement/EssentialCSharp.AzureResourceManagement/web-repo/EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs:line 50
--- End of stack trace from previous location ---
   at System.Threading.Tasks.Parallel.<>c__53`1.<b__53_0>d.MoveNext()
--- End of stack trace from previous location ---
   at EssentialCSharp.Chat.Common.Services.EmbeddingService.GenerateBookContentEmbeddingsAndUploadToVectorStore(IEnumerable`1 bookContents, CancellationToken cancellationToken, String collectionName) in /home/runner/work/EssentialCSharp.AzureResourceManagement/EssentialCSharp.AzureResourceManagement/web-repo/EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs:line 47
   at EssentialCSharp.Chat.Program.<>c__DisplayClass1_0.<b__0>d.MoveNext() in /home/runner/work/EssentialCSharp.AzureResourceManagement/EssentialCSharp.AzureResourceManagement/web-repo/EssentialCSharp.Chat/Program.cs:line 911
--- End of stack trace from previous location ---
   at System.CommandLine.Command.<>c__DisplayClass32_0.<b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Invocation.InvocationPipeline.InvokeAsync(ParseResult parseResult, CancellationToken cancellationToken)
Error: Process completed with exit code 1.
```
Our quota is at the max. Can we use the Microsoft Learn Docs MCP to explore the Semantic Kernel documentation and find out how to use dynamic quota (http://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/dynamic-quota?view=foundry-classic), or just a retry mechanism that (ideally) watches the response and then waits as long as it specifies, so we don't keep hammering the system? https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota?view=foundry-classic&tabs=rest https://techcommunity.microsoft.com/blog/fasttrackforazureblog/optimizing-azure-openai-a-guide-to-limits-quotas-and-best-practic...
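The retry behavior this comment asks for - wait exactly as long as the 429 response says to - can be sketched in a hand-rolled form as below. This is an illustrative sketch only; `SendWithRetryAfterAsync` is a hypothetical helper, not code from this PR (the PR ultimately uses the standard resilience handler instead).

```csharp
using System.Net;

// Hypothetical helper (not from this PR): retries a request on HTTP 429,
// waiting as long as the Retry-After header specifies. A request factory is
// needed because an HttpRequestMessage cannot be sent twice.
static async Task<HttpResponseMessage> SendWithRetryAfterAsync(
    HttpClient client,
    Func<HttpRequestMessage> requestFactory,
    int maxAttempts = 5,
    CancellationToken cancellationToken = default)
{
    for (int attempt = 1; ; attempt++)
    {
        HttpResponseMessage response = await client.SendAsync(
            requestFactory(), cancellationToken);

        if (response.StatusCode != HttpStatusCode.TooManyRequests
            || attempt >= maxAttempts)
        {
            return response;
        }

        // Azure OpenAI sends Retry-After as a delay in seconds ("retry after
        // 4 seconds" in the log above); fall back to exponential backoff if
        // the header is missing.
        TimeSpan delay = response.Headers.RetryAfter?.Delta
            ?? TimeSpan.FromSeconds(Math.Pow(2, attempt));

        response.Dispose();
        await Task.Delay(delay, cancellationToken);
    }
}
```

In practice the PR chose the standard resilience handler over a loop like this, since it provides the same `Retry-After`-aware waiting plus a circuit breaker and timeouts without bespoke code.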