Fix: Reinitialize gRPC channel on UNAVAILABLE error #4825
xrmx merged 7 commits into open-telemetry:main
I understand this issue is related to the upstream gRPC bug (grpc/grpc#38290). I've analyzed that issue in depth, and the root cause appears to be a regression in the gRPC 'backup poller' (introduced in grpcio>=1.68.0) which fails to recover connections when the primary EventEngine is disabled (common in Python for fork safety). While upstream fixes are being explored (e.g., grpc/grpc#38480), the issue has persisted for months, leaving exporters stuck in an UNAVAILABLE state indefinitely after collector restarts. This PR implements a robust mitigation: detecting the persistent UNAVAILABLE state and forcing a channel re-initialization. This effectively resets the underlying poller state, allowing the exporter to recover immediately without requiring a full application restart. This approach provides stability for users while the complex upstream fix is finalized.
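For readers following along, here is a minimal sketch of the mitigation described above: when an export attempt returns UNAVAILABLE, close the current channel and build a fresh one before retrying. It is not the code from this PR. `StatusCode` and `StubChannel` are stand-ins for the real gRPC types so the example runs without grpcio, and `ReconnectingExporter`, `channel_factory`, and `max_attempts` are hypothetical names chosen for illustration.

```python
from enum import Enum
from typing import Callable, List


class StatusCode(Enum):
    """Stand-in for grpc.StatusCode; only the two codes used here."""

    OK = "ok"
    UNAVAILABLE = "unavailable"


class StubChannel:
    """Hypothetical channel that returns scripted statuses, one per call."""

    def __init__(self, statuses: List[StatusCode]):
        self._statuses = list(statuses)
        self.closed = False

    def send(self, payload: bytes) -> StatusCode:
        # Once the script is exhausted, every call succeeds.
        return self._statuses.pop(0) if self._statuses else StatusCode.OK

    def close(self) -> None:
        self.closed = True


class ReconnectingExporter:
    """Hypothetical exporter that rebuilds its channel on UNAVAILABLE."""

    def __init__(self, channel_factory: Callable[[], StubChannel], max_attempts: int = 3):
        self._channel_factory = channel_factory
        self._max_attempts = max_attempts
        self._channel = channel_factory()
        self.reinit_count = 0

    def _reinitialize_channel(self) -> None:
        # Drop the old channel (and whatever stale state it holds),
        # then create a new one from scratch.
        self._channel.close()
        self._channel = self._channel_factory()
        self.reinit_count += 1

    def export(self, payload: bytes) -> bool:
        for _ in range(self._max_attempts):
            status = self._channel.send(payload)
            if status is StatusCode.OK:
                return True
            if status is StatusCode.UNAVAILABLE:
                self._reinitialize_channel()
        return False
```

A real exporter would also wait with backoff between attempts and guard the channel swap against concurrent exports; both are left out here to keep the sketch short.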
…mments
- Remove aggressive gRPC keepalive and retry settings to rely on defaults.
- Fix compression precedence logic to correctly handle NoCompression (0).
- Refactor channel initialization to be stateless (remove _channel_reconnection_enabled).
- Update documentation to refer to 'OTLP-compatible receiver'
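The compression item is easy to get wrong, so here is a small sketch of the kind of precedence bug it refers to. It assumes, as the commit message implies, that NoCompression has the integer value 0 and is therefore falsy. `Compression` is a stand-in enum rather than the real gRPC type, and `resolve_buggy` and `resolve` are hypothetical helpers, not functions from the exporter.

```python
from enum import IntEnum
from typing import Optional


class Compression(IntEnum):
    """Stand-in enum; NoCompression is assumed to be 0, as in the commit message."""

    NoCompression = 0
    Deflate = 1
    Gzip = 2


def resolve_buggy(explicit: Optional[Compression], from_env: Optional[Compression]) -> Compression:
    # `or` skips falsy values, so an explicit NoCompression (0) is
    # silently replaced by the environment setting.
    return explicit or from_env or Compression.NoCompression


def resolve(explicit: Optional[Compression], from_env: Optional[Compression]) -> Compression:
    # Compare against None so that 0 counts as a deliberate choice.
    if explicit is not None:
        return explicit
    if from_env is not None:
        return from_env
    return Compression.NoCompression
```

With an explicit `Compression.NoCompression` and an environment value of `Compression.Gzip`, `resolve_buggy` returns Gzip while `resolve` keeps NoCompression.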
@dheeraj-vanamala you have to fix the warnings. They all seem to come from not checking for None.
Updated as necessary. Did
Description
This PR fixes issue #4517 where the OTLP gRPC exporter fails to reconnect to the collector after a restart (returning UNAVAILABLE).
Changes:
StatusCode.UNAVAILABLE in the export loop.
Fixes #4517
Fixes #4529
Type of change
How Has This Been Tested?
I added a new regression test case test_unavailable_reconnects in exporter/opentelemetry-exporter-otlp-proto-grpc/tests/test_otlp_exporter_mixin.py.
StatusCode.UNAVAILABLE.
Does This PR Require a Contrib Repo Change?
Checklist: