Skip to content

fix(sdk): consistent labels for otel.bsp.dropped_spans metric#2108

Open
robbkidd wants to merge 2 commits intoopen-telemetry:mainfrom
robbkidd:robb.fix-bsp-dropped-spans-inconsistent-labels
Open

fix(sdk): consistent labels for otel.bsp.dropped_spans metric#2108
robbkidd wants to merge 2 commits intoopen-telemetry:mainfrom
robbkidd:robb.fix-bsp-dropped-spans-inconsistent-labels

Conversation

@robbkidd
Copy link
Copy Markdown
Member

@robbkidd robbkidd commented Apr 22, 2026

Fixes #2040

report_dropped_spans() is used in several of places and not all of them include a function name in the call. The .compact on the labels hash within report_dropped_spans() was removing nils. This produces inconsistent label sets for the 'otel.bsp.dropped_spans' metric. And that makes metrics reporters like prometheus-client unhappy because they require a stable set of label names.

This change leaves how to handle nil-value labels to concrete implementations of MetricsReporter instead.

report_dropped_spans is called from multiple code paths, some of which
pass a function name and some of which do not. The .compact on the labels
hash silently drops the code.function key when nil, producing an
inconsistent label signature that breaks metrics reporters like prometheus
which require a stable key set.

Demonstrates the problem reported in open-telemetry#2040
@robbkidd robbkidd self-assigned this Apr 22, 2026
@robbkidd robbkidd added the bug Something isn't working label Apr 22, 2026
report_dropped_spans() is used in several of places and not all of them
include a function name in the call. The .compact here was removing
nils from the labels hash. This produces inconsistent label sets for the
'otel.bsp.dropped_spans' metric. And that makes metrics reporters like
prometheus-client unhappy because they require a stable set of label
names.

We'll leave how to handle nil values to concrete implementations of
MetricsReporter instead.
@arielvalentin
Copy link
Copy Markdown
Contributor

What is nil? Code function?

Should the default be a blank string instead of nil in the method signature?

Should the method be defensive and ensure all labels are valid?

Otherwise we are relying on the metrics reporter to have to be defensive as well.

@robbkidd
Copy link
Copy Markdown
Member Author

robbkidd commented May 4, 2026

What is nil? Code function?

Yes. There are some call-sites that do not pass in function:. The current default is nil and compacting a nil value in the hash removed the function label which caused grief for prometheus.

Should the default be a blank string instead of nil in the method signature?

I don't know. Our metrics reporter is off-spec, so the handling of nil vs blank string has historically been left to the implementer of a concrete reporter.

Should the method be defensive and ensure all labels are valid?

Valid according to whose rules? Because our current metrics reporter interface is off-spec, someone wanting to use it decides what to do about labels and validity depend on where those labels are getting sent.

Otherwise we are relying on the metrics reporter to have to be defensive as well.

Yes, this change places the onus of defense on the concrete implementation of a metrics reporter that the SDK does not (to my knowledge) yet supply.


I'm currently thinking that these questions could be addressed in followup issues and PRs so that the poor experience for concrete implementations choosing to send to prometheus today is improved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OpenTelemetry::SDK::Trace::Export.report_dropped_spans does not always produce the same set of labels, which the prometheus library considers invalid

4 participants