
hwmon: disambiguate colliding chip labels#3646

Open
mwimpelberg28 wants to merge 1 commit intoprometheus:masterfrom
mwimpelberg28:mwimpelberg/hwmon-dedup-chip-labels

@mwimpelberg28

Summary

Fixes #3637.

Multiple hwmon nodes can be registered under a single parent device — for example, asus-nb-wmi on recent ASUS laptops registers one hwmon for fan control and another for WMI sensors. Both device symlinks resolve to the same /sys/devices/platform/asus-nb-wmi, so hwmonName produces the same platform_asus_nb_wmi chip label for both, and any sensor file that exists in both nodes (e.g. pwm1_enable) trips:

collected metric "node_hwmon_pwm_enable" { ... chip="platform_asus_nb_wmi" ... } was collected before with the same name and label values

Approach

The collector's Update method now makes two passes:

  • Pass 1: enumerate /sys/class/hwmon/*, compute the device-derived base chip name for each, and count collisions.
  • Pass 2: when a base name is shared, suffix the chip label with the chip's name file content if it disambiguates, otherwise with the hwmonX basename (always unique within a boot). The include/exclude filter is also moved here so user regexes match the label that is actually emitted in the metric.
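The two passes can be sketched as a pure function over already-enumerated entries. This is a minimal illustration, not the PR's code. The `chip` type, the `disambiguate` function, the `_` separator, and the reading of "disambiguates" as "the name is non-empty and unique among the entries sharing a base" are all assumptions. The name values fanctl and sensors are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// chip is a hypothetical record for one /sys/class/hwmon/hwmonX entry.
type chip struct {
	Dir  string // e.g. /sys/class/hwmon/hwmon4
	Base string // device-derived base label, e.g. platform_asus_nb_wmi
	Name string // name file content, assumed already sanitised ("" if absent)
}

// disambiguate returns the emitted chip label for each entry, keyed by Dir.
func disambiguate(chips []chip) map[string]string {
	// Pass 1: count how many entries share each base label, and how many
	// entries within each base share each name.
	baseCount := map[string]int{}
	nameCount := map[[2]string]int{}
	for _, c := range chips {
		baseCount[c.Base]++
		nameCount[[2]string{c.Base, c.Name}]++
	}

	// Pass 2: unique bases pass through unchanged; colliding ones get a suffix.
	labels := make(map[string]string, len(chips))
	for _, c := range chips {
		switch {
		case baseCount[c.Base] == 1:
			labels[c.Dir] = c.Base
		case c.Name != "" && nameCount[[2]string{c.Base, c.Name}] == 1:
			labels[c.Dir] = c.Base + "_" + c.Name
		default:
			// Fall back to the hwmonX basename, unique within a boot.
			labels[c.Dir] = c.Base + "_" + filepath.Base(c.Dir)
		}
	}
	return labels
}

func main() {
	chips := []chip{
		{"/sys/class/hwmon/hwmon0", "platform_coretemp_0", "coretemp"},
		{"/sys/class/hwmon/hwmon4", "platform_asus_nb_wmi", "fanctl"},
		{"/sys/class/hwmon/hwmon5", "platform_asus_nb_wmi", "sensors"},
	}
	labels := disambiguate(chips)
	for _, c := range chips {
		fmt.Println(filepath.Base(c.Dir), "->", labels[c.Dir])
	}
}
```

Run as written, this prints `platform_coretemp_0` unchanged for hwmon0 and `platform_asus_nb_wmi_fanctl` / `platform_asus_nb_wmi_sensors` for the colliding pair. If the two name values were equal, both would instead fall back to `platform_asus_nb_wmi_hwmon4` and `platform_asus_nb_wmi_hwmon5`. The sketch does not re-check a suffixed label against other, unrelated base labels.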

Entries that already produce a unique chip label are unaffected — no surprise suffixes for users not hitting the collision.

This follows the direction discussed in #333 (the same class of bug on dual-socket coretemp boxes), but is narrower in scope: the fix only kicks in when an actual collision is detected.

Test plan

New collector/hwmon_linux_test.go:

  • TestHwmonDuplicateChipNamesAreDisambiguated: reproduces the #3637 ASUS WMI scenario (two hwmon dirs sharing one platform device, both exposing pwm1_enable) and asserts both that Gather succeeds and that the chip labels are distinct.
  • TestHwmonUniqueChipNamesAreUnchanged: guards against unintended label drift for users not hitting the collision.
  • TestHwmonDuplicateChipNamesWithSameNameFile: exercises the hwmonX-basename fallback when the name file content also collides.
  • Full collector test suite still passes (go test ./collector/), including the existing fixture-driven e2e checks.
  • go vet ./... clean.

Multiple hwmon nodes can be registered under a single parent device
(for example asus-nb-wmi exposes one hwmon for fan control and another
for WMI sensors). Both currently resolve to the same chip label
(`platform_asus_nb_wmi`) and trigger "metric collected before with the
same name and label values" errors at scrape time.

Detect this collision in a first pass and append the chip's `name` file
content (or the hwmonX basename if names also collide) to the chip
label in a second pass. The include/exclude filter is moved into the
same pass so user regexes match the label that is actually emitted.

Fixes: prometheus#3637

Signed-off-by: Matthew Wimpelberg <matt.wimpelberg@grafana.com>
@mwimpelberg28
Author

@SuperQ would you have a moment to take a look? Happy to address any feedback.



Development

Successfully merging this pull request may close these issues.

Metric node_hwmon_pwm_enable was collected before with the same name and label values
