Add Pressure Stall Information (PSI) metrics (reopened #2996) by alpineQ · Pull Request #3068 · open-telemetry/semantic-conventions

alpineQ · 2025-11-11T15:41:16Z

Changes

This PR adds support for Linux Pressure Stall Information (PSI) metrics to the system semantic conventions.

PSI is a Linux kernel feature (available since kernel 4.20) that identifies and quantifies resource contention by measuring the time impact that CPU, memory, and I/O resource crunches have on workloads.

New Metrics

system.linux.psi.pressure (Gauge): Measures resource pressure as a percentage of time that tasks were stalled over a time window (10s, 60s, or 300s)
system.linux.psi.total_time (Counter): Tracks the total cumulative stall time in microseconds since system boot

New Attributes

system.psi.resource: The resource type (cpu, memory, io)
system.psi.stall_type: The stall severity (some for partial stalls, full for complete stalls where all non-idle tasks are blocked)
system.psi.window: The time window for pressure calculation (10s, 60s, 300s)
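
For orientation, the kernel exposes PSI through /proc/pressure/cpu, /proc/pressure/memory, and /proc/pressure/io, with one line per stall type carrying avg10, avg60, avg300, and total fields. The sketch below is a hypothetical mapping of one such file onto the two metrics and three attributes listed above. It is not the collector's implementation; the DataPoint type, the parse_psi helper, and the sample values are assumptions made for illustration.

```python
from typing import NamedTuple


class DataPoint(NamedTuple):
    """Hypothetical container for one metric sample (not an OpenTelemetry type)."""
    metric: str
    value: float
    attributes: dict


def parse_psi(resource: str, text: str) -> list:
    """Turn the text of one /proc/pressure/<resource> file into data points."""
    points = []
    for line in text.strip().splitlines():
        # Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
        stall_type, *fields = line.split()
        base = {
            "system.psi.resource": resource,
            "system.psi.stall_type": stall_type,
        }
        for field in fields:
            key, _, raw = field.partition("=")
            if key.startswith("avg"):
                # avg10 / avg60 / avg300 become window values 10s / 60s / 300s.
                window = key[len("avg"):] + "s"
                attrs = {**base, "system.psi.window": window}
                points.append(DataPoint("system.linux.psi.pressure", float(raw), attrs))
            elif key == "total":
                # Cumulative stall time; the kernel reports it in microseconds.
                points.append(DataPoint("system.linux.psi.total_time", int(raw), dict(base)))
    return points


# Made-up sample content in the kernel's documented format.
SAMPLE = """some avg10=1.50 avg60=0.75 avg300=0.10 total=123456
full avg10=0.00 avg60=0.00 avg300=0.00 total=789
"""

if __name__ == "__main__":
    for point in parse_psi("memory", SAMPLE):
        print(point)
```

Each input line yields three gauge points (one per window) and one counter point, so a file with both some and full lines produces eight data points.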

Use Cases

PSI metrics enable:

Sizing workloads to hardware or provisioning hardware according to workload demand
Detecting productivity losses caused by resource scarcity
Dynamic system management (load shedding, job migration, strategic pausing)
Maximizing hardware utilization without sacrificing workload health

References

Relevant issues and PRs

There are issues on this matter in:

And 2 PRs that I am proposing to address these issues:

Important

Pull request acceptance is subject to the triage process as described in Issue and PR Triage Management.
PRs that do not follow the guidance above may be automatically rejected and closed.

Merge requirement checklist

CONTRIBUTING.md guidelines followed.
Change log entry added, according to the guidelines in When to add a changelog entry.
- If your PR does not need a change log, start the PR title with [chore]
Links to the prototypes or existing instrumentations (when adding or changing conventions)
- Prometheus node exporter has PSI metrics enabled by default

Reopened #2996

# Conflicts: # docs/system/system-metrics.md

Co-authored-by: James Thompson <thompson.tomo@outlook.com>

github-actions · 2025-11-26T16:47:28Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

alpineQ · 2025-11-27T08:01:30Z

@thompson-tomo @braydonk @trask
Issue #2996 was reopened here. If any additional changes are needed, I'm open to suggestions.

thompson-tomo · 2025-11-27T10:58:37Z

@alpineQ can you rebase/merge in master, as the doc templates have been updated?

alpineQ · 2025-12-01T08:36:27Z

@thompson-tomo any updates on this?

thompson-tomo

Docs and definitions look good to me based on published guidance & clarification.

trask · 2025-12-01T16:00:58Z

hi @alpineQ, this will need review and approval from @open-telemetry/semconv-system-approvers

alpineQ · 2025-12-14T12:12:43Z

@trask do these @open-telemetry/semconv-system-approvers really exist or only you can see them? 🤣

rogercoll · 2025-12-29T11:38:28Z

@alpineQ Apologies for the delayed response. The group has been focused on delivering the first stable release of a subset of system metrics, and unfortunately this PR slipped through the cracks.

I’ve also noticed that we’re attempting to add a memory pressure metric for Darwin as well (open-telemetry/opentelemetry-collector-contrib#45154). This made me wonder whether we could agree on a cross-platform, generic naming scheme for pressure metrics (for example, system.cpu.pressure).

Since I’m not very familiar with how this concept is handled across other platforms, I’ve added this topic to the agenda for our next SIG meeting (08/01/2026) so we can discuss it together.

thompson-tomo · 2025-12-29T11:45:55Z

@alpineQ in light of open-telemetry/opentelemetry-collector-contrib#45154, it appears memory pressure is also applicable to macOS.

Should we split by resource type? That would mean we end up with:

system.cpu.pressure.linux.ratio
system.cpu.pressure.linux.total_time
system.memory.pressure.linux.ratio
system.memory.pressure.linux.total_time

I/O would become disk, network, or something else, depending on what it refers to.

This way, these metrics would complement:

system.memory.pressure.darwin.status

We would then note in the description that the data comes from PSI.

jeffland-consist · 2026-01-08T11:48:48Z

As I've opened the original issue for the collector, I'd like to briefly chime in that from an end user perspective it would make sense to define the metrics similar to what they look like at the source. If it were my call I'd either go with system.linux.(cpu|memory|io).pressure.(ratio|total_time) and attributes for stall_type and window, or with system.linux.pressure.(ratio|total_time) and attributes for resource, stall_type and window. The former is a little more in line with how existing metrics are formatted, while the latter may be better suited for analytics (see last paragraph).

I can see an argument for adding window as part of the name, e.g. system.linux.cpu.pressure.ratio.10s, as it would be analogous to the existing system.linux.cpu.load_average.5m. However, if my understanding is correct, load average was done this way primarily to adopt the pre-existing vocabulary, in which there are three distinct "load averages", and I am not sure the same way of thinking applies to PSI. Naively, it would make more sense to me to have the time window as an attribute for load average too.

stall_type could also be part of the metric name (e.g. system.linux.cpu.pressure.ratio.some), but I'm not enough of a sysadmin to have a strong opinion on that difference.

Per my understanding, functionally this point is relevant for analytics back ends, where creating statistics across metrics can be handled very differently. For example, when you want to find the maximum across 10s and 60s or the maximum across cpu and memory, having this detail as part of the metric name can apparently complicate the required query language with some back ends.
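
To make the analytics point concrete, here is a minimal sketch in plain Python rather than any particular back end's query language. It computes "the maximum across 10s and 60s" under both layouts. The metric names follow the examples in this comment; the sample values and the helper functions are hypothetical.

```python
BASE_NAME = "system.linux.cpu.pressure.ratio"

# Layout A: one metric name, window carried as an attribute.
WINDOW_AS_ATTRIBUTE = [
    {"name": BASE_NAME, "attrs": {"window": "10s", "stall_type": "some"}, "value": 0.12},
    {"name": BASE_NAME, "attrs": {"window": "60s", "stall_type": "some"}, "value": 0.05},
    {"name": BASE_NAME, "attrs": {"window": "300s", "stall_type": "some"}, "value": 0.01},
]

# Layout B: window baked into the metric name.
WINDOW_IN_NAME = [
    {"name": BASE_NAME + ".10s", "attrs": {"stall_type": "some"}, "value": 0.12},
    {"name": BASE_NAME + ".60s", "attrs": {"stall_type": "some"}, "value": 0.05},
    {"name": BASE_NAME + ".300s", "attrs": {"stall_type": "some"}, "value": 0.01},
]


def max_with_attribute(points, windows):
    """Layout A: select a single metric, then filter on the window attribute."""
    return max(
        p["value"]
        for p in points
        if p["name"] == BASE_NAME and p["attrs"]["window"] in windows
    )


def max_with_names(points, windows):
    """Layout B: the caller has to enumerate every metric name involved."""
    names = {BASE_NAME + "." + w for w in windows}
    return max(p["value"] for p in points if p["name"] in names)


if __name__ == "__main__":
    print(max_with_attribute(WINDOW_AS_ATTRIBUTE, {"10s", "60s"}))
    print(max_with_names(WINDOW_IN_NAME, {"10s", "60s"}))
```

Both functions return the same number here. The difference is that the second has to build a set of metric names, which is the step that some query languages make awkward.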

I'm very interested in learning of other arguments, and seeing how this is decided in the end. Thank you everyone who spends time and effort in making this whole thing possible.

rogercoll · 2026-01-09T15:14:40Z

This topic was discussed during the System SemConv SIG on 08/01/2026. The resulting naming proposal combines the suggestions above:

Split by resource type: As suggested by @thompson-tomo, metrics should start by defining the relevant system area: system.{cpu/memory/disk...}.
Include OS for specific features: Since PSI is a Linux-only feature, we should use the OS name to separate the resource (system.memory) from the OS-specific technology, per the design philosophy docs: system.cpu.linux.
Standardize on pressure: To avoid redundancy, we should use either psi or pressure, but not both. The group preferred pressure to allow for future cross-OS terminology: system.cpu.linux.pressure.
Window as part of the metric name (under evaluation, cc @braydonk): As @jeffland-consist pointed out, the window should be part of the metric rather than an attribute. This aligns with the general guidelines stating that "aggregations over all the attributes... SHOULD be meaningful". This is analogous to system.linux.cpu.load_average.5m:
- system.cpu.linux.pressure_average.{10s/1m/5m}
- system.cpu.linux.pressure.total
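
As a rough illustration of the proposal above, the sketch below builds the proposed metric names from a resource and a PSI field. The naming pattern comes from this comment and is still under evaluation. The proposed_metric_name helper, and the mapping from the kernel's avg10/avg60/avg300 fields to the 10s/1m/5m suffixes, are assumptions made for the example.

```python
# Assumed mapping from kernel PSI averaging fields to the proposed name suffixes.
PSI_WINDOW_SUFFIX = {"avg10": "10s", "avg60": "1m", "avg300": "5m"}


def proposed_metric_name(resource: str, psi_field: str) -> str:
    """Build a metric name following the naming proposal (not a final convention)."""
    prefix = f"system.{resource}.linux"
    if psi_field == "total":
        return f"{prefix}.pressure.total"
    # Raises KeyError for fields other than avg10, avg60, avg300.
    return f"{prefix}.pressure_average.{PSI_WINDOW_SUFFIX[psi_field]}"


if __name__ == "__main__":
    for field in ("avg10", "avg60", "avg300", "total"):
        print(proposed_metric_name("cpu", field))
```

With window moved into the name, only the stall type would remain to be expressed as an attribute under this scheme.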

thompson-tomo

@rogercoll an interesting question came up when looking at what changes are needed.

We have discussed metric naming, but what hasn't been considered is attribute naming. What is the recommendation for attribute naming?

Does system.linux.psi.stall_type become:

system.pressure.linux.stall_type?

Or do we need separate attributes for each resource type? I.e.

system.memory.pressure.linux.stall_type?

I have raised https://github.com/open-telemetry/semantic-conventions/pull/3261/changes#r2678240965 to discuss this aspect.

rogercoll · 2026-01-12T09:53:04Z

I would say not to include the resource type in this case, as the possible stall_type values are shared across resources, i.e.:

system.pressure.stall_type

What I would leave for discussion in https://github.com/open-telemetry/semantic-conventions/pull/3261/files#r2678240965 is the OS part of the attribute name (the attribute is attached to a metric whose name already carries the OS-specific part).

alpineQ · 2026-01-18T15:53:31Z

I'm sorry but I kind of lost track of what word nitpicking you are trying to implement here. Edits by maintainers are enabled for this fork. If you need more editing freedom, you are free to open a new PR and reuse changes defined here without me

thompson-tomo · 2026-01-19T13:06:57Z

@alpineQ could you regenerate/update the docs based on the latest model changes.

The open topic is attribute naming, in particular whether and where the OS name goes. This should be discussed in the linked issue.

The options are:

system.pressure.stall_type
system.pressure.linux.stall_type
system.linux.pressure.stall_type

alpineQ · 2026-01-19T13:16:30Z

I tried that to avoid leaving the work unfinished, but got errors referencing undefined new fields—likely due to incomplete renaming—so I gave up.

thompson-tomo

@alpineQ this should hopefully allow the docs to update. If so, then go through the process of adding the 2 metrics to the CPU section and then adding the I/O section.

thompson-tomo · 2026-01-19T13:31:27Z

+
+This metric is [recommended][MetricRecommended].
+
+<!-- semconv metric.system.linux.psi.total_time -->


Suggested change

thompson-tomo · 2026-01-19T13:32:46Z

+
+This metric is [recommended][MetricRecommended].
+
+<!-- semconv metric.system.linux.psi.pressure -->


Suggested change

thompson-tomo · 2026-01-19T13:33:28Z

+<!-- END AUTOGENERATED TEXT -->
+<!-- endsemconv -->
+
+### Metric: `system.linux.psi.total_time`


Suggested change

### Metric: `system.linux.psi.total_time`

### Metric: `system.memory.linux.pressure.total`

thompson-tomo · 2026-01-19T13:34:01Z

+
+For more details, see the [Linux kernel PSI documentation](https://docs.kernel.org/accounting/psi.html).
+
+### Metric: `system.linux.psi.pressure`


Suggested change

### Metric: `system.linux.psi.pressure`

### Metric: `system.memory.linux.pressure.average`

thompson-tomo · 2026-01-19T13:35:15Z

+
+## Linux PSI (Pressure Stall Information) metrics
+
+**Description:** Linux Pressure Stall Information (PSI) metrics captured under the namespace `system.linux.psi`.
+
+PSI is a Linux kernel feature (available since kernel 4.20) that identifies and
+quantifies resource contention. It measures the time impact that resource
+crunches have on workloads by tracking the percentage of time tasks are stalled
+waiting for CPU, memory, or I/O resources.
+
+PSI helps in:
+
+- Sizing workloads to hardware or provisioning hardware according to workload demand
+- Detecting productivity losses caused by resource scarcity
+- Dynamic system management (load shedding, job migration, strategic pausing)
+- Maximizing hardware utilization without sacrificing workload health
+
+For more details, see the [Linux kernel PSI documentation](https://docs.kernel.org/accounting/psi.html).


Suggested change

## Linux PSI (Pressure Stall Information) metrics

**Description:** Linux Pressure Stall Information (PSI) metrics captured under the namespace `system.linux.psi`.

PSI is a Linux kernel feature (available since kernel 4.20) that identifies and
quantifies resource contention. It measures the time impact that resource
crunches have on workloads by tracking the percentage of time tasks are stalled
waiting for CPU, memory, or I/O resources.

PSI helps in:

- Sizing workloads to hardware or provisioning hardware according to workload demand
- Detecting productivity losses caused by resource scarcity
- Dynamic system management (load shedding, job migration, strategic pausing)
- Maximizing hardware utilization without sacrificing workload health

For more details, see the [Linux kernel PSI documentation](https://docs.kernel.org/accounting/psi.html).

This should be on the metrics if not already.

thompson-tomo · 2026-01-19T13:37:11Z

+- [Linux PSI (Pressure Stall Information) metrics](#linux-psi-pressure-stall-information-metrics)
+  - [Metric: `system.linux.psi.pressure`](#metric-systemlinuxpsipressure)
+  - [Metric: `system.linux.psi.total_time`](#metric-systemlinuxpsitotal_time)


Suggested change

- [Linux PSI (Pressure Stall Information) metrics](#linux-psi-pressure-stall-information-metrics)
  - [Metric: `system.linux.psi.pressure`](#metric-systemlinuxpsipressure)
  - [Metric: `system.linux.psi.total_time`](#metric-systemlinuxpsitotal_time)
  - [Metric: `system.memory.linux.pressure.average`](#metric-systemmemorylinuxpressureaverage)
  - [Metric: `system.memory.linux.pressure.total`](#metric-systemmemorylinuxpressuretotal)

github-actions · 2026-02-03T03:59:53Z

This PR has been labeled as stale due to lack of activity. It will be automatically closed if there is no further activity over the next 7 days.

rogercoll · 2026-02-06T01:25:42Z

@alpineQ Would you still be able to work on this PR and revisit the suggestions (#3068 (comment))?

github-actions · 2026-02-20T03:59:36Z

This PR has been labeled as stale due to lack of activity. It will be automatically closed if there is no further activity over the next 7 days.

alpineQ and others added 7 commits November 11, 2025 10:10

Add Pressure Stall Information (PSI) metrics

a93e54c

# Conflicts: # docs/system/system-metrics.md

More concise PSI metrics description

d8e1f0f

Co-authored-by: James Thompson <thompson.tomo@outlook.com>

Added review suggestions

de74f7d

Regenerated registry tables markdown

9dd8280

PSI stall time unit changed to seconds

7d49441

pr review changes

49a3b59

Co-authored-by: James Thompson <thompson.tomo@outlook.com>

docs markdown regenerated

6fb773a

alpineQ requested review from a team as code owners November 11, 2025 15:41

github-project-automation Bot added this to Semantic Conventions Triage Nov 11, 2025

github-project-automation Bot moved this to Untriaged in Semantic Conventions Triage Nov 11, 2025

github-actions Bot added enhancement New feature or request area:system labels Nov 11, 2025

alpineQ and others added 2 commits November 11, 2025 18:39

attribute names changed: system.psi -> system.linux.psi

3ef06f0

Merge branch 'open-telemetry:main' into main

fca96e1

lmolkova moved this from Untriaged to Awaiting codeowners approval in Semantic Conventions Triage Nov 20, 2025

github-actions Bot added the Stale label Nov 26, 2025

Merge branch 'open-telemetry:main' into main

3ac26af

Regenerated registry tables markdown

8d601a5

github-actions Bot removed the Stale label Nov 28, 2025

Merge branch 'open-telemetry:main' into main

f57de7b

thompson-tomo approved these changes Dec 1, 2025

thompson-tomo mentioned this pull request Dec 26, 2025

[receiver/hostmetrics] Add new metric system.memory.darwin.pressure open-telemetry/opentelemetry-collector-contrib#45154

Closed

github-actions Bot added the Stale label Dec 29, 2025

rogercoll mentioned this pull request Dec 29, 2025

[receiver/hostmetrics] Pressure Stall Information (PSI) from linux hosts open-telemetry/opentelemetry-collector-contrib#42779

Open

rogercoll removed the Stale label Dec 29, 2025

thompson-tomo mentioned this pull request Jan 8, 2026

[receiver/hostmetrics] Add Darwin memory pressure and compressor metrics open-telemetry/opentelemetry-collector-contrib#45271

Closed

thompson-tomo mentioned this pull request Jan 10, 2026

[chore] System Semconv OS-exclusive instrumentation philosophy #3261

Closed

1 task

thompson-tomo suggested changes Jan 10, 2026

Comment thread model/system/metrics.yaml

Comment thread model/system/registry.yaml Outdated

Comment thread model/system/registry.yaml Outdated

github-project-automation Bot moved this from Awaiting codeowners approval to Blocked in Semantic Conventions Triage Jan 10, 2026

alpineQ and others added 2 commits January 18, 2026 14:17

Apply suggestions from code review

015e6cf

Co-authored-by: James Thompson <thompson.tomo@outlook.com>

Apply suggestion from @thompson-tomo

00f69d7

Co-authored-by: James Thompson <thompson.tomo@outlook.com>

thompson-tomo suggested changes Jan 19, 2026

lmolkova added this to System Semantic Convention Working Group Jan 19, 2026

github-project-automation Bot moved this to Todo in System Semantic Convention Working Group Jan 19, 2026

lmolkova moved this from Blocked to Awaiting codeowners approval in Semantic Conventions Triage Jan 19, 2026

github-actions Bot added the Stale label Feb 3, 2026

github-actions Bot removed the Stale label Feb 6, 2026

github-actions Bot added the Stale label Feb 20, 2026

github-actions Bot closed this Feb 28, 2026

github-project-automation Bot moved this from Todo to Done in System Semantic Convention Working Group Feb 28, 2026

rogercoll mentioned this pull request Mar 10, 2026

Add Linux Pressure Stall Information (PSI) metrics #3528

Open

3 tasks

