WSL2 triggers sustained 2000+ MB/s phantom StorPort read storms on NVMe, killing all WSL processes #40420

@uakdemir

Follow-up to #40284, which was closed before resolution. The issue is fully reproducible, and I now have kernel-level ETW evidence.

Summary

WSL2 intermittently triggers extreme disk I/O saturation where physical disk reads reach 1,500–2,300 MB/s while all user-mode process I/O counters combined show less than 1 MB/s. The unattributed fraction is consistently 99.95–100%. Disk queue depths reach 24 and disk time exceeds 3,600%. This kills all WSL processes, VS Code remote connections, and terminal sessions.

Environment

  • Windows: 11 25H2, Build 26200.8037
  • WSL version: Latest (kernel 6.6.87.2-microsoft-standard-WSL2)
  • Distro: Ubuntu (WSL2)
  • Hardware: MSI laptop, NVMe WD PC SN560 SDDPNQE-1T00-1032 (firmware 74116000)
  • VHDX: 45GB on D: drive (150GB free), compacted
  • Sleep states: S0 Modern Standby only (no S3)

.wslconfig

[wsl2]
memory=10GB
swap=4GB
processors=12
localhostForwarding=true
guiApplications=false

[experimental]
autoMemoryReclaim=gradual

Reproduction

  1. Start Windows fresh (restart)
  2. Open WSL (Ubuntu)
  3. Run 1-2 terminal sessions with CLI workloads (e.g. Node.js processes)
  4. Within 5-30 minutes, disk I/O saturates
  5. Task Manager shows vmmem consuming 100% disk
  6. All WSL processes are killed
  7. wsl --shutdown immediately stops the storm

The issue reproduces approximately 30-50% of the time, typically within the first 15 minutes of a WSL session.

Evidence

Process-level monitoring

A custom PowerShell logger using Get-Counter samples physical disk metrics and per-process I/O rates every 10 seconds and records a snapshot whenever disk time exceeds 90%. Typical storm snapshot:

08:32:31 - Disk at 1321.5% (Queue: 14, ReadMB/s: 2050.05, WriteMB/s: 0.72)
--- Attribution Summary ---
CapturedProcessMB/s: 1.03
UnattributedMB/s: 2050
UnattributedFraction: 0.9996

All visible processes (System, VS Code, Chrome, Everything, Slack, vmwp.exe) collectively account for ~1 MB/s of the 2,050 MB/s physical reads. The remaining 99.96% is invisible to process-level performance counters.
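For anyone who wants to check the attribution arithmetic against their own counters, here is a minimal Python sketch of one plausible way to compute the gap. It is not the PowerShell logger itself. The function name, the choice to count reads plus writes as the physical total, and the sample numbers are all assumptions made for illustration.

```python
def attribution_summary(physical_read_mbps, physical_write_mbps, process_mbps):
    """Compare physical disk throughput with the sum of per-process I/O rates.

    process_mbps maps a process name to its observed I/O rate in MB/s.
    Returns (captured, unattributed, unattributed_fraction).
    Assumption: the physical total is reads plus writes; the original
    logger may define it differently.
    """
    physical_total = physical_read_mbps + physical_write_mbps
    captured = sum(process_mbps.values())
    # Clamp at zero: counters are sampled at slightly different instants,
    # so the per-process sum can briefly exceed the physical total.
    unattributed = max(physical_total - captured, 0.0)
    fraction = unattributed / physical_total if physical_total > 0 else 0.0
    return captured, unattributed, fraction


# Hypothetical sample: 2000 MB/s of physical reads, 1 MB/s visible in processes.
captured, unattributed, fraction = attribution_summary(
    2000.0, 0.0, {"System": 0.6, "vmwp.exe": 0.4}
)
print(f"CapturedProcessMB/s: {captured:.2f}")
print(f"UnattributedMB/s: {unattributed:.0f}")
print(f"UnattributedFraction: {fraction:.4f}")
```

A fraction close to 1.0 means almost none of the physical I/O is visible in per-process counters, which is the pattern in the storm snapshots.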

ETW/WPR kernel trace

A 45-second WPR trace captured automatically during a storm (GeneralProfile.light + DiskIO + FileIO) shows:

  • 5,111,169 events, 0 lost — trace is valid
  • 14,537 DiskIo ReadInit, 18,202 DiskIo Read, 96,590 FileIo Read events
  • 29,753 StorPort request dispatches, 29,797 completions, 60,208 queue operations
  • 718,496 stack walks

This confirms real kernel-level storage stack activity, not a counter artifact.

Storm timeline (typical)

06:34:57 -  249.1% (Queue: 0,  Read: 824 MB/s)   — starts
06:35:11 -  241.3% (Queue: 2,  Read: 1047 MB/s)   — escalating
06:35:37 -  323.0% (Queue: 1,  Read: 1403 MB/s)   — ramping
06:35:51 - 1187.4% (Queue: 14, Read: 2164 MB/s)   — peak
06:36:17 - 1476.9% (Queue: 11, Read: 2316 MB/s)   — sustained
06:37:23 - 3649.7% (Queue: 24, Read: 1984 MB/s)   — peak queue
06:38:03 -  898.6% (Queue: 13, Read: 1860 MB/s)   — sustained

Storms last 2-10 minutes. Reads are nearly 100% of the I/O (writes stay near 0).
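To compare storms across log files, timeline lines in the format above can be reduced to a duration and peak values. The Python sketch below is illustrative only. The regular expression assumes exactly the line layout shown in the timeline, and the helper names (`parse_timeline`, `summarize`) are hypothetical, not part of the logger.

```python
import re
from datetime import datetime

# Matches lines such as: "06:35:51 - 1187.4% (Queue: 14, Read: 2164 MB/s)"
LINE_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2})\s*-\s*(?P<pct>[\d.]+)%\s*"
    r"\(Queue:\s*(?P<queue>\d+),\s*Read:\s*(?P<read>[\d.]+)\s*MB/s\)"
)

SAMPLE = """\
06:34:57 -  249.1% (Queue: 0,  Read: 824 MB/s)
06:35:11 -  241.3% (Queue: 2,  Read: 1047 MB/s)
06:35:37 -  323.0% (Queue: 1,  Read: 1403 MB/s)
06:35:51 - 1187.4% (Queue: 14, Read: 2164 MB/s)
06:36:17 - 1476.9% (Queue: 11, Read: 2316 MB/s)
06:37:23 - 3649.7% (Queue: 24, Read: 1984 MB/s)
06:38:03 -  898.6% (Queue: 13, Read: 1860 MB/s)
"""


def parse_timeline(text):
    """Parse timeline lines into dicts; lines that do not match are skipped."""
    samples = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue
        samples.append({
            "time": datetime.strptime(m["time"], "%H:%M:%S"),
            "disk_pct": float(m["pct"]),
            "queue": int(m["queue"]),
            "read_mbps": float(m["read"]),
        })
    return samples


def summarize(samples):
    """Return duration and peaks for one storm, or None if there are no samples.

    Assumes all samples fall on the same day (no midnight rollover).
    """
    if not samples:
        return None
    span = samples[-1]["time"] - samples[0]["time"]
    return {
        "duration_s": span.total_seconds(),
        "peak_read_mbps": max(s["read_mbps"] for s in samples),
        "peak_queue": max(s["queue"] for s in samples),
        "peak_disk_pct": max(s["disk_pct"] for s in samples),
    }


print(summarize(parse_timeline(SAMPLE)))
```

The duration covers only the logged samples (first to last snapshot), so it is a lower bound on how long the storm actually lasted.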

File system filter drivers (clean)

fltmc output shows only standard Windows filters — no third-party drivers:

WdFilter (Windows Defender), bindflt, storqosflt, CldFlt, bfs, luafv, Wof, FileInfo

What has been ruled out

  1. Third-party antivirus — BitDefender fully uninstalled, Windows Defender exclusions added for D:\wsl, \\wsl$, vmmem, vmwp.exe
  2. Background downloaders — Windows Update Delivery Optimization throttled, VS Code/VS auto-updates disabled
  3. Windows Update KB5083769 — uninstalled, storms persist
  4. WSLg — disabled (guiApplications=false)
  5. WSL memory/swap config — tested multiple configurations
  6. VHDX bloat — compacted
  7. ext4 fragmentation inside VHDX — defragmented all files (verified with filefrag), storms persist
  8. Disk health — NVMe reports healthy, no read/write errors, Windows Event Logs clean
  9. Windows Search / Everything indexer — present but showing 0 MB/s during storms

Diagnostic logs available

I have the following ready to share via private link (not publicly, as the ETL contains file paths):

  • WSL diagnostic logs collected using Microsoft's collect-wsl-logs.ps1 with -LogProfile storage — in the exact format the WSL team expects
  • Multiple ETL traces (600MB+) captured by WPR during active storms with DiskIO + FileIO + GeneralProfile.light profiles
  • Detailed disk I/O logs with per-process attribution, PIDs, command lines, and physical-vs-process I/O gap analysis across multiple storms over 2 weeks

Happy to share any of these via a private channel, secure upload, or email with the investigating engineer.

Conclusion

The storm is confirmed at the StorPort level in the host storage stack. Process-level counters cannot attribute the I/O because it originates below user-mode — likely in the Hyper-V virtual disk driver path (vhdmp.sys / storvsp.sys). wsl --shutdown immediately stops the storm, confirming WSL2's VM is the trigger. The NVMe (WD SN560) is a possible contributing factor but unconfirmed.
