Skip to content

Save/restore UART state to/from snapshots#5764

Merged
ilstam merged 4 commits intofirecracker-microvm:mainfrom
ilstam:uart-save-restore
Mar 31, 2026
Merged

Save/restore UART state to/from snapshots#5764
ilstam merged 4 commits intofirecracker-microvm:mainfrom
ilstam:uart-save-restore

Conversation

@ilstam
Copy link
Copy Markdown
Contributor

@ilstam ilstam commented Mar 16, 2026

The VMM does not save the UART state when taking a snapshot. Upon
restoring it uses a weird hack in emulate_serial_init() where it enables
RX (but no TX) interrupts after a snapshot restore. This hack apart form
weird is also wrong as the guest driver might have never enabled RX
interrupts in the first place. A comment in the same function
acknowledges that the correct thing to do is save and restore the entire
device state.

Because we don't re-enable TX interrupts after a snapshot restore, the
guest driver might hang if it relies on those TX interrupts as reported
in 1.

Fix this by saving the UART state in snapshots and restoring it from
them.

1 - #5730

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@ilstam ilstam added the Type: Bug Indicates an unexpected problem or unintended behavior label Mar 16, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 16, 2026

Codecov Report

❌ Patch coverage is 88.00000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.04%. Comparing base (f314fdd) to head (5d75957).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
src/vmm/src/device_manager/mod.rs 89.28% 3 Missing ⚠️
src/vmm/src/device_manager/persist.rs 90.32% 3 Missing ⚠️
src/vmm/src/devices/legacy/serial.rs 81.25% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5764      +/-   ##
==========================================
+ Coverage   83.02%   83.04%   +0.02%     
==========================================
  Files         276      276              
  Lines       29430    29466      +36     
==========================================
+ Hits        24435    24471      +36     
  Misses       4995     4995              
Flag Coverage Δ
5.10-m5n.metal 83.32% <95.52%> (+0.02%) ⬆️
5.10-m6a.metal 82.66% <95.52%> (+0.03%) ⬆️
5.10-m6g.metal 79.96% <38.33%> (-0.09%) ⬇️
5.10-m6i.metal 83.33% <95.52%> (+0.02%) ⬆️
5.10-m7a.metal-48xl 82.65% <95.52%> (+0.03%) ⬆️
5.10-m7g.metal 79.96% <38.33%> (-0.09%) ⬇️
5.10-m7i.metal-24xl 83.30% <95.52%> (+0.02%) ⬆️
5.10-m7i.metal-48xl 83.30% <95.52%> (+0.03%) ⬆️
5.10-m8g.metal-24xl 79.95% <38.33%> (-0.09%) ⬇️
5.10-m8g.metal-48xl 79.95% <38.33%> (-0.08%) ⬇️
5.10-m8i.metal-48xl 83.30% <95.52%> (+0.03%) ⬆️
5.10-m8i.metal-96xl 83.30% <95.52%> (+0.02%) ⬆️
6.1-m5n.metal 83.35% <95.52%> (+0.03%) ⬆️
6.1-m6a.metal 82.68% <95.52%> (+0.02%) ⬆️
6.1-m6g.metal 79.95% <38.33%> (-0.09%) ⬇️
6.1-m6i.metal 83.35% <95.52%> (+0.02%) ⬆️
6.1-m7a.metal-48xl 82.68% <95.52%> (+0.03%) ⬆️
6.1-m7g.metal 79.95% <38.33%> (-0.09%) ⬇️
6.1-m7i.metal-24xl 83.36% <95.52%> (+0.02%) ⬆️
6.1-m7i.metal-48xl 83.36% <95.52%> (+0.02%) ⬆️
6.1-m8g.metal-24xl 79.94% <38.33%> (-0.09%) ⬇️
6.1-m8g.metal-48xl 79.94% <38.33%> (-0.09%) ⬇️
6.1-m8i.metal-48xl 83.37% <95.52%> (+0.02%) ⬆️
6.1-m8i.metal-96xl 83.37% <95.52%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@Manciukic Manciukic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add a changelog item as well for this as well

Comment thread src/vmm/src/device_manager/mod.rs
Comment thread src/vmm/src/device_manager/persist.rs
@ilstam ilstam force-pushed the uart-save-restore branch from 9082575 to 2e9fcf0 Compare March 16, 2026 17:17
@ilstam ilstam requested review from kalyazin and pb8o as code owners March 16, 2026 17:17
@ilstam ilstam force-pushed the uart-save-restore branch 3 times, most recently from b14b444 to 089666d Compare March 16, 2026 18:02
@ilstam ilstam added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Mar 16, 2026
@ilstam ilstam force-pushed the uart-save-restore branch from 089666d to 584bdc3 Compare March 16, 2026 18:05
@ilstam ilstam force-pushed the uart-save-restore branch from 584bdc3 to 6658ebf Compare March 31, 2026 14:19
Comment thread src/vmm/src/device_manager/persist.rs
Copy link
Copy Markdown
Contributor

@Manciukic Manciukic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a couple of minor issues

Comment thread tests/integration_tests/functional/test_serial_io.py
Comment thread src/vmm/src/devices/legacy/serial.rs Outdated
ilstam added 2 commits March 31, 2026 15:54
The VMM does not save the UART state when taking a snapshot. Upon
restoring it uses a weird hack in emulate_serial_init() where it enables
RX (but no TX) interrupts after a snapshot restore. This hack apart form
weird is also wrong as the guest driver might have never enabled RX
interrupts in the first place. A comment in the same function
acknowledges that the correct thing to do is save and restore the entire
device state.

Because we don't re-enable TX interrupts after a snapshot restore, the
guest driver might hang if it relies on those TX interrupts as reported
in [1].

Fix this by saving the UART state in snapshots and restoring it from
them.

Bump the SNAPSHOT_VERSION appropriately.

[1] - firecracker-microvm#5730

Reported-by: EJ Ciramella <ejc3@meta.com>
Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
Add a simple unit test that injects a few bytes into the UART's FIFO,
stores it's state, create a new device from the saved state and is able
to read back the correct bytes.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
@ilstam ilstam force-pushed the uart-save-restore branch from 6658ebf to d4678ac Compare March 31, 2026 15:11
ilstam added 2 commits March 31, 2026 16:17
Commit 7e619f4 ("devices: serial: Save/restore UART state to/from
snapshots") fixed a bug where the guest driver would hang after a
snapshot restore because Firecracker didn't properly restore the state
of the UART device.

Add a test_serial_active_tx_snapshot test case (which mostly duplicates
the existing test_serial_after_snapshot).

Start an unbounded transmission before taking the snapshot. This will
saturate the TX buffer of the emulated UART and will make the guest
driver enable TX interrupts. Without the fix, the test case fails
because the driver is stuck waiting for a TX interrupt that will never
be delivered after the snapshot is restored.

Note that the serial.tx("") that sends a newline to flush the buffer in
the original test case isn't needed anymore since sending Ctrl+C will
achieve the same result.

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
firecracker-microvm#5705

Signed-off-by: Ilias Stamatis <ilstam@amazon.com>
@ilstam ilstam force-pushed the uart-save-restore branch from d4678ac to 5d75957 Compare March 31, 2026 15:17
@ilstam ilstam merged commit 86aac24 into firecracker-microvm:main Mar 31, 2026
7 checks passed
@ilstam ilstam deleted the uart-save-restore branch March 31, 2026 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Awaiting review Indicates that a pull request is ready to be reviewed Type: Bug Indicates an unexpected problem or unintended behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants