-
Notifications
You must be signed in to change notification settings - Fork 157
Reset more vcpu state on snapshot::restore #1120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
9cfcc9c to
8507c7a
Compare
| xsave | ||
| } | ||
|
|
||
| fn hyperlight_vm(code: &[u8]) -> Result<HyperlightVm> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this duplicating alot of the configuration code we do during VM setup in the test? how will we maintain this with the actual setup? What if there is a difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is calling HyperlightVm::new so they should be the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am concerned about the logic setting up the exclusive memory, stack cookie, etc. It seems like alot of the same in set_up_hypervisor_partition but simplier and having that not be the same might cause difference in configuration that might not be caught. This seems like a good case of doing integration testing at the host API and guest api layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it does indeed do a lot of setup... But that's more related to memory, and whether it's different or not shouldn't affect these tests which exercises vcpu state tests (I think)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After rebasing on top of main branch with changes that changed snapshotting, it got even worse. But I still believe it's fine, because memory-specifics is irrelevant to these tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love this but I guess it might be fine. We might be able to add an abstraction that "preps the vm" but I haven't really thought this through fully.
d09b1fc to
18d4ff9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR enhances vCPU state management during snapshot restoration by adding debug register abstraction and ensuring all vCPU state is properly reset. The changes follow the established Common* struct pattern for hypervisor abstraction and include comprehensive test coverage across KVM, MSHV, and WHP hypervisors.
Key changes:
- Added
CommonDebugRegsabstraction with implementations for all supported hypervisors (KVM, MSHV, WHP) - Removed padding fields from
CommonFputo simplify comparisons while maintaining correct conversions to hypervisor-specific types - Implemented
reset_vcpu()to reset general purpose registers, debug registers, FPU/XSAVE state, and special registers - Added
reset_vcpu()call inMultiUseSandbox::restore()to ensure clean state after snapshot restoration - Added extensive tests covering all reset scenarios with both direct state setting and actual code execution
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
src/hyperlight_host/src/hypervisor/regs/debug_regs.rs |
New file implementing CommonDebugRegs abstraction with conversions for KVM, MSHV, and WHP |
src/hyperlight_host/src/hypervisor/regs/fpu.rs |
Removed pad1 and pad2 fields from CommonFpu; updated all conversions to use literal 0 for padding |
src/hyperlight_host/src/hypervisor/regs.rs |
Added debug_regs module export |
src/hyperlight_host/src/hypervisor/mod.rs |
Added debug_regs, set_debug_regs, reset_xsave, and set_xsave methods to Hypervisor trait |
src/hyperlight_host/src/hypervisor/kvm.rs |
Implemented debug register and xsave methods for KVM hypervisor |
src/hyperlight_host/src/hypervisor/hyperv_linux.rs |
Implemented debug register and xsave methods for MSHV hypervisor |
src/hyperlight_host/src/hypervisor/hyperv_windows.rs |
Implemented debug register and xsave methods for WHP; refactored gdb breakpoint code to use new abstractions |
src/hyperlight_host/src/hypervisor/hyperlight_vm.rs |
Added reset_vcpu() method and comprehensive test suite covering all vCPU state reset scenarios |
src/hyperlight_host/src/sandbox/initialized_multi_use.rs |
Added reset_vcpu() call in restore() and test to verify debug registers are reset |
src/hyperlight_host/src/sandbox/hypervisor.rs |
Added Copy and Clone derives to HypervisorType enum |
src/hyperlight_host/src/mem/layout.rs |
Changed visibility from pub(super) to pub(crate) for methods used by tests |
src/tests/rust_guests/simpleguest/src/main.rs |
Added SetDr0 and GetDr0 guest functions for testing debug register reset |
.github/workflows/ValidatePullRequest.yml |
Changed fail-fast from true to false for build-test job to allow all matrix combinations to complete |
Signed-off-by: Ludvig Liljenberg <[email protected]>
4981f2d to
180efe7
Compare
Signed-off-by: Ludvig Liljenberg <[email protected]> Make set_xsave take slice Signed-off-by: Ludvig Liljenberg <[email protected]>
jsturtevant
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great to see. Great work figuring out the differences between the hypervisors and paying attention and documenting the differences. I learned a bunch.
My main concern is around the readability and maintenance of the tests. I've left a few suggestions on how we could maybe clean them up to be more concise and easier to reason about
| // Note: On KVM this ignores MXCSR so it's being set as part of reset_xsave. | ||
| // See https://github.com/torvalds/linux/blob/d358e5254674b70f34c847715ca509e46eb81e6f/arch/x86/kvm/x86.c#L12554-L12599 | ||
| self.vm.set_fpu(&CommonFpu::default())?; | ||
| self.vm.reset_xsave()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe FPU state xsave overlap, could we just do the reset here and save a few cycles?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: this would be a smaller pr if this was a sperate PR and would make its easier to review.
| #[cfg(feature = "init-paging")] | ||
| fn set_xsave(&self, xsave: &[u32]) -> Result<()>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it only for the init-paging feature? shouldn't there be a test for all the features?
| cr8: 0x5, | ||
| efer: 0x500, // LME + LMA | ||
| apic_base: 0xFEE00900, | ||
| // interrupt_bitmap: [0xFFFFFFFF; 4], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: may need to remove this comment? or do something differnet for mshv
| expected_sregs.ss.db = got_sregs.ss.db; | ||
| // unusable and g are hypervisor implementation details (see comment below for details) | ||
| expected_sregs.cs.unusable = got_sregs.cs.unusable; | ||
| expected_sregs.cs.g = got_sregs.cs.g; | ||
| expected_sregs.ds.unusable = got_sregs.ds.unusable; | ||
| expected_sregs.ds.g = got_sregs.ds.g; | ||
| expected_sregs.es.unusable = got_sregs.es.unusable; | ||
| expected_sregs.es.g = got_sregs.es.g; | ||
| expected_sregs.fs.unusable = got_sregs.fs.unusable; | ||
| expected_sregs.fs.g = got_sregs.fs.g; | ||
| expected_sregs.gs.unusable = got_sregs.gs.unusable; | ||
| expected_sregs.gs.g = got_sregs.gs.g; | ||
| expected_sregs.ss.unusable = got_sregs.ss.unusable; | ||
| expected_sregs.ss.g = got_sregs.ss.g; | ||
| expected_sregs.tr.unusable = got_sregs.tr.unusable; | ||
| expected_sregs.tr.g = got_sregs.tr.g; | ||
| expected_sregs.ldt.unusable = got_sregs.ldt.unusable; | ||
| expected_sregs.ldt.g = got_sregs.ldt.g; | ||
| assert_eq!(got_sregs, expected_sregs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we move this to a single function (shared with the one below) so the comment stays the same? There is alot of duplication in the function and is really hard to read. and know the different phases we are in.
| } | ||
| // fpr only uses 80 bits per register. Normalize upper bits for comparison. | ||
| for i in 0..8 { | ||
| expected_fpu.fpr[i][10..16].copy_from_slice(&got_fpu.fpr[i][10..16]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is duplicated in multiple spots in the function, could we condense it?
| } | ||
|
|
||
| #[test] | ||
| fn reset_vcpu_simple() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, this test is really hard to read. Could we evaluate a way to make it easier to read?
| rip: 0, | ||
| rflags: 1 << 1, // Reserved bit 1 is always set | ||
| }; | ||
| assert_eq!(regs, expected_reset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks to be the same as in simple_reset:
// Verify the fpu was reset to defaults
assert_eq!(
hyperlight_vm.vm.regs().unwrap(),
CommonRegisters {
rflags: 1 << 1, // Reserved bit 1 is always set
..Default::default()
}
);
could we add helper funciton like verify_registers_reset and re-use this across all the tests? that would make off the tests more consistent and easier to read.
| #[rustfmt::skip] | ||
| const CODE: [u8; 289] = [ | ||
| // xmm0-xmm7: use movd + pshufd to fill with pattern | ||
| 0xb8, 0x11, 0x11, 0x11, 0x11, // mov eax, 0x11111111 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems really hard to reason about, Could we use something like https://docs.rs/iced-x86/latest/iced_x86/ or put the files in test.asm and compile them at build time with nasm then bring them in with include_bytes
| /// Test that snapshot restore properly resets vCPU debug registers. This test verifies | ||
| /// that restore() calls reset_vcpu. | ||
| #[test] | ||
| fn snapshot_restore_resets_debug_registers() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we also add a test like this for other registers too? Doesn't need to be all the registers but atleast the common registers kind of like it is done here?
MSRs will be added in another PR.
into()implementation for kvm/mshv due to single memcyp, but that seems like premature optimization to me