Skip to content

Conversation

@ludfjig
Copy link
Contributor

@ludfjig ludfjig commented Dec 16, 2025

MSRs will be added in another PR.

  • Add debug register abstraction over hv implementation
  • Removed padding fields from fpu register to simplify comparisons. Not sure why I added them to begin with, we don't need C repr on these. It could possibly allow more efficient into() implementation for kvm/mshv due to single memcyp, but that seems like premature optimization to me
  • Reset vcpu state on snapshot::restore
  • Added bunch of tests

@ludfjig ludfjig added the kind/enhancement For PRs adding features, improving functionality, docs, tests, etc. label Dec 16, 2025
@ludfjig ludfjig force-pushed the reset_vcpu branch 3 times, most recently from 9cfcc9c to 8507c7a Compare December 16, 2025 19:52
xsave
}

fn hyperlight_vm(code: &[u8]) -> Result<HyperlightVm> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this duplicating alot of the configuration code we do during VM setup in the test? how will we maintain this with the actual setup? What if there is a difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is calling HyperlightVm::new so they should be the same

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned about the logic setting up the exclusive memory, stack cookie, etc. It seems like alot of the same in set_up_hypervisor_partition but simplier and having that not be the same might cause difference in configuration that might not be caught. This seems like a good case of doing integration testing at the host API and guest api layer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it does indeed do a lot of setup... But that's more related to memory, and whether it's different or not shouldn't affect these tests which exercises vcpu state tests (I think)

Copy link
Contributor Author

@ludfjig ludfjig Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After rebasing on top of main branch with changes that changed snapshotting, it got even worse. But I still believe it's fine, because memory-specifics is irrelevant to these tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this but I guess it might be fine. We might be able to add an abstraction that "preps the vm" but I haven't really thought this through fully.

@ludfjig ludfjig force-pushed the reset_vcpu branch 11 times, most recently from d09b1fc to 18d4ff9 Compare December 17, 2025 06:46
@ludfjig ludfjig requested a review from Copilot December 17, 2025 06:53
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances vCPU state management during snapshot restoration by adding debug register abstraction and ensuring all vCPU state is properly reset. The changes follow the established Common* struct pattern for hypervisor abstraction and include comprehensive test coverage across KVM, MSHV, and WHP hypervisors.

Key changes:

  • Added CommonDebugRegs abstraction with implementations for all supported hypervisors (KVM, MSHV, WHP)
  • Removed padding fields from CommonFpu to simplify comparisons while maintaining correct conversions to hypervisor-specific types
  • Implemented reset_vcpu() to reset general purpose registers, debug registers, FPU/XSAVE state, and special registers
  • Added reset_vcpu() call in MultiUseSandbox::restore() to ensure clean state after snapshot restoration
  • Added extensive tests covering all reset scenarios with both direct state setting and actual code execution

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/hyperlight_host/src/hypervisor/regs/debug_regs.rs New file implementing CommonDebugRegs abstraction with conversions for KVM, MSHV, and WHP
src/hyperlight_host/src/hypervisor/regs/fpu.rs Removed pad1 and pad2 fields from CommonFpu; updated all conversions to use literal 0 for padding
src/hyperlight_host/src/hypervisor/regs.rs Added debug_regs module export
src/hyperlight_host/src/hypervisor/mod.rs Added debug_regs, set_debug_regs, reset_xsave, and set_xsave methods to Hypervisor trait
src/hyperlight_host/src/hypervisor/kvm.rs Implemented debug register and xsave methods for KVM hypervisor
src/hyperlight_host/src/hypervisor/hyperv_linux.rs Implemented debug register and xsave methods for MSHV hypervisor
src/hyperlight_host/src/hypervisor/hyperv_windows.rs Implemented debug register and xsave methods for WHP; refactored gdb breakpoint code to use new abstractions
src/hyperlight_host/src/hypervisor/hyperlight_vm.rs Added reset_vcpu() method and comprehensive test suite covering all vCPU state reset scenarios
src/hyperlight_host/src/sandbox/initialized_multi_use.rs Added reset_vcpu() call in restore() and test to verify debug registers are reset
src/hyperlight_host/src/sandbox/hypervisor.rs Added Copy and Clone derives to HypervisorType enum
src/hyperlight_host/src/mem/layout.rs Changed visibility from pub(super) to pub(crate) for methods used by tests
src/tests/rust_guests/simpleguest/src/main.rs Added SetDr0 and GetDr0 guest functions for testing debug register reset
.github/workflows/ValidatePullRequest.yml Changed fail-fast from true to false for build-test job to allow all matrix combinations to complete

@ludfjig ludfjig marked this pull request as ready for review December 17, 2025 07:26
Signed-off-by: Ludvig Liljenberg <[email protected]>
@ludfjig ludfjig force-pushed the reset_vcpu branch 2 times, most recently from 4981f2d to 180efe7 Compare January 12, 2026 22:56
Signed-off-by: Ludvig Liljenberg <[email protected]>

Make set_xsave take slice

Signed-off-by: Ludvig Liljenberg <[email protected]>
Copy link
Contributor

@jsturtevant jsturtevant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great to see. Great work figuring out the differences between the hypervisors and paying attention and documenting the differences. I learned a bunch.

My main concern is around the readability and maintenance of the tests. I've left a few suggestions on how we could maybe clean them up to be more concise and easier to reason about

// Note: On KVM this ignores MXCSR so it's being set as part of reset_xsave.
// See https://github.com/torvalds/linux/blob/d358e5254674b70f34c847715ca509e46eb81e6f/arch/x86/kvm/x86.c#L12554-L12599
self.vm.set_fpu(&CommonFpu::default())?;
self.vm.reset_xsave()?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe FPU state xsave overlap, could we just do the reset here and save a few cycles?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: this would be a smaller pr if this was a sperate PR and would make its easier to review.

Comment on lines +177 to +178
#[cfg(feature = "init-paging")]
fn set_xsave(&self, xsave: &[u32]) -> Result<()>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it only for the init-paging feature? shouldn't there be a test for all the features?

cr8: 0x5,
efer: 0x500, // LME + LMA
apic_base: 0xFEE00900,
// interrupt_bitmap: [0xFFFFFFFF; 4],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: may need to remove this comment? or do something differnet for mshv

Comment on lines +1621 to +1639
expected_sregs.ss.db = got_sregs.ss.db;
// unusable and g are hypervisor implementation details (see comment below for details)
expected_sregs.cs.unusable = got_sregs.cs.unusable;
expected_sregs.cs.g = got_sregs.cs.g;
expected_sregs.ds.unusable = got_sregs.ds.unusable;
expected_sregs.ds.g = got_sregs.ds.g;
expected_sregs.es.unusable = got_sregs.es.unusable;
expected_sregs.es.g = got_sregs.es.g;
expected_sregs.fs.unusable = got_sregs.fs.unusable;
expected_sregs.fs.g = got_sregs.fs.g;
expected_sregs.gs.unusable = got_sregs.gs.unusable;
expected_sregs.gs.g = got_sregs.gs.g;
expected_sregs.ss.unusable = got_sregs.ss.unusable;
expected_sregs.ss.g = got_sregs.ss.g;
expected_sregs.tr.unusable = got_sregs.tr.unusable;
expected_sregs.tr.g = got_sregs.tr.g;
expected_sregs.ldt.unusable = got_sregs.ldt.unusable;
expected_sregs.ldt.g = got_sregs.ldt.g;
assert_eq!(got_sregs, expected_sregs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we move this to a single function (shared with the one below) so the comment stays the same? There is alot of duplication in the function and is really hard to read. and know the different phases we are in.

}
// fpr only uses 80 bits per register. Normalize upper bits for comparison.
for i in 0..8 {
expected_fpu.fpr[i][10..16].copy_from_slice(&got_fpu.fpr[i][10..16]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is duplicated in multiple spots in the function, could we condense it?

}

#[test]
fn reset_vcpu_simple() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, this test is really hard to read. Could we evaluate a way to make it easier to read?

rip: 0,
rflags: 1 << 1, // Reserved bit 1 is always set
};
assert_eq!(regs, expected_reset);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks to be the same as in simple_reset:
// Verify the fpu was reset to defaults
assert_eq!(
hyperlight_vm.vm.regs().unwrap(),
CommonRegisters {
rflags: 1 << 1, // Reserved bit 1 is always set
..Default::default()
}
);

could we add helper funciton like verify_registers_reset and re-use this across all the tests? that would make off the tests more consistent and easier to read.

#[rustfmt::skip]
const CODE: [u8; 289] = [
// xmm0-xmm7: use movd + pshufd to fill with pattern
0xb8, 0x11, 0x11, 0x11, 0x11, // mov eax, 0x11111111
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems really hard to reason about, Could we use something like https://docs.rs/iced-x86/latest/iced_x86/ or put the files in test.asm and compile them at build time with nasm then bring them in with include_bytes

/// Test that snapshot restore properly resets vCPU debug registers. This test verifies
/// that restore() calls reset_vcpu.
#[test]
fn snapshot_restore_resets_debug_registers() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we also add a test like this for other registers too? Doesn't need to be all the registers but atleast the common registers kind of like it is done here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/enhancement For PRs adding features, improving functionality, docs, tests, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants