Contributor

@ludfjig ludfjig commented Sep 30, 2025

This is PR 2/3 in a bigger effort to remove duplicate code across drivers.

Depends on #907, which must be merged first; I will mark this PR as ready for review after that.

This PR introduces

  • Vm trait. A minimal trait for the functionality common to every VM. It abstracts over the differences between kvm, mshv, and whp. This trait only knows about things like get/set registers and run; it knows nothing about guest functions or Hyperlight specifics.
  • HyperlightVm struct. This struct contains the dyn Vm above, as well as things like guest_ptr, rsp, memory regions, gdb connections, etc. You can think of it as replacing the previous Hypervisor trait (but now it's just one struct, to avoid duplicate code). Unlike the Vm trait, HyperlightVm knows about initialization, dispatching guest calls, gdb debugging, guest tracing, etc.
  • Simplifies and refactors some cancellation stuff relating to kill() without changing behavior
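
For illustration only, here is a minimal sketch of the split described above. It is not the code in this PR: `Regs`, `VmExit`, `VmResult`, `FakeVm`, `dispatch_call`, and every field and signature are hypothetical and simplified. The point is that the trait exposes only registers and run, while a single struct owns a `Box<dyn Vm>` plus the Hyperlight-specific state.

```rust
use std::collections::VecDeque;

/// Simplified register file (hypothetical; real drivers expose many more registers).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Regs {
    pub rip: u64,
    pub rsp: u64,
}

/// Reasons the vCPU returned to the host (hypothetical subset).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum VmExit {
    Halt,
    IoOut { port: u16, data: Vec<u8> },
}

pub type VmResult<T> = Result<T, String>;

/// Hypervisor-agnostic surface: registers and run, nothing Hyperlight-specific.
pub trait Vm {
    fn regs(&self) -> VmResult<Regs>;
    fn set_regs(&mut self, regs: &Regs) -> VmResult<()>;
    fn run_vcpu(&mut self) -> VmResult<VmExit>;
}

/// One struct holding the backend plus the state shared by every driver.
pub struct HyperlightVm {
    vm: Box<dyn Vm>,
    entrypoint: u64,
    rsp: u64,
}

impl HyperlightVm {
    pub fn new(vm: Box<dyn Vm>, entrypoint: u64, rsp: u64) -> Self {
        Self { vm, entrypoint, rsp }
    }

    /// Point the vCPU at the entry point, then run until it halts,
    /// collecting every byte the guest wrote via port I/O.
    pub fn dispatch_call(&mut self) -> VmResult<Vec<u8>> {
        self.vm.set_regs(&Regs { rip: self.entrypoint, rsp: self.rsp })?;
        let mut output = Vec::new();
        loop {
            match self.vm.run_vcpu()? {
                VmExit::Halt => return Ok(output),
                VmExit::IoOut { data, .. } => output.extend(data),
            }
        }
    }

    pub fn regs(&self) -> VmResult<Regs> {
        self.vm.regs()
    }
}

/// Fake backend standing in for a kvm/mshv/whp driver; it replays scripted exits.
pub struct FakeVm {
    regs: Regs,
    exits: VecDeque<VmExit>,
}

impl FakeVm {
    pub fn new(exits: Vec<VmExit>) -> Self {
        Self { regs: Regs::default(), exits: exits.into() }
    }
}

impl Vm for FakeVm {
    fn regs(&self) -> VmResult<Regs> {
        Ok(self.regs)
    }
    fn set_regs(&mut self, regs: &Regs) -> VmResult<()> {
        self.regs = *regs;
        Ok(())
    }
    fn run_vcpu(&mut self) -> VmResult<VmExit> {
        // Once the script is exhausted, behave as if the guest halted.
        Ok(self.exits.pop_front().unwrap_or(VmExit::Halt))
    }
}
```

Because the shared logic only talks to `dyn Vm`, it can be exercised against a fake backend like `FakeVm` without any hypervisor present.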

Closes #465, #904

@ludfjig ludfjig force-pushed the vm_trait_new branch 3 times, most recently from 81f0d54 to 62fad87 Compare October 22, 2025 19:44
@ludfjig ludfjig added the kind/refactor For PRs that restructure or remove code without adding new functionality. label Oct 22, 2025
@ludfjig ludfjig force-pushed the vm_trait_new branch 17 times, most recently from 1562f26 to edc7f00 Compare October 24, 2025 20:15
@ludfjig ludfjig force-pushed the vm_trait_new branch 4 times, most recently from f6337f9 to 350b8e0 Compare October 28, 2025 18:05
@ludfjig ludfjig marked this pull request as ready for review October 28, 2025 18:39
@ludfjig ludfjig force-pushed the vm_trait_new branch 5 times, most recently from d2e6bf8 to afa720d Compare November 4, 2025 00:55
Contributor

@jsturtevant jsturtevant left a comment

I spent some time looking at these changes. There seem to be a lot of really good things in here. My main concern is that it is a very big change that encompasses new features, changes, and simplifications across multiple implementations.

My initial thought would be to break this down into smaller chunks: add a Vm trait and implement it for a single platform, then move each platform over. This would keep each set of changes smaller. I also think we could keep things simpler to review by not updating implementations at the same time, like the changes for "Simplifies and refactors some cancellation stuff", and instead having those as separate change sets.

let cancel_was_requested_manually =
interrupt_handle_internal.is_cancel_requested_for_generation(generation);
fn unmap_memory(&mut self, (_slot, _region): (u32, &MemoryRegion)) -> Result<()> {
log_then_return!("Mapping host memory into the guest not yet supported on this platform");
Contributor

It looks like this is implemented above?

Contributor Author

That's just the initial mapping. Subsequent mappings after the initial sandbox is created are not yet supported on Windows.


let mut regs = self.regs()?;
regs.rip = rip + instruction_length;
self.set_regs(&regs)
Contributor

I don't see this rip set in the new implementation in hyperlight_vm

Contributor Author

Since not all drivers require incrementing RIP, handling it is an implementation detail of each driver, done as part of its run_vcpu. If you check hyperv_linux/windows.rs, you'll see that it's still done.
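
As a rough sketch of that idea (hypothetical types and names, not the actual driver code): a backend that has to skip the intercepted instruction itself can do so inside its own `run_vcpu`, so the shared caller never touches RIP. `RawIoIntercept`, `ManualAdvanceVm`, and `run_until_halt` are made up for this example.

```rust
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Regs {
    pub rip: u64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VmExit {
    IoOut { port: u16 },
    Halt,
}

/// What a hypothetical backend reports for an intercepted OUT instruction.
#[derive(Debug, Clone, Copy)]
pub struct RawIoIntercept {
    pub rip: u64,
    pub instruction_length: u64,
    pub port: u16,
}

pub trait Vm {
    fn regs(&self) -> Regs;
    fn set_regs(&mut self, regs: &Regs);
    fn run_vcpu(&mut self) -> VmExit;
}

/// Backend that must advance RIP manually after an I/O intercept.
pub struct ManualAdvanceVm {
    pub regs: Regs,
    pub pending: Option<RawIoIntercept>,
}

impl Vm for ManualAdvanceVm {
    fn regs(&self) -> Regs {
        self.regs
    }
    fn set_regs(&mut self, regs: &Regs) {
        self.regs = *regs;
    }
    fn run_vcpu(&mut self) -> VmExit {
        match self.pending.take() {
            Some(raw) => {
                // Driver-specific step: move RIP past the intercepted instruction
                // before handing a normalized exit back to the shared code.
                let mut regs = self.regs();
                regs.rip = raw.rip + raw.instruction_length;
                self.set_regs(&regs);
                VmExit::IoOut { port: raw.port }
            }
            None => VmExit::Halt,
        }
    }
}

/// Shared caller: it sees only normalized exits and never adjusts RIP.
pub fn run_until_halt(vm: &mut dyn Vm) -> Vec<u16> {
    let mut ports = Vec::new();
    loop {
        match vm.run_vcpu() {
            VmExit::IoOut { port } => ports.push(port),
            VmExit::Halt => return ports,
        }
    }
}
```

A backend whose hypervisor already advances RIP would simply omit the adjustment in its own `run_vcpu`, and `run_until_halt` would stay unchanged.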


// --------------------------
// --- DEBUGGING BELOW ------
// --------------------------
Contributor

I think I preferred this being another trait that was implemented, as it was before. What's the motivation for bringing this in here?

Contributor Author

@ludfjig ludfjig Nov 4, 2025

Generally I thought it made sense for them to belong on the "Vm" and to be able to get rid of the vcpu abstraction, and a lot of these are not needed anymore. But I agree, I think it may make sense. I can implement all the debugging functionality separately on a supertrait of the Vm trait; that should be a bit cleaner. In addition, some code, such as getting/setting debug registers, will be needed in the future for resetting the vcpu, even when debugging is not enabled.

Contributor

Since these are all hidden behind the flag, you could keep GuestDebug (modified to what you have here) in the gdb folder and impl GuestDebug for Hyperlight_vm when the flag is enabled I believe
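
One possible shape for this, sketched under assumptions: the debug functionality lives in a separate trait that extends the base trait, which is one reading of the supertrait idea discussed above. The names `DebuggableVm`, `DebugRegs`, and `add_hw_breakpoint` are hypothetical, and in the real crate the trait and its impls would presumably sit behind `#[cfg(gdb)]`, which is left out here so the sketch compiles on its own.

```rust
/// Base trait, reduced to a single method for brevity (hypothetical).
pub trait Vm {
    fn run_vcpu(&mut self);
}

/// x86 debug registers relevant to hardware breakpoints: DR0-DR3 and DR7.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct DebugRegs {
    pub dr: [u64; 4],
    pub dr7: u64,
}

/// Debug-only extension of `Vm`. Drivers implement just the two accessors;
/// breakpoint management is written once as a default method.
pub trait DebuggableVm: Vm {
    fn debug_regs(&self) -> DebugRegs;
    fn set_debug_regs(&mut self, regs: &DebugRegs);

    /// Install an execute breakpoint in the first free slot.
    /// Returns false when all four slots are already in use.
    fn add_hw_breakpoint(&mut self, addr: u64) -> bool {
        let mut regs = self.debug_regs();
        for slot in 0..4 {
            // DR7 local-enable bits L0..L3 are bits 0, 2, 4 and 6.
            let enable_bit = 1u64 << (slot * 2);
            if regs.dr7 & enable_bit == 0 {
                regs.dr[slot] = addr;
                regs.dr7 |= enable_bit;
                self.set_debug_regs(&regs);
                return true;
            }
        }
        false
    }
}

/// Fake backend used only to exercise the default method.
#[derive(Default)]
pub struct FakeVm {
    pub debug: DebugRegs,
}

impl Vm for FakeVm {
    fn run_vcpu(&mut self) {}
}

impl DebuggableVm for FakeVm {
    fn debug_regs(&self) -> DebugRegs {
        self.debug
    }
    fn set_debug_regs(&mut self, regs: &DebugRegs) {
        self.debug = *regs;
    }
}
```

Keeping the register accessors separate from the breakpoint logic also fits the point raised earlier that getting and setting debug registers may be needed for vCPU reset even when debugging is disabled.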

WriteAddr,
WriteRegisters,
}

Contributor

Couldn't we keep some of this trait here instead of moving it up into the new Vm trait?

Contributor Author

We could, I could make it a supertrait of vm

exception,
hw_breakpoints,
sw_breakpoints,
);
Contributor

It's unclear why these have been removed from the error log

Contributor Author

They are not necessary to determine the stop reason, so I removed them to avoid keeping unnecessary state around and to simplify the code.

Contributor

@dblnz dblnz left a comment

Good work!
I like that we get rid of duplicated code, and we now have a clear separation of common Vm code and specific Vm logic.
It is definitely the right direction we want to go.

I left some small comments.
I do need to emphasize that, because this is such a big change, it is difficult to follow where code was moved.

)
)]
Retry(),
#[cfg(gdb)]
Contributor

Nit: Invert the comment and #[cfg..

Suggested change
#[cfg(gdb)]
/// The vCPU has exited due to a debug event (usually breakpoint)
#[cfg(gdb)]

use crate::sandbox::uninitialized::SandboxRuntimeConfig;
use crate::{HyperlightError, Result, log_then_return, new_error};

pub(crate) struct HyperlightVm {
Contributor

Some comments here explaining what this struct wants to achieve and why we need it would be helpful.
I assume this replaces the old Hypervisor

// Architectures Software Developer's Manual
if dr6 & DR6_BS_FLAG_MASK != 0 && single_step {
return VcpuStopReason::DoneStep;
if dr6 & DR6_BS_FLAG_MASK != 0 {
Contributor

I think it is fine to remove the single_step variable here. It was only an additional state for verifying the correct flag was in sync with internal state of debugging


if BP_EX_ID == exception && sw_breakpoints.contains_key(&rip) {
return VcpuStopReason::SwBp;
if BP_EX_ID == exception {
Contributor

I think this might remove an option here.

If the vCPU stops because of an issue, rather than a SW/HW breakpoint set by the debugger, it will no longer be transmitted as an unknown breakpoint.

I am not sure though if this means that the debugger won't show it as an exception.
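
To make the concern concrete, here is a small sketch of the classification logic. It is not the code under review: the `HwBp` and `Unknown` variants, the `DB_EX_ID` and `DR6_HW_BP_FLAGS_MASK` constants, the `stop_reason` signature, and the use of a `HashSet` (the diff uses `contains_key`, so presumably a map) are assumptions. The numeric values follow the x86 architecture: #DB is vector 1, #BP is vector 3, DR6.BS is bit 14, and DR6.B0-B3 are bits 0-3.

```rust
use std::collections::HashSet;

pub const DB_EX_ID: u32 = 1; // #DB debug exception vector
pub const BP_EX_ID: u32 = 3; // #BP breakpoint exception vector (int3)
pub const DR6_BS_FLAG_MASK: u64 = 1 << 14; // single-step flag
pub const DR6_HW_BP_FLAGS_MASK: u64 = 0x0F; // B0..B3: a hardware breakpoint condition was met

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VcpuStopReason {
    DoneStep,
    HwBp,
    SwBp,
    Unknown,
}

/// Classify a debug exit.
///
/// With `known_sw_breakpoints = None`, every #BP is reported as `SwBp`
/// (the simplified behavior). With `Some(set)`, a #BP at an address the
/// debugger did not register is reported as `Unknown` (the stricter behavior).
pub fn stop_reason(
    exception: u32,
    dr6: u64,
    rip: u64,
    known_sw_breakpoints: Option<&HashSet<u64>>,
) -> VcpuStopReason {
    if exception == DB_EX_ID {
        if dr6 & DR6_BS_FLAG_MASK != 0 {
            return VcpuStopReason::DoneStep;
        }
        if dr6 & DR6_HW_BP_FLAGS_MASK != 0 {
            return VcpuStopReason::HwBp;
        }
    }
    if exception == BP_EX_ID {
        return match known_sw_breakpoints {
            Some(set) if !set.contains(&rip) => VcpuStopReason::Unknown,
            _ => VcpuStopReason::SwBp,
        };
    }
    VcpuStopReason::Unknown
}
```

The two modes differ only for a #BP raised at an address the debugger never registered, which is exactly the case raised in this comment.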

Development

Successfully merging this pull request may close these issues.

Rethink driver API

3 participants