[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/hvm: Add Kconfig option to disable nested virtualization



On Fri, 5 Feb 2026, Roger Pau Monné wrote:
> On Thu, Feb 05, 2026 at 05:50:32PM -0800, Stefano Stabellini wrote:
> > Introduce CONFIG_NESTED_VIRT (default y, requires EXPERT to disable)
> > to allow nested virtualization support to be disabled at build time.
> > This is useful for embedded or safety-focused deployments where
> > nested virtualization is not needed, reducing code size and attack
> > surface.
> > 
> > When CONFIG_NESTED_VIRT=n, the following source files are excluded:
> > - arch/x86/hvm/nestedhvm.c
> > - arch/x86/hvm/svm/nestedsvm.c
> > - arch/x86/hvm/vmx/vvmx.c
> > - arch/x86/mm/nested.c
> > - arch/x86/mm/hap/nested_hap.c
> > - arch/x86/mm/hap/nested_ept.c
> > 
> > Add inline stubs where needed in headers.
> > 
> > No functional change when CONFIG_NESTED_VIRT=y.
> 
> You also need to adjust arch_sanitise_domain_config() so it refuses to
> create domains with the XEN_DOMCTL_CDF_nested_virt flag set when
> CONFIG_NESTED_VIRT=n. 

Sounds good


> If you do that I think a bunch of the dummy
> helpers that you add when CONFIG_NESTED_VIRT=n should also gain an
> ASSERT_UNREACHABLE().

OK


> And IMO you will also need to add a new XEN_SYSCTL_PHYSCAP_nestedhvm
> (or alike) to signal the toolstack whether the nested HVM feature is
> available.  Much like we do for HAP/Shadow/gnttab availability.

yeah good point


> > Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxx>
> > ---
> >  xen/arch/x86/hvm/Kconfig                 | 10 ++++++
> >  xen/arch/x86/hvm/Makefile                |  2 +-
> >  xen/arch/x86/hvm/svm/Makefile            |  2 +-
> >  xen/arch/x86/hvm/svm/nestedhvm.h         | 44 +++++++++++++++++++++---
> >  xen/arch/x86/hvm/svm/svm.c               |  6 ++++
> >  xen/arch/x86/hvm/vmx/Makefile            |  2 +-
> >  xen/arch/x86/hvm/vmx/vmx.c               | 10 ++++--
> >  xen/arch/x86/include/asm/hvm/nestedhvm.h | 41 +++++++++++++++++-----
> >  xen/arch/x86/include/asm/hvm/vmx/vvmx.h  | 30 ++++++++++++++++
> >  xen/arch/x86/mm/Makefile                 |  2 +-
> >  xen/arch/x86/mm/hap/Makefile             |  4 +--
> >  xen/arch/x86/mm/p2m.h                    |  6 ++++
> >  12 files changed, 137 insertions(+), 22 deletions(-)
> > 
> > diff --git a/xen/arch/x86/hvm/Kconfig b/xen/arch/x86/hvm/Kconfig
> > index f32bf5cbb7..12b5df4710 100644
> > --- a/xen/arch/x86/hvm/Kconfig
> > +++ b/xen/arch/x86/hvm/Kconfig
> > @@ -92,4 +92,14 @@ config MEM_SHARING
> >     bool "Xen memory sharing support (UNSUPPORTED)" if UNSUPPORTED
> >     depends on INTEL_VMX
> >  
> > +config NESTED_VIRT
> > +   bool "Nested virtualization support" if EXPERT
> > +   depends on AMD_SVM || INTEL_VMX
> > +   default y
> > +   help
> > +     Enable nested virtualization, allowing guests to run their own
> > +     hypervisors. This requires hardware support.
> > +
> > +     If unsure, say Y.
> 
> If we go that route, I think nested virt should become off by default.
> It's not security supported, and known to be broken in many areas.
> 
> I'm also unsure about whether this wants to be gated under EXPERT.
> But I'm not sure I'm any good at knowing whether something should be
> under EXPERT or not.

I am happy either way and I'll others decide on the default. I did it
this way to avoid changes over the current baseline. In case there is
disagreement, I am also happy if it gets changed on commit based on the
latest preference.


> > +
> >  endif
> > diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
> > index f34fb03934..b8a0a68624 100644
> > --- a/xen/arch/x86/hvm/Makefile
> > +++ b/xen/arch/x86/hvm/Makefile
> > @@ -18,7 +18,7 @@ obj-y += irq.o
> >  obj-y += mmio.o
> >  obj-$(CONFIG_VM_EVENT) += monitor.o
> >  obj-y += mtrr.o
> > -obj-y += nestedhvm.o
> > +obj-$(CONFIG_NESTED_VIRT) += nestedhvm.o
> >  obj-y += pmtimer.o
> >  obj-y += quirks.o
> >  obj-y += rtc.o
> > diff --git a/xen/arch/x86/hvm/svm/Makefile b/xen/arch/x86/hvm/svm/Makefile
> > index 8a072cafd5..92418e3444 100644
> > --- a/xen/arch/x86/hvm/svm/Makefile
> > +++ b/xen/arch/x86/hvm/svm/Makefile
> > @@ -2,6 +2,6 @@ obj-y += asid.o
> >  obj-y += emulate.o
> >  obj-bin-y += entry.o
> >  obj-y += intr.o
> > -obj-y += nestedsvm.o
> > +obj-$(CONFIG_NESTED_VIRT) += nestedsvm.o
> >  obj-y += svm.o
> >  obj-y += vmcb.o
> > diff --git a/xen/arch/x86/hvm/svm/nestedhvm.h 
> > b/xen/arch/x86/hvm/svm/nestedhvm.h
> > index 9bfed5ffd7..a102c076ea 100644
> > --- a/xen/arch/x86/hvm/svm/nestedhvm.h
> > +++ b/xen/arch/x86/hvm/svm/nestedhvm.h
> > @@ -26,6 +26,13 @@
> >  #define nsvm_efer_svm_enabled(v) \
> >      (!!((v)->arch.hvm.guest_efer & EFER_SVME))
> >  
> > +#define NSVM_INTR_NOTHANDLED     3
> > +#define NSVM_INTR_NOTINTERCEPTED 2
> > +#define NSVM_INTR_FORCEVMEXIT    1
> > +#define NSVM_INTR_MASKED         0
> > +
> > +#ifdef CONFIG_NESTED_VIRT
> > +
> >  int nestedsvm_vmcb_map(struct vcpu *v, uint64_t vmcbaddr);
> >  void nestedsvm_vmexit_defer(struct vcpu *v,
> >      uint64_t exitcode, uint64_t exitinfo1, uint64_t exitinfo2);
> > @@ -57,13 +64,40 @@ int cf_check nsvm_hap_walk_L1_p2m(
> >      struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa, unsigned int 
> > *page_order,
> >      uint8_t *p2m_acc, struct npfec npfec);
> >  
> > -#define NSVM_INTR_NOTHANDLED     3
> > -#define NSVM_INTR_NOTINTERCEPTED 2
> > -#define NSVM_INTR_FORCEVMEXIT    1
> > -#define NSVM_INTR_MASKED         0
> > -
> >  int nestedsvm_vcpu_interrupt(struct vcpu *v, const struct hvm_intack 
> > intack);
> >  
> > +#else /* !CONFIG_NESTED_VIRT */
> > +
> > +static inline int nestedsvm_vmcb_map(struct vcpu *v, uint64_t vmcbaddr)
> > +{
> > +    return 0;
> > +}
> > +static inline void nestedsvm_vmexit_defer(struct vcpu *v,
> > +    uint64_t exitcode, uint64_t exitinfo1, uint64_t exitinfo2) { }
> > +static inline enum nestedhvm_vmexits nestedsvm_vmexit_n2n1(struct vcpu *v,
> > +    struct cpu_user_regs *regs)
> > +{
> > +    return NESTEDHVM_VMEXIT_ERROR;
> > +}
> > +static inline enum nestedhvm_vmexits nestedsvm_check_intercepts(struct 
> > vcpu *v,
> > +    struct cpu_user_regs *regs, uint64_t exitcode)
> > +{
> > +    return NESTEDHVM_VMEXIT_ERROR;
> > +}
> > +static inline void svm_nested_features_on_efer_update(struct vcpu *v) { }
> > +static inline void svm_vmexit_do_clgi(struct cpu_user_regs *regs,
> > +                                      struct vcpu *v) { }
> > +static inline void svm_vmexit_do_stgi(struct cpu_user_regs *regs,
> > +                                       struct vcpu *v) { }
> > +static inline bool nestedsvm_gif_isset(struct vcpu *v) { return true; }
> > +static inline int nestedsvm_vcpu_interrupt(struct vcpu *v,
> > +                                           const struct hvm_intack intack)
> > +{
> > +    return NSVM_INTR_NOTINTERCEPTED;
> > +}
> > +
> > +#endif /* CONFIG_NESTED_VIRT */
> > +
> >  #endif /* __X86_HVM_SVM_NESTEDHVM_PRIV_H__ */
> >  
> >  /*
> > diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> > index 18ba837738..0234b57afb 100644
> > --- a/xen/arch/x86/hvm/svm/svm.c
> > +++ b/xen/arch/x86/hvm/svm/svm.c
> > @@ -46,6 +46,10 @@
> >  
> >  void noreturn svm_asm_do_resume(void);
> >  
> > +#ifndef CONFIG_NESTED_VIRT
> > +void asmlinkage nsvm_vcpu_switch(void) { }
> > +#endif
> > +
> >  u32 svm_feature_flags;
> >  
> >  /*
> > @@ -2465,6 +2469,7 @@ static struct hvm_function_table 
> > __initdata_cf_clobber svm_function_table = {
> >      .set_rdtsc_exiting    = svm_set_rdtsc_exiting,
> >      .get_insn_bytes       = svm_get_insn_bytes,
> >  
> > +#ifdef CONFIG_NESTED_VIRT
> >      .nhvm_vcpu_initialise = nsvm_vcpu_initialise,
> >      .nhvm_vcpu_destroy = nsvm_vcpu_destroy,
> >      .nhvm_vcpu_reset = nsvm_vcpu_reset,
> > @@ -2474,6 +2479,7 @@ static struct hvm_function_table 
> > __initdata_cf_clobber svm_function_table = {
> >      .nhvm_vmcx_hap_enabled = nsvm_vmcb_hap_enabled,
> >      .nhvm_intr_blocked = nsvm_intr_blocked,
> >      .nhvm_hap_walk_L1_p2m = nsvm_hap_walk_L1_p2m,
> > +#endif
> >  
> >      .get_reg = svm_get_reg,
> >      .set_reg = svm_set_reg,
> > diff --git a/xen/arch/x86/hvm/vmx/Makefile b/xen/arch/x86/hvm/vmx/Makefile
> > index 04a29ce59d..902564b3e2 100644
> > --- a/xen/arch/x86/hvm/vmx/Makefile
> > +++ b/xen/arch/x86/hvm/vmx/Makefile
> > @@ -3,4 +3,4 @@ obj-y += intr.o
> >  obj-y += realmode.o
> >  obj-y += vmcs.o
> >  obj-y += vmx.o
> > -obj-y += vvmx.o
> > +obj-$(CONFIG_NESTED_VIRT) += vvmx.o
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index 82c55f49ae..252f27322b 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -55,6 +55,10 @@
> >  #include <public/hvm/save.h>
> >  #include <public/sched.h>
> >  
> > +#ifndef CONFIG_NESTED_VIRT
> > +void asmlinkage nvmx_switch_guest(void) { }
> > +#endif
> > +
> >  static bool __initdata opt_force_ept;
> >  boolean_param("force-ept", opt_force_ept);
> >  
> > @@ -2033,7 +2037,7 @@ static void nvmx_enqueue_n2_exceptions(struct vcpu *v,
> >                   nvmx->intr.intr_info, nvmx->intr.error_code);
> >  }
> >  
> > -static int cf_check nvmx_vmexit_event(
> > +static int cf_check __maybe_unused nvmx_vmexit_event(
> >      struct vcpu *v, const struct x86_event *event)
> >  {
> >      nvmx_enqueue_n2_exceptions(v, event->vector, event->error_code,
> > @@ -2933,6 +2937,7 @@ static struct hvm_function_table 
> > __initdata_cf_clobber vmx_function_table = {
> >      .handle_cd            = vmx_handle_cd,
> >      .set_info_guest       = vmx_set_info_guest,
> >      .set_rdtsc_exiting    = vmx_set_rdtsc_exiting,
> > +#ifdef CONFIG_NESTED_VIRT
> >      .nhvm_vcpu_initialise = nvmx_vcpu_initialise,
> >      .nhvm_vcpu_destroy    = nvmx_vcpu_destroy,
> >      .nhvm_vcpu_reset      = nvmx_vcpu_reset,
> > @@ -2942,8 +2947,9 @@ static struct hvm_function_table 
> > __initdata_cf_clobber vmx_function_table = {
> >      .nhvm_vcpu_vmexit_event = nvmx_vmexit_event,
> >      .nhvm_intr_blocked    = nvmx_intr_blocked,
> >      .nhvm_domain_relinquish_resources = nvmx_domain_relinquish_resources,
> > -    .update_vlapic_mode = vmx_vlapic_msr_changed,
> >      .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
> > +#endif
> > +    .update_vlapic_mode = vmx_vlapic_msr_changed,
> >  #ifdef CONFIG_VM_EVENT
> >      .enable_msr_interception = vmx_enable_msr_interception,
> >  #endif
> > diff --git a/xen/arch/x86/include/asm/hvm/nestedhvm.h 
> > b/xen/arch/x86/include/asm/hvm/nestedhvm.h
> > index ea2c1bc328..0372974b24 100644
> > --- a/xen/arch/x86/include/asm/hvm/nestedhvm.h
> > +++ b/xen/arch/x86/include/asm/hvm/nestedhvm.h
> > @@ -25,9 +25,21 @@ enum nestedhvm_vmexits {
> >  /* Nested HVM on/off per domain */
> >  static inline bool nestedhvm_enabled(const struct domain *d)
> >  {
> > -    return IS_ENABLED(CONFIG_HVM) && (d->options & 
> > XEN_DOMCTL_CDF_nested_virt);
> > +    return IS_ENABLED(CONFIG_NESTED_VIRT) &&
> > +           (d->options & XEN_DOMCTL_CDF_nested_virt);
> >  }
> >  
> > +/* Nested paging */
> > +#define NESTEDHVM_PAGEFAULT_DONE       0
> > +#define NESTEDHVM_PAGEFAULT_INJECT     1
> > +#define NESTEDHVM_PAGEFAULT_L1_ERROR   2
> > +#define NESTEDHVM_PAGEFAULT_L0_ERROR   3
> > +#define NESTEDHVM_PAGEFAULT_MMIO       4
> > +#define NESTEDHVM_PAGEFAULT_RETRY      5
> > +#define NESTEDHVM_PAGEFAULT_DIRECT_MMIO 6
> > +
> > +#ifdef CONFIG_NESTED_VIRT
> > +
> >  /* Nested VCPU */
> >  int nestedhvm_vcpu_initialise(struct vcpu *v);
> >  void nestedhvm_vcpu_destroy(struct vcpu *v);
> > @@ -38,14 +50,6 @@ bool nestedhvm_vcpu_in_guestmode(struct vcpu *v);
> >  #define nestedhvm_vcpu_exit_guestmode(v)  \
> >      vcpu_nestedhvm(v).nv_guestmode = 0
> >  
> > -/* Nested paging */
> > -#define NESTEDHVM_PAGEFAULT_DONE       0
> > -#define NESTEDHVM_PAGEFAULT_INJECT     1
> > -#define NESTEDHVM_PAGEFAULT_L1_ERROR   2
> > -#define NESTEDHVM_PAGEFAULT_L0_ERROR   3
> > -#define NESTEDHVM_PAGEFAULT_MMIO       4
> > -#define NESTEDHVM_PAGEFAULT_RETRY      5
> > -#define NESTEDHVM_PAGEFAULT_DIRECT_MMIO 6
> >  int nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
> >                                      struct npfec npfec);
> >  
> > @@ -59,6 +63,25 @@ unsigned long *nestedhvm_vcpu_iomap_get(bool ioport_80, 
> > bool ioport_ed);
> >  
> >  void nestedhvm_vmcx_flushtlb(struct p2m_domain *p2m);
> >  
> > +#else /* !CONFIG_NESTED_VIRT */
> > +
> > +static inline int nestedhvm_vcpu_initialise(struct vcpu *v) { return 0; }
> > +static inline void nestedhvm_vcpu_destroy(struct vcpu *v) { }
> > +static inline void nestedhvm_vcpu_reset(struct vcpu *v) { }
> > +static inline bool nestedhvm_vcpu_in_guestmode(struct vcpu *v) { return 
> > false; }
> > +static inline int nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t 
> > *L2_gpa,
> > +                                                  struct npfec npfec)
> > +{
> > +    return NESTEDHVM_PAGEFAULT_L0_ERROR;
> > +}
> > +#define nestedhvm_vcpu_enter_guestmode(v) do { } while (0)
> > +#define nestedhvm_vcpu_exit_guestmode(v)  do { } while (0)
> > +#define nestedhvm_paging_mode_hap(v) false
> > +#define nestedhvm_vmswitch_in_progress(v) false
> > +static inline void nestedhvm_vmcx_flushtlb(struct p2m_domain *p2m) { }
> > +
> > +#endif /* CONFIG_NESTED_VIRT */
> > +
> >  static inline bool nestedhvm_is_n2(struct vcpu *v)
> >  {
> >      if ( !nestedhvm_enabled(v->domain) ||
> > diff --git a/xen/arch/x86/include/asm/hvm/vmx/vvmx.h 
> > b/xen/arch/x86/include/asm/hvm/vmx/vvmx.h
> > index da10d3fa96..8dc876a4c2 100644
> > --- a/xen/arch/x86/include/asm/hvm/vmx/vvmx.h
> > +++ b/xen/arch/x86/include/asm/hvm/vmx/vvmx.h
> > @@ -73,6 +73,8 @@ union vmx_inst_info {
> >      u32 word;
> >  };
> >  
> > +#ifdef CONFIG_NESTED_VIRT
> > +
> >  int cf_check nvmx_vcpu_initialise(struct vcpu *v);
> >  void cf_check nvmx_vcpu_destroy(struct vcpu *v);
> >  int cf_check nvmx_vcpu_reset(struct vcpu *v);
> > @@ -199,5 +201,33 @@ int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga,
> >                          uint64_t *exit_qual, uint32_t *exit_reason);
> >  int nvmx_cpu_up_prepare(unsigned int cpu);
> >  void nvmx_cpu_dead(unsigned int cpu);
> > +
> > +#else /* !CONFIG_NESTED_VIRT */
> > +
> > +static inline void nvmx_update_exec_control(struct vcpu *v, u32 value) { }
> > +static inline void nvmx_update_secondary_exec_control(struct vcpu *v,
> > +                                                      unsigned long value) 
> > { }
> > +static inline void nvmx_update_exception_bitmap(struct vcpu *v,
> > +                                                unsigned long value) { }
> > +static inline u64 nvmx_get_tsc_offset(struct vcpu *v) { return 0; }
> > +static inline void nvmx_set_cr_read_shadow(struct vcpu *v, unsigned int 
> > cr) { }
> > +static inline bool nvmx_intercepts_exception(struct vcpu *v, unsigned int 
> > vector,
> > +                                             int error_code) { return 
> > false; }
> > +static inline int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
> > +                                         unsigned int exit_reason) { 
> > return 0; }
> > +static inline void nvmx_idtv_handling(void) { }
> > +static inline int nvmx_msr_read_intercept(unsigned int msr, u64 
> > *msr_content)
> > +{
> > +    return 0;
> > +}
> > +static inline int nvmx_handle_vmx_insn(struct cpu_user_regs *regs,
> > +                                       unsigned int exit_reason) { return 
> > 0; }
> > +static inline int nvmx_cpu_up_prepare(unsigned int cpu) { return 0; }
> > +static inline void nvmx_cpu_dead(unsigned int cpu) { }
> > +
> > +#define get_vvmcs(vcpu, encoding) 0
> > +
> > +#endif /* CONFIG_NESTED_VIRT */
> > +
> >  #endif /* __ASM_X86_HVM_VVMX_H__ */
> >  
> > diff --git a/xen/arch/x86/mm/Makefile b/xen/arch/x86/mm/Makefile
> > index 960f6e8409..aa15811c2e 100644
> > --- a/xen/arch/x86/mm/Makefile
> > +++ b/xen/arch/x86/mm/Makefile
> > @@ -7,7 +7,7 @@ obj-$(CONFIG_SHADOW_PAGING) += guest_walk_4.o
> >  obj-$(CONFIG_VM_EVENT) += mem_access.o
> >  obj-$(CONFIG_MEM_PAGING) += mem_paging.o
> >  obj-$(CONFIG_MEM_SHARING) += mem_sharing.o
> > -obj-$(CONFIG_HVM) += nested.o
> > +obj-$(CONFIG_NESTED_VIRT) += nested.o
> >  obj-$(CONFIG_HVM) += p2m.o
> >  obj-y += p2m-basic.o
> >  obj-$(CONFIG_INTEL_VMX) += p2m-ept.o
> > diff --git a/xen/arch/x86/mm/hap/Makefile b/xen/arch/x86/mm/hap/Makefile
> > index 67c29b2162..de1bb3abde 100644
> > --- a/xen/arch/x86/mm/hap/Makefile
> > +++ b/xen/arch/x86/mm/hap/Makefile
> > @@ -2,5 +2,5 @@ obj-y += hap.o
> >  obj-y += guest_walk_2.o
> >  obj-y += guest_walk_3.o
> >  obj-y += guest_walk_4.o
> > -obj-y += nested_hap.o
> > -obj-$(CONFIG_INTEL_VMX) += nested_ept.o
> > +obj-$(CONFIG_NESTED_VIRT) += nested_hap.o
> > +obj-$(CONFIG_NESTED_VIRT) += nested_ept.o
> 
> With this change nested_ept.o is no longer gated explicitly on
> CONFIG_INTEL_VMX, which could cause build issues if you have a Kconfig
> like:
> 
> CONFIG_INTEL_VMX=n
> CONFIG_AMD_SVM=y
> CONFIG_NESTED_VIRT=y
> 
> Does the code in nested_ept.o have dependencies on other files gated
> by CONFIG_INTEL_VMX, and hence would fail at the linking stage?  And
> even if it builds, the code in nested_ept.o would be unreachable I
> expect.

It does build, but you are right. I'll improve this.

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.