
Re: [BUG] Nested Virtualization Bug on x86-64 AMD CPU



I am writing to follow up on the bug report I sent, regarding a BUG()
triggered in Xen when performing a nested VMRUN with CR0.PG=0 in Long
Mode. The issue was discussed with Andrew Cooper at that time, and I
would like to check if there have been any updates or plans for
addressing this issue.

To briefly recap:
- The problem occurs when an L1 hypervisor, while in 64-bit mode,
executes VMRUN with CR0.PG=0 in VMCB12, targeting a 64-bit L2 guest.
- Instead of raising VMEXIT_INVALID, the system encounters a BUG() at
`nsvm_vmcb_guest_intercepts_exitcode`.
- VMEXIT reason observed was 0x402 (AVIC_NOACCEL), although Xen does not
support AVIC.

Andrew pointed out that this could indicate either a missing validity
check (as the state LMA=1 && PG=0 is invalid) or possible memory
corruption.
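
For illustration, the missing validity check could look roughly like the
sketch below. This is only a sketch under stated assumptions, not Xen's
actual code: `struct vmcb12_state`, `vmcb12_lma_without_pg()` and
`check_virtual_vmrun()` are hypothetical names, and only the single
condition discussed in this thread (LMA=1 && PG=0) is tested. The bit
positions for CR0.PG and EFER.LMA are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions (AMD64). */
#define CR0_PG    (UINT64_C(1) << 31)  /* CR0.PG: paging enabled */
#define EFER_LMA  (UINT64_C(1) << 10)  /* EFER.LMA: long mode active */

/* Exit code that L1 should observe when VMCB12 holds an illegal state. */
#define VMEXIT_INVALID (-1)

/* Hypothetical, trimmed-down view of the guest state in VMCB12. */
struct vmcb12_state {
    uint64_t cr0;
    uint64_t efer;
};

/* True if the state claims long mode is active while paging is off. */
bool vmcb12_lma_without_pg(const struct vmcb12_state *s)
{
    return (s->efer & EFER_LMA) && !(s->cr0 & CR0_PG);
}

/*
 * Hypothetical hook run before emulating VMRUN on behalf of L1.
 * Returns 0 to continue, or VMEXIT_INVALID so that the virtual VMRUN
 * fails back to L1 instead of entering L2.
 */
int64_t check_virtual_vmrun(const struct vmcb12_state *s)
{
    if (vmcb12_lma_without_pg(s))
        return VMEXIT_INVALID;
    return 0;
}
```

With a check of this kind, the case in this report (EFER.LMA set, CR0.PG
clear) would be rejected up front, while a 32-bit L2 with PG=0 and LMA=0
would still be allowed to run.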

Given that this issue could potentially allow a guest VM to trigger a
hypervisor panic, I believe it might be worth formally recognizing and
addressing.
May I kindly ask if this has been acknowledged as a bug internally, or
if there are any plans to handle this case safely (e.g., raising
VMEXIT_INVALID instead of BUG()) in future Xen releases?

Thank you very much for your time.


On Wed, Dec 6, 2023 at 12:05 PM Reima ISHII <ishiir@xxxxxxxxxxxxxxxxxxx> wrote:
Thank you for your prompt response.

On Tue, Dec 5, 2023 at 11:43 PM Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> Who is still in 64-bit mode ?
>
> It is legal for a 64-bit L1 to VMRUN into a 32-bit L2 with PG=0.
>
> But I'm guessing that you mean L2 is also 64-bit, and we're clearing PG,
> thus creating an illegal state (LMA=1 && PG=0) in VMCB12.
>
> And yes, in that case (virtual) VMRUN at L1 ought to fail with
> VMEXIT_INVALID.

Yes, your understanding is correct. The issue is triggered by a VMRUN
into a 64-bit L2 guest with CR0.PG cleared in VMCB12. Instead of the
(virtual) VMRUN at L1 failing with VMEXIT_INVALID as expected, Xen hits
a BUG().

> As an incidental observation, that function is particularly absurd and
> the two switches should be merged.
>
> VMExit reason 0x402 is AVIC_NOACCEL and Xen has no support for AVIC in
> the slightest right now.  i.e. Xen shouldn't have AVIC active in the
> VMCB, and should never see any AVIC-related VMExits.
>
> It is possible that we've got memory corruption, and have accidentally
> activated AVIC in the VMCB.

The idea of potential memory corruption activating AVIC in the VMCB is
certainly an interesting perspective. While I'm not sure how exactly
such memory corruption could occur, the suggestion does provide a
compelling explanation for the VMExit reason 0x402 (AVIC_NOACCEL),
particularly considering Xen's current lack of AVIC support.

> But, is this by any chance all running nested under KVM in your fuzzer?

No, KVM was not used. Xen was running directly on the hardware, and the
issue was observed with an HVM domU. Inside that HVM guest, a simple
custom hypervisor was used.

--
Graduate School of Information Science and Technology, The University of Tokyo
Reima Ishii
ishiir@xxxxxxxxxxxxxxxxxxx


--
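Graduate School of Information Science and Technology, The University of Tokyo
Department of Information Physics and Computing, 2nd-year master's student
Reima Ishii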

 

