Re: [Xen-devel] Failed vm entry with heavy use of emulator

On 05/01/16 12:05, Tamas K Lengyel wrote:

On Tue, Jan 5, 2016 at 12:56 PM, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
On 05/01/16 11:49, Tamas K Lengyel wrote:
Hi all,
I've been stress-testing the built-in emulator using the vm_event response flag VM_EVENT_FLAG_EMULATE. In the test I've made all pages non-readable by default and have every trapped instruction emulated. My test code can be found at https://github.com/tklengyel/xen/compare/read_emul?expand=1.

The following crash is reproducible and has been verified by Razvan as well.

(XEN) p2m.c:1726:d1v0 calling mem_access_emulate_one, kind 0
(XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest state (0).
(XEN) ************* VMCS Area **************
(XEN) *** Guest State ***
(XEN) CR0: actual=0x000000008001003b, shadow=0x000000008001003b, gh_mask=ffffffffffffffff
(XEN) CR4: actual=0x00000000000426f9, shadow=0x00000000000406f9, gh_mask=ffffffffffffffff
(XEN) CR3 = 0x0000000000185000
(XEN) PDPTE0 = 0x0000000000186001  PDPTE1 = 0x0000000000187001
(XEN) PDPTE2 = 0x0000000000188001  PDPTE3 = 0x0000000000189001
(XEN) RSP = 0x000000008276dc28 (0x000000008276dc28)  RIP = 0x00000000826bce1c (0x00000000826bce1c)
(XEN) RFLAGS=0x00000002 (0x00000002)  DR7 = 0x0000000000000400
(XEN) Sysenter RSP=000000008078b000 CS:RIP=0008:00000000826830c0
(XEN) sel attr limit base
(XEN)   CS: 0008 0c09b ffffffff 0000000000000000
(XEN)   DS: 0023 0c0f3 ffffffff 0000000000000000
(XEN)   SS: 0010 0c093 ffffffff 0000000000000000
(XEN)   ES: 0023 0c0f3 ffffffff 0000000000000000
(XEN)   FS: 0030 04093 00003748 0000000082770c00
(XEN)   GS: 0000 1c000 ffffffff 0000000000000000
(XEN) GDTR:            000003ff 0000000080b95000
(XEN) LDTR: 0000 1c000 ffffffff 0000000000000000
(XEN) IDTR:            000007ff 0000000080b95400
(XEN)   TR: 0028 0008b 000020ab 00000000801da000
(XEN) EFER = 0x0000000000000000  PAT = 0x0007010600070106
(XEN) PreemptionTimer = 0x00000000  SM Base = 0x00000000
(XEN) DebugCtl = 0x0000000000000000  DebugExceptions = 0x0000000000000000
(XEN) Interruptibility = 00000000  ActivityState = 00000000
(XEN) *** Host State ***
(XEN) RIP = 0xffff82d0802075c0 (vmx_asm_vmexit_handler)  RSP = 0xffff830430d97f90
(XEN) CS=e008 SS=0000 DS=0000 ES=0000 FS=0000 GS=0000 TR=e040
(XEN) FSBase=0000000000000000 GSBase=0000000000000000 TRBase=ffff830430d9bc00
(XEN) GDTBase=ffff830430d8c000 IDTBase=ffff830430d98000
(XEN) CR0=000000008005003b CR3=00000004136d0000 CR4=00000000000426e0
(XEN) Sysenter RSP=ffff830430d97fc0 CS:RIP=e008:ffff82d08024db30
(XEN) EFER = 0x0000000000000000  PAT = 0x0000050100070406
(XEN) *** Control State ***
(XEN) PinBased=0000003f CPUBased=b6a075fa SecondaryExec=000000eb
(XEN) EntryControls=000051ff ExitControls=000fefff
(XEN) ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
(XEN) VMEntry: intr_info=800000d1 errcode=00000000 ilen=00000000
(XEN) VMExit: intr_info=00000000 errcode=00000000 ilen=00000003
(XEN)         reason=80000021 qualification=0000000000000000
(XEN) IDTVectoring: info=800000d1 errcode=00000000
(XEN) TSC Offset = 0x0000004ed9c86354
(XEN) TPR Threshold = 0x00  PostedIntrVec = 0x00
(XEN) EPT pointer = 0x000000041124e01e EPTP index = 0x0000
(XEN) Virtual processor ID = 0x0011 VMfunc controls = 0000000000000000
(XEN) **************************************
(XEN) domain_crash called from vmx.c:2761
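
For reading the dump above offline, the exit reason and the VM-entry interruption-information field can be unpacked with a few lines of Python. This is only a sketch: the bit layouts follow the Intel SDM, while the function names are made up for illustration and are not part of Xen.

```python
# Bit layouts per the Intel SDM; helper names are illustrative, not Xen APIs.

def decode_exit_reason(reason):
    """Split a 32-bit VMX exit reason into its basic reason and failure flag."""
    return {
        "basic_reason": reason & 0xFFFF,           # bits 15:0
        "entry_failure": bool((reason >> 31) & 1)  # bit 31: VM-entry failure
    }

def decode_intr_info(info):
    """Split a VM-entry interruption-information field into its parts."""
    return {
        "vector": info & 0xFF,                          # bits 7:0
        "type": (info >> 8) & 0x7,                      # bits 10:8, 0 = external interrupt
        "deliver_error_code": bool((info >> 11) & 1),   # bit 11
        "valid": bool((info >> 31) & 1),                # bit 31
    }

if __name__ == "__main__":
    # Values copied from the log above.
    print(decode_exit_reason(0x80000021))
    print(decode_intr_info(0x800000D1))
```

Basic reason 33 (0x21) with bit 31 set is the SDM's "VM-entry failure due to invalid guest state", which matches the message Xen printed.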

Any tips on how to further debug this issue?

Do you have a log of the instructions emulated?

I don't. Is there an easy way to get that besides manually sprinkling debug messages around in the emulator?

Not trivially, sadly.


Has the emulator by any chance just emulated setting CR4.PAE?

Possibly, but I don't think so: the guest has already fully booted, so I would not expect it to touch that.

At a guess, I think the fault is an emulated 'mov %reg, %cr3' while in 32-bit PAE mode. The PDPTE{0..3} values look wonky.

I encountered a similar crash with the xen test framework in HAP mode with a bad %cr3 update. The VMM is expected to emulate updates to PDPTE{0..3} if writes to %cr3 are trapped. See vmx_update_guest_cr() and vmx_load_pdptrs().
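
One thing that can be checked directly from a dump is whether a PDPTE has reserved bits set, since VM entry rejects a present PDPTE with reserved bits when PAE paging and EPT are both in use. Below is a rough sketch of that check. The function name is hypothetical, and the default MAXPHYADDR of 36 is an assumption; the real width comes from CPUID leaf 0x80000008 on the host in question.

```python
# Sketch of the PAE PDPTE reserved-bit check described in the Intel SDM.
# pdpte_reserved_bits is an illustrative name; maxphyaddr=36 is an assumption.

def pdpte_reserved_bits(pdpte, maxphyaddr=36):
    """Return the reserved bits that are set in a PAE PDPTE (0 if none)."""
    if not pdpte & 1:
        return 0  # P=0: a non-present entry is not checked
    reserved = (0x3 << 1) | (0xF << 5)  # bits 2:1 and bits 8:5
    # Bits 63:MAXPHYADDR are reserved as well.
    reserved |= ((1 << 64) - 1) & ~((1 << maxphyaddr) - 1)
    return pdpte & reserved

if __name__ == "__main__":
    # PDPTE0..3 as printed in the VMCS dump earlier in this thread.
    for i, value in enumerate([0x186001, 0x187001, 0x188001, 0x189001]):
        print(f"PDPTE{i}: reserved bits set = {pdpte_reserved_bits(value):#x}")
```

This covers reserved bits only. It says nothing about whether the PDPTEs held in the VMCS still match the table the guest's %cr3 currently points at, which is the stale-value case the guess above is concerned with.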

