
Re: [Xen-devel] Nested virtualization off VMware vSphere 6.0 with EL6 guests crashes on Xen 4.6



On Fri, Feb 05, 2016 at 03:33:44AM -0700, Jan Beulich wrote:
> >>> On 04.02.16 at 19:36, <konrad.wilk@xxxxxxxxxx> wrote:
> > (XEN) nvmx_handle_vmwrite 1: IO_BITMAP_A(2000)[0=ffffffffffffffff]
> > (XEN) nvmx_handle_vmwrite 0: IO_BITMAP_A(2000)[0=ffffffffffffffff]
> > (XEN) nvmx_handle_vmwrite 1: IO_BITMAP_B(2002)[0=ffffffffffffffff]
> > (XEN) nvmx_handle_vmwrite 2: IO_BITMAP_A(2000)[0=ffffffffffffffff]
> > (XEN) nvmx_handle_vmwrite 1: VIRTUAL_APIC_PAGE_ADDR(2012)[0=ffffffffffffffff]
> > (XEN) nvmx_handle_vmwrite 2: IO_BITMAP_B(2002)[0=ffffffffffffffff]
> > (XEN) nvmx_handle_vmwrite 1: (2006)[0=ffffffffffffffff]
> > (XEN) nvmx_handle_vmwrite 2: VIRTUAL_APIC_PAGE_ADDR(2012)[0=ffffffffffffffff]
> > (XEN) nvmx_handle_vmwrite 1: VM_EXIT_MSR_LOAD_ADDR(2008)[0=ffffffffffffffff]
> > (XEN) nvmx_handle_vmwrite 3: IO_BITMAP_A(2000)[0=ffffffffffffffff]
> > (XEN) nvmx_handle_vmwrite 3: IO_BITMAP_B(2002)[0=ffffffffffffffff]
> > (XEN) nvmx_handle_vmwrite 2: MSR_BITMAP(2004)[0=ffffffffffffffff]
> > (XEN) nvmx_handle_vmwrite 1: MSR_BITMAP(2004)[0=ffffffffffffffff]
> > (XEN) nvmx_handle_vmwrite 0: MSR_BITMAP(2004)[0=ffffffffffffffff]
> > (XEN) nvmx_handle_vmwrite 3: (2006)[0=ffffffffffffffff]
> > (XEN) nvmx_handle_vmwrite 3: VM_EXIT_MSR_LOAD_ADDR(2008)[0=ffffffffffffffff]
> > (XEN) nvmx_handle_vmwrite 3: MSR_BITMAP(2004)[0=ffffffffffffffff]
> 
> So there's a whole lot of "interesting" writes of all ones, and indeed
> VIRTUAL_APIC_PAGE_ADDR is among them, and the code doesn't
> handle that case (nor the equivalent for APIC_ACCESS_ADDR).
> What's odd though is that the writes are for vCPU 1 and 2, while
> the crash is on vCPU 3 (it would of course help if the guest had as
> few vCPU-s as possible without making the issue disappear). While
> you have circumvented the ASSERT() you've originally hit, the log
> messages you've added there don't appear anywhere, which is
> clearly confusing, so I wonder what other unintended effects your
> debugging code has (there's clearly an uninitialized variable issue
> in your additions to vmx_vmexit_handler(), but that shouldn't
> matter here, albeit it should have caused a build failure, making me
> suspect the patch to be stale).
> 
> Oddly enough the various bitmap field VMWRITEs above should all
> fail, yet the guest appears to recover from (ignore?) these
> failures. (From all I can tell we're prone to NULL dereferences due
> to that at least in _shadow_io_bitmap().)
> 
> > (XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest state (4).
> 
> 4 means invalid VMCS link pointer - interesting.
>

Hey Jan,

I hadn't been able to look at this for quite a while. A couple of folks have
shown interest in looking at this, so I am CC-ing them.

For folks that are new, it may also be worth looking at:
http://www.gossamer-threads.com/lists/xen/devel/413285?page=last
which has the full thread.
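
For anyone catching up on the quoted log above, here is a rough sketch of how
the all-ones VMWRITEs could be tabulated per vCPU. It assumes the raw
(unwrapped) console output, that the lines look exactly like the quoted
nvmx_handle_vmwrite debug lines (which come from local debugging code, not
stock Xen), and that the number after the function name is the vCPU id. The
function name all_ones_writes and the sample text are made up for illustration.

```python
import re
from collections import defaultdict

# Matches e.g. "nvmx_handle_vmwrite 1: IO_BITMAP_A(2000)[0=ffffffffffffffff]".
# The field name may be empty, as in "(2006)[0=...]".
LINE_RE = re.compile(
    r"nvmx_handle_vmwrite (\d+): (\w*)\(([0-9a-f]+)\)\[0=([0-9a-f]+)\]"
)
ALL_ONES = 0xFFFFFFFFFFFFFFFF


def all_ones_writes(log_text):
    """Return {vcpu: set of field names} for writes whose value is all ones."""
    by_vcpu = defaultdict(set)
    for match in LINE_RE.finditer(log_text):
        vcpu, name, encoding, value = match.groups()
        if int(value, 16) == ALL_ONES:
            # Fall back to the hex encoding when the log printed no name.
            by_vcpu[int(vcpu)].add(name or "0x" + encoding)
    return dict(by_vcpu)


# Hypothetical sample, shaped like the quoted log lines.
sample = """\
(XEN) nvmx_handle_vmwrite 1: VIRTUAL_APIC_PAGE_ADDR(2012)[0=ffffffffffffffff]
(XEN) nvmx_handle_vmwrite 2: VIRTUAL_APIC_PAGE_ADDR(2012)[0=ffffffffffffffff]
(XEN) nvmx_handle_vmwrite 3: (2006)[0=ffffffffffffffff]
"""
summary = all_ones_writes(sample)
for vcpu in sorted(summary):
    print(vcpu, sorted(summary[vcpu]))
```

Lines that the mail client wrapped after the colon will not match, so feed it
the serial console capture rather than the quoted mail text.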

Here are the instructions on how to reproduce it
(this is with Xen 4.7, 'staging-4.7'):

2) Download VMware ESX and install it:
[root@localhost ~]# more vmware.xm
memory=8192
maxvcpus = 4
name = "VMWARE"
vif = [ 'mac=00:0f:4b:00:00:85,bridge=switch,model=e1000' ]
disk= ['phy:/dev/nested_guests/VMWare_ESX,hda,w']
#,'file:/mnt/iso/VMware-VMvisor-Installer-6.0.0.update02-3620759.x86_64.iso,hdc:cdrom,r']
#boot="dn"
kernel = "/usr/lib/xen/boot/hvmloader"
builder='hvm'
serial='pty'
vcpus = 4
vnc=1
vnclisten="0.0.0.0"
usb=1
nestedhvm=1

3) Let the installation finish and wait for the guest to reboot.
4) Enable SSH on VMware ESXi: press F2 on the guest console, log in, select
'Troubleshooting Options', and enable both 'Enable ESXi Shell' and 'Enable SSH'.
5) Create a guest using the VMware ESXi Client (you need to use Windows for
that). I picked the simplest option and went ahead with FreeBSD (you can also
use Linux).
6) To download the guest ISO, SSH into the ESXi host and run:
# cd /vmfs/volumes/datastore1
# wget http://ftp.freebsd.org/pub/FreeBSD/releases/ISO-IMAGES/11.0/FreeBSD-11.0-RELEASE-amd64-disc1.iso

7) In the VMware ESXi client hook up the 'CD' to the ISO.

8) Hit Start and get greeted with: "You are running VMware ESX through an
incompatible hypervisor. You cannot power on a virtual machine until this
hypervisor is disabled". Go to
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2108724

which just amounts to adding an attribute to the .vmx file, so log back in to
the VMware ESXi host and:
[root@g-osstest:~] vi `find / -name '*.vmx'`

and add:
vmx.allowNested="TRUE"      

9) Start the guest up again in VMware and be greeted with this splash screen:

(XEN) ----[ Xen-4.7.1-pre  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    2
(XEN) RIP:    e008:[<ffff82d08027cd71>] put_page+0x1/0xd0
(XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d1v0)
(XEN) rax: 0000000000002012   rbx: ffff84802eddb000   rcx: 557f000000000000
(XEN) rdx: 555557f000000000   rsi: 000ffffff8000000   rdi: 0000000000000000
(XEN) rbp: 0000000000000000   rsp: ffff8488bf06fe98   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: 0000000000000014   r11: 0000000000000000
(XEN) r12: ffff8488bf06ff18   r13: 0000000000000d00   r14: 0000000000000000
(XEN) r15: ffff8488464c4000   cr0: 0000000080050033   cr4: 00000000003526e0
(XEN) cr3: 00000008464ab000   cr2: 0000000000000010
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d08027cd71> (put_page+0x1/0xd0):
(XEN)  ff 66 0f 1f 44 00 00 53 <48> 8b 57 10 48 8d 77 10 48 89 fb 48 8d 4a ff 48
(XEN) Xen stack trace from rsp=ffff8488bf06fe98:
(XEN)    ffff84802eddb000 ffff82d0802faba1 ffff8488bf06ff18 ffff82d0802f4af9
(XEN)    ffff8488bf06ff18 0000000000000000 ffff82d080613200 00000004802fb799
(XEN)    ffff8488bf06ff18 ffff84802eddb000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffff82d0802fc186
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    00000000078bfbff 0000000000000000 0000000000000000 000200fa00000000
(XEN)    fffffffffc4b3420 0000000000000000 0000000000040046 fffffffffc607f00
(XEN)    0000000000000000 f9034c0b7e7cdfdf b113252c5e7bdbbf 31379d5bde78bbdd
(XEN)    7043c568fe7a51b5 384b956900000002 ffff84802eddb000 000001b83ea37880
(XEN)    00000000003526e0
(XEN) Xen call trace:
(XEN)    [<ffff82d08027cd71>] put_page+0x1/0xd0
(XEN)    [<ffff82d0802faba1>] vvmx.c#virtual_vmentry+0x261/0xdf0
(XEN)    [<ffff82d0802f4af9>] vmx_vmenter_helper+0xd9/0x260
(XEN)    [<ffff82d0802fc186>] vmx_asm_vmexit_handler+0x46/0x110
(XEN) 
(XEN) Pagetable walk from 0000000000000010:
(XEN)  L4[0x000] = 0000000000000000 ffffffffffffffff
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000010
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...
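
As a sanity check on the dump above, the bytes at the faulting RIP
(<48> 8b 57 10) can be decoded by hand. The sketch below handles only that one
instruction form (REX.W 8B /r with an 8-bit displacement); decode_mov_load is a
made-up helper, not a general disassembler. With rdi=0 from the register dump
it gives a load from address 0x10, which matches cr2 and the reported faulting
linear address. That would be consistent with put_page() being called with a
NULL page pointer from virtual_vmentry(), though I have not confirmed that is
what happens.

```python
# 64-bit register names indexed by their 3-bit ModRM encoding (no REX.R/B).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load(code):
    """Decode 'REX.W 8B modrm disp8', i.e. mov r64, [base + disp8].

    Returns (dest_reg, base_reg, displacement); raises ValueError otherwise.
    """
    rex, opcode, modrm, disp = code[0], code[1], code[2], code[3]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W mov r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the disp8 form without a SIB byte is handled")
    disp8 = disp - 256 if disp >= 128 else disp  # sign-extend the displacement
    return REGS[reg], REGS[rm], disp8


# Bytes at the faulting RIP, copied from the "Xen code around" line.
dest, base, disp = decode_mov_load(bytes.fromhex("488b5710"))

# Value taken from the register dump (rdi: 0000000000000000).
registers = {"rdi": 0x0}
fault_address = registers[base] + disp
print(f"mov {dest}, [{base}+{disp:#x}] -> access at {fault_address:#x}")
```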
