Xen project Mailing List

Re: [Xen-devel] GPF in mcheck_init() when booting xen-unstable on VMware ESX 5.1

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

From: "Aravindh Puthiyaparambil (aravindp)" <aravindp@xxxxxxxxx>

Date: Fri, 31 May 2013 19:40:17 +0000

Accept-language: en-US

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>

Delivery-date: Fri, 31 May 2013 19:42:01 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Thread-index: Ac5eM+oPQd6HE1iRQbOQYkvn9nwW9QAK486AAAowHmA=

Thread-topic: [Xen-devel] GPF in mcheck_init() when booting xen-unstable on VMware ESX 5.1

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> Sent: Friday, May 31, 2013 12:32 PM
> To: Aravindh Puthiyaparambil (aravindp)
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] GPF in mcheck_init() when booting xen-unstable on
> VMware ESX 5.1
>
> On 31/05/13 20:19, Aravindh Puthiyaparambil (aravindp) wrote:
> > I am trying to boot xen-unstable
> > (9204bc654562976c7cdebf21c6b5013f6e3057b3) on VMware ESX 5.1 and
> > Workstation 9. I have enabled "Virtualize Intel VT-x/EPT" option. I am
> > seeing the following GPF during boot:
> >
> > (XEN) mce_intel.c:717: MCA Capability: BCAST 1 SER 0 CMCI 0 firstbank 0 extended MCE MSR 0
> > (XEN) Intel machine check reporting enabled
> > (XEN) ----[ Xen-4.3-unstable x86_64 debug=y Not tainted ]----
> > (XEN) CPU: 0
> > (XEN) RIP: e008:[<ffff82c4c01aa71e>] mcheck_init+0x38a/0x45d
> > (XEN) RFLAGS: 0000000000010087 CONTEXT: hypervisor
> > (XEN) rax: 0000000000000000 rbx: ffff82c4c026ca80 rcx: 0000000000000000
> > (XEN) rdx: ffff83001d6b2fe0 rsi: bad0bad0bad0bad0 rdi: bad0bad0bad0bad0
> > (XEN) rbp: ffff82c4c02cfe08 rsp: ffff82c4c02cfde8 r8: ffff8300000b8f00
> > (XEN) r9: 0000000000000010 r10: bad0bad0bad0bad0 r11: 0000000000000010
> > (XEN) r12: ffff83001ffd9fe0 r13: 0000000000000000 r14: ffff82c4c02c8000
> > (XEN) r15: ffff83000008efb0 cr0: 000000008005003b cr4: 00000000000400f0
> > (XEN) cr3: 000000001fc7b000 cr2: 0000000000000000
> > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> > (XEN) Xen stack trace from rsp=ffff82c4c02cfde8:
> > (XEN) 0000000000000000 ffff82c4c026ca80 0000000080000008 00000000ffffffff
> > (XEN) ffff82c4c02cfe48 ffff82c4c01a7356 1fabfbff000206a7 0000000096ba2223
> > (XEN) ffff83001ffd9820 0000000000000002 ffff83001ffd9820 ffff82c4c02c8000
> > (XEN) ffff82c4c02cff08 ffff82c4c02a4536 0000000200000000 0000000000000000
> > (XEN) ffff83000008ed90 00000000011fb000 0000000000100000 ffff83000008efb0
> > (XEN) 0000000000000000 ffff83000051bc90 ffff830000000010 ffff8300ffffff00
> > (XEN) ffff83000008ef40 ffff82c400000001 0000000800000000 000000010000006e
> > (XEN) 0000000000000003 00000000000002f8 0000000000000000 0000000000000000
> > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN) 0000000000000000 ffff82c4c01000b5 0000000000000000 0000000000000000
> > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > (XEN) ffff83001d6b0000 0000000000000000 0000000000000000
> > (XEN) Xen call trace:
> > (XEN) [<ffff82c4c01aa71e>] mcheck_init+0x38a/0x45d
> > (XEN) [<ffff82c4c01a7356>] identify_cpu+0x2b4/0x2d0
> > (XEN) [<ffff82c4c02a4536>] __start_xen+0x26e9/0x2c98
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) GENERAL PROTECTION FAULT
> > (XEN) [error_code=0000]
> > (XEN) ****************************************
> > (XEN)
> >
> > I have narrowed it down to line 631 in set_poll_bankmask():
> > bitmap_copy(mb->bank_map, mca_allbanks->bank_map, nr_mce_banks);
> >
> > What is happening is that in mca_cap_init(), nr_mce_banks is being set
> > to 0. This causes the allocation of bank_map to be set to ZERO_BLOCK_PTR
> > which is the return value for zero-size allocation by
> > xzalloc_array()/_xmalloc(). This results in the bitmap_copy() to fail
> > disastrously. Is it correct to disable MCE if nr_mce_banks is 0? Or say
> > this is a quirk of the VMware virtual platform and run with mce=0? Linux
> > is to be able to handle this gracefully.
> >
> > Another question I have is that callers of xzalloc_array() and friends
> > only check for a NULL return as an error. So what about cases like the
> > one above which fell through the cracks because the return value is
> > ZERO_BLOCK_PTR? Should they all be checking for ZERO_BLOCK_PTR too or
> > ensuring that no calls are made with zero size allocations?
> >
> > Thanks,
> > Aravindh
>
> ZERO_BLOCK_PTR is specifically distinguished from NULL (As the comment
> beside it says).
>
> The real bug is calling **alloc() with 0 as a parameter.
>
> I would say that nr_mce_banks of 0 should result in an implicit mce=0.
> You certainly cant sensibly use MCEs with 0 banks to play with.

OK. I will submit a patch.

Aravindh

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
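
Editor's illustration (not part of the original message, and not the patch that was submitted): the failure mode discussed in this thread, and the guard Andrew suggests, can be sketched in standalone C. Everything prefixed `sketch_` or `SKETCH_` is hypothetical and exists only for this example; the sentinel value, the return-code convention, and the use of `calloc()` are assumptions, not Xen's actual code. The point is that a NULL-only check lets a zero-size sentinel through, so the caller has to rule out a bank count of 0 before allocating.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sentinel for zero-size allocations: non-NULL, but never
 * valid to dereference. The exact value is an assumption of this sketch. */
#define SKETCH_ZERO_BLOCK_PTR ((void *)-1L)

/* Hypothetical allocator mimicking the behaviour described in the thread:
 * a zero-size request returns the sentinel rather than NULL. */
static void *sketch_zalloc_array(size_t num, size_t size)
{
    if (num == 0 || size == 0)
        return SKETCH_ZERO_BLOCK_PTR;
    return calloc(num, size); /* zero-filled, NULL on failure */
}

/* Free helper that tolerates both NULL and the sentinel. */
static void sketch_free(void *p)
{
    if (p != NULL && p != SKETCH_ZERO_BLOCK_PTR)
        free(p);
}

/* Number of unsigned longs needed to hold nbits bits. */
static size_t sketch_bits_to_longs(unsigned int nbits)
{
    size_t bits_per_long = CHAR_BIT * sizeof(unsigned long);
    return (nbits + bits_per_long - 1) / bits_per_long;
}

/* Hypothetical setup routine. Returns:
 *    1  bank map allocated, machine check handling can proceed
 *    0  no banks reported, treat as an implicit "mce=0"
 *   -1  allocation failure
 * Rejecting nr_banks == 0 up front means the allocator is never asked
 * for zero bytes, so the sentinel never reaches a bitmap copy. */
static int sketch_mce_setup(unsigned int nr_banks, unsigned long **map_out)
{
    unsigned long *map;

    *map_out = NULL;
    if (nr_banks == 0)
        return 0;

    map = sketch_zalloc_array(sketch_bits_to_longs(nr_banks), sizeof(*map));
    if (map == NULL)
        return -1;

    *map_out = map;
    return 1;
}
```

A caller would invoke `sketch_mce_setup(nr_banks, &map)` once during initialisation and skip all bank polling when it returns 0. Checking every allocation site for the sentinel would also work, but guarding the count at the single point where it is read keeps the NULL-only convention intact for all other callers.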
