Xen Project Mailing List

Re: [Xen-devel] Strange kernel BUG() on PV DomU boot

To: "Joanna Rutkowska" <joanna@xxxxxxxxxxxxxxxxxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Fri, 22 Jun 2012 13:38:21 +0100

Cc: Marek Marczykowski <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Fri, 22 Jun 2012 12:37:56 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

>>> On 22.06.12 at 14:26, Joanna Rutkowska <joanna@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> On 06/22/12 14:21, Joanna Rutkowska wrote:
>> Hello,
>>
>> From time to time (every several weeks or even less) I run into a
>> strange Dom0 kernel BUG() that manifests itself with the following
>> message (see the end of the message). The Dom0 and VM kernels are 3.2.7
>> pvops, and the Xen hypervisor is 4.1.2, both with only some minor,
>> irrelevant (I think) modifications for Qubes.
>>
>> The bug is very hard to reproduce, but once this BUG() starts being
>> signaled, it consistently prevents me from starting any new VMs in the
>> system (e.g. I've tried over a dozen times now, and every time the VM
>> boot fails).
>>
>> The following lines in the VM kernel are responsible for signaling the
>> BUG():
>>
>>        if (HYPERVISOR_vcpu_op(VCPUOP_initialise, cpu, ctxt))
>>                BUG();
>>
>> ...yet there is nothing in the xl dmesg that would provide more info on
>> why this hypercall fails. Ah, that's because there are no printk's in
>> the hypercall code:
>>
>>     case VCPUOP_initialise:
>>         if ( v->vcpu_info == &dummy_vcpu_info )
>>             return -EINVAL;
>>
>>         if ( (ctxt = xmalloc(struct vcpu_guest_context)) == NULL )
>>             return -ENOMEM;
>>
>>         if ( copy_from_guest(ctxt, arg, 1) )
>>         {
>>             xfree(ctxt);
>>             return -EFAULT;
>>         }
>>
>>         domain_lock(d);
>>         rc = -EEXIST;
>>         if ( !v->is_initialised )
>>             rc = boot_vcpu(d, vcpuid, ctxt);
>>         domain_unlock(d);
>>
>>         xfree(ctxt);
>>         break;
>>
>> So, looking at the above, it seems like it might be failing because
>> xmalloc() fails; however, Xen seems to have enough memory as reported by
>> xl info:
>>
>> total_memory           : 8074
>> free_memory            : 66
>> free_cpus              : 0
>>
>> Any ideas what might be the cause?
>>
>> FWIW, the actual oops message is below.
>>
>
> OK, it seems like this was an out-of-memory condition indeed, because
> once I did:
>
> xl mem-set 0 1800m
>
> and then quickly started a VM, it booted fine...
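The quoted VCPUOP_initialise path can fail with several distinct codes, but the guest's `if (...) BUG();` throws the value away. As a minimal sketch, assuming the Linux x86 errno numbering (ENOMEM = 12, EFAULT = 14, EEXIST = 17, EINVAL = 22) applies to the hypervisor's return values, the hypothetical helper `vcpu_init_failure_reason()` reinterprets the raw 64-bit %rax value as a signed return code and names the branch of the quoted code that most directly produces it. Because boot_vcpu() can return its own error codes, the mapping is a first guess, not a diagnosis.

```c
#include <stdint.h>

/* ASSUMPTION: errno numbering matches Linux x86 for these four codes. */
enum {
    XEN_ENOMEM = 12,
    XEN_EFAULT = 14,
    XEN_EEXIST = 17,
    XEN_EINVAL = 22,
};

/* Hypothetical helper: hypercalls return 0 or a negative errno in %rax.
 * Reinterpret the raw register value as signed (relies on the
 * two's-complement conversion that GCC and Clang provide) and name the
 * branch of the quoted VCPUOP_initialise code that most directly yields it.
 * boot_vcpu() may also return some of these codes, so this is a hint only. */
const char *vcpu_init_failure_reason(uint64_t rax)
{
    int64_t rc = (int64_t)rax;

    switch (rc) {
    case 0:
        return "success";
    case -XEN_EINVAL:
        return "-EINVAL: vcpu_info still points at dummy_vcpu_info";
    case -XEN_ENOMEM:
        return "-ENOMEM: xmalloc(struct vcpu_guest_context) failed";
    case -XEN_EFAULT:
        return "-EFAULT: copy_from_guest() of the context failed";
    case -XEN_EEXIST:
        return "-EEXIST: vCPU is already initialised";
    default:
        return "other error, possibly from boot_vcpu()";
    }
}
```

For example, the value 0xfffffffffffffff4 in %rax is -12, which this sketch reports as the -ENOMEM branch.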
Had you looked at the error value in %rax, you would also have seen
that it's -ENOMEM. I suppose the problem here is that a multi-page
allocation was needed, yet only single pages were available.

> Is there any proposal of how to handle out-of-memory conditions in Xen
> (like this one, as well as e.g. the SWIOTLB problem) in a more
> user-friendly way?

In 4.2, I hope we managed to remove all runtime allocations larger
than a page, so the particular situation here shouldn't arise anymore.

As to more user-friendly: what do you have in mind? An error is an
error (and converting it to a meaningful, user-visible message is the
responsibility of the entity receiving the error). In the case at
hand, printing an error message wouldn't meaningfully increase
user-friendliness, imo.

> Any recommendations regarding the preferred minimum Xen free memory, as
> reported by xl info, that should be preserved in order to ensure Xen
> runs smoothly?

In pre-4.2 Xen, there's not much you can do when memory gets fragmented
(otherwise you'd have to keep more than half the memory in the box in
the hypervisor). With multi-page runtime allocations gone, you should be
fine leaving just a minimal amount to the hypervisor.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
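Jan's point is that a large free_memory figure says nothing about contiguity. The toy model below illustrates it, assuming (as Jan supposes) that the failing allocation needed two contiguous pages and that the allocator, like a buddy allocator, hands out naturally aligned blocks of 2^order pages. It is not Xen's allocator; `count_free_pages()` and `can_alloc_order()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT Xen's heap: free_map[i] is true when page i is free. */

/* Total free pages, i.e. what a "free_memory"-style figure would count. */
size_t count_free_pages(const bool *free_map, size_t npages)
{
    size_t n = 0;
    for (size_t i = 0; i < npages; i++)
        n += free_map[i] ? 1 : 0;
    return n;
}

/* True if some naturally aligned block of 2^order pages is entirely free,
 * which is what an order-`order` buddy-style allocation would require. */
bool can_alloc_order(const bool *free_map, size_t npages, unsigned order)
{
    size_t block = (size_t)1 << order;

    for (size_t start = 0; start + block <= npages; start += block) {
        bool all_free = true;
        for (size_t i = start; i < start + block; i++) {
            if (!free_map[i]) {
                all_free = false;
                break;
            }
        }
        if (all_free)
            return true;
    }
    return false;
}
```

With every other page free, half the memory is free, yet no aligned two-page block exists, so single-page (order 0) requests succeed while an order-1 request fails. In this model, only keeping more than half of the pages free guarantees that some aligned pair is entirely free, which matches Jan's remark about pre-4.2 Xen.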

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.