Re: [Xen-devel] POD: soft lockups in dom0 kernel
On 06/12/13 10:00, Jan Beulich wrote:
>>>> On 05.12.13 at 14:55, Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
>> when creating a bigger (> 50 GB) HVM guest with maxmem > memory we get
>> softlockups from time to time.
>>
>> kernel: [ 802.084335] BUG: soft lockup - CPU#1 stuck for 22s! [xend:31351]
>>
>> I tracked this down to the call of xc_domain_set_pod_target() and further
>> p2m_pod_set_mem_target().
>>
>> Unfortunately I can check this only with xen-4.2.2 as I don't have a machine
>> with enough memory for current hypervisors. But it seems the code is nearly
>> the same.
>>
>> My suggestion would be to do the 'pod set target' in the function
>> xc_domain_set_pod_target() in chunks of maybe 1GB to give the dom0 scheduler
>> a chance to run.
>> As this is not performance critical it should not be a problem.
>
> This is a broader problem: There are more long running hypercalls
> than just the one setting the POD target. While a kernel built with
> CONFIG_PREEMPT ought to have no issue with this (as the
> hypervisor internal preemption will always exit back to the guest,
> thus allowing interrupts to be processed) as long as such
> hypercalls aren't invoked with preemption disabled,
> non-preemptable kernels (the suggested default for servers) have -
> afaict - no way to deal with this.
>
> However, as long as interrupts and softirqs can get serviced by
> the kernel (which they can as long as they weren't disabled upon
> invocation of the hypercall), that may also be a mostly cosmetic
> problem (in that the soft lockup is being reported) as long as no
> real time like guarantees are required (which if they were would
> be sort of contradictory to the kernel being non-preemptable),
> i.e. other tasks may get starved for some time, but OS health
> shouldn't be impacted.
>
> Hence I wonder whether it wouldn't make sense to simply
> suppress the soft lockup detection at least across privcmd
> invoked hypercalls - Cc-ing upstream Linux maintainers to see if
> they have an opinion or thoughts towards a proper solution.

We do not want to disable the soft lockup detection here as it has found
a bug. We can't have tasks that are unschedulable for minutes as it
would only take a handful of such tasks to hose the system.

We should put an explicit preemption point in. This will fix it for the
CONFIG_PREEMPT_VOLUNTARY case which I think is the most common
configuration. Or perhaps this should even be a cond_resched() call to
fix it for fully non-preemptible kernels as well.

David

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
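
For illustration only, the two ideas in this thread (splitting the work
into chunks of about 1GB, and giving the scheduler an explicit chance to
run between pieces) can be combined in a small userspace sketch. This is
not libxc or kernel code. The names set_target_chunked(), step_fn,
record_step() and struct step_log are hypothetical, the 4 KiB page size
is an assumption, and sched_yield() merely stands in for a kernel
preemption point such as cond_resched(). Whether stepping through
intermediate POD targets is acceptable to the hypervisor is not
established anywhere in this thread.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed chunk size: 1 GiB expressed in 4 KiB pages (2^18 pages). */
#define CHUNK_PAGES ((uint64_t)1 << 18)

/* Hypothetical callback standing in for one bounded "set target" request. */
typedef int (*step_fn)(uint64_t target_pages, void *ctx);

/* Illustrative bookkeeping so the chunking can be observed. */
struct step_log {
    unsigned calls;        /* number of bounded requests issued */
    uint64_t last_target;  /* target passed to the most recent request */
};

/* Example callback: records each intermediate target and reports success. */
static int record_step(uint64_t target_pages, void *ctx)
{
    struct step_log *log = ctx;
    log->calls++;
    log->last_target = target_pages;
    return 0;
}

/*
 * Move from `current` to `target` (both in pages) in steps of at most
 * CHUNK_PAGES, yielding the CPU after every step. Works in both
 * directions (growing or shrinking the target). Returns 0 on success or
 * the first non-zero value returned by `step`.
 */
static int set_target_chunked(uint64_t current, uint64_t target,
                              step_fn step, void *ctx)
{
    assert(step != NULL);

    while (current != target) {
        uint64_t delta = current < target ? target - current
                                          : current - target;
        if (delta > CHUNK_PAGES)
            delta = CHUNK_PAGES;
        current = current < target ? current + delta : current - delta;

        int rc = step(current, ctx);
        if (rc != 0)
            return rc;

        /* Userspace analogue of an explicit preemption point. */
        sched_yield();
    }
    return 0;
}
```

Called as set_target_chunked(0, 50 * CHUNK_PAGES, record_step, &log) with
a zero-initialised log, the callback runs 50 times and the last target it
sees is the final one. Each call is bounded, so no single step holds the
CPU for the whole 50 GB range.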