
Re: [Xen-devel] POD: soft lockups in dom0 kernel



>>> On 20.01.14 at 15:39, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> On 16/01/14 11:10, Jan Beulich wrote:
>>>>> On 05.12.13 at 14:55, Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
>>> when creating a larger (> 50 GB) HVM guest with maxmem > memory, we get
>>> soft lockups from time to time.
>>>
>>> kernel: [  802.084335] BUG: soft lockup - CPU#1 stuck for 22s! [xend:31351]
>>>
>>> I tracked this down to the call of xc_domain_set_pod_target() and further
>>> p2m_pod_set_mem_target().
>>>
>>> Unfortunately I can check this only with xen-4.2.2, as I don't have a
>>> machine with enough memory for current hypervisors. But the code seems
>>> to be nearly the same.
>> While I still haven't seen a formal report of this against SLE11,
>> attached is a draft patch against the SP3 code base adding manual
>> preemption to the hypercall path of privcmd. This is only lightly
>> tested, and therefore still has a little bit of debugging code left
>> in it. Mind giving it a try (perhaps together with the patch David
>> had sent for the other issue - there may still be a need for further
>> preemption points in the IOCTL_PRIVCMD_MMAP* handling, but without
>> knowing for sure whether that matters to you I didn't want to add
>> that right away)?
>>
>> Jan
>>
> 
> In my 4.4-rc2 testing, these soft lockups are becoming more of a
> problem, especially with the construction/migration of 128GB guests.
> 
> I have been looking at doing a similar patch against mainline.
> 
> Having talked it through with David, it seems more sensible to have a
> second hypercall page, at which point in_hypercall() becomes
> in_preemptable_hypercall().
> 
> Any task (which could even be a kernel task) could use the preemptable
> page rather than the main hypercall page, and the asm code wouldn't
> need to care whether the task was in privcmd.

Of course this can be generalized, but I don't think a second
hypercall page is the answer here: You'd then also need a
second set of hypercall wrappers (_hypercall0() etc as well as
HYPERVISOR_*()), and generic library routines would need to
have a way to know which one to call.
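
Just to illustrate the duplication I mean (a simplified sketch only -
the *_preemptible names below are made up, not existing code):

/* Today every HYPERVISOR_*() wrapper is bound, via _hypercallN(), to
 * the one hypercall page, e.g. (simplified from
 * arch/x86/include/asm/xen/hypercall.h): */
static inline int
HYPERVISOR_xen_version(int cmd, void *arg)
{
        return _hypercall2(int, xen_version, cmd, arg);
}

/* A second, preemptible page would force a parallel family of
 * _hypercallN_preemptible() macros targeting that page, plus the
 * wrappers sitting on top of them: */
static inline int
HYPERVISOR_xen_version_preemptible(int cmd, void *arg)
{
        return _hypercall2_preemptible(int, xen_version, cmd, arg);
}

/* ...and a generic library routine (balloon, grant table, ...) would
 * need an extra argument or some other means of deciding which of the
 * two families it may use. */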

Therefore I think that having a per-CPU state flag (which gets cleared/
restored during interrupt handling, or - as my patch does - is honored
only when outside of atomic context) is still the more reasonable
approach.
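
Roughly, what I have in mind looks like this (an illustrative sketch
only, not my patch - all of the names below are placeholders):

#include <linux/irqflags.h>
#include <linux/percpu.h>
#include <linux/sched.h>
#include <linux/types.h>

/* Per-CPU flag: the current task is inside a preemptible hypercall. */
static DEFINE_PER_CPU(bool, xen_in_preemptible_hcall);

/* privcmd brackets its (potentially long-running) hypercall: */
static inline void xen_preemptible_hcall_begin(void)
{
        __this_cpu_write(xen_in_preemptible_hcall, true);
}

static inline void xen_preemptible_hcall_end(void)
{
        __this_cpu_write(xen_in_preemptible_hcall, false);
}

/* Invoked (with interrupts off) on the way out of the event channel
 * upcall, and only if the interrupted context wasn't atomic, so that
 * scheduling is actually permitted: */
void xen_maybe_preempt_hcall(void)
{
        if (__this_cpu_read(xen_in_preemptible_hcall) && need_resched()) {
                /* Clear the flag - we may resume on a different CPU. */
                __this_cpu_write(xen_in_preemptible_hcall, false);
                local_irq_enable();
                cond_resched();
                local_irq_disable();
                __this_cpu_write(xen_in_preemptible_hcall, true);
        }
}

privcmd's hypercall ioctl would then simply bracket the actual
HYPERVISOR call with the begin/end helpers above.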

> This would avoid having to maintain extra state to identify whether the
> hypercall was preemptable, and would avoid modification to
> evtchn_do_upcall().

I'd be curious to see how you avoid modifying evtchn_do_upcall()
(other than by adding what I added there at the assembly call
site) - I especially don't see where your in_preemptable_hypercall()
would get invoked.
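
Whatever the mechanism, the check has to sit somewhere between the
upcall handler and the return to the interrupted hypercall, i.e.
conceptually (again just a sketch, re-using the placeholder names from
above - the wrapper below doesn't exist anywhere):

#include <linux/ptrace.h>
#include <xen/events.h>

void xen_maybe_preempt_hcall(void);     /* the check sketched earlier */

/* Hypothetical tail of the hypervisor callback, whether written in C
 * or at its assembly call site: */
void xen_hypervisor_callback_tail(struct pt_regs *regs)
{
        xen_evtchn_do_upcall(regs);     /* existing upcall handler */
        xen_maybe_preempt_hcall();      /* preemption check */
}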

Jan

