Re: [Xen-devel] POD: soft lockups in dom0 kernel
>>> On 20.01.14 at 15:39, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> On 16/01/14 11:10, Jan Beulich wrote:
>>>>> On 05.12.13 at 14:55, Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
>>> when creating a bigger (> 50 GB) HVM guest with maxmem > memory we get
>>> soft lockups from time to time.
>>>
>>> kernel: [ 802.084335] BUG: soft lockup - CPU#1 stuck for 22s! [xend:31351]
>>>
>>> I tracked this down to the call of xc_domain_set_pod_target() and further
>>> p2m_pod_set_mem_target().
>>>
>>> Unfortunately I can check this only with xen-4.2.2 as I don't have a machine
>>> with enough memory for current hypervisors. But it seems the code is nearly
>>> the same.
>> While I still haven't seen a formal report of this against SLE11 yet,
>> attached is a draft patch against the SP3 code base adding manual
>> preemption to the hypercall path of privcmd. This is only lightly
>> tested, and therefore has a little bit of debugging code still left in
>> there. Mind giving this a try (perhaps together with the patch
>> David had sent for the other issue - there may still be a need for
>> further preemption points in the IOCTL_PRIVCMD_MMAP*
>> handling, but without knowing for sure whether that matters to
>> you I didn't want to add this right away)?
>>
>> Jan
>>
>
> With my 4.4-rc2 testing, these soft lockups are becoming more of a
> problem, especially with construction/migration of 128GB guests.
>
> I have been looking at doing a similar patch against mainline.
>
> Having talked it through with David, it seems more sensible to have a
> second hypercall page, at which point in_hypercall() becomes
> in_preemptable_hypercall().
>
> Any task (which could even be kernel tasks) could use the preemptable
> page, rather than the main hypercall page, and the asm code doesn't need
> to care whether the task was in privcmd.

Of course this can be generalized, but I don't think a second hypercall
page is the answer here: You'd then also need a second set of hypercall
wrappers (_hypercall0() etc as well as HYPERVISOR_*()), and generic
library routines would need a way to know which one to call.

Therefore I think having a per-CPU state flag (which gets cleared/restored
during interrupt handling, or - like my patch does - honored only when
outside of atomic context) is still the more reasonable approach.

> This would avoid having to maintain extra state to identify whether the
> hypercall was preemptable, and would avoid modification to
> evtchn_do_upcall().

I'd be curious to see how you avoid modifying evtchn_do_upcall() (other
than by adding what I added there at the assembly call site) - I
especially don't see where your in_preemptable_hypercall() would get
invoked.

Jan
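
For illustration, a minimal sketch of the per-CPU flag approach described
above might look like the following in dom0 kernel code. All names here
(xen_in_preemptible_hcall, privcmd_call_preemptible, xen_maybe_preempt_hcall)
are placeholders chosen for this sketch, not taken from the attached patch,
and privcmd_call() stands in for whatever hypercall helper the privcmd ioctl
handler already uses:

#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/sched.h>
#include <linux/irqflags.h>

/* Set while the current CPU is inside a hypercall issued via privcmd. */
static DEFINE_PER_CPU(bool, xen_in_preemptible_hcall);

/*
 * Wrapper the privcmd ioctl path would use instead of invoking the
 * hypercall helper directly.
 */
static inline long privcmd_call_preemptible(unsigned int call,
					    unsigned long a1, unsigned long a2,
					    unsigned long a3, unsigned long a4,
					    unsigned long a5)
{
	long ret;

	this_cpu_write(xen_in_preemptible_hcall, true);
	ret = privcmd_call(call, a1, a2, a3, a4, a5);
	this_cpu_write(xen_in_preemptible_hcall, false);

	return ret;
}

/*
 * Hook called (with interrupts off) from the event channel upcall's
 * assembly call site: reschedule if the interrupted context was a
 * preemptible hypercall and we are not in atomic context.
 */
void xen_maybe_preempt_hcall(void)
{
	if (this_cpu_read(xen_in_preemptible_hcall) &&
	    !in_atomic() && need_resched()) {
		/* Clear the flag: we may be rescheduled on a different CPU. */
		this_cpu_write(xen_in_preemptible_hcall, false);
		local_irq_enable();
		cond_resched();
		local_irq_disable();
		this_cpu_write(xen_in_preemptible_hcall, true);
	}
}

Whether the flag is cleared/restored around interrupt handling or simply
ignored in atomic context (as in the sketch) is the design choice Jan
mentions; either way no second hypercall page or duplicate set of
hypercall wrappers is needed.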