
Re: [Xen-devel] POD: soft lockups in dom0 kernel



>>> On 20.01.14 at 15:39, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> On 16/01/14 11:10, Jan Beulich wrote:
>>>>> On 05.12.13 at 14:55, Dietmar Hahn <dietmar.hahn@xxxxxxxxxxxxxx> wrote:
>>> when creating a larger (> 50 GB) HVM guest with maxmem > memory, we get
>>> soft lockups from time to time.
>>>
>>> kernel: [  802.084335] BUG: soft lockup - CPU#1 stuck for 22s! [xend:31351]
>>>
>>> I tracked this down to the call of xc_domain_set_pod_target() and further
>>> p2m_pod_set_mem_target().
>>>
>>> Unfortunately I can check this only with xen-4.2.2, as I don't have a
>>> machine with enough memory for current hypervisors. But the code seems
>>> to be nearly the same.
>> While I still haven't seen a formal report of this against SLE11,
>> attached is a draft patch against the SP3 code base adding manual
>> preemption to the hypercall path of privcmd. This is only lightly
>> tested, and therefore still has a little bit of debugging code left
>> in it. Mind giving it a try (perhaps together with the patch David
>> had sent for the other issue - there may still be a need for further
>> preemption points in the IOCTL_PRIVCMD_MMAP* handling, but without
>> knowing for sure whether that matters to you I didn't want to add
>> that right away)?
>>
>> Jan
>>
> 
> In my 4.4-rc2 testing, these soft lockups are becoming more of a
> problem, especially with the construction/migration of 128GB guests.
> 
> I have been looking at doing a similar patch against mainline.
> 
> Having talked it through with David, it seems more sensible to have a
> second hypercall page, at which point in_hypercall() becomes
> in_preemptable_hypercall().
> 
> Any task (which could even be a kernel task) could use the preemptable
> page rather than the main hypercall page, and the asm code wouldn't
> need to care whether the task was in privcmd.

Of course this can be generalized, but I don't think a second
hypercall page is the answer here: You'd then also need a
second set of hypercall wrappers (_hypercall0() etc as well as
HYPERVISOR_*()), and generic library routines would need to
have a way to know which one to call.
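
Just to illustrate the duplication I mean (a simplified sketch only -
the *_preemptible names below are made up, not existing code):

/* Today every HYPERVISOR_*() wrapper is bound, via _hypercallN(), to
 * the one hypercall page, e.g. (simplified from
 * arch/x86/include/asm/xen/hypercall.h): */
static inline int
HYPERVISOR_xen_version(int cmd, void *arg)
{
        return _hypercall2(int, xen_version, cmd, arg);
}

/* A second, preemptible page would force a parallel family of
 * _hypercallN_preemptible() macros targeting that page, plus the
 * wrappers sitting on top of them: */
static inline int
HYPERVISOR_xen_version_preemptible(int cmd, void *arg)
{
        return _hypercall2_preemptible(int, xen_version, cmd, arg);
}

/* ...and a generic library routine (balloon, grant table, ...) would
 * need an extra argument or some other means of deciding which of the
 * two families it may use. */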

Therefore I think that having a per-CPU state flag (which gets cleared/
restored during interrupt handling, or - as my patch does - is honored
only when outside of atomic context) is still the more reasonable
approach.
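
Roughly, what I have in mind looks like this (an illustrative sketch
only, not my patch - all of the names below are placeholders):

#include <linux/irqflags.h>
#include <linux/percpu.h>
#include <linux/sched.h>
#include <linux/types.h>

/* Per-CPU flag: the current task is inside a preemptible hypercall. */
static DEFINE_PER_CPU(bool, xen_in_preemptible_hcall);

/* privcmd brackets its (potentially long-running) hypercall: */
static inline void xen_preemptible_hcall_begin(void)
{
        __this_cpu_write(xen_in_preemptible_hcall, true);
}

static inline void xen_preemptible_hcall_end(void)
{
        __this_cpu_write(xen_in_preemptible_hcall, false);
}

/* Invoked (with interrupts off) on the way out of the event channel
 * upcall, and only if the interrupted context wasn't atomic, so that
 * scheduling is actually permitted: */
void xen_maybe_preempt_hcall(void)
{
        if (__this_cpu_read(xen_in_preemptible_hcall) && need_resched()) {
                /* Clear the flag - we may resume on a different CPU. */
                __this_cpu_write(xen_in_preemptible_hcall, false);
                local_irq_enable();
                cond_resched();
                local_irq_disable();
                __this_cpu_write(xen_in_preemptible_hcall, true);
        }
}

privcmd's hypercall ioctl would then simply bracket the actual
HYPERVISOR call with the begin/end helpers above.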

> This would avoid having to maintain extra state to identify whether the
> hypercall was preemptable, and would avoid modification to
> evtchn_do_upcall().

I'd be curious to see how you avoid modifying evtchn_do_upcall()
(other than by adding what I added there at the assembly call
site) - I especially don't see where your in_preemptable_hypercall()
would get invoked.
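
Whatever the mechanism, the check has to sit somewhere between the
upcall handler and the return to the interrupted hypercall, i.e.
conceptually (again just a sketch, re-using the placeholder names from
above - the wrapper below doesn't exist anywhere):

#include <linux/ptrace.h>
#include <xen/events.h>

void xen_maybe_preempt_hcall(void);     /* the check sketched earlier */

/* Hypothetical tail of the hypervisor callback, whether written in C
 * or at its assembly call site: */
void xen_hypervisor_callback_tail(struct pt_regs *regs)
{
        xen_evtchn_do_upcall(regs);     /* existing upcall handler */
        xen_maybe_preempt_hcall();      /* preemption check */
}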

Jan

