
Re: [Xen-devel] [xen-unstable test] 112855: regressions - trouble: blocked/broken/fail/pass

On 08/29/2017 04:07 AM, Jan Beulich wrote:
>>>> On 28.08.17 at 17:36, <boris.ostrovsky@xxxxxxxxxx> wrote:
>> On 08/28/2017 10:52 AM, Jan Beulich wrote:
>>>>>> On 28.08.17 at 16:24, <boris.ostrovsky@xxxxxxxxxx> wrote:
>>>>>> As for periodically testing process_pending_softirqs() we may still want
>>>>>> to do this in alloc_heap_pages(), even without CONFIG_SCRUB_DEBUG.
>>>>> For my taste, alloc_heap_pages() is the wrong place for such
>>>>> calls.
>>>> But the loop is in alloc_heap_pages() --- where else would you be testing?
>>> It can only reasonably be the callers of alloc_heap_pages() imo.
>>> A single call to it should never trigger the watchdog, 
>> check_one_page() is rather slow so for a large order allocation even
>> with clean heap the 'for' loop may take quite some time. Whether it
>> could trip the watchdog -- I don't know.
> If that was a problem, we'd have to think about shortening the
> loop. I stand by my assertion that nowhere down from
> alloc_heap_pages() should be any invocation of
> process_pending_softirqs() - it is simply too risky, as we don't
> know what state we're in. One thing I could imagine to do is not
> check the entire page, but (randomly?) pick a couple of locations
> to check. But first of all we really need to be clear about whether
> it's really a single alloc_heap_pages() invocation that trips the
> watchdog, or whether something can be done about it in the
> caller(s).

At least one of the crashes was from alloc_chunk()->free_heap_pages(),
i.e. not from inside alloc_heap_pages()' loop. My proposal was not
necessarily based on the specific crashes in this flight (that issue
will be addressed by the patches I sent yesterday) but was rather a
general suggestion. But I understand that calling
process_pending_softirqs() from alloc_heap_pages() may not be a great
idea.

I am somewhat puzzled, though, that I haven't seen this in my own
testing --- I was creating and destroying very large guests (> 1TB) in
parallel, so there must have been loops over high orders, and the
watchdog never went off. And my dom0s were quite large too, while the
one in this flight is only 512M.


Xen-devel mailing list
