Re: [Xen-devel] CONFIG_SCRUB_DEBUG=y + arm64 + livepatch = Xen BUG at page_alloc.c:738
On 09/13/2017 11:32 AM, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 12, 2017 at 09:19:23PM -0400, Boris Ostrovsky wrote:
>>
>> On 09/12/2017 08:01 PM, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Sep 11, 2017 at 08:45:02PM -0400, Boris Ostrovsky wrote:
>>>>
>>>> On 09/11/2017 07:55 PM, Konrad Rzeszutek Wilk wrote:
>>>>> Hey,
>>>>>
>>>>> I've only been able to reproduce this on ARM64 (trying right now ARM32
>>>>> as well), and not on x86.
>>>>>
>>>>> If I compile Xen without CONFIG_SCRUB_DEBUG it works great. But if
>>>>> enable it and try to load a livepatch it blows up in page_alloc.c:738
>>>>>
>>>>> This is with origin/staging (d0291f3391)
>>>> Can you still reproduce this if you revert 307c3be?
>>> Sadly yes - it still crashes. I didn't capture the serial output.
>>>
>>> I honestly think the issue is that on ARM64 the "sleep" loop does not
>>> wake up as often as on x86 (CC-ing Dariof who I believe observed this
>>> with Credit2 and the wakeup.. something) - maybe he remembers the
>>> details. Anyhow my theory is that the pages are not scrubbed at all
>>> when they go in the idle loop as once it goes to sleep - it stays there.
>>
>> There is no (well, should not be) any timing dependencies in how/whether
>> pages are scrubbed. If a page doesn't get scrubbed because someone didn't
>> wake up then it should be scrubbed in alloc_heap_pages(). So in this case
>> the page is thought to be clean (_PGC_need_scrub is not set), but it is not.
>>
>> Have you tried running a guest (or two), rebooting in a loop?
> No. I just cold-booted it and tried to livepatch.
>> Another thing to try is to set need_scrub to true in free_heap_pages().
> Magic!
>
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index dbad1e1ca0..9303eb4517 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1308,6 +1308,7 @@ static void free_heap_pages(
>      ASSERT(node >= 0);
>  
>      spin_lock(&heap_lock);
> +    need_scrub = true;
>  
>      for ( i = 0; i < (1 << order); i++ )
>      {
>
> Fixes it ! :-)

Well, that's not a fix. This eliminates the case that something in
ARM-specific code (which I haven't tested) accidentally clears
_PGC_need_scrub.

OK, I think I know what the problem is. You are using
CONFIG_SEPARATE_XENHEAP, are you?

-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
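
For readers following the thread, the failure mode discussed above (a page
treated as clean because its need-scrub flag is not set, while its contents
are in fact dirty) can be illustrated with a small standalone model. This is
only a sketch under stated assumptions, not Xen's code: struct page,
SCRUB_BYTE, free_page(), alloc_page() and page_is_scrubbed() are
hypothetical names invented for illustration, and the bool flag merely
stands in for _PGC_need_scrub.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE  4096
#define SCRUB_BYTE 0xc2          /* assumed "clean" fill pattern (hypothetical) */

/* Hypothetical page model; this is not Xen's struct page_info. */
struct page {
    bool    need_scrub;          /* stands in for _PGC_need_scrub */
    uint8_t data[PAGE_SIZE];
};

/* Fill the page with the scrub pattern and mark it clean. */
static void scrub_page(struct page *pg)
{
    memset(pg->data, SCRUB_BYTE, PAGE_SIZE);
    pg->need_scrub = false;
}

/* Free path: the caller states whether the page still needs scrubbing. */
static void free_page(struct page *pg, bool need_scrub)
{
    pg->need_scrub = need_scrub;
}

/* Debug check: true only if every byte holds the scrub pattern. */
static bool page_is_scrubbed(const struct page *pg)
{
    for (size_t i = 0; i < PAGE_SIZE; i++)
        if (pg->data[i] != SCRUB_BYTE)
            return false;
    return true;
}

/*
 * Allocation path: scrub lazily if the flag says so, then verify.
 * Returns false in the situation where a scrub-debug build would hit
 * its BUG: the flag claimed "clean" but the contents say otherwise.
 */
static bool alloc_page(struct page *pg)
{
    if (pg->need_scrub)
        scrub_page(pg);
    return page_is_scrubbed(pg);
}
```

In this model, freeing a dirty page with need_scrub == false makes the next
alloc_page() fail its check, whereas forcing need_scrub to true on every
free (as the test patch above does) always passes. That is why the patch
hides the symptom without showing where the flag was lost or never set.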