
Re: [Xen-devel] PoD code killing domain before it really gets started



On Mon, Aug 6, 2012 at 3:12 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>> On 06.08.12 at 15:57, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
>> The domain indeed has 0x1e0 pages allocated, and a huge (still
>> growing number) of PoD entries. And apparently this fails so
>> rarely because it's pretty unlikely that there's not a single clear
>> page that the PoD code can select as victim, plus the Dom0
>> space code likely also only infrequently happens to kick in at
>> the wrong time.
>
> Just realized that of course it's also suspicious that there
> shouldn't be any clear page among those 480 - Dom0 scrubs
> its pages as it balloons them out (but I think ballooning isn't even
> in use there), Xen scrubs the free pages on boot, yet this
> reportedly has happened also for the very first domain ever
> created after boot. Or does the PoD code not touch the low
> 2Mb for some reason?

Hmm -- the sweep code has some fairly complicated heuristics.  Ah -- I
bet this is it: the algorithm implicitly assumes that the first sweep
will happen after the first demand-fault.  It's designed to start at
the last demand-faulted gpfn (tracked by p2m->pod.max_guest) and go
downwards.  When it reaches 0, it stops its sweep (?!) and resets to
max_guest on the next entry.  But if max_guest is still 0, it will
basically never sweep at all.

I guess there are two problems with that:
* As you've seen, apparently dom0 may access these pages before any
faults happen.
* If reclaim_single happens to start below the only zeroed page, the
guest will crash even though reclaimable memory is available.

Four ways we could fix this:
1. Remove dom0 accesses (what on earth could be looking at a
not-yet-created VM?)
2. Allocate the PoD cache before populating the p2m table
3. Make it so that some accesses fail w/o crashing the guest?  I don't
see how that's really practical.
4. Change the sweep routine so that the lower 2MiB gets swept

#2 would require us to use all PoD entries when building the p2m
table, thus addressing the mail you mentioned from 25 July*.  Given
that you don't want #1, it seems like #2 is the best option.

No matter what we do, the sweep routine for 4.2 should be re-written
to search all of memory at least once (maybe with a timeout for
watchdogs), since it's only called in an actual emergency.

Let me take a look...

 -George

* Sorry for not responding to that one; I must have missed it in my
return-from-travelling e-mail sweep.  If you CC me next time I'll be
sure to get it.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

