
Re: [Xen-devel] PoD code killing domain before it really gets started



On Mon, Aug 6, 2012 at 3:12 PM, Jan Beulich <JBeulich@xxxxxxxx> wrote:
>>>> On 06.08.12 at 15:57, "Jan Beulich" <JBeulich@xxxxxxxx> wrote:
>> The domain indeed has 0x1e0 pages allocated, and a huge (still
>> growing number) of PoD entries. And apparently this fails so
>> rarely because it's pretty unlikely that there's not a single clear
>> page that the PoD code can select as victim, plus the Dom0
>> space code likely also only infrequently happens to kick in at
>> the wrong time.
>
> Just realized that of course it's also suspicious that there
> shouldn't be any clear page among those 480 - Dom0 scrubs
> its pages as it balloons them out (but I think ballooning isn't even
> in use there), Xen scrubs the free pages on boot, yet this
> reportedly has happened also for the very first domain ever
> created after boot. Or does the PoD code not touch the low
> 2Mb for some reason?

Hmm -- the sweep code has some fairly complicated heuristics.  Ah -- I
bet this is it: the algorithm implicitly assumes that the first sweep
will happen after the first demand-fault.  It's designed to start at
the last demand-faulted gpfn (tracked by p2m->pod.max_guest) and go
downwards.  When it reaches 0, it stops its sweep (?!) and resets to
max_guest on the next entry.  But if max_guest is still 0, it will
basically never sweep at all.

I guess there are two problems with that:
* As you've seen, apparently dom0 may access these pages before any
faults happen.
* If reclaim_single happens to start below the only zeroed page, the
guest will crash even though reclaimable memory is available.

Four ways we could fix this:
1. Remove dom0 accesses (what on earth could be looking at a
not-yet-created VM?)
2. Allocate the PoD cache before populating the p2m table
3. Make it so that some accesses fail w/o crashing the guest?  I don't
see how that's really practical.
4. Change the sweep routine so that the lower 2MiB gets swept

#2 would require us to use all PoD entries when building the p2m
table, thus addressing the mail you mentioned from 25 July*.  Given
that you don't want #1, it seems like #2 is the best option.

No matter what we do, the sweep routine for 4.2 should be re-written
to search all of memory at least once (maybe with a timeout for
watchdogs), since it's only called in an actual emergency.

Let me take a look...

 -George

* Sorry for not responding to that one; I must have missed it in my
return-from-travelling e-mail sweep.  If you CC me next time I'll be
sure to get it.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

