Re: [Xen-devel] [RFC] [PATCH] x86/mm: use wait queues for mem_paging
Hi,
At 08:05 -0800 on 17 Feb (1329465959), Andres Lagar-Cavilla wrote:
> Well, thanks for taking a stab at it. Looks fairly good. My main
> observation is that we do not want to unconditionally go on a wait queue
> everywhere. For example, the grant table code pretty reliably unwinds
> itself and correctly tells the grant mapper to retry, in the presence of a
> paged page.
The grant table code is unlikely to hit a paged-out gfn in the caller's
address space.
> The same could be said of emulation (X86_EMUL_RETRY).
Fair enough. But the emulator uses hvm_copy() in its callbacks anyway, so
we'd need another flag to say 'special handling for p2m fixups plz' in
all the hvm_copy() interfaces. Unpleasant!
> And certainly the balloon code should not sleep waiting for populate!
The balloon code should not be trying to populate at all!
> Hence I had proposed introducing get_gfn_sleep, and I'd be happy if the
> wait queue code is invoked only there. And then we can call get_gfn_sleep
> in vmx_load_pdptrs, in guest page table walks, and in __hvm_copy.
I see. In the longer term I definitely do not want get_gfn()'s callers
to have to deal with p2m_is_paged(), and PoD, and sharing, and every
other p2m hack we come up with. I want all that stuff hidden behind
get_gfn_*(), with at most a flag to say 'fix up magic MM tricks'.
I don't like get_gfn_sleep() very much but I'll see if I can find a way
to do the equivalent with a flag to get_gfn_*(). But I suspect it will
end up with a tri-state argument of:
a) just give me the current state;
b) try (without sleeping) to fix up paging, PoD and sharing but
still maybe make me handle it myself by error propagation; and
c) just fix it up.
'b' is a pretty weird interface. Especially given that in corner cases,
'make me handle it' might involve hitting a wait queue anyway. If we
intend in the fullness of time to get rid of it (as I hope we would)
then probably we shouldn't introduce it. (And if that means pushing
this whole change out past 4.2, we should consider doing that.)
> >> diff -r 01c1bcbc6222 xen/arch/x86/mm/p2m.c
> >> --- a/xen/arch/x86/mm/p2m.c Thu Feb 16 08:48:23 2012 +0100
> >> +++ b/xen/arch/x86/mm/p2m.c Thu Feb 16 14:39:13 2012 +0000
> >> @@ -160,13 +160,49 @@ mfn_t __get_gfn_type_access(struct p2m_d
> >> }
> >>
> >> /* For now only perform locking on hap domains */
> >> - if ( locked && (hap_enabled(p2m->domain)) )
> >> + locked = locked && hap_enabled(p2m->domain);
> >> +
> >> +#ifdef __x86_64__
> When are we getting rid of 32 bit hypervisor? (half kidding. Only half)
Can't be soon enough for me. I've been sharpening the axe for that one
for years now. :)
> >> +again:
> >> +#endif
> >> + if ( locked )
> >> /* Grab the lock here, don't release until put_gfn */
> >> gfn_lock(p2m, gfn, 0);
> >>
> >> mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
> >>
> >> #ifdef __x86_64__
> >> + if ( p2m_is_paging(*t) && q != p2m_query
> >> + && p2m->domain == current->domain )
> >> + {
> >> + if ( locked )
> >> + gfn_unlock(p2m, gfn, 0);
> >> +
> >> + /* Ping the pager */
> >> + if ( *t == p2m_ram_paging_out || *t == p2m_ram_paged )
> >> + p2m_mem_paging_populate(p2m->domain, gfn);
>
> You want to make sure the mfn is not valid. For p2m_ram_paging_out we
> still have the mfn in place.
Doesn't p2m_mem_paging_populate already DTRT in all these cases? If
not, it really should.
> >> +
> >> + /* Safety catch: it _should_ be safe to wait here but if it's not,
> >> + * crash the VM, not the host */
> >> + if ( in_atomic() )
> >> + {
> >> + WARN();
> >> + domain_crash(p2m->domain);
> >> + }
> >> + else
> >> + {
> >> + current->arch.mem_paging_gfn = gfn;
> >> + wait_event(current->arch.mem_paging_wq, ({
> >> + mfn = p2m->get_entry(p2m, gfn, t, a,
> >> + p2m_query, page_order);
> >> + (*t != p2m_ram_paging_in);
>
> Well, first of all p2m->get_entry will not happen in a locked context, so
> you will exit the wait loop not holding the p2m lock and crash later.
Yes - we always exit the loop without the lock; then we 'goto again' and
do things properly. I realise it's a bit inefficient, but on the
page-in path the p2m lookups shouldn't be the bottleneck. :) I did
write the version that handled the locking in this loop but it got a bit
twisty, plus we need the 'goto again' for other reasons (see below).
As a follow-up I ought to make the page-sharing fixup code that's just
below this use 'goto again' as well; we shouldn't really leave this
function until we get a lookup where all these cases are dealt with.
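Boiled down to a compilable skeleton, the drop-lock / sleep / 'goto again' shape looks like this. Everything here is a stand-in: the real code uses gfn_lock(), wait_event() and p2m->get_entry():

```c
/* Skeleton of the retry loop; all state and helpers are fakes. */
#include <assert.h>
#include <stdbool.h>

typedef enum { S_RAM, S_PAGING_IN } state_t;

static state_t entry = S_PAGING_IN;
static bool locked;
static int lookups;

static state_t lookup(void) { lookups++; return entry; }
static void wait_for_pager(void) { entry = S_RAM; }  /* fake wait_event() */

static state_t get_gfn(void)
{
    state_t t;
 again:
    locked = true;               /* gfn_lock() */
    t = lookup();
    if ( t == S_PAGING_IN )
    {
        locked = false;          /* gfn_unlock() before sleeping */
        wait_for_pager();
        goto again;              /* redo the whole lookup with the lock */
    }
    return t;                    /* caller holds the lock until put_gfn() */
}
```

The wasted extra lookup after waking is the inefficiency mentioned above; on the page-in path it's in the noise.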
> So you want __get_gfn_type_access with q = p2m_query, and then put_gfn if the
> condition fails. Probably not gonna fit in a ({}) block ;)
>
> I think checking for (!p2m_is_paging(*t)) is more appropriate.
No, that was deliberate, and IIRC a change from Olaf's patch. If the
page gets paged in and back out while we're asleep, the type goes back
to p2m_ram_paged, and in that case we _must_ break out and retry or this
vcpu might stall forever.
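The difference between the two candidate wake conditions can be seen in a small standalone model (types and the observed sequence are invented, not Xen code): if the page is paged in and paged back out while the vcpu sleeps, it is never observed as plain RAM, so the strict !p2m_is_paging() test stalls forever, while breaking out on anything other than p2m_ram_paging_in lets the retry loop re-ping the pager:

```c
/* Model of the two wake conditions against a sequence of types seen
 * on successive wakeups. */
#include <assert.h>
#include <stdbool.h>

typedef enum { T_RAM, T_PAGED, T_PAGING_IN } ptype_t;

static bool is_paging(ptype_t t) { return t != T_RAM; }

/* Index of the first wakeup whose condition holds; -1 = sleeps forever. */
static int first_wake(const ptype_t *seen, int n, bool strict)
{
    for ( int i = 0; i < n; i++ )
        if ( strict ? !is_paging(seen[i]) : seen[i] != T_PAGING_IN )
            return i;
    return -1;
}
```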
> >> void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
> >> {
> >> struct vcpu *v = current;
> >> - mem_event_request_t req;
> >> + mem_event_request_t req = {0};
> >> p2m_type_t p2mt;
> >> p2m_access_t a;
> >> mfn_t mfn;
> >> struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >> + int send_request = 0;
> >>
> >> /* We're paging. There should be a ring */
> >> int rc = mem_event_claim_slot(d, &d->mem_event->paging);
>
> Given all the optimizations around the send_request flag, it might be
> worth moving the claim_slot call to an if(send_request) block at the
> bottom, and get rid of cancel_slot altogether. Claim_slot could
> potentially go (or cause someone else to go) to a wait queue, and it's
> best to not be that rude.
Sure. It might be tricky to make sure we unwind properly in the failure
case, but I'll have a go at a follow-up patch that looks at that.
(This is not really a regression in this patch AFAICS - in the existing code
the get_gfn() callers end up calling this function anyway.)
> >> - /* Pause domain if request came from guest and gfn has paging type */
> >> - if ( p2m_is_paging(p2mt) && v->domain == d )
> >> + /* No need to inform pager if the gfn is not in the page-out path */
> >> + if ( !send_request )
> >> {
> >> - vcpu_pause_nosync(v);
>
> I see no sin in stacking vcpu_pauses. Plus, this will need to stay if we
> (hopefully) adopt get_gfn_sleep.
Yes, if we do it that way, this pause definitely needs to stay. But if
not then the waitqueue takes care of this case entirely, and this code
is just redundant and confusing.
Cheers,
Tim.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel