Re: [Xen-devel] [RFC] [PATCH] x86/mm: use wait queues for mem_paging
Hi,
At 08:05 -0800 on 17 Feb (1329465959), Andres Lagar-Cavilla wrote:
> Well, thanks for taking a stab at it. Looks fairly good. My main
> observation is that we do not want to unconditionally go on a wait queue
> everywhere. For example, the grant table code pretty reliably unwinds
> itself and correctly tells the grant mapper to retry, in the presence of a
> paged page.
The grant table code is unlikely to hit a paged-out gfn in the caller's
address space.
> The same could be said of emulation (X86_EMUL_RETRY).
Fair enough. But the emulator uses hvm_copy() in its callbacks anyway, so
we'd need another flag to say 'special handling for p2m fixups plz' in
all the hvm_copy() interfaces. Unpleasant!
> And certainly the balloon code should not sleep waiting for populate!
The balloon code should not be trying to populate at all!
> Hence I had proposed introducing get_gfn_sleep, and I'd be happy if the
> wait queue code is invoked only there. And then we can call get_gfn_sleep
> in vmx_load_pdptrs, in guest page table walks, and in __hvm_copy.
I see. In the longer term I definitely do not want get_gfn()'s callers
to have to deal with p2m_is_paged(), and PoD, and sharing, and every
other p2m hack we come up with. I want all that stuff hidden behind
get_gfn_*(), with at most a flag to say 'fix up magic MM tricks'.
I don't like get_gfn_sleep() very much but I'll see if I can find a way
to do the equivalent with a flag to get_gfn_*(). But I suspect it will
end up with a tri-state argument of:
a) just give me the current state;
b) try (without sleeping) to fix up paging, PoD and sharing but
still maybe make me handle it myself by error propagation; and
c) just fix it up.
'b' is a pretty weird interface. Especially given that in corner cases,
'make me handle it' might involve hitting a wait queue anyway. If we
intend in the fullness of time to get rid of it (as I hope we would)
then probably we shouldn't introduce it. (And if that means pushing
this whole change out past 4.2, we should consider doing that.)
> >> diff -r 01c1bcbc6222 xen/arch/x86/mm/p2m.c
> >> --- a/xen/arch/x86/mm/p2m.c Thu Feb 16 08:48:23 2012 +0100
> >> +++ b/xen/arch/x86/mm/p2m.c Thu Feb 16 14:39:13 2012 +0000
> >> @@ -160,13 +160,49 @@ mfn_t __get_gfn_type_access(struct p2m_d
> >> }
> >>
> >> /* For now only perform locking on hap domains */
> >> - if ( locked && (hap_enabled(p2m->domain)) )
> >> + locked = locked && hap_enabled(p2m->domain);
> >> +
> >> +#ifdef __x86_64__
> When are we getting rid of 32 bit hypervisor? (half kidding. Only half)
Can't be soon enough for me. I've been sharpening the axe for that one
for years now. :)
> >> +again:
> >> +#endif
> >> + if ( locked )
> >> /* Grab the lock here, don't release until put_gfn */
> >> gfn_lock(p2m, gfn, 0);
> >>
> >> mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
> >>
> >> #ifdef __x86_64__
> >> + if ( p2m_is_paging(*t) && q != p2m_query
> >> + && p2m->domain == current->domain )
> >> + {
> >> + if ( locked )
> >> + gfn_unlock(p2m, gfn, 0);
> >> +
> >> + /* Ping the pager */
> >> + if ( *t == p2m_ram_paging_out || *t == p2m_ram_paged )
> >> + p2m_mem_paging_populate(p2m->domain, gfn);
>
> You want to make sure the mfn is not valid. For p2m_ram_paging_out we
> still have the mfn in place.
Doesn't p2m_mem_paging_populate already DTRT in all these cases? If
not, it really should.
> >> +
> >> + /* Safety catch: it _should_ be safe to wait here but if it's not,
> >> + * crash the VM, not the host */
> >> + if ( in_atomic() )
> >> + {
> >> + WARN();
> >> + domain_crash(p2m->domain);
> >> + }
> >> + else
> >> + {
> >> + current->arch.mem_paging_gfn = gfn;
> >> + wait_event(current->arch.mem_paging_wq, ({
> >> + mfn = p2m->get_entry(p2m, gfn, t, a,
> >> + p2m_query, page_order);
> >> + (*t != p2m_ram_paging_in);
>
> Well, first of all p2m->get_entry will not happen in a locked context, so
> you will exit the wait loop not holding the p2m lock and crash later.
Yes - we always exit the loop without the lock; then we 'goto again' and
do things properly. I realise it's a bit inefficient, but on the
page-in path the p2m lookups shouldn't be the bottleneck. :) I did
write the version that handled the locking in this loop but it got a bit
twisty, plus we need the 'goto again' for other reasons (see below).
As a follow-up I ought to make the page-sharing fixup code that's just
below this use 'goto again' as well; we shouldn't really leave this
function until we get a lookup where all these cases are dealt with.
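Boiled down to a compilable skeleton, the drop-lock / sleep / 'goto again' shape looks like this. Everything here is a stand-in: the real code uses gfn_lock(), wait_event() and p2m->get_entry():

```c
/* Skeleton of the retry loop; all state and helpers are fakes. */
#include <assert.h>
#include <stdbool.h>

typedef enum { S_RAM, S_PAGING_IN } state_t;

static state_t entry = S_PAGING_IN;
static bool locked;
static int lookups;

static state_t lookup(void) { lookups++; return entry; }
static void wait_for_pager(void) { entry = S_RAM; }  /* fake wait_event() */

static state_t get_gfn(void)
{
    state_t t;
 again:
    locked = true;               /* gfn_lock() */
    t = lookup();
    if ( t == S_PAGING_IN )
    {
        locked = false;          /* gfn_unlock() before sleeping */
        wait_for_pager();
        goto again;              /* redo the whole lookup with the lock */
    }
    return t;                    /* caller holds the lock until put_gfn() */
}
```

The wasted extra lookup after waking is the inefficiency mentioned above; on the page-in path it's in the noise.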
> So you want __get_gfn_type_access with q = p2m_query, and then put_gfn if the
> condition fails. Probably not gonna fit in a ({}) block ;)
>
> I think checking for (!p2m_is_paging(*t)) is more appropriate.
No, that was deliberate, and IIRC a change from Olaf's patch. If the
page gets paged in and back out while we're asleep, the type goes back
to p2m_ram_paged, and in that case we _must_ break out and retry or this
vcpu might stall forever.
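The difference between the two candidate wake conditions can be seen in a small standalone model (types and the observed sequence are invented, not Xen code): if the page is paged in and paged back out while the vcpu sleeps, it is never observed as plain RAM, so the strict !p2m_is_paging() test stalls forever, while breaking out on anything other than p2m_ram_paging_in lets the retry loop re-ping the pager:

```c
/* Model of the two wake conditions against a sequence of types seen
 * on successive wakeups. */
#include <assert.h>
#include <stdbool.h>

typedef enum { T_RAM, T_PAGED, T_PAGING_IN } ptype_t;

static bool is_paging(ptype_t t) { return t != T_RAM; }

/* Index of the first wakeup whose condition holds; -1 = sleeps forever. */
static int first_wake(const ptype_t *seen, int n, bool strict)
{
    for ( int i = 0; i < n; i++ )
        if ( strict ? !is_paging(seen[i]) : seen[i] != T_PAGING_IN )
            return i;
    return -1;
}
```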
> >> void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
> >> {
> >> struct vcpu *v = current;
> >> - mem_event_request_t req;
> >> + mem_event_request_t req = {0};
> >> p2m_type_t p2mt;
> >> p2m_access_t a;
> >> mfn_t mfn;
> >> struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >> + int send_request = 0;
> >>
> >> /* We're paging. There should be a ring */
> >> int rc = mem_event_claim_slot(d, &d->mem_event->paging);
>
> Given all the optimizations around the send_request flag, it might be
> worth moving the claim_slot call to an if(send_request) block at the
> bottom, and get rid of cancel_slot altogether. Claim_slot could
> potentially go (or cause someone else to go) to a wait queue, and it's
> best to not be that rude.
Sure. It might be tricky to make sure we unwind properly in the failure
case, but I'll have a go at a follow-up patch that looks at that.
(This is not really a regression in this patch AFAICS - in the existing code
the get_gfn() callers end up calling this function anyway.)
> >> - /* Pause domain if request came from guest and gfn has paging type */
> >> - if ( p2m_is_paging(p2mt) && v->domain == d )
> >> + /* No need to inform pager if the gfn is not in the page-out path */
> >> + if ( !send_request )
> >> {
> >> - vcpu_pause_nosync(v);
>
> I see no sin in stacking vcpu_pauses. Plus, this will need to stay if we
> (hopefully) adopt get_gfn_sleep.
Yes, if we do it that way, this pause definitely needs to stay. But if
not then the waitqueue takes care of this case entirely, and this code
is just redundant and confusing.
Cheers,
Tim.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel