Re: [Xen-devel] [RFC] [PATCH] x86/mm: use wait queues for mem_paging
Hi,
I'm about to repost this patch, with a few other changes and fixups.
Further comments below.
At 12:57 -0800 on 17 Feb (1329483430), Andres Lagar-Cavilla wrote:
> > At 08:05 -0800 on 17 Feb (1329465959), Andres Lagar-Cavilla wrote:
> >> Well, thanks for taking a stab at it. Looks fairly good. My main
> >> observation is that we do not want to unconditionally go on a wait queue
> >> everywhere. For example, the grant table code pretty reliably unwinds
> >> itself and correctly tells the grant mapper to retry in the presence of
> >> a paged page.
> >
> > The grant table code is unlikely to hit a paged-out gfn in the caller's
> > address space.
>
> Oh yeah they do. At least for the target gfns behind the grants. That's
> why we need the retry loops in the PV backends in dom0.
The target gfns aren't in the caller's p2m, though.
> > The balloon code should not be trying to populate at all!
>
> Agreed. get_gfn_query is needed there. Nothing bad happens because we
> immediately send drop_page. But it's wrong-ish.
Fixed
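Roughly, what that means for the balloon path (a sketch only, not the
exact hunk in the reposted patch):

    /* Only the type is needed here, so use the non-populating query
     * lookup rather than get_gfn(), which would fault the page back in. */
    p2m_type_t p2mt;
    mfn_t mfn = get_gfn_query(d, gfn, &p2mt);
    /* ... if p2m_is_paging(p2mt), just drop the entry via
     * p2m_mem_paging_drop_page() as before ... */
    put_gfn(d, gfn);
    (void)mfn;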
> >> Hence I had proposed introducing get_gfn_sleep, and I'd be happy if the
> >> wait queue code is invoked only there. And then we can call
> >> get_gfn_sleep
> >> in vmx_load_pdptrs, in guest page table walks, and in __hvm_copy.
> >
> > I see. In the longer term I definitely do not want get_gfn()'s callers
> > to have to deal with p2m_is_paged(), and PoD, and sharing, and every
> > other p2m hack we come up with. I want all that stuff hidden behind
> > get_gfn_*(), with at most a flag to say 'fix up magic MM tricks'.
> >
> > I don't like get_gfn_sleep() very much but I'll see if I can find a way
> > to do the equivalent with a flag to get_gfn_*(). But I suspect it will
> > end up with a tri-state argument of:
> > a) just give me the current state;
> > b) try (without sleeping) to fix up paging, PoD and sharing but
> > still maybe make me handle it myself by error propagation; and
> > c) just fix it up.
> >
> > 'b' is a pretty weird interface. Especially given that in corner cases,
> > 'make me handle it' might involve hitting a wait queue anyway. If we
> > intend in the fullness of time to get rid of it (as I hope we would)
> > then probably we shouldn't introduce it. (And if that means pushing
> > this whole change out past 4.2, we should consider doing that.)
> >
> It's weird. b) is an evolutionary side-effect. It should disappear. I
> guess what I'm arguing for is "it should disappear in the long term, not
> on a patch so close to 4.2"
Well, I tried making the 'but don't go to a waitq' flag; the
implementation is trivial on top of the patches I'm about to post but
the question of when to set it was so ugly I gave up.
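For the record, the flag itself was roughly this shape (the name and value
here are made up for illustration; none of this is in the posted series):

    /* Hypothetical query flag: do the usual populate/unshare fixups, but
     * return an error rather than sleeping on a wait queue. */
    #define P2M_NO_WAITQ  (1u << 8)  /* illustrative value only */

    mfn = get_gfn_type(d, gfn, &t, p2m_alloc | P2M_NO_WAITQ);
    if ( p2m_is_paging(t) )
    {
        /* Caller has to unwind and retry later, as the grant code
         * already does today. */
        put_gfn(d, gfn);
        return -EAGAIN;
    }

The implementation is the easy bit; deciding which callers should pass the
flag is where it got ugly.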
I have hit another stumbling block though - get_two_gfns() can't safely
ask get_gfn() to populate or unshare if that might involve a waitq,
since by definition one of its two calls is made with a lock held. I
can't see a nice way of having it retry (not one that's guaranteed to
make progress) without pulling the fixup-and-retry up into that
function. I might do that in the next revision.
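To sketch what I mean (the argument list here is schematic, reusing the
existing get_two_gfns()/put_two_gfns() helpers):

 again:
    get_two_gfns(sd, sgfn, &st, NULL, &smfn,
                 cd, cgfn, &ct, NULL, &cmfn, p2m_query, &tg);
    if ( p2m_is_paging(st) || p2m_is_paging(ct) )
    {
        p2m_type_t t;
        /* Drop both locks before anything that might sleep, let a plain
         * get_gfn() (which may go to the wait queue) fix up the offending
         * entry, then redo the double lookup from scratch. */
        put_two_gfns(&tg);
        if ( p2m_is_paging(st) )
        {
            (void)get_gfn(sd, sgfn, &t);
            put_gfn(sd, sgfn);
        }
        else
        {
            (void)get_gfn(cd, cgfn, &t);
            put_gfn(cd, cgfn);
        }
        goto again;
    }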
> > Can't be soon enough for me. I've been sharpening the axe for that one
> > for years now. :)
>
> What's stopping the death blow?
People said they still need it.
> >> You want to make sure the mfn is not valid. For p2m_ram_paging_out we
> >> still have the mfn in place.
> >
> > Doesn't p2m_mem_paging_populate already DTRT in all these cases? If
> > not, it really should.
>
> It's not about populate, it's about killing the domain unnecessarily in
> the in_atomic check.
Oh I see, thanks. Fixed.
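The fix is along these lines (a sketch of the shape, not the literal hunk):

    /* Only treat this as a must-wait case if the mfn really is gone; for
     * p2m_ram_paging_out the page is still present and the caller can use
     * it without sleeping. */
    if ( p2m_is_paging(*t) && !mfn_valid(mfn) )
    {
        if ( in_atomic() )
        {
            /* Don't sleep in atomic context; kill the VM, not the host. */
            WARN();
            domain_crash(p2m->domain);
        }
        else
        {
            /* ... go to the wait queue, as in the hunk quoted below ... */
        }
    }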
> >> >> +
> >> >> +    /* Safety catch: it _should_ be safe to wait here but if it's not,
> >> >> +     * crash the VM, not the host */
> >> >> +    if ( in_atomic() )
> >> >> +    {
> >> >> +        WARN();
> >> >> +        domain_crash(p2m->domain);
> >> >> +    }
> >> >> +    else
> >> >> +    {
> >> >> +        current->arch.mem_paging_gfn = gfn;
> >> >> +        wait_event(current->arch.mem_paging_wq, ({
> >> >> +            mfn = p2m->get_entry(p2m, gfn, t, a,
> >> >> +                                 p2m_query, page_order);
> >> >> +            (*t != p2m_ram_paging_in);
> >>
> >> Well, first of all mfn->get_entry will not happen in a locked context,
> >> so
> >> you will exit the wait loop not holding the p2m lock and crash later.
> >
> > Yes - we always exit the loop without the lock; then we 'goto again' and
> > do things properly. I realise it's a bit inefficient, but on the
> > page-in path the p2m lookups shouldn't be the bottleneck. :) I did
> > write the version that handled the locking in this loop but it got a bit
> > twisty, plus we need the 'goto again' for other reasons (see below).
>
> Yes, but, you'll be reading the p2m in a racy manner.
I don't think there are any real races here. Either we see
p2m_ram_paging_in (and someone will eventually wake us when it pages in)
or we don't (and we go back and do the lookup safely).
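Put another way, the overall shape is (heavily abbreviated, not the
literal code):

 again:
    /* Authoritative lookup, done with the p2m lock held as usual. */
    mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order);
    if ( *t == p2m_ram_paging_in )
    {
        /* Unlocked wait: the condition only peeks at the type.  Once it
         * changes, go back and redo the lookup properly under the lock. */
        current->arch.mem_paging_gfn = gfn;
        wait_event(current->arch.mem_paging_wq, ({
            mfn = p2m->get_entry(p2m, gfn, t, a, p2m_query, page_order);
            *t != p2m_ram_paging_in; }));
        goto again;
    }

The condition is only a hint; the lookup that matters is redone under the
lock via the 'goto again'.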
> > As a follow-up I ought to make the page-sharing fixup code that's just
> > below this use 'goto again' as well; we shouldn't really leave this
> > function until we get a lookup where all these cases are dealt with.
On closer inspection the page-sharing version is OK because it doesn't
release the lock, and the unshare function won't leave us with a
paged-out page.
> I think we won't be able to have a magic
> get_gfn_solve_everything_silently() call until we find a solution to the
> fact that get_gfn happens in locked contexts.
Yes, I agree, and I'm coming round to the opinion that we might be too
late to fix that in 4.2. :|
> I've thought about a wacky idea that
> 1. schedules a sleep-n-retry softirq
> 2. fails the call
> 3. avoids crashing the domain on the unwind path
> 4. when executing softirqs pauses the vcpu as per 1.
> 5. When woken up retries the last failed hypercall/vmexit/whatever
>
> I shouldn't have said that out loud...
The first time I approached the idea of paging guest RAM (before Patrick
started on the current system) I prototyped something not a million
miles from this (basically, a sort of setjmp() on hypervisor entry) but
the question of making all hypercalls and VMEXITs idempotent scared me
off. :)
> >> >>  void p2m_mem_paging_populate(struct domain *d, unsigned long gfn)
> >> >>  {
> >> >>      struct vcpu *v = current;
> >> >> -    mem_event_request_t req;
> >> >> +    mem_event_request_t req = {0};
> >> >>      p2m_type_t p2mt;
> >> >>      p2m_access_t a;
> >> >>      mfn_t mfn;
> >> >>      struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >> >> +    int send_request = 0;
> >> >>
> >> >>      /* We're paging. There should be a ring */
> >> >>      int rc = mem_event_claim_slot(d, &d->mem_event->paging);
> >>
> >> Given all the optimizations around the send_request flag, it might be
> >> worth moving the claim_slot call to an if(send_request) block at the
> >> bottom, and get rid of cancel_slot altogether. Claim_slot could
> >> potentially go (or cause someone else to go) to a wait queue, and it's
> >> best to not be that rude.
> >
> > Sure. It might be tricky to make sure we unwind properly in the failure
> > case, but I'll have a go at a follow-up patch that looks at that.
> >
> > (This is not really a regression in this patch AFAICS - in the existing
> > code the get_gfn() callers end up calling this function anyway.)
>
> *After* they've put gfn. Not a regression, just cleaner code imo.
This is also called after putting the gfn - but in any case I've
reshuffled the mem_event_claim_slot in the new version.
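For reference, the reshuffle makes the tail of p2m_mem_paging_populate()
look roughly like this (a sketch of the shape, not the exact hunk):

    /* Work out from the p2m state whether a request is needed at all,
     * filling in req and setting send_request; drop all p2m locks first. */

    if ( send_request )
    {
        /* Only now claim a ring slot -- this may sleep, so it must come
         * after the locks are dropped, and no cancel_slot is needed. */
        if ( mem_event_claim_slot(d, &d->mem_event->paging) )
            return;
        mem_event_put_request(d, &d->mem_event->paging, &req);
    }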
Cheers,
Tim.