
Re: [Xen-devel] [PATCH 0 of 2] x86/mm: Unsharing ENOMEM handling


  • To: "Tim Deegan" <tim@xxxxxxx>
  • From: "Andres Lagar-Cavilla" <andres@xxxxxxxxxxxxxxxx>
  • Date: Thu, 15 Mar 2012 07:35:17 -0700
  • Cc: andres@xxxxxxxxxxxxxx, adin@xxxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxx
  • Delivery-date: Thu, 15 Mar 2012 14:35:43 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>

> At 11:29 -0400 on 12 Mar (1331551776), Andres Lagar-Cavilla wrote:
>> These two patches were originally posted on Feb 15th as part of a larger
>> series.
>>
>> They were left to simmer as a discussion on wait queues took precedence.
>>
>> Regardless of the ultimate fate of wait queues, these two patches are
>> necessary, as they fix real bugs on the memory-sharing side: when
>> unsharing failed, domains would spin forever, hosts would crash, etc.
>>
>> The patches also clarify the semantics of unsharing, and document
>> how it is handled.
>>
>> Two comments against the Feb 15th series taken care of here:
>>  - We assert that the unsharing code can only return success or ENOMEM.
>>  - Acked-by: Tim Deegan added to patch #1
>
> Applied, thanks.
>
> I'm a bit uneasy about the way this increases the amount of boilerplate
> and p2m-related knowledge that's needed at call sites, but it fixes real
> problems and I can't see an easy way to avoid it.
>
Agreed, completely. Luckily it's all internal to the hypervisor.

I'm going to float an idea here, at the risk of getting egg on my face
again. Our main issue is that going to sleep on a wait queue is
disallowed in an atomic context, for good reason: the vcpu would go to
sleep holding locks. Therefore, we can't magically hide all the
complexity behind get_gfn, and callers need to know things they
shouldn't.
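
To make that call-site tax concrete, here is roughly the shape of a
caller after these patches. This is a standalone model, not hypervisor
code: every name below (write_to_gfn, the stubs, the enum) is a
stand-in mimicking the real get_gfn_unshare/put_gfn API.

/* Standalone model of a call site after these patches.  Not Xen code:
 * every name is a stand-in mimicking the real get_gfn/put_gfn API. */
#include <errno.h>
#include <stdio.h>

typedef enum { p2m_ram_rw, p2m_ram_shared } p2m_type_t;

/* Stub: pretend breaking sharing failed for lack of free pages. */
static int get_gfn_unshare(unsigned long gfn, p2m_type_t *t)
{
    (void)gfn;
    *t = p2m_ram_shared;        /* still shared => unshare failed */
    return -ENOMEM;             /* per the series, the only possible error */
}

static void put_gfn(unsigned long gfn) { (void)gfn; }

static int write_to_gfn(unsigned long gfn)
{
    p2m_type_t t;
    int rc = get_gfn_unshare(gfn, &t);

    if ( t == p2m_ram_shared )
    {
        /* Unshare failed: drop the gfn and propagate -ENOMEM, instead
         * of spinning forever as the old code did. */
        put_gfn(gfn);
        return rc;
    }

    /* ... write through the now-private mapping ... */
    put_gfn(gfn);
    return 0;
}

int main(void)
{
    printf("write_to_gfn: %d\n", write_to_gfn(0x1000));
    return 0;
}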

However, sleeping only deadlocks if the "waker upper" would need to
grab one of the locks the sleeper went to sleep holding.
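
Here is the deadlock in the smallest possible setting: a userspace
model with plain pthreads, nothing Xen-specific. The "p2m lock" is an
ordinary mutex and the wait queue is a condition variable. Running it
hangs, which is the point.

#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t p2m_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t wq_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wq_cond  = PTHREAD_COND_INITIALIZER;
static int woken;

static void *sleeper(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&p2m_lock);      /* goes to sleep HOLDING this */
    pthread_mutex_lock(&wq_lock);
    while ( !woken )
        pthread_cond_wait(&wq_cond, &wq_lock);  /* drops wq_lock only */
    pthread_mutex_unlock(&wq_lock);
    pthread_mutex_unlock(&p2m_lock);
    return NULL;
}

static void *waker(void *arg)
{
    (void)arg;
    sleep(1);                           /* let the sleeper block first */
    pthread_mutex_lock(&p2m_lock);      /* a paging_resume-style waker */
    /* Never reached: the sleeper still holds p2m_lock.  Deadlock. */
    pthread_mutex_lock(&wq_lock);
    woken = 1;
    pthread_cond_signal(&wq_cond);
    pthread_mutex_unlock(&wq_lock);
    pthread_mutex_unlock(&p2m_lock);
    return NULL;
}

int main(void)
{
    pthread_t s, w;
    pthread_create(&s, NULL, sleeper, NULL);
    pthread_create(&w, NULL, waker, NULL);
    pthread_join(s, NULL);              /* never returns */
    pthread_join(w, NULL);
    return 0;
}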

This is not intrinsically the case for mem event ring congestion,
although the way congestion is handled may still run into the problem.
Let's go case by case:

It is not the case at all for mem access: helpers take note of what
happened and issue a wake-up, needing no lock the sleeper might hold.
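
In the model above, a mem-access-style waker is harmless precisely
because it never touches p2m_lock:

/* Variant of waker() in the model above: a mem-access-style helper
 * only records what happened and signals; it never needs p2m_lock,
 * so the sleeper wakes up fine even though it slept holding it. */
static void *safe_waker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&wq_lock);       /* the wait queue's own lock only */
    woken = 1;
    pthread_cond_signal(&wq_cond);
    pthread_mutex_unlock(&wq_lock);
    return NULL;
}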

It is not necessarily the case for sharing ENOMEM: dom0 could balloon
down, or any other domain could relinquish pages somehow (luckily there
is no longer a global sharing lock!).
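
Reusing the model once more: a sharing-ENOMEM waker is anything that
frees pages and then signals, e.g. a dom0 balloon-down. free_pages is
a made-up stand-in for the heap counter.

/* Another waker variant: free memory first, then signal.  Nothing
 * here requires a lock the sleeper holds, so no deadlock either. */
static long free_pages;

static void *balloon_waker(void *arg)
{
    (void)arg;
    free_pages += 256;              /* pages given back by some domain */
    pthread_mutex_lock(&wq_lock);
    woken = 1;                      /* "memory available, retry unshare" */
    pthread_cond_signal(&wq_cond);
    pthread_mutex_unlock(&wq_lock);
    return NULL;
}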

But it is the case for paging, as paging_load and paging_resume would
need to grab the p2m lock. Here it is the sleeper's responsibility not
to go to sleep holding the p2m lock. I believe that already holds
everywhere, save for the get_two_gfn cases.
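
That sleeper-side discipline could even be enforced mechanically.
Still in the toy model, with a hypothetical per-thread counter bumped
by p2m lock/unlock wrappers (a real implementation would hang this off
the mm-lock ordering machinery):

#include <assert.h>

static __thread int p2m_locks_held;     /* bumped by the wrappers below */

static void p2m_lock_acquire(void)
{
    pthread_mutex_lock(&p2m_lock);
    p2m_locks_held++;
}

static void p2m_lock_release(void)
{
    p2m_locks_held--;
    pthread_mutex_unlock(&p2m_lock);
}

static void wait_queue_sleep(void)
{
    /* The get_two_gfn paths would trip this assertion today. */
    assert( p2m_locks_held == 0 );

    pthread_mutex_lock(&wq_lock);
    while ( !woken )
        pthread_cond_wait(&wq_cond, &wq_lock);
    pthread_mutex_unlock(&wq_lock);
}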

So, maybe we could just allow going to sleep on a wait queue while
holding locks. The semantics would be extremely fragile, and the
constraints on the "waker uppers" severe ("nothing but RCU!"), but it
would make for a nice API into the p2m.

Andres

> Tim.
>



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

