
Re: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration



On 19/11/12 16:57, Ian Campbell wrote:
> On Mon, 2012-11-19 at 15:29 +0000, George Dunlap wrote:
>> On 19/11/12 09:55, Ian Campbell wrote:
>>> If we get to this stage then haven't we either already sent something
>>> over the wire for this page or marked it as dirty when we tried and
>>> failed to send it?
>>>
>>> In the former case we don't care that the page is now broken on the
>>> source since the target has got a good pre-breakage copy.
>>>
>>> In the latter case could we not set a flag at the same time as we mark
>>> the page dirty which means "go round at least one more time"?
>> Yeah -- on the last iteration, the VM itself has to be paused; if any
>> pages get broken after that, it doesn't really matter, does it? The real
>> thing is to have a consistent "snapshot" of behavior.
>>
>> I guess the one potentially tricky case to worry about is whether to
>> deliver an MCE to the guest on restore.  Consider the following scenario:
>>
>> - Page A is modified (and marked dirty)
>> - VM paused for last iteration
>> - Page breaks, is marked broken in the p2m
>> - Save code sends page A
>>
>> In that case, the save code would send a "broken" page, and the restore
>> code would mark a page as broken, and we *would* want to deliver an MCE
>> on the far side.  But suppose the last two steps were reversed:
>>
>> - Page A modified
>> - VM paused for last iteration
>> - Save code sends page A
>> - Page breaks, marked broken in the p2m
>>
>> In that case, when the save code sends page A, it will send a good page;
>> there's no need to mark it broken, or to send the guest an MCE.
> I guess you'd want to err on the side of stopping using a good page, as
> opposed to continuing to use a bad page? i.e. its better to take a
> spurious vMCE than to not take an actual one.

While that's true, taking a spurious MCE means at the very least one less page available to the guest to use (for HVM guests that haven't ballooned down, at least), and the unnecessary loss of the data in that page.

The problem, I guess, is that the save code at the moment has no way of distinguishing the following cases:
1. Marked broken after the last time I sent it, but before the VM was paused; but the page hasn't been written to
2. Marked broken after the VM was paused; page hadn't been written to
3. Marked broken after the VM was paused, but the page had been written to

In case 1, we definitely need to send a broken page; but the VM may have already received a vMCE. In case 2, we don't need to send a broken page or a vMCE, while in case 3 we need to do both.
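
To make the three cases concrete, here is a small Python sketch of the decision as it would look if the save code could see the ordering of events. Everything in it is hypothetical: `classify_broken_page`, its two boolean inputs, and the `ACTIONS` table are illustrative names only, not anything in the Xen save code, and the whole point above is that the save code currently has no access to these inputs.

```python
from typing import Optional

def classify_broken_page(broken_after_pause: bool,
                         written_since_last_send: bool) -> Optional[int]:
    """Map an (idealised) event history onto cases 1-3 above.

    Assumes the page was marked broken after it was last sent.
    Both arguments are hypothetical; the real save code cannot
    currently observe them.
    """
    if not broken_after_pause and not written_since_last_send:
        return 1  # broke while the guest was still running
    if broken_after_pause and not written_since_last_send:
        return 2  # target already holds a good copy
    if broken_after_pause and written_since_last_send:
        return 3  # the dirty contents were lost with the page
    return None   # combination not enumerated in the list above

# case -> (send broken marker, deliver vMCE on resume)
# None means "undetermined": in case 1 the guest may already
# have received a vMCE before it was paused.
ACTIONS = {
    1: (True, None),
    2: (False, False),
    3: (True, True),
}
```

Cases 2 and 3 differ only in whether the page was written to since the last send, which is why the ordering relative to the final send matters.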

On the other hand, the whole situation is hopefully rare enough that maybe we can just do the simple correct thing, even if it's a tiny bit sub-optimal. In that case, assuming that spurious vMCEs aren't a problem (e.g., in case 1), I think we basically just need to see if the last iteration contains a broken page, and if so, send the guest a vMCE on resume.
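
As a rough sketch of that simple rule (Python, with made-up names; `BROKEN` and `vmce_needed_on_resume` are assumptions for illustration and do not correspond to real libxc identifiers), the check reduces to scanning the page types sent in the final iteration:

```python
# Hypothetical page-type tag; the real save format uses its own encoding.
BROKEN = "broken"

def vmce_needed_on_resume(last_iteration_page_types):
    """Return True if any page sent in the last iteration was broken.

    This deliberately ignores whether the guest already saw a vMCE
    before it was paused, so it may cause a spurious vMCE.
    """
    return any(t == BROKEN for t in last_iteration_page_types)
```

For example, `vmce_needed_on_resume(["normal", "broken"])` is true, while a final iteration containing only normal pages gives false.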

Thoughts?

> I'm not actually sure what a guest does with a vMCE, I guess it does
> some sort of memory exchange to give the bad page back to the h/v and
> get a good page in return? If the hypervisor thinks the old page is ok
> rather than bad I guess it'll just put it in the free list instead of
> the bad list?

Yes, I'm pretty sure the hypervisor's accounting of broken pages is separate from guest p2m entries; I think if you mark a p2m entry broken, the hypervisor will just free the RAM page that was mapped there before.

I think the guest just tries to recover gracefully when it gets a vMCE (e.g., by re-reading the page from disk or killing the process). I don't think it asks the hypervisor for another page to replace it at this point.

 -George
