
Re: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration

Ian Campbell wrote:
> On Mon, 2012-11-19 at 15:29 +0000, George Dunlap wrote:
>> On 19/11/12 09:55, Ian Campbell wrote:
>>> If we get to this stage then haven't we either already sent
>>> something over the wire for this page or marked it as dirty when we
>>> tried and failed to send it? 
>>> In the former case we don't care that the page is now broken on the
>>> source since the target has got a good pre-breakage copy.
>>> In the latter case could we not set a flag at the same time as we
>>> mark the page dirty which means "go round at least one more time"?
>> Yeah -- on the last iteration, the VM itself has to be paused; if any
>> pages get broken after that, it doesn't really matter, does it? The
>> real thing is to have a consistent "snapshot" of behavior.
>> I guess the one potentially tricky case to worry about is whether to
>> deliver an MCE to the guest on restore.  Consider the following
>> scenario: 
>> - Page A is modified (and marked dirty)
>> - VM paused for last iteration
>> - Page breaks, is marked broken in the p2m
>> - Save code sends page A
>> In that case, the save code would send a "broken" page, and the
>> restore code would mark a page as broken, and we *would* want to
>> deliver an MCE on the far side.  But suppose the last two steps were
>> reversed: 
>> - Page A modified
>> - VM paused for last iteration
>> - Save code sends page A
>> - Page breaks, marked broken in the p2m
>> In that case, when the save code sends page A, it will send a good
>> page; there's no need to mark it broken, or to send the guest an MCE.
> I guess you'd want to err on the side of stopping using a good page,
> as opposed to continuing to use a bad page? i.e. it's better to take a
> spurious vMCE than to not take an actual one.
> I'm not actually sure what a guest does with a vMCE, I guess it does
> some sort of memory exchange to give the bad page back to the h/v and
> get a good page in return? If the hypervisor thinks the old page is ok
> rather than bad I guess it'll just put it in the free list instead of
> the bad list?
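
To make the two orderings George describes above concrete, here is a minimal
sketch of the last iteration. It is only a toy model, not the actual save or
restore code: the Page class, the ("broken", None) wire record and the
function names are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    data: bytes
    broken: bool = False  # hypothetical stand-in for "marked broken in the p2m"


def send_page(page):
    """Build the record the save side puts on the wire (illustrative format)."""
    if page.broken:
        # The content cannot be trusted, so only the "broken" marker is sent.
        return ("broken", None)
    return ("data", page.data)


def restore_page(record):
    """Return (page, inject_vmce) as seen on the restore side."""
    kind, payload = record
    if kind == "broken":
        # The target marks the page broken and should deliver a vMCE.
        return Page(data=b"", broken=True), True
    return Page(data=payload), False


def last_iteration(break_before_send):
    """Model one dirty page A after the VM has been paused."""
    page_a = Page(data=b"guest data")
    if break_before_send:
        page_a.broken = True   # page breaks, then the save code sends it
    record = send_page(page_a)
    if not break_before_send:
        page_a.broken = True   # page breaks only after a good copy has left
    return restore_page(record)
```

With break_before_send=True the target ends up with a broken page and a
vMCE to deliver; with break_before_send=False the target holds a good
pre-breakage copy and nothing needs to be injected.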

A broken page is a page that is bad in hardware: its content is corrupted and
cannot be corrected by the h/w circuitry (say, ECC).

For vMCE and the broken page, the guest and the h/v are agnostic of each other,
so I don't think the h/v should get a chance to do anything except inject a
vMCE# into the guest. The guest handles the vMCE# exactly as it would on native
hardware: it may simply ignore the vMCE, kill the affected process and isolate
the broken page if possible (within its own range), or kill the guest itself if
that is not possible, depending on what the error is.
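
As a rough sketch of the guest-side choices listed above (not taken from any
real guest kernel; the two boolean inputs and all names are assumptions made
only for illustration):

```python
from enum import Enum


class Action(Enum):
    IGNORE = "ignore the vMCE"
    KILL_PROCESS_AND_ISOLATE = "kill the process and isolate the broken page"
    KILL_GUEST = "kill the guest itself"


def guest_handle_vmce(needs_action, can_isolate):
    """Pick one of the three outcomes, depending on what the error is.

    needs_action: hypothetical flag, False when the error can simply be ignored.
    can_isolate:  hypothetical flag, True when the guest can kill the affected
                  process and isolate the broken page within its own range.
    """
    if not needs_action:
        return Action.IGNORE
    if can_isolate:
        return Action.KILL_PROCESS_AND_ISOLATE
    return Action.KILL_GUEST
```

In every branch the decision is made inside the guest, as on native; the h/v
only injects the vMCE#.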
