
Re: [Xen-devel] [PATCH V3] X86/vMCE: handle broken page with regard to migration

George Dunlap wrote:
> On 21/11/12 13:26, Liu, Jinsong wrote:
>> Ian Campbell wrote:
>>> On Wed, 2012-11-21 at 11:34 +0000, George Dunlap wrote:
>>>> On 20/11/12 18:42, Ian Jackson wrote:
>>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE: handle
>>>>> broken page with regard to migration"):
>>>>>> Ian Jackson wrote:
>>>>>>> Liu, Jinsong writes ("RE: [Xen-devel] [PATCH V3] X86/vMCE:
>>>>>>> handle broken page with regard to migration"):
>>>>>>>> No, at the last iter, there are 4 points:
>>>>>>>> 1. start last iter
>>>>>>>> 2. get and transfer pfn_type to target
>>>>>>>> 3. copy page to target
>>>>>>>> 4. end last iter
>>>>> ...
>>>>>> It indeed checks mce after point 3 for each page, but what's the
>>>>>> advantage of keeping a separate list?
>>>>> It avoids yet another loop over all the pages.  Unless I have
>>>>> misunderstood.  Which I may have, because: if it checks for mce
>>>>> after point 3 then surely that is sufficient ?  We don't need to
>>>>> worry about mces after that check.
>>>> It's sufficient, but wouldn't each check require a separate
>>>> hypercall? That would surely be slower than just a single hypercall
>>>> and a loop (which is what Jinsong's patch does).
>>>> We don't actually need a list -- I think we just need to know,
>>>> "Have any pages broken between reading the p2m table (
>>>> xc_get_pfn_type_batch() ); if so, we do another full iteration.
>>> If a page fails between 2. and 3. above then what happens at point
>>> 3? I presume we can't map and send the page (since it is broken),
>>> do we get some sort of failure to map?
>>> What happens if the failure occurs during stage 3, i.e. while the
>>> page is mapped and we are reading from it?
>>> Ian.
>> If we read a broken page, it generates a more serious error (say, an
>> SRAR error).
>> I don't think the guest has a good chance of surviving in this case
>> --> most probably it kills itself, and of course we then don't need
>> to care about migration.
>> However, if the guest luckily survives (say, it completes copying the
>> broken page to the target), it's OK to continue --> its broken
>> pfn_type will be transferred to the target in the next iter, so the
>> guest will kill itself if it accesses the page then.
> But in this case, I'm asking what happens if the migration code reads
> the page.  If reading the page in the migration code causes dom0 to
> crash, then the whole "last iteration" stuff is fairly pointless. :-)
>   -George

If the migration code reads the page, it will trigger a more serious error,
which may kill the hypervisor or the guest.

Unfortunately we cannot prevent that, since we cannot predict whether a vMCE
will occur *during* migration. What we can do is handle it as well as possible:
1. If the vMCE occurs before migration, we can handle it safely.
2. If the vMCE occurs during migration, we can only do our best:
  2.1 If the vMCE happens to occur early enough (say, before point 2), we can
successfully prevent the page from being read.
  2.2 If the vMCE occurs after point 2, the page will be read. In that case:
    * if the guest/hypervisor survives, it is OK to transfer the broken
pfn_type to the target, so that no further harm is done to the target;
    * if the guest/hypervisor crashes, we no longer need to care about
migration at all.
The key point is that before migration we have no way to predict a vMCE, and
we cannot forbid migration for fear that it might crash the system.
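
To make the cases above concrete, here is a rough sketch of the case analysis
as a small Python model. It is only an illustration of the reasoning, not the
patch itself: the names VmceTiming and migration_outcome are hypothetical and
do not exist in Xen or libxc, and the returned strings are just labels.

```python
from enum import Enum


class VmceTiming(Enum):
    """When the vMCE hits, relative to the last-iteration points (hypothetical names)."""
    BEFORE_MIGRATION = 1  # case 1
    BEFORE_POINT_2 = 2    # case 2.1: before pfn_type is fetched and transferred
    AFTER_POINT_2 = 3     # case 2.2: pfn_type already transferred, page will be read


def migration_outcome(timing, survives_read=False):
    """Return a label for what happens to the broken page in each case.

    survives_read only matters for AFTER_POINT_2: it models whether the
    guest/hypervisor survives the more serious error raised by the read.
    """
    if timing in (VmceTiming.BEFORE_MIGRATION, VmceTiming.BEFORE_POINT_2):
        # The page is known to be broken before it would be copied,
        # so the read in point 3 can be avoided.
        return "page not read"
    if survives_read:
        # The read happened but everything survived; the broken pfn_type
        # can still be transferred so the target is not harmed further.
        return "page read; broken pfn_type transferred"
    # The read crashed the guest or hypervisor; migration is moot.
    return "crashed; migration no longer matters"
```

For example, migration_outcome(VmceTiming.AFTER_POINT_2, survives_read=True)
corresponds to the first bullet under 2.2, and the same call with
survives_read=False corresponds to the second.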

