Xen project Mailing List

Re: [Xen-devel] [PATCH V4] X86/vMCE: handle broken page with regard to migration

To: George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Ian Campbell <Ian.Campbell@xxxxxxxxxx>

From: "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>

Date: Fri, 30 Nov 2012 18:57:11 +0000

Accept-language: en-US

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>

Delivery-date: Fri, 30 Nov 2012 18:57:43 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Thread-index: AQHNzh9lH+TZqbearkyCq5DrDs2IGJgCvApA

Thread-topic: [Xen-devel] [PATCH V4] X86/vMCE: handle broken page with regard to migration

George Dunlap wrote: > On 29/11/12 10:02, Ian Campbell wrote: >> On Wed, 2012-11-28 at 14:37 +0000, Liu, Jinsong wrote: >>> Ping? >> Sorry I've been meaning to reply but didn't manage to yet. Also you >> replied to V4 saying to ignore it, so I was half waiting for V5 but I >> see this should actually be labelled V5 anyway. >> >> I'm afraid I still don't fully grok the reason for the loop that goes >> with: >> + /* >> + * At the last iter, count the number of broken pages >> after sending, + * and if there are more than before >> sending, do one or more iter + * to make sure the pages >> are marked broken on the receiving side. + */ >> >> Can we go through it one more time? Sorry. >> >> Let me outline the sequence of events and you can point out where I'm >> going wrong. I'm afraid this has turned out to be rather long, again >> I'm >> sorry for that. >> >> First we do some number of iterations with the guest live. If an MCE >> occurs during this phase then the page will be marked dirty and we >> will >> pick this up on the next iteration and resend the page with the dirty >> type etc and all is fine. This all looks good to me, so we don't >> need to >> worry about anything at this stage. >> >> Eventually we get to the last iteration, at which point we pause the >> guest. From here on in the guest itself is not going to cause an MCE >> (e.g. by touching its RAM) because it is not running but we must >> still >> account for the possibility of an asynchronous MCE of some sort e.g. >> triggered by the error detection mechanisms in the hardware, cosmic >> rays >> and such like. >> >> The final iteration proceeds roughly as follows. >> >> 1. The domain is paused >> 2. We scan the dirty bitmap and add dirty pages to the batch of >> pages to process (there may be several batches in the last >> iteration, we only need to concern ourselves with any one >> batch here). >> 3. We map all of the pages in the resulting batch with >> xc_map_foreign_bulk >> 4. We query the types of all the pages in the batch with >> xc_get_pfn_type_batch >> 5. We iterate over the batch, checking the type of each page, >> in some cases we do some incidental processing. >> 6. We send the types of the pages in the batch over the wire. >> 7. We iterate over the batch again, and send any normal page >> (not broken, xtab etc) over the wire. Actually we do this >> as runs of normal pages, but the key point is we avoid >> touching any special page (including ones marked as broken >> by #4) >> >> Is this sequence of events accurate? >> >> Now lets consider the consequences of an MCE occurring at various >> stages >> here. >> >> Any MCE which happens before #4 is fine, we will pick that up in #4 >> and >> the following steps will do the right thing. >> >> Note that I am assuming that the mapping step in #3 is safe even for >> a >> broken page, so long as we don't actually try and use the mapping >> (more >> on that later), is this true? >> >> If an MCE occurs after #4 then the page will be marked as dirty in >> the >> bitmap and Xen will internally mark it as broken, but we won't see >> either of those with the current algorithm. There are two cases to >> think >> about here AFAICT, >> A. The page was not already dirty at #2. In this case we know >> that the guest hasn't dirtied the page since the previous >> iteration and therefore the target has a good copy of this >> page from that time. The page isn't even in the batch we >> are processing So we don't particularly care about the MCE >> here and can, from the PoV of migrating this guest, ignore >> it. B. The page was already dirty (but not broken, we handled >> that case above in "Any MCE which happens before #4...") at >> #2 which means we have do not have an up to date copy on >> the target. This has two subcases: I. The MCE >> occurs before (or during) #6 (sending the page) and >> therefore we do not have a good up to date copy of >> that data at either end. II. The MCE occurs after #6, >> in which case we already have a good copy at the >> target end. >> >> To fix B you have added an 8th step to the above: >> >> 8. Query the types of the pages again, using >> xc_get_pfn_type_batch, and if there are more pages dirty now >> than we say at #4 (actually #5 when we scanned the array, >> but that distinction doesn't matter) then a new MCE must >> have occurred. Go back to #2 and try again. >> >> This won't do anything for A since the page wasn't in the batch to >> start >> with and so neither #4 or #8 will look at its type, this is good and >> proper. >> >> So now we consider the two subcases of B. Lets consider B.II first >> since >> it seems to be the more obvious case. >> >> In case B.II the target end already has a good copy of the data page, >> there is no need to mark the page as broken on the far end, nor to >> arrange for a vMCE to be injected. I don't know if/how we arrange for >> vMCEs to be injected under these circumstances, however even if a >> vMCE >> does get injected into the guest when it eventually gets unpaused on >> the >> target then all that will happen is that it will needlessly throw >> away a >> good page. However this is a rare corner case which is not worth >> concerning ourselves with (it's largely indistinguishable from case >> A). >> If the MCE had happened even a single cycle earlier then this would >> have >> been a B.I event instead of a B.II one. In any case there is no need >> to >> return to #2 and try again, everything will be just fine if we >> complete >> the migration at this point. >> >> In case B.I the MCE occurs before (or while) we send the page onto >> the >> wire. We will therefore try to read from this page because we haven't >> looked at the type since #4 and have no idea that it is now broken. >> Reading from the broken page will cause a fault, perhaps causing a >> vMCE >> to be delivered to dom0, which causes the kernel to kill the process >> doing the migration. Or maybe it kills dom0 or the host entirely. >> Either >> way the idea of looping again is rather moot. >> >> Have I missed a case which needs thinking about? >> >> I suspect B.I is the case where you are most likely to find a flaw >> in my argument. Is there something else which is done in this case >> which would >> allow us to continue? > > I think your analysis is correct -- the only question is whether B.I > will 100% cause the migration to crash, or whether there's a chance of > not crashing on the read. I had tried to ask that question before, > and understood Jinsong's response to be that it's not 100% sure that > the read would cause an error. However, looking back at the thread, > I think I may have understood something that was not there. :-) > > So I guess the question for Jinsong is this: > > The only time this extra loop could help is if there is a page broken > after being paused but before being sent the last time (B.I in Ian's > analysis) -- in which case, the migration code is 100% guaranteed to > read a now-broken page. What are the chances that this read of a > broken page will *not* cause a fault which will kill at least the > migration process, if not dom0? If the chances are "slim-to-none", > then there is no point for the extra check. > > -George Yes, the extra check is pointless, xl migration reading broken page will eventually kill system. See my reply IanC's email :-) Thanks, Jinsong _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.