Xen project Mailing List

Re: [Xen-devel] Migration memory corruption - PV backends need to quiesce

To: David Vrabel <david.vrabel@xxxxxxxxxx>

From: Ian Campbell <Ian.Campbell@xxxxxxxxxx>

Date: Mon, 30 Jun 2014 10:53:08 +0100

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Tim Deegan <tim@xxxxxxx>, Xen-devel List <xen-devel@xxxxxxxxxxxxx>, Paul Durrant <Paul.Durrant@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>

Delivery-date: Mon, 30 Jun 2014 09:53:15 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Mon, 2014-06-30 at 10:47 +0100, David Vrabel wrote:
> On 30/06/14 09:38, Ian Campbell wrote:
> > On Fri, 2014-06-27 at 17:51 +0100, Andrew Cooper wrote:
> >> Hello,
> >>
> >> After a long time fixing my own memory corruption bugs with migration
> >> v2, I have finally tracked down (what I really really hope is) the last
> >> of the corruption.
> >>
> >>
> >> There appears to be a systematic problem affecting all PV drivers,
> >> whereby a non-quiescent backend can cause memory corruption in the VM.
> >>
> >> Active grant mapped pages are only reflected in the dirty bitmap after
> >> the grant has been unmapped, as mapping the ring read-only would be
> >> catastrophic to performance, and remapping as read-only when logdirty is
> >> enabled is (as far as I understand) impossible, as Xen doesn't track the
> >> PTEs pointing at granted frames.
> >>
> >> PV backend drivers hold their mappings of the rings (and persistently
> >> granted frames) open until the domain is destroyed, which is after the
> >> memory image has been sent. Therefore, any requests which are processed
> >> after the migration code sends the ring frame on its first pass will
> >> not be reflected in the resumed domain, as this frame will never be
> >> marked as dirty in Xen.
> >>
> >> Furthermore, as the migration code uses memcpy() on the frames, it is
> >> possible that a backend update intersects with the copy, and a corrupt
> >> descriptor appears on the resumed side.
> >>
> >> In addition, after the domain has been paused, the backend might still
> >> process requests. The migration code expects the guest to be completely
> >> quiesced after it has been suspended, so will only check the dirty
> >> bitmap once. Any requests which get processed and completed might still
> >> be missed by the migration code.
> >>
> >> From a heavily instrumented Xen and migration code, I am fairly sure I
> >> have confirmed that all pages corrupted on migration are a result of
> >> still-active grant maps, grant copies which complete after domain
> >> suspend, or the xenstore ring which xenstored has a magic mapping of,
> >> and will never be reflected in the dirty bitmap.
> >>
> >>
> >> Overall, it would appear that there needs to be a hook for all PV
> >> drivers to force quiescence. In particular, a backend must guarantee to
> >> unmap all active grant maps (so the frames get properly reflected in the
> >> dirty bitmap), and never process subsequent requests (so no new frames
> >> appear dirty in the bitmap after the guest has been paused).
> >>
> >> Thoughts/comments?
> >
> > I thought PV drivers were already (supposed to be) handling this in the
> > frontend.
> >
> > For reasons of checkpoint performance I think Linux's net and blkfront
> > are handling this on resume rather than on suspend by tearing down on
> > resume and then requeueing any outstanding I/O after they reattach to
> > the new backend. In the blkfront case this is explicit, whereas IIRC
> > netfront just discards any active requests and relies on L3+
> > retransmission to get the job done. (see netfront_resume and
> > blkfront_resume/blkif_recover).
> >
> > Part of the tear down and reconnect should involve invalidating any
> > inflight descriptors, whether or not they were partially completed or
> > have corrupted replies in them etc. This ought to be happening before
> > the new backend sees the ring at all.
> >
> > Can you give an example of an instance of the corruption which you've
> > seen?
> >
> > Do you have a theory why this wasn't seen with the migration v1 code?
>
> I think it does not result in any problems (except if persistent grants
> are used, see below).
>
> Shared ring updates are strictly ordered with respect to the writes to
> data pages (either via grant map or grant copy).
> This means that if the guest sees a response in the ring it is
> guaranteed that all writes to the associated pages are also present.
>
> The write of the response and the write of the producer index are
> strictly ordered. If the backend is in the process of writing a
> response and the page is saved then the partial (corrupt) response is
> not visible to the guest. The write of the producer index is atomic so
> the saver cannot see a partial producer index write.

Exactly.

> Using persistent grants means that backend writes may be lost since the
> memcpy in the backend does not update the dirty bitmap and guest writes
> to the shared ring may make the backend's response writes visible. I
> think we may need to disable persistent grant support from blkback until
> a mechanism for updating the dirty log is in place.

Yes, sadly I think you might be right here.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
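
For readers following the ordering argument in the message above (response written first, producer index published afterwards, index write atomic), here is a minimal sketch in C11. It is not the Xen ring code: the names shared_ring, response, ring_snapshot, backend_write_slot, backend_publish, frontend_visible and saver_snapshot, and the ring size, are all hypothetical and chosen only for illustration. The sketch assumes a single backend producer, and the saver in it reads the index before copying the slots, which is a choice of this example rather than something the thread states about the migration code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8u                 /* hypothetical; power of two */

struct response {                    /* hypothetical response layout */
    uint64_t id;
    int32_t  status;
};

struct shared_ring {                 /* stand-in for a shared ring page */
    _Atomic uint32_t rsp_prod;       /* producer index, published last */
    struct response  rsp[RING_SIZE];
};

struct ring_snapshot {               /* what a saver would carry across */
    uint32_t        rsp_prod;
    struct response rsp[RING_SIZE];
};

static void ring_init(struct shared_ring *r)
{
    assert(r != NULL);
    atomic_init(&r->rsp_prod, 0);
    memset(r->rsp, 0, sizeof(r->rsp));
}

/* Backend, step 1: fill the slot. Nothing is visible to the guest yet. */
static void backend_write_slot(struct shared_ring *r, uint32_t rsp_prod_pvt,
                               uint64_t id, int32_t status)
{
    struct response *slot = &r->rsp[rsp_prod_pvt % RING_SIZE];
    slot->id = id;
    slot->status = status;
}

/* Backend, step 2: publish. The release store orders the slot writes
 * before the index update, and the index itself is written atomically. */
static void backend_publish(struct shared_ring *r, uint32_t *rsp_prod_pvt)
{
    (*rsp_prod_pvt)++;
    atomic_store_explicit(&r->rsp_prod, *rsp_prod_pvt, memory_order_release);
}

/* Frontend: number of responses it is allowed to consume. */
static uint32_t frontend_visible(struct shared_ring *r, uint32_t rsp_cons)
{
    return atomic_load_explicit(&r->rsp_prod, memory_order_acquire) - rsp_cons;
}

/* Saver: read the index first, then copy the slots. Slots at or beyond
 * the saved index may hold partial data, but the guest never reads them. */
static void saver_snapshot(struct shared_ring *r, struct ring_snapshot *out)
{
    out->rsp_prod = atomic_load_explicit(&r->rsp_prod, memory_order_acquire);
    memcpy(out->rsp, r->rsp, sizeof(out->rsp));
}
```

If a snapshot is taken between backend_write_slot() and backend_publish(), the saved index still excludes the half-finished response, so a resumed guest would not consume it. The sketch does not cover the persistent grant case, where the data pages themselves are written without the dirty bitmap being updated.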
