Re: [Xen-devel] Migration memory corruption - PV backends need to quiesce
On 27/06/14 19:15, Tim Deegan wrote:
> At 18:28 +0100 on 27 Jun (1403890088), David Vrabel wrote:
>> On 27/06/14 17:51, Andrew Cooper wrote:
>>> Overall, it would appear that there needs to be a hook for all PV
>>> drivers to force quiescence.  In particular, a backend must
>>> guarantee to unmap all active grant maps (so the frames get properly
>>> reflected in the dirty bitmap), and never process subsequent
>>> requests (so no new frames appear dirty in the bitmap after the
>>> guest has been paused).
>> I think this would be much too expensive for snapshots and things
>> like remus.  Waiting for all outstanding I/O could take seconds.
> The other option we talked about yesterday was a flag to the log-dirty
> operation that reports all grant-mapped frames as dirty.  Then the
> tools would add such frames to the final pass.  That could take a long
> time too, of course.
>
> I'm not sure how you would synchronize the final pass with backends
> that were doing grant copy operations -- you could exclude copies for
> the duration, but I'm not sure what that would look like for the
> backend.
>
> Tim.

Hmm - I have a crazy idea.

As identified by David, it is impractical to wait for backends to
complete any outstanding requests and unmap the grants, as this could
take seconds.

However, what the backend can do very quickly is guarantee that it will
never start processing any further requests, and never mark
subsequently-completed requests as complete in the ring (a sketch of
this follows at the end of this mail).

This means that the backend will not submit any new grant copy
operations, or regular copies to/from persistent grants, and even if a
hardware device has a DMA mapping of an active grant, the request will
not be marked as completed in the ring.  Even if the DMA'd pages do
eventually end up dirty, the frontend will replay the uncompleted
requests in the ring and be mostly fine[1].

Combined with a XEN_DOMCTL_SHADOW_OP_PEEK_INCLUDING_ACTIVE_GRANTS (name
subject to improvement; see the second sketch below), the migration
code can guarantee that there will be no corruption of the ring, and no
relevant corruption of guest memory.

I *believe* this covers all the cases, and doesn't depend on waiting
for the backends to fully complete all outstanding requests.

~Andrew

[1] The caveat is a pending read followed by a write of the same block
which, once replayed, might execute out of order if the write did take
effect on the source side.  Any frontends which care about this must
wait for all write requests to complete before entering the suspend
state.
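
To make the first guarantee concrete, here is a minimal sketch of the
"stop starting, stop completing" behaviour.  Every identifier here
(pv_backend, pv_backend_quiesce, the ring stubs) is invented for
illustration and is not an existing Xen or Linux interface; a real
backend would fold this into its existing ring-processing paths with
proper memory barriers and locking:

#include <stdbool.h>

struct pv_backend {
    bool quiesced;                   /* set once at quiesce time */
    unsigned int req_cons, req_prod; /* stand-ins for the shared ring */
    unsigned int rsp_prod;
};

/* Stubs standing in for the real ring/grant machinery. */
static bool have_unconsumed_requests(struct pv_backend *be)
{
    return be->req_cons != be->req_prod;
}
static void process_one_request(struct pv_backend *be)
{
    be->req_cons++;                  /* would map grants / issue I/O */
}
static void push_response(struct pv_backend *be)
{
    be->rsp_prod++;                  /* would write a response + notify */
}

/* The quiesce hook itself: cheap, no waiting for outstanding I/O. */
void pv_backend_quiesce(struct pv_backend *be)
{
    be->quiesced = true;             /* real code needs a write barrier */
}

/* Request path: never *starts* new work once quiesced; unconsumed
 * requests stay on the ring for the frontend to replay. */
void pv_backend_do_requests(struct pv_backend *be)
{
    while (have_unconsumed_requests(be) && !be->quiesced)
        process_one_request(be);
}

/* Completion path: in-flight hardware DMA may still finish, but the
 * response is never pushed, so the frontend re-issues the request
 * after migration. */
void pv_backend_io_done(struct pv_backend *be)
{
    if (!be->quiesced)
        push_response(be);
}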
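
The second sketch is what the hypervisor side of the proposed peek
variant might do.  The flag does not exist today, and active_grant_entry
below is a deliberately simplified stand-in for the real grant-table and
log-dirty structures: after filling the bitmap as PEEK normally does,
additionally report every frame with a live grant mapping as dirty, so
the final pass re-sends anything a backend might still DMA into:

#define BITS_PER_LONG (8 * sizeof(unsigned long))

struct active_grant_entry {
    unsigned int pin;       /* non-zero => grant currently mapped */
    unsigned long frame;    /* frame backing the grant */
};

struct grant_table {
    unsigned int nr_ents;
    struct active_grant_entry *active;
};

static void set_bit_in_bitmap(unsigned long frame, unsigned long *bm)
{
    bm[frame / BITS_PER_LONG] |= 1UL << (frame % BITS_PER_LONG);
}

/* OR frames with active grant mappings into the peeked dirty bitmap. */
void peek_or_in_active_grants(struct grant_table *gt,
                              unsigned long *dirty,
                              unsigned long max_frame)
{
    for (unsigned int i = 0; i < gt->nr_ents; i++) {
        struct active_grant_entry *act = &gt->active[i];

        if (act->pin != 0 && act->frame < max_frame)
            set_bit_in_bitmap(act->frame, dirty);
    }
}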
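
Finally, for the caveat in [1]: a frontend which cares about
read-after-write ordering could drain its outstanding writes in the
suspend path along these lines (again, illustrative names only; reads
can safely be left on the ring for replay):

#include <stdbool.h>

struct pv_frontend {
    unsigned int writes_in_flight;
};

static bool poll_for_completions(struct pv_frontend *fe)
{
    /* Would consume responses from the ring; this stub just retires
     * one outstanding write per call. */
    if (fe->writes_in_flight) {
        fe->writes_in_flight--;
        return true;
    }
    return false;
}

/* Called from the suspend handler before the domain is paused.  Stop
 * issuing new writes first, then drain the outstanding ones, so a
 * replayed read can never observe a half-applied write ordering. */
void pv_frontend_prepare_suspend(struct pv_frontend *fe)
{
    while (fe->writes_in_flight)
        poll_for_completions(fe);
}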