[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Migration memory corruption - PV backends need to quiesce

On Mon, 2014-06-30 at 12:57 +0100, David Vrabel wrote:
> On 30/06/14 12:07, Ian Campbell wrote:
> > On Mon, 2014-06-30 at 12:52 +0200, Tim Deegan wrote:
> >> At 12:14 +0200 on 30 Jun (1404126862), Tim Deegan wrote:
> >>> At 10:47 +0100 on 30 Jun (1404121679), David Vrabel wrote:
> >>>> Shared ring updates are strictly ordered with respect to the writes to
> >>>> data pages (either via grant map or grant copy).  This means that is the
> >>>> guest sees a response in the ring it is guaranteed that all writes to
> >>>> the associated pages are also present.
> >>>
> >>> Is the ring update also strictly ordered wrt the grant unmap operation?
> >>>
> >>>> The write of the response and the write of the producer index are
> >>>> strictly ordered.  If the backend is in the process of writing a
> >>>> response and the page is saved then the partial (corrupt) response is
> >>>> not visible to the guest.  The write of the producer index is atomic so
> >>>> the saver cannot see a partial producer index write.
> >>>
> >>> Yes.  The (suggested) problem is that live migration does not preserve
> >>> that write ordering.  So we have to worry about something like this:
> >>>
> >>> 1. Toolstack pauses the domain for the final pass.  Reads the final
> >>>    LGD bitmap, which happens to include the shared ring but not the
> >>>    data pages.
> >>> 2. Backend writes the data.
> >>> 3. Backend unmaps the data page, marking it dirty.
> >>> 4. Backend writes the ring.
> >>> 5. Toolstack sends the ring page across in the last pass.
> >>> 6. Guest resumes, seeing the I/O marked as complete, but without the
> >>>    data.
> >>
> >> It occurs to me that the guest should be able to defend against this
> >> by taking a local copy of the response producer before migration and
> >> using _that_ for the replay logic afterwards.  That is guaranteed to
> >> exclude any I/O that completed after the VM was paused, and as long as
> >> the unmap is guaranteed to happen before the ring update, we're OK.
> > 
> > AIUI blkfront at least maintains it's own shadow copy of the ring at all
> > times, and the recovery process doesn't use the migrated copy of the
> > ring at all (at least not the responses). I might be misunderstanding
> > the code there though.
> Yes, this is what it looks like to me.  netfront is also safe since it
> just discards everything and doesn't replay at all.
> This does rather feel like we're discovering problems that were
> identified (and fixed) a long time ago.
> I think there needs to be better documentation of front/backend drivers
> so people know how to write them correctly.

Yes :-/

> I'll try to carve out some
> time for this if no one else volunteers...

Thanks, that would be wonderful as always.

(Maybe add a TODO to the doc day page too?)

> Perhaps all we need is a clear statement of: The contents of shared ring
> are undefined on resume.  A frontend driver must not use the shared ring
> to replay any requests.

I *think* that should be sufficient.

> >> (That still leaves the question that Andrew raised of memcpy()
> >> breaking atomicity/ordering of updates.)
> > 
> > That's the memcpy in the migration code vs the definitely correctly
> > ordered updates done by the b.e., right?
> But if the frontend doesn't use the shared ring for replay, it doesn't
> matter if this memcpy is correctly ordered or not?

That's my understanding, yes.

> David

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.