[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.

On Tue, 19 Mar 2013, Paolo Bonzini wrote:
> Il 19/03/2013 11:06, George Dunlap ha scritto:
> > On Mon, Mar 18, 2013 at 6:00 PM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> >> Il 18/03/2013 18:38, George Dunlap ha scritto:
> >>>>>
> >>>> This might be a difference between Xen and KVM. On Xen migration is
> >>>> made to a server in a paused state, and it's only unpaused when
> >>>> the migration to B is complete. There's a sort of extra handshake at
> >>>> the end.
> >>>
> >>> I think what you mean is that all the memory is handled by Xen and the
> >>> toolstack, not by qemu.  The qemu state is sent as the very last thing,
> >>> after all of the memory, and therefore (you are arguing) that qemu is
> >>> not started, and the files cannot be opened, until after the migration
> >>> is nearly complete, and certainly until after the file is closed on the
> >>> sending side.
> >>
> >> That would be quite dangerous.  Files aren't closed until after QEMU
> >> exits; at this point whatever problem you have launching QEMU on the
> >> destination would be unrecoverable.
> > 
> > But if I understand your concern correctly, you were concerned about
> > the following scenario:
> > R1. Receiver qemu opens file
> > R2. Something causes receiver kernel to cache parts of file (maybe
> > optimistic read-ahead)
> For some image formats, metadata is cached inside QEMU on startup.
> There is a callback to invalidate QEMU's cache at the end of migration,
> but that does not extend to the page cache.


> > Well, if qemu isn't present at the destination, that's definitely user
> > error. :-)  In any case, I know that he migrate can resume if it
> > fails, so I suspect that the qemu is just paused on the sending side
> > until the migration is known to complete.  As long as the last write
> > was flushed to the NFS server before the receiver opens the file, we
> > should be safe.
> Note that the close really must happen before the next open.  Otherwise
> the file metadata might not be up-to-date on the destination, too.

First of all, I want to thank Paolo for spending time on this, because
it's a very delicate issue. I didn't spot this possible race before. If
we make a mistake now, in 6 months time a very poor soul will be
spending weeks debugging a difficult, not very reproducible, bug.

Also, just as a clarification for the audience at home, it's either safe
or it's not: if it's not safe, we certainly can't have this patch go in.
If it is safe, then it should go in.

This patch only impacts the PV backend in QEMU, not the IDE interface.
PV frontends and backends always disconnect and reconnect during
So we can be *certain* that bdrv_close at the sender side is called
before the new connection is established at the receiver side.

Unfortunately the QEMU PV backend opens the disk during initialization,
so before the actual connection is established (blk_init).

Therefore I think that the current change is not safe, but it is pretty
easy to make it safe.
You just need to move the call to blk_open from blk_init to blk_connect.
Actually you can move most of blk_init to blk_connect, you just need to
keep the 4 xenstore_write_be_int at the end of the function.

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.