RE: [Xen-devel] Shared disk corruption caused by migration
> The blkfront xenbus_driver doesn't have a "suspend" method. I was going to
> add one to flush the outstanding requests from the migration source to fix
> the problem. Or maybe we can cancel all outstanding I/O requests to
> eliminate the concurrency between the two nodes. Does the Linux block I/O
> interface allow the canceling of requests?
>
> Anyone else seeing this problem? Any other ideas for solutions?

There's already work in progress on this.

The simplest thing to do is to wait until the backend queues are empty
before signalling the destination host to unpause the relocated domain.
However, this would add to migration downtime.

It would be nice if we could quickly cancel the IOs queued at the original
host, but Linux doesn't have a good mechanism for this. For targets that
support fencing it's possible to quickly and synchronously fence the
original host.

For other targets, we need to be a bit cunning to minimize downtime: we can
actually start running the VM on the destination host before we've had the
'all queues empty' message from the source host. We just have to be careful
to make sure that we don't issue any writes to blocks that also potentially
still have writes pending on them in the source host. If such a write
occurs, we have to stall issuing of the write until we receive the 'all
queues empty' from the source host. However, such conflicting writes are
actually pretty unusual, so the majority of relocations won't incur the
stall.

Stay tuned for a patch.

Ian

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
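
For illustration only, the destination-side stall described in the message could be modelled roughly as below. This is a minimal sketch, not the patch the message refers to. The class name `DestinationWriteGate`, the `issue` callback, and the idea that the destination is handed a set of block numbers that may still have writes pending on the source are all assumptions made for this example.

```python
from collections import deque


class DestinationWriteGate:
    """Hypothetical model of the destination-side write stall."""

    def __init__(self, maybe_pending_on_source, issue):
        # Assumption: the destination knows which blocks may still have
        # writes queued on the source host.
        self._maybe_pending = set(maybe_pending_on_source)
        self._issue = issue          # callback that performs the real write
        self._stalled = deque()      # conflicting writes, in arrival order
        self._source_drained = False

    def write(self, block, data):
        """Issue the write now, or stall it. Returns True if issued."""
        if not self._source_drained and block in self._maybe_pending:
            self._stalled.append((block, data))
            return False
        self._issue(block, data)
        return True

    def on_all_queues_empty(self):
        """Source reports its queues are drained: release stalled writes."""
        self._source_drained = True
        self._maybe_pending.clear()
        while self._stalled:
            self._issue(*self._stalled.popleft())
```

Writes to blocks outside the possibly-pending set go straight through, which matches the expectation that most relocations never hit the stall. Stalled writes are released in arrival order once the 'all queues empty' message arrives.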