Re: [Xen-devel] [Qemu-devel] [PATCHv3] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes.
Paolo,

--On 18 March 2013 15:05:08 +0100 Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:

>> Presumably the same way as if writeback caching is selected. I presume
>> that must fsync() / fdatasync() all the data to disk, and a barrier
>> will produce one of those.
>
> No, that's done already. The source does an fsync/fdatasync before
> terminating the migration. The problem is that the target's page cache
> might host image data from a previous run. If you do not use O_DIRECT,
> it will not see the changes made on the source.

I was under the impression that with cache=writeback, qemu doesn't use
O_DIRECT, in which case why isn't there the same danger under kvm, i.e.
that the target page cache contains data from a previous run?

It would be great to fix the kernel bug (and I have submitted code), but
the fix is pretty intrusive (see the link I posted) and there appears to
be little interest in taking it forward. Certainly my kernel hacking
skills are not adequate to the task.

The current position is that booting a Xen domU which does disk I/O
(Ubuntu cloud image used as the test case) with an NFS root crashes dom0
absolutely repeatably, and kills all other guests. Unless and until there
is a kernel fix for that, Xen is in essence unusable with HVM and a
network-based disk backend. So we need a workaround in the meantime which
doesn't require a kernel fix.

> If you want to have this patch, you need to detect the bug and only do
> the hack if the bug is detected. Plus, disable migration when the hack
> is in use.

I originally suggested having this as an option (detecting it live and
non-destructively is practically impossible - suggestions welcome), but
xen-devel felt it should just be changed. My original preference was for
xl to process cache= type options (so those using a local file system
known to be safe could use O_DIRECT still), but that requires a change to
xenstore, was not popular, and is probably too intrusive. I patched it
the way the xen-devel folks wanted.
Disabling migration seems a bit excessive when migration isn't disabled
with cache=unsafe (AFAIK), and the alternative (using O_DIRECT) is far
far more unsafe (one tcp retransmit and your system is dead).

> 1) why does blkback not have the bug?
>
> 2) does it also affect virtio disks (or perhaps AHCI too)? I think
> Stefano experimented with virtio in Xen. If it does, then you're
> working around the problem in the wrong place.

I believe it affects PV disks and not emulated disks as emulated disks
under Xen do not use O_DIRECT (despite migration apparently working
notwithstanding your comment above).

Stefano did ack the patch, and for a one line change it's been through
a pretty extensive discussion on xen-devel ...

I've no idea what else it affects. I'd suggest it also affects kvm, save
that the kvm 'bad' will be writing the wrong data, not hosing the whole
machine.

-- 
Alex Bligh

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
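
The cache= question discussed in this message can be illustrated with a
small sketch. It is not QEMU or Xen code. The mode names are the values of
QEMU's cache= option, and the assumption (labelled in the code) is that only
"none" and "directsync" open the image with O_DIRECT, so the other modes read
through the host page cache. The helper names image_open_flags and
may_see_stale_cache are hypothetical and exist only for this example.

```python
import os

# O_DIRECT is not exposed on every platform; fall back to 0 so the
# sketch still runs where it is missing (the flag is then a no-op).
O_DIRECT = getattr(os, "O_DIRECT", 0)

# Assumption for illustration: which cache= modes bypass the host page cache.
BYPASSES_PAGE_CACHE = {
    "none": True,
    "directsync": True,
    "writethrough": False,
    "writeback": False,
    "unsafe": False,
}


def image_open_flags(cache_mode, read_only=False):
    """Return os.open() flags for a disk image under the given cache mode."""
    if cache_mode not in BYPASSES_PAGE_CACHE:
        raise ValueError("unknown cache mode: %r" % (cache_mode,))
    flags = os.O_RDONLY if read_only else os.O_RDWR
    if BYPASSES_PAGE_CACHE[cache_mode]:
        flags |= O_DIRECT
    return flags


def may_see_stale_cache(cache_mode):
    """True if reads go through the page cache, so a migration target
    could in principle be served image data cached from a previous run."""
    return not BYPASSES_PAGE_CACHE[cache_mode]


if __name__ == "__main__":
    for mode in sorted(BYPASSES_PAGE_CACHE):
        print(mode, "stale cache possible:", may_see_stale_cache(mode))
```

Under these assumptions, writeback and unsafe fall in the same category as
the patched Xen path (no O_DIRECT), which is the comparison the message
draws with kvm.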