Re: [Xen-devel] Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
On Wed, 16 Jan 2013, Alex Bligh wrote:
> Stefano,
>
> --On 16 January 2013 14:34:34 +0000 Stefano Stabellini
> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>
> > It seems that the grant mapping is already gone by the time
> > tcp_retransmit is called.
> > That might happen because QEMU already completed the read/write
> > operation and called xc_gnttab_munmap, that causes the grant_table and
> > the m2p_override to remove the p2m and m2p mappings of the foreign
> > pages.
>
> What I want to know is why QEMU is completing the read/write operation
> before the write (as it surely must be a write) has completed in any
> case. This /seems/ to happen only if a backing file is being used
> but I'm not sure if that's just triggering the retransmits due to
> (e.g.) a slow filer.
>
> If QEMU is completing writes before they've actually been done, haven't
> we got a wider set of problems to worry about?

Reading the thread you linked in a previous email, it seems that it can
actually happen that a userspace application is told that the write is
completed before all the outstanding network requests are dealt with.

> Could the problem be "cache=writeback" on the QEMU command
> line (evident from a 'ps'). If caching is writeback perhaps QEMU
> needs to copy the data. Is there some setting to turn this off in
> xl for test purposes?

The command line cache options are ignored by xen_disk, so, assuming
that the guest is using the PV disk interface, that can't be the issue.

> > Isn't there a way to prevent tcp_retransmit from running when the
> > request is already completed? Or stop it if you find out that the pages
> > are already gone?
>
> But what would you do? If you don't run the tcp_retransmit the write
> would be lost (to say nothing of the NFS connection to the server).

Well, that is not true: if the write was really lost, the kernel
wouldn't have completed the AIO write and notified QEMU.

> > You could try persistent grants, that wouldn't solve the bug but they
> > should be able to "hide" it pretty well. Not ideal, I know.
> > The QEMU side commit is 9e496d7458bb01b717afe22db10a724db57d53fd.
> > Konrad issued a pull request recently with the corresponding Linux
> > blkfront changes:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git
> > stable/for-jens-3.8
>
> That's presumably the first 8 commits at:
> http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=shortlog;h=refs/heads/stable/for-jens-3.8
>
> So I'd need a new dom0 kernel and to backport the QEMU patch.

Yep.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
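
For readers following the thread, the suspected race can be sketched as a
toy model. This is only an illustration under stated assumptions, not Xen,
QEMU or Linux code: the names GrantPage, ToyTcpSender and run are
hypothetical, and the "unmap" step merely stands in for what
xc_gnttab_munmap is described as doing above. It assumes the network stack
holds a reference to the page (no copy) until the data is acknowledged, and
that the write is reported complete before that happens.

```python
class GrantPage:
    """Hypothetical stand-in for a foreign page mapped through a grant."""

    def __init__(self, data: bytes):
        self._data = data
        self.mapped = True

    def read(self) -> bytes:
        # Touching the page after the mapping is gone models the crash.
        if not self.mapped:
            raise RuntimeError("page accessed after grant unmap")
        return self._data

    def unmap(self) -> None:
        self.mapped = False


class ToyTcpSender:
    """Hypothetical sender that queues page references until they are ACKed."""

    def __init__(self):
        self.unacked = []

    def send(self, page: GrantPage) -> bytes:
        self.unacked.append(page)  # zero-copy: only a reference is kept
        return page.read()

    def retransmit(self) -> list:
        # A retransmit has to read the original pages again.
        return [page.read() for page in self.unacked]


def run(persistent: bool) -> str:
    page = GrantPage(b"guest block data")
    tcp = ToyTcpSender()
    tcp.send(page)
    # Assumption: the write is reported complete here, while the segment is
    # still on the retransmit queue. Without persistent grants the backend
    # then drops the mapping; with them the mapping is kept.
    if not persistent:
        page.unmap()
    try:
        tcp.retransmit()
        return "ok"
    except RuntimeError as exc:
        return f"fault: {exc}"
```

In this model run(False) hits the fault on retransmit, while run(True)
succeeds only because the mapping is never torn down. The early completion
still happens in both cases, which matches the point above that persistent
grants would hide the bug rather than solve it.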