
Re: [Xen-devel] Fatal crash on xen4.2 HVM + qemu-xen dm + NFS


--On 16 January 2013 14:34:34 +0000 Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx> wrote:

It seems that the grant mapping is already gone by the time
tcp_retransmit is called.
That might happen because QEMU has already completed the read/write
operation and called xc_gnttab_munmap, which causes the grant_table and
the m2p_override to remove the p2m and m2p mappings of the foreign

What I want to know is why QEMU is completing the read/write operation
before the write (as it surely must be a write) has completed in any
case. This /seems/ to happen only if a backing file is being used
but I'm not sure if that's just triggering the retransmits due to
(e.g.) a slow filer.

If QEMU is completing writes before they've actually been done, haven't
we got a wider set of problems to worry about?

Could the problem be "cache=writeback" on the QEMU command
line (evident from a 'ps')? If caching is writeback, perhaps QEMU
needs to copy the data. Is there some setting to turn this off in
xl for test purposes?
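
To make the suspected sequence concrete, here is a toy model in Python. It is only a sketch of the ordering discussed above (transmit, complete, unmap, late retransmit), not QEMU or kernel code. Every class and function name is invented for illustration; the only real name is xc_gnttab_munmap, which the unmap() method merely stands in for.

```python
class GrantUnmapped(Exception):
    """Raised when code touches a foreign page whose mapping is gone."""


class GrantMapping:
    """Hypothetical stand-in for a guest page mapped through a grant."""

    def __init__(self, data: bytes):
        self._data = data
        self.mapped = True

    def read(self) -> bytes:
        if not self.mapped:
            raise GrantUnmapped("page no longer mapped")
        return self._data

    def unmap(self) -> None:
        # Stands in for xc_gnttab_munmap removing the mapping.
        self.mapped = False


class Segment:
    """Hypothetical TCP segment sitting in the retransmit queue."""

    def __init__(self, page: GrantMapping, copy: bool):
        # Zero-copy keeps a reference to the page;
        # copy mode snapshots the bytes up front.
        self._page = None if copy else page
        self._copy = page.read() if copy else None

    def payload(self) -> bytes:
        if self._copy is not None:
            return self._copy
        return self._page.read()


def write_then_retransmit(data: bytes, copy: bool) -> str:
    page = GrantMapping(data)
    seg = Segment(page, copy)
    seg.payload()      # first transmission
    page.unmap()       # request reported complete, mapping removed
    try:
        seg.payload()  # retransmit arrives after the unmap
        return "retransmitted"
    except GrantUnmapped:
        return "fault"
```

In this model, write_then_retransmit(b"block", copy=False) hits the fault path, while copy=True survives the late retransmit, which is the reason for asking whether QEMU needs to copy the data.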

Isn't there a way to prevent tcp_retransmit from running when the
request is already completed? Or stop it if you find out that the pages
are already gone?

But what would you do? If you don't run the tcp_retransmit the write
would be lost (to say nothing of the NFS connection to the server).

You could try persistent grants, that wouldn't solve the bug but they
should be able to "hide" it pretty well. Not ideal, I know.
The QEMU side commit is 9e496d7458bb01b717afe22db10a724db57d53fd.
Konrad issued a pull request recently with the corresponding Linux
blkfront changes:


That's presumably the first 8 commits at:

So I'd need a new dom0 kernel and to backport the QEMU patch.
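
A toy sketch of why persistent grants would "hide" rather than fix the problem, under one assumption of mine that is not stated in the thread: the mapping stays in place across requests, and the page may be reused for a later request before the retransmit runs. All names below are hypothetical and nothing here reflects the real blkfront or QEMU code.

```python
class PersistentPage:
    """Hypothetical page whose mapping is never torn down."""

    def __init__(self):
        self.data = b""


class QueuedSegment:
    """Hypothetical zero-copy segment holding a reference to the page."""

    def __init__(self, page: PersistentPage):
        self.page = page

    def payload(self) -> bytes:
        return self.page.data


def reuse_before_retransmit():
    page = PersistentPage()
    page.data = b"request-1"
    seg = QueuedSegment(page)
    first = seg.payload()       # original transmission
    # Request 1 completes; the mapping is kept, so nothing faults.
    # Assumption: the same page is then reused for request 2.
    page.data = b"request-2"
    retransmit = seg.payload()  # late retransmit reads the reused page
    return first, retransmit
```

Under that assumption the retransmit no longer faults, but it can carry different bytes from the original transmission, so the underlying lifetime problem is still there.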

Alex Bligh
