
Re: [Xen-devel] Fatal crash on xen4.2 HVM + qemu-xen dm + NFS

To: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>
From: Alex Bligh <alex@xxxxxxxxxxx>
Date: Wed, 16 Jan 2013 15:06:49 +0000
Cc: Alex Bligh <alex@xxxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, Ian Campbell <Ian.Campbell@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, Xen Devel <xen-devel@xxxxxxxxxxxxx>
Delivery-date: Wed, 16 Jan 2013 15:07:23 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

Stefano,

--On 16 January 2013 14:34:34 +0000 Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx> wrote:

> It seems that the grant mapping is already gone by the time
> tcp_retransmit is called.
> That might happen because QEMU already completed the read/write
> operation and called xc_gnttab_munmap, that causes the grant_table and
> the m2p_override to remove the p2m and m2p mappings of the foreign
> pages.
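
The ordering described above can be illustrated with a toy model. This is only a sketch and not Xen, QEMU or kernel code: `GrantMapping`, `ZeroCopySender` and `run_race` are hypothetical names invented for illustration, and the model assumes the network stack keeps a reference to the mapped page (rather than a copy) until the data is acknowledged.

```python
class GrantMapping:
    """Toy stand-in for a foreign page mapped into dom0 via a grant."""

    def __init__(self, data: bytes):
        self._data = bytearray(data)
        self.mapped = True

    def read(self) -> bytes:
        # Touching the page after unmap models the fatal access.
        if not self.mapped:
            raise RuntimeError("page accessed after grant unmap")
        return bytes(self._data)

    def unmap(self) -> None:
        # Models the effect of the unmap: the mapping is no longer valid.
        self.mapped = False


class ZeroCopySender:
    """Keeps a reference to the page, not a copy, until the peer ACKs."""

    def __init__(self):
        self.unacked = []
        self.wire = []

    def send(self, page: GrantMapping) -> None:
        self.unacked.append(page)
        self.wire.append(page.read())

    def retransmit(self) -> None:
        for page in self.unacked:
            self.wire.append(page.read())


def run_race() -> str:
    page = GrantMapping(b"guest block data")
    sender = ZeroCopySender()
    sender.send(page)        # first transmission succeeds
    page.unmap()             # I/O reported complete, mapping torn down
    try:
        sender.retransmit()  # no ACK seen yet, so the data is resent
    except RuntimeError as exc:
        return str(exc)
    return "ok"


print(run_race())  # prints: page accessed after grant unmap
```

In this model the failure disappears if `send` stores `page.read()` (a copy) instead of the page itself, which is the trade-off between copying and zero-copy that the thread is circling.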


What I want to know is why QEMU is completing the read/write operation
before the write (as it surely must be a write) has completed in any
case. This /seems/ to happen only if a backing file is being used
but I'm not sure if that's just triggering the retransmits due to
(e.g.) a slow filer.

If QEMU is completing writes before they've actually been done, haven't
we got a wider set of problems to worry about?

Could the problem be "cache=writeback" on the QEMU command
line (evident from a 'ps')? If caching is writeback, perhaps QEMU
needs to copy the data. Is there some setting to turn this off in
xl for test purposes?

> Isn't there a way to prevent tcp_retransmit from running when the
> request is already completed? Or stop it if you find out that the pages
> are already gone?


But what would you do? If you don't run the tcp_retransmit the write
would be lost (to say nothing of the NFS connection to the server).

> You could try persistent grants, that wouldn't solve the bug but they
> should be able to "hide" it pretty well. Not ideal, I know.
> The QEMU side commit is 9e496d7458bb01b717afe22db10a724db57d53fd.
> Konrad issued a pull request recently with the corresponding Linux
> blkfront changes:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git
> stable/for-jens-3.8
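
Why persistent grants might "hide" rather than fix the problem can be sketched with another toy model. This is an assumption-laden simplification, not the blkfront or QEMU implementation: `PersistentPage` and `run_persistent` are hypothetical names, and the model assumes a page that stays mapped for the life of the connection and is reused for later requests.

```python
class PersistentPage:
    """Toy page that stays mapped for the lifetime of the connection."""

    def __init__(self):
        self.data = b""

    def fill(self, data: bytes) -> None:
        # Models request data being copied into the long-lived page.
        self.data = data

    def read(self) -> bytes:
        # The page is never unmapped, so reading it cannot fault.
        return self.data


def run_persistent() -> list:
    page = PersistentPage()
    wire = []
    page.fill(b"request 1")
    wire.append(page.read())  # first transmission of request 1
    # Request 1 is reported complete; the page stays mapped and is reused.
    page.fill(b"request 2")
    wire.append(page.read())  # late retransmit meant for request 1
    return wire


print(run_persistent())  # prints: [b'request 1', b'request 2']
```

In this model the late retransmit no longer touches an unmapped page, so nothing crashes, but it can carry whatever the page holds by then. Whether such reuse actually occurs in practice depends on timing that this sketch does not capture.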


That's presumably the first 8 commits at:
http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=shortlog;h=refs/heads/stable/for-jens-3.8

So I'd need a new dom0 kernel and to backport the QEMU patch.

--
Alex Bligh
