
Re: [Xen-devel] Fatal crash on xen4.2 HVM + qemu-xen dm + NFS


--On 21 January 2013 15:23:10 +0000 Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:

On Mon, 2013-01-21 at 15:15 +0000, Alex Bligh wrote:
Surely before Xen removes the grant on the page, unmapping it from dom0's
memory, it should check to see if there are any existing references
to the page and, if there are, give the kernel its own COW copy, rather
than unmap it totally, which is going to lead to problems.

Unfortunately each page only has one reference count, so you cannot
distinguish references held by this particular NFS write from other
references (other writes, the ref held by the process itself, etc).

My old series added a reference count to the SKB itself exactly so that
it would be possible to know when the network stack was truly finished
with the page in the context of a specific operation.
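
To illustrate the distinction above, here is a minimal toy sketch in Python.
It is not kernel code: Page and WriteOp are hypothetical names invented for
illustration, and the real series attached its count to the SKB rather than
to a separate object. It only shows why a single page-wide count cannot
answer "is this write finished?", while a per-operation count can.

```python
class Page:
    """Toy stand-in for a memory page with one shared reference count."""

    def __init__(self):
        self.refcount = 0

    def get(self):
        self.refcount += 1

    def put(self):
        self.refcount -= 1


class WriteOp:
    """Hypothetical per-operation counter tracking only this write's users."""

    def __init__(self, page):
        self.page = page
        self.users = 0
        self.done = False

    def get(self):
        # Every per-operation reference is also a page reference.
        self.page.get()
        self.users += 1

    def put(self):
        self.page.put()
        self.users -= 1
        if self.users == 0:
            # The network stack holds nothing more for this operation.
            self.done = True


page = Page()
page.get()                  # reference held by the process itself

other_write = WriteOp(page)
other_write.get()           # an unrelated write touching the same page

this_write = WriteOp(page)
this_write.get()            # original transmit
this_write.get()            # queued retransmit of the same data

this_write.put()            # original transmit released
this_write.put()            # retransmit released

# The page count is still non-zero, so on its own it says nothing about
# this_write; the per-operation flag does.
print(page.refcount, this_write.done, other_write.done)
```

In this toy model the page count never reaches zero while the process and
the other write hold references, so only the per-operation flag marks the
point at which this write's page could safely be released.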

Unfortunately due to lack of time I've not been able to finish those

Does that apply even when O_DIRECT is not being used (which I don't
think it is by default for upstream qemu & xen, as it's
cache=writeback, and cache=none produces a different failure)?

If so, I think it's the case that *ALL* NFS dom0 access by Xen domU
VMs is unsafe in the event of a TCP retransmit (either the grant can
be freed while still referenced, causing a crash, or the domU's data can
be rewritten after the write completes, causing corruption). I think that
would also apply to iSCSI over TCP, which would presumably suffer similarly.

Is that analysis correct?
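
For concreteness, the corruption half of that scenario can be sketched as a
toy model in Python. This is an assumption-laden illustration, not Xen or
kernel code: a bytearray stands in for the granted guest page, and a
memoryview stands in for a zero-copy segment still queued for retransmit.
The crash case (the grant being unmapped) is not modelled here.

```python
# Stand-in for the guest page granted to dom0 (hypothetical contents).
page = bytearray(b"guest data v1")

# Zero-copy send: the queued segment references the page, it does not copy it.
queued_segment = memoryview(page)

# First transmission reads the page contents at send time.
first_transmit = bytes(queued_segment)

# The write is reported complete while the segment is still queued.
# The guest, believing the buffer is free, reuses it for new data.
page[-1] = ord("2")

# A later retransmit reads the page again and picks up the new contents.
retransmit = bytes(queued_segment)

print(first_transmit, retransmit)
```

Because the queued segment only references the page, the retransmitted bytes
differ from what was originally sent once the buffer has been reused.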

Alex Bligh



