Re: [Xen-devel] Fatal crash on xen4.2 HVM + qemu-xen dm + NFS

On Mon, 2013-01-21 at 17:06 +0000, Alex Bligh wrote:
> >> I'm wondering whether what's happening is that when the disk grows
> >> (or there's a backing file in place) some sort of different I/O is
> >> done by qemu. Perhaps irrespective of write cache setting, it does some
> >> form of zero copy I/O when there's a backing file in place.
> >
> > I doubt that, but I don't really know anything about qdisk.
> >
> > I'd be much more inclined to suspect a bug in the xen_qdisk backend's
> > handling of disk resizes, if that's what you are doing.
> We aren't resizing the qcow2 disk itself. What we're doing is
> creating a 20G (virtual size) qcow2 disk, containing a 3G (or
> so) Ubuntu image - i.e. the partition table says it's 3G. We
> then take a snapshot of it and use that as a backing file. The
> guest then writes to the partition table enlarging it to the
> virtual size of the disk, then resizes the file system. This
> triggers it. Unless QEMU has some special reason to care about
> what is in the partition table (e.g. to support the old xen
> 'mount a file as a partition' stuff), it's just a pile of sectors
> being written.
> > tap == blktap2. I don't know if it supports qcow or not but I don't
> > think xl exposes it if it does.
> Well, in xl's conf file we are using
>  disk = [ 'tap:qcow2:/my/nfs/directory/testdisk.qcow2,xvda,w' ]
> I think that's how you are meant to do qcow2 isn't it?

See docs/misc/xl-disk-configuration.txt: the "tap" prefix is deprecated
and ignored by xl. Sorry, I didn't think of this usage of "tap" above.
With xend the tap: prefix did force blktap (1 or 2) to be used. xl tries
to pick the most suitable, and picks xen_qdisk for qcow, I think always.
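
For illustration only, here is roughly how the same disk could be written
in the non-deprecated syntax described in that document. This is a sketch,
not something I have tested against your setup: the path is simply the one
from your mail, and spelling out backendtype=qdisk is an assumption about
what you want (xl should end up choosing it for qcow2 anyway).

```
# Positional form: target, format, vdev, access
disk = [ '/my/nfs/directory/testdisk.qcow2,qcow2,xvda,rw' ]

# Equivalent keyword form; target= has to come last because it
# consumes the rest of the string
disk = [ 'format=qcow2, vdev=xvda, access=rw, backendtype=qdisk, target=/my/nfs/directory/testdisk.qcow2' ]
```

Only one of the two disk lines belongs in a real config file.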

> > You could try with a test .vhd or .raw file though.
> We can do this but I'm betting it won't fail (at least with .raw)
> as it only breaks on qcow2 if there's a backing file associated
> with the qcow2 file (i.e. if we're writing to a snapshot).
> > Unfortunately it won't be zero. There will be at least one reference
> > from the page being part of the process, which won't be dropped until
> > the process dies.
> OK, well this is my ignorance of how the grant mechanism works.
> I had assumed the page from the relevant domU got mapped into the
> process in dom0, and that when it was unmapped it would be mapped
> back out of the process's memory. Otherwise would the process's
> memory map not fill up?

The page is mapped out of the user process like you expect. The problem
is that you cannot tell if the network stack still has a reference after
the write() syscall has finished. If you were to assume it did then you
would indeed fill the process's memory map.


