[Xen-users] Xen 4.0.0 - tapdisk2 "hangs"
Hey all!

I'm currently in the process of migrating a (Gentoo-based) Xen server to Xen 4.0.0 (using the Xen ebuilds from bugs.gentoo.org), and I'm having severe problems with tapdisk2 (which I want to use for I/O prioritizing via CFQ on the LVM-based backing storage of a virtual server).

It seems that after a while of heavy I/O in the virtual domain, the communication between the (paravirtualized) DomU and Dom0 (the tapdisk2 process) breaks: no more interrupts are delivered to Dom0 for I/O requests from the virtual domain, and as such the virtual host "loses" its hard disk (but does not "break" otherwise, apart from not responding). The network front-/backend is not affected by this communication loss, AFAICT.

The virtual host can be destroyed with xm destroy, but the blktap2 interface that was created does not disappear until the next reboot, and cannot be removed through the respective sysfs accesses (rather, echoing a 1 into "remove" blocks, too, and is "unkillable", i.e. stays in kernel space). After a blktap2 device has entered this broken state, no more hosts can be created with xm create (that blocks, too), and the host system must be rebooted to get back into a usable state.

I've not been able to provoke this breakage with "normal" I/O (i.e., when the hosts run normally), but I have been able to provoke it using bonnie, which after a short period of sustained read/write I/O at more than 120MB/s will freeze the blktap2 device.

The Dom0 and DomU kernels in use are xen-sources-2.6.32-r1 (which are the xen-stable 2.6.32.10 [11?] based OpenSuSE Xen kernel sources, AFAIK) from the official portage tree; the kernel configuration in use is attached.

I've tried iommu=off for Xen (the mobo doesn't support VT-d anyway, so Xen never turns it on), and I've also looked for any signs of errors after setting verbosity 9 for the blktap2 module and loglvl=all and guest_loglvl=all for Xen, but I haven't seen any errors so far.
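For anyone who wants to look at the same thing on their own box: the "unkillable" writer to "remove" and the blocked xm create mentioned above should show up as tasks in uninterruptible sleep (state D). Below is a minimal sketch for listing those, assuming a Linux /proc that provides the usual per-process stat and wchan files. The helper names parse_stat and blocked_tasks are made up for illustration, and the wait channel may just read 0 if the kernel doesn't expose it.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx].strip())
    comm = text[open_idx + 1:close_idx]
    state = text[close_idx + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm, wchan) for tasks in uninterruptible sleep (state 'D')."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # process exited (or stat was unreadable) while scanning
        if state != "D":
            continue
        try:
            with open(os.path.join(proc_root, entry, "wchan")) as fh:
                wchan = fh.read().strip() or "?"
        except OSError:
            wchan = "?"  # wchan not available or not permitted
        found.append((pid, comm, wchan))
    return found


if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(f"{pid:>7}  {comm:<16}  {wchan}")
```

Run as root in Dom0 while the device is wedged; the wait channel column is only a hint as to where in the kernel each task is sleeping, not a diagnosis.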
Strace-ing the tapdisk2 process reveals that it's blocked in select(), and none of the descriptors it's polling ever becomes readable (which is the condition that tapdisk2 waits for); instead, the call always times out after 600s.

Thanks in advance for any hint as to what is causing this, or if there's anything I might try to get things working...

PS: I have to boot with acpi=off, as the mobo won't reboot when ACPI is turned on for Dom0 (not even when disabling ACPI reboots), but booting with ACPI enabled doesn't change the fact that blktap2 blocks.

--- Heiko.

Attachment: config
Attachment: dmesg.dump
Attachment: interrupts
Attachment: lspci.dump
Attachment: xmdmesg.dump

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users