
RE: [Xen-users] Xen 4.0.0 - tapdisk2 "hangs"



Just to let you know: I left the machine that tested the 2.6.32-xen-r2
kernel running on that kernel (i.e., I didn't downgrade again), and it froze
completely yesterday (out of the blue, without any particular load on the
machines running on it, using the "known working" Xen setup). This didn't
happen for me with 2.6.32-xen-r1.

--- Heiko.


-----Original Message-----
From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
[mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Andrew Lyon
Sent: Monday, May 17, 2010 11:08
To: Heiko Wundram
Cc: xen-users@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-users] Xen 4.0.0 - tapdisk2 "hangs"

On Wed, May 5, 2010 at 7:02 PM, Andrew Lyon <andrew.lyon@xxxxxxxxx> wrote:
> On Tue, May 4, 2010 at 2:09 PM, Heiko Wundram <modelnine@xxxxxxxxxxxxx> wrote:
>> Hey all!
>>
>> I'm currently in the process of migrating a (Gentoo-based) Xen-server to
>> use Xen 4.0.0 (where I'm using the Xen ebuilds from bugs.gentoo.org), and
>> I'm having severe problems with tapdisk2 (which I wish to use to do I/O
>> prioritizing using CFQ on the LVM-based backing storage of a virtual
>> server).
>>
>> It seems that after a while of heavy I/O in the virtual domain, the
>> communication between the (paravirtualized) DomU and Dom0 (the
>> tapdisk2 process) breaks: no more interrupts are delivered to Dom0
>> for I/O requests from the virtual domain, so the virtual host
>> "loses" its hard disk (but does not "break" apart from not responding).
>> The network frontend/backend is not affected by this communication loss,
>> AFAICT.
>>
>> The virtual host can be destroyed with xm destroy, but the created
>> blktap2 interface does not disappear until the next reboot and cannot be
>> removed through the respective sysfs accesses (echoing a 1 into "remove"
>> blocks, too, and is "unkillable", i.e. stays in kernel space). After a
>> blktap2 device has entered this broken state, no more hosts can be
>> created with xm create (that blocks, too), and the host system must be
>> rebooted to reach a usable state again.
>>
>> I've not been able to provoke this breakage with "normal" I/O (i.e., when
>> the hosts run normally), but I have been able to provoke it using bonnie,
>> which freezes the blktap2 device after a short period of sustained
>> read/write I/O at more than 120 MB/s.
>>
>> The Dom0 and the DomU kernels that are being used are
>> xen-sources-2.6.32-r1 (which are the xen-stable 2.6.32.10 [11?] based
>> OpenSuSE Xen-kernel sources, AFAIK) from the official portage tree; the
>> kernel configuration that's in use is attached.
>>
>> I've tried iommu=off for Xen (the mobo doesn't support VT-d anyway, so
>> Xen never turns it on), and I've also looked for any signs of errors
>> appearing when setting verbosity 9 for the blktap2 module and loglvl=all
>> and guest_loglvl=all for Xen, but there are no errors that I've seen so
>> far.
>>
>> Strace-ing the tapdisk2 process reveals that it's blocked on select(),
>> and none of the descriptors it's polling ever becomes readable (which is
>> the condition that tapdisk2 waits for); instead, the call always times
>> out after 600s.
>>
>> Thanks in advance for any hint as to what is causing this, or if there's
>> anything I might try to get things working...
>>
>> PS: I have to boot with acpi=off, as the mobo won't reboot when ACPI is
>> turned on for Dom0 (not even when disabling ACPI reboots), but booting
>> with ACPI enabled doesn't change the fact that blktap2 blocks.
>>
>> --- Heiko.
>>
>>
>>
>> _______________________________________________
>> Xen-users mailing list
>> Xen-users@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-users
>>
>
> I have had exactly the same problem and ended up going back to tapdisk1.
>
> I was able to replicate the problem using the entire SLE11-SP1 kernel
> source patch set, which proves that the bug exists upstream.
> Unfortunately, I am very busy with other projects at the moment, so I
> have not had time to debug it at all.
>
> The SLE11-SP1 tree has been updated since xen-sources-2.6.32-r1; I
> will make an updated set of patches for you to try, but it will take me
> a couple of days.
>
> Andy
>

Hi,

I have uploaded updated 2.6.32 patches and an ebuild to
http://code.google.com/p/gentoo-xen-kernel/downloads/list. Note that the
patches should be applied to 2.6.32.13.

They should be added to portage in a few days' time, provided no
problems are found.

Andy

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
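
The strace observation in the original report (tapdisk2 blocked in select(),
with no descriptor becoming readable before the timeout) can be illustrated
with a minimal Python sketch. This is not tapdisk2 code: the pipe is a
hypothetical stand-in for the descriptors tapdisk2 waits on, the helper name
wait_readable is invented for illustration, and a 0.1 s timeout replaces the
600 s timeout mentioned in the report.

```python
import os
import select


def wait_readable(fds, timeout_s):
    """Return the descriptors that became readable, or [] on timeout."""
    readable, _, _ = select.select(fds, [], [], timeout_s)
    return readable


# Hypothetical stand-in for a tapdisk2 descriptor: a pipe nobody writes to.
r, w = os.pipe()

# No event is delivered, so select() returns an empty list after the timeout.
# This mirrors the hung state, where no I/O notification ever arrives.
idle = wait_readable([r], 0.1)

# Once data is written (an event is delivered), the descriptor is readable.
os.write(w, b"x")
ready = wait_readable([r], 0.1)

os.close(r)
os.close(w)
```

In the healthy case the second call returns immediately with the descriptor;
in the state described in the report, every call behaves like the first one.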




 

