[Xen-devel] Instability with Xen, interrupt routing frozen, HPET broadcast
I have been discussing the following problem with Jan by email for a while now, and he suggested that I report it on xen-devel:

Jul 9 01:48:04 virt kernel: aacraid: Host adapter reset request. SCSI hang ?
Jul 9 01:49:05 virt kernel: aacraid: SCSI bus appears hung
Jul 9 01:49:10 virt kernel: Calling adapter init
Jul 9 01:49:49 virt kernel: IRQ 16/aacraid: IRQF_DISABLED is not guaranteed on shared IRQs
Jul 9 01:49:49 virt kernel: Acquiring adapter information
Jul 9 01:49:49 virt kernel: update_interval=30:00 check_interval=86400s
Jul 9 01:53:13 virt kernel: aacraid: aac_fib_send: first asynchronous command timed out.
Jul 9 01:53:13 virt kernel: Usually a result of a PCI interrupt routing problem;
Jul 9 01:53:13 virt kernel: update mother board BIOS or consider utilizing one of
Jul 9 01:53:13 virt kernel: the SAFE mode kernel options (acpi, apic etc)

After the VMs have been running for a while, the aacraid driver reports that the RAID controller is not responding. Most of the time the NIC stops working as well.

I have tried nearly every combination of dom0 kernel (pvops0, xenified SUSE 2.6.31.x, xenified SUSE 2.6.32.x, xenified SUSE 2.6.34.x) with Xen hypervisor 3.4.2, 3.4.4-cs19986, 4.0.1 and unstable, without success in two months. Sooner or later every combination showed the problem above. I did extensive tests to make sure that the hardware is OK, and it is. I am sure this is a Xen/dom0 problem.

Jan suggested trying the fix in c/s 22051, but it did not help. My answer to him:

> In the meantime I did try xen-unstable c/s 22068 (contains staging c/s 22051) and
> it did not fix the problem at all. I was able to fix a problem with the serial console
> and so I got some debug info that is attached to this email. The following line looks
> suspicious to me (irr=1, delivery_status=1):
>
> (XEN) IRQ 16 Vec216:
> (XEN) Apic 0x00, Pin 16: vector=216, delivery_mode=1, dest_mode=logical,
> delivery_status=1, polarity=1, irr=1, trigger=level, mask=0, dest_id:1
>
> IRQ 16 is the aacraid controller, which after a while seems to be unable to receive
> interrupts. Can you see from the debug info what is going on?

I also applied a small patch which disables HPET broadcast. The machine has now been running for 110 hours without a crash, whereas normally it crashes within a few minutes. Is there something wrong (race, deadlock) with HPET broadcasts in relation to blocked interrupt reception (see above)?

Andreas

Attachment:
xen-nohpet-broadcast.patch
Attachment:
jan-debugkeys.txt

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
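
For anyone comparing several debug-key dumps, the IO-APIC pin line quoted in the message above can be checked mechanically. The following is only a minimal sketch: the function names `parse_ioapic_entry` and `looks_stuck` are hypothetical, the line format is assumed to match the single entry quoted in this message (joined onto one line, without the "> " quote markers), and the "stuck" criterion is a heuristic, not something Xen itself reports. A level-triggered entry with irr=1 is normal for a moment while an interrupt waits for its EOI, so the check is only meaningful if the same state shows up in every dump.

```python
import re

# key=value or key:value pairs, as in "vector=216" or "dest_id:1"
FIELD_RE = re.compile(r"(\w+)[=:]\s*(\w+)")


def parse_ioapic_entry(line):
    """Parse one 'Apic ..., Pin N: key=value, ...' line into a dict of strings."""
    head, sep, fields = line.partition(": ")
    pin = re.search(r"Pin\s+(\d+)", head)
    if not sep or pin is None:
        raise ValueError("not an IO-APIC pin line")
    entry = {"pin": pin.group(1)}
    entry.update(FIELD_RE.findall(fields))
    return entry


def looks_stuck(entry):
    """Heuristic (an assumption): unmasked level-triggered pin with
    irr=1 and delivery_status=1, i.e. still waiting on an earlier interrupt."""
    return (
        entry.get("trigger") == "level"
        and entry.get("irr") == "1"
        and entry.get("delivery_status") == "1"
        and entry.get("mask") == "0"
    )


if __name__ == "__main__":
    # The entry quoted in this message, joined onto a single line.
    line = (
        "(XEN) Apic 0x00, Pin 16: vector=216, delivery_mode=1, "
        "dest_mode=logical, delivery_status=1, polarity=1, irr=1, "
        "trigger=level, mask=0, dest_id:1"
    )
    entry = parse_ioapic_entry(line)
    print(entry["pin"], looks_stuck(entry))
```

Running this over the pin lines of dumps taken a few seconds apart would show whether pin 16 stays in this state permanently or only briefly.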