[Xen-devel] IO-APIC line level race condition



Sadly, we have discovered another line-level interrupt race condition in
Xen-4.1.  The result was that an outstanding un-EOI'd interrupt at the
IO-APIC caused the mptsas controller to offline the root filesystem.
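
To illustrate why a missing EOI is so damaging for a level-triggered
line, here is a toy Python model.  It is not Xen code and does not
reproduce the race itself; the class and function names are
hypothetical.  It only assumes the usual IO-APIC behaviour for
level-triggered pins: the Remote IRR bit is set when an interrupt is
delivered and cleared by the EOI, and nothing further is delivered on
that pin while the bit is set.

```python
class IoApicPin:
    """Toy model of one level-triggered IO-APIC pin (hypothetical names)."""

    def __init__(self):
        self.line_asserted = False  # the device is holding the line active
        self.remote_irr = False     # set on delivery, cleared by the EOI
        self.delivered = 0          # interrupts handed to the CPU so far

    def _try_deliver(self):
        # Deliver only while the line is asserted and no earlier
        # delivery is still waiting for its EOI.
        if self.line_asserted and not self.remote_irr:
            self.remote_irr = True
            self.delivered += 1

    def assert_line(self):
        self.line_asserted = True
        self._try_deliver()

    def deassert_line(self):
        self.line_asserted = False

    def eoi(self):
        self.remote_irr = False
        self._try_deliver()  # a still-asserted line is delivered again


def lost_eoi_demo():
    """Return the delivery count before and after the late EOI arrives."""
    pin = IoApicPin()
    pin.assert_line()    # first interrupt is delivered
    pin.deassert_line()  # the driver services the device
    # The EOI never reaches the IO-APIC here, so Remote IRR stays set.
    pin.assert_line()    # the device raises its next interrupt
    stuck = pin.delivered
    pin.eoi()            # once the EOI finally arrives, delivery resumes
    return stuck, pin.delivered
```

In this model the second interrupt is never seen until the EOI arrives,
which matches the symptom of a disk controller timing out and being
offlined.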

This is now two separate IO-APIC bugs found recently.

1) Cisco C210 M2 server - EOI Broadcast Suppression, ioapic_ack=old
2) Dell R710 - No EOI Broadcast Suppression, ioapic_ack=new

Both servers use IO-APIC version 0x20 and have an mptsas controller for
their disks, using legacy PCI line-level interrupts.  The workload on
both servers appears to have more active vcpus

Case 1 is now considered stable by the customer after I provided a
private fix which caused Xen to never consider turning on EOI Broadcast
Suppression.  I have re-attached a patch which allows this problem to be
"fixed" by specifying "ioapic_ack=new" on the command line, rather than
requiring a patch and recompile of Xen.
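
As a rough illustration of the intended effect of the option, here is a
minimal Python sketch.  It is based only on the description in this
mail, not on the attached patch or on the Xen source.  The function
name, the returned dictionary, and the default of "new" when nothing
is forced are assumptions made for the example.

```python
def choose_ioapic_setup(cmdline, hw_supports_suppression):
    """Pick an ack mode and decide on EOI Broadcast Suppression.

    Hypothetical model: "ioapic_ack=new" on the command line means
    suppression is never considered, whatever the hardware offers.
    """
    forced = None
    for token in cmdline.split():
        if token.startswith("ioapic_ack="):
            forced = token.split("=", 1)[1]

    if forced not in (None, "old", "new"):
        raise ValueError("unknown ioapic_ack value: %r" % forced)

    if forced == "new":
        # The user override wins: never turn suppression on.
        return {"ack": "new", "eoi_broadcast_suppression": False}

    if hw_supports_suppression:
        # Assumed to correspond to Case 1: suppression with the old ack.
        return {"ack": "old", "eoi_broadcast_suppression": True}

    # Assumed to correspond to Case 2 when nothing is forced.
    return {"ack": forced or "new", "eoi_broadcast_suppression": False}
```

Under these assumptions, a Case 1 machine booted with "ioapic_ack=new"
ends up with the same setup as a Case 2 machine booted with no option
at all.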

Case 2 has only been seen once (this morning), so we currently have no
idea as to its reproducibility.  However, given that this hardware is
fairly common in our test infrastructure, I would say that the bug is
fairly rare.

With the new patch applied, Case 1 becomes the same as Case 2 with
respect to IO-APIC setup, presumably meaning that the Case 2 bug still
exists in Case 1.


I will start working on cleaning up the IO-APIC code as soon as I can,
as reducing the unnecessary complexity should make race conditions like
this easier to find.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

Attachment: ioapic-ack-fix.patch
Description: Text Data

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

 

