Re: [Xen-devel] IRQ: issues with directed EOI and IO-APIC ack methods
On 13/02/12 16:53, Keir Fraser wrote:
> On 13/02/2012 16:03, "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx> wrote:
>
>> Hello,
>>
>> XenServer 6.0 (Xen 4.1.1) has had a support escalation against it for
>> Cisco C210 M2 servers. I do not have access to any of these servers, so
>> I can't debug the issue myself.
>>
>> The pcpu LAPICs support EOI Broadcast Suppression and Xen enabled it.
>> In arch/x86/apic.c:verify_local_APIC, there is a comment stating that
>> directed EOI support must use the old IO-APIC ack method.
> Well, it's not surprising that some systems won't like this method. Firstly,
> calling the LAPIC feature 'directed EOI' is misleading. The feature is 'EOI
> broadcast suppression' -- specifically, EOI to the LAPIC does not cause EOI
> to the IO-APIC, instead the IO-APIC has to be manually EOIed as a separate
> operation.

Yes, I had noticed the naming discrepancy, but decided that fixing the issue was more important than arguing over naming at this point.

> Now, not all IO-APICs directly support this. See io_apic.c:__io_apic_eoi()
> -- if the IO-APIC does not have an EOI register, then an EOI is forced in a
> slightly gross way. I wonder how reliable that is across a broad range of
> chipsets; reliable enough to rely on it for *every* interrupt? ;-)

When I wrote __io_apic_eoi(), I based it on the comment about an erratum in the 82093AA chipset; that comment appears in io_apic.c:mask_and_ack_level_ioapic_irq() and end_level_ioapic_irq(). Before my patch, Xen assumed that every IO-APIC had an EOI register, which caused issues on older chipsets. However, it still used the hack of flipping the trigger mode because of the erratum, which is why Xen only hit problems as a race condition when migrating pirqs.

The IO-APICs in question advertise their version as 0x20, so they should have an EOI register. However, I can't find a part number for the chip, so I can't find a definitive specification for it.
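To make the two EOI paths discussed above concrete, here is a minimal C sketch that models a single level-triggered IO-APIC pin in memory instead of touching hardware. The LAPIC bit positions (bit 24 of the version register advertising the feature, bit 12 of the spurious-interrupt vector register enabling it) follow the Intel SDM. The struct and function names are hypothetical and are not Xen's, and treating version 0x20 and later as "has an EOI register" is the assumption stated in this message, not a verified property of the Cisco parts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local APIC bits as documented in the Intel SDM. */
#define APIC_LVR_EOI_BCAST_SUPP (1u << 24) /* version reg: feature available */
#define APIC_SVR_EOI_BCAST_SUPP (1u << 12) /* SVR: suppress EOI broadcast    */

/* Hypothetical model of one level-triggered IO-APIC pin (not Xen's types). */
struct mock_ioapic_pin {
    uint8_t  ioapic_version;  /* low byte of the IO-APIC version register */
    bool     level_triggered; /* trigger mode in the redirection entry */
    bool     masked;
    bool     remote_irr;      /* set on delivery, cleared by an EOI */
    unsigned eoi_reg_writes;  /* writes to the (modelled) EOI register */
};

/* Enable broadcast suppression only if the LAPIC advertises it. */
static uint32_t maybe_enable_suppression(uint32_t lvr, uint32_t svr)
{
    if (lvr & APIC_LVR_EOI_BCAST_SUPP)
        svr |= APIC_SVR_EOI_BCAST_SUPP;
    return svr;
}

/* In this model, switching a pin to edge mode clears Remote IRR. */
static void set_trigger(struct mock_ioapic_pin *p, bool level)
{
    p->level_triggered = level;
    if (!level)
        p->remote_irr = false;
}

/* LAPIC EOI: reaches the IO-APIC only when suppression is off. */
static void lapic_eoi(uint32_t svr, struct mock_ioapic_pin *p)
{
    if (!(svr & APIC_SVR_EOI_BCAST_SUPP))
        p->remote_irr = false;
}

/* Explicit IO-APIC EOI, needed once broadcast suppression is enabled. */
static void ioapic_eoi(struct mock_ioapic_pin *p)
{
    assert(p->level_triggered); /* only level pins latch Remote IRR */
    if (p->ioapic_version >= 0x20) {
        /* Real hardware: write the vector to the EOI register. */
        p->eoi_reg_writes++;
        p->remote_irr = false;
    } else {
        /* No EOI register: mask, flip to edge and back, restore the mask. */
        bool was_masked = p->masked;
        p->masked = true;
        set_trigger(p, false);
        set_trigger(p, true);
        p->masked = was_masked;
    }
}

/* End-of-interrupt for a level-triggered pin under either LAPIC mode. */
static void end_level_irq(uint32_t svr, struct mock_ioapic_pin *p)
{
    lapic_eoi(svr, p);
    if (svr & APIC_SVR_EOI_BCAST_SUPP)
        ioapic_eoi(p);
}
```

In this model, once suppression is enabled a LAPIC EOI on its own leaves Remote IRR set, so any pin whose explicit IO-APIC EOI is skipped or ineffective stops delivering further level interrupts. That is consistent with, though it does not prove, the lost-interrupt hangs described in this thread.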
It appears that Citrix does have an almost identical server, so I am currently negotiating for access to it so that I can debug this issue properly.

I have attached a patch which prevents the advertisement of EOI Broadcast Suppression from overriding the ack method the user specifies on the command line.

~Andrew

> Cc'ing the patch author Edwin Zhai. If it can't be resolved with Intel, I'm
> personally quite happy to see the original patch reverted.
>
> -- Keir
>
>> A hypervisor with this check disabled (i.e. never checking for, or
>> enabling, directed EOI) seems to make the system stable again (5 days
>> stable now, as opposed to a hang due to lost interrupts once every few
>> hours before).
>>
>> First of all, I have discovered that forcing "ioapic_ack=new" does not
>> have the intended effect, because verify_local_APIC trashes it, even if
>> the user has specified the ack method. I intend to send a patch to fix
>> this in due course.
>>
>> However, as for the main issue, I can't work out any logical reason why
>> directed EOI would not work with the new ack mode. I am still trying to
>> work out the differences in the code path in case I have missed something
>> subtle, but does anyone on the list have more knowledge of these
>> intricacies than me? Either way, it appears that there is a bug on the
>> code path with directed EOI and the old ack method.
>>
>> Thanks in advance,
>

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

Attachment:
ioapic-ack-fix.patch

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
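The attached ioapic-ack-fix.patch is not reproduced in the archive. As an illustration only, here is a minimal C sketch of the precedence rule it is described as implementing: an explicit ioapic_ack= choice on the command line wins, and detecting EOI broadcast suppression only changes the default. The enum, struct, and function names are hypothetical and are not taken from Xen or from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical names; not necessarily those used by Xen or the patch. */
enum ack_mode { ACK_NEW, ACK_OLD };

struct ack_config {
    enum ack_mode mode;
    bool user_forced; /* set once ioapic_ack= has been parsed successfully */
};

/* Parse the value of "ioapic_ack="; returns false on unrecognised input. */
static bool parse_ioapic_ack(struct ack_config *c, const char *val)
{
    assert(val != NULL);
    if (strcmp(val, "old") == 0)
        c->mode = ACK_OLD;
    else if (strcmp(val, "new") == 0)
        c->mode = ACK_NEW;
    else
        return false;
    c->user_forced = true;
    return true;
}

/* Called when the LAPIC advertises EOI broadcast suppression. */
static void on_eoi_suppression_detected(struct ack_config *c)
{
    /* The behaviour reported in the thread amounts to an unconditional
     * c->mode = ACK_OLD; here a user-specified mode is left alone. */
    if (!c->user_forced)
        c->mode = ACK_OLD;
}
```

Keeping a separate "forced by the user" flag, rather than comparing the mode against its default, lets an explicit ioapic_ack=new be distinguished from the default value of new.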