Re: [Xen-devel] megasas stops I/O when running kernel as dom0 under xen4.1/4.2
On 26/08/11 19:16, Andrew Cooper wrote:
> On 24/08/11 18:20, Andrew Cooper wrote:
>> On 24/08/11 18:09, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Aug 24, 2011 at 05:57:06PM +0100, Andrew Cooper wrote:
>>>> On 24/08/11 13:06, Andrew Cooper wrote:
>>>>> On 22/08/11 10:05, Andrew Cooper wrote:
>>>>>> On 19/08/11 19:10, Andreas Olsowski wrote:
>>>>>>> On 19.08.2011 18:49, Andrew Cooper wrote:
>>>>>>>
>>>>>>>> The only change you need to make is in megasas_probe_one() in
>>>>>>>> megaraid_sas_base.c
>>>>>>>>
>>>>>>>> Add a call to pci_enable_msi(pdev) immediately after the current
>>>>>>>> call to pci_set_master(pdev);
>>>>>>>>
>>>>>>>> ~Andrew
>>>>>>>>
>>>>>>> Yep, that works fine. Removed the module option as well.
>>>>>>>
>>>>>>> root@tarballerina:~# cat /proc/interrupts |grep mega
>>>>>>> 2236: 69010 0 0 0 0 0 0 0 xen-pirq-msi megasas
>>>>>>>
>>>>>>> The same procedure that would have led to almost instant errors has
>>>>>>> not brought them to appear again.
>>>>>>>
>>>>>> Good. This is what we are seeing as well. I am still awaiting a reply
>>>>>> from LSI on this topic.
>>>>>>
>>>>>> Unfortunately, this does point to a regression in the way Xen deals with
>>>>>> legacy interrupts.
>>>>> Out of interest, on all 3 of your boxes with the megaraid_sas cards,
>>>>> could you gather the io_apic information?
>>>>>
>>>>> It is the z xen debug key on the serial console (or alternatively put
>>>>> apic_verbosity=debug on the xen commandline and the information gets
>>>>> dumped into the dmesg)
>>>> You can ignore this - it is not relevant.
>>>>
>>>> I have narrowed the problem to a bug in the interrupt migration code.
>>> Goodies!
>>>> The bug occurs when the move pending flag is set, and somehow another
>>>> interrupt comes in on the old pcpu without triggering the move
>>>> completion code. This leaves the IO_APIC with an ack'd but not EOI'd
>>>> interrupt from the megaraid_sas device.
>>> Ah, so the interrupt is delivered to Dom0 on the old per_cpu
>>> event which is ignored. Ignored b/c we have rebound the event channel
>>> to the other CPU, right?
>> The interrupt is not ignored - it seems to be being serviced by the
>> device driver in dom0. I will admit that my debugging code may be a
>> bit flaky - I started by trying to match IRQ35 (which is always claimed
>> by PCI INTA on this server - very useful for debugging) between do_IRQ
>> and its related PHYSDEVOP_eoi.
>>
>> I am currently trying to track the exact order of events around this
>> interrupt which misses the move completion code.
>>
>>> Is there any code in the Hypervisor to turn off interrupt migration code?
>> Not that I have found, although playing around with vcpu and task
>> pinning should work. My debugging shows that Xen-4.1.1 is migrating
>> this interrupt between PCPUs on average once every 4 real interrupts
>> when dom0 is under any load whatsoever.
>>
> Please try the attached patch. It is a hack, but it works as far as I can test.
>
> (Patch is taken against xen-4.1.1 but should be trivial to port if it
> doesn't apply cleanly)
>
> ~Andrew
>

Apologies - the previous patch fails to compile (I forgot to hg qrefresh
before sending - it has been a very long day).

Try this patch.

~Andrew

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

Attachment: CA-65000-manually-eoi-migrating-irqs.patch

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
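
For readers following the thread, the failure described above (move pending flag set, an interrupt arriving on the old pcpu, the IO_APIC left ack'd but not EOI'd) can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is not Xen code, every name in it (`toy_irq`, `toy_begin_move`, `toy_deliver`, `manual_eoi_fix`) is hypothetical, and the "workaround" branch is guessed purely from the attachment's filename, since the patch contents are not shown in this message.

```c
#include <stdbool.h>

/* Hypothetical toy state for one level-triggered IO-APIC line.
 * Not a Xen data structure; invented for illustration only. */
struct toy_irq {
    int  cur_cpu;       /* CPU the line is routed to now */
    int  old_cpu;       /* CPU it was routed to before the move started */
    bool move_pending;  /* a migration has started but not completed */
    bool acked;         /* ack'd at the IO-APIC, EOI still owed */
};

/* Start migrating the line to new_cpu and mark the move as pending. */
static void toy_begin_move(struct toy_irq *irq, int new_cpu)
{
    irq->old_cpu = irq->cur_cpu;
    irq->cur_cpu = new_cpu;
    irq->move_pending = true;
}

/* Model one interrupt arriving on `cpu`.
 * Returns false if the line is wedged: an ack'd but un-EOI'd entry
 * blocks any further interrupt from the device. */
static bool toy_deliver(struct toy_irq *irq, int cpu, bool manual_eoi_fix)
{
    if (irq->acked)
        return false;               /* previous interrupt never got its EOI */

    irq->acked = true;              /* the interrupt is acknowledged */

    if (!irq->move_pending) {
        irq->acked = false;         /* ordinary path: EOI follows */
    } else if (cpu == irq->cur_cpu) {
        irq->move_pending = false;  /* arrival on the new CPU completes the move */
        irq->acked = false;         /* and the EOI follows */
    } else if (manual_eoi_fix) {
        irq->acked = false;         /* assumed workaround: EOI by hand */
    }
    /* Otherwise: arrived on the old CPU while the move was pending,
     * completion was skipped, and the EOI is never sent. */

    return true;
}
```

In this model, calling `toy_begin_move(&irq, 1)` and then delivering on CPU 0 with `manual_eoi_fix` set to false leaves `acked` set, so every later `toy_deliver` call returns false, which is the toy analogue of I/O stopping. With `manual_eoi_fix` set to true, the same sequence leaves the line usable and a later arrival on CPU 1 completes the move.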