Re: [Xen-devel] Problems with MSI interrupts

On 03/08/11 12:51, Andrew Cooper wrote:
> Hello,
>
> I am currently investigating an issue with MSI allocation/deallocation
> which appears to be an MSI resource leak in Xen. This is XenServer 6.0,
> based on Xen 4.1.1, with no changesets I can see affecting the relevant
> Xen codepaths.
>
> The box in question is a NetScaler SDX box with 24 logical cores (2
> Nehalem sockets, 6 cores each, hyperthreaded), 96GB RAM, with 4
> dual-port Intel 10G ixgbe cards (and two SSL 'Xcelerator' cards, but I
> have disabled these for debugging purposes). Each of the 8 NIC ports
> exports 40 virtual functions. There are 40 (identical) VMs, each of
> which has 1 VF from each NIC passed through to it, giving each VM 8
> VFs. Each VF itself uses 3 MSI-X interrupts. Therefore, for all VMs to
> be working correctly, there are 3 IRQs per VF x 8 VFs x 40 VMs = 960
> MSI-X interrupts.
>
> The symptoms are: reboot the VMs a couple of times, and eventually Xen
> says "(XEN) ../physdev.c:140: domXXX: can't create irq for msi!".
> After adding extra debugging, the call to create_irq() was returning
> -ENOSPC. At the point at which create_irq() was failing, there were
> huge numbers of IRQs listed by the 'i' debug key with a descriptor
> affinity mask covering all CPUs, which I believe is interfering with
> the calculations in __assign_irq_vector().
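
To put numbers on why create_irq() runs dry: each CPU's IDT has 256
entries, and only a subset of those is available for dynamically
allocated interrupt vectors. An IRQ whose working mask names a single
CPU costs one vector on one CPU; an IRQ left with an all-CPU mask needs
the same vector free on every CPU at once. A toy model of that
accounting (VECS_PER_CPU here is my round-number assumption, not Xen's
exact constant):

    /* Toy model of per-CPU vector exhaustion.  NR_CPUS matches this
     * box; VECS_PER_CPU is an assumed budget, not Xen's constant. */
    #include <stdbool.h>
    #include <stdio.h>

    #define NR_CPUS       24
    #define VECS_PER_CPU 200

    static int used[NR_CPUS];         /* vectors consumed per CPU */

    /* Reserve one vector on each CPU the IRQ may be delivered to.
     * An all-CPU mask needs a free slot on *every* CPU. */
    static bool assign_vector(bool all_cpus, int home_cpu)
    {
        if (!all_cpus) {
            if (used[home_cpu] >= VECS_PER_CPU)
                return false;         /* this CPU's table is full */
            used[home_cpu]++;
            return true;
        }

        for (int cpu = 0; cpu < NR_CPUS; cpu++)
            if (used[cpu] >= VECS_PER_CPU)
                return false;         /* models -ENOSPC */
        for (int cpu = 0; cpu < NR_CPUS; cpu++)
            used[cpu]++;
        return true;
    }

    int main(void)
    {
        int n = 0;
        while (assign_vector(true, 0))  /* stale all-CPU-mask IRQs */
            n++;
        printf("all-CPU-mask IRQs before exhaustion: %d\n", n);
        return 0;
    }

On this model, 24 CPUs x 200 vectors = 4800 single-CPU slots, covering
the 960 MSI-X sources several times over; but only ~200 all-CPU-mask
IRQs fit before every CPU's table is full and __assign_irq_vector() has
nothing left to hand out.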
> I suspected that this might be because scheduling under load was
> swapping VCPUs across PCPUs, resulting in the IRQ descriptor being
> written into all PCPU IDTs. As a result, I pinned each VM to a
> specific PCPU in the hope that the problem would go away.
>
> When starting each VM individually, the problem does appear to go
> away. However, when starting all VMs at once, there are still some
> IRQs with an affinity mask of all CPUs.
>
> Specifically, one case is this (I added extra debugging to include
> irq_cfg->cpu_mask in the 'i' debug key output):
>
> (XEN) IRQ: 845 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000010 vec:7e type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 55(----),
> (XEN) IRQ: 846 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:86 type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 54(----),
> (XEN) IRQ: 847 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:96 type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 53(----),
> (XEN) IRQ: 848 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:be type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 52(----),
> (XEN) IRQ: 849 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:c6 type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 51(----),
> (XEN) IRQ: 850 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:ce type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 50(----),
> (XEN) IRQ: 851 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:b7 type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 49(----),
> (XEN) IRQ: 852 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:cf type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 48(----),
> (XEN) IRQ: 853 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:d7 type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 47(----),
> (XEN) IRQ: 854 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:d9 type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 46(----),
> (XEN) IRQ: 855 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:22 type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 45(----),
> (XEN) IRQ: 856 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:2a type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 44(----),
> (XEN) IRQ: 857 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000010 vec:3c type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 43(----),
> (XEN) IRQ: 858 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:4c type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 42(----),
> (XEN) IRQ: 859 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:54 type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 41(----),
> (XEN) IRQ: 860 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:b5 type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 40(----),
> (XEN) IRQ: 861 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:ae type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 39(----),
> (XEN) IRQ: 862 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:de type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 38(----),
> (XEN) IRQ: 863 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000010 vec:55 type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 37(----),
> (XEN) IRQ: 864 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:9d type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 36(----),
> (XEN) IRQ: 865 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:46 type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 35(----),
> (XEN) IRQ: 866 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:a6 type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 34(----),
> (XEN) IRQ: 867 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:5f type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 33(----),
> (XEN) IRQ: 868 desc_aff:ffffffff,ffffffff,ffffffff,ffffffff
> cfg_aff:00000000,00000000,00000000,00000020 vec:7f type=PCI-MSI
> status=00000050 in-flight=0 domain-list=34: 32(----),
>
> This shows all the IRQs for dom34. The descriptors have full affinity
> masks, but each irq_cfg has a cpu_mask of 0x10 or 0x20, i.e. processor
> 4 or 5.
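
As an aside for anyone reading the dump: these cpumasks are printed as
four 32-bit hex words, most significant word first, so ...,00000010 is
bit 4 = CPU 4, ...,00000020 is bit 5 = CPU 5, and ffffffff throughout
is every possible CPU. My understanding is that desc_aff is the
requested affinity, while cfg_aff (irq_cfg->cpu_mask) records where the
currently assigned vector is actually valid. A standalone decoder for
sanity-checking these dumps (a sketch, not Xen code):

    /* Decode the 'i' debug-key cpumask format: four 32-bit hex
     * words, most significant first (matching the dump above). */
    #include <stdint.h>
    #include <stdio.h>

    static void decode_mask(const uint32_t words[4])
    {
        /* words[0] is the most significant word as printed. */
        for (int cpu = 0; cpu < 128; cpu++) {
            uint32_t w = words[3 - cpu / 32];
            if (w & (1u << (cpu % 32)))
                printf("cpu %d ", cpu);
        }
        printf("\n");
    }

    int main(void)
    {
        /* cfg_aff for IRQ 846: 00000000,00000000,00000000,00000020 */
        uint32_t cfg_aff[4] = { 0, 0, 0, 0x20 };
        decode_mask(cfg_aff);   /* prints: cpu 5 */
        return 0;
    }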
>
> The domain dump for dom34 is:
>
> (XEN) General information for domain 34:
> (XEN)     refcnt=3 dying=0 nr_pages=131065 xenheap_pages=8
> dirty_cpus={} max_pages=133376
> (XEN)     handle=97ef6eef-69c2-024c-1bbb-a150ca668691
> vm_assist=00000000
> (XEN)     paging assistance: hap refcounts translate external
> (XEN) Rangesets belonging to domain 34:
> (XEN)     I/O Ports  { }
> (XEN)     Interrupts { 32-55 }
> (XEN)     I/O Memory { f9f00-f9f03, fa001-fa003, fa19c-fa19f,
> fa29d-fa29f, fa39c-fa39f, fa49d-fa49f, fa59c-fa59f, fa69d-fa69f,
> fa79c-fa79f, fa89d-fa89f, fa99c-fa99f, faa9d-faa9f, fab9c-fab9f,
> fac9d-fac9f, fad9c-fad9f, fae9d-fae9f }
> (XEN) Memory pages belonging to domain 34:
> (XEN)     DomPage list too long to display
> (XEN)     P2M entry stats:
> (XEN)      L1: 1590 entries, 6512640 bytes
> (XEN)      L2:  253 entries, 530579456 bytes
> (XEN)     PoD entries=0 cachesize=0 superpages=0
> (XEN)     XenPage 00000000001146e1: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 00000000001146e0: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 00000000001146df: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 00000000001146de: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 00000000000bdc0e: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 0000000000114592: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 000000000011458f: caf=c000000000000001,
> taf=7400000000000001
> (XEN)     XenPage 000000000011458c: caf=c000000000000001,
> taf=7400000000000001
> (XEN) VCPU information and callbacks for domain 34:
> (XEN)     VCPU0: CPU3 [has=F] flags=1 poll=0 upcall_pend = 00,
> upcall_mask = 00 dirty_cpus={} cpu_affinity={3}
> (XEN)     paging assistance: hap, 4 levels
> (XEN)     No periodic timer
> (XEN)     VCPU1: CPU3 [has=F] flags=1 poll=0 upcall_pend = 00,
> upcall_mask = 00 dirty_cpus={3} cpu_affinity={3}
> (XEN)     paging assistance: hap, 4 levels
> (XEN)     No periodic timer
>
> This shows that the domain is indeed pinned to pcpu 3.
>
> Am I misinterpreting the information, or does this indicate that the
> (credit) scheduler is not obeying cpu_affinity? The virtual functions
> seem to be passing network traffic correctly, so I assume that
> interrupts are getting where they are supposed to go.
>
> Another question, which may or may not be related: irq_cfg has a
> vector and a cpu_mask. From this, I assume that the same interrupt
> must occupy the same IDT entry on every pcpu it might be received on.
> Is there an architectural reason why this should be the case, or is
> it just the way Xen is coded?
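
Thinking about it some more, I don't believe the per-CPU IDTs
themselves force this - vector V on one CPU and vector W on another
could perfectly well point at the same handler. The constraint, as far
as I can tell, comes from MSI itself: the message data payload encodes
a single 8-bit vector, so an interrupt that may be delivered to more
than one CPU without being reprogrammed must use the same vector on
all of them. A sketch of the data register layout (field positions per
the Intel SDM; C bitfield ordering is implementation-defined, so the
struct is illustrative only, and the names are mine):

    /* MSI data register: one 8-bit vector for all destinations, so
     * whichever CPU receives the message indexes its own IDT with
     * the same vector number. */
    #include <stdint.h>

    struct msi_data_reg {
        uint16_t vector        : 8;  /* bits 7:0   - IDT slot used on
                                      * the receiving CPU */
        uint16_t delivery_mode : 3;  /* bits 10:8  - fixed,
                                      * lowest-priority, ... */
        uint16_t reserved      : 3;  /* bits 13:11 */
        uint16_t level         : 1;  /* bit 14 */
        uint16_t trigger_mode  : 1;  /* bit 15 */
    };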
> (Also, it seems that <asm/irq.h> and <xen/irq.h> both define struct
> irq_cfg, and while one is strictly an extension of the other, there
> appear to be no guards around them, meaning that sizeof(irq_cfg)
> depends on which header file you include. I don't know if this is
> relevant or not, but it strikes me that code getting confused about
> which definition it is using could be computing on junk if it expects
> the longer irq_cfg and actually gets the shorter one.)

Correction - I wasn't reading the source closely enough. There are
#ifdef __ia64__ guards around this.
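
For the archive, the pattern is roughly the following - paraphrased
from memory rather than quoted from the 4.1.1 headers, and I may have
which side carries the guard backwards, but the effect is that any
given architecture only ever sees one definition, so
sizeof(struct irq_cfg) is consistent after all:

    /* Paraphrased shape only, not the literal headers. */

    /* xen/irq.h: the shorter, common definition, guarded for ia64: */
    #ifdef __ia64__
    struct irq_cfg {
        u8        vector;
        cpumask_t cpu_mask;
    };
    #endif

    /* asm-x86/irq.h: the extended x86 definition (the extra fields
     * here are illustrative).  With the guard above, a translation
     * unit never sees both definitions at once: */
    struct irq_cfg {
        u8        vector;
        cpumask_t cpu_mask;
        cpumask_t old_cpu_mask;       /* for vector migration */
        unsigned  move_in_progress : 1;
    };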

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel