
RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP ProLiant G6 with dual Xeon 5540 (Nehalem)



Xiantao,
With vcpus=16 (all CPUs) in domU, I'm able to change the IRQ smp_affinity to 
any one-hot value and see the interrupts routed to the specified CPU. Every now 
and then, though, both domU and dom0 will permanently lock up (cold reboot 
required) after changing the smp_affinity. If I change it manually from the 
command line, it seems to be okay, but if I change it within a script (such as 
shifting a walking "1" left to test all 16 CPUs), it locks up partway through 
the script.
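
For reference, the script essentially walks a one-hot mask across 
/proc/irq/<N>/smp_affinity. Here is a minimal C sketch of the idea; the IRQ 
number (48, taken from the logs further down) and the one-second delay are 
only examples, not the exact script:

#include <stdio.h>
#include <unistd.h>

/* Walk a one-hot affinity mask across all 16 vcpus by writing to
 * /proc/irq/<irq>/smp_affinity -- roughly what the failing script does. */
int main(void)
{
    const int irq = 48;    /* example IRQ; 48 is the one shown in the logs below */
    const int ncpus = 16;  /* vcpus=16 case described above */
    char path[64], mask[16];

    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);

    for (int cpu = 0; cpu < ncpus; cpu++) {
        snprintf(mask, sizeof(mask), "%x\n", 1 << cpu);

        FILE *f = fopen(path, "w");
        if (!f) {
            perror(path);
            return 1;
        }
        fputs(mask, f);
        fclose(f);

        sleep(1);          /* give the interrupt time to migrate before the next shift */
    }
    return 0;
}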

Other observations:

The MSI message address/data in dom0 "lspci -vv" stays the same, as does the 
"guest interrupt information" from the Xen console, even though I see the 
destination ID and vector change in domU "lspci -vv". You're probably expecting 
this behavior since you removed the set_affinity call in the last patch.

With vcpus=5, I can only change smp_affinity to 1. Any value other than 1 or 1f 
(the default) results in an instant, permanent lockup of both domU and dom0 
(the Xen console is still accessible). I also observed that when I changed the 
smp_affinity of the first function of the 4-function PCI device to 2, the 3rd 
and 4th functions got masked:

(XEN)    IRQ: 66, IRQ affinity:0x00000001, Vec:186 type=PCI-MSI status=00000010 
in-flight=0 domain-list=1: 79(----)
(XEN)    IRQ: 67, IRQ affinity:0x00000001, Vec:194 type=PCI-MSI status=00000010 
in-flight=0 domain-list=1: 78(----)
(XEN)    IRQ: 68, IRQ affinity:0x00000001, Vec:202 type=PCI-MSI status=00000010 
in-flight=1 domain-list=1: 77(---M)
(XEN)    IRQ: 69, IRQ affinity:0x00000001, Vec:210 type=PCI-MSI status=00000010 
in-flight=1 domain-list=1: 76(---M)

In the above log, I had changed the smp_affinity for IRQ 66, but IRQs 68 and 69 
got masked.

Dante

-----Original Message-----
From: Zhang, Xiantao [mailto:xiantao.zhang@xxxxxxxxx] 
Sent: Friday, October 16, 2009 5:59 PM
To: Cinco, Dante; He, Qing
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Keir Fraser
Subject: RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP 
ProLiant G6 with dual Xeon 5540 (Nehalem)

Dante,
From your description, this should be a different issue. Can you try the 
following change and see whether it works for you? Just a try.
Xiantao

diff -r 0705efd9c69e xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c    Fri Oct 16 09:04:53 2009 +0100
+++ b/xen/arch/x86/hvm/hvm.c    Sat Oct 17 08:48:23 2009 +0800
@@ -243,7 +243,7 @@ void hvm_migrate_pirqs(struct vcpu *v)
             continue;
         irq = desc - irq_desc;
         ASSERT(MSI_IRQ(irq));
-        desc->handler->set_affinity(irq, *cpumask_of(v->processor));
+        //desc->handler->set_affinity(irq, *cpumask_of(v->processor));
         spin_unlock_irq(&desc->lock);
     }
     spin_unlock(&d->event_lock);

-----Original Message-----
From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx 
[mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Cinco, Dante
Sent: Saturday, October 17, 2009 2:24 AM
To: Zhang, Xiantao; He, Qing
Cc: Keir Fraser; xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: RE: [Xen-devel] IRQ SMP affinity problems in domU with vcpus > 4 on HP 
ProLiant G6 with dual Xeon 5540 (Nehalem)

Xiantao,
I'm still losing the interrupts with your patch, but I see some differences. To 
simplify the data, I'm only going to focus on the first function of my 
4-function PCI device.

After changing the IRQ affinity, the IRQ is no longer masked (unlike before the 
patch). What stands out for me is that the new vector (219) reported by the 
"guest interrupt information" does not match the vector (187) in dom0 lspci. 
Before the patch, the new vector in "guest interrupt information" matched the 
new vector in dom0 lspci (the dest ID in dom0 lspci was unchanged). I also saw 
this message pop up on the Xen console when I changed smp_affinity:

(XEN) do_IRQ: 1.187 No irq handler for vector (irq -1).

187 is the vector from dom0 lspci both before and after the smp_affinity 
change, but "guest interrupt information" reports that the new vector is 219. 
To me, this looks like the new MSI message data (with vector=219) did not get 
written into the PCI device, right?

Here's a comparison before and after changing smp_affinity from ffff to 2 (dom0 
is pvops 2.6.31.1, domU is 2.6.30.1):

------------------------------------------------------------------------

/proc/irq/48/smp_affinity=ffff (default):

dom0 lspci: Address: 00000000fee00000  Data: 40bb (vector=187)

domU lspci: Address: 00000000fee00000  Data: 4071 (vector=113)

qemu-dm-dpm.log: pt_msi_setup: msi mapped with pirq 4f (79)
                 pt_msi_update: Update msi with pirq 4f gvec 71 gflags 0

Guest interrupt information: (XEN) IRQ: 74, IRQ affinity:0x00000001, Vec:187 
type=PCI-MSI status=00000010 in-flight=0 domain-list=1: 79(----)

Xen console: (XEN) [VT-D]iommu.c:1289:d0 domain_context_unmap:PCIe: bdf = 7:0.0
             (XEN) [VT-D]iommu.c:1175:d0 domain_context_mapping:PCIe: bdf = 
7:0.0
             (XEN) [VT-D]io.c:301:d0 VT-d irq bind: m_irq = 4f device = 5 intx 
= 0
             (XEN) io.c:326:d0 pt_irq_destroy_bind_vtd: machine_gsi=79 
guest_gsi=36, device=5, intx=0
             (XEN) io.c:381:d0 XEN_DOMCTL_irq_unmapping: m_irq = 0x4f device = 
0x5 intx = 0x0

------------------------------------------------------------------------

/proc/irq/48/smp_affinity=2:

dom0 lspci: Address: 00000000fee10000  Data: 40bb (dest ID changed from 0 (APIC 
ID of CPU0) to 16 (APIC ID of CPU1), vector unchanged)

domU lspci: Address: 00000000fee02000  Data: 40b1 (dest ID changed from 0 (APIC 
ID of CPU0) to 2 (APIC ID of CPU1), new vector=177)

Guest interrupt information: (XEN) IRQ: 74, IRQ affinity:0x00000002, Vec:219 
type=PCI-MSI status=00000010 in-flight=0 domain-list=1: 79(----)

qemu-dm-dpm.log: pt_msi_update: Update msi with pirq 4f gvec 71 gflags 2
                 pt_msi_update: Update msi with pirq 4f gvec b1 gflags 2
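
For reference, the dest ID and vector annotations above are decoded straight 
from the MSI Address/Data fields shown by lspci. A minimal C sketch of that 
decoding (assuming xAPIC physical destination mode: address bits 19:12 hold 
the destination APIC ID, data bits 7:0 hold the vector):

#include <stdio.h>
#include <stdint.h>

/* Decode destination APIC ID and vector from an MSI address/data pair. */
static void decode_msi(const char *tag, uint64_t addr, uint32_t data)
{
    unsigned dest   = (unsigned)((addr >> 12) & 0xff);  /* destination APIC ID */
    unsigned vector = data & 0xff;                       /* interrupt vector   */

    printf("%s: dest ID %u, vector %u (0x%02x)\n", tag, dest, vector, vector);
}

int main(void)
{
    decode_msi("dom0 before", 0xfee00000ULL, 0x40bb);  /* dest 0,  vector 187 */
    decode_msi("dom0 after ", 0xfee10000ULL, 0x40bb);  /* dest 16, vector 187 */
    decode_msi("domU before", 0xfee00000ULL, 0x4071);  /* dest 0,  vector 113 */
    decode_msi("domU after ", 0xfee02000ULL, 0x40b1);  /* dest 2,  vector 177 */
    return 0;
}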



 

