
Re: [Xen-devel] pv-ops domU not working with MSI interrupts on Nehalem



On Mon, Sep 27, 2010 at 12:16:50PM -0700, Bruce Edge wrote:
> On Mon, Sep 27, 2010 at 10:24 AM, Konrad Rzeszutek Wilk
> <konrad.wilk@xxxxxxxxxx> wrote:
> >
> > On Mon, Sep 27, 2010 at 08:52:39AM -0700, Bruce Edge wrote:
> > > One of our developers who is working on a tachyon driver is
> > > complaining that the pvops domU kernel is not working for these MSI
> > > interrupts.
> > > This is using the current head of xen/2.6.32.x on both a single
> > > Nehalem 920 and a dual E5540. This behavior is consistent with Xen
> > > 4.0.1, 4.0.2.rc1-pre and 4.1.
> > >
> > > Here are his comments:
> > >
> > > - the driver has no problem enabling the MSI interrupt and requesting the
> > > interrupt through the kernel functions pci_enable_msi and request_irq
> >
> > What shows up in the Xen console when you send the 'q' key? Does it
> > show that the vector is assigned to the appropriate guest?
> 
> The Xen console q key shows that the domU is assigned:
> 
> (XEN)     Interrupts { 32, 41-42, 47 }

Aha!

> 
> but the domU thinks it has:
> 
> 124/125/126/127
> 
> Is there some mapping that's taking place, or is this plain wrong?

That looks wrong. The IRQ numbers (even though they are MSI vectors) are
set up as IRQ numbers in the DomU guest. You should have seen

32:
41:
42:
47:
in /proc/interrupts on your DomU guest.
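Konrad's point is that the set Xen reports for the domain and the IRQ numbers in the guest's /proc/interrupts should be the same. A minimal Python sketch for comparing the two, assuming the text formats shown in this thread; `parse_xen_irq_set` and `guest_irqs_for` are hypothetical helper names, not part of any Xen tool, and the sample rows are trimmed copies of the output quoted further down (most per-CPU columns dropped).

```python
import re


def parse_xen_irq_set(line):
    """Expand a Xen 'q' debug-key line such as
    '(XEN)     Interrupts { 32, 41-42, 47 }' into a set of ints."""
    m = re.search(r"\{([^}]*)\}", line)
    if m is None:
        return set()
    irqs = set()
    for part in m.group(1).split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as "41-42" is inclusive at both ends.
            lo, hi = (int(x) for x in part.split("-"))
            irqs.update(range(lo, hi + 1))
        else:
            irqs.add(int(part))
    return irqs


def guest_irqs_for(proc_interrupts, device):
    """Return the IRQ numbers of /proc/interrupts rows that name `device`."""
    irqs = set()
    for row in proc_interrupts.splitlines():
        head, sep, rest = row.partition(":")
        # Numbered rows start with "NNN:"; named rows (NMI:, LOC:) are skipped.
        if sep and head.strip().isdigit() and device in rest.split():
            irqs.add(int(head))
    return irqs


# Trimmed copies of the outputs quoted in this thread.
XEN_LINE = "(XEN)     Interrupts { 32, 41-42, 47 }"
DOMU_ROWS = """\
124:     760415   0  xen-pirq-pcifront-msi  HW_TACHYON
125:     762234   0  xen-pirq-pcifront-msi  HW_TACHYON
126:     764180   0  xen-pirq-pcifront-msi  HW_TACHYON
127:     764164   0  xen-pirq-pcifront-msi  HW_TACHYON
"""

xen = parse_xen_irq_set(XEN_LINE)
guest = guest_irqs_for(DOMU_ROWS, "HW_TACHYON")
# IRQs only Xen knows about, then IRQs only the guest is using.
print(sorted(xen - guest), sorted(guest - xen))
# prints [32, 41, 42, 47] [124, 125, 126, 127]
```

With a working setup both lists would be empty; here they do not overlap at all.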

I wonder what broke. Can you use
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git
devel/xen-pcifront-0.5 (or pv/pcifront-2.6.32)?

It has the latest pcifront driver but without the PVonHVM enhancements,
so we can try to take the PVonHVM logic out of the picture.

> 
> >
> > > - the interrupt does happen, but the interrupt service routine of the
> > > tachyon driver doesn't detect any interrupt status related to this
> > > interrupt, which keeps the tachyon chip from coming on-line. And
> > > there is a high count of tachyon interrupts in /proc/interrupts
> >
> > Is it checking the PCI_STATUS_INTERRUPT or the appropriate register
> > in the MMIO BAR?
> >
> 
> The driver would check the appropriate register (tachyon registers) in
> the MMIO to determine the source of interrupts.

OK, so that isn't it. Is there anything at these vectors:
7c, 7d, 7e, and 7f? When you use the Xen debug-keys 'i' or 'q', the output
should give you an inkling of what device each one is set up for.
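The vectors listed above are simply the domU IRQ numbers written in hexadecimal (124 = 0x7c through 127 = 0x7f), which is presumably why Konrad asks about those particular vectors. A quick Python check of that arithmetic:

```python
# domU IRQ numbers taken from the /proc/interrupts output in this thread.
guest_irqs = [124, 125, 126, 127]

# Two-digit lowercase hex, the form used for the vectors above.
vectors = [format(n, "02x") for n in guest_irqs]
print(vectors)  # prints ['7c', '7d', '7e', '7f']

# And back again, to confirm the mapping is one-to-one.
assert [int(v, 16) for v in vectors] == guest_irqs
```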

> 
> > >
> > > kaan-18-dpm:~# cat /proc/interrupts | grep TACH
> > > 124:     760415          0          0          0          0          0          0          0          0          0          0          0          0          0  xen-pirq-pcifront-msi  HW_TACHYON
> > > 125:     762234          0          0          0          0          0          0          0          0          0          0          0          0          0  xen-pirq-pcifront-msi  HW_TACHYON
> > > 126:     764180          0          0          0          0          0          0          0          0          0          0          0          0          0  xen-pirq-pcifront-msi  HW_TACHYON
> > > 127:     764164          0          0          0          0          0          0          0          0          0          0          0          0          0  xen-pirq-pcifront-msi  HW_TACHYON
> >
> > Can you provide the full dmesg output?
> 
> Attached.
> 
> Some possibly related messages on dom0 console:
> 
> [ 1882.269778] pciback 0000:07:00.0: enabling device (0000 -> 0003)
> [ 1882.269800] xen: registering gsi 32 triggering 0 polarity 1
> [ 1882.269827] xen_allocate_pirq: returning irq 32 for gsi 32
> [ 1882.269834] xen: --> irq=32
> [ 1882.269841] Already setup the GSI :32
> [ 1882.269847] pciback 0000:07:00.0: PCI INT A -> GSI 32 (level, low) -> IRQ 32
> [ 1882.269866] pciback 0000:07:00.0: setting latency timer to 64
> [ 1882.270463] pciback 0000:07:00.0: Driver tried to write to a
> read-only configuration space field at offset 0x62, size 2. This may
> be harmless, but if you have problems with your device:

Uhhh, for that I think you need to do 'lspci -vvv -xxx -s 07:00.0'
to find out what is in the configuration space. You could allow the write
using the permissive attribute.
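Decoding the 2-byte field at offset 0x62 from an `lspci -xxx` dump is mechanical: PCI configuration space is little-endian, so the word is byte 0x62 plus byte 0x63 shifted left by 8. The same helper can read the Status register (offset 0x06), whose PCI_STATUS_INTERRUPT bit (0x08) is the one asked about earlier in the thread. A minimal Python sketch, assuming lspci's usual "60: xx xx ..." hex-dump rows; the helper names and the sample bytes are made up for illustration and were not read from the tachyon card.

```python
import re

# A hex-dump row from `lspci -x/-xxx`: two hex digits of offset, a colon,
# then up to 16 space-separated bytes.
ROW = re.compile(r"^([0-9a-f]{2}):((?: [0-9a-f]{2})+)\s*$", re.IGNORECASE)

PCI_STATUS = 0x06            # Status register offset
PCI_STATUS_INTERRUPT = 0x08  # Interrupt Status bit in that register


def config_bytes(dump):
    """Map config-space offset -> byte value from an lspci hex dump."""
    data = {}
    for line in dump.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue  # device header line or other non-dump text
        base = int(m.group(1), 16)
        for i, tok in enumerate(m.group(2).split()):
            data[base + i] = int(tok, 16)
    return data


def config_word(data, offset):
    """Read a little-endian 16-bit value at `offset`."""
    return data[offset] | (data[offset + 1] << 8)


# Hypothetical dump: the byte values are invented, not read from a device.
SAMPLE = """\
07:00.0 Example device (hypothetical)
00: aa aa bb bb 06 04 18 00 00 00 00 00 00 00 00 00
60: 05 70 8a 00 00 00 00 00 00 00 00 00 00 00 00 00
"""

cfg = config_bytes(SAMPLE)
print(hex(config_word(cfg, 0x62)))          # prints 0x8a
status = config_word(cfg, PCI_STATUS)
print(bool(status & PCI_STATUS_INTERRUPT))  # prints True
```

Running the same helpers on the real `lspci -xxx` output for 07:00.0 would show which register the driver was trying to write.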

> [ 1882.270465] 1) see permissive attribute in sysfs
> [ 1882.270467] 2) report problems to the xen-devel mailing list along
> with details of your device obtained from lspci.
> [ 1882.270615]   alloc irq_desc for 478 on node 0
> [ 1882.270625]   alloc kstat_irqs on node 0

So what do you see for IRQ 478? xen-pciback, I presume?
> [ 1882.348411] pciback 0000:07:00.1: enabling device (0000 -> 0003)
> [ 1882.348433] xen: registering gsi 42 triggering 0 polarity 1
> [ 1882.348440] xen_allocate_pirq: returning irq 42 for gsi 42
> [ 1882.348445] xen: --> irq=42
> [ 1882.348472] Already setup the GSI :42
> [ 1882.348479] pciback 0000:07:00.1: PCI INT B -> GSI 42 (level, low) -> IRQ 42
> [ 1882.348497] pciback 0000:07:00.1: setting latency timer to 64
> [ 1882.349063] pciback 0000:07:00.1: Driver tried to write to a
> read-only configuration space field at offset 0x62, size 2. This may
> be harmless, but if you have problems with your device:
> [ 1882.349066] 1) see permissive attribute in sysfs
> [ 1882.349067] 2) report problems to the xen-devel mailing list along
> with details of your device obtained from lspci.
> [ 1882.349205]   alloc irq_desc for 477 on node 0
> [ 1882.349215]   alloc kstat_irqs on node 0
> [ 1882.402893] pciback 0000:07:00.2: enabling device (0000 -> 0003)
> [ 1882.402908] xen: registering gsi 47 triggering 0 polarity 1
> [ 1882.402913] xen_allocate_pirq: returning irq 47 for gsi 47
> [ 1882.402916] xen: --> irq=47
> [ 1882.402921] Already setup the GSI :47
> [ 1882.402925] pciback 0000:07:00.2: PCI INT C -> GSI 47 (level, low) -> IRQ 47
> [ 1882.402938] pciback 0000:07:00.2: setting latency timer to 64
> [ 1882.403280] pciback 0000:07:00.2: Driver tried to write to a
> read-only configuration space field at offset 0x62, size 2. This may
> be harmless, but if you have problems with your device:
> [ 1882.403282] 1) see permissive attribute in sysfs
> [ 1882.403282] 2) report problems to the xen-devel mailing list along
> with details of your device obtained from lspci.
> [ 1882.403380]   alloc irq_desc for 476 on node 0
> [ 1882.403386]   alloc kstat_irqs on node 0
> (XEN) [VT-D]iommu.c:824: iommu_fault_status: Primary Pending Fault
> (XEN) [VT-D]iommu.c:799: DMAR:[DMA Write] Request device [07:00.0]
> fault addr e6f80000, iommu reg = ffff82c3fff57000
> (XEN) DMAR:[fault reason 05h] PTE Write access is not set
> (XEN) print_vtd_entries: iommu = ffff83019fffa370 bdf = 7:0.0 gmfn = e6f80
> (XEN)     root_entry = ffff83019ff70000
> (XEN)     root_entry[7] = 19cf52001
> (XEN)     context = ffff83019cf52000
> (XEN)     context[0] = 102_706dc005
> (XEN)     l4 = ffff8300706dc000
> (XEN)     l4_index = 0
> (XEN)     l4[0] = 706db003
> (XEN)     l3 = ffff8300706db000
> (XEN)     l3_index = 3
> (XEN)     l3[3] = 702b6003
> (XEN)     l2 = ffff8300702b6000
> (XEN)     l2_index = 137
> (XEN)     l2[137] = 0
> (XEN)     l2[137] not present
> (XEN) traps.c:466:d0 Unhandled nmi fault/trap [#2] on VCPU 0 [ec=0000]

That is not good. What changed since your earlier emails that caused this
to be triggered? Or was it triggered all along? What happens if you run the
system without the IOMMU enabled?
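The VT-d dump quoted above can be cross-checked by splitting the fault address into page-table indices. Assuming the usual 4-level walk over 4 KiB pages with 9 index bits per level, 0xe6f80000 gives l4 = 0, l3 = 3 and l2 = 0x137. That matches the dump only if Xen prints `l2_index = 137` in hexadecimal (0x137 is 311 decimal); read that way, `l2[137] not present` says the walk found no level-2 entry covering the guest frame e6f80. `vtd_walk_indices` is a hypothetical helper for illustration, not Xen code.

```python
def vtd_walk_indices(addr, levels=4, page_shift=12, bits_per_level=9):
    """Split a DMA address into per-level page-table indices, assuming a
    4 KiB page and 9 index bits per level (the usual 4-level layout)."""
    mask = (1 << bits_per_level) - 1
    return {
        level: (addr >> (page_shift + bits_per_level * (level - 1))) & mask
        for level in range(levels, 0, -1)
    }


fault_addr = 0xE6F80000                # "fault addr e6f80000" in the log
print(hex(fault_addr >> 12))           # prints 0xe6f80, the logged gmfn
idx = vtd_walk_indices(fault_addr)
print({f"l{lvl}": hex(i) for lvl, i in idx.items()})
# prints {'l4': '0x0', 'l3': '0x3', 'l2': '0x137', 'l1': '0x180'}
```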

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

