
Re: [Xen-devel] Commit 1aeb1156fa43fe2cd2b5003995b20466cd19a622: "x86 don't change affinity with interrupt unmasked", APIC errors and assorted pci trouble



Monday, March 30, 2015, 1:04:26 PM, you wrote:

> On 28/03/15 20:10, Sander Eikelenboom wrote:
>> Saturday, March 28, 2015, 6:30:39 PM, you wrote:
>>
>>> On 28/03/15 15:34, Sander Eikelenboom wrote:
>>>> Hi Jan,
>>>>
>>>> Commit 1aeb1156fa43fe2cd2b5003995b20466cd19a622:
>>>> "x86 don't change affinity with interrupt unmasked",
>>>> gives trouble on my AMD box, symptoms:
>>>> - APIC errors in xl dmesg that weren't previously there:
>>>>   (XEN) [2015-03-26 20:35:37.085] IOAPIC[0]: Set PCI routing entry (6-13 -> 0x88 -> IRQ 13 Mode:0 Active:0)
>>>>   (XEN) [2015-03-26 20:35:37.101] PCI: Using MCFG for segment 0000 bus 00-ff
>>>>   (XEN) [2015-03-26 20:35:37.097] IOAPIC[0]: Set PCI routing entry (6-8 -> 0x58 -> IRQ 8 Mode:0 Active:0)
>>>>   (XEN) [2015-03-26 20:35:37.112] IOAPIC[0]: Set PCI routing entry (6-18 -> 0xb8 -> IRQ 18 Mode:1 Active:1)
>>>>   (XEN) [2015-03-26 20:35:37.189] IOAPIC[0]: Set PCI routing entry (6-17 -> 0xc0 -> IRQ 17 Mode:1 Active:1)
>>>>   (XEN) [2015-03-26 20:35:37.420] IOAPIC[1]: Set PCI routing entry (7-29 -> 0xc8 -> IRQ 53 Mode:1 Active:1)
>>>>   (XEN) [2015-03-26 20:35:37.420] IOAPIC[1]: Set PCI routing entry (7-24 -> 0xd0 -> IRQ 48 Mode:1 Active:1)
>>>>   (XEN) [2015-03-26 20:35:37.420] IOAPIC[1]: Set PCI routing entry (7-30 -> 0xd8 -> IRQ 54 Mode:1 Active:1)
>>>>   (XEN) [2015-03-26 20:35:37.420] IOAPIC[1]: Set PCI routing entry (7-12 -> 0x21 -> IRQ 36 Mode:1 Active:1)
>>>>   (XEN) [2015-03-26 20:35:37.420] IOAPIC[1]: Set PCI routing entry (7-13 -> 0x29 -> IRQ 37 Mode:1 Active:1)
>>>>   (XEN) [2015-03-26 20:35:37.421] IOAPIC[1]: Set PCI routing entry (7-16 -> 0x31 -> IRQ 40 Mode:1 Active:1)
>>>>   (XEN) [2015-03-26 20:35:37.495] IOAPIC[1]: Set PCI routing entry (7-28 -> 0x39 -> IRQ 52 Mode:1 Active:1)
>>>>   (XEN) [2015-03-26 20:35:37.498] IOAPIC[0]: Set PCI routing entry (6-16 -> 0x89 -> IRQ 16 Mode:1 Active:1)
>>>>   (XEN) [2015-03-26 20:35:37.498] IOAPIC[1]: Set PCI routing entry (7-14 -> 0xa9 -> IRQ 38 Mode:1 Active:1)
>>>>   (XEN) [2015-03-26 20:35:37.548] IOAPIC[0]: Set PCI routing entry (6-22 -> 0xb9 -> IRQ 22 Mode:1 Active:1)
>>>>   (XEN) [2015-03-26 20:35:39.620] IOAPIC[1]: Set PCI routing entry (7-9 -> 0xc1 -> IRQ 33 Mode:1 Active:1)
>>>>   (XEN) [2015-03-26 20:35:39.646] IOAPIC[1]: Set PCI routing entry (7-8 -> 0xc9 -> IRQ 32 Mode:1 Active:1)
>>>>   (XEN) [2015-03-26 20:35:39.647] IOAPIC[1]: Set PCI routing entry (7-23 -> 0xd1 -> IRQ 47 Mode:1 Active:1)
>>>>   (XEN) [2015-03-26 20:35:41.732] IOAPIC[1]: Set PCI routing entry (7-5 -> 0xd9 -> IRQ 29 Mode:1 Active:1)
>>>>   (XEN) [2015-03-26 20:35:41.779] IOAPIC[1]: Set PCI routing entry (7-4 -> 0x22 -> IRQ 28 Mode:1 Active:1)
>>>>   (XEN) [2015-03-26 20:35:41.803] mm.c:803: d0: Forcing read-only access to MFN fed00
>>>>   (XEN) [2015-03-26 20:35:41.894] IOAPIC[0]: Set PCI routing entry (6-19 -> 0x2a -> IRQ 19 Mode:1 Active:1)
>>>>   (XEN) [2015-03-26 20:35:42.057] IOAPIC[1]: Set PCI routing entry (7-22 -> 0x72 -> IRQ 46 Mode:1 Active:1)
>>>>   (XEN) [2015-03-26 20:35:42.093] IOAPIC[1]: Set PCI routing entry (7-27 -> 0x8a -> IRQ 51 Mode:1 Active:1)
>>>>
>>>>   these:
>>>>   (XEN) [2015-03-26 20:35:42.205] APIC error on CPU0: 00(40)
>>>>   (XEN) [2015-03-26 20:35:42.372] APIC error on CPU0: 40(40)
>>>>
>>>>   (XEN) [2015-03-26 20:35:42.691] d0 attempted to change d0v1's CR4 flags 00000660 -> 00000760
>>>>   (XEN) [2015-03-26 20:35:42.691] IOAPIC[1]: Set PCI routing entry (7-1 -> 0x9a -> IRQ 25 Mode:1 Active:1)
>>>>
>>>>   and this one:
>>>>   (XEN) [2015-03-26 20:35:42.707] APIC error on CPU0: 40(40)
>>>>   (XEN) [2015-03-26 20:35:43.958] d0 attempted to change d0v0's CR4 flags 00000660 -> 00000760
>>>>   (XEN) [2015-03-26 20:35:43.970] d0 attempted to change d0v2's CR4 flags 00000660 -> 00000760
>>>>   (XEN) [2015-03-26 20:35:43.988] d0 attempted to change d0v3's CR4 flags 00000660 -> 00000760
>>>>   (XEN) [2015-03-26 20:35:43.992] d0 attempted to change d0v4's CR4 flags 00000660 -> 00000760
>>>>   (XEN) [2015-03-26 20:35:43.996] d0 attempted to change d0v5's CR4 flags 00000660 -> 00000760
>>>>   (d1) [2015-03-26 20:40:42.220] mapping kernel into physical memory
>>>>   (d1) [2015-03-26 20:40:42.220] about to get started...
>>>>
>>>>
>>>> - random failures on dom0 SATA devices; the SATA controller is using
>>>>   multiple MSI interrupts.
>>>>
>>>> - failures on XHCI controllers passed through to an HVM guest which uses
>>>>   MSI-X interrupts, leading to these in the guest dmesg:
>>>>   [  350.246548] xhci_hcd 0000:00:05.0: Looking for event-dma 000000003cdf7140 trb-start 000000003cdf7240 trb-end 000000003cdf7240 seg-start 000000003cdf7000 seg-end 000000003cdf73f0
>>>>   [  350.246548] xhci_hcd 0000:00:05.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1 comp_code 1
>>>>   [  350.246548] xhci_hcd 0000:00:05.0: Looking for event-dma 000000003cdf7150 trb-start 000000003cdf7240 trb-end 000000003cdf7240 seg-start 000000003cdf7000 seg-end 000000003cdf73f0
>>>>   [  350.246548] xhci_hcd 0000:00:05.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 1 comp_code 1
>>>>
>>>>
>>>> Reverting this specific commit makes all the troubles go away ..
>>> That is unfortunate, as conceptually the identified patch definitely
>>> fixes a bug.
>>> The "APIC error" messages have bit 6 set, which is "Receive Illegal
>>> Vector".  i.e. a device has attempted to deliver an interrupt with a
>>> vector field less than 16.  I presume that this means that the device is
>>> ending up with a malformed data field programmed into it.
>>> Can you identify the PCI sbdf's of the problematic devices, and collect
>>> debug-keys Q, M and i on a working system so I can identify precisely
>>> which of the MSI interrupt drivers is in use (Xen has several, depending
>>> on exact hardware circumstance).  If you can, the same debug-keys with
>>> the problematic changeset present might also be interesting.
>>> ~Andrew
>>
>> Hi Andrew,
>>
>> The passed through xhci is 08:00.0
>> The SATA controller is 00:11.0
>>
>> The clearest failure is on the xhci controller.
>>
>> The working and non-working configs differ only in the revert of the
>> mentioned commit.
>>
>> Attached are:
>>
>> - lspci in dom0 of the working config
>> - serial log of the working config (with debug-keys Q, M and i after full
>>   boot and guest start)
>> - serial log of the non-working config (with debug-keys Q, M and i after
>>   full boot and guest start)

> Thanks.

> As an utter longshot, can you give this patch a try?  Could you also see
> about capturing an lspci in dom0 while the bad situation is manifesting
> itself?

> ~Andrew

Hi Andrew,

lspci of the non-working case is attached. There are some differences
compared to the working case, but on a different device than I expected.
(Btw, I'm running with the ivrs_ioapic[6]=00:14.0 override because
the BIOS tables don't properly specify the SB IOAPIC.)

I tried the patch, but couldn't notice any difference:
the lspci output was exactly the same as in the attached
non-working case.

--
Sander

> diff --git a/xen/drivers/passthrough/amd/iommu_intr.c b/xen/drivers/passthrough/amd/iommu_intr.c
> index c1b76fb..439ba05 100644
> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> @@ -529,10 +529,12 @@ int amd_iommu_msi_msg_update_ire(
>      } while ( PCI_SLOT(bdf) == PCI_SLOT(pdev->devfn) );
>  
>      if ( !rc )
> +    {
>          for ( i = 1; i < nr; ++i )
>              msi_desc[i].remap_index = msi_desc->remap_index + i;
> +        msg->data = data;
> +    }
>  
> -    msg->data = data;
>      return rc;
>  }
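
For reference, the diagnosis quoted earlier in the thread (the 0x40 value in
the "APIC error" lines means bit 6, "Receive Illegal Vector", i.e. a vector
below 16) can be checked with a small standalone sketch. This is not Xen
code: the macro and function names are made up for illustration, the bit
names follow the Intel SDM description of the local APIC Error Status
Register, and the MSI helpers assume the plain (non-remapped) MSI format in
which bits 7:0 of the data register hold the vector. With interrupt
remapping enabled, the data field is interpreted differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local APIC Error Status Register bit for a received illegal vector
 * (bit 6 in the Intel SDM layout). Hypothetical macro name. */
#define ESR_RECV_ILLEGAL_VECTOR (1u << 6)

/* Vectors 0-15 are reserved, so 16 is the lowest vector a device may send. */
#define FIRST_LEGAL_VECTOR 16u

/* Return a short description of one ESR bit (0..7), Intel SDM naming. */
const char *esr_bit_name(unsigned int bit)
{
    static const char *const names[8] = {
        "Send Checksum Error",
        "Receive Checksum Error",
        "Send Accept Error",
        "Receive Accept Error",
        "Redirectable IPI",
        "Send Illegal Vector",
        "Receive Illegal Vector",
        "Illegal Register Address",
    };

    assert(bit < 8);
    return names[bit];
}

/* True if the ESR value reports a received interrupt with an illegal vector. */
bool esr_has_recv_illegal_vector(uint32_t esr)
{
    return (esr & ESR_RECV_ILLEGAL_VECTOR) != 0;
}

/* Extract the vector from a non-remapped MSI data value (bits 7:0). */
uint8_t msi_data_vector(uint32_t data)
{
    return (uint8_t)(data & 0xffu);
}

/* True if a non-remapped MSI data value carries a deliverable vector. */
bool msi_vector_is_legal(uint32_t data)
{
    return msi_data_vector(data) >= FIRST_LEGAL_VECTOR;
}
```

With these helpers, esr_has_recv_illegal_vector(0x40) is true, and a data
value whose low byte is below 0x10 fails msi_vector_is_legal(), which is the
kind of malformed data field suspected above.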

Attachment: lspci-dom0-not-working.txt
Description: Text document

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

