[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] Commit 1aeb1156fa43fe2cd2b5003995b20466cd19a622: "x86 don't change affinity with interrupt unmasked", APCI errors and assorted pci trouble
Friday, April 17, 2015, 1:43:32 PM, you wrote: >>>> On 14.04.15 at 14:46, <linux@xxxxxxxxxxxxxx> wrote: >> I just had a hunch .. could it be related to the kernel apci/irq refactoring >> series of Jiang Liu, that already caused a lot of trouble in 3.17, 3.18 and >> 3.19 >> with Xen. And yes that seems to be the case: >> >> On Xen without "x86 don't change affinity with interrupt unmasked" >> - 3.16 && 3.19 && 4.0 all work fine >> >> On Xen with "x86 don't change affinity with interrupt unmasked" >> - 3.16 (which is before that kernel refactoring series) works fine. >> - 3.19, 4.0 both give the dom0 kernel hangs and the : >> (XEN) [2015-03-26 20:35:42.205] APIC error on CPU0: 00(40) >> (XEN) [2015-03-26 20:35:42.372] APIC error on CPU0: 40(40) >> >> (haven't tested 3.17 and 3.18 because these have asorted problems due that >> series that weren't fixed in time before stable updates ended.) >> >> So it seems Jan's patch seems to interfere with that patch series. > That's rather odd a finding - the patch in question in fact uncovered > a bug introduced in 2ca9fbd739 ("AMD IOMMU: allocate IRTE entries > instead of using a static mapping") in that IO-APIC RTE reads would > unconditionally translate the data (i.e. regardless of whether the > entry was already in translated format). The patch below fixes this > for me - can you please give this a try too? > Thanks, Jan > --- unstable.orig/xen/drivers/passthrough/amd/iommu_intr.c > +++ unstable/xen/drivers/passthrough/amd/iommu_intr.c > @@ -365,15 +365,17 @@ unsigned int amd_iommu_read_ioapic_from_ > unsigned int apic, unsigned int reg) > { > unsigned int val = __io_apic_read(apic, reg); > + unsigned int pin = (reg - 0x10) / 2; > + unsigned int offset = ioapic_sbdf[IO_APIC_ID(apic)].pin_2_idx[pin]; > > - if ( !(reg & 1) ) > + if ( !(reg & 1) && offset < INTREMAP_ENTRIES ) > { > - unsigned int offset = val & (INTREMAP_ENTRIES - 1); > u16 bdf = ioapic_sbdf[IO_APIC_ID(apic)].bdf; > u16 seg = ioapic_sbdf[IO_APIC_ID(apic)].seg; > u16 req_id = get_intremap_requestor_id(seg, bdf); > const u32 *entry = get_intremap_entry(seg, req_id, offset); > > + ASSERT(offset == (val & (INTREMAP_ENTRIES - 1))); > val &= ~(INTREMAP_ENTRIES - 1); > val |= get_field_from_reg_u32(*entry, > INT_REMAP_ENTRY_INTTYPE_MASK, Hmmm can this patch or tim's patch make andrew's patch ineffective ? I now have applied: Jan's: diff --git a/xen/drivers/passthrough/amd/iommu_intr.c b/xen/drivers/passthrough/amd/iommu_intr.c index c1b76fb..879698e 100644 --- a/xen/drivers/passthrough/amd/iommu_intr.c +++ b/xen/drivers/passthrough/amd/iommu_intr.c @@ -365,15 +365,17 @@ unsigned int amd_iommu_read_ioapic_from_ire( unsigned int apic, unsigned int reg) { unsigned int val = __io_apic_read(apic, reg); + unsigned int pin = (reg - 0x10) / 2; + unsigned int offset = ioapic_sbdf[IO_APIC_ID(apic)].pin_2_idx[pin]; - if ( !(reg & 1) ) + if ( !(reg & 1) && offset < INTREMAP_ENTRIES ) { - unsigned int offset = val & (INTREMAP_ENTRIES - 1); u16 bdf = ioapic_sbdf[IO_APIC_ID(apic)].bdf; u16 seg = ioapic_sbdf[IO_APIC_ID(apic)].seg; u16 req_id = get_intremap_requestor_id(seg, bdf); const u32 *entry = get_intremap_entry(seg, req_id, offset); + ASSERT(offset == (val & (INTREMAP_ENTRIES - 1))); val &= ~(INTREMAP_ENTRIES - 1); val |= get_field_from_reg_u32(*entry, INT_REMAP_ENTRY_INTTYPE_MASK, Tim's: @@ -529,10 +531,11 @@ int amd_iommu_msi_msg_update_ire( } while ( PCI_SLOT(bdf) == PCI_SLOT(pdev->devfn) ); if ( !rc ) + { for ( i = 1; i < nr; ++i ) msi_desc[i].remap_index = msi_desc->remap_index + i; - - msg->data = data; + msg->data = data; + } return rc; } Andrew's: diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c index 9eb8d33..3aee00c 100644 --- a/xen/drivers/passthrough/x86/iommu.c +++ b/xen/drivers/passthrough/x86/iommu.c @@ -56,9 +56,9 @@ int arch_iommu_populate_page_table(struct domain *d) while ( !rc && (page = page_list_remove_head(&d->page_list)) ) { - if ( has_hvm_container_domain(d) || - (page->u.inuse.type_info & PGT_type_mask) == PGT_writable_page ) - { + if ( (mfn_to_gmfn(d, page_to_mfn(page)) != INVALID_MFN) && + (has_hvm_container_domain(d) || + ((page->u.inuse.type_info & PGT_type_mask) == PGT_writable_page)) ) BUG_ON(SHARED_M2P(mfn_to_gmfn(d, page_to_mfn(page)))); rc = hd->platform_ops->map_page( d, mfn_to_gmfn(d, page_to_mfn(page)), page_to_mfn(page), And i now have this one again (which Andrew's patch should prevent): (XEN) [2015-04-17 15:00:55.954] Xen call trace: (XEN) [2015-04-17 15:00:55.954] [<ffff82d080155f51>] iommu_pde_from_gfn+0x38/0x430 (XEN) [2015-04-17 15:00:55.954] [<ffff82d080156456>] amd_iommu_map_page+0x10d/0x4e6 (XEN) [2015-04-17 15:00:55.954] [<ffff82d08015a93d>] arch_iommu_populate_page_table+0x179/0x4d8 (XEN) [2015-04-17 15:00:55.954] [<ffff82d08014ca61>] iommu_do_pci_domctl+0x395/0x604 (XEN) [2015-04-17 15:00:55.954] [<ffff82d08014942b>] iommu_do_domctl+0x17/0x1a (XEN) [2015-04-17 15:00:55.954] [<ffff82d080161f70>] arch_do_domctl+0x24ac/0x2724 (XEN) [2015-04-17 15:00:55.954] [<ffff82d080104ae8>] do_domctl+0x1a98/0x1df0 (XEN) [2015-04-17 15:00:55.954] [<ffff82d0802349ab>] syscall_enter+0xeb/0x145 (XEN) [2015-04-17 15:00:55.954] (XEN) [2015-04-17 15:00:57.023] (XEN) [2015-04-17 15:00:57.032] **************************************** (XEN) [2015-04-17 15:00:57.051] Panic on CPU 5: (XEN) [2015-04-17 15:00:57.064] Xen BUG at iommu_map.c:455 (XEN) [2015-04-17 15:00:57.079] **************************************** _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |