Re: [Xen-devel] [PATCH v5 06/10] AMD/IOMMU: don't blindly allocate interrupt remapping tables
On 17.09.2019 15:10, Andrew Cooper wrote:
> On 06/08/2019 14:09, Jan Beulich wrote:
>> ACPI tables are free to list far more device coordinates than there are
>> actual devices. By delaying the table allocations for most cases, and
>> doing them only when an actual device is known to be present at a given
>> position, overall memory used for the tables goes down from over 500k
>> pages to just over 1k ones.
>
> This is slightly awkward grammar. While I don't think it is strictly
> speaking incorrect, it would be more normal to phrase as "just over 1k
> pages".

Changed, though to me the double "pages" sounds odd as well. Would
"of them" be any better than "ones"?

>> ---
>> TBD: This retains prior (but suspicious) behavior of not calling
>> amd_iommu_set_intremap_table() for "invalid" IVRS mapping entries.
>> Since DTE.IV=0 means un-remapped interrupts, I wonder if this needs
>> changing.
>
> How would an invalid entry get DTE.V set in the first place?

DTE.V gets set by amd_iommu_set_root_page_table(), which in turn gets
called from amd_iommu_setup_domain_device() alone. It's only the latter
function's callers which obtain (and possibly check) the corresponding
IVRS mapping entry. Hence to me there's a sufficient disconnect between
the setting of DTE.V and DTE.IV. Plus, looking at e.g.
amd_iommu_add_device(), there's ample room for not even making it to
amd_iommu_set_intremap_table(), due to earlier errors.

> Whatever the old behaviour may have been, we shouldn't have code which
> comes with a chance of having non-remapped interrupts by accident.

We can't make amd_iommu_set_root_page_table() set dte->iv to 1, as it
gets called only after amd_iommu_set_intremap_table() in
amd_iommu_add_device(). But we could of course make it do so when it
finds dte->it_root to be zero. Yet I wonder whether it wouldn't be
safer to have DTEs start out with the field set this way.

Along these lines I'm also not convinced it is a good idea for
amd_iommu_set_intremap_table() to have a "valid" parameter in the first
place: it's okay as long as all callers pass iommu_intremap, but it
would seem to me that - as already said - we'd want DTEs to be set this
way right when a device table (DT) gets allocated. If you agree, I'll
happily add a patch doing so to the end of this series (there's
meanwhile a patch re-arranging DT allocation anyway).
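To make this concrete, the below is roughly what I have in mind - just
a sketch, with the struct reduced to a standalone placeholder (not the
real declarations from xen/drivers/passthrough/amd/) and a made-up
helper name:

    #include <stdbool.h>
    #include <stddef.h>

    /* Placeholder mirroring just the DTE fields discussed here. */
    struct amd_iommu_dte {
        bool v;                /* DTE.V  - DMA translation valid */
        bool iv;               /* DTE.IV - interrupt remapping valid */
        unsigned long it_root; /* interrupt table root pointer */
    };

    /*
     * Default every entry to IV=1 right at device table allocation,
     * so interrupts from devices which never get a remapping table
     * assigned are blocked instead of passed through un-remapped.
     */
    static void dte_set_intremap_default(struct amd_iommu_dte *table,
                                         size_t entries, bool intremap)
    {
        size_t i;

        for ( i = 0; i < entries; ++i )
            table[i].iv = intremap; /* it_root deliberately left zero */
    }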
>> @@ -1266,13 +1270,46 @@ static int __init amd_iommu_setup_device
>>          if ( ivrs_mappings[bdf].valid )
>>          {
>>              void *dte;
>> +            const struct pci_dev *pdev = NULL;
>>
>>              /* add device table entry */
>>              dte = device_table.buffer + (bdf * IOMMU_DEV_TABLE_ENTRY_SIZE);
>>              iommu_dte_add_device_entry(dte, &ivrs_mappings[bdf]);
>>
>> +            if ( iommu_intremap &&
>> +                 ivrs_mappings[bdf].dte_requestor_id == bdf &&
>> +                 !ivrs_mappings[bdf].intremap_table )
>> +            {
>> +                if ( !pci_init )
>> +                    continue;
>> +                pcidevs_lock();
>> +                pdev = pci_get_pdev(seg, PCI_BUS(bdf), PCI_DEVFN2(bdf));
>> +                pcidevs_unlock();
>> +            }
>> +
>> +            if ( pdev )
>> +            {
>> +                unsigned int req_id = bdf;
>> +
>> +                do {
>> +                    ivrs_mappings[req_id].intremap_table =
>> +                        amd_iommu_alloc_intremap_table(
>> +                            ivrs_mappings[bdf].iommu,
>> +                            &ivrs_mappings[req_id].intremap_inuse);
>> +                    if ( !ivrs_mappings[req_id].intremap_table )
>> +                        return -ENOMEM;
>> +
>> +                    if ( !pdev->phantom_stride )
>> +                        break;
>> +                    req_id += pdev->phantom_stride;
>> +                } while ( PCI_SLOT(req_id) == pdev->sbdf.dev );
>> +            }
>> +
>>              amd_iommu_set_intremap_table(
>> -                dte, virt_to_maddr(ivrs_mappings[bdf].intremap_table),
>> +                dte,
>> +                ivrs_mappings[bdf].intremap_table
>> +                ? virt_to_maddr(ivrs_mappings[bdf].intremap_table)
>> +                : 0,
>
> Under what circumstances can ivrs_mappings[bdf].intremap_table be NULL,
> given the ENOMEM above?

The "ivrs_mappings[bdf].dte_requestor_id == bdf" part of the conditional
around the setting of pdev can result in the allocation (and hence the
ENOMEM error path) being bypassed.
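To spell the NULL case out, here's a standalone illustration (plain C
with hypothetical IVRS contents, not the real Xen declarations):

    #include <stdbool.h>
    #include <stdio.h>

    struct ivrs_mapping {
        unsigned int dte_requestor_id;
        void *intremap_table;
    };

    int main(void)
    {
        /* Pretend 00:18.1 (bdf 0xc1) is aliased to 00:18.0 (0xc0). */
        static struct ivrs_mapping ivrs_mappings[0x100];
        unsigned int bdf;

        ivrs_mappings[0xc0].dte_requestor_id = 0xc0; /* owns its req ID */
        ivrs_mappings[0xc1].dte_requestor_id = 0xc0; /* alias entry */

        for ( bdf = 0xc0; bdf <= 0xc1; ++bdf )
        {
            bool alloc = ivrs_mappings[bdf].dte_requestor_id == bdf &&
                         !ivrs_mappings[bdf].intremap_table;

            /* For 0xc1 no allocation happens, so its intremap_table
               can still be NULL by the time
               amd_iommu_set_intremap_table() runs - hence the ?: . */
            printf("bdf %#x: %s\n", bdf,
                   alloc ? "allocate table" : "skip allocation");
        }

        return 0;
    }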
> This case isn't very clear cut given the !pdev possibility, but...
>
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -458,6 +459,32 @@ static int amd_iommu_add_device(u8 devfn
>>          return -ENODEV;
>>      }
>>
>> +    ivrs_mappings = get_ivrs_mappings(pdev->seg);
>> +    bdf = PCI_BDF2(pdev->bus, devfn);
>> +    if ( !ivrs_mappings ||
>> +         !ivrs_mappings[ivrs_mappings[bdf].dte_requestor_id].valid )
>> +        return -EPERM;
>> +
>> +    if ( iommu_intremap &&
>> +         ivrs_mappings[bdf].dte_requestor_id == bdf &&
>> +         !ivrs_mappings[bdf].intremap_table )
>> +    {
>> +        ivrs_mappings[bdf].intremap_table =
>> +            amd_iommu_alloc_intremap_table(
>> +                iommu, &ivrs_mappings[bdf].intremap_inuse);
>> +        if ( !ivrs_mappings[bdf].intremap_table )
>> +            return -ENOMEM;
>> +
>> +        amd_iommu_set_intremap_table(
>> +            iommu->dev_table.buffer + (bdf * IOMMU_DEV_TABLE_ENTRY_SIZE),
>> +            ivrs_mappings[bdf].intremap_table
>> +            ? virt_to_maddr(ivrs_mappings[bdf].intremap_table)
>> +            : 0,
>
> ... this case definitely cannot occur.

Oh, missing cleanup after copy-and-paste. Thanks for noticing.
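I.e. with the ENOMEM check right above guaranteeing a non-NULL table,
the conditional expression presumably just collapses to something like
(sketch of the cleanup I'll fold in):

        amd_iommu_set_intremap_table(
            iommu->dev_table.buffer + (bdf * IOMMU_DEV_TABLE_ENTRY_SIZE),
            virt_to_maddr(ivrs_mappings[bdf].intremap_table),
            iommu_intremap);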
Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel