|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v5 06/10] AMD/IOMMU: don't blindly allocate interrupt remapping tables
On 17.09.2019 15:10, Andrew Cooper wrote:
> On 06/08/2019 14:09, Jan Beulich wrote:
>> ACPI tables are free to list far more device coordinates than there are
>> actual devices. By delaying the table allocations for most cases, and
>> doing them only when an actual device is known to be present at a given
>> position, overall memory used for the tables goes down from over 500k
>> pages to just over 1k ones.
>
> This is slightly awkward grammar. While I don't think it is strictly
> speaking incorrect, it would be more normal to phrase as "just over 1k
> pages".
Changed, albeit to me the double "pages" sounds odd as well. Would
"of them" be any better than "ones"?
>> ---
>> TBD: This retains prior (but suspicious) behavior of not calling
>> amd_iommu_set_intremap_table() for "invalid" IVRS mapping entries.
>> Since DTE.IV=0 means un-remapped interrupts, I wonder if this needs
>> changing.
>
> How would an invalid entry get DTE.V set in the first place?
DTE.V gets set by amd_iommu_set_root_page_table(), which in turn gets
called from amd_iommu_setup_domain_device() alone. It's only the
latter function's callers which obtain (and possibly check) the
corresponding IVRS mappings entry. Hence to me there's a sufficient
disconnect between setting of DTE.V and DTE.IV.
Plus, looking at e.g. amd_iommu_add_device(), there's ample room for
not even making it to amd_iommu_set_intremap_table(), due to earlier
errors.
> Whatever the old behaviour may have been, we shouldn't have code which
> comes with a chance of having non-remapped interrupts by accident.
We can't make amd_iommu_set_root_page_table() set dte->iv to 1, as
it gets called only after amd_iommu_set_intremap_table() in
amd_iommu_add_device(). But we could of course make it do so when
it finds dte->it_root be zero. Yet I wonder if it wasn't more safe
to have DTEs start out with the field set this way.
Along these lines I'm also not convinced it is a good idea for
amd_iommu_set_intremap_table() to have a "valid" parameter in the
first place: It's okay as long as all callers pass iommu_intremap,
but it would seem to me that - as already said - we'd want DTEs be
set this way right when a DT gets allocated. If you agree, I'll
happily add a patch doing so to the end of this series (there's
meanwhile a patch re-arranging DT allocation anyway).
>> @@ -1266,13 +1270,46 @@ static int __init amd_iommu_setup_device
>> if ( ivrs_mappings[bdf].valid )
>> {
>> void *dte;
>> + const struct pci_dev *pdev = NULL;
>>
>> /* add device table entry */
>> dte = device_table.buffer + (bdf *
>> IOMMU_DEV_TABLE_ENTRY_SIZE);
>> iommu_dte_add_device_entry(dte, &ivrs_mappings[bdf]);
>>
>> + if ( iommu_intremap &&
>> + ivrs_mappings[bdf].dte_requestor_id == bdf &&
>> + !ivrs_mappings[bdf].intremap_table )
>> + {
>> + if ( !pci_init )
>> + continue;
>> + pcidevs_lock();
>> + pdev = pci_get_pdev(seg, PCI_BUS(bdf), PCI_DEVFN2(bdf));
>> + pcidevs_unlock();
>> + }
>> +
>> + if ( pdev )
>> + {
>> + unsigned int req_id = bdf;
>> +
>> + do {
>> + ivrs_mappings[req_id].intremap_table =
>> + amd_iommu_alloc_intremap_table(
>> + ivrs_mappings[bdf].iommu,
>> + &ivrs_mappings[req_id].intremap_inuse);
>> + if ( !ivrs_mappings[req_id].intremap_table )
>> + return -ENOMEM;
>> +
>> + if ( !pdev->phantom_stride )
>> + break;
>> + req_id += pdev->phantom_stride;
>> + } while ( PCI_SLOT(req_id) == pdev->sbdf.dev );
>> + }
>> +
>> amd_iommu_set_intremap_table(
>> - dte, virt_to_maddr(ivrs_mappings[bdf].intremap_table),
>> + dte,
>> + ivrs_mappings[bdf].intremap_table
>> + ? virt_to_maddr(ivrs_mappings[bdf].intremap_table)
>> + : 0,
>
> Under what circumstances can ivrs_mappings[bdf].intremap_table be NULL,
> given the ENOMEM above?
The "ivrs_mappings[bdf].dte_requestor_id == bdf" part of the conditional
around the setting of pdev can result in allocation (and hence the
ENOMEM error path) to be bypassed.
> This case isn't very clear cut given the !pdev possibility, but...
>
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -458,6 +459,32 @@ static int amd_iommu_add_device(u8 devfn
>> return -ENODEV;
>> }
>>
>> + ivrs_mappings = get_ivrs_mappings(pdev->seg);
>> + bdf = PCI_BDF2(pdev->bus, devfn);
>> + if ( !ivrs_mappings ||
>> + !ivrs_mappings[ivrs_mappings[bdf].dte_requestor_id].valid )
>> + return -EPERM;
>> +
>> + if ( iommu_intremap &&
>> + ivrs_mappings[bdf].dte_requestor_id == bdf &&
>> + !ivrs_mappings[bdf].intremap_table )
>> + {
>> + ivrs_mappings[bdf].intremap_table =
>> + amd_iommu_alloc_intremap_table(
>> + iommu, &ivrs_mappings[bdf].intremap_inuse);
>> + if ( !ivrs_mappings[bdf].intremap_table )
>> + return -ENOMEM;
>> +
>> + amd_iommu_set_intremap_table(
>> + iommu->dev_table.buffer + (bdf *
>> IOMMU_DEV_TABLE_ENTRY_SIZE),
>> + ivrs_mappings[bdf].intremap_table
>> + ? virt_to_maddr(ivrs_mappings[bdf].intremap_table)
>> + : 0,
>
> ... this case definitely cannot occur.
Oh, missing cleanup after copy-and-paste. Thanks for noticing.
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |