
Re: [PATCH v2 1/2] x86/iommu: account for IOMEM caps when populating dom0 IOMMU page-tables



On Tue, Feb 18, 2025 at 04:32:33PM +0100, Jan Beulich wrote:
> On 17.02.2025 15:16, Roger Pau Monne wrote:
> > The current code in arch_iommu_hwdom_init() kind of open-codes the same
> > MMIO permission ranges that are added to the hardware domain ->iomem_caps.
> > Avoid this duplication and use ->iomem_caps in arch_iommu_hwdom_init() to
> > filter which memory regions should be added to the dom0 IOMMU page-tables.
> > 
> > This should result in no change in the memory regions that end up identity
> > mapped in the domain IOMMU page tables.
> > 
> > Note the IO-APIC and MCFG page(s) must be set as not accessible for a PVH
> > dom0, otherwise the internal Xen emulation for those ranges won't work.
> > This requires an adjustment in dom0_setup_permissions().
> > 
> > Also the special casing of E820_UNUSABLE regions no longer needs to be done
> > in arch_iommu_hwdom_init(), as those regions are already blocked in
> > ->iomem_caps and thus would be removed from the rangeset as part of
> > ->iomem_caps processing in arch_iommu_hwdom_init().
> 
> Only almost: ->iomem_caps has them removed only for addresses 1Mb and up.

Right, but we would like to have parity between the IOMMU and CPU
page-tables, and hence should allow the IOMMU to map unusable regions
below 1Mb if the CPU is allowed to map them.  I can tweak the commit
message to note this is done on purpose.

> > --- a/xen/arch/x86/dom0_build.c
> > +++ b/xen/arch/x86/dom0_build.c
> > @@ -552,7 +552,8 @@ int __init dom0_setup_permissions(struct domain *d)
> >      for ( i = 0; i < nr_ioapics; i++ )
> >      {
> >          mfn = paddr_to_pfn(mp_ioapics[i].mpc_apicaddr);
> > -        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
> > +        if ( is_hvm_domain(d) ||
> > +             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
> >              rc |= iomem_deny_access(d, mfn, mfn);
> >      }
> 
> The original code used has_vioapic() and had a comment. At least one of
> the two would better be transplanted here, I think.

I can indeed switch to has_vioapic() and keep the comment.
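Something along these lines, assuming the rest of the loop stays as-is
(the comment wording below is mine, not lifted from the current code):

    for ( i = 0; i < nr_ioapics; i++ )
    {
        mfn = paddr_to_pfn(mp_ioapics[i].mpc_apicaddr);
        /*
         * Always deny access when the IO-APIC is emulated by Xen, otherwise
         * only deny it if the page hasn't been marked read-only.
         */
        if ( has_vioapic(d) ||
             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
            rc |= iomem_deny_access(d, mfn, mfn);
    }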

> > @@ -593,6 +594,22 @@ int __init dom0_setup_permissions(struct domain *d)
> >              rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
> >      }
> >  
> > +    /* For PVH dom0 prevent access to MCFG, it's emulated by Xen. */
> > +    if ( is_hvm_domain(d) )
> > +    {
> > +        for ( i = 0; i < pci_mmcfg_config_num; i++ )
> > +        {
> > +            const unsigned long s =
> > +                PFN_DOWN(pci_mmcfg_config[i].address) +
> > +                PCI_BDF(pci_mmcfg_config[i].start_bus_number, 0, 0);
> > +            const unsigned long e =
> > +                PFN_DOWN(pci_mmcfg_config[i].address) +
> > +                PCI_BDF(pci_mmcfg_config[i].end_bus_number, ~0, ~0);
> > +
> > +            rc |= iomem_deny_access(d, s, e);
> > +        }
> > +    }
> 
> Similarly here, the original code used has_vpci() and also had a comment.
> Is there actually a strong reason to replace the original MCFG code?

Hm, I see, I could tweak vpci_subtract_mmcfg() to remove the regions
from ->iomem_caps instead of open-coding it here.
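The shape would roughly be the below, where r would be ->iomem_caps when
called from dom0_setup_permissions() (sketch only, I'm not reproducing the
exact vpci_subtract_mmcfg() signature here):

    /* Sketch: subtract the emulated MCFG page(s) from the given rangeset. */
    for ( i = 0; i < pci_mmcfg_config_num; i++ )
    {
        const unsigned long s = PFN_DOWN(pci_mmcfg_config[i].address) +
            PCI_BDF(pci_mmcfg_config[i].start_bus_number, 0, 0);
        const unsigned long e = PFN_DOWN(pci_mmcfg_config[i].address) +
            PCI_BDF(pci_mmcfg_config[i].end_bus_number, ~0, ~0);

        rc = rangeset_remove_range(r, s, e);
        if ( rc )
            break;
    }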

> > --- a/xen/drivers/passthrough/x86/iommu.c
> > +++ b/xen/drivers/passthrough/x86/iommu.c
> > @@ -320,6 +320,26 @@ static int __hwdom_init cf_check map_subtract(unsigned long s, unsigned long e,
> >      return rangeset_remove_range(map, s, e);
> >  }
> >  
> > +struct handle_iomemcap {
> > +    struct rangeset *r;
> > +    unsigned long last;
> > +};
> > +static int __hwdom_init cf_check map_subtract_iomemcap(unsigned long s,
> > +                                                       unsigned long e,
> > +                                                       void *data)
> > +{
> > +    struct handle_iomemcap *h = data;
> > +    int rc = 0;
> > +
> > +    if ( h->last != s )
> > +        rc = rangeset_remove_range(h->r, h->last, s - 1);
> > +
> > +    /* Be careful with overflows. */
> > +    h->last = max(e + 1, e);
> 
> What overflow could occur here? Addresses are limited to 52 bits; frame
> numbers are limited to 40 bits. And ->iomem_caps starts out with a range
> ending at the last address permitted by paddr_bits.

I was misremembering ->iomem_caps being initialized as [0, ~0UL] for
dom0, but that's not the case.  I can indeed drop the max() here.
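I.e. the callback would end up looking something like this (comments are
mine):

    static int __hwdom_init cf_check map_subtract_iomemcap(unsigned long s,
                                                           unsigned long e,
                                                           void *data)
    {
        struct handle_iomemcap *h = data;
        int rc = 0;

        /* Remove the hole between the previously reported range and this one. */
        if ( h->last != s )
            rc = rangeset_remove_range(h->r, h->last, s - 1);

        /* No overflow possible: ->iomem_caps is bounded by paddr_bits. */
        h->last = e + 1;

        return rc;
    }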

> > @@ -475,22 +488,13 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
> >      if ( rc )
> >          panic("IOMMU failed to remove Xen ranges: %d\n", rc);
> >  
> > -    /* Remove any overlap with the Interrupt Address Range. */
> > -    rc = rangeset_remove_range(map, 0xfee00, 0xfeeff);
> > +    iomem.r = map;
> > +    rc = rangeset_report_ranges(d->iomem_caps, 0, ~0UL, map_subtract_iomemcap,
> > +                                &iomem);
> > +    if ( !rc && iomem.last < ~0UL )
> > +        rc = rangeset_remove_range(map, iomem.last, ~0UL);
> 
> Similarly here I'm wondering about the upper bound of ~0UL. Is this just
> to be on the safe side? And/or just because it's simpler than calculating
> the actual upper bound?

I could use the actual upper bound, as we already use it a bit below
in:

    /* Remove any regions past the last address addressable by the domain. */
    rc = rangeset_remove_range(map, PFN_DOWN(1UL << domain_max_paddr_bits(d)),
                               ~0UL);

However using ~0UL is also correct for the purposes here.
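If the bounded variant is preferred it would look something like this
(untested sketch, using a new local for the last frame covered by
domain_max_paddr_bits()):

    last_mfn = PFN_DOWN(1UL << domain_max_paddr_bits(d)) - 1;
    iomem.r = map;
    rc = rangeset_report_ranges(d->iomem_caps, 0, last_mfn,
                                map_subtract_iomemcap, &iomem);
    if ( !rc && iomem.last <= last_mfn )
        rc = rangeset_remove_range(map, iomem.last, last_mfn);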

> No ranges above the system physical address width
> should ever be entered into the rangeset. Kind of as an implication
> iomem.last also is guaranteed to be below ~0UL when making it here.

I could add an assert:

ASSERT(iomem.last < PFN_DOWN(1UL << domain_max_paddr_bits(d)));

But then I would need to adjust dom0_setup_permissions() to also use
domain_max_paddr_bits() instead of using paddr_bits for the
->iomem_caps initial max address.  That should likely be a separate
patch on its own.
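For reference, I'd expect that separate change to roughly amount to the
following in dom0_setup_permissions() (sketch, I haven't checked it against
the exact existing line):

    /* Bound the initial ->iomem_caps range by the domain's address width. */
    rc |= iomem_permit_access(d, 0UL,
                              PFN_DOWN(1UL << domain_max_paddr_bits(d)) - 1);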

Thanks, Roger.
