
Re: [PATCH v2 1/2] x86/iommu: account for IOMEM caps when populating dom0 IOMMU page-tables



On Tue, Feb 18, 2025 at 04:32:33PM +0100, Jan Beulich wrote:
> On 17.02.2025 15:16, Roger Pau Monne wrote:
> > The current code in arch_iommu_hwdom_init() kind of open-codes the same
> > MMIO permission ranges that are added to the hardware domain ->iomem_caps.
> > Avoid this duplication and use ->iomem_caps in arch_iommu_hwdom_init() to
> > filter which memory regions should be added to the dom0 IOMMU page-tables.
> > 
> > This should result in no change in the memory regions that end up identity
> > mapped in the domain IOMMU page tables.
> > 
> > Note the IO-APIC and MCFG page(s) must be set as not accessible for a PVH
> > dom0, otherwise the internal Xen emulation for those ranges won't work.
> > This requires an adjustment in dom0_setup_permissions().
> > 
> > Also the special casing of E820_UNUSABLE regions no longer needs to be done
> > in arch_iommu_hwdom_init(), as those regions are already blocked in
> > ->iomem_caps and thus would be removed from the rangeset as part of
> > ->iomem_caps processing in arch_iommu_hwdom_init().
> 
> Only almost: ->iomem_caps has them removed only for addresses 1Mb and up.

Right, but we would like to have parity between the IOMMU and CPU
page-tables, and hence should allow the IOMMU to map unusable regions
below 1Mb if the CPU is allowed to map them.  I can tweak the commit
message to note this is done on purpose.

> > --- a/xen/arch/x86/dom0_build.c
> > +++ b/xen/arch/x86/dom0_build.c
> > @@ -552,7 +552,8 @@ int __init dom0_setup_permissions(struct domain *d)
> >      for ( i = 0; i < nr_ioapics; i++ )
> >      {
> >          mfn = paddr_to_pfn(mp_ioapics[i].mpc_apicaddr);
> > -        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
> > +        if ( is_hvm_domain(d) ||
> > +             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
> >              rc |= iomem_deny_access(d, mfn, mfn);
> >      }
> 
> The original code used has_vioapic() and had a comment. At least one of
> the two would better be transplanted here, I think.

I can indeed switch to has_vioapic() and keep the comment.
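Something along these lines, assuming the rest of the loop stays as-is
(the comment wording below is mine, not lifted from the current code):

    for ( i = 0; i < nr_ioapics; i++ )
    {
        mfn = paddr_to_pfn(mp_ioapics[i].mpc_apicaddr);
        /*
         * Always deny access when the IO-APIC is emulated by Xen, otherwise
         * only deny it if the page hasn't been marked read-only.
         */
        if ( has_vioapic(d) ||
             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
            rc |= iomem_deny_access(d, mfn, mfn);
    }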

> > @@ -593,6 +594,22 @@ int __init dom0_setup_permissions(struct domain *d)
> >              rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
> >      }
> >  
> > +    /* For PVH dom0 prevent access to MCFG, it's emulated by Xen. */
> > +    if ( is_hvm_domain(d) )
> > +    {
> > +        for ( i = 0; i < pci_mmcfg_config_num; i++ )
> > +        {
> > +            const unsigned long s =
> > +                PFN_DOWN(pci_mmcfg_config[i].address) +
> > +                PCI_BDF(pci_mmcfg_config[i].start_bus_number, 0, 0);
> > +            const unsigned long e =
> > +                PFN_DOWN(pci_mmcfg_config[i].address) +
> > +                PCI_BDF(pci_mmcfg_config[i].end_bus_number, ~0, ~0);
> > +
> > +            rc |= iomem_deny_access(d, s, e);
> > +        }
> > +    }
> 
> Similarly here, the original code used has_vpci() and also had a comment.
> Is there actually a strong reason to replace the original MCFG code?

Hm, I see, I could tweak vpci_subtract_mmcfg() to remove the regions
from ->iomem_caps instead of open-coding it here.
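The shape would roughly be the below, where r would be ->iomem_caps when
called from dom0_setup_permissions() (sketch only, I'm not reproducing the
exact vpci_subtract_mmcfg() signature here):

    /* Sketch: subtract the emulated MCFG page(s) from the given rangeset. */
    for ( i = 0; i < pci_mmcfg_config_num; i++ )
    {
        const unsigned long s = PFN_DOWN(pci_mmcfg_config[i].address) +
            PCI_BDF(pci_mmcfg_config[i].start_bus_number, 0, 0);
        const unsigned long e = PFN_DOWN(pci_mmcfg_config[i].address) +
            PCI_BDF(pci_mmcfg_config[i].end_bus_number, ~0, ~0);

        rc = rangeset_remove_range(r, s, e);
        if ( rc )
            break;
    }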

> > --- a/xen/drivers/passthrough/x86/iommu.c
> > +++ b/xen/drivers/passthrough/x86/iommu.c
> > @@ -320,6 +320,26 @@ static int __hwdom_init cf_check map_subtract(unsigned long s, unsigned long e,
> >      return rangeset_remove_range(map, s, e);
> >  }
> >  
> > +struct handle_iomemcap {
> > +    struct rangeset *r;
> > +    unsigned long last;
> > +};
> > +static int __hwdom_init cf_check map_subtract_iomemcap(unsigned long s,
> > +                                                       unsigned long e,
> > +                                                       void *data)
> > +{
> > +    struct handle_iomemcap *h = data;
> > +    int rc = 0;
> > +
> > +    if ( h->last != s )
> > +        rc = rangeset_remove_range(h->r, h->last, s - 1);
> > +
> > +    /* Be careful with overflows. */
> > +    h->last = max(e + 1, e);
> 
> What overflow could occur here? Addresses are limited to 52 bits; frame
> numbers are limited to 40 bits. And ->iomem_caps starts out with a range
> ending at the last address permitted by paddr_bits.

I was misremembering ->iomem_caps being initialized as [0, ~0UL] for
dom0, but that's not the case.  I can indeed drop the max() here.
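I.e. the callback would end up looking something like this (comments are
mine):

    static int __hwdom_init cf_check map_subtract_iomemcap(unsigned long s,
                                                           unsigned long e,
                                                           void *data)
    {
        struct handle_iomemcap *h = data;
        int rc = 0;

        /* Remove the hole between the previously reported range and this one. */
        if ( h->last != s )
            rc = rangeset_remove_range(h->r, h->last, s - 1);

        /* No overflow possible: ->iomem_caps is bounded by paddr_bits. */
        h->last = e + 1;

        return rc;
    }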

> > @@ -475,22 +488,13 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
> >      if ( rc )
> >          panic("IOMMU failed to remove Xen ranges: %d\n", rc);
> >  
> > -    /* Remove any overlap with the Interrupt Address Range. */
> > -    rc = rangeset_remove_range(map, 0xfee00, 0xfeeff);
> > +    iomem.r = map;
> > +    rc = rangeset_report_ranges(d->iomem_caps, 0, ~0UL, map_subtract_iomemcap,
> > +                                &iomem);
> > +    if ( !rc && iomem.last < ~0UL )
> > +        rc = rangeset_remove_range(map, iomem.last, ~0UL);
> 
> Similarly here I'm wondering about the upper bound of ~0UL. Is this just
> to be on the safe side? And/or just because it's simpler than calculating
> the actual upper bound?

I could use the actual upper bound, as we already use it a bit below
in:

    /* Remove any regions past the last address addressable by the domain. */
    rc = rangeset_remove_range(map, PFN_DOWN(1UL << domain_max_paddr_bits(d)),
                               ~0UL);

However using ~0UL is also correct for the purposes here.
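If the bounded variant is preferred it would look something like this
(untested sketch, using a new local for the last frame covered by
domain_max_paddr_bits()):

    last_mfn = PFN_DOWN(1UL << domain_max_paddr_bits(d)) - 1;
    iomem.r = map;
    rc = rangeset_report_ranges(d->iomem_caps, 0, last_mfn,
                                map_subtract_iomemcap, &iomem);
    if ( !rc && iomem.last <= last_mfn )
        rc = rangeset_remove_range(map, iomem.last, last_mfn);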

> No ranges above the system physical address width
> should ever be entered into the rangeset. Kind of as an implication
> iomem.last also is guaranteed to be below ~0UL when making it here.

I could add an assert:

ASSERT(iomem.last < PFN_DOWN(1UL << domain_max_paddr_bits(d)));

But then I would need to adjust dom0_setup_permissions() to also use
domain_max_paddr_bits() instead of using paddr_bits for the
->iomem_caps initial max address.  That should likely be a separate
patch on its own.
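For reference, I'd expect that separate change to roughly amount to the
following in dom0_setup_permissions() (sketch, I haven't checked it against
the exact existing line):

    /* Bound the initial ->iomem_caps range by the domain's address width. */
    rc |= iomem_permit_access(d, 0UL,
                              PFN_DOWN(1UL << domain_max_paddr_bits(d)) - 1);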

Thanks, Roger.
