[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-4.20 2/3] x86/PCI: init segments earlier



On Thu, Jan 30, 2025 at 12:12:31PM +0100, Jan Beulich wrote:
> In order for amd_iommu_detect_one_acpi()'s call to pci_ro_device() to
> have permanent effect, pci_segments_init() needs to be called ahead of
> making it there. Without this we're losing segment 0's r/o map, and thus
> we're losing write-protection of the PCI devices representing IOMMUs.
> Which in turn means that half-way recent Linux Dom0 will, as it boots,
> turn off MSI on these devices, thus preventing any IOMMU events (faults
> in particular) from being reported on pre-x2APIC hardware.
> 
> As the acpi_iommu_init() invocation was moved ahead of
> acpi_mmcfg_init()'s by the offending commit, move the call to
> pci_segments_init() accordingly.
> 
> Fixes: 3950f2485bbc ("x86/x2APIC: defer probe until after IOMMU ACPI table 
> parsing")
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> Of course it would have been quite a bit easier to notice this issue if
> radix_tree_insert() wouldn't work fine ahead of radix_tree_init() being
> invoked for a given radix tree, when the index inserted at is 0.
> 
> While hunting down various other dead paths to actually find the root
> cause, it occurred to me that it's probably not a good idea to fully
> disallow config space writes for r/o devices: Dom0 won't be able to size
> their BARs (luckily the IOMMU "devices" don't have any, but e.g. serial
> ones generally will have at least one), for example. Without being able
> to size BARs it also will likely be unable to correctly account for the
> address space taken by these BARs. However, outside of vPCI it's not
> really clear to me how we could reasonably emulate such BAR sizing
> writes - we can't, after all, allow Dom0 to actually write to the
> underlying physical registers, yet we don't intercept reads (i.e. we
> can't mimic expected behavior then).

For properly sizing the domain will also attempt to toggle the memory
decoding bit ahead of sizing the BARs, and letting that trough will
break the usage of the device from Xen.  IOW: we would likely need to
emulate a fair amount of device state to make the view coherent from a
guest PoV, but is it worth it for a device that the hardware domain
cannot interact with?

Would it make more sense to just hide those devices instead of
allowing read-only access to their PCI config space?

> --- a/xen/arch/x86/x86_64/mmconfig-shared.c
> +++ b/xen/arch/x86/x86_64/mmconfig-shared.c
> @@ -402,8 +402,6 @@ void __init acpi_mmcfg_init(void)
>  {
>      bool valid = true;
>  
> -    pci_segments_init();
> -
>      /* MMCONFIG disabled */
>      if ((pci_probe & PCI_PROBE_MMCONF) == 0)
>          return;
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -55,6 +55,8 @@ void __init acpi_iommu_init(void)
>  {
>      int ret = -ENODEV;
>  
> +    pci_segments_init();

My preference might be to just place the pci_segments_init() call in
__start_xen(), instead of hiding it again in what might look like an
unrelated function (there's no mention of PCI in acpi_iommu_init()
function name for example).

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.