[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH for-4.20 2/3] x86/PCI: init segments earlier
On 03/02/2025 9:09 am, Jan Beulich wrote: > On 02.02.2025 15:46, Andrew Cooper wrote: >> On 30/01/2025 11:12 am, Jan Beulich wrote: >>> In order for amd_iommu_detect_one_acpi()'s call to pci_ro_device() to >>> have permanent effect, pci_segments_init() needs to be called ahead of >>> making it there. Without this we're losing segment 0's r/o map, and thus >>> we're losing write-protection of the PCI devices representing IOMMUs. >>> Which in turn means that half-way recent Linux Dom0 will, as it boots, >>> turn off MSI on these devices, thus preventing any IOMMU events (faults >>> in particular) from being reported on pre-x2APIC hardware. >>> >>> As the acpi_iommu_init() invocation was moved ahead of >>> acpi_mmcfg_init()'s by the offending commit, move the call to >>> pci_segments_init() accordingly. >>> >>> Fixes: 3950f2485bbc ("x86/x2APIC: defer probe until after IOMMU ACPI table >>> parsing") >>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> >>> --- >>> Of course it would have been quite a bit easier to notice this issue if >>> radix_tree_insert() wouldn't work fine ahead of radix_tree_init() being >>> invoked for a given radix tree, when the index inserted at is 0. >>> >>> While hunting down various other dead paths to actually find the root >>> cause, it occurred to me that it's probably not a good idea to fully >>> disallow config space writes for r/o devices: Dom0 won't be able to size >>> their BARs (luckily the IOMMU "devices" don't have any, but e.g. serial >>> ones generally will have at least one), for example. Without being able >>> to size BARs it also will likely be unable to correctly account for the >>> address space taken by these BARs. However, outside of vPCI it's not >>> really clear to me how we could reasonably emulate such BAR sizing >>> writes - we can't, after all, allow Dom0 to actually write to the >>> underlying physical registers, yet we don't intercept reads (i.e. we >>> can't mimic expected behavior then). >>> >>> --- a/xen/arch/x86/x86_64/mmconfig-shared.c >>> +++ b/xen/arch/x86/x86_64/mmconfig-shared.c >>> @@ -402,8 +402,6 @@ void __init acpi_mmcfg_init(void) >>> { >>> bool valid = true; >>> >>> - pci_segments_init(); >>> - >>> /* MMCONFIG disabled */ >>> if ((pci_probe & PCI_PROBE_MMCONF) == 0) >>> return; >>> --- a/xen/drivers/passthrough/x86/iommu.c >>> +++ b/xen/drivers/passthrough/x86/iommu.c >>> @@ -55,6 +55,8 @@ void __init acpi_iommu_init(void) >>> { >>> int ret = -ENODEV; >>> >>> + pci_segments_init(); >>> + >>> if ( !iommu_enable && !iommu_intremap ) >>> return; >>> >>> >> I can't help but feel this is taking a bad problem and not making it any >> better. >> >> pci_segments_init() is even less (obviously) relevant in >> apci_iommu_init() than it is in acpi_mmcfg_init(), and given the >> fine-grain Kconfig-ing going on, is only one small step from >> accidentally being compiled out. > The alternative I did consider was to move the call into __start_xen() > itself. Anything going beyond that looks more intrusive than we'd like > it at this point of the release cycle. Moving into __start_xen() would be ok if we think we're getting too close to the release. It makes it clearer that there is explicit ordering necessary. > >> ARM is in a bad state too, with this initialisation even being behind >> the PCI Passthrough cmdline option. >> >> IMO there are two problems here; one as you pointed out >> (radix_tree_insert() doesn't fail), and that PCI handling requires >> explicit initialisation to begin with. >> >> Looking through radix tree, it wouldn't be hard to create a >> RADIX_TREE_INIT macro to allow initialisation at compile time for >> suitable objects (pci_segments and acpi_ivrs currently). >> >> That involves exporting rcu_node_{alloc,free}(), although the last >> caller of radix_tree_set_alloc_callbacks() was dropped when TMEM went, >> so we could reasonably remove that infrastructure too, at which point >> radix_tree_init() is a simple zero of the structure. > Yes, seeing that this was even an extension of ours (i.e. Linux doesn't > have such), it's certainly worth getting rid of. If nothing else, then > for the two cf_check annotations that's we'd then be able to drop. I'll > make a patch. Oh, even better. > >> Dealing with alloc_pseg(0) is harder. As we never free the PCI >> segments, we could just opencode the radix_tree_root of height=1 with a >> static pseg0 structure, and that would drop the need for >> pci_segemnts_init() completely. > I'm afraid this would end up being too much open-coding for my taste. I didn't much like the suggestion either. > I'd put this differently: Unlike the radix tree initialization, the > setting up of segment 0 isn't a prereq to acpi_iommu_init(). We could > keep acpi_mmcfg_init() doing that, by way of calling pci_add_segment(0) > (and that would simply be a no-op if acpi_iommu_init() ended up > introducing segment 0 already). That might be ok. ~Andrew
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |