[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-4.20 2/3] x86/PCI: init segments earlier


  • To: Jan Beulich <jbeulich@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Sun, 2 Feb 2025 14:46:22 +0000
  • Autocrypt: addr=andrew.cooper3@xxxxxxxxxx; keydata= xsFNBFLhNn8BEADVhE+Hb8i0GV6mihnnr/uiQQdPF8kUoFzCOPXkf7jQ5sLYeJa0cQi6Penp VtiFYznTairnVsN5J+ujSTIb+OlMSJUWV4opS7WVNnxHbFTPYZVQ3erv7NKc2iVizCRZ2Kxn srM1oPXWRic8BIAdYOKOloF2300SL/bIpeD+x7h3w9B/qez7nOin5NzkxgFoaUeIal12pXSR Q354FKFoy6Vh96gc4VRqte3jw8mPuJQpfws+Pb+swvSf/i1q1+1I4jsRQQh2m6OTADHIqg2E ofTYAEh7R5HfPx0EXoEDMdRjOeKn8+vvkAwhviWXTHlG3R1QkbE5M/oywnZ83udJmi+lxjJ5 YhQ5IzomvJ16H0Bq+TLyVLO/VRksp1VR9HxCzItLNCS8PdpYYz5TC204ViycobYU65WMpzWe LFAGn8jSS25XIpqv0Y9k87dLbctKKA14Ifw2kq5OIVu2FuX+3i446JOa2vpCI9GcjCzi3oHV e00bzYiHMIl0FICrNJU0Kjho8pdo0m2uxkn6SYEpogAy9pnatUlO+erL4LqFUO7GXSdBRbw5 gNt25XTLdSFuZtMxkY3tq8MFss5QnjhehCVPEpE6y9ZjI4XB8ad1G4oBHVGK5LMsvg22PfMJ ISWFSHoF/B5+lHkCKWkFxZ0gZn33ju5n6/FOdEx4B8cMJt+cWwARAQABzSlBbmRyZXcgQ29v cGVyIDxhbmRyZXcuY29vcGVyM0BjaXRyaXguY29tPsLBegQTAQgAJAIbAwULCQgHAwUVCgkI CwUWAgMBAAIeAQIXgAUCWKD95wIZAQAKCRBlw/kGpdefoHbdD/9AIoR3k6fKl+RFiFpyAhvO 59ttDFI7nIAnlYngev2XUR3acFElJATHSDO0ju+hqWqAb8kVijXLops0gOfqt3VPZq9cuHlh IMDquatGLzAadfFx2eQYIYT+FYuMoPZy/aTUazmJIDVxP7L383grjIkn+7tAv+qeDfE+txL4 SAm1UHNvmdfgL2/lcmL3xRh7sub3nJilM93RWX1Pe5LBSDXO45uzCGEdst6uSlzYR/MEr+5Z JQQ32JV64zwvf/aKaagSQSQMYNX9JFgfZ3TKWC1KJQbX5ssoX/5hNLqxMcZV3TN7kU8I3kjK mPec9+1nECOjjJSO/h4P0sBZyIUGfguwzhEeGf4sMCuSEM4xjCnwiBwftR17sr0spYcOpqET ZGcAmyYcNjy6CYadNCnfR40vhhWuCfNCBzWnUW0lFoo12wb0YnzoOLjvfD6OL3JjIUJNOmJy RCsJ5IA/Iz33RhSVRmROu+TztwuThClw63g7+hoyewv7BemKyuU6FTVhjjW+XUWmS/FzknSi dAG+insr0746cTPpSkGl3KAXeWDGJzve7/SBBfyznWCMGaf8E2P1oOdIZRxHgWj0zNr1+ooF /PzgLPiCI4OMUttTlEKChgbUTQ+5o0P080JojqfXwbPAyumbaYcQNiH1/xYbJdOFSiBv9rpt TQTBLzDKXok86M7BTQRS4TZ/ARAAkgqudHsp+hd82UVkvgnlqZjzz2vyrYfz7bkPtXaGb9H4 Rfo7mQsEQavEBdWWjbga6eMnDqtu+FC+qeTGYebToxEyp2lKDSoAsvt8w82tIlP/EbmRbDVn 7bhjBlfRcFjVYw8uVDPptT0TV47vpoCVkTwcyb6OltJrvg/QzV9f07DJswuda1JH3/qvYu0p vjPnYvCq4NsqY2XSdAJ02HrdYPFtNyPEntu1n1KK+gJrstjtw7KsZ4ygXYrsm/oCBiVW/OgU g/XIlGErkrxe4vQvJyVwg6YH653YTX5hLLUEL1NS4TCo47RP+wi6y+TnuAL36UtK/uFyEuPy wwrDVcC4cIFhYSfsO0BumEI65yu7a8aHbGfq2lW251UcoU48Z27ZUUZd2Dr6O/n8poQHbaTd 6bJJSjzGGHZVbRP9UQ3lkmkmc0+XCHmj5WhwNNYjgbbmML7y0fsJT5RgvefAIFfHBg7fTY/i kBEimoUsTEQz+N4hbKwo1hULfVxDJStE4sbPhjbsPCrlXf6W9CxSyQ0qmZ2bXsLQYRj2xqd1 bpA+1o1j2N4/au1R/uSiUFjewJdT/LX1EklKDcQwpk06Af/N7VZtSfEJeRV04unbsKVXWZAk uAJyDDKN99ziC0Wz5kcPyVD1HNf8bgaqGDzrv3TfYjwqayRFcMf7xJaL9xXedMcAEQEAAcLB XwQYAQgACQUCUuE2fwIbDAAKCRBlw/kGpdefoG4XEACD1Qf/er8EA7g23HMxYWd3FXHThrVQ HgiGdk5Yh632vjOm9L4sd/GCEACVQKjsu98e8o3ysitFlznEns5EAAXEbITrgKWXDDUWGYxd pnjj2u+GkVdsOAGk0kxczX6s+VRBhpbBI2PWnOsRJgU2n10PZ3mZD4Xu9kU2IXYmuW+e5KCA vTArRUdCrAtIa1k01sPipPPw6dfxx2e5asy21YOytzxuWFfJTGnVxZZSCyLUO83sh6OZhJkk b9rxL9wPmpN/t2IPaEKoAc0FTQZS36wAMOXkBh24PQ9gaLJvfPKpNzGD8XWR5HHF0NLIJhgg 4ZlEXQ2fVp3XrtocHqhu4UZR4koCijgB8sB7Tb0GCpwK+C4UePdFLfhKyRdSXuvY3AHJd4CP 4JzW0Bzq/WXY3XMOzUTYApGQpnUpdOmuQSfpV9MQO+/jo7r6yPbxT7CwRS5dcQPzUiuHLK9i nvjREdh84qycnx0/6dDroYhp0DFv4udxuAvt1h4wGwTPRQZerSm4xaYegEFusyhbZrI0U9tJ B8WrhBLXDiYlyJT6zOV2yZFuW47VrLsjYnHwn27hmxTC/7tvG3euCklmkn9Sl9IAKFu29RSo d5bD8kMSCYsTqtTfT6W4A3qHGvIDta3ptLYpIAOD2sY3GYq2nf3Bbzx81wZK14JdDDHUX2Rs 6+ahAA==
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>
  • Delivery-date: Sun, 02 Feb 2025 14:46:31 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 30/01/2025 11:12 am, Jan Beulich wrote:
> In order for amd_iommu_detect_one_acpi()'s call to pci_ro_device() to
> have permanent effect, pci_segments_init() needs to be called ahead of
> making it there. Without this we're losing segment 0's r/o map, and thus
> we're losing write-protection of the PCI devices representing IOMMUs.
> Which in turn means that half-way recent Linux Dom0 will, as it boots,
> turn off MSI on these devices, thus preventing any IOMMU events (faults
> in particular) from being reported on pre-x2APIC hardware.
>
> As the acpi_iommu_init() invocation was moved ahead of
> acpi_mmcfg_init()'s by the offending commit, move the call to
> pci_segments_init() accordingly.
>
> Fixes: 3950f2485bbc ("x86/x2APIC: defer probe until after IOMMU ACPI table 
> parsing")
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> Of course it would have been quite a bit easier to notice this issue if
> radix_tree_insert() wouldn't work fine ahead of radix_tree_init() being
> invoked for a given radix tree, when the index inserted at is 0.
>
> While hunting down various other dead paths to actually find the root
> cause, it occurred to me that it's probably not a good idea to fully
> disallow config space writes for r/o devices: Dom0 won't be able to size
> their BARs (luckily the IOMMU "devices" don't have any, but e.g. serial
> ones generally will have at least one), for example. Without being able
> to size BARs it also will likely be unable to correctly account for the
> address space taken by these BARs. However, outside of vPCI it's not
> really clear to me how we could reasonably emulate such BAR sizing
> writes - we can't, after all, allow Dom0 to actually write to the
> underlying physical registers, yet we don't intercept reads (i.e. we
> can't mimic expected behavior then).
>
> --- a/xen/arch/x86/x86_64/mmconfig-shared.c
> +++ b/xen/arch/x86/x86_64/mmconfig-shared.c
> @@ -402,8 +402,6 @@ void __init acpi_mmcfg_init(void)
>  {
>      bool valid = true;
>  
> -    pci_segments_init();
> -
>      /* MMCONFIG disabled */
>      if ((pci_probe & PCI_PROBE_MMCONF) == 0)
>          return;
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -55,6 +55,8 @@ void __init acpi_iommu_init(void)
>  {
>      int ret = -ENODEV;
>  
> +    pci_segments_init();
> +
>      if ( !iommu_enable && !iommu_intremap )
>          return;
>  
>

I can't help but feel this is taking a bad problem and not making it any
better.

pci_segments_init() is even less (obviously) relevant in
apci_iommu_init() than it is in acpi_mmcfg_init(), and given the
fine-grain Kconfig-ing going on, is only one small step from
accidentally being compiled out.

ARM is in a bad state too, with this initialisation even being behind
the PCI Passthrough cmdline option.

IMO there are two problems here; one as you pointed out
(radix_tree_insert() doesn't fail), and that PCI handling requires
explicit initialisation to begin with.

Looking through radix tree, it wouldn't be hard to create a
RADIX_TREE_INIT macro to allow initialisation at compile time for
suitable objects (pci_segments and acpi_ivrs currently).

That involves exporting rcu_node_{alloc,free}(), although the last
caller of radix_tree_set_alloc_callbacks() was dropped when TMEM went,
so we could reasonably remove that infrastructure too, at which point
radix_tree_init() is a simple zero of the structure.

Dealing with alloc_pseg(0) is harder.  As we never free the PCI
segments, we could just opencode the radix_tree_root of height=1 with a
static pseg0 structure, and that would drop the need for
pci_segemnts_init() completely.

This gets us into a far less fragile position, and one liable to survive
future refactoring too.

~Andrew

P.S. Yes AMD IOMMUs really do have BARs.  The BIOS programs them, then
sets a register in config space to hide the BAR registers.  You can
reprogram them if you know how.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.