[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-4.20 2/3] x86/PCI: init segments earlier


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Mon, 3 Feb 2025 15:31:00 +0000
  • Autocrypt: addr=andrew.cooper3@xxxxxxxxxx; keydata= xsFNBFLhNn8BEADVhE+Hb8i0GV6mihnnr/uiQQdPF8kUoFzCOPXkf7jQ5sLYeJa0cQi6Penp VtiFYznTairnVsN5J+ujSTIb+OlMSJUWV4opS7WVNnxHbFTPYZVQ3erv7NKc2iVizCRZ2Kxn srM1oPXWRic8BIAdYOKOloF2300SL/bIpeD+x7h3w9B/qez7nOin5NzkxgFoaUeIal12pXSR Q354FKFoy6Vh96gc4VRqte3jw8mPuJQpfws+Pb+swvSf/i1q1+1I4jsRQQh2m6OTADHIqg2E ofTYAEh7R5HfPx0EXoEDMdRjOeKn8+vvkAwhviWXTHlG3R1QkbE5M/oywnZ83udJmi+lxjJ5 YhQ5IzomvJ16H0Bq+TLyVLO/VRksp1VR9HxCzItLNCS8PdpYYz5TC204ViycobYU65WMpzWe LFAGn8jSS25XIpqv0Y9k87dLbctKKA14Ifw2kq5OIVu2FuX+3i446JOa2vpCI9GcjCzi3oHV e00bzYiHMIl0FICrNJU0Kjho8pdo0m2uxkn6SYEpogAy9pnatUlO+erL4LqFUO7GXSdBRbw5 gNt25XTLdSFuZtMxkY3tq8MFss5QnjhehCVPEpE6y9ZjI4XB8ad1G4oBHVGK5LMsvg22PfMJ ISWFSHoF/B5+lHkCKWkFxZ0gZn33ju5n6/FOdEx4B8cMJt+cWwARAQABzSlBbmRyZXcgQ29v cGVyIDxhbmRyZXcuY29vcGVyM0BjaXRyaXguY29tPsLBegQTAQgAJAIbAwULCQgHAwUVCgkI CwUWAgMBAAIeAQIXgAUCWKD95wIZAQAKCRBlw/kGpdefoHbdD/9AIoR3k6fKl+RFiFpyAhvO 59ttDFI7nIAnlYngev2XUR3acFElJATHSDO0ju+hqWqAb8kVijXLops0gOfqt3VPZq9cuHlh IMDquatGLzAadfFx2eQYIYT+FYuMoPZy/aTUazmJIDVxP7L383grjIkn+7tAv+qeDfE+txL4 SAm1UHNvmdfgL2/lcmL3xRh7sub3nJilM93RWX1Pe5LBSDXO45uzCGEdst6uSlzYR/MEr+5Z JQQ32JV64zwvf/aKaagSQSQMYNX9JFgfZ3TKWC1KJQbX5ssoX/5hNLqxMcZV3TN7kU8I3kjK mPec9+1nECOjjJSO/h4P0sBZyIUGfguwzhEeGf4sMCuSEM4xjCnwiBwftR17sr0spYcOpqET ZGcAmyYcNjy6CYadNCnfR40vhhWuCfNCBzWnUW0lFoo12wb0YnzoOLjvfD6OL3JjIUJNOmJy RCsJ5IA/Iz33RhSVRmROu+TztwuThClw63g7+hoyewv7BemKyuU6FTVhjjW+XUWmS/FzknSi dAG+insr0746cTPpSkGl3KAXeWDGJzve7/SBBfyznWCMGaf8E2P1oOdIZRxHgWj0zNr1+ooF /PzgLPiCI4OMUttTlEKChgbUTQ+5o0P080JojqfXwbPAyumbaYcQNiH1/xYbJdOFSiBv9rpt TQTBLzDKXok86M7BTQRS4TZ/ARAAkgqudHsp+hd82UVkvgnlqZjzz2vyrYfz7bkPtXaGb9H4 Rfo7mQsEQavEBdWWjbga6eMnDqtu+FC+qeTGYebToxEyp2lKDSoAsvt8w82tIlP/EbmRbDVn 7bhjBlfRcFjVYw8uVDPptT0TV47vpoCVkTwcyb6OltJrvg/QzV9f07DJswuda1JH3/qvYu0p vjPnYvCq4NsqY2XSdAJ02HrdYPFtNyPEntu1n1KK+gJrstjtw7KsZ4ygXYrsm/oCBiVW/OgU g/XIlGErkrxe4vQvJyVwg6YH653YTX5hLLUEL1NS4TCo47RP+wi6y+TnuAL36UtK/uFyEuPy wwrDVcC4cIFhYSfsO0BumEI65yu7a8aHbGfq2lW251UcoU48Z27ZUUZd2Dr6O/n8poQHbaTd 6bJJSjzGGHZVbRP9UQ3lkmkmc0+XCHmj5WhwNNYjgbbmML7y0fsJT5RgvefAIFfHBg7fTY/i kBEimoUsTEQz+N4hbKwo1hULfVxDJStE4sbPhjbsPCrlXf6W9CxSyQ0qmZ2bXsLQYRj2xqd1 bpA+1o1j2N4/au1R/uSiUFjewJdT/LX1EklKDcQwpk06Af/N7VZtSfEJeRV04unbsKVXWZAk uAJyDDKN99ziC0Wz5kcPyVD1HNf8bgaqGDzrv3TfYjwqayRFcMf7xJaL9xXedMcAEQEAAcLB XwQYAQgACQUCUuE2fwIbDAAKCRBlw/kGpdefoG4XEACD1Qf/er8EA7g23HMxYWd3FXHThrVQ HgiGdk5Yh632vjOm9L4sd/GCEACVQKjsu98e8o3ysitFlznEns5EAAXEbITrgKWXDDUWGYxd pnjj2u+GkVdsOAGk0kxczX6s+VRBhpbBI2PWnOsRJgU2n10PZ3mZD4Xu9kU2IXYmuW+e5KCA vTArRUdCrAtIa1k01sPipPPw6dfxx2e5asy21YOytzxuWFfJTGnVxZZSCyLUO83sh6OZhJkk b9rxL9wPmpN/t2IPaEKoAc0FTQZS36wAMOXkBh24PQ9gaLJvfPKpNzGD8XWR5HHF0NLIJhgg 4ZlEXQ2fVp3XrtocHqhu4UZR4koCijgB8sB7Tb0GCpwK+C4UePdFLfhKyRdSXuvY3AHJd4CP 4JzW0Bzq/WXY3XMOzUTYApGQpnUpdOmuQSfpV9MQO+/jo7r6yPbxT7CwRS5dcQPzUiuHLK9i nvjREdh84qycnx0/6dDroYhp0DFv4udxuAvt1h4wGwTPRQZerSm4xaYegEFusyhbZrI0U9tJ B8WrhBLXDiYlyJT6zOV2yZFuW47VrLsjYnHwn27hmxTC/7tvG3euCklmkn9Sl9IAKFu29RSo d5bD8kMSCYsTqtTfT6W4A3qHGvIDta3ptLYpIAOD2sY3GYq2nf3Bbzx81wZK14JdDDHUX2Rs 6+ahAA==
  • Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx>, Oleksii Kurochko <oleksii.kurochko@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 03 Feb 2025 15:31:20 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 03/02/2025 9:09 am, Jan Beulich wrote:
> On 02.02.2025 15:46, Andrew Cooper wrote:
>> On 30/01/2025 11:12 am, Jan Beulich wrote:
>>> In order for amd_iommu_detect_one_acpi()'s call to pci_ro_device() to
>>> have permanent effect, pci_segments_init() needs to be called ahead of
>>> making it there. Without this we're losing segment 0's r/o map, and thus
>>> we're losing write-protection of the PCI devices representing IOMMUs.
>>> Which in turn means that half-way recent Linux Dom0 will, as it boots,
>>> turn off MSI on these devices, thus preventing any IOMMU events (faults
>>> in particular) from being reported on pre-x2APIC hardware.
>>>
>>> As the acpi_iommu_init() invocation was moved ahead of
>>> acpi_mmcfg_init()'s by the offending commit, move the call to
>>> pci_segments_init() accordingly.
>>>
>>> Fixes: 3950f2485bbc ("x86/x2APIC: defer probe until after IOMMU ACPI table 
>>> parsing")
>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>> ---
>>> Of course it would have been quite a bit easier to notice this issue if
>>> radix_tree_insert() wouldn't work fine ahead of radix_tree_init() being
>>> invoked for a given radix tree, when the index inserted at is 0.
>>>
>>> While hunting down various other dead paths to actually find the root
>>> cause, it occurred to me that it's probably not a good idea to fully
>>> disallow config space writes for r/o devices: Dom0 won't be able to size
>>> their BARs (luckily the IOMMU "devices" don't have any, but e.g. serial
>>> ones generally will have at least one), for example. Without being able
>>> to size BARs it also will likely be unable to correctly account for the
>>> address space taken by these BARs. However, outside of vPCI it's not
>>> really clear to me how we could reasonably emulate such BAR sizing
>>> writes - we can't, after all, allow Dom0 to actually write to the
>>> underlying physical registers, yet we don't intercept reads (i.e. we
>>> can't mimic expected behavior then).
>>>
>>> --- a/xen/arch/x86/x86_64/mmconfig-shared.c
>>> +++ b/xen/arch/x86/x86_64/mmconfig-shared.c
>>> @@ -402,8 +402,6 @@ void __init acpi_mmcfg_init(void)
>>>  {
>>>      bool valid = true;
>>>  
>>> -    pci_segments_init();
>>> -
>>>      /* MMCONFIG disabled */
>>>      if ((pci_probe & PCI_PROBE_MMCONF) == 0)
>>>          return;
>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>> @@ -55,6 +55,8 @@ void __init acpi_iommu_init(void)
>>>  {
>>>      int ret = -ENODEV;
>>>  
>>> +    pci_segments_init();
>>> +
>>>      if ( !iommu_enable && !iommu_intremap )
>>>          return;
>>>  
>>>
>> I can't help but feel this is taking a bad problem and not making it any
>> better.
>>
>> pci_segments_init() is even less (obviously) relevant in
>> apci_iommu_init() than it is in acpi_mmcfg_init(), and given the
>> fine-grain Kconfig-ing going on, is only one small step from
>> accidentally being compiled out.
> The alternative I did consider was to move the call into __start_xen()
> itself. Anything going beyond that looks more intrusive than we'd like
> it at this point of the release cycle.

Moving into __start_xen() would be ok if we think we're getting too
close to the release.  It makes it clearer that there is explicit
ordering necessary.

>
>> ARM is in a bad state too, with this initialisation even being behind
>> the PCI Passthrough cmdline option.
>>
>> IMO there are two problems here; one as you pointed out
>> (radix_tree_insert() doesn't fail), and that PCI handling requires
>> explicit initialisation to begin with.
>>
>> Looking through radix tree, it wouldn't be hard to create a
>> RADIX_TREE_INIT macro to allow initialisation at compile time for
>> suitable objects (pci_segments and acpi_ivrs currently).
>>
>> That involves exporting rcu_node_{alloc,free}(), although the last
>> caller of radix_tree_set_alloc_callbacks() was dropped when TMEM went,
>> so we could reasonably remove that infrastructure too, at which point
>> radix_tree_init() is a simple zero of the structure.
> Yes, seeing that this was even an extension of ours (i.e. Linux doesn't
> have such), it's certainly worth getting rid of. If nothing else, then
> for the two cf_check annotations that's we'd then be able to drop. I'll
> make a patch.

Oh, even better. 

>
>> Dealing with alloc_pseg(0) is harder.  As we never free the PCI
>> segments, we could just opencode the radix_tree_root of height=1 with a
>> static pseg0 structure, and that would drop the need for
>> pci_segemnts_init() completely.
> I'm afraid this would end up being too much open-coding for my taste.

I didn't much like the suggestion either.

> I'd put this differently: Unlike the radix tree initialization, the
> setting up of segment 0 isn't a prereq to acpi_iommu_init(). We could
> keep acpi_mmcfg_init() doing that, by way of calling pci_add_segment(0)
> (and that would simply be a no-op if acpi_iommu_init() ended up
> introducing segment 0 already).

That might be ok.

~Andrew



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.