[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] vpci: Add resizable bar support


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: "Chen, Jiqian" <Jiqian.Chen@xxxxxxx>
  • Date: Wed, 20 Nov 2024 10:25:37 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=amd.com; dmarc=pass action=none header.from=amd.com; dkim=pass header.d=amd.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=5YpXkv2pSRg1kChKvZWa/dY/10JeD4lKybzdNeiY2PI=; b=wleEN6uiV8vBmWDRJRIAJqYaXQKQsHbje13aNcg+ZHRsz9zOeGT+bG3Hqql+bB35mRCHxAdeu4/huVsF1Hm+ru9kTp4neMrVc0zGFDtqmYXI4zE/liSDSHsL54K8bZhY558jdWoWBCu2GS06p77u2/PIk96xx6hTOoymQPvQvZ9GQnQxdv5pZFgsdOIcGJEDexKc4lUnOo0jUxniO7TaT7tIdgDSodWBOzrZsxpxUaL/4GCBJK6LnO/a+92HDqWTaV9uY2ATUh0i4jI1qKafghOjn/9xMCdDrpkBfEMthJjCmH6cSrtxTqZHYLKGTH59SQcEkcL/3Pi1uBdeZHPQAQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=TCUOhEGUTZ8XogUsGTzskkofBvlOTCN5u6Dt2SNk7jfH2JqXuBFP5eluqtt+4ZrDthQtwaSDJWvED/i0Y7F7h/Za3h6siw3abYt79ghYouGFR32y5LoLl11b2bw0Mv1tzGsMxG66W0qgRWEec7u3xi5dNVA0Ee+hQ/eVKK44iLPpC6t7JK4hawcdKq7AzG/jjsHwvoQd9IiA/n4ZUCDOoJOvlVSdxA76nwmXUUix1/C4o9CnG+2H9NvNRy3Bo9LvQDvJNg9ZEuf3XCnyw/GHU1S2BC5rkxLAxFZoktUJU2Xeui5ggWQI6y964J5cZuILY0JJ+2+5RuOIFBiEBPttNQ==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=amd.com;
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, "Chen, Jiqian" <Jiqian.Chen@xxxxxxx>
  • Delivery-date: Wed, 20 Nov 2024 10:25:55 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHbNaIlMxf8mJcywkeNRFmHEai1qbK08gcAgACKfwD//4ZQAIABzRcAgAAfLACAAB0/gIABIxcAgAAMOoCABNwTAIABfwWAgAFxTQD//+JeAIAAnFuA
  • Thread-topic: [PATCH] vpci: Add resizable bar support

On 2024/11/20 17:01, Roger Pau Monné wrote:
> On Wed, Nov 20, 2024 at 03:01:57AM +0000, Chen, Jiqian wrote:
>> On 2024/11/19 20:46, Roger Pau Monné wrote:
>>> On Mon, Nov 18, 2024 at 06:06:03AM +0000, Chen, Jiqian wrote:
>>>> On 2024/11/15 19:42, Roger Pau Monné wrote:
>>>>> On Fri, Nov 15, 2024 at 03:04:22AM +0000, Chen, Jiqian wrote:
>>>>>> On 2024/11/15 01:36, Roger Pau Monné wrote:
>>>>>>> On Thu, Nov 14, 2024 at 04:52:18PM +0100, Roger Pau Monné wrote:
>>>>>>>> On Thu, Nov 14, 2024 at 06:11:46AM +0000, Chen, Jiqian wrote:
>>>>>>>>> On 2024/11/13 18:30, Roger Pau Monné wrote:
>>>>>>>>>> On Wed, Nov 13, 2024 at 10:00:33AM +0000, Chen, Jiqian wrote:
>>>>>>>>>>> On 2024/11/13 17:30, Roger Pau Monné wrote:
>>>>>>>>>>>> On Wed, Nov 13, 2024 at 04:00:27PM +0800, Jiqian Chen wrote:
>>>>>>>>>>>>> Some devices, like discrete GPU of amd, support resizable bar 
>>>>>>>>>>>>> capability,
>>>>>>>>>>>>> but vpci of Xen doesn't support this feature, so they fail to 
>>>>>>>>>>>>> resize bars
>>>>>>>>>>>>> and then cause probing failure.
>>>>>>>>>>>>>
>>>>>>>>>>>>> According to PCIe spec, each bar that support resizing has two 
>>>>>>>>>>>>> registers,
>>>>>>>>>>>>> PCI_REBAR_CAP and PCI_REBAR_CTRL, so add these two registers and 
>>>>>>>>>>>>> their
>>>>>>>>>>>>> corresponding handler into vpci.
>>>>>>>>>>>>>
>>>>>>>>>>>>> PCI_REBAR_CAP is RO, only provide reading.
>>>>>>>>>>>>>
>>>>>>>>>>>>> PCI_REBAR_CTRL only has bar size is RW, so add write function to 
>>>>>>>>>>>>> support
>>>>>>>>>>>>> setting the new size.
>>>>>>>>>>>>
>>>>>>>>>>>> I think the logic to handle resizable BAR could be much simpler.  
>>>>>>>>>>>> Some
>>>>>>>>>>>> time ago I've made a patch to add support for it, but due to lack 
>>>>>>>>>>>> of
>>>>>>>>>>>> hardware on my side to test it I've never submitted it.
>>>>>>>>>>>>
>>>>>>>>>>>> My approach would be to detect the presence of the
>>>>>>>>>>>> PCI_EXT_CAP_ID_REBAR capability in init_header(), and if the
>>>>>>>>>>>> capability is present force the sizing of BARs each time they are
>>>>>>>>>>>> mapped in modify_bars().  I don't think we need to trap accesses to
>>>>>>>>>>>> the capability itself, as resizing can only happen when memory
>>>>>>>>>>>> decoding is not enabled for the device.  It's enough to fetch the 
>>>>>>>>>>>> size
>>>>>>>>>>>> of the BARs ahead of each enabling of memory decoding.
>>>>>>>>>>>>
>>>>>>>>>>>> Note that memory decoding implies mapping the BARs into the p2m, 
>>>>>>>>>>>> which
>>>>>>>>>>>> is already an expensive operation, the extra sizing is unlikely to
>>>>>>>>>>>> make much of a difference performance wise.
>>>>>>>>>>>>
>>>>>>>>>>>> I've found the following on my git tree and rebased on top of 
>>>>>>>>>>>> staging:
>>>>>>>>>>> OK.
>>>>>>>>>>> Do you need me to validate your patch in my environment?
>>>>>>>>>>
>>>>>>>>>> Yes please, I have no way to test it.  Let's see what others think
>>>>>>>>>> about the different approaches.
>>>>>>>>> There are some errors with your method.
>>>>>>>>> I attached the dmesg and xl dmesg logs.
>>>>>>>>> From the dmesg logs, it seems that 0000:03:00.0 has addresse overlap 
>>>>>>>>> with 0000:03:00.1
>>>>>>>>
>>>>>>>> Do you have the output of lspci with the BAR sizes/positions before
>>>>>>>> and after the resizing?
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think there is a place that needs to be modified regarding your 
>>>>>>>>> method,
>>>>>>>>> although this modification does not help with the above-mentioned 
>>>>>>>>> errors,
>>>>>>>>> it is that whether to support resizing is specific to which bar, 
>>>>>>>>> rather than just determining whether there is a Rebar capability.
>>>>>>>>
>>>>>>>> Do we really need such fine-grained information?  It should be
>>>>>>>> harmless (even if not strictly necessary) to size all BARs on the
>>>>>>>> device before enabling memory decoding, even if some of them do not
>>>>>>>> support resizing.
>>>>>>>>
>>>>>>>> I might have to provide a patch with additional messages to see what's
>>>>>>>> going on.
>>>>>>>
>>>>>>> One nit that I've noticed with the patch I gave you previously is that
>>>>>>> the check for a size change in modify_bars() should be done ahead of
>>>>>>> pci_check_bar(), otherwise the check is possibly using an outdated
>>>>>>> size.
>>>>>>>
>>>>>>> I've also added a debug message to notify when a BAR register is
>>>>>>> written and report the new address.  This is done unconditionally, but
>>>>>>> if you think it's too chatty you can limit to only printing for the
>>>>>>> device that has the ReBAR capability.
>>>>>> Errors are the same.
>>>>>> Attached the dmesg, xl dmesg, patch and lspci output.
>>>>>> I will also continue to debug your method on my side to try to get some 
>>>>>> findings.
>>>>>
>>>>> Hello,
>>>>>
>>>>> I've been looking at the output, and it all seems fine, except the
>>>>> 03:00.0 device that becomes broken at some point, note the lspci
>>>>> output lists [virtual] next to the resource sizes.  This means reading
>>>>> for the registers returned 0, so the position and sizes are provided
>>>>> from the internal OS information.
>>>>>
>>>>> I'm assuming the patch you sent to the list doesn't lead to such errors,
>>>> Yes, the method of my patch doesn't lead to any errors.
>>>> I attached the dmesg, xl dmesg and lspci logs of my method.
>>>>
>>>>> in which case I can only guess that fetching the size of the
>>>>> BARs in modify_bars() causes issues with the device.
>>>>>
>>>>> To confirm this, can you try the following patch on top of your original 
>>>>> change?  
>>>> I tried below patch with my original patch, it didn't cause any errors.
>>>> And the lspci showed without the "[virtual]".
>>>> So, unfortunately, it is not related to the fetching size of Bars in 
>>>> modify_bars().
>>>
>>> I see, I'm at a loss as to what's wrong with my patch.  Do you have
>>> any additional patches on Xen when testing your or my approach?
>> No, just my patch or your patch, base on upstream codes(kernel branch is 
>> linux-next and xen branch is staging).
>>
>>>
>>> I sadly don't have any box with a PCI device that exposes the
>>> resizable BAR capability, so I'm not able to test or debug this.
>>>
>>> I would like to understand why my approach doesn't work, as otherwise
>>> I feel like I'm missing how ReBAR is supposed to work.  Anyway, if you
>>> can give a try to the diff below, it's the same patch, but with yet
>>> even more debug messages added.
>> Attached the xl dmesg.
>>
>>>
>>> Thanks, and sorry for keeping you testing it.
>> That's fine, feel free to send me if you need more test information.
>> Actually I am still continuing to debug your patch and also curious about 
>> why your patch does not work.
>>
>> The only difference between our methods is the timing of updating the size.
>> Yours is later than mine because you updated the size when the driver 
>> re-enabled memory decoding, while I updated the size in time when driver 
>> resize it.
> 
> Indeed, my last guess is the stale cached size is somehow used in my
> approach, and that leads to the failures.  One last (possibly dummy?)
> thing to try might be to use your patch to detect writes to the resize
> control register, but update the BAR sizes in modify_bars(), while
> keeping the traces of when the operations happen.
This seems to be the same as your method.
But I will take a try.

> 
>> I suspect that the driver or hypervisor may have done something in between 
>> and read outdated information.
>> I am debugging the driver codes.
> 
> Thanks for doing this.
> 
> I have a couple of side unrelated notes, something I've noticed from
> the trace you provided, taking device 0000:03:00.1 as an example:
> 
> (XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0
> (XEN) 0000:03:00.1: write BAR0 val: 0xfffffff0 BAR0 address: 0xfffffff0
> (XEN) 0000:03:00.1: write BAR0 val: 0xd0700000 BAR0 address: 0xd0700000
> (XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0
> (XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0
> (XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0
> (XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0
> (XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0
> (XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0
> (XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0
> (XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0
> (XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0
> (XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0
> (XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0
> (XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0
> (XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0
> (XEN) PCI add device 0000:03:00.1
> 
> Linux toggles memory decoding individually for sizing each BAR.  This
> is very costly when running virtualized (as least on Xen), as toggling
> the memory decoding bit implies tearing down or creating p2m mappings
> for all the BARs of the device.  It would be much better if Linux
> toggled memory decoding, sized all the BARs of the device, and then
> enabled memory decoding again.  If done that way you could probably
> reduce boot time of Linux PVH dom0 noticeably.
Thanks, it is helpful since I will need to reduce the boot time in future.

> 
> Secondly, the call to "PCI add device" is too late.  It works now
> because Xen scans for PCI devices, so those are already known and
> setup by Xen.  But if it's a hotplug device (or a device that somehow
> is not discovered by Xen at boot), doing the PHYSDEVOP_pci_device_add
> hypercall after sizing the device is wrong, the hypercall should be
> done ahead of Linux interacting with the device in any meaningful way.
> The checks done for device discovery are fine are obviously fine to be
> done ahead of the PHYSDEVOP_pci_device_add call.
> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.