[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] vpci: Add resizable bar support
On 2024/11/20 17:01, Roger Pau Monné wrote: > On Wed, Nov 20, 2024 at 03:01:57AM +0000, Chen, Jiqian wrote: >> On 2024/11/19 20:46, Roger Pau Monné wrote: >>> On Mon, Nov 18, 2024 at 06:06:03AM +0000, Chen, Jiqian wrote: >>>> On 2024/11/15 19:42, Roger Pau Monné wrote: >>>>> On Fri, Nov 15, 2024 at 03:04:22AM +0000, Chen, Jiqian wrote: >>>>>> On 2024/11/15 01:36, Roger Pau Monné wrote: >>>>>>> On Thu, Nov 14, 2024 at 04:52:18PM +0100, Roger Pau Monné wrote: >>>>>>>> On Thu, Nov 14, 2024 at 06:11:46AM +0000, Chen, Jiqian wrote: >>>>>>>>> On 2024/11/13 18:30, Roger Pau Monné wrote: >>>>>>>>>> On Wed, Nov 13, 2024 at 10:00:33AM +0000, Chen, Jiqian wrote: >>>>>>>>>>> On 2024/11/13 17:30, Roger Pau Monné wrote: >>>>>>>>>>>> On Wed, Nov 13, 2024 at 04:00:27PM +0800, Jiqian Chen wrote: >>>>>>>>>>>>> Some devices, like discrete GPU of amd, support resizable bar >>>>>>>>>>>>> capability, >>>>>>>>>>>>> but vpci of Xen doesn't support this feature, so they fail to >>>>>>>>>>>>> resize bars >>>>>>>>>>>>> and then cause probing failure. >>>>>>>>>>>>> >>>>>>>>>>>>> According to PCIe spec, each bar that support resizing has two >>>>>>>>>>>>> registers, >>>>>>>>>>>>> PCI_REBAR_CAP and PCI_REBAR_CTRL, so add these two registers and >>>>>>>>>>>>> their >>>>>>>>>>>>> corresponding handler into vpci. >>>>>>>>>>>>> >>>>>>>>>>>>> PCI_REBAR_CAP is RO, only provide reading. >>>>>>>>>>>>> >>>>>>>>>>>>> PCI_REBAR_CTRL only has bar size is RW, so add write function to >>>>>>>>>>>>> support >>>>>>>>>>>>> setting the new size. >>>>>>>>>>>> >>>>>>>>>>>> I think the logic to handle resizable BAR could be much simpler. >>>>>>>>>>>> Some >>>>>>>>>>>> time ago I've made a patch to add support for it, but due to lack >>>>>>>>>>>> of >>>>>>>>>>>> hardware on my side to test it I've never submitted it. >>>>>>>>>>>> >>>>>>>>>>>> My approach would be to detect the presence of the >>>>>>>>>>>> PCI_EXT_CAP_ID_REBAR capability in init_header(), and if the >>>>>>>>>>>> capability is present force the sizing of BARs each time they are >>>>>>>>>>>> mapped in modify_bars(). I don't think we need to trap accesses to >>>>>>>>>>>> the capability itself, as resizing can only happen when memory >>>>>>>>>>>> decoding is not enabled for the device. It's enough to fetch the >>>>>>>>>>>> size >>>>>>>>>>>> of the BARs ahead of each enabling of memory decoding. >>>>>>>>>>>> >>>>>>>>>>>> Note that memory decoding implies mapping the BARs into the p2m, >>>>>>>>>>>> which >>>>>>>>>>>> is already an expensive operation, the extra sizing is unlikely to >>>>>>>>>>>> make much of a difference performance wise. >>>>>>>>>>>> >>>>>>>>>>>> I've found the following on my git tree and rebased on top of >>>>>>>>>>>> staging: >>>>>>>>>>> OK. >>>>>>>>>>> Do you need me to validate your patch in my environment? >>>>>>>>>> >>>>>>>>>> Yes please, I have no way to test it. Let's see what others think >>>>>>>>>> about the different approaches. >>>>>>>>> There are some errors with your method. >>>>>>>>> I attached the dmesg and xl dmesg logs. >>>>>>>>> From the dmesg logs, it seems that 0000:03:00.0 has addresse overlap >>>>>>>>> with 0000:03:00.1 >>>>>>>> >>>>>>>> Do you have the output of lspci with the BAR sizes/positions before >>>>>>>> and after the resizing? >>>>>>>> >>>>>>>>> >>>>>>>>> I think there is a place that needs to be modified regarding your >>>>>>>>> method, >>>>>>>>> although this modification does not help with the above-mentioned >>>>>>>>> errors, >>>>>>>>> it is that whether to support resizing is specific to which bar, >>>>>>>>> rather than just determining whether there is a Rebar capability. >>>>>>>> >>>>>>>> Do we really need such fine-grained information? It should be >>>>>>>> harmless (even if not strictly necessary) to size all BARs on the >>>>>>>> device before enabling memory decoding, even if some of them do not >>>>>>>> support resizing. >>>>>>>> >>>>>>>> I might have to provide a patch with additional messages to see what's >>>>>>>> going on. >>>>>>> >>>>>>> One nit that I've noticed with the patch I gave you previously is that >>>>>>> the check for a size change in modify_bars() should be done ahead of >>>>>>> pci_check_bar(), otherwise the check is possibly using an outdated >>>>>>> size. >>>>>>> >>>>>>> I've also added a debug message to notify when a BAR register is >>>>>>> written and report the new address. This is done unconditionally, but >>>>>>> if you think it's too chatty you can limit to only printing for the >>>>>>> device that has the ReBAR capability. >>>>>> Errors are the same. >>>>>> Attached the dmesg, xl dmesg, patch and lspci output. >>>>>> I will also continue to debug your method on my side to try to get some >>>>>> findings. >>>>> >>>>> Hello, >>>>> >>>>> I've been looking at the output, and it all seems fine, except the >>>>> 03:00.0 device that becomes broken at some point, note the lspci >>>>> output lists [virtual] next to the resource sizes. This means reading >>>>> for the registers returned 0, so the position and sizes are provided >>>>> from the internal OS information. >>>>> >>>>> I'm assuming the patch you sent to the list doesn't lead to such errors, >>>> Yes, the method of my patch doesn't lead to any errors. >>>> I attached the dmesg, xl dmesg and lspci logs of my method. >>>> >>>>> in which case I can only guess that fetching the size of the >>>>> BARs in modify_bars() causes issues with the device. >>>>> >>>>> To confirm this, can you try the following patch on top of your original >>>>> change? >>>> I tried below patch with my original patch, it didn't cause any errors. >>>> And the lspci showed without the "[virtual]". >>>> So, unfortunately, it is not related to the fetching size of Bars in >>>> modify_bars(). >>> >>> I see, I'm at a loss as to what's wrong with my patch. Do you have >>> any additional patches on Xen when testing your or my approach? >> No, just my patch or your patch, base on upstream codes(kernel branch is >> linux-next and xen branch is staging). >> >>> >>> I sadly don't have any box with a PCI device that exposes the >>> resizable BAR capability, so I'm not able to test or debug this. >>> >>> I would like to understand why my approach doesn't work, as otherwise >>> I feel like I'm missing how ReBAR is supposed to work. Anyway, if you >>> can give a try to the diff below, it's the same patch, but with yet >>> even more debug messages added. >> Attached the xl dmesg. >> >>> >>> Thanks, and sorry for keeping you testing it. >> That's fine, feel free to send me if you need more test information. >> Actually I am still continuing to debug your patch and also curious about >> why your patch does not work. >> >> The only difference between our methods is the timing of updating the size. >> Yours is later than mine because you updated the size when the driver >> re-enabled memory decoding, while I updated the size in time when driver >> resize it. > > Indeed, my last guess is the stale cached size is somehow used in my > approach, and that leads to the failures. One last (possibly dummy?) > thing to try might be to use your patch to detect writes to the resize > control register, but update the BAR sizes in modify_bars(), while > keeping the traces of when the operations happen. This seems to be the same as your method. But I will take a try. > >> I suspect that the driver or hypervisor may have done something in between >> and read outdated information. >> I am debugging the driver codes. > > Thanks for doing this. > > I have a couple of side unrelated notes, something I've noticed from > the trace you provided, taking device 0000:03:00.1 as an example: > > (XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0 > (XEN) 0000:03:00.1: write BAR0 val: 0xfffffff0 BAR0 address: 0xfffffff0 > (XEN) 0000:03:00.1: write BAR0 val: 0xd0700000 BAR0 address: 0xd0700000 > (XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0 > (XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0 > (XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0 > (XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0 > (XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0 > (XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0 > (XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0 > (XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0 > (XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0 > (XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0 > (XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0 > (XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0 > (XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0 > (XEN) PCI add device 0000:03:00.1 > > Linux toggles memory decoding individually for sizing each BAR. This > is very costly when running virtualized (as least on Xen), as toggling > the memory decoding bit implies tearing down or creating p2m mappings > for all the BARs of the device. It would be much better if Linux > toggled memory decoding, sized all the BARs of the device, and then > enabled memory decoding again. If done that way you could probably > reduce boot time of Linux PVH dom0 noticeably. Thanks, it is helpful since I will need to reduce the boot time in future. > > Secondly, the call to "PCI add device" is too late. It works now > because Xen scans for PCI devices, so those are already known and > setup by Xen. But if it's a hotplug device (or a device that somehow > is not discovered by Xen at boot), doing the PHYSDEVOP_pci_device_add > hypercall after sizing the device is wrong, the hypercall should be > done ahead of Linux interacting with the device in any meaningful way. > The checks done for device discovery are fine are obviously fine to be > done ahead of the PHYSDEVOP_pci_device_add call. > > Thanks, Roger. -- Best regards, Jiqian Chen.
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |