[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] vpci: Add resizable bar support



On Wed, Nov 20, 2024 at 03:01:57AM +0000, Chen, Jiqian wrote:
> On 2024/11/19 20:46, Roger Pau Monné wrote:
> > On Mon, Nov 18, 2024 at 06:06:03AM +0000, Chen, Jiqian wrote:
> >> On 2024/11/15 19:42, Roger Pau Monné wrote:
> >>> On Fri, Nov 15, 2024 at 03:04:22AM +0000, Chen, Jiqian wrote:
> >>>> On 2024/11/15 01:36, Roger Pau Monné wrote:
> >>>>> On Thu, Nov 14, 2024 at 04:52:18PM +0100, Roger Pau Monné wrote:
> >>>>>> On Thu, Nov 14, 2024 at 06:11:46AM +0000, Chen, Jiqian wrote:
> >>>>>>> On 2024/11/13 18:30, Roger Pau Monné wrote:
> >>>>>>>> On Wed, Nov 13, 2024 at 10:00:33AM +0000, Chen, Jiqian wrote:
> >>>>>>>>> On 2024/11/13 17:30, Roger Pau Monné wrote:
> >>>>>>>>>> On Wed, Nov 13, 2024 at 04:00:27PM +0800, Jiqian Chen wrote:
> >>>>>>>>>>> Some devices, like discrete GPU of amd, support resizable bar 
> >>>>>>>>>>> capability,
> >>>>>>>>>>> but vpci of Xen doesn't support this feature, so they fail to 
> >>>>>>>>>>> resize bars
> >>>>>>>>>>> and then cause probing failure.
> >>>>>>>>>>>
> >>>>>>>>>>> According to PCIe spec, each bar that support resizing has two 
> >>>>>>>>>>> registers,
> >>>>>>>>>>> PCI_REBAR_CAP and PCI_REBAR_CTRL, so add these two registers and 
> >>>>>>>>>>> their
> >>>>>>>>>>> corresponding handler into vpci.
> >>>>>>>>>>>
> >>>>>>>>>>> PCI_REBAR_CAP is RO, only provide reading.
> >>>>>>>>>>>
> >>>>>>>>>>> PCI_REBAR_CTRL only has bar size is RW, so add write function to 
> >>>>>>>>>>> support
> >>>>>>>>>>> setting the new size.
> >>>>>>>>>>
> >>>>>>>>>> I think the logic to handle resizable BAR could be much simpler.  
> >>>>>>>>>> Some
> >>>>>>>>>> time ago I've made a patch to add support for it, but due to lack 
> >>>>>>>>>> of
> >>>>>>>>>> hardware on my side to test it I've never submitted it.
> >>>>>>>>>>
> >>>>>>>>>> My approach would be to detect the presence of the
> >>>>>>>>>> PCI_EXT_CAP_ID_REBAR capability in init_header(), and if the
> >>>>>>>>>> capability is present force the sizing of BARs each time they are
> >>>>>>>>>> mapped in modify_bars().  I don't think we need to trap accesses to
> >>>>>>>>>> the capability itself, as resizing can only happen when memory
> >>>>>>>>>> decoding is not enabled for the device.  It's enough to fetch the 
> >>>>>>>>>> size
> >>>>>>>>>> of the BARs ahead of each enabling of memory decoding.
> >>>>>>>>>>
> >>>>>>>>>> Note that memory decoding implies mapping the BARs into the p2m, 
> >>>>>>>>>> which
> >>>>>>>>>> is already an expensive operation, the extra sizing is unlikely to
> >>>>>>>>>> make much of a difference performance wise.
> >>>>>>>>>>
> >>>>>>>>>> I've found the following on my git tree and rebased on top of 
> >>>>>>>>>> staging:
> >>>>>>>>> OK.
> >>>>>>>>> Do you need me to validate your patch in my environment?
> >>>>>>>>
> >>>>>>>> Yes please, I have no way to test it.  Let's see what others think
> >>>>>>>> about the different approaches.
> >>>>>>> There are some errors with your method.
> >>>>>>> I attached the dmesg and xl dmesg logs.
> >>>>>>> From the dmesg logs, it seems that 0000:03:00.0 has addresse overlap 
> >>>>>>> with 0000:03:00.1
> >>>>>>
> >>>>>> Do you have the output of lspci with the BAR sizes/positions before
> >>>>>> and after the resizing?
> >>>>>>
> >>>>>>>
> >>>>>>> I think there is a place that needs to be modified regarding your 
> >>>>>>> method,
> >>>>>>> although this modification does not help with the above-mentioned 
> >>>>>>> errors,
> >>>>>>> it is that whether to support resizing is specific to which bar, 
> >>>>>>> rather than just determining whether there is a Rebar capability.
> >>>>>>
> >>>>>> Do we really need such fine-grained information?  It should be
> >>>>>> harmless (even if not strictly necessary) to size all BARs on the
> >>>>>> device before enabling memory decoding, even if some of them do not
> >>>>>> support resizing.
> >>>>>>
> >>>>>> I might have to provide a patch with additional messages to see what's
> >>>>>> going on.
> >>>>>
> >>>>> One nit that I've noticed with the patch I gave you previously is that
> >>>>> the check for a size change in modify_bars() should be done ahead of
> >>>>> pci_check_bar(), otherwise the check is possibly using an outdated
> >>>>> size.
> >>>>>
> >>>>> I've also added a debug message to notify when a BAR register is
> >>>>> written and report the new address.  This is done unconditionally, but
> >>>>> if you think it's too chatty you can limit to only printing for the
> >>>>> device that has the ReBAR capability.
> >>>> Errors are the same.
> >>>> Attached the dmesg, xl dmesg, patch and lspci output.
> >>>> I will also continue to debug your method on my side to try to get some 
> >>>> findings.
> >>>
> >>> Hello,
> >>>
> >>> I've been looking at the output, and it all seems fine, except the
> >>> 03:00.0 device that becomes broken at some point, note the lspci
> >>> output lists [virtual] next to the resource sizes.  This means reading
> >>> for the registers returned 0, so the position and sizes are provided
> >>> from the internal OS information.
> >>>
> >>> I'm assuming the patch you sent to the list doesn't lead to such errors,
> >> Yes, the method of my patch doesn't lead to any errors.
> >> I attached the dmesg, xl dmesg and lspci logs of my method.
> >>
> >>> in which case I can only guess that fetching the size of the
> >>> BARs in modify_bars() causes issues with the device.
> >>>
> >>> To confirm this, can you try the following patch on top of your original 
> >>> change?  
> >> I tried below patch with my original patch, it didn't cause any errors.
> >> And the lspci showed without the "[virtual]".
> >> So, unfortunately, it is not related to the fetching size of Bars in 
> >> modify_bars().
> > 
> > I see, I'm at a loss as to what's wrong with my patch.  Do you have
> > any additional patches on Xen when testing your or my approach?
> No, just my patch or your patch, base on upstream codes(kernel branch is 
> linux-next and xen branch is staging).
> 
> > 
> > I sadly don't have any box with a PCI device that exposes the
> > resizable BAR capability, so I'm not able to test or debug this.
> > 
> > I would like to understand why my approach doesn't work, as otherwise
> > I feel like I'm missing how ReBAR is supposed to work.  Anyway, if you
> > can give a try to the diff below, it's the same patch, but with yet
> > even more debug messages added.
> Attached the xl dmesg.
> 
> > 
> > Thanks, and sorry for keeping you testing it.
> That's fine, feel free to send me if you need more test information.
> Actually I am still continuing to debug your patch and also curious about why 
> your patch does not work.
> 
> The only difference between our methods is the timing of updating the size.
> Yours is later than mine because you updated the size when the driver 
> re-enabled memory decoding, while I updated the size in time when driver 
> resize it.

Indeed, my last guess is the stale cached size is somehow used in my
approach, and that leads to the failures.  One last (possibly dummy?)
thing to try might be to use your patch to detect writes to the resize
control register, but update the BAR sizes in modify_bars(), while
keeping the traces of when the operations happen.

> I suspect that the driver or hypervisor may have done something in between 
> and read outdated information.
> I am debugging the driver codes.

Thanks for doing this.

I have a couple of side unrelated notes, something I've noticed from
the trace you provided, taking device 0000:03:00.1 as an example:

(XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0
(XEN) 0000:03:00.1: write BAR0 val: 0xfffffff0 BAR0 address: 0xfffffff0
(XEN) 0000:03:00.1: write BAR0 val: 0xd0700000 BAR0 address: 0xd0700000
(XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0
(XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0
(XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0
(XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0
(XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0
(XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0
(XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0
(XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0
(XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0
(XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0
(XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0
(XEN) 0000:03:00.1: modify bars cmd: 0 rom_only: 0
(XEN) 0000:03:00.1: modify bars cmd: 3 rom_only: 0
(XEN) PCI add device 0000:03:00.1

Linux toggles memory decoding individually for sizing each BAR.  This
is very costly when running virtualized (as least on Xen), as toggling
the memory decoding bit implies tearing down or creating p2m mappings
for all the BARs of the device.  It would be much better if Linux
toggled memory decoding, sized all the BARs of the device, and then
enabled memory decoding again.  If done that way you could probably
reduce boot time of Linux PVH dom0 noticeably.

Secondly, the call to "PCI add device" is too late.  It works now
because Xen scans for PCI devices, so those are already known and
setup by Xen.  But if it's a hotplug device (or a device that somehow
is not discovered by Xen at boot), doing the PHYSDEVOP_pci_device_add
hypercall after sizing the device is wrong, the hypercall should be
done ahead of Linux interacting with the device in any meaningful way.
The checks done for device discovery are fine are obviously fine to be
done ahead of the PHYSDEVOP_pci_device_add call.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.