
Re: [PATCH v10 13/17] vpci: add initial support for virtual PCI bus topology



On Fri, 17 Nov 2023, Julien Grall wrote:
> Hi Stefano,
> 
> On 16/11/2023 23:28, Stefano Stabellini wrote:
> > On Thu, 16 Nov 2023, Julien Grall wrote:
> > > IIUC, this means that Xen will allocate the BDF. I think this will
> > > become a problem quite quickly, as some PCI devices may need to be
> > > assigned at a specific vBDF (I have the Intel graphics card in mind).
> > > 
> > > Also, xl allows you to specify the slot (e.g. <bdf>@<vslot>), which
> > > would not work with this approach.
> > > 
> > > For dom0less passthrough, I feel the virtual BDF should always be
> > > specified in device-tree. When a domain is created after boot, then I
> > > think you want to support <bdf>@<vslot> where <vslot> is optional.
> > 
> > Hi Julien,
> > 
> > I also think there should be a way to specify the virtual BDF, but if
> > possible (meaning: it is not super difficult to implement) I think it
> > would be very convenient if we could let Xen pick whatever virtual BDF
> > Xen wants when the user doesn't specify the virtual BDF. That's
> > because it would make it easier to specify the configuration for the
> > user. Typically the user doesn't care about the virtual BDF, only to
> > expose a specific host device to the VM. There are exceptions of course
> > and that's why I think we should also have a way for the user to
> > request a specific virtual BDF. One of these exceptions is integrated
> > GPUs: the OS drivers used to have hardcoded BDFs. So it wouldn't work if
> > the device shows up at a different virtual BDF compared to the host.
> 
> If you let Xen allocate the vBDF, then wouldn't you need a way to tell the
> toolstack/Device Models which vBDF was allocated?
> 
> > 
> > Thinking more about this, one way to simplify the problem would be if we
> > always reuse the physical BDF as virtual BDF for passthrough devices. I
> > think that would solve the problem and make it much less likely to
> > run into driver bugs.
> 
> This works so long as you have only one physical segment (i.e. hostbridge).
> If you have multiple ones, then you either have to expose multiple
> hostbridges to the guest (which is not great) or need someone to allocate
> the vBDF.
> 
> > 
> > And we allocate a "special" virtual BDF space for emulated devices, with
> > the Root Complex still emulated in Xen. For instance, we could reserve
> > ff:xx:xx.
> Hmmm... Wouldn't this mean reserving ECAM space for 256 buses? Obviously, we
> could use 5 (just as a random number). Yet, it still requires reserving more
> memory than necessary.
> 
> > and in case of clashes we could refuse to continue.
> 
> Urgh. And what would be the solution for users triggering this clash?
> 
> > Or we could
> > allocate the first free virtual BDF, after all the passthrough devices.
> 
> This only works if you don't want to support PCI hotplug. It may not be a
> thing for embedded, but it is used by cloud. So you need a mechanism that
> works with hotplug as well.
> 
> > 
> > Example:
> > - the user wants to assign physical 00:11.5 and b3:00.1 to the guest
> > - Xen creates virtual BDFs 00:11.5 and b3:00.1 for the passthrough devices
> > - Xen allocates the next virtual BDF for emulated devices: b4:xx.x
> > - If more virtual BDFs are needed for emulated devices, Xen allocates
> >    b5:xx.x
> > 
> > I still think, no matter the BDF allocation scheme, that we should try
> > to avoid as much as possible to have two different PCI Root Complex
> > emulators. Ideally we would have only one PCI Root Complex emulated by
> > Xen. Having 2 PCI Root Complexes both of them emulated by Xen would be
> > tolerable but not ideal. The worst case I would like to avoid is to have
> > two PCI Root Complexes, one emulated by Xen and one emulated by QEMU.
> 
> So while I agree that one emulated hostbridge is the best solution, I don't
> think your proposal would work. As I wrote above, you may have a system with
> multiple physical hostbridges. It would not be possible to assign two PCI
> devices with the same BDF but from different segments.
> 
> I agree this is unlikely, but if we can avoid it then it would be best.
> There is one scheme which fits that:
>   1. If the vBDF is not specified, then pick a free one.
>   2. Otherwise check if the specified vBDF is free. If not return an error.
> 
> This scheme should be used for both virtual and physical. This is pretty much
> the algorithm used by QEMU today. It works, so what would be the benefit of
> doing something different?

I am OK with that. I was trying to find a way that could work without
user intervention in almost 100% of the cases. I think both 1. and 2.
you proposed are fine.
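
For illustration, the two-step scheme (1. pick a free vBDF when none is
specified, 2. otherwise fail if the requested vBDF is already taken) could be
sketched roughly as below. This is only a minimal Python sketch, not Xen or
QEMU code: the names VBDFAllocator, assign, release and format_bdf are
hypothetical, and it assumes a single virtual bus with 32 device slots where
an automatically picked device always lands on function 0.

```python
from typing import Optional, Set, Tuple

# (bus, device, function); all names in this sketch are hypothetical.
BDF = Tuple[int, int, int]


def format_bdf(bdf: BDF) -> str:
    """Render a BDF in the usual bb:dd.f notation."""
    bus, dev, fn = bdf
    return f"{bus:02x}:{dev:02x}.{fn:x}"


class VBDFAllocator:
    """Tracks which virtual BDFs are in use for one guest."""

    def __init__(self, bus: int = 0) -> None:
        self.bus = bus              # assumption: auto-allocation uses one bus
        self.used: Set[BDF] = set()

    def assign(self, requested: Optional[BDF] = None) -> BDF:
        if requested is not None:
            # Step 2: the caller asked for a specific vBDF.
            bus, dev, fn = requested
            if not (0 <= bus <= 0xFF and 0 <= dev <= 0x1F and 0 <= fn <= 7):
                raise ValueError(f"invalid vBDF {requested!r}")
            if requested in self.used:
                raise ValueError(f"vBDF {format_bdf(requested)} already in use")
            self.used.add(requested)
            return requested

        # Step 1: no vBDF given, so pick the first device slot with no
        # function in use and place the device at function 0.
        busy_slots = {(b, d) for (b, d, _) in self.used}
        for dev in range(32):
            if (self.bus, dev) not in busy_slots:
                bdf = (self.bus, dev, 0)
                self.used.add(bdf)
                return bdf
        raise RuntimeError("no free vBDF left on the virtual bus")

    def release(self, bdf: BDF) -> None:
        """Free a vBDF again, e.g. on hot-unplug."""
        self.used.discard(bdf)
```

Because the only state is the set of vBDFs currently in use, the same two
steps apply whether a device is added at domain creation or hotplugged later,
and to emulated and passthrough devices alike. A request such as
assign((0x00, 0x11, 0x5)) would correspond to asking for virtual 00:11.5.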



 

