
Re: [PATCH v10 13/17] vpci: add initial support for virtual PCI bus topology



On Wed, 22 Nov 2023, Roger Pau Monné wrote:
> On Tue, Nov 21, 2023 at 05:12:15PM -0800, Stefano Stabellini wrote:
> > On Tue, 20 Nov 2023, Volodymyr Babchuk wrote:
> > > Stefano Stabellini <sstabellini@xxxxxxxxxx> writes:
> > > > On Fri, 17 Nov 2023, Volodymyr Babchuk wrote:
> > > >> > On Fri, 17 Nov 2023, Volodymyr Babchuk wrote:
> > > >> >> Hi Julien,
> > > >> >> 
> > > >> >> Julien Grall <julien@xxxxxxx> writes:
> > > >> >> 
> > > >> >> > Hi Volodymyr,
> > > >> >> >
> > > >> >> > On 17/11/2023 14:09, Volodymyr Babchuk wrote:
> > > >> >> >> Hi Stefano,
> > > >> >> >> Stefano Stabellini <sstabellini@xxxxxxxxxx> writes:
> > > >> >> >> 
> > > >> >> >>> On Fri, 17 Nov 2023, Volodymyr Babchuk wrote:
> > > >> >> >>>>> I still think, no matter the BDF allocation scheme, that
> > > >> >> >>>>> we should try to avoid as much as possible having two
> > > >> >> >>>>> different PCI Root Complex emulators. Ideally we would have
> > > >> >> >>>>> only one PCI Root Complex emulated by Xen. Having 2 PCI Root
> > > >> >> >>>>> Complexes, both of them emulated by Xen, would be tolerable
> > > >> >> >>>>> but not ideal.
> > > >> >> >>>>
> > > >> >> >>>> But what is exactly wrong with this setup?
> > > >> >> >>>
> > > >> >> >>> [...]
> > > >> >> >>>
> > > >> >> >>>>> The worst case I would like to avoid is to have two PCI
> > > >> >> >>>>> Root Complexes, one emulated by Xen and one emulated by
> > > >> >> >>>>> QEMU.
> > > >> >> >>>>
> > > >> >> >>>> This is how our setup works right now.
> > > >> >> >>>
> > > >> >> >>> If we have:
> > > >> >> >>> - a single PCI Root Complex emulated in Xen
> > > >> >> >>> - Xen is safety certified
> > > >> >> >>> - individual Virtio devices emulated by QEMU with grants for
> > > >> >> >>>   memory
> > > >> >> >>>
> > > >> >> >>> We can go very far in terms of being able to use Virtio in
> > > >> >> >>> safety use-cases. We might even be able to use Virtio
> > > >> >> >>> (frontends) in a SafeOS.
> > > >> >> >>>
> > > >> >> >>> On the other hand if we put an additional Root Complex in
> > > >> >> >>> QEMU:
> > > >> >> >>> - we pay a price in terms of complexity of the codebase
> > > >> >> >>> - we pay a price in terms of resource utilization
> > > >> >> >>> - we have one additional problem in terms of using this setup
> > > >> >> >>>   with a SafeOS (one more device emulated by a non-safe
> > > >> >> >>>   component)
> > > >> >> >>>
> > > >> >> >>> Having 2 PCI Root Complexes both emulated in Xen is a middle
> > > >> >> >>> ground solution because:
> > > >> >> >>> - we still pay a price in terms of resource utilization
> > > >> >> >>> - the code complexity goes up a bit but hopefully not by much
> > > >> >> >>> - there is no impact on safety compared to the ideal scenario
> > > >> >> >>>
> > > >> >> >>> This is why I wrote that it is tolerable.
> > > >> >> >> Ah, I see now. Yes, I agree with this. Also I want to add some
> > > >> >> >> more points:
> > > >> >> >> - There is ongoing work on implementing virtio backends as
> > > >> >> >>   separate applications, written in Rust. Linaro is doing this
> > > >> >> >>   part. Right now they are implementing only virtio-mmio, but
> > > >> >> >>   if they want to provide virtio-pci as well, they will need a
> > > >> >> >>   mechanism to plug in only virtio-pci, without a Root Complex.
> > > >> >> >>   This is an argument for using a single Root Complex emulated
> > > >> >> >>   in Xen.
> > > >> >> >> - As far as I know (actually, Oleksandr told this to me), QEMU
> > > >> >> >>   has no mechanism for exposing virtio-pci backends without
> > > >> >> >>   exposing a PCI Root Complex as well. Architecturally, there
> > > >> >> >>   should be a PCI bus to which virtio-pci devices are
> > > >> >> >>   connected, or we need to make some changes to QEMU internals
> > > >> >> >>   to be able to create virtio-pci backends that are not
> > > >> >> >>   connected to any bus. Also, an added benefit is that the PCI
> > > >> >> >>   Root Complex emulator in QEMU handles legacy PCI interrupts
> > > >> >> >>   for us. This is an argument for a separate Root Complex for
> > > >> >> >>   QEMU.
> > > >> >> >> As right now the only virtio-pci backends are provided by QEMU
> > > >> >> >> and this setup is already working, I propose to stick to this
> > > >> >> >> solution, especially taking into account that it does not
> > > >> >> >> require any changes to the hypervisor code.
> > > >> >> >
> > > >> >> > I am not against two hostbridges as a temporary solution as
> > > >> >> > long as this is not a one-way-door decision. I am not concerned
> > > >> >> > about the hypervisor itself; I am more concerned about the
> > > >> >> > interface exposed by the toolstack and QEMU.
> > > >> >
> > > >> > I agree with this...
> > > >> >
> > > >> >
> > > >> >> > To clarify, I don't particularly want to have to maintain the
> > > >> >> > two hostbridges solution once we can use a single hostbridge.
> > > >> >> > So we need to be able to get rid of it without impacting the
> > > >> >> > interface too much.
> > > >> >
> > > >> > ...and this
> > > >> >
> > > >> >
> > > >> >> This depends on virtio-pci backend availability. AFAIK, right now
> > > >> >> the only option is to use QEMU, and QEMU provides its own host
> > > >> >> bridge. So if we want to get rid of the second host bridge we
> > > >> >> need either another virtio-pci backend or we need to alter the
> > > >> >> QEMU code so it can live without a host bridge.
> > > >> >> 
> > > >> >> As for interfaces, it appears that the QEMU case does not require
> > > >> >> any changes to the hypervisor itself; it just boils down to
> > > >> >> writing a couple of xenstore entries and spawning QEMU with the
> > > >> >> correct command line arguments.
> > > >> >
> > > >> > One thing that Stewart wrote in his reply is important: it doesn't
> > > >> > matter if QEMU thinks it is emulating a PCI Root Complex, because
> > > >> > that's required from QEMU's point of view to emulate an individual
> > > >> > PCI device.
> > > >> >
> > > >> > If we can arrange it so the QEMU PCI Root Complex is not registered
> > > >> > against Xen as part of the ioreq interface, then QEMU's emulated PCI
> > > >> > Root Complex is going to be left unused. I think that would be great
> > > >> > because we still have a clean QEMU-Xen-tools interface and the only
> > > >> > downside is some extra unused emulation in QEMU. It would be a
> > > >> > fantastic starting point.
> > > >> 
> > > >> I believe that in this case we need to set up manual ioreq handlers,
> > > >> like what was done in the patch "xen/arm: Intercept vPCI config
> > > >> accesses and forward them to emulator", because we need to route
> > > >> ECAM accesses either to a virtio-pci backend or to a real PCI
> > > >> device. Also we need to tell QEMU not to install its own ioreq
> > > >> handlers for the ECAM space.
> > > >
> > > > I was imagining that the interface would look like this: QEMU
> > > > registers a PCI BDF and Xen automatically starts forwarding to QEMU
> > > > the ECAM read/write requests for the PCI config space of that BDF
> > > > only. It would not be the entire ECAM space but only the individual
> > > > PCI config reads/writes for that BDF.
> > > >
> > > 
> > > Okay, I see that there is the
> > > xendevicemodel_map_pcidev_to_ioreq_server() function and the
> > > corresponding IOREQ_TYPE_PCI_CONFIG call. Is this what you propose to
> > > use to register a PCI BDF?
> > 
> > Yes, I think that's best.
> > 
> > Let me expand on this. Like I wrote above, I think it is important that
> > Xen vPCI is the only in-use PCI Root Complex emulator. If it makes the
> > QEMU implementation easier, it is OK if QEMU emulates an unneeded and
> > unused PCI Root Complex. From Xen's point of view, it doesn't exist.
> > 
> > In terms of ioreq registration, QEMU calls
> > xendevicemodel_map_pcidev_to_ioreq_server for each PCI BDF it wants to
> > emulate. That way, Xen vPCI knows exactly what PCI config space
> > reads/writes to forward to QEMU.
> > 
> > Let's say that:
> > - 00:02.0 is a PCI passthrough device
> > - 00:03.0 is a PCI emulated device
> > 
> > QEMU would register 00:03.0 and vPCI would know to forward anything
> > related to 00:03.0 to QEMU, but not 00:02.0.
> 
> I think there's some work here so that we have a proper hierarchy
> inside of Xen.  Right now both ioreq and vpci expect to decode the
> accesses to the PCI config space, and set up (MM)IO handlers to trap
> ECAM, see vpci_ecam_{read,write}().
> 
> I think we want to move to a model where vPCI doesn't setup MMIO traps
> itself, and instead relies on ioreq to do the decoding and forwarding
> of accesses.  We need some work in order to represent an internal
> ioreq handler, but that shouldn't be too complicated.  IOW: vpci
> should register devices it's handling with ioreq, much like QEMU does.

I think this could be a good idea.

This would be the very first IOREQ handler implemented in Xen itself,
rather than outside of Xen. Some code refactoring might be required,
which worries me given that vPCI is at v10 and has been pending for
years. I think it could make sense as a follow-up series rather than as
part of v11.

I think this idea would be beneficial if, in the example above, vPCI
doesn't really need to know about device 00:03.0: vPCI registers via
IOREQ only the PCI Root Complex and device 00:02.0, QEMU registers
00:03.0, and everything works. vPCI is not involved at all in PCI config
space reads and writes for 00:03.0. If that is the case, then moving
vPCI to IOREQ could be good.
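
Just to make the first model concrete, the backend side of the
registration could look roughly like the sketch below. It only uses
calls that already exist in libxendevicemodel; the domid value and the
rest of the ioreq server setup (mapping the ioreq pages, event channels,
the I/O loop) are simplified and purely illustrative.

/*
 * Minimal sketch of the emulator-side registration using the existing
 * libxendevicemodel interface. The domid is made up and most of the
 * usual ioreq server setup is omitted for brevity.
 */
#include <stdio.h>
#include <stdlib.h>
#include <xendevicemodel.h>

int main(void)
{
    domid_t domid = 1;                  /* example guest domid */
    xendevicemodel_handle *xdm;
    ioservid_t srvid;

    xdm = xendevicemodel_open(NULL, 0);
    if ( !xdm )
    {
        perror("xendevicemodel_open");
        return EXIT_FAILURE;
    }

    /* Create an ioreq server for this emulator (no buffered ioreqs). */
    if ( xendevicemodel_create_ioreq_server(xdm, domid, 0, &srvid) )
    {
        perror("xendevicemodel_create_ioreq_server");
        return EXIT_FAILURE;
    }

    /*
     * Claim only the emulated device 00:03.0 (segment 0, bus 0,
     * device 3, function 0). vPCI keeps handling everything else,
     * including the passthrough device 00:02.0.
     */
    if ( xendevicemodel_map_pcidev_to_ioreq_server(xdm, domid, srvid,
                                                   0, 0, 3, 0) )
    {
        perror("xendevicemodel_map_pcidev_to_ioreq_server");
        return EXIT_FAILURE;
    }

    /* ... set the server active and enter the usual I/O request loop ... */

    xendevicemodel_close(xdm);
    return EXIT_SUCCESS;
}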

On the other hand, if vPCI actually needs to know that 00:03.0 exists,
perhaps because its presence changes something in the PCI Root Complex
emulation, or because vPCI needs to take some action when the PCI config
space registers of 00:03.0 are written, then I think this model doesn't
work well. If that is the case, then I think it would be best to keep
vPCI as the MMIO handler and let it forward accesses to IOREQ when
appropriate, as in the sketch below.
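
A very rough sketch of how that dispatch could look inside the ECAM trap
handler follows. All helper names and signatures here are hypothetical,
nothing like them exists today; the only real things referenced are the
existing vpci_ecam_{read,write}() handlers Roger mentioned,
XEN_DMOP_map_pcidev_to_ioreq_server and IOREQ_TYPE_PCI_CONFIG.

/*
 * Hypothetical sketch only: the helpers below are made up to illustrate
 * the idea. vPCI keeps trapping the whole ECAM window, but before
 * emulating an access it checks whether the BDF has been claimed by an
 * external ioreq server (e.g. QEMU via
 * XEN_DMOP_map_pcidev_to_ioreq_server) and, if so, forwards the access
 * as an IOREQ_TYPE_PCI_CONFIG request instead of handling it itself.
 */
static int vpci_ecam_write_dispatch(struct domain *d, pci_sbdf_t sbdf,
                                    unsigned int reg, unsigned int size,
                                    unsigned long data)
{
    struct ioreq_server *s;

    /* Hypothetical lookup: does an external emulator own this BDF? */
    s = ioreq_server_find_pcidev(d, sbdf);
    if ( s )
        /* Hypothetical forwarding helper built on IOREQ_TYPE_PCI_CONFIG. */
        return ioreq_send_pci_config(s, sbdf, reg, size, data);

    /*
     * Otherwise fall through to the existing vPCI emulation (what
     * vpci_ecam_write() does today); hypothetical wrapper name.
     */
    return vpci_handle_ecam_write(sbdf, reg, size, data);
}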

I haven't run any experiments, but my gut feeling tells me that we'll
have to follow the second approach because the first is too limiting.

 

