Re: [PATCH v10 13/17] vpci: add initial support for virtual PCI bus topology



On Tue, Nov 21, 2023 at 05:12:15PM -0800, Stefano Stabellini wrote:
> On Tue, 20 Nov 2023, Volodymyr Babchuk wrote:
> > Stefano Stabellini <sstabellini@xxxxxxxxxx> writes:
> > > On Fri, 17 Nov 2023, Volodymyr Babchuk wrote:
> > >> > On Fri, 17 Nov 2023, Volodymyr Babchuk wrote:
> > >> >> Hi Julien,
> > >> >> 
> > >> >> Julien Grall <julien@xxxxxxx> writes:
> > >> >> 
> > >> >> > Hi Volodymyr,
> > >> >> >
> > >> >> > On 17/11/2023 14:09, Volodymyr Babchuk wrote:
> > >> >> >> Hi Stefano,
> > >> >> >> Stefano Stabellini <sstabellini@xxxxxxxxxx> writes:
> > >> >> >> 
> > >> >> >>> On Fri, 17 Nov 2023, Volodymyr Babchuk wrote:
> > >> >> >>>>> I still think, no matter the BDF allocation scheme, that we 
> > >> >> >>>>> should try
> > >> >> >>>>> to avoid as much as possible to have two different PCI Root 
> > >> >> >>>>> Complex
> > >> >> >>>>> emulators. Ideally we would have only one PCI Root Complex 
> > >> >> >>>>> emulated by
> > >> >> >>>>> Xen. Having 2 PCI Root Complexes both of them emulated by Xen 
> > >> >> >>>>> would be
> > >> >> >>>>> tolerable but not ideal.
> > >> >> >>>>
> > >> >> >>>> But what is exactly wrong with this setup?
> > >> >> >>>
> > >> >> >>> [...]
> > >> >> >>>
> > >> >> >>>>> The worst case I would like to avoid is to have
> > >> >> >>>>> two PCI Root Complexes, one emulated by Xen and one emulated by 
> > >> >> >>>>> QEMU.
> > >> >> >>>>
> > >> >> >>>> This is how our setup works right now.
> > >> >> >>>
> > >> >> >>> If we have:
> > >> >> >>> - a single PCI Root Complex emulated in Xen
> > >> >> >>> - Xen is safety certified
> > >> >> >>> - individual Virtio devices emulated by QEMU with grants for 
> > >> >> >>> memory
> > >> >> >>>
> > >> >> >>> We can go very far in terms of being able to use Virtio in safety
> > >> >> >>> use-cases. We might even be able to use Virtio (frontends) in a 
> > >> >> >>> SafeOS.
> > >> >> >>>
> > >> >> >>> On the other hand if we put an additional Root Complex in QEMU:
> > >> >> >>> - we pay a price in terms of complexity of the codebase
> > >> >> >>> - we pay a price in terms of resource utilization
> > >> >> >>> - we have one additional problem in terms of using this setup 
> > >> >> >>> with a
> > >> >> >>>    SafeOS (one more device emulated by a non-safe component)
> > >> >> >>>
> > >> >> >>> Having 2 PCI Root Complexes both emulated in Xen is a middle 
> > >> >> >>> ground
> > >> >> >>> solution because:
> > >> >> >>> - we still pay a price in terms of resource utilization
> > >> >> >>> - the code complexity goes up a bit but hopefully not by much
> > >> >> >>> - there is no impact on safety compared to the ideal scenario
> > >> >> >>>
> > >> >> >>> This is why I wrote that it is tolerable.
> > >> >> >> Ah, I see now. Yes, I agree with this. Also I want to add some
> > >> >> >> more points:
> > >> >> >> - There is ongoing work on implementing virtio backends as a
> > >> >> >> separate
> > >> >> >>    applications, written in Rust. Linaro are doing this part. 
> > >> >> >> Right now
> > >> >> >>    they are implementing only virtio-mmio, but if they want to 
> > >> >> >> provide
> > >> >> >>    virtio-pci as well, they will need a mechanism to plug only
> > >> >> >>    virtio-pci, without Root Complex. This is argument for using 
> > >> >> >> single Root
> > >> >> >>    Complex emulated in Xen.
> > >> >> >> - As far as I know (actually, Oleksandr told this to me), QEMU has
> > >> >> >> no
> > >> >> >>    mechanism for exposing virtio-pci backends without exposing PCI 
> > >> >> >> root
> > >> >> >>    complex as well. Architecturally, there should be a PCI bus to 
> > >> >> >> which
> > >> >> >>    virtio-pci devices are connected. Or we need to make some 
> > >> >> >> changes to
> > >> >> >>    QEMU internals to be able to create virtio-pci backends that 
> > >> >> >> are not
> > >> >> >>    connected to any bus. Also, added benefit that PCI Root Complex
> > >> >> >>    emulator in QEMU handles legacy PCI interrupts for us. This is
> > >> >> >>    argument for separate Root Complex for QEMU.
> > >> >> >> As right now we have only virtio-pci backends provided by QEMU and
> > >> >> >> this
> > >> >> >> setup is already working, I propose to stick to this
> > >> >> >> solution. Especially, taking into account that it does not require 
> > >> >> >> any
> > >> >> >> changes to hypervisor code.
> > >> >> >
> > >> >> > I am not against two hostbridge as a temporary solution as long as
> > >> >> > this is not a one way door decision. I am not concerned about the
> > >> >> > hypervisor itself, I am more concerned about the interface exposed 
> > >> >> > by
> > >> >> > the toolstack and QEMU.
> > >> >
> > >> > I agree with this...
> > >> >
> > >> >
> > >> >> > To clarify, I don't particular want to have to maintain the two
> > >> >> > hostbridges solution once we can use a single hostbridge. So we need
> > >> >> > to be able to get rid of it without impacting the interface too 
> > >> >> > much.
> > >> >
> > >> > ...and this
> > >> >
> > >> >
> > >> >> This depends on virtio-pci backends availability. AFAIK, now only one
> > >> >> option is to use QEMU and QEMU provides own host bridge. So if we want
> > >> >> get rid of the second host bridge we need either another virtio-pci
> > >> >> backend or we need to alter QEMU code so it can live without host
> > >> >> bridge.
> > >> >> 
> > >> >> As for interfaces, it appears that QEMU case does not require any 
> > >> >> changes
> > >> >> into hypervisor itself, it just boils down to writing couple of 
> > >> >> xenstore
> > >> >> entries and spawning QEMU with correct command line arguments.
> > >> >
> > >> > One thing that Stewart wrote in his reply that is important: it doesn't
> > >> > matter if QEMU thinks it is emulating a PCI Root Complex because that's
> > >> > required from QEMU's point of view to emulate an individual PCI device.
> > >> >
> > >> > If we can arrange it so the QEMU PCI Root Complex is not registered
> > >> > against Xen as part of the ioreq interface, then QEMU's emulated PCI
> > >> > Root Complex is going to be left unused. I think that would be great
> > >> > because we still have a clean QEMU-Xen-tools interface and the only
> > >> > downside is some extra unused emulation in QEMU. It would be a
> > >> > fantastic starting point.
> > >> 
> > >> I believe, that in this case we need to set manual ioreq handlers, like
> > >> what was done in patch "xen/arm: Intercept vPCI config accesses and
> > >> forward them to emulator", because we need to route ECAM accesses
> > >> either to a virtio-pci backend or to a real PCI device. Also we need
> > >> to tell QEMU to not install its own ioreq handlers for ECAM space.
> > >
> > > I was imagining that the interface would look like this: QEMU registers
> > > a PCI BDF and Xen automatically starts forwarding to QEMU ECAM
> > > reads/writes requests for the PCI config space of that BDF only. It
> > > would not be the entire ECAM space but only individual PCI conf
> > > reads/writes for that BDF only.
> > >
> > 
> > Okay, I see that there is the
> > xendevicemodel_map_pcidev_to_ioreq_server() function and corresponding
> > IOREQ_TYPE_PCI_CONFIG call. Is this what you propose to use to register
> > PCI BDF?
> 
> Yes, I think that's best.
> 
> Let me expand on this. Like I wrote above, I think it is important that
> Xen vPCI is the only in-use PCI Root Complex emulator. If it makes the
> QEMU implementation easier, it is OK if QEMU emulates an unneeded and
> unused PCI Root Complex. From Xen point of view, it doesn't exist.
> 
> In terms of ioreq registration, QEMU calls
> xendevicemodel_map_pcidev_to_ioreq_server for each PCI BDF it wants to
> emulate. That way, Xen vPCI knows exactly what PCI config space
> reads/writes to forward to QEMU.
> 
> Let's say that:
> - 00:02.0 is PCI passthrough device
> - 00:03.0 is a PCI emulated device
> 
> QEMU would register 00:03.0 and vPCI would know to forward anything
> related to 00:03.0 to QEMU, but not 00:02.0.

I think there's some work here so that we have a proper hierarchy
inside of Xen.  Right now both ioreq and vpci expect to decode the
accesses to the PCI config space, and set up (MM)IO handlers to trap
ECAM, see vpci_ecam_{read,write}().

I think we want to move to a model where vPCI doesn't set up MMIO traps
itself, and instead relies on ioreq to do the decoding and forwarding
of accesses.  We need some work in order to represent an internal
ioreq handler, but that shouldn't be too complicated.  IOW: vpci
should register devices it's handling with ioreq, much like QEMU does.
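
To make the idea a bit more concrete, here is a rough, self-contained
sketch of per-BDF routing with a single decoder. It is not existing Xen
code: every name in it (cfg_owner, bdf_claim, claim_bdf,
route_ecam_access, MAX_CLAIMS) is hypothetical and only illustrates
the shape of the model, where both an internal handler (vPCI) and an
external emulator (QEMU) claim BDFs in the same table. The only thing
taken as given is the standard ECAM layout (bus in offset bits 27:20,
device in 19:15, function in 14:12, register in 11:0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; this is an illustration, not Xen code. */

enum cfg_owner {
    OWNER_NONE,     /* nobody claimed the BDF */
    OWNER_VPCI,     /* internal handler, e.g. a passthrough device */
    OWNER_EXTERNAL, /* external ioreq server, e.g. QEMU */
};

struct bdf_claim {
    uint16_t bdf;   /* bus[15:8] | device[7:3] | function[2:0] */
    enum cfg_owner owner;
};

#define MAX_CLAIMS 32

static struct bdf_claim claims[MAX_CLAIMS];
static size_t nr_claims;

/* Pack bus/device/function into the usual 16-bit BDF encoding. */
uint16_t make_bdf(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/* ECAM offset bits 27:12 are exactly the 16-bit BDF. */
uint16_t ecam_to_bdf(uint64_t offset)
{
    return (uint16_t)((offset >> 12) & 0xffff);
}

/* ECAM offset bits 11:0 select the config space register. */
uint16_t ecam_to_reg(uint64_t offset)
{
    return (uint16_t)(offset & 0xfff);
}

/* Register a BDF for one owner; fails if already claimed or table is full. */
int claim_bdf(uint16_t bdf, enum cfg_owner owner)
{
    size_t i;

    assert(owner != OWNER_NONE);

    for (i = 0; i < nr_claims; i++)
        if (claims[i].bdf == bdf)
            return -1;

    if (nr_claims == MAX_CLAIMS)
        return -1;

    claims[nr_claims].bdf = bdf;
    claims[nr_claims].owner = owner;
    nr_claims++;

    return 0;
}

/* Single decode point: find out who should handle a trapped ECAM access. */
enum cfg_owner route_ecam_access(uint64_t offset)
{
    uint16_t bdf = ecam_to_bdf(offset);
    size_t i;

    for (i = 0; i < nr_claims; i++)
        if (claims[i].bdf == bdf)
            return claims[i].owner;

    return OWNER_NONE;
}
```

With the example from above, vPCI would claim 00:02.0 and QEMU would
claim 00:03.0; an access to any register of 00:03.0 is then routed to
the external owner, and an access to an unclaimed BDF is routed to
nobody.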

Thanks, Roger.



 

