|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci
On 11/17/23 03:11, Oleksandr Tyshchenko wrote:
>
>
> On 17.11.23 05:31, Stewart Hildebrand wrote:
>
> Hello Stewart
>
> [answering only for virtio-pci bits as for vPCI I am only familiar with
> code responsible for trapping config space accesses]
>
> [snip]
>
>>>
>>>
>>> Let me start by saying that if we can get away with it, I think that a
>>> single PCI Root Complex in Xen would be best because it requires less
>>> complexity. Why emulate 2/3 PCI Root Complexes if we can emulate only
>>> one?
>>>
>>> Stewart, you are deep into vPCI, what's your thinking?
>>
>> First allow me explain the moving pieces in a bit more detail (skip ahead to
>> "Back to the question: " if you don't want to be bored with the details). I
>> played around with this series, and I passed through a PCI device (with
>> vPCI) and enabled virtio-pci:
>>
>> virtio = [
>> "type=virtio,device,transport=pci,bdf=0000:00:00.0,backend_type=qemu" ]
>> device_model_args = [ "-device", "virtio-serial-pci" ]
>> pci = [ "01:00.0" ]
>>
>> Indeed we get two root complexes (2 ECAM ranges, 2 sets of interrupts, etc.)
>> from the domU point of view:
>>
>> pcie@10000000 {
>> compatible = "pci-host-ecam-generic";
>> device_type = "pci";
>> reg = <0x00 0x10000000 0x00 0x10000000>;
>> bus-range = <0x00 0xff>;
>> #address-cells = <0x03>;
>> #size-cells = <0x02>;
>> status = "okay";
>> ranges = <0x2000000 0x00 0x23000000 0x00 0x23000000 0x00 0x10000000
>> 0x42000000 0x01 0x00 0x01 0x00 0x01 0x00>;
>> #interrupt-cells = <0x01>;
>> interrupt-map = <0x00 0x00 0x00 0x01 0xfde8 0x00 0x74 0x04>;
>> interrupt-map-mask = <0x00 0x00 0x00 0x07>;
>
>
> I am wondering how you got interrupt-map here? AFAIR upstream toolstack
> doesn't add that property for vpci dt node.
You are correct. I'm airing my dirty laundry here: this is a hack to get a
legacy interrupt passed through on a ZCU102 board (Zynq UltraScale+), which
doesn't have GICv3/ITS. In a system with a GICv3/ITS, we would have an msi-map
property (this is also not yet upstream, but is in a couple of downstreams).
>
>> };
>>
>> pcie@33000000 {
>> compatible = "pci-host-ecam-generic";
>> device_type = "pci";
>> reg = <0x00 0x33000000 0x00 0x200000>;
>> bus-range = <0x00 0x01>;
>> #address-cells = <0x03>;
>> #size-cells = <0x02>;
>> status = "okay";
>> ranges = <0x2000000 0x00 0x34000000 0x00 0x34000000 0x00 0x800000
>> 0x42000000 0x00 0x3a000000 0x00 0x3a000000 0x00 0x800000>;
>> dma-coherent;
>> #interrupt-cells = <0x01>;
>> interrupt-map = <0x00 0x00 0x00 0x01 0xfde8 0x00 0x0c 0x04 0x00
>> 0x00 0x00 0x02 0xfde8 0x00 0x0d 0x04 0x00 0x00 0x00 0x03 0xfde8 0x00 0x0e
>> 0x04 0x00 0x00 0x00 0x04 0xfde8 0x00 0x0f 0x04 0x800 0x00 0x00 0x01 0xfde8
>> 0x00 0x0d 0x04 0x800 0x00 0x00 0x02 0xfde8 0x00 0x0e 0x04 0x800 0x00 0x00
>> 0x03 0xfde8 0x00 0x0f 0x04 0x800 0x00 0x00 0x04 0xfde8 0x00 0x0c 0x04 0x1000
>> 0x00 0x00 0x01 0xfde8 0x00 0x0e 0x04 0x1000 0x00 0x00 0x02 0xfde8 0x00 0x0f
>> 0x04 0x1000 0x00 0x00 0x03 0xfde8 0x00 0x0c 0x04 0x1000 0x00 0x00 0x04
>> 0xfde8 0x00 0x0d 0x04 0x1800 0x00 0x00 0x01 0xfde8 0x00 0x0f 0x04 0x1800
>> 0x00 0x00 0x02 0xfde8 0x00 0x0c 0x04 0x1800 0x00 0x00 0x03 0xfde8 0x00 0x0d
>> 0x04 0x1800 0x00 0x00 0x04 0xfde8 0x00 0x0e 0x04>;
>> interrupt-map-mask = <0x1800 0x00 0x00 0x07>;
>
>
> that is correct dump.
>
> BTW, if you added "grant_usage=1" (it is disabled by default for dom0)
> to virtio configuration you would get iommu-map property here as well
> [1]. This is another point to think about when considering combined
> approach (single PCI Host bridge node -> single virtual root complex), I
> guess usual PCI device doesn't want grant based DMA addresses, correct?
> If so, it shouldn't be specified in the property.
Right, a usual PCI device doesn't want/need grant based DMA addresses. The
iommu-map property [2] is flexible enough to make it apply only to certain
vBDFs or ranges of vBDFs. Actually, it looks like ("libxl/arm: Reuse generic
PCI-IOMMU bindings for virtio-pci devices") already has the logic for doing
exactly this. So it should be fine to have the iommu-map property in the single
root complex and still do regular PCI passthrough.
Presumably, we would also want msi-map [3] to apply to some vBDFs, not others.
msi-map has the same flexible BDF matching as iommu-map (these two bindings
actually share the same parsing function), so we should be able to use a
similar strategy here if needed.
We would also want interrupt-map [4] to apply to some vBDFs, not others. The
BDF matching with interrupt-map behaves slightly differently, but I believe it
is still flexible enough to achieve this. In this case, the function
create_virtio_pci_irq_map(), added in ("libxl/arm: Add basic virtio-pci
support"), would need some re-thinking.
In summary, with a single virtual root complex, the toolstack needs to know
which vBDFs are virtio-pci, and which vBDFs are passthrough, so it can create
the [iommu,msi,interrupt]-map properties accordingly.
[2]
https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-iommu.txt
[3] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-msi.txt
[4]
https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/host-generic-pci.yaml
>
>
>> };
>>
>> Xen vPCI doesn't currently expose a host bridge (i.e. a device with base
>> class 0x06). As an aside, we may eventually want to expose a
>> virtual/emulated host bridge in vPCI, because Linux's x86 PCI probe expects
>> one [0].
>>
>> Qemu exposes an emulated host bridge, along with any requested emulated
>> devices.
>>
>> Running lspci -v in the domU yields the following:
>>
>> 0000:00:00.0 Network controller: Ralink corp. RT2790 Wireless 802.11n 1T/2R
>> PCIe
>> Subsystem: ASUSTeK Computer Inc. RT2790 Wireless 802.11n 1T/2R PCIe
>> Flags: bus master, fast devsel, latency 0, IRQ 13
>> Memory at 23000000 (32-bit, non-prefetchable) [size=64K]
>> Capabilities: [50] MSI: Enable- Count=1/128 Maskable- 64bit+
>> Kernel driver in use: rt2800pci
>>
>> 0001:00:00.0 Host bridge: Red Hat, Inc. QEMU PCIe Host bridge
>> Subsystem: Red Hat, Inc. QEMU PCIe Host bridge
>> Flags: fast devsel
>>
>> 0001:00:01.0 Communication controller: Red Hat, Inc. Virtio console
>> Subsystem: Red Hat, Inc. Virtio console
>> Flags: bus master, fast devsel, latency 0, IRQ 14
>> Memory at 3a000000 (64-bit, prefetchable) [size=16K]
>> Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
>> Capabilities: [70] Vendor Specific Information: VirtIO: Notify
>> Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
>> Capabilities: [50] Vendor Specific Information: VirtIO: ISR
>> Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
>> Kernel driver in use: virtio-pci
>>
>> 0000:00:00.0 is a real passed through device (corresponding to 0000:01:00.0
>> in dom0).
>> 0001:00:00.0 is the qemu host bridge (base class 0x06).
>> They are on different segments because they are associated with different
>> root complexes.
>
>
> Glad to hear this patch series doesn't seem to break PCI passthrough in
> your environment.
>
>
>>
>>
>> Back to the question: Sure, avoiding reserving more memory from the
>> preciously small lowmem virtual memory layout is probably a good idea. With
>> everything in a single virtual root complex (segment), it's probably
>> possible to come up with some vBDF-picking algorithm (+ user ability to
>> specify) that works for most use cases as discussed elsewhere. It will
>> always be in a single fixed segment as far as I can tell.
>>
>> Some more observations assuming a single virtual root complex:
>>
>> We should probably hide the qemu host bridge(s) from the guest. In other
>> words, hide all devices with base class 0x06, except eventually vPCI's own
>> virtual host bridge. If we don't hide them, we would likely end up with
>> multiple emulated host bridges on a single root complex (segment). That
>> sounds messy and hard to manage.
>>
>> We have a need to control the vBDF exposed to the guest - can we force qemu
>> to use particular BDFs for its emulated devices?
>
>
> Yes, it is possible. Maybe there is a better way, but at
> least *bus* and *addr* can be specified and Qemu indeed follows that.
>
> device_model_args=[ '-device',
> 'virtio-blk-pci,scsi=off,disable-legacy=on,iommu_platform=on,bus=pcie.0,addr=2,drive=image',
>
> '-drive', 'if=none,id=image,format=raw,file=/dev/mmcblk1p3' ]
>
> virtio=[ "backend=Domain-0, type=virtio,device, transport=pci,
> bdf=0000:00:02.0, grant_usage=1, backend_type=qemu" ]
>
> root@h3ulcb-domd:~# dmesg | grep virtio
> [ 0.660789] virtio-pci 0000:00:02.0: enabling device (0000 -> 0002)
> [ 0.715876] virtio_blk virtio0: [vda] 4096 512-byte logical blocks
> (2.10 MB/2.00 MiB)
>
> root@h3ulcb-domd:~# lspci
> 00:00.0 Host bridge: Red Hat, Inc. QEMU PCIe Host bridge
> 00:02.0 SCSI storage controller: Red Hat, Inc. Virtio block device (rev 01)
>
> Also there is one moment for current series: bdf specified for
> virtio-pci device only makes sense for iommu-map property. So
> bdf=0000:00:02.0 in virtio property and bus=pcie.0,addr=2 in
> device_model_args property should be in sync.
>
> [1]
> https://patchwork.kernel.org/project/xen-devel/patch/20231115112611.3865905-5-Sergiy_Kibrik@xxxxxxxx/
>
>
> [snip]
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |