Xen project Mailing List

Re: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci

To: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>

From: Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>

Date: Fri, 17 Nov 2023 08:11:59 +0000

Accept-language: en-US, ru-RU

Cc: Julien Grall <julien@xxxxxxx>, Sergiy Kibrik <Sergiy_Kibrik@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, "vikram.garhwal@xxxxxxx" <vikram.garhwal@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>

Delivery-date: Fri, 17 Nov 2023 08:12:43 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Thread-topic: [RFC PATCH 2/6] xen/public: arch-arm: reserve resources for virtio-pci

On 17.11.23 05:31, Stewart Hildebrand wrote:

Hello Stewart

[answering only for virtio-pci bits as for vPCI I am only familiar with
code responsible for trapping config space accesses]

[snip]

>>
>> Let me start by saying that if we can get away with it, I think that a
>> single PCI Root Complex in Xen would be best because it requires less
>> complexity. Why emulate 2/3 PCI Root Complexes if we can emulate only
>> one?
>>
>> Stewart, you are deep into vPCI, what's your thinking?
>
> First allow me explain the moving pieces in a bit more detail (skip ahead to
> "Back to the question: " if you don't want to be bored with the details). I
> played around with this series, and I passed through a PCI device (with vPCI)
> and enabled virtio-pci:
>
> virtio = [
>     "type=virtio,device,transport=pci,bdf=0000:00:00.0,backend_type=qemu" ]
> device_model_args = [ "-device", "virtio-serial-pci" ]
> pci = [ "01:00.0" ]
>
> Indeed we get two root complexes (2 ECAM ranges, 2 sets of interrupts, etc.)
> from the domU point of view:
>
>     pcie@10000000 {
>         compatible = "pci-host-ecam-generic";
>         device_type = "pci";
>         reg = <0x00 0x10000000 0x00 0x10000000>;
>         bus-range = <0x00 0xff>;
>         #address-cells = <0x03>;
>         #size-cells = <0x02>;
>         status = "okay";
>         ranges = <0x2000000 0x00 0x23000000 0x00 0x23000000 0x00 0x10000000
>                   0x42000000 0x01 0x00 0x01 0x00 0x01 0x00>;
>         #interrupt-cells = <0x01>;
>         interrupt-map = <0x00 0x00 0x00 0x01 0xfde8 0x00 0x74 0x04>;
>         interrupt-map-mask = <0x00 0x00 0x00 0x07>;

I am wondering how you got interrupt-map here? AFAIR, the upstream
toolstack doesn't add that property to the vPCI DT node.

>     };
>
>     pcie@33000000 {
>         compatible = "pci-host-ecam-generic";
>         device_type = "pci";
>         reg = <0x00 0x33000000 0x00 0x200000>;
>         bus-range = <0x00 0x01>;
>         #address-cells = <0x03>;
>         #size-cells = <0x02>;
>         status = "okay";
>         ranges = <0x2000000 0x00 0x34000000 0x00 0x34000000 0x00 0x800000
>                   0x42000000 0x00 0x3a000000 0x00 0x3a000000 0x00 0x800000>;
>         dma-coherent;
>         #interrupt-cells = <0x01>;
>         interrupt-map = <0x00 0x00 0x00 0x01 0xfde8 0x00 0x0c 0x04
>                          0x00 0x00 0x00 0x02 0xfde8 0x00 0x0d 0x04
>                          0x00 0x00 0x00 0x03 0xfde8 0x00 0x0e 0x04
>                          0x00 0x00 0x00 0x04 0xfde8 0x00 0x0f 0x04
>                          0x800 0x00 0x00 0x01 0xfde8 0x00 0x0d 0x04
>                          0x800 0x00 0x00 0x02 0xfde8 0x00 0x0e 0x04
>                          0x800 0x00 0x00 0x03 0xfde8 0x00 0x0f 0x04
>                          0x800 0x00 0x00 0x04 0xfde8 0x00 0x0c 0x04
>                          0x1000 0x00 0x00 0x01 0xfde8 0x00 0x0e 0x04
>                          0x1000 0x00 0x00 0x02 0xfde8 0x00 0x0f 0x04
>                          0x1000 0x00 0x00 0x03 0xfde8 0x00 0x0c 0x04
>                          0x1000 0x00 0x00 0x04 0xfde8 0x00 0x0d 0x04
>                          0x1800 0x00 0x00 0x01 0xfde8 0x00 0x0f 0x04
>                          0x1800 0x00 0x00 0x02 0xfde8 0x00 0x0c 0x04
>                          0x1800 0x00 0x00 0x03 0xfde8 0x00 0x0d 0x04
>                          0x1800 0x00 0x00 0x04 0xfde8 0x00 0x0e 0x04>;
>         interrupt-map-mask = <0x1800 0x00 0x00 0x07>;

That is the correct dump.

BTW, if you added "grant_usage=1" (it is disabled by default for dom0) to
the virtio configuration, you would get the iommu-map property here as
well [1]. This is another point to think about when considering the
combined approach (single PCI host bridge node -> single virtual root
complex): I guess a usual PCI device doesn't want grant-based DMA
addresses, correct? If so, it shouldn't be specified in the property.

>     };
>
> Xen vPCI doesn't currently expose a host bridge (i.e. a device with base
> class 0x06). As an aside, we may eventually want to expose a virtual/emulated
> host bridge in vPCI, because Linux's x86 PCI probe expects one [0].
>
> Qemu exposes an emulated host bridge, along with any requested emulated
> devices.
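
As a side note on the "2 ECAM ranges" in the two nodes quoted above: the reg sizes and bus-range values can be cross-checked against each other, since ECAM reserves 1 MiB of config space per bus (32 devices x 8 functions x 4 KiB). A minimal Python sketch of that check follows. The helper names and the NODES table are made up for illustration and are not part of the series; the numbers are copied from the quoted dump.

```python
# ECAM layout: 4 KiB per function, 8 functions per device, 32 devices per bus.
ECAM_BYTES_PER_BUS = 32 * 8 * 4096  # 1 MiB


def ecam_bus_count(reg_size):
    """Number of buses an ECAM window of reg_size bytes can address."""
    if reg_size % ECAM_BYTES_PER_BUS:
        raise ValueError("ECAM window is not a whole number of buses")
    return reg_size // ECAM_BYTES_PER_BUS


def ecam_offset(bus, dev, fn, reg=0):
    """Byte offset of a config register inside an ECAM window."""
    return (bus << 20) | (dev << 15) | (fn << 12) | reg


def bus_range_matches(node):
    """True if bus-range covers exactly the buses the reg window provides."""
    first, last = node["bus_range"]
    return ecam_bus_count(node["reg_size"]) == last - first + 1


# Values taken from the quoted device tree dump (hypothetical table layout).
NODES = {
    "pcie@10000000": {"reg_size": 0x10000000, "bus_range": (0x00, 0xFF)},
    "pcie@33000000": {"reg_size": 0x200000, "bus_range": (0x00, 0x01)},
}

for name, node in NODES.items():
    print(name, ecam_bus_count(node["reg_size"]), bus_range_matches(node))
```

Both windows come out consistent: 256 buses for the vPCI node and 2 buses for the virtio-pci node. With the same layout, ecam_offset(0, 1, 0) gives 0x8000, i.e. the offset of device 00:01.0 inside its window.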
>
> Running lspci -v in the domU yields the following:
>
> 0000:00:00.0 Network controller: Ralink corp. RT2790 Wireless 802.11n 1T/2R
> PCIe
>     Subsystem: ASUSTeK Computer Inc. RT2790 Wireless 802.11n 1T/2R PCIe
>     Flags: bus master, fast devsel, latency 0, IRQ 13
>     Memory at 23000000 (32-bit, non-prefetchable) [size=64K]
>     Capabilities: [50] MSI: Enable- Count=1/128 Maskable- 64bit+
>     Kernel driver in use: rt2800pci
>
> 0001:00:00.0 Host bridge: Red Hat, Inc. QEMU PCIe Host bridge
>     Subsystem: Red Hat, Inc. QEMU PCIe Host bridge
>     Flags: fast devsel
>
> 0001:00:01.0 Communication controller: Red Hat, Inc. Virtio console
>     Subsystem: Red Hat, Inc. Virtio console
>     Flags: bus master, fast devsel, latency 0, IRQ 14
>     Memory at 3a000000 (64-bit, prefetchable) [size=16K]
>     Capabilities: [84] Vendor Specific Information: VirtIO: <unknown>
>     Capabilities: [70] Vendor Specific Information: VirtIO: Notify
>     Capabilities: [60] Vendor Specific Information: VirtIO: DeviceCfg
>     Capabilities: [50] Vendor Specific Information: VirtIO: ISR
>     Capabilities: [40] Vendor Specific Information: VirtIO: CommonCfg
>     Kernel driver in use: virtio-pci
>
> 0000:00:00.0 is a real passed through device (corresponding to 0000:01:00.0
> in dom0).
> 0001:00:00.0 is the qemu host bridge (base class 0x06).
> They are on different segments because they are associated with different
> root complexes.

Glad to hear this patch series doesn't seem to break PCI passthrough in
your environment.

>
>
> Back to the question: Sure, avoiding reserving more memory from the
> preciously small lowmem virtual memory layout is probably a good idea. With
> everything in a single virtual root complex (segment), it's probably possible
> to come up with some vBDF-picking algorithm (+ user ability to specify) that
> works for most use cases as discussed elsewhere. It will always be in a
> single fixed segment as far as I can tell.
>
> Some more observations assuming a single virtual root complex:
>
> We should probably hide the qemu host bridge(s) from the guest. In other
> words, hide all devices with base class 0x06, except eventually vPCI's own
> virtual host bridge. If we don't hide them, we would likely end up with
> multiple emulated host bridges on a single root complex (segment). That
> sounds messy and hard to manage.
>
> We have a need to control the vBDF exposed to the guest - can we force qemu
> to use particular BDFs for its emulated devices?

Yes, it is possible. Maybe there is a better way, but at least *bus* and
*addr* can be specified, and Qemu indeed follows that.

device_model_args=[ '-device',
    'virtio-blk-pci,scsi=off,disable-legacy=on,iommu_platform=on,bus=pcie.0,addr=2,drive=image',
    '-drive', 'if=none,id=image,format=raw,file=/dev/mmcblk1p3' ]

virtio=[ "backend=Domain-0, type=virtio,device, transport=pci, bdf=0000:00:02.0, grant_usage=1, backend_type=qemu" ]

root@h3ulcb-domd:~# dmesg | grep virtio
[ 0.660789] virtio-pci 0000:00:02.0: enabling device (0000 -> 0002)
[ 0.715876] virtio_blk virtio0: [vda] 4096 512-byte logical blocks (2.10 MB/2.00 MiB)

root@h3ulcb-domd:~# lspci
00:00.0 Host bridge: Red Hat, Inc. QEMU PCIe Host bridge
00:02.0 SCSI storage controller: Red Hat, Inc. Virtio block device (rev 01)

Also, there is one point to note for the current series: the bdf specified
for a virtio-pci device only makes sense for the iommu-map property. So
bdf=0000:00:02.0 in the virtio property and bus=pcie.0,addr=2 in the
device_model_args property should be kept in sync.

[1] https://patchwork.kernel.org/project/xen-devel/patch/20231115112611.3865905-5-Sergiy_Kibrik@xxxxxxxx/

[snip]
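
To illustrate the point about keeping the bdf in the virtio property and bus/addr in device_model_args in sync, here is a rough Python sketch of such a consistency check. Nothing like this exists in the series or in the toolstack as far as this mail shows; all function names are hypothetical. It assumes that the Qemu addr value is "slot[.function]" in hex and that the root bus "pcie.0" corresponds to bus 00 in the guest.

```python
import re


def parse_props(spec):
    """Collect key=value pairs from a comma-separated spec string.

    Bare tokens (the Qemu device name, or "device" in "type=virtio,device")
    are skipped.
    """
    props = {}
    for token in spec.split(","):
        key, sep, value = token.strip().partition("=")
        if sep:
            props[key] = value
    return props


def parse_bdf(bdf):
    """Split 'SSSS:BB:DD.F' into (segment, bus, device, function)."""
    m = re.fullmatch(
        r"([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])", bdf)
    if m is None:
        raise ValueError("malformed BDF: %r" % bdf)
    return tuple(int(part, 16) for part in m.groups())


def bdf_in_sync(virtio_spec, qemu_device_arg, root_bus="pcie.0"):
    """Check that the virtio bdf matches the Qemu bus/addr placement.

    Assumption: root_bus is bus 0 and addr is hex 'slot[.function]'.
    """
    virtio = parse_props(virtio_spec)
    qemu = parse_props(qemu_device_arg)
    _segment, bus, dev, fn = parse_bdf(virtio["bdf"])
    slot_str, _, fn_str = qemu["addr"].partition(".")
    slot = int(slot_str, 16)
    func = int(fn_str, 16) if fn_str else 0
    return qemu.get("bus") == root_bus and bus == 0 and (dev, fn) == (slot, func)


VIRTIO = ("backend=Domain-0, type=virtio,device, transport=pci, "
          "bdf=0000:00:02.0, grant_usage=1, backend_type=qemu")
QEMU_DEV = ("virtio-blk-pci,scsi=off,disable-legacy=on,iommu_platform=on,"
            "bus=pcie.0,addr=2,drive=image")

print(bdf_in_sync(VIRTIO, QEMU_DEV))
```

With the example configuration from this mail the check passes. Changing addr=2 to addr=3 on the Qemu side, without touching the bdf, makes it fail, which is exactly the mismatch that would leave iommu-map describing the wrong device.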
