[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] xen/virtio: Handle PCI devices which Host controller is described in DT


  • To: Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • From: Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>
  • Date: Thu, 20 Oct 2022 18:01:49 +0000
  • Accept-language: en-US, ru-RU
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=epam.com; dmarc=pass action=none header.from=epam.com; dkim=pass header.d=epam.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Jg+dZokOvxuvqM1+/aceV1T/4Qmbs95uoeaHwHUHZ8A=; b=HvOZZU5Bi94z4n4/KqNxmrjOXm94bgLsXwC3kcO6oyXbA7+2DmrtQ9JS87jINE5m847QN7AQE8mL+YrE2NkuXzzhWMnJDWLInvniBzXEZijQVZFEq3UfIXdSLAdswlUe80+sJKnU6VDSm79wNgBp7NWfINznMGzjbQGqvxZGXwV0NsVbzJ+rHIcsmYP6i7bHrqaaeP9qR153fZ04qP+Jl8KL1XigGMTzOB1/Y5UTBO4CP6ofZ/tDOPW3ZHiOXRwesJqng/3h/tItQKiHAu7blUAnGcj/hX82ltNiVwXZ8fxrPGC/iSaGEdxaEN8bIolItyEoSH7ckyLJYJSxpCacWw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=W8EPx+mpItCEeMMOAgHToN9XtSh0mmIFHQV7EZTvRyvfM99XSO8EiBGGRP4Jc6Q9jszhLo60meb+qXcaxDTnL4wcz6msF8Ze8Zec7AA2Lzw80fY2PEA420FiO9ZolLIAx0BWi83dqtEmuk7aEhffxSa+TJQGLigrLEYEFsxhGcO1QyGtSBRIqBbDloX5rsfc79hDTJ+g8l9X3ReBtmJCMPdr2CgSSZ+plrryBmWtBbPX19ypydBR9zZqlGCedAy3Yc2Ehx1YhnBd6XzuDPyJm33LBa6215YtTB8d8YmwsBg+zkf+/0rGwg+szISGxwvqPnJQjnAHsru5IzTOwCNxjw==
  • Cc: Oleksandr Tyshchenko <olekstysh@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, "vikram.garhwal@xxxxxxx" <vikram.garhwal@xxxxxxx>
  • Delivery-date: Thu, 20 Oct 2022 18:02:15 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHY2avbTqg8YKTBpUCScd7xSfM5Yq4CIOiAgAkdnYCAAET3gIAEI/OAgAN2W4CAAq38gIAAdmSAgAFmi4A=
  • Thread-topic: [PATCH] xen/virtio: Handle PCI devices which Host controller is described in DT

On 19.10.22 23:38, Stefano Stabellini wrote:

Hello Stefano

> + Vikram
>
> On Wed, 19 Oct 2022, Oleksandr Tyshchenko wrote:
>>>> Regarding the virtio-mmio (platform) devices, yes, we could expose them
>>>> with status "disabled", and they won't get probed by default.
>>>> To be honest, I have experimented with that, when I was thinking of
>>>> possible hotplug for virtio-mmio devices (I know, this sounds uncommon
>>>> and strange).
>>>> I used Linux feature (CONFIG_OF_DYNAMIC, overlays) to update the
>>>> device-tree on running guest, so the toolstack initially inserts
>>>> virtio-mmio device nodes for non-boot devices
>>>> with status "disabled", and at the runtime, once we receive an event for
>>>> example, we change the status to "ok" and the corresponding virtio-mmio
>>>> device gets probed.
>>>> But again, it is not a 100% hotplug, as we need to pre-allocate memory
>>>> range and interrupt in advance (when generating guest device tree).
>>> Actually this is really cool! Does it work? It doesn't matter to me if
>>> the virtio devices are pci or mmio as long as we can solve the "wait"
>>> problem. So this could be a good solution.
>>
>> ... yes, it does. Initially I experimented with virtio-mmio devices, but
>> today I tried with PCI host bridge as well.
>> I won't describe the commands which I used to apply/remove device-tree
>> overlays from the userspace as well as the context of
>> dtso files I created, I will describe how that could be done from the
>> kernel by using existing functionality (CONFIG_OF_DYNAMIC).
>>
>> As I said if we exposed the devices with status "disabled", they
>> wouldn't get probed by default. Once we receive an signal
>> that otherend is ready, we change the status to "ok" and the
>> corresponding device gets probed.
>>
>> So below the test patch, which just change the status of the required
>> device-tree node (as you can see the code to update the property is
>> simple enough),
>> I hacked "xl sysrq" for the convenience of testing.
>>
>>
>> diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
>> index 045c1805b2d5..9683ce075bc9 100644
>> --- a/drivers/xen/grant-dma-ops.c
>> +++ b/drivers/xen/grant-dma-ops.c
>> @@ -10,6 +10,7 @@
>>    #include <linux/module.h>
>>    #include <linux/dma-map-ops.h>
>>    #include <linux/of.h>
>> +#include <linux/of_platform.h>
>>    #include <linux/pci.h>
>>    #include <linux/pfn.h>
>>    #include <linux/xarray.h>
>> @@ -379,6 +380,108 @@ bool xen_is_grant_dma_device(struct device *dev)
>>           return false;
>>    }
>>
>> +/* TODO: Consider using statically allocated (struct property status) */
>> +static int xen_grant_dma_enable_device(struct device_node *np)
>> +{
>> +       struct property *status;
>> +
>> +       status = kzalloc(sizeof(*status), GFP_KERNEL);
>> +       if (!status)
>> +               return -ENOMEM;
>> +
>> +       status->name = kstrdup("status", GFP_KERNEL);
>> +       if (!status->name)
>> +               return -ENOMEM;
>> +
>> +       status->value = kstrdup("okay", GFP_KERNEL);
>> +       if (!status->value)
>> +               return -ENOMEM;
>> +
>> +       status->length = sizeof("okay");
>> +
>> +       return of_update_property(np, status);
>> +}
>> +
>> +static int xen_grant_dma_disable_device(struct device_node *np)
>> +{
>> +       struct property *status;
>> +
>> +       status = kzalloc(sizeof(*status), GFP_KERNEL);
>> +       if (!status)
>> +               return -ENOMEM;
>> +
>> +       status->name = kstrdup("status", GFP_KERNEL);
>> +       if (!status->name)
>> +               return -ENOMEM;
>> +
>> +       status->value = kstrdup("disabled", GFP_KERNEL);
>> +       if (!status->value)
>> +               return -ENOMEM;
>> +
>> +       status->length = sizeof("disabled");
>> +
>> +       return of_update_property(np, status);
>> +}
>> +
>> +void xen_grant_dma_handle_sysrq(int key)
>> +{
>> +       struct device_node *np;
>> +       const char *path;
>> +       bool enable;
>> +
>> +       printk("%s: got key %d\n", __func__, key);
>> +
>> +       switch (key) {
>> +       case '0':
>> +               path = "/virtio@2000000";
>> +               enable = true;
>> +               break;
>> +
>> +       case '1':
>> +               path = "/virtio@2000200";
>> +               enable = true;
>> +               break;
>> +
>> +       case '2':
>> +               path = "/virtio@2000000";
>> +               enable = false;
>> +               break;
>> +
>> +       case '3':
>> +               path = "/virtio@2000200";
>> +               enable = false;
>> +               break;
>> +
>> +       case '4':
>> +               path = "/pcie@10000000";
>> +               enable = true;
>> +               break;
>> +
>> +       case '5':
>> +               path = "/pcie@10000000";
>> +               enable = false;
>> +               break;
>> +
>> +       default:
>> +               printk("%s: wrong key %d\n", __func__, key);
>> +               return;
>> +       }
>> +
>> +       np = of_find_node_by_path(path);
>> +       if (!np) {
>> +               printk("%s: failed to find node by path %s\n", __func__,
>> path);
>> +               return;
>> +       }
>> +
>> +       if (enable) {
>> +               xen_grant_dma_enable_device(np);
>> +               printk("%s: enable %s\n", __func__, path);
>> +       } else {
>> +               xen_grant_dma_disable_device(np);
>> +               printk("%s: disable %s\n", __func__, path);
>> +       }
>> +}
>> +
>>    bool xen_virtio_mem_acc(struct virtio_device *dev)
>>    {
>>           if (IS_ENABLED(CONFIG_XEN_VIRTIO_FORCE_GRANT) || xen_pv_domain())
>> diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
>> index c16df629907e..6df96be1ea40 100644
>> --- a/drivers/xen/manage.c
>> +++ b/drivers/xen/manage.c
>> @@ -308,7 +308,8 @@ static void sysrq_handler(struct xenbus_watch
>> *watch, const char *path,
>>                   goto again;
>>
>>           if (sysrq_key != '\0')
>> -               handle_sysrq(sysrq_key);
>> +               /*handle_sysrq(sysrq_key);*/
>> +               xen_grant_dma_handle_sysrq(sysrq_key);
>>    }
>>
>>    static struct xenbus_watch sysrq_watch = {
>> diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
>> index a34f4271a2e9..c2da1bc24091 100644
>> --- a/include/xen/xen-ops.h
>> +++ b/include/xen/xen-ops.h
>> @@ -215,6 +215,8 @@ static inline void xen_preemptible_hcall_end(void) { }
>>
>>    #endif /* CONFIG_XEN_PV && !CONFIG_PREEMPTION */
>>
>> +void xen_grant_dma_handle_sysrq(int key);
>> +
>>    #ifdef CONFIG_XEN_GRANT_DMA_OPS
>>    void xen_grant_setup_dma_ops(struct device *dev);
>>    bool xen_is_grant_dma_device(struct device *dev);
>> (END)
>>
>> So how it looks like:
>>
>> 1. DomU boots without PCI Host bridge probed. So nothing PCI related is
>> observed in DomU.
>>
>> cat /proc/device-tree/pcie@10000000/status
>> disabled
>>
>> 2. I run backends in DomD and after that issue a signal to "enable"
>>
>> root@generic-armv8-xt-dom0:~# xl sysrq DomU 4
>>
>> 3. The PCI Host bridge is probed, and all required PCI devices are
>> discovered
>>
>> root@generic-armv8-xt-dom0:~# xl console DomU
>> [  237.407620] xen_grant_dma_handle_sysrq: got key 52
>> [  237.408133] pci-host-generic 10000000.pcie: host bridge
>> /pcie@10000000 ranges:
>> [  237.408186] pci-host-generic 10000000.pcie:      MEM
>> 0x0023000000..0x0032ffffff -> 0x0023000000
>> [  237.408231] pci-host-generic 10000000.pcie:      MEM
>> 0x0100000000..0x01ffffffff -> 0x0100000000
>> [  237.408313] pci-host-generic 10000000.pcie: ECAM at [mem
>> 0x10000000-0x1fffffff] for [bus 00-ff]
>> [  237.408451] pci-host-generic 10000000.pcie: PCI host bridge to bus
>> 0000:00
>> [  237.408490] pci_bus 0000:00: root bus resource [bus 00-ff]
>> [  237.408517] pci_bus 0000:00: root bus resource [mem
>> 0x23000000-0x32ffffff]
>> [  237.408545] pci_bus 0000:00: root bus resource [mem
>> 0x100000000-0x1ffffffff pref]
>> [  237.409043] pci 0000:00:00.0: [1b36:0008] type 00 class 0x060000
>> [  237.458045] pci 0000:00:01.0: [1af4:1041] type 00 class 0x020000
>> [  237.502588] pci 0000:00:01.0: reg 0x20: [mem 0x00000000-0x00003fff
>> 64bit pref]
>> [  237.507475] pci 0000:00:02.0: [1af4:1042] type 00 class 0x010000
>> [  237.552706] pci 0000:00:02.0: reg 0x20: [mem 0x00000000-0x00003fff
>> 64bit pref]
>> [  237.559847] pci 0000:00:01.0: BAR 4: assigned [mem
>> 0x100000000-0x100003fff 64bit pref]
>> [  237.560411] pci 0000:00:02.0: BAR 4: assigned [mem
>> 0x100004000-0x100007fff 64bit pref]
>> [  237.563324] virtio-pci 0000:00:01.0: Set up Xen grant DMA ops (rid
>> 0x8 sid 0x1)
>> [  237.564833] virtio-pci 0000:00:01.0: enabling device (0000 -> 0002)
>> [  237.582734] virtio-pci 0000:00:02.0: Set up Xen grant DMA ops (rid
>> 0x10 sid 0x1)
>> [  237.583413] virtio-pci 0000:00:02.0: enabling device (0000 -> 0002)
>> [  237.595712] virtio_blk virtio1: 4/0/0 default/read/poll queues
>> [  237.596227] virtio_net virtio0 enp0s1: renamed from eth1
>> [  237.602499] virtio_blk virtio1: [vda] 4096000 512-byte logical blocks
>> (2.10 GB/1.95 GiB)
>> [  237.606317] xen_grant_dma_handle_sysrq: enable /pcie@10000000
>>
>> 4. The same way the pseudo-hotremove would work (if we change the status
>> to "disabled" the corresponding device gets removed)
>>
>>
>> If this pseudo-hotplug sounds appropriate for the dom0less,
> This is great! Yes I think it is totally acceptable.


Perfect!


>
>
>> the one of the next questions would be what mechanism to use for
>> signalling (event, xenstore, whatever).
> For your information, we had to solve a similar issue a few months ago
> to let a domU discover a newly added and directly assigned programmable
> logic block. That was also done by applying DT overlays, first to Xen,
> then to the domU.
>
> Have a look at Vikram's Xen Summit presentation:
> https://urldefense.com/v3/__https://static.sched.com/hosted_files/xen2022/e8/Introduce*20Dynamic*20Device*20Node*20Programming*20for*20Xen.pdf__;JSUlJSUl!!GF_29dbcQIUBPA!yTJGjmmfrdJfoc-magnbk6EcuNspXxuQyDMzItgaV9mehJMj37w7Go_O9tDCQV3Qpij0PORZdiZsjBYROsaRNWVjPNUucg$
>   [static[.]sched[.]com]
>
> We wrote a small xenstore-based protocol to notify the domU and also to
> tranfer the overlay to it:
>
> https://urldefense.com/v3/__https://github.com/Xilinx/xen/blob/xlnx_rebase_4.16/docs/misc/arm/overlay.txt__;!!GF_29dbcQIUBPA!yTJGjmmfrdJfoc-magnbk6EcuNspXxuQyDMzItgaV9mehJMj37w7Go_O9tDCQV3Qpij0PORZdiZsjBYROsaRNWXs7xlouA$
>   [github[.]com]
> https://urldefense.com/v3/__https://github.com/Xilinx/xen/blob/xlnx_rebase_4.16/tools/helpers/get_overlay.c__;!!GF_29dbcQIUBPA!yTJGjmmfrdJfoc-magnbk6EcuNspXxuQyDMzItgaV9mehJMj37w7Go_O9tDCQV3Qpij0PORZdiZsjBYROsaRNWXoR3gEjw$
>   [github[.]com]
>
> There is a good description starting at slide 16 in the PDF.
>
>
> I am only sharing this as FYI.


Very interesting materials and impressive work! I can't even imagine 
what it took to get it working.



> This Virtio problem is simpler because we
> already know the devices that are going to become available. We don't
> need an actual DT overlay to be passed to the domU. So we could get away
> with just a single interrupt or a single xenstore property.

I completely agree that for virtio it is going to be simpler than for FPGA.


>
>
>> Note that signal should only be sent if all backends which serve
>> virtio-pci devices within that PCI Host bridge are ready.
> Yes. That should be fine as long as all the backends are in the same
> domain. I can imagine there could be difficulties if the backends are
> in different domains: backend-domain-1 would have to tell dom0 that it
> is ready, then backend-domain-2 would have to do the same, then dom0
> finally notifies the domU, or something like that.
>
> Anyway, I think this is good enough to start as a solution. Excellent!
>
>
>>>>> Other ideas?
>>>> Another (crazy?) idea is to reuse CONFIG_XEN_VIRTIO_FORCE_GRANT for
>>>> dom0less system (I mean without "xen,grant-dma" bindings at all).
>>>> If virtio backends are always going to run in Dom0 when we have it up
>>>> and running, then it should work as domid == 0 is reserved for Dom0.
>>>> If there is a need to run virtio backends in other *backend* domain (for
>>>> the domain ID to be always known we could reserve an ID for it, so it
>>>> would be a const value),
>>>> we could probably introduce something configurable like
>>>> CONFIG_XEN_VIRTIO_FORCE_GRANT_BE_DOMID with 0 by default (or cmd line
>>>> option).
>>> The problem in a dom0less system is not much how to tell which is the
>>> backend domid, because that is known in advance and could be added to
>>> device tree at boot somehow. The issue is how to ask the frontend to
>>> "wait" and then how to tell the frontend to "proceed" after the backend
>>> comes online.
>> please see above.
>>
>>
>> To summarize:
>>
>> 1. For normal case there is no problem with communicating the backend
>> domid on Arm with device-tree (neither for virtio-mmio nor for virtio-pci),
>> for the virtio-pci the V2 (PCI-IOMMU bindings) should be used. For the
>> dom0less there won't be problem also as I understood from the discussion
>> (as the configuration is known in advance).
>> So I propose to concentrate on V2.
> Yes I agree
>
>
>> 2. The problem is in supporting virtio for the dom0less in general
>> despite whether it is a foreign or grant mappings.
>> Here we would need a (pseudo-)hotplug or some other method to start
>> operating only when backend is available.
> Yes I think you are right

-- 
Regards,

Oleksandr Tyshchenko

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.