[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH V4 24/24] [RFC] libxl: Add support for virtio-disk configuration

On 16.01.21 00:01, Julien Grall wrote:
Hi Oleksandr,

Hi Julien

On 12/01/2021 21:52, Oleksandr Tyshchenko wrote:
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>

This patch adds basic support for configuring and assisting virtio-disk
backend (emualator) which is intended to run out of Qemu and could be run
in any domain.

Xenstore was chosen as a communication interface for the emulator running in non-toolstack domain to be able to get configuration either by reading Xenstore directly or by receiving command line parameters (an updated 'xl devd' running in the same domain would read Xenstore beforehand and call backend
executable with the required arguments).

An example of domain configuration (two disks are assigned to the guest,
the latter is in readonly mode):

vdisk = [ 'backend=DomD, disks=rw:/dev/mmcblk0p3;ro:/dev/mmcblk1p3' ]

Where per-disk Xenstore entries are:
- filename and readonly flag (configured via "vdisk" property)
- base and irq (allocated dynamically)

Besides handling 'visible' params described in configuration file,
patch also allocates virtio-mmio specific ones for each device and
writes them into Xenstore. virtio-mmio params (irq and base) are
unique per guest domain, they allocated at the domain creation time
and passed through to the emulator. Each VirtIO device has at least
one pair of these params.

1. An extra "virtio" property could be removed.
2. Update documentation.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>
[On Arm only]
Tested-by: Wei Chen <Wei.Chen@xxxxxxx>

Changes RFC -> V1:
    - no changes

Changes V1 -> V2:
    - rebase according to the new location of libxl_virtio_disk.c

Changes V2 -> V3:
    - no changes

Changes V3 -> V4:
    - rebase according to the new argument for DEFINE_DEVICE_TYPE_STRUCT

Please note, there is a real concern about VirtIO interrupts allocation.
[Just copy here what Stefano said in RFC thread]

So, if we end up allocating let's say 6 virtio interrupts for a domain,
the chance of a clash with a physical interrupt of a passthrough device is real.

For the first version, I think a static approach is fine because it doesn't bind us to anything yet (there is no interface change). We can refine it on follow-ups as we figure out how virtio is going to be used in the field.

I am not entirely sure how to solve it, but these are a few ideas:
- choosing virtio interrupts that are less likely to conflict (maybe > 1000)

Well, we only support 988 interrupts :). However, we will waste some memory in the vGIC structure (we would need to allocate memory for the 988 interrupts) if you chose an interrupt towards then end.

- make the virtio irq (optionally) configurable so that a user could
   override the default irq and specify one that doesn't conflict

This is not very ideal because it makes the use of virtio quite unfriendly with passthrough. Note that platform device passthrough is already unfriendly, but I am thinking PCI :).

- implementing support for virq != pirq (even the xl interface doesn't
   allow to specify the virq number for passthrough devices, see "irqs")
I can't remember whether I had a reason to not support virq != pirq when this was initially implemented. This is one possibility, but it is as unfriendly as the previous option.

I will add a 4th one:
   - Automatically allocate the virtio IRQ. This should be possible to do it without too much trouble as we know in advance which IRQs will be passthrough.
As I understand the IRQs for passthrough are described in "irq" property and stored in d_config->b_info.irqs[i], so yes we know in advance which IRQs will be used for passthrough and we will be able to choose non-clashed ones (iterating over all IRQs in a reserved range) for the virtio devices.  The question is how many IRQs should be reserved.

My preference is the 4th one, that said we may also want to pick either 2 or 3 to give some flexibility to an admin if they wish to get their hand dirty.

Personally I would be ok with 4th, but without admin involvement.

Also there is one suggestion from Wei Chen regarding a parameter for domain
config file which I haven't addressed yet.
[Just copy here what Wei said in V2 thread]
Can we keep use the same 'disk' parameter for virtio-disk, but add an option like
For example:
disk = [ 'backend=DomD, disks=rw:/dev/mmcblk0p3,model=virtio-disk' ]
Just like what Xen has done for x86 virtio-net.
  tools/libs/light/Makefile                 |   1 +
  tools/libs/light/libxl_arm.c              |  56 ++++++++++++---
  tools/libs/light/libxl_create.c           |   1 +
  tools/libs/light/libxl_internal.h         |   1 +
  tools/libs/light/libxl_types.idl          |  15 ++++
  tools/libs/light/libxl_types_internal.idl |   1 +
  tools/libs/light/libxl_virtio_disk.c      | 109 ++++++++++++++++++++++++++++
  tools/xl/Makefile                         |   2 +-
  tools/xl/xl.h                             |   3 +
  tools/xl/xl_cmdtable.c                    |  15 ++++
  tools/xl/xl_parse.c                       | 115 ++++++++++++++++++++++++++++++
  tools/xl/xl_virtio_disk.c                 |  46 ++++++++++++
  12 files changed, 354 insertions(+), 11 deletions(-)
  create mode 100644 tools/libs/light/libxl_virtio_disk.c
  create mode 100644 tools/xl/xl_virtio_disk.c

diff --git a/tools/libs/light/Makefile b/tools/libs/light/Makefile
index 68f6fa3..ccc91b9 100644
--- a/tools/libs/light/Makefile
+++ b/tools/libs/light/Makefile
@@ -115,6 +115,7 @@ SRCS-y += libxl_genid.c
  SRCS-y += _libxl_types.c
  SRCS-y += libxl_flask.c
  SRCS-y += _libxl_types_internal.c
+SRCS-y += libxl_virtio_disk.c
    ifeq ($(CONFIG_LIBNL),y)
diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index 588ee5a..9eb3022 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -8,6 +8,12 @@
  #include <assert.h>
  #include <xen/device_tree_defs.h>
  +#ifndef container_of
+#define container_of(ptr, type, member) ({            \
+        typeof( ((type *)0)->member ) *__mptr = (ptr);    \
+        (type *)( (char *)__mptr - offsetof(type,member) );})
  static const char *gicv_to_string(libxl_gic_version gic_version)
      switch (gic_version) {
@@ -39,14 +45,32 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
          vuart_enabled = true;
  -    /*
-     * XXX: Handle properly virtio
-     * A proper solution would be the toolstack to allocate the interrupts -     * used by each virtio backend and let the backend now which one is used
-     */

Ok, so you added some code in patch #23 that is going to be mostly dropped here. I think you want to rethink how you do the split here.

One possible approach would be to have a patch which adds the infrastructe but no call. It would contain:
  1) Allocate a space in the virtio region and an interrupt
  2) Create the bindings.

Those helpers can then be called in this patch.

Sounds reasonable.

      if (libxl_defbool_val(d_config->b_info.arch_arm.virtio)) {

It feels to me that this parameter is not necessary. You can easily infer it based whether you have a virtio disks attached or not.


-        nr_spis += (GUEST_VIRTIO_MMIO_SPI - 32) + 1;
+        uint64_t virtio_base;
+        libxl_device_virtio_disk *virtio_disk;
+        virtio_base = GUEST_VIRTIO_MMIO_BASE;
          virtio_irq = GUEST_VIRTIO_MMIO_SPI;

Looking at patch #23, you defined a single SPI and a region that can only fit virtio device. However, here, you are going to define multiple virtio devices.

I think you want to define the following:

 - GUEST_VIRTIO_MMIO_BASE: Base address of the virtio window
 - GUEST_VIRTIO_MMIO_SIZE: Full length of the virtio window (may contain multiple devices)
 - GUEST_VIRTIO_SPI_FIRST: First SPI reserved for virtio
 - GUEST_VIRTIO_SPI_LAST: Last SPI reserved for virtio

The per-device size doesn't need to be defined in arch-arm.h. Instead, I would only define internally (unless we can use a virtio.h header from Linux?).

I think I got the idea. What are the preferences for these values?

+        if (!d_config->num_virtio_disks) {
+            LOG(ERROR, "Virtio is enabled, but no Virtio devices present\n");
+            return ERROR_FAIL;
+        }
+        virtio_disk = &d_config->virtio_disks[0];
+        for (i = 0; i < virtio_disk->num_disks; i++) {
+            virtio_disk->disks[i].base = virtio_base;
+            virtio_disk->disks[i].irq = virtio_irq;
+            LOG(DEBUG, "Allocate Virtio MMIO params: IRQ %u BASE 0x%"PRIx64,
+                virtio_irq, virtio_base);
+            virtio_irq ++;

NIT: We usually don't have space before ++ or ...


+            virtio_base += GUEST_VIRTIO_MMIO_SIZE;
+        }
+        virtio_irq --;

... --;


+        nr_spis += (virtio_irq - 32) + 1;
          virtio_enabled = true;


diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 2a3364b..054a0c9 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1204,6 +1204,120 @@ out:
      if (rc) exit(EXIT_FAILURE);
  +#define MAX_VIRTIO_DISKS 4

May I ask why this is hardcoded to 4?

I found 4 as a reasonable value for the initial implementation.
This means how many disks the single device instance can handle.
Do you think we need to change it?



Oleksandr Tyshchenko



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.