[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Minios-devel] [UNIKRAFT PATCH 2/2] plat/kvm: Introducing virtio base driver



Hi Sharan,

Thanks a lot for comments and in-depth analysis! Please see my replies
inline.

On 06/29/2018 05:14 PM, Sharan Santhanam wrote:
> Hello,
> 
> Please find some of the comments inline:
> 
> 
> On 06/27/2018 12:59 PM, Costin Lupu wrote:
>> From: Costin Lupu <costin.lup@xxxxxxxxx>
>>
>> Currently, the virtio base driver contains the implementation only for
>> virtio rings. The implementation was ported from Solo5 and adapted to
>> Unikraft APIs. Subsequent virtio drivers should depend on this base
>> driver.
>>
>> This patch also introduces the plat/drivers/ directory which should
>> contain the drivers implementations which may be used by more than a
>> single platform.
>>
>> Signed-off-by: Costin Lupu <costin.lupu@xxxxxxxxx>
>> ---
>>   plat/Makefile.uk                             |   1 +
>>   plat/common/include/pci/virtio/virtio_pci.h  |  66 +++++++
>>   plat/common/include/pci/virtio/virtio_ring.h | 257
>> +++++++++++++++++++++++++++
>>   plat/drivers/virtio/virtio_ring.c            | 153 ++++++++++++++++
>>   plat/kvm/Config.uk                           |   8 +
>>   plat/kvm/Makefile.uk                         |   8 +
>>   6 files changed, 493 insertions(+)
>>   create mode 100644 plat/common/include/pci/virtio/virtio_pci.h
>>   create mode 100644 plat/common/include/pci/virtio/virtio_ring.h
>>   create mode 100644 plat/drivers/virtio/virtio_ring.c
>>
>> diff --git a/plat/Makefile.uk b/plat/Makefile.uk
>> index 6ff632c..72155bd 100644
>> --- a/plat/Makefile.uk
>> +++ b/plat/Makefile.uk
>> @@ -1,5 +1,6 @@
>>   UK_PLAT_BASE        := $(CONFIG_UK_BASE)/plat
>>   UK_PLAT_COMMON_BASE := $(UK_PLAT_BASE)/common
>> +UK_PLAT_DRIVERS_BASE:= $(UK_PLAT_BASE)/drivers
>>     $(eval $(call _import_lib,$(UK_PLAT_BASE)/xen))
>>   $(eval $(call _import_lib,$(UK_PLAT_BASE)/kvm))
>> diff --git a/plat/common/include/pci/virtio/virtio_pci.h
>> b/plat/common/include/pci/virtio/virtio_pci.h
>> new file mode 100644
>> index 0000000..02bad4c
>> --- /dev/null
>> +++ b/plat/common/include/pci/virtio/virtio_pci.h
>> @@ -0,0 +1,66 @@
>> +/* SPDX-License-Identifier: ISC */
>> +/*
>> + * Authors: Dan Williams
>> + *          Costin Lupu <costin.lupu@xxxxxxxxx>
>> + *
>> + * Copyright (c) 2015, IBM
>> + * Copyright (c) 2018, NEC Europe Ltd., NEC Corporation
>> + *
>> + * Permission to use, copy, modify, and/or distribute this software
>> + * for any purpose with or without fee is hereby granted, provided
>> + * that the above copyright notice and this permission notice appear
>> + * in all copies.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL
>> + * WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED
>> + * WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE
>> + * AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR
>> + * CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
>> + * OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
>> + * NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
>> + * CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
>> + */
>> +/* Taken and adapted from solo5 virtio_pci.h */
>> +
>> +#ifndef __PLAT_CMN_PCI_VIRTIO_PCI_H__
>> +#define __PLAT_CMN_PCI_VIRTIO_PCI_H__
>> +
>> +#include <uk/arch/limits.h>
>> +
> Do  we protect (MARCO definition) for the legacy mode of virtio vs
> nonlegacy virtio mode? As the current layout support for legacy mode.

Ack, I will add a TODO comment to emphasize that.

>> +/* virtio config space layout */
>> +#define VIRTIO_PCI_HOST_FEATURES        0    /* 32-bit r/o */
>> +#define VIRTIO_PCI_GUEST_FEATURES       4    /* 32-bit r/w */
>> +#define VIRTIO_PCI_QUEUE_PFN            8    /* 32-bit r/w */
>> +#define VIRTIO_PCI_QUEUE_SIZE           12   /* 16-bit r/o */
>> +#define VIRTIO_PCI_QUEUE_SEL            14   /* 16-bit r/w */
>> +#define VIRTIO_PCI_QUEUE_NOTIFY         16   /* 16-bit r/w */
>> +
>> +/*
>> + * Shift size used for writing physical queue address to QUEUE_PFN
>> + */
> 
> The spec explicitly mentions the size of the page size to be 4096 in
> section 4.1.5.1.5.4.1.
> 
> Should we not hard code it to 12 instead of combining it with
> __PAGE_SHIFT as there is also virtio-PCI which is driving this
> configuration?

Ack, I will replace it with 12.

>> +#define VIRTIO_PCI_QUEUE_ADDR_SHIFT     __PAGE_SHIFT
>> +
>> +
>> +/*
>> + * The status register lets us tell the device where we are in
>> + * initialization
>> + */
>> +#define VIRTIO_PCI_STATUS               18   /* 8-bit r/w */
>> +#define VIRTIO_PCI_STATUS_ACK           0x1  /* we recognize device
>> as virtio */
>> +#define VIRTIO_PCI_STATUS_DRIVER        0x2  /* we want to drive it */
>> +#define VIRTIO_PCI_STATUS_DRIVER_OK     0x4  /* initialization is
>> complete */
> Missing definition for the device reset or device error flag (0x40). Do
> we need to care for the device status?

Ack for reset flag. Yep, we'll need these in the virtio drivers.

>> +#define VIRTIO_PCI_STATUS_FAIL          0x80 /* tell device
>> something's wrong */
>> +
>> +/*
>> + * Reading the value will return the current contents of the interrupt
>> + * status register and will also clear it.  This is effectively a
>> + * read-and-acknowledge.
>> + */
>> +#define VIRTIO_PCI_ISR                  19   /* 8-bit r/o */
>> +#define VIRTIO_PCI_ISR_HAS_INTR         0x1  /* interrupt is for this
>> device */
>> +#define VIRTIO_PCI_ISR_CONFIG           0x2  /* config change bit */
>> +
> 
> Do we also make assumption on MSI more explicitly?

Ack, I'll add a TODO comment for revisiting this when MSI will be
implemented.

>> +/* xxx assuming msi is not configured */
>> +#define VIRTIO_PCI_CONFIG_OFF           20
>> +
>> +#endif /* __PLAT_CMN_PCI_VIRTIO_PCI_H__ */
>> diff --git a/plat/common/include/pci/virtio/virtio_ring.h
>> b/plat/common/include/pci/virtio/virtio_ring.h
>> new file mode 100644
>> index 0000000..549ae54
>> --- /dev/null
>> +++ b/plat/common/include/pci/virtio/virtio_ring.h
>> @@ -0,0 +1,257 @@
> 
> As a general comment, we should use the virtio_ring.h from the official
> specification. It is available in the linux kernel repo
> include/uapi/linux/virtio_ring.h under BSD License.

I totally agree that the Linux variant is better. However, given the
short time, we'll stick to the Solo5 variant. I added a reminding TODO
comment in the header.

>> +/* SPDX-License-Identifier: BSD-3-Clause */
>> +/*
>> + * Authors: Costin Lupu <costin.lupu@xxxxxxxxx>
>> + *
>> + * Copyright (c) 2018, NEC Europe Ltd., NEC Corporation. All rights
>> reserved.
>> + *
>> + * Redistribution and use in source and binary forms, with or without
>> + * modification, are permitted provided that the following conditions
>> + * are met:
>> + *
>> + * 1. Redistributions of source code must retain the above copyright
>> + *    notice, this list of conditions and the following disclaimer.
>> + * 2. Redistributions in binary form must reproduce the above copyright
>> + *    notice, this list of conditions and the following disclaimer in
>> the
>> + *    documentation and/or other materials provided with the
>> distribution.
>> + * 3. Neither the name of the copyright holder nor the names of its
>> + *    contributors may be used to endorse or promote products derived
>> from
>> + *    this software without specific prior written permission.
>> + *
>> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
>> CONTRIBUTORS "AS IS"
>> + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
>> TO, THE
>> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
>> PURPOSE
>> + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
>> CONTRIBUTORS BE
>> + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
>> + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
>> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
>> BUSINESS
>> + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
>> WHETHER IN
>> + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
>> OTHERWISE)
>> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
>> ADVISED OF THE
>> + * POSSIBILITY OF SUCH DAMAGE.
>> + *
>> + * THIS HEADER MAY NOT BE EXTRACTED OR MODIFIED IN ANY WAY.
>> + */
>> +/* An interface for efficient virtio implementation.
>> + *
>> + * This header is BSD licensed so anyone can use the definitions
>> + * to implement compatible drivers/servers.
>> + *
>> + * Copyright 2007, 2009, IBM Corporation
>> + * Copyright 2011, Red Hat, Inc
>> + * All rights reserved.
>> + *
>> + * Redistribution and use in source and binary forms, with or without
>> + * modification, are permitted provided that the following conditions
>> + * are met:
>> + * 1. Redistributions of source code must retain the above copyright
>> + *    notice, this list of conditions and the following disclaimer.
>> + * 2. Redistributions in binary form must reproduce the above copyright
>> + *    notice, this list of conditions and the following disclaimer in
>> the
>> + *    documentation and/or other materials provided with the
>> distribution.
>> + * 3. Neither the name of IBM nor the names of its contributors
>> + *    may be used to endorse or promote products derived from this
>> software
>> + *    without specific prior written permission.
>> + *
>> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
>> CONTRIBUTORS "AS IS"
>> + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
>> TO, THE
>> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
>> PURPOSE
>> + * ARE DISCLAIMED. IN NO EVENT SHALL IBM OR CONTRIBUTORS BE
>> + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
>> + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
>> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
>> BUSINESS
>> + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
>> WHETHER IN
>> + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
>> OTHERWISE)
>> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
>> ADVISED OF THE
>> + * POSSIBILITY OF SUCH DAMAGE.
>> + */
>> +#ifndef __PLAT_CMN_PCI_VIRTIO_RING_H__
>> +#define __PLAT_CMN_PCI_VIRTIO_RING_H__
>> +
>> +#include <uk/arch/limits.h>
>> +#include <uk/alloc.h>
>> +
>> +/* This marks a buffer as continuing via the next field. */
>> +#define VIRTQ_DESC_F_NEXT       1
>> +/* This marks a buffer as write-only (otherwise read-only). */
>> +#define VIRTQ_DESC_F_WRITE      2
>> +/* This means the buffer contains a list of buffer descriptors. */
>> +#define VIRTQ_DESC_F_INDIRECT   4
>> +
>> +/*
>> + * The device uses this in used->flags to advise the driver:
>> + * don't kick me when you add a buffer. It's unreliable, so
>> + * it's simply an optimization.
>> + */
>> +#define VIRTQ_USED_F_NO_NOTIFY  1
>> +
>> +/*
>> + * The driver uses this in avail->flags to advise the device:
>> + * don't interrupt me when you consume a buffer. It's unreliable, so
>> + * it's simply an optimization.
>> + */
>> +#define VIRTQ_AVAIL_F_NO_INTERRUPT      1
>> +
>> +/* Support for indirect descriptors */
>> +#define VIRTIO_F_INDIRECT_DESC    28
>> +
>> +/* Support for avail_event and used_event fields */
>> +#define VIRTIO_F_EVENT_IDX        29
>> +
>> +/* Arbitrary descriptor layouts. */
>> +#define VIRTIO_F_ANY_LAYOUT       27
>> +
>> +
>> +/*
>> + * Virtqueue descriptors: 16 bytes.
>> + * These can chain together via "next".
>> + */
> It is better to explicitly use virtio based datatype instead of generic
> u64, u32, u16 because in the specification it is mentioned that the data
> type to be little endian/guest endianess for different layout. We should
> be careful on the use of these data types.
> 
> It may also be useful to provide conversion functions for guest to host
> endianess.

Ack. I added new types for little endianess (__virtio_le{16,32,64}) to
clarify which is which. For big endianess, the conversion functions
should be implemented in the future. I added a preprocessor validation
for that.

>> +struct virtq_desc {
>> +    /* Address (guest-physical). */
>> +    __u64 addr;
>> +    /* Length. */
>> +    __u32 len;
>> +    /* The flags as indicated above. */
>> +    __u16 flags;
>> +    /* We chain unused descriptors via this, too */
>> +    __u16 next;
>> +};
>> +
>> +struct virtq_avail {
>> +    __u16 flags;
>> +    __u16 idx;
>> +    __u16 ring[];
>> +    /* Only if VIRTIO_F_EVENT_IDX: __u16 used_event; */
>> +};
>> +
>> +/* __u32 is used here for ids for padding reasons. */
>> +struct virtq_used_elem {
>> +    /* Index of start of used descriptor chain. */
>> +    __u32 id;
>> +    /* Total length of the descriptor chain which was written to. */
>> +    __u32 len;
>> +};
>> +
>> +struct virtq_used {
>> +    __u16 flags;
>> +    volatile __u16 idx;
>> +    struct virtq_used_elem ring[];
>> +    /* Only if VIRTIO_F_EVENT_IDX: __u16 avail_event; */
>> +};
>> +
> 
> Why did we choose 1526 as the length? Is it based on the size of the
> Ethernet frame size of 1514 and 12 for the meta data? If yes, should we
> make this choice based on this use case?

Yes, it's from the frame size + meta. I added a TODO comment in the code
to clarify that and to outline it should be changed to support buffer
sizes defined by the drivers.

>> +/* This is the max buffer length per descriptor. */
>> +#define MAX_BUFFER_LEN 1526
>> +
>> +/*
>> + * Each one of these io_buffer's map to a descriptor.
>> + * An array of io_buffer's of size virtq->num (same
>> + * as virtq->desc) is allocated during init.
>> + */
>> +struct io_buffer {
>> +    __u8 data[MAX_BUFFER_LEN];
>> +
>> +    /*
>> +     * Data length in bytes. It is written by the driver on
>> +     * a tx/write, or by the device on a rx/read on interrupt
>> +     * handling (do not remove the volatile).
>> +     */
>> +    volatile __u32 len;
>> +
>> +    /* Extra flags to be added to the corresponding descriptor. */
>> +    __u16 extra_flags;
>> +};
>> +
> 
> We should also consider packing the structure as we want a specific
> layout of memory allocation. As mentioned in the spec [1].

The layout structure follows the specification if one uses the helper
macros below to initialize the structure fields, therefore there is no
need for structure packing.

>> +struct virtq {
>> +    unsigned int num;
>> +
>> +    struct virtq_desc *desc;
>> +    struct virtq_avail *avail;
>> +    struct virtq_used *used;
>> +    struct io_buffer *bufs;
>> +
>> +    /* Keep track of available (free) descriptors */
>> +    __u16 num_avail;
>> +
>> +    /* Indexes in the descriptors array */
>> +    __u16 last_used;
>> +    __u16 next_avail;
>> +};
>> +
>> +/*
>> + * Helper macros for accessing virtqueue fields
>> + */
>> +
>> +#define VIRTQ_OFF_DESC(q)         0
>> +
>> +#define VIRTQ_OFF_AVAIL(q)        ((q)->num * sizeof(struct virtq_desc))
>> +#define VIRTQ_OFF_AVAIL_RING(q) \
>> +    (VIRTQ_OFF_AVAIL(q) + sizeof(struct virtq_avail))
>> +
>> +#define VIRTQ_OFF_PADDING(q) \
>> +    (VIRTQ_OFF_AVAIL_RING(q) + (sizeof(__u16) * (q)->num))
>> +
>> +#define VIRTQ_OFF_USED(q) \
>> +    ((VIRTQ_OFF_PADDING(q) + __PAGE_SIZE - 1) & __PAGE_MASK)
>> +#define VIRTQ_OFF_USED_RING(q) \
>> +    (VIRTQ_OFF_USED(q) + sizeof(struct virtq_used))
>> +
>> +#define VIRTQ_SIZE(q) \
>> +    (VIRTQ_OFF_USED_RING(q) + (sizeof(struct virtq_used_elem) *
>> (q)->num))
>> +
>> +static inline
>> +int virtq_need_event(__u16 event_idx, __u16 new_idx, __u16 old_idx)
>> +{
>> +    return (new_idx - event_idx - 1) < (new_idx - old_idx);
>> +}
>> +
>> +/* Get location of event indices (only with VIRTIO_F_EVENT_IDX) */
>> +static inline __u16 *virtq_used_event(struct virtq *vq)
>> +{
>> +    /*
>> +     * For backwards compatibility, used event index
>> +     * is at *end* of avail ring.
>> +     */
>> +    return &vq->avail->ring[vq->num];
>> +}
>> +
>> +static inline __u16 *virtq_avail_event(struct virtq *vq)
>> +{
>> +    /*
>> +     * For backwards compatibility, avail event index
>> +     * is at *end* of used ring.
>> +     */
>> +    return (__u16 *) &vq->used->ring[vq->num];
>> +}
>> +
>> +/*
>> + * Create a descriptor chain starting at index head,
>> + * using vq->bufs also starting at index head.
>> + * @param vq Virtual queue
>> + * @param head Starting descriptor index
>> + * @param num Number of descriptors (and number of bufs).
>> + * @return 0 on success, < 0 otherwise
>> + */
>> +int virtq_add_descriptor_chain(struct virtq *vq,
>> +        __u16 head, __u16 num);
>> +
>> +/*
>> + * Initializes a virtual queue
>> + * @param vq Virtual queue
>> + * @param pci_base Base in PCI configuration space
>> + * @param selector Virtual queue selector
>> + * @param a Memory allocator
>> + * @return 0 on success, < 0 otherwise
>> + */
>> +int virtq_rings_init(struct virtq *vq, __u16 pci_base, int selector,
>> +        struct uk_alloc *a);
>> +
>> +/*
>> + * Deinitializes a virtual queue
>> + * @param vq Virtual queue
>> + * @param pci_base Base in PCI configuration space
>> + * @param selector Virtual queue selector
>> + * @param a Memory allocator
>> + */
>> +void virtq_rings_fini(struct virtq *vq, __u16 pci_base, int selector,
>> +        struct uk_alloc *a);
>> +
>> +#endif /* __PLAT_CMN_PCI_VIRTIO_RING_H__ */
>> diff --git a/plat/drivers/virtio/virtio_ring.c
>> b/plat/drivers/virtio/virtio_ring.c
>> new file mode 100644
>> index 0000000..c82cc86
>> --- /dev/null
>> +++ b/plat/drivers/virtio/virtio_ring.c
>> @@ -0,0 +1,153 @@
>> +/* SPDX-License-Identifier: ISC */
>> +/*
>> + * Authors: Dan Williams
>> + *          Martin Lucina
>> + *          Ricardo Koller
>> + *          Costin Lupu <costin.lupu@xxxxxxxxx>
>> + *
>> + * Copyright (c) 2015-2017 IBM
>> + * Copyright (c) 2016-2017 Docker, Inc.
>> + * Copyright (c) 2018, NEC Europe Ltd., NEC Corporation
>> + *
>> + * Permission to use, copy, modify, and/or distribute this software
>> + * for any purpose with or without fee is hereby granted, provided
>> + * that the above copyright notice and this permission notice appear
>> + * in all copies.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL
>> + * WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED
>> + * WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE
>> + * AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR
>> + * CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
>> + * OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
>> + * NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
>> + * CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
>> + */
>> +/* Taken and adapted from solo5 virtio_ring.c */
>> +
>> +#include <string.h>
>> +#include <uk/print.h>
>> +#include <cpu.h>
>> +#include <pci/virtio/virtio_pci.h>
>> +#include <pci/virtio/virtio_ring.h>
>> +
> 
>  In section 3.2.1, the specification mention a max size of 32768.

Ack.

>> +/*
>> + * There is no official max queue size. But we've seen 4096,
>> + * so let's use the double of that.
>> + */
>> +#define VIRTQ_MAX_QUEUE_SIZE  (2 * __PAGE_SIZE)
>> +
>> +
>> +/*
>> + * Create a descriptor chain starting at index head, using vq->bufs
>> + * also starting at index head.
>> + * Make sure the vq-bufs are cleaned before using them again.
>> + */
>> +int virtq_add_descriptor_chain(struct virtq *vq, __u16 head, __u16 num)
>> +{
>> +    struct virtq_desc *desc;
>> +    __u16 used_descs, mask, i;
>> +
>> +    UK_ASSERT(vq != NULL);
>> +    UK_ASSERT(num > 0);
>> +
>> +    if (vq->num_avail < num) {
>> +        uk_printd(DLVL_WARN, "virtq full! next_avail:%d last_used:%d\n",
>> +                vq->next_avail, vq->last_used);
> 
> Should it not be -ENOMEM or -ENOSPC?

Actually, num might have invalid values when `num > vq->num`. I refined
the validations and replaced the error code here with -ENOMEM to report
there is no space only if `num <= vq->num && q->num_avail < num`.

>> +        return -EINVAL;
>> +    }
>> +
>> +    used_descs = num;
>> +    mask = vq->num - 1;
>> +
>> +    for (i = head; used_descs > 0; used_descs--) {
>> +        struct io_buffer *buf = &vq->bufs[i];
>> +
>> +        /*
>> +         * The first field of a "struct io_buffer" is the "data" field,
>> +         * so in the interrupt handler we can just cast this pointer
>> +         * back into a 'struct io_buffer'.
>> +         */
>> +        UK_ASSERT(buf->data == (__u8 *) buf);
>> +        UK_ASSERT(buf->len <= MAX_BUFFER_LEN);
> 
> Do we assert here or report an error to an upper layer? ASSERT might be
> skipped in the optimized code?

With the current implementation, if the assertions are not satisfied
then the buffer is corrupted. This comment raises a new concern here:
how should we do the error handling in Unikraft? How tolerant should we
be in situations like this. We should define some directions for error
handling and document them.

>> +
>> +        desc = &vq->desc[i];
>> +        desc->addr = (__u64) buf->data;
>> +        desc->len = buf->len;
>> +        desc->flags = VIRTQ_DESC_F_NEXT | buf->extra_flags;
>> +
>> +        i = (i + 1) & mask;
>> +        desc->next = i;
>> +    }
>> +
>> +    /* The last descriptor in the chain does not have a next */
>> +    desc->next = 0;
>> +    desc->flags &= ~VIRTQ_DESC_F_NEXT;
>> +
>> +    vq->num_avail -= num;
> 
> I think we need a write memory barrier before the next statement as we
> transferring the buffer to the virtio device which could run on a SMP
> host. What do you think?

Ack.

>> +    /* Memory barriers should be unnecessary with one processor */
>> +    vq->avail->ring[vq->avail->idx & mask] = head;
>> +    /* avail->idx always increments and wraps naturally at 65536 */
>> +    vq->avail->idx++;
>> +    vq->next_avail += num;
>> +
>> +    return 0;
>> +}
>> +
>> +int virtq_rings_init(struct virtq *vq, __u16 pci_base, int selector,
>> +        struct uk_alloc *a)
>> +{
>> +    __u8 *data;
>> +    __sz vq_size;
>> +
>> +    UK_ASSERT(vq != NULL);
>> +    UK_ASSERT(a != NULL);
>> +
>> +    vq->last_used = vq->next_avail = 0;
>> +
>> +    /* read queue size */
>> +    outw(pci_base + VIRTIO_PCI_QUEUE_SEL, selector);
>> +    vq->num = vq->num_avail = inw(pci_base + VIRTIO_PCI_QUEUE_SIZE);
> 
> Check for zero on the num, as it is a check if the queue exist or not.
> 
> For better alignment, it may be wise to check for a power of two.

Ack. I'll leave the power-of-two check for the future updates.

>> +
>> +    UK_ASSERT(vq->num <= VIRTQ_MAX_QUEUE_SIZE);
> 
> The specification provides a cleaner implementation for calculating the
> size of the queue. Personally, I would use that implementation as it
> improve on readability.

We shall change this when we'll adopt the Linux virtio_ring.h.

>> +
>> +    vq_size = VIRTQ_SIZE(vq);
> 
> Initialize data variable as the memalign return error without modifying
> the data variable.

Ack.

>> +
>> +    /* allocate queue memory */
>> +    uk_posix_memalign_ifpages(a, (void **) &data, __PAGE_SIZE, vq_size);
>> +    if (!data)
>> +        return -ENOMEM;
>> +
>> +    memset(data, 0, vq_size);
>> +
>> +    vq->desc = (struct virtq_desc *) (data + VIRTQ_OFF_DESC(vq));
>> +    vq->avail = (struct virtq_avail *) (data + VIRTQ_OFF_AVAIL(vq));
>> +    vq->used = (struct virtq_used *) (data + VIRTQ_OFF_USED(vq));
>> +
>> +    /* set queue memory */
>> +    outw(pci_base + VIRTIO_PCI_QUEUE_SEL, selector);
> 
> The specification mentions about providing the physical address of the
> guest OS. We are currently assigning the address return from malloc,
> which will not work always.

Yep, good catch. I added a TODO comment since we don't have a function
converting virtual addresses to physical addresses.

>> +    outl(pci_base + VIRTIO_PCI_QUEUE_PFN,
>> +            (__u64) data >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
>> +
>> +    return 0;
>> +}
>> +
>> +void virtq_rings_fini(struct virtq *vq, __u16 pci_base, int selector,
>> +        struct uk_alloc *a)
>> +{
>> +    __u8 *data;
>> +
>> +    UK_ASSERT(vq != NULL);
>> +    UK_ASSERT(a != NULL);
>> +
>> +    /* reset queue memory */
>> +    outw(pci_base + VIRTIO_PCI_QUEUE_SEL, selector);
>> +    outl(pci_base + VIRTIO_PCI_QUEUE_PFN, 0);
>> +
>> +    /* free queue memory */
>> +    data = (__u8 *) vq->desc - VIRTQ_OFF_DESC(vq->num);
>> +    uk_free(a, data);
>> +
>> +    /* cleanup the queue */
>> +    memset(vq, 0, sizeof(*vq));
>> +}
>> diff --git a/plat/kvm/Config.uk b/plat/kvm/Config.uk
>> index 967a07b..893de74 100644
>> --- a/plat/kvm/Config.uk
>> +++ b/plat/kvm/Config.uk
>> @@ -17,4 +17,12 @@ config KVM_PCI
>>          select LIBUKBUS
>>          help
>>                   PCI bus driver for probing and operating PCI devices
>> +
>> +if (KVM_PCI)
>> +config KVM_PCI_VIRTIO
>> +       bool "Virtio Drivers"
>> +       default n
>> +       help
>> +                Virtio drivers for KVM
>> +endif
>>   endif
>> diff --git a/plat/kvm/Makefile.uk b/plat/kvm/Makefile.uk
>> index c1e79a2..1ae1b08 100644
>> --- a/plat/kvm/Makefile.uk
>> +++ b/plat/kvm/Makefile.uk
>> @@ -8,6 +8,7 @@ $(eval $(call addplat_s,kvm,$(CONFIG_PLAT_KVM)))
>>   ##
>>   $(eval $(call addplatlib,kvm,libkvmplat))
>>   $(eval $(call addplatlib_s,kvm,libkvmpci,$(CONFIG_KVM_PCI)))
>> +$(eval $(call
>> addplatlib_s,kvm,libkvmpcivirtio,$(CONFIG_KVM_PCI_VIRTIO)))
>>     ##
>>   ## Platform library definitions
>> @@ -46,3 +47,10 @@ LIBKVMPLAT_SRCS-y              +=
>> $(UK_PLAT_COMMON_BASE)/memory.c|common
>>   LIBKVMPCI_ASINCLUDES-$(CONFIG_ARCH_X86_64)  +=
>> -I$(UK_PLAT_COMMON_BASE)/include
>>   LIBKVMPCI_CINCLUDES-$(CONFIG_ARCH_X86_64)   +=
>> -I$(UK_PLAT_COMMON_BASE)/include
>>   LIBKVMPCI_SRCS-y                     +=
>> $(UK_PLAT_COMMON_BASE)/pci_bus.c|common
>> +
>> +##
>> +## Virtio library definitions
>> +##
>> +LIBKVMPCIVIRTIO_ASINCLUDES-y   += -I$(UK_PLAT_COMMON_BASE)/include
>> +LIBKVMPCIVIRTIO_CINCLUDES-y    += -I$(UK_PLAT_COMMON_BASE)/include
>> +LIBKVMPCIVIRTIO_SRCS-y         +=
>> $(UK_PLAT_DRIVERS_BASE)/virtio/virtio_ring.c
>>
> 
> We may also need to append to export symbols for the api functions as
> virtio_ring expose function to be used by other libraries.

Let's wait until we'll have some clearer indications on that matter.

> Reference
> [1] http://docs.oasis-open.org/virtio/virtio/v1.0/cs01/virtio-v1.0-cs01.pdf
> 
> Thanks & Regards
> Sharan Santhanam

_______________________________________________
Minios-devel mailing list
Minios-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/minios-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.