
[Xen-devel] [PATCH v6 11/11] vpci/msix: add MSI-X handlers



Add handlers for accesses to the MSI-X message control field in the
PCI configuration space, and traps for accesses to the memory region
that contains the MSI-X table and PBA. These traps detect attempts from
the guest to configure MSI-X interrupts and properly set them up.

Note that accesses to the Table Offset, Table BIR, PBA Offset and PBA
BIR are not trapped by Xen at the moment.
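A minimal standalone sketch of what the table traps do with a faulting address, assuming a 16-byte entry size (Xen's PCI_MSIX_ENTRY_SIZE). The helper names msix_access_ok() and msix_decode() are hypothetical and not part of Xen. The access is first checked to be an aligned 32-bit or 64-bit access, then split into an entry index and a field offset inside that entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Size of one MSI-X table entry in bytes (PCI_MSIX_ENTRY_SIZE in Xen). */
#define MSIX_ENTRY_SIZE 16u

/* Only naturally aligned 32-bit or 64-bit accesses are accepted. */
static bool msix_access_ok(uint64_t addr, unsigned int len)
{
    if ( len != 4 && len != 8 )
        return false;

    return (addr & (len - 1)) == 0;
}

/*
 * Split an address inside the MSI-X table into the entry index and the
 * byte offset within that entry (0: lower address, 4: upper address,
 * 8: data, 12: vector control). Both values are computed relative to the
 * table start.
 */
static void msix_decode(uint64_t table_start, uint64_t addr,
                        unsigned int *entry, unsigned int *offset)
{
    assert(addr >= table_start);
    *entry = (unsigned int)((addr - table_start) / MSIX_ENTRY_SIZE);
    *offset = (unsigned int)((addr - table_start) % MSIX_ENTRY_SIZE);
}
```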

Finally, turn the panic in the Dom0 PVH builder into a warning.

Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
---
Cc: Jan Beulich <jbeulich@xxxxxxxx>
Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
---
Changes since v5:
 - Update lock usage.
 - Unbind/unmap PIRQs when MSIX is disabled.
 - Share the arch-specific MSIX code with the MSI functions.
 - Do not reference the MSIX memory areas from the PCI BARs fields,
   instead fetch the BIR and offset each time needed.
 - Add the '_entry' suffix to the MSIX arch functions.
 - Prefix the vMSIX macros with 'V'.
 - s/gdprintk/gprintk/ in msix.c
 - Make vpci_msix_access_check return bool, and change its name to
   vpci_msix_access_allowed.
 - Join the first two ifs in vpci_msix_{read/write} into a single one.
 - Allow Dom0 to write to the PBA area.
 - Add a note that reads from the PBA area will need to be translated
   if the PBA is not identity mapped.
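The enable and function-mask transitions handled by vpci_msix_control_write(), including the teardown when MSI-X is disabled, can be summarized as a small decision function. This is an illustrative sketch with hypothetical names; the per-entry PIRQ work and the skipping of masked or unchanged entries are left out.

```c
#include <assert.h>
#include <stdbool.h>

enum msix_ctrl_action {
    MSIX_CTRL_NOP,      /* nothing to do for the entries */
    MSIX_CTRL_REBIND,   /* disable + enable entries to refresh them */
    MSIX_CTRL_TEARDOWN, /* unbind and unmap every entry's PIRQ */
};

static enum msix_ctrl_action msix_control_action(bool old_enabled,
                                                 bool old_masked,
                                                 bool new_enabled,
                                                 bool new_masked)
{
    /* Setting the enable bit or clearing the function mask. */
    if ( new_enabled && !new_masked && (!old_enabled || old_masked) )
        return MSIX_CTRL_REBIND;

    /* Clearing the enable bit. */
    if ( !new_enabled && old_enabled )
        return MSIX_CTRL_TEARDOWN;

    return MSIX_CTRL_NOP;
}
```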

Changes since v4:
 - Remove parentheses around offsetof.
 - Add "being" to MSI-X enabling comment.
 - Use INVALID_PIRQ.
 - Add a simple sanity check to vpci_msix_arch_enable in order to
   detect wrong MSI-X entries more quickly.
 - Constify vpci_msix_arch_print entry argument.
 - s/cpu/fixed/ in vpci_msix_arch_print.
 - Dump the MSI-X info together with the MSI info.
 - Fix vpci_msix_control_write to take into account changes to the
   address and data fields when switching the function mask bit.
 - Only disable/enable the entries if the address or data fields have
   been updated.
 - Use the BAR enable field to check if a BAR is mapped or not
   (instead of reading the command register for each device).
 - Fix error path in vpci_msix_read to set the return data to ~0.
 - Simplify mask usage in vpci_msix_write.
 - Cast data to uint64_t when shifting it 32 bits.
 - Fix writes to the table entry control register to take into account
   if the mask-all bit is set.
 - Add some comments to clarify the intended behavior of the code.
 - Align the PBA size to 64-bits.
 - Remove the error label in vpci_init_msix.
 - Try to compact the layout of the vpci_msix structure.
 - Remove the local table_bar and pba_bar variables from
   vpci_init_msix, they are used only once.
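The 32-bit versus 64-bit handling of the address registers can be sketched as below, using hypothetical helper names and assuming a 32-bit int. Note the 64-bit mask constant: with a 32-bit int, ~0xffffffff has type unsigned int and evaluates to 0, so using it as a mask would clear the whole address rather than only its low half.

```c
#include <assert.h>
#include <stdint.h>

/* Write to the lower address dword (offset 0) with a 4- or 8-byte access. */
static uint64_t msix_write_lower(uint64_t addr, uint64_t data,
                                 unsigned int len)
{
    assert(len == 4 || len == 8);

    /* A 64-bit write covers both halves of the address. */
    if ( len == 8 )
        return data;

    /* Keep the upper half: the mask must be a 64-bit constant. */
    return (addr & ~0xffffffffULL) | (uint32_t)data;
}

/* Write to the upper address dword (offset 4), always a 4-byte access. */
static uint64_t msix_write_upper(uint64_t addr, uint64_t data)
{
    /* Widen before shifting so the value lands in bits 63:32. */
    return (addr & 0xffffffffULL) | ((uint64_t)(uint32_t)data << 32);
}
```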

Changes since v3:
 - Propagate changes from previous versions: remove xen_ prefix, use
   the new fields in vpci_val and remove the return value from
   handlers.
 - Remove the usage of GENMASK.
 - Move the arch-specific parts of the dump routine to the
   x86/hvm/vmsi.c dump handler.
 - Chain the MSI-X dump handler to the 'M' debug key.
 - Fix the header BAR mappings so that the MSI-X regions inside of
   BARs are unmapped from the domain p2m in order for the handlers to
   work properly.
 - Unconditionally trap and forward accesses to the PBA MSI-X area.
 - Simplify the conditionals in vpci_msix_control_write.
 - Fix vpci_msix_accept to use a bool type.
 - Allow all supported accesses as described in the spec to the MSI-X
   table.
 - Truncate the returned address when the access is a 32b read.
 - Always return X86EMUL_OKAY from the handlers, returning ~0 in the
   read case if the access is not supported, or ignoring writes.
 - Do not check that max_entries is != 0 in the init handler.
 - Use trylock in the dump handler.
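A sketch of the table read path described above, assuming the entry layout from the PCI specification (lower address at offset 0, upper address at 4, data at 8, vector control at 12) and a simplified, hypothetical entry structure. In this sketch 32-bit reads are explicitly truncated to the low dword, and anything not handled reads as all ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask bit in the vector control field (PCI_MSIX_VECTOR_BITMASK in Xen). */
#define MSIX_VECTOR_BITMASK 0x1u

/* Hypothetical, reduced view of a vMSI-X entry. */
struct msix_entry_sketch {
    uint64_t addr;
    uint32_t data;
    bool masked;
};

static uint64_t msix_read(const struct msix_entry_sketch *e,
                          unsigned int offset, unsigned int len)
{
    uint32_t ctrl = e->masked ? MSIX_VECTOR_BITMASK : 0;
    uint64_t val;

    assert(len == 4 || len == 8);

    switch ( offset )
    {
    case 0:  /* lower address; an 8-byte read returns the full address */
        val = e->addr;
        break;
    case 4:  /* upper address */
        val = e->addr >> 32;
        break;
    case 8:  /* data; an 8-byte read also returns the vector control */
        val = e->data;
        if ( len == 8 )
            val |= (uint64_t)ctrl << 32;
        break;
    case 12: /* vector control */
        val = ctrl;
        break;
    default: /* unsupported: read as all ones */
        return ~0ULL;
    }

    /* A 32-bit read only returns the low dword. */
    return len == 4 ? (uint32_t)val : val;
}
```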

Changes since v2:
 - Split out arch-specific code.

This patch has been tested with devices using both a single MSI-X
entry and multiple ones.
---
 xen/arch/x86/hvm/dom0_build.c    |   2 +-
 xen/arch/x86/hvm/hvm.c           |   1 +
 xen/arch/x86/hvm/vmsi.c          | 133 ++++++++--
 xen/drivers/vpci/Makefile        |   2 +-
 xen/drivers/vpci/header.c        |  16 ++
 xen/drivers/vpci/msi.c           |  22 +-
 xen/drivers/vpci/msix.c          | 506 +++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/domain.h |   3 +
 xen/include/asm-x86/hvm/io.h     |   5 +
 xen/include/xen/vpci.h           |  45 ++++
 10 files changed, 705 insertions(+), 30 deletions(-)
 create mode 100644 xen/drivers/vpci/msix.c

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 17d77137d6..8fa92bc5b6 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -1111,7 +1111,7 @@ int __init dom0_construct_pvh(struct domain *d, const module_t *image,
 
     pvh_setup_mmcfg(d);
 
-    panic("Building a PVHv2 Dom0 is not yet supported.");
+    printk("WARNING: PVH is an experimental mode with limited functionality\n");
     return 0;
 }
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index b1064413fc..042b7c6a31 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -585,6 +585,7 @@ int hvm_domain_initialise(struct domain *d, unsigned long domcr_flags,
     INIT_LIST_HEAD(&d->arch.hvm_domain.write_map.list);
     INIT_LIST_HEAD(&d->arch.hvm_domain.g2m_ioport_list);
     INIT_LIST_HEAD(&d->arch.hvm_domain.mmcfg_regions);
+    INIT_LIST_HEAD(&d->arch.hvm_domain.msix_tables);
 
     rc = create_perdomain_mapping(d, PERDOMAIN_VIRT_START, 0, NULL, NULL);
     if ( rc )
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 3dcde3d882..a335e75f8b 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -642,16 +642,15 @@ static unsigned int msi_gflags(uint16_t data, uint64_t addr)
                      XEN_DOMCTL_VMSI_X86_TRIG_MASK);
 }
 
-void vpci_msi_arch_mask(struct vpci_msi *msi, const struct pci_dev *pdev,
-                        unsigned int entry, bool mask)
+static void vpci_mask_pirq(struct domain *d, int pirq, bool mask)
 {
     const struct pirq *pinfo;
     struct irq_desc *desc;
     unsigned long flags;
     int irq;
 
-    ASSERT(msi->arch.pirq >= 0 && entry < msi->vectors);
-    pinfo = pirq_info(pdev->domain, msi->arch.pirq + entry);
+    ASSERT(pirq >= 0);
+    pinfo = pirq_info(d, pirq);
     if ( !pinfo )
         return;
 
@@ -668,23 +667,31 @@ void vpci_msi_arch_mask(struct vpci_msi *msi, const struct pci_dev *pdev,
     spin_unlock_irqrestore(&desc->lock, flags);
 }
 
-int vpci_msi_arch_enable(struct vpci_msi *msi, const struct pci_dev *pdev,
-                         unsigned int vectors)
+void vpci_msi_arch_mask(struct vpci_msi *msi, const struct pci_dev *pdev,
+                        unsigned int entry, bool mask)
+{
+    vpci_mask_pirq(pdev->domain, msi->arch.pirq + entry, mask);
+}
+
+static int vpci_msi_enable(const struct pci_dev *pdev, uint32_t data,
+                           uint64_t address, unsigned int nr,
+                           paddr_t table_base)
 {
     struct msi_info msi_info = {
         .seg = pdev->seg,
         .bus = pdev->bus,
         .devfn = pdev->devfn,
-        .entry_nr = vectors,
+        .table_base = table_base,
+        .entry_nr = nr,
     };
-    unsigned int i;
-    int rc;
-
-    ASSERT(msi->arch.pirq == INVALID_PIRQ);
+    unsigned int i, vectors = table_base ? 1 : nr;
+    int rc, pirq = INVALID_PIRQ;
 
     /* Get a PIRQ. */
-    rc = allocate_and_map_msi_pirq(pdev->domain, -1, &msi->arch.pirq,
-                                   MAP_PIRQ_TYPE_MULTI_MSI, &msi_info);
+    rc = allocate_and_map_msi_pirq(pdev->domain, -1, &pirq,
+                                   table_base ? MAP_PIRQ_TYPE_MSI
+                                              : MAP_PIRQ_TYPE_MULTI_MSI,
+                                   &msi_info);
     if ( rc )
     {
         gdprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: %d\n",
@@ -695,14 +702,14 @@ int vpci_msi_arch_enable(struct vpci_msi *msi, const struct pci_dev *pdev,
 
     for ( i = 0; i < vectors; i++ )
     {
-        uint8_t vector = MASK_EXTR(msi->data, MSI_DATA_VECTOR_MASK);
-        uint8_t vector_mask = 0xff >> (8 - fls(msi->vectors) + 1);
+        uint8_t vector = MASK_EXTR(data, MSI_DATA_VECTOR_MASK);
+        uint8_t vector_mask = 0xff >> (8 - fls(vectors) + 1);
         xen_domctl_bind_pt_irq_t bind = {
-            .machine_irq = msi->arch.pirq + i,
+            .machine_irq = pirq + i,
             .irq_type = PT_IRQ_TYPE_MSI,
             .u.msi.gvec = (vector & ~vector_mask) |
                           ((vector + i) & vector_mask),
-            .u.msi.gflags = msi_gflags(msi->data, msi->address),
+            .u.msi.gflags = msi_gflags(data, address),
         };
 
         pcidevs_lock();
@@ -712,33 +719,48 @@ int vpci_msi_arch_enable(struct vpci_msi *msi, const struct pci_dev *pdev,
             gdprintk(XENLOG_ERR,
                      "%04x:%02x:%02x.%u: failed to bind PIRQ %u: %d\n",
                      pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
-                     PCI_FUNC(pdev->devfn), msi->arch.pirq + i, rc);
+                     PCI_FUNC(pdev->devfn), pirq + i, rc);
             while ( bind.machine_irq-- )
                 pt_irq_destroy_bind(pdev->domain, &bind);
             spin_lock(&pdev->domain->event_lock);
-            unmap_domain_pirq(pdev->domain, msi->arch.pirq);
+            unmap_domain_pirq(pdev->domain, pirq);
             spin_unlock(&pdev->domain->event_lock);
             pcidevs_unlock();
-            msi->arch.pirq = INVALID_PIRQ;
             return rc;
         }
         pcidevs_unlock();
     }
 
-    return 0;
+    return pirq;
 }
 
-int vpci_msi_arch_disable(struct vpci_msi *msi, const struct pci_dev *pdev)
+int vpci_msi_arch_enable(struct vpci_msi *msi, const struct pci_dev *pdev,
+                         unsigned int vectors)
+{
+    int rc;
+
+    ASSERT(msi->arch.pirq == INVALID_PIRQ);
+    rc = vpci_msi_enable(pdev, msi->data, msi->address, vectors, 0);
+    if ( rc >= 0 )
+    {
+        msi->arch.pirq = rc;
+        rc = 0;
+    }
+
+    return rc;
+}
+
+void vpci_msi_disable(const struct pci_dev *pdev, int pirq, unsigned int nr)
 {
     unsigned int i;
 
-    ASSERT(msi->arch.pirq != INVALID_PIRQ);
+    ASSERT(pirq != INVALID_PIRQ);
 
     pcidevs_lock();
-    for ( i = 0; i < msi->vectors; i++ )
+    for ( i = 0; i < nr; i++ )
     {
         xen_domctl_bind_pt_irq_t bind = {
-            .machine_irq = msi->arch.pirq + i,
+            .machine_irq = pirq + i,
             .irq_type = PT_IRQ_TYPE_MSI,
         };
         int rc;
@@ -748,10 +770,14 @@ int vpci_msi_arch_disable(struct vpci_msi *msi, const struct pci_dev *pdev)
     }
 
     spin_lock(&pdev->domain->event_lock);
-    unmap_domain_pirq(pdev->domain, msi->arch.pirq);
+    unmap_domain_pirq(pdev->domain, pirq);
     spin_unlock(&pdev->domain->event_lock);
     pcidevs_unlock();
+}
 
+int vpci_msi_arch_disable(struct vpci_msi *msi, const struct pci_dev *pdev)
+{
+    vpci_msi_disable(pdev, msi->arch.pirq, msi->vectors);
     msi->arch.pirq = INVALID_PIRQ;
 
     return 0;
@@ -774,3 +800,58 @@ void vpci_msi_arch_print(const struct vpci_msi *msi)
            MASK_EXTR(msi->address, MSI_ADDR_DEST_ID_MASK),
            msi->arch.pirq);
 }
+
+void vpci_msix_arch_mask_entry(struct vpci_msix_entry *entry,
+                               const struct pci_dev *pdev, bool mask)
+{
+    ASSERT(entry->arch.pirq != INVALID_PIRQ);
+    vpci_mask_pirq(pdev->domain, entry->arch.pirq, mask);
+}
+
+int vpci_msix_arch_enable_entry(struct vpci_msix_entry *entry,
+                                const struct pci_dev *pdev, paddr_t table_base)
+{
+    int rc;
+
+    ASSERT(entry->arch.pirq == INVALID_PIRQ);
+    rc = vpci_msi_enable(pdev, entry->data, entry->addr, entry->nr,
+                         table_base);
+    if ( rc >= 0 )
+    {
+        entry->arch.pirq = rc;
+        rc = 0;
+    }
+
+    return rc;
+}
+
+int vpci_msix_arch_disable_entry(struct vpci_msix_entry *entry,
+                                 const struct pci_dev *pdev)
+{
+    if ( entry->arch.pirq == INVALID_PIRQ )
+        return -ENOENT;
+
+    vpci_msi_disable(pdev, entry->arch.pirq, 1);
+    entry->arch.pirq = INVALID_PIRQ;
+
+    return 0;
+}
+
+int vpci_msix_arch_init_entry(struct vpci_msix_entry *entry)
+{
+    entry->arch.pirq = INVALID_PIRQ;
+    return 0;
+}
+
+void vpci_msix_arch_print_entry(const struct vpci_msix_entry *entry)
+{
+    printk("%4u vec=%#02x%7s%6s%3sassert%5s%7s dest_id=%lu mask=%u pirq: %d\n",
+           entry->nr, MASK_EXTR(entry->data, MSI_DATA_VECTOR_MASK),
+           entry->data & MSI_DATA_DELIVERY_LOWPRI ? "lowest" : "fixed",
+           entry->data & MSI_DATA_TRIGGER_LEVEL ? "level" : "edge",
+           entry->data & MSI_DATA_LEVEL_ASSERT ? "" : "de",
+           entry->addr & MSI_ADDR_DESTMODE_LOGIC ? "log" : "phys",
+           entry->addr & MSI_ADDR_REDIRECTION_LOWPRI ? "lowest" : "fixed",
+           MASK_EXTR(entry->addr, MSI_ADDR_DEST_ID_MASK),
+           entry->masked, entry->arch.pirq);
+}
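For multi-vector MSI, vpci_msi_enable() derives each guest vector by keeping the bits above the vector-count mask and wrapping (vector + i) inside it. The standalone sketch below reproduces that arithmetic; fls_sketch() is a hypothetical stand-in for Xen's fls(), and the range check assumes the MSI limit of 32 vectors. For MSI-X entries table_base is non-zero, so only one vector is bound and the mask is 0.

```c
#include <assert.h>
#include <stdint.h>

/* Last set bit, 1-based, with fls_sketch(0) == 0 (like Xen's fls()). */
static unsigned int fls_sketch(unsigned int x)
{
    unsigned int r = 0;

    while ( x )
    {
        r++;
        x >>= 1;
    }

    return r;
}

/* Guest vector for the i-th PIRQ of a block of 'vectors' MSI vectors. */
static uint8_t msi_guest_vector(uint8_t vector, unsigned int vectors,
                                unsigned int i)
{
    uint8_t vector_mask;

    assert(vectors >= 1 && vectors <= 32 && i < vectors);
    vector_mask = 0xff >> (8 - fls_sketch(vectors) + 1);

    /* Upper bits come from the programmed vector, low bits wrap. */
    return (vector & ~vector_mask) | ((vector + i) & vector_mask);
}
```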
diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
index 62cec9e82b..55d1bdfda0 100644
--- a/xen/drivers/vpci/Makefile
+++ b/xen/drivers/vpci/Makefile
@@ -1 +1 @@
-obj-y += vpci.o header.o msi.o
+obj-y += vpci.o header.o msi.o msix.o
diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 07a6bbf0be..02b9776ea9 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -152,6 +152,7 @@ static int vpci_check_bar_overlap(const struct pci_dev *pdev,
 static void vpci_modify_bars(const struct pci_dev *pdev, bool map)
 {
     struct vpci_header *header = &pdev->vpci->header;
+    struct vpci_msix *msix = pdev->vpci->msix;
     struct rangeset *mem = rangeset_new(NULL, NULL, 0);
     unsigned int i;
     int rc;
@@ -186,6 +187,21 @@ static void vpci_modify_bars(const struct pci_dev *pdev, bool map)
         }
     }
 
+    /* Remove any MSIX regions if present. */
+    for ( i = 0; msix && i < ARRAY_SIZE(msix->mem); i++ )
+    {
+        paddr_t start =
+            header->bars[msix->mem[i].bir].addr + msix->mem[i].offset;
+
+        rc = rangeset_remove_range(mem, PFN_DOWN(start),
+                                   PFN_DOWN(start + msix->mem[i].size - 1));
+        if ( rc )
+        {
+            rangeset_destroy(mem);
+            return;
+        }
+    }
+
     /* Check for overlaps with other device's BARs. */
     rc = vpci_check_bar_overlap(pdev, NULL, mem);
     if ( rc )
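Because the rangeset works on whole frames, an MSI-X table or PBA removes every page it touches from the BAR mapping, even if it covers only a few bytes of that page, and a region straddling a page boundary removes two frames. A minimal sketch of that frame computation, assuming 4 KiB pages and using hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* 4 KiB pages, as on x86 (PAGE_SHIFT in Xen). */
#define PAGE_SHIFT_SKETCH 12

/* Inclusive range of page frame numbers. */
struct pfn_range {
    uint64_t first;
    uint64_t last;
};

/* Frames touched by an MSI-X region placed at bar_addr + offset. */
static struct pfn_range msix_region_pfns(uint64_t bar_addr, uint64_t offset,
                                         uint64_t size)
{
    uint64_t start = bar_addr + offset;
    struct pfn_range r;

    assert(size != 0);
    r.first = start >> PAGE_SHIFT_SKETCH;             /* PFN_DOWN(start) */
    r.last = (start + size - 1) >> PAGE_SHIFT_SKETCH; /* PFN_DOWN(end) */

    return r;
}
```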
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index 7a0b0521c5..5c10a0d9c9 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -320,13 +320,17 @@ void vpci_dump_msi(void)
         if ( !has_vpci(d) )
             continue;
 
-        printk("vPCI MSI information for d%d\n", d->domain_id);
+        printk("vPCI MSI/MSI-X information for d%d\n", d->domain_id);
 
         list_for_each_entry ( pdev, &d->arch.pdev_list, domain_list )
         {
             uint8_t seg = pdev->seg, bus = pdev->bus;
             uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
             const struct vpci_msi *msi = pdev->vpci->msi;
+            const struct vpci_msix *msix = pdev->vpci->msix;
+
+            if ( msi || msix )
+                printk("Device %04x:%02x:%02x.%u\n", seg, bus, slot, func);
 
             if ( !spin_trylock(&pdev->vpci->lock) )
             {
@@ -336,7 +340,7 @@ void vpci_dump_msi(void)
 
             if ( msi )
             {
-                printk("Device %04x:%02x:%02x.%u\n", seg, bus, slot, func);
+                printk(" MSI\n");
 
                 printk("  Enabled: %u Supports masking: %u 64-bit addresses: %u\n",
                        msi->enabled, msi->masking, msi->address64);
@@ -349,6 +353,20 @@ void vpci_dump_msi(void)
                     printk("  mask=%08x\n", msi->mask);
             }
 
+            if ( msix )
+            {
+                unsigned int i;
+
+                printk(" MSI-X\n");
+
+                printk("  Max entries: %u maskall: %u enabled: %u\n",
+                       msix->max_entries, msix->masked, msix->enabled);
+
+                printk("  Table entries:\n");
+                for ( i = 0; i < msix->max_entries; i++ )
+                    vpci_msix_arch_print_entry(&msix->entries[i]);
+            }
+
             spin_unlock(&pdev->vpci->lock);
             process_pending_softirqs();
         }
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
new file mode 100644
index 0000000000..ad4684c357
--- /dev/null
+++ b/xen/drivers/vpci/msix.c
@@ -0,0 +1,506 @@
+/*
+ * Handlers for accesses to the MSI-X capability structure and the memory
+ * region.
+ *
+ * Copyright (C) 2017 Citrix Systems R&D
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/sched.h>
+#include <xen/vpci.h>
+#include <asm/msi.h>
+#include <xen/p2m-common.h>
+#include <xen/keyhandler.h>
+
+#define VMSIX_SIZE(num) offsetof(struct vpci_msix, entries[num])
+#define VMSIX_ADDR_IN_RANGE(addr, table, bar)                              \
+    ((addr) >= (bar)->addr + (table)->offset &&                            \
+     (addr) < (bar)->addr + (table)->offset + (table)->size)
+
+static uint32_t vpci_msix_control_read(const struct pci_dev *pdev,
+                                       unsigned int reg, void *data)
+{
+    const struct vpci_msix *msix = data;
+    uint16_t val;
+
+    val = msix->max_entries - 1;
+    val |= msix->enabled ? PCI_MSIX_FLAGS_ENABLE : 0;
+    val |= msix->masked ? PCI_MSIX_FLAGS_MASKALL : 0;
+
+    return val;
+}
+
+static void vpci_msix_control_write(const struct pci_dev *pdev,
+                                    unsigned int reg, uint32_t val, void *data)
+{
+    uint8_t seg = pdev->seg, bus = pdev->bus;
+    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+    struct vpci_msix *msix = data;
+    bool new_masked, new_enabled;
+    unsigned int i;
+    int rc;
+
+    new_masked = val & PCI_MSIX_FLAGS_MASKALL;
+    new_enabled = val & PCI_MSIX_FLAGS_ENABLE;
+
+    /*
+     * According to the PCI 3.0 specification, switching the enable bit
+     * to 1 or the function mask bit to 0 should cause all the cached
+     * addresses and data fields to be recalculated. Xen implements this
+     * as disabling and enabling the entries.
+     *
+     * Note that the disable/enable sequence is only performed when the
+     * guest has written to the entry (i.e. the updated field is set) or
+     * MSI-X is being enabled.
+     */
+    if ( new_enabled && !new_masked && (!msix->enabled || msix->masked) )
+    {
+        paddr_t table_base =
+            pdev->vpci->header.bars[msix->mem[VPCI_MSIX_TABLE].bir].addr;
+
+        for ( i = 0; i < msix->max_entries; i++ )
+        {
+            if ( msix->entries[i].masked ||
+                 (new_enabled && msix->enabled && !msix->entries[i].updated) )
+                continue;
+
+            rc = vpci_msix_arch_disable_entry(&msix->entries[i], pdev);
+            if ( rc )
+            {
+                gprintk(XENLOG_WARNING,
+                        "%04x:%02x:%02x.%u: unable to disable entry %u: %d\n",
+                        seg, bus, slot, func, msix->entries[i].nr, rc);
+                return;
+            }
+
+            rc = vpci_msix_arch_enable_entry(&msix->entries[i], pdev,
+                                             table_base);
+            if ( rc )
+            {
+                gprintk(XENLOG_WARNING,
+                        "%04x:%02x:%02x.%u: unable to enable entry %u: %d\n",
+                        seg, bus, slot, func, msix->entries[i].nr, rc);
+                /* Entry is likely not properly configured, skip it. */
+                continue;
+            }
+
+            /*
+             * At this point the PIRQ is still masked. Unmask it, or else the
+             * guest won't receive interrupts. This is due to the
+             * disable/enable sequence performed above.
+             */
+            vpci_msix_arch_mask_entry(&msix->entries[i], pdev, false);
+
+            msix->entries[i].updated = false;
+        }
+    }
+    else if ( !new_enabled && msix->enabled )
+    {
+        /* Guest has disabled MSIX, disable all entries. */
+        for ( i = 0; i < msix->max_entries; i++ )
+        {
+            /*
+             * NB: vpci_msix_arch_disable_entry can be called for entries
+             * that are not set up; it returns -ENOENT in that case.
+             */
+            rc = vpci_msix_arch_disable_entry(&msix->entries[i], pdev);
+            switch ( rc )
+            {
+            case 0:
+                /*
+                 * Mark the entry successfully disabled as updated, so that on
+                 * the next enable the entry is properly set up. This is done
+                 * so that the following flow works correctly:
+                 *
+                 * mask entry -> disable MSIX -> enable MSIX -> unmask entry
+                 *
+                 * Without setting 'updated', the 'unmask entry' step will fail
+                 * because the entry has not been updated, so it would not be
+                 * mapped/bound at all.
+                 */
+                msix->entries[i].updated = true;
+                break;
+            case -ENOENT:
+                /* Ignore non-present entry. */
+                break;
+            default:
+                gprintk(XENLOG_WARNING,
+                        "%04x:%02x:%02x.%u: unable to disable entry %u: %d\n",
+                        seg, bus, slot, func, msix->entries[i].nr, rc);
+                return;
+            }
+        }
+    }
+
+    if ( (new_enabled != msix->enabled || new_masked != msix->masked) &&
+         pci_msi_conf_write_intercept(msix->pdev, reg, 2, &val) >= 0 )
+        pci_conf_write16(seg, bus, slot, func, reg, val);
+
+    msix->masked = new_masked;
+    msix->enabled = new_enabled;
+}
+
+static struct vpci_msix *vpci_msix_find(const struct domain *d,
+                                        unsigned long addr)
+{
+    struct vpci_msix *msix;
+
+    list_for_each_entry ( msix, &d->arch.hvm_domain.msix_tables, next )
+    {
+        const struct vpci_bar *bars = msix->pdev->vpci->header.bars;
+        unsigned int i;
+
+        for ( i = 0; i < ARRAY_SIZE(msix->mem); i++ )
+            if ( bars[msix->mem[i].bir].enabled &&
+                 VMSIX_ADDR_IN_RANGE(addr, &msix->mem[i],
+                                     &bars[msix->mem[i].bir]) )
+                return msix;
+    }
+
+    return NULL;
+}
+
+static int vpci_msix_accept(struct vcpu *v, unsigned long addr)
+{
+    return !!vpci_msix_find(v->domain, addr);
+}
+
+static bool vpci_msix_access_allowed(const struct pci_dev *pdev,
+                                     unsigned long addr, unsigned int len)
+{
+    uint8_t seg = pdev->seg, bus = pdev->bus;
+    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+
+    /* Only allow 32/64b accesses. */
+    if ( len != 4 && len != 8 )
+    {
+        gprintk(XENLOG_WARNING,
+                "%04x:%02x:%02x.%u: invalid MSI-X table access size: %u\n",
+                seg, bus, slot, func, len);
+        return false;
+    }
+
+    /* Only allow aligned accesses. */
+    if ( (addr & (len - 1)) != 0 )
+    {
+        gprintk(XENLOG_WARNING,
+                "%04x:%02x:%02x.%u: MSI-X only allows aligned accesses\n",
+                seg, bus, slot, func);
+        return false;
+    }
+
+    return true;
+}
+
+static struct vpci_msix_entry *vpci_msix_get_entry(struct vpci_msix *msix,
+                                                   const struct vpci_bar *bars,
+                                                   unsigned long addr)
+{
+    paddr_t start = bars[msix->mem[VPCI_MSIX_TABLE].bir].addr +
+                    msix->mem[VPCI_MSIX_TABLE].offset;
+
+    return &msix->entries[(addr - start) / PCI_MSIX_ENTRY_SIZE];
+}
+
+static int vpci_msix_read(struct vcpu *v, unsigned long addr,
+                          unsigned int len, unsigned long *data)
+{
+    struct domain *d = v->domain;
+    const struct vpci_bar *bars;
+    struct vpci_msix *msix;
+    const struct vpci_msix_entry *entry;
+    unsigned int offset;
+
+    *data = ~0ul;
+
+    msix = vpci_msix_find(d, addr);
+    if ( !msix || !vpci_msix_access_allowed(msix->pdev, addr, len) )
+        return X86EMUL_OKAY;
+
+    bars = msix->pdev->vpci->header.bars;
+    if ( VMSIX_ADDR_IN_RANGE(addr, &msix->mem[VPCI_MSIX_PBA],
+                             &bars[msix->mem[VPCI_MSIX_PBA].bir]) )
+    {
+        /*
+         * Access to PBA.
+         *
+         * TODO: note that this relies on having the PBA identity mapped to the
+         * guest address space. If this changes the address will need to be
+         * translated.
+         */
+        switch ( len )
+        {
+        case 4:
+            *data = readl(addr);
+            break;
+        case 8:
+            *data = readq(addr);
+            break;
+        default:
+            ASSERT_UNREACHABLE();
+            break;
+        }
+
+        return X86EMUL_OKAY;
+    }
+
+    spin_lock(&msix->pdev->vpci->lock);
+    entry = vpci_msix_get_entry(msix, bars, addr);
+    offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
+
+    switch ( offset )
+    {
+    case PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET:
+        *data = entry->addr;
+        break;
+    case PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET:
+        *data = entry->addr >> 32;
+        break;
+    case PCI_MSIX_ENTRY_DATA_OFFSET:
+        *data = entry->data;
+        if ( len == 8 )
+            *data |=
+                (uint64_t)(entry->masked ? PCI_MSIX_VECTOR_BITMASK : 0) << 32;
+        break;
+    case PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET:
+        *data = entry->masked ? PCI_MSIX_VECTOR_BITMASK : 0;
+        break;
+    default:
+        ASSERT_UNREACHABLE();
+        break;
+    }
+    spin_unlock(&msix->pdev->vpci->lock);
+
+    return X86EMUL_OKAY;
+}
+
+static int vpci_msix_write(struct vcpu *v, unsigned long addr,
+                           unsigned int len, unsigned long data)
+{
+    struct domain *d = v->domain;
+    const struct vpci_bar *bars;
+    struct vpci_msix *msix;
+    struct vpci_msix_entry *entry;
+    unsigned int offset;
+
+    msix = vpci_msix_find(d, addr);
+    if ( !msix || !vpci_msix_access_allowed(msix->pdev, addr, len) )
+        return X86EMUL_OKAY;
+
+    bars = msix->pdev->vpci->header.bars;
+    if ( VMSIX_ADDR_IN_RANGE(addr, &msix->mem[VPCI_MSIX_PBA],
+                             &bars[msix->mem[VPCI_MSIX_PBA].bir]) )
+    {
+        /* Ignore writes to the PBA from DomUs; the behavior is undefined. */
+        if ( is_hardware_domain(d) )
+        {
+            switch ( len )
+            {
+            case 4:
+                writel(data, addr);
+                break;
+            case 8:
+                writeq(data, addr);
+                break;
+            default:
+                ASSERT_UNREACHABLE();
+                break;
+            }
+        }
+
+        return X86EMUL_OKAY;
+    }
+
+    spin_lock(&msix->pdev->vpci->lock);
+    entry = vpci_msix_get_entry(msix, bars, addr);
+    offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
+
+    /*
+     * NB: Xen allows writes to the data/address registers with the entry
+     * unmasked. The specification says this is undefined behavior, and Xen
+     * implements it as storing the written value, which will be made effective
+     * in the next mask/unmask cycle. This also mimics the implementation in
+     * QEMU.
+     */
+    switch ( offset )
+    {
+    case PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET:
+        entry->updated = true;
+        if ( len == 8 )
+        {
+            entry->addr = data;
+            break;
+        }
+        entry->addr &= ~0xffffffffull;
+        entry->addr |= data;
+        break;
+    case PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET:
+        entry->updated = true;
+        entry->addr &= 0xffffffff;
+        entry->addr |= (uint64_t)data << 32;
+        break;
+    case PCI_MSIX_ENTRY_DATA_OFFSET:
+        entry->updated = true;
+        entry->data = data;
+
+        if ( len == 4 )
+            break;
+
+        data >>= 32;
+        /* fallthrough */
+    case PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET:
+    {
+        bool new_masked = data & PCI_MSIX_VECTOR_BITMASK;
+        const struct pci_dev *pdev = msix->pdev;
+        paddr_t table_base = bars[msix->mem[VPCI_MSIX_TABLE].bir].addr;
+        int rc;
+
+        if ( entry->masked == new_masked )
+            /* No change in the mask bit, nothing to do. */
+            break;
+
+        if ( !new_masked && msix->enabled && !msix->masked && entry->updated )
+        {
+            /*
+             * If MSI-X is enabled, the function mask is not active, the entry
+             * is being unmasked and there have been changes to the address or
+             * data fields, Xen needs to disable and enable the entry in order
+             * to pick up the changes.
+             */
+            rc = vpci_msix_arch_disable_entry(entry, pdev);
+            if ( rc && rc != -ENOENT )
+            {
+                gprintk(XENLOG_WARNING,
+                        "%04x:%02x:%02x.%u: unable to disable entry %u: %d\n",
+                        pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                        PCI_FUNC(pdev->devfn), entry->nr, rc);
+                break;
+            }
+
+            rc = vpci_msix_arch_enable_entry(entry, pdev, table_base);
+            if ( rc )
+            {
+                gprintk(XENLOG_WARNING,
+                        "%04x:%02x:%02x.%u: unable to enable entry %u: %d\n",
+                        pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                        PCI_FUNC(pdev->devfn), entry->nr, rc);
+                break;
+            }
+            entry->updated = false;
+        }
+
+        vpci_msix_arch_mask_entry(entry, pdev, new_masked);
+        entry->masked = new_masked;
+
+        break;
+    }
+    default:
+        ASSERT_UNREACHABLE();
+        break;
+    }
+    spin_unlock(&msix->pdev->vpci->lock);
+
+    return X86EMUL_OKAY;
+}
+
+static const struct hvm_mmio_ops vpci_msix_table_ops = {
+    .check = vpci_msix_accept,
+    .read = vpci_msix_read,
+    .write = vpci_msix_write,
+};
+
+static int vpci_init_msix(struct pci_dev *pdev)
+{
+    struct domain *d = pdev->domain;
+    uint8_t seg = pdev->seg, bus = pdev->bus;
+    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+    struct vpci_msix *msix;
+    struct vpci_msix_mem *table, *pba;
+    unsigned int msix_offset, i, max_entries;
+    uint16_t control;
+    int rc;
+
+    msix_offset = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSIX);
+    if ( !msix_offset )
+        return 0;
+
+    control = pci_conf_read16(seg, bus, slot, func,
+                              msix_control_reg(msix_offset));
+
+    max_entries = msix_table_size(control);
+
+    msix = xzalloc_bytes(VMSIX_SIZE(max_entries));
+    if ( !msix )
+        return -ENOMEM;
+
+    msix->max_entries = max_entries;
+    msix->pdev = pdev;
+
+    /* Find the MSI-X table address. */
+    table = &msix->mem[VPCI_MSIX_TABLE];
+    table->offset = pci_conf_read32(seg, bus, slot, func,
+                                    msix_table_offset_reg(msix_offset));
+    table->bir = table->offset & PCI_MSIX_BIRMASK;
+    table->offset &= ~PCI_MSIX_BIRMASK;
+    table->size = msix->max_entries * PCI_MSIX_ENTRY_SIZE;
+
+    /* Find the MSI-X PBA BIR and offset. */
+    pba = &msix->mem[VPCI_MSIX_PBA];
+    pba->offset = pci_conf_read32(seg, bus, slot, func,
+                                  msix_pba_offset_reg(msix_offset));
+    pba->bir = pba->offset & PCI_MSIX_BIRMASK;
+    pba->offset &= ~PCI_MSIX_BIRMASK;
+    /*
+     * Regarding the PBA, the spec states that "The last QWORD will not
+     * necessarily be fully populated", which implies one pending bit per
+     * entry with the PBA size rounded up to a multiple of 64 bits.
+     */
+    pba->size = ROUNDUP(DIV_ROUND_UP(msix->max_entries, 8), 8);
+
+    for ( i = 0; i < msix->max_entries; i++ )
+    {
+        msix->entries[i].masked = true;
+        msix->entries[i].nr = i;
+        vpci_msix_arch_init_entry(&msix->entries[i]);
+    }
+
+    rc = vpci_add_register(pdev, vpci_msix_control_read,
+                           vpci_msix_control_write,
+                           msix_control_reg(msix_offset), 2, msix);
+    if ( rc )
+    {
+        xfree(msix);
+        return rc;
+    }
+
+    if ( list_empty(&d->arch.hvm_domain.msix_tables) )
+        register_mmio_handler(d, &vpci_msix_table_ops);
+
+    list_add(&msix->next, &d->arch.hvm_domain.msix_tables);
+
+    pdev->vpci->msix = msix;
+
+    return 0;
+}
+REGISTER_VPCI_INIT(vpci_init_msix, VPCI_PRIORITY_HIGH);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index cd19ee11e9..5e3139e61c 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -188,6 +188,9 @@ struct hvm_domain {
     struct list_head mmcfg_regions;
     rwlock_t mmcfg_lock;
 
+    /* List of MSI-X tables. */
+    struct list_head msix_tables;
+
     /* List of permanently write-mapped pages. */
     struct {
         spinlock_t lock;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index c47cc971d3..e7367071ce 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -132,6 +132,11 @@ struct vpci_arch_msi {
     int pirq;
 };
 
+/* Arch-specific MSI-X entry data for vPCI. */
+struct vpci_arch_msix_entry {
+    int pirq;
+};
+
 enum stdvga_cache_state {
     STDVGA_CACHE_UNINITIALIZED,
     STDVGA_CACHE_ENABLED,
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index c6913631c0..9656b1855b 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -100,6 +100,40 @@ struct vpci {
         /* 64-bit address capable? */
         bool address64;
     } *msi;
+
+    /* MSI-X data. */
+    struct vpci_msix {
+        struct pci_dev *pdev;
+        /* List link. */
+        struct list_head next;
+        /* Table and PBA information. */
+        struct vpci_msix_mem {
+            /* Offset inside the BAR. */
+            unsigned int offset;
+            /* BAR Indicator Register (BIR). */
+            unsigned int bir;
+            /* Size in bytes. */
+            unsigned int size;
+#define VPCI_MSIX_TABLE     0
+#define VPCI_MSIX_PBA       1
+#define VPCI_MSIX_MEM_NUM   2
+        } mem[VPCI_MSIX_MEM_NUM];
+        /* Maximum number of vectors supported by the device. */
+        unsigned int max_entries;
+        /* MSI-X enabled? */
+        bool enabled;
+        /* Masked? */
+        bool masked;
+        /* Entries. */
+        struct vpci_msix_entry {
+            uint64_t addr;
+            uint32_t data;
+            unsigned int nr;
+            struct vpci_arch_msix_entry arch;
+            bool masked;
+            bool updated;
+        } entries[];
+    } *msix;
 #endif
 };
 
@@ -119,6 +153,17 @@ int vpci_msi_arch_enable(struct vpci_msi *msi, const struct pci_dev *pdev,
 int vpci_msi_arch_disable(struct vpci_msi *msi, const struct pci_dev *pdev);
 void vpci_msi_arch_init(struct vpci_msi *msi);
 void vpci_msi_arch_print(const struct vpci_msi *msi);
+
+/* Arch-specific vPCI MSI-X helpers. */
+void vpci_msix_arch_mask_entry(struct vpci_msix_entry *entry,
+                               const struct pci_dev *pdev, bool mask);
+int vpci_msix_arch_enable_entry(struct vpci_msix_entry *entry,
+                                const struct pci_dev *pdev,
+                                paddr_t table_base);
+int vpci_msix_arch_disable_entry(struct vpci_msix_entry *entry,
+                                 const struct pci_dev *pdev);
+int vpci_msix_arch_init_entry(struct vpci_msix_entry *entry);
+void vpci_msix_arch_print_entry(const struct vpci_msix_entry *entry);
 #endif
 
 #endif
-- 
2.11.0 (Apple Git-81)


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

