Re: [Xen-devel] [PATCH v7 for-next 12/12] vpci/msix: add MSI-X handlers
>>> On 18.10.17 at 13:40, <roger.pau@xxxxxxxxxx> wrote:
> Changes since v6:
> - Reduce the output of the debug keys.
> - Fix comments and code to match in vpci_msix_control_write.
> - Optimize size of the MSIX structure.
> - Convert vpci_msix_mem to a uint32_t in order to reduce the size of
> vpci_msix. Introduce some macros to make it easier to get the MSIX
> tables related data.
This is misleading - there's no vpci_msix_mem in the patch. I think
you mean the tables[] member of struct vpci_msix?
> +void vpci_msix_arch_print_entry(const struct vpci_msix_entry *entry)
> +{
> + printk("vec=%#02x%7s%6s%3sassert%5s%7s dest_id=%lu mask=%u pirq: %d\n",
Aiui %#02x is pointless, as the 0x prefix already occupies both characters
of the field width. Please drop the #; imo vectors make no sense to be
logged in decimal anyway.
> @@ -254,6 +255,20 @@ static void modify_bars(const struct pci_dev *pdev, bool map, bool rom)
> }
> }
>
> + /* Remove any MSIX regions if present. */
> + for ( i = 0; msix && i < ARRAY_SIZE(msix->tables); i++ )
> + {
> + paddr_t start = VMSIX_TABLE_ADDR(pdev->vpci, i);
> +
> + rc = rangeset_remove_range(mem, PFN_DOWN(start), PFN_DOWN(start +
> + VMSIX_TABLE_SIZE(pdev->vpci, i) - 1));
Bad indentation. Also if the second PFN_DOWN() is really intended
this way (rather than PFN_UP()), please add a comment explaining
why that is.
> + if ( rc )
> + {
> + rangeset_destroy(mem);
> + return;
Silent discarding of an error also needs an explanation in a comment.
> @@ -325,6 +329,22 @@ void vpci_dump_msi(void)
> vpci_msi_arch_print(msi);
> }
>
> + if ( msix )
> + {
> + unsigned int i;
> +
> + printk(" MSI-X\n");
> +
> + printk(" entries: %u maskall: %d enabled: %d\n",
> + msix->max_entries, msix->masked, msix->enabled);
> +
> + for ( i = 0; i < msix->max_entries; i++ )
> + {
> + printk("    %4u ", i);
> + vpci_msix_arch_print_entry(&msix->entries[i]);
> + }
Printing a partial line (i.e. w/o newline) and then calling an arch-
dependent function is not ideal. Perhaps best to also hand the
index to the function, so it can decide how best to print
everything. Also, btw, "    %4u " can be had with less .rodata
consumption by using "%6u".
> +static void control_write(const struct pci_dev *pdev, unsigned int reg,
> + uint32_t val, void *data)
> +{
> + uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> + struct vpci_msix *msix = data;
> + bool new_masked = val & PCI_MSIX_FLAGS_MASKALL;
> + bool new_enabled = val & PCI_MSIX_FLAGS_ENABLE;
> + unsigned int i;
> + int rc;
> +
> + if ( new_masked == msix->masked && new_enabled == msix->enabled )
> + return;
> +
> + /*
> + * According to the PCI 3.0 specification, switching the enable bit to 1
> + * or the function mask bit to 0 should cause all the cached addresses
> + * and data fields to be recalculated.
> + *
> + * In order to avoid the overhead of disabling and enabling all the
> + * entries every time the guest sets the maskall bit, Xen will only
> + * perform the disable and enable sequence when the guest has written to
> + * the entry.
> + */
> + if ( new_enabled && !new_masked && (!msix->enabled || msix->masked) )
> + {
> + for ( i = 0; i < msix->max_entries; i++ )
> + {
> + if ( msix->entries[i].masked || !msix->entries[i].updated )
> + continue;
> +
> + rc = vpci_msix_arch_disable_entry(&msix->entries[i], pdev);
> + /* Ignore ENOENT, it means the entry wasn't setup. */
> + if ( rc && rc != -ENOENT )
> + {
> + gprintk(XENLOG_WARNING,
> + "%04x:%02x:%02x.%u: unable to disable entry %u: %d\n",
> + pdev->seg, pdev->bus, slot, func, i, rc);
In a log from e.g. a customer, how am I to tell this message from ...
> + return;
> + }
> +
> + rc = vpci_msix_arch_enable_entry(&msix->entries[i], pdev,
> + VMSIX_TABLE_BASE(pdev->vpci, VPCI_MSIX_TABLE));
> + if ( rc )
> + {
> + gprintk(XENLOG_WARNING,
> + "%04x:%02x:%02x.%u: unable to enable entry %u: %d\n",
> + pdev->seg, pdev->bus, slot, func, i, rc);
> + /* Entry is likely not properly configured, skip it. */
> + continue;
> + }
> +
> + /*
> + * At this point the PIRQ is still masked. Unmask it, or else the
> + * guest won't receive interrupts. This is due to the
> + * disable/enable sequence performed above.
> + */
> + vpci_msix_arch_mask_entry(&msix->entries[i], pdev, false);
> +
> + msix->entries[i].updated = false;
> + }
> + }
> + else if ( !new_enabled && msix->enabled )
> + {
> + /* Guest has disabled MSIX, disable all entries. */
> + for ( i = 0; i < msix->max_entries; i++ )
> + {
> + /*
> + * NB: vpci_msix_arch_disable can be called for entries that are
> + * not setup, it will return -ENOENT in that case.
> + */
> + rc = vpci_msix_arch_disable_entry(&msix->entries[i], pdev);
> + switch ( rc )
> + {
> + case 0:
> + /*
> + * Mark the entry successfully disabled as updated, so that on
> + * the next enable the entry is properly setup. This is done
> + * so that the following flow works correctly:
> + *
> + * mask entry -> disable MSIX -> enable MSIX -> unmask entry
> + *
> + * Without setting 'updated', the 'unmask entry' step will fail
> + * because the entry has not been updated, so it would not be
> + * mapped/bound at all.
> + */
> + msix->entries[i].updated = true;
> + break;
> + case -ENOENT:
> + /* Ignore non-present entry. */
> + break;
> + default:
> + gprintk(XENLOG_WARNING,
> + "%04x:%02x:%02x.%u: unable to disable entry %u: %d\n",
> + pdev->seg, pdev->bus, slot, func, i, rc);
... this one?
> +static bool access_allowed(const struct pci_dev *pdev, unsigned long addr,
> + unsigned int len)
> +{
> + uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> +
> + /* Only allow 32/64b accesses. */
> + if ( len != 4 && len != 8 )
> + {
> + gprintk(XENLOG_WARNING,
> + "%04x:%02x:%02x.%u: invalid MSI-X table access size: %u\n",
> + pdev->seg, pdev->bus, slot, func, len);
> + return false;
> + }
> +
> + /* Only allow aligned accesses. */
> + if ( (addr & (len - 1)) != 0 )
> + {
> + gprintk(XENLOG_WARNING,
> + "%04x:%02x:%02x.%u: MSI-X only allows aligned accesses\n",
> + pdev->seg, pdev->bus, slot, func);
> + return false;
> + }
Perhaps fold both error paths and messages (I'm not convinced we
need all of them [also considering others you add] anyway)?
> +static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
> + unsigned long data)
> +{
> + const struct domain *d = v->domain;
> + struct vpci_msix *msix = msix_find(d, addr);
> + struct vpci_msix_entry *entry;
> + unsigned int offset;
> +
> + if ( !msix )
> + return X86EMUL_RETRY;
> +
> + if ( !access_allowed(msix->pdev, addr, len) )
> + return X86EMUL_OKAY;
> +
> + if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
> + {
> + /* Ignore writes to PBA for DomUs, it's behavior is undefined. */
> + if ( is_hardware_domain(d) )
> + {
> + switch ( len )
> + {
> + case 4:
> + writel(data, addr);
> + break;
> + case 8:
> + writeq(data, addr);
> + break;
> + default:
> + ASSERT_UNREACHABLE();
> + break;
> + }
> + }
> +
> + return X86EMUL_OKAY;
> + }
> +
> + spin_lock(&msix->pdev->vpci->lock);
> + entry = get_entry(msix, addr);
> + offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
> +
> + /*
> + * NB: Xen allows writes to the data/address registers with the entry
> + * unmasked. The specification says this is undefined behavior, and Xen
> + * implements it as storing the written value, which will be made effective
> + * in the next mask/unmask cycle. This also mimics the implementation in
> + * QEMU.
> + */
> + switch ( offset )
> + {
> + case PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET:
> + entry->updated = true;
> + if ( len == 8 )
> + {
> + entry->addr = data;
> + break;
> + }
> + entry->addr &= ~0xffffffff;
> + entry->addr |= data;
> + break;
> + case PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET:
This switch() is too large to not have blank lines between non-fall-
through case blocks.
> + entry->updated = true;
> + entry->addr &= 0xffffffff;
> + entry->addr |= (uint64_t)data << 32;
> + break;
> + case PCI_MSIX_ENTRY_DATA_OFFSET:
> + entry->updated = true;
> + entry->data = data;
> +
> + if ( len == 4 )
> + break;
> +
> + data >>= 32;
> + /* fallthrough */
> + case PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET:
> + {
> + bool new_masked = data & PCI_MSIX_VECTOR_BITMASK;
> + const struct pci_dev *pdev = msix->pdev;
> + int rc;
> +
> + if ( entry->masked == new_masked )
> + /* No change in the mask bit, nothing to do. */
> + break;
> +
> + if ( !new_masked && msix->enabled && !msix->masked && entry->updated )
> + {
> + /*
> + * If MSI-X is enabled, the function mask is not active, the entry
> + * is being unmasked and there have been changes to the address or
> + * data fields Xen needs to disable and enable the entry in order
> + * to pick up the changes.
> + */
> + rc = vpci_msix_arch_disable_entry(entry, pdev);
> + if ( rc && rc != -ENOENT )
> + {
> + gprintk(XENLOG_WARNING,
> + "%04x:%02x:%02x.%u: unable to disable entry %u: %d\n",
> + pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> + PCI_FUNC(pdev->devfn), VMSIX_ENTRY_NR(msix, entry), rc);
> + break;
> + }
> +
> + rc = vpci_msix_arch_enable_entry(entry, pdev,
> + VMSIX_TABLE_BASE(pdev->vpci, VPCI_MSIX_TABLE));
> + if ( rc )
> + {
> + gprintk(XENLOG_WARNING,
> + "%04x:%02x:%02x.%u: unable to enable entry %u: %d\n",
> + pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> + PCI_FUNC(pdev->devfn), VMSIX_ENTRY_NR(msix, entry), rc);
> + break;
> + }
This very much looks like it is identical to the earlier disable/enable
code sequence - use a helper function?
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -105,6 +105,32 @@ struct vpci {
> /* Arch-specific data. */
> struct vpci_arch_msi arch;
> } *msi;
> +
> + /* MSI-X data. */
> + struct vpci_msix {
> + struct pci_dev *pdev;
> + /* List link. */
> + struct list_head next;
> + /* Table information. */
> +#define VPCI_MSIX_TABLE 0
> +#define VPCI_MSIX_PBA 1
> +#define VPCI_MSIX_MEM_NUM 2
> + uint32_t tables[VPCI_MSIX_MEM_NUM];
> + /* Maximum number of vectors supported by the device. */
> + uint16_t max_entries : 11;
Isn't this one bit too few (if there are exactly 2k entries)?
> @@ -127,6 +153,41 @@ int __must_check vpci_msi_arch_enable(struct vpci_msi *msi,
> void vpci_msi_arch_disable(struct vpci_msi *msi, const struct pci_dev *pdev);
> void vpci_msi_arch_init(struct vpci_msi *msi);
> void vpci_msi_arch_print(const struct vpci_msi *msi);
> +
> +/* Arch-specific vPCI MSI-X helpers. */
> +void vpci_msix_arch_mask_entry(struct vpci_msix_entry *entry,
> + const struct pci_dev *pdev, bool mask);
> +int __must_check vpci_msix_arch_enable_entry(struct vpci_msix_entry *entry,
> + const struct pci_dev *pdev,
> + paddr_t table_base);
> +int __must_check vpci_msix_arch_disable_entry(struct vpci_msix_entry *entry,
> + const struct pci_dev *pdev);
> +void vpci_msix_arch_init_entry(struct vpci_msix_entry *entry);
> +void vpci_msix_arch_print_entry(const struct vpci_msix_entry *entry);
> +
> +/*
> + * Helper functions to fetch MSIX related data. They are used by both the
> + * emulated MSIX code and the BAR handlers.
> + */
> +#define VMSIX_TABLE_BASE(vpci, nr) \
> + ((vpci)->header.bars[(vpci)->msix->tables[nr] & PCI_MSIX_BIRMASK].addr)
> +#define VMSIX_TABLE_ADDR(vpci, nr) \
> + (VMSIX_TABLE_BASE(vpci, nr) + \
> + ((vpci)->msix->tables[nr] & ~PCI_MSIX_BIRMASK))
> +
> +/*
> + * Note regarding the size calculation of the PBA: the spec mentions "The last
> + * QWORD will not necessarily be fully populated", so it implies that the PBA
> + * size is 64-bit aligned.
> + */
> +#define VMSIX_TABLE_SIZE(vpci, nr) \
> + ((nr == VPCI_MSIX_TABLE) ? (vpci)->msix->max_entries * PCI_MSIX_ENTRY_SIZE \
(nr) - i.e. the macro parameter wants parenthesizing here.
Anyway I wonder whether some or all of the macros couldn't, like the
earlier comment says, be (inline) functions.
> + : ROUNDUP(DIV_ROUND_UP((vpci)->msix->max_entries, 8), 8))
> +
> +#define VMSIX_ENTRY_NR(msix, entry) \
> + (unsigned int)((entry) - (msix)->entries)
Missing outer parentheses.
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel