[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 2/6] x86/hvm: Allow access to registers on the same page as MSI-X table


  • To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Mon, 27 Nov 2023 18:00:57 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=mMM5gqqY6GhO3pJrgHG+fgXdLNUvYC+Vadfm0e4i5sA=; b=GHSMiLnLOZdty382tfOu/4RTUI6Jdf0mqqN0vsDXm1jQDAJpQIi8OpVRTgh1Bdii+rSl6NwH1pR3L7xl5tQADdxhTZYREEcV3r9XUMgVjDp3QJqfzjt8Mw4yVSH7ph/0OQ1uF+oZ1hN2l7dX8wqNAVL7H/LkS5C93tr+Gc+PN6dBp2ZVeJYPIbqnrDk+MdT0SKHfx8q6+43bHm1iz4lDpmLkY35D4YYDfOKJ3XOza/r4l6HdsVPwUd8iSYX8i55dnOYk6WkreHJuUAg40iZc5u9h8BaeBs9vlkuodI3GgSChnNLDOCfeqUSTIHXEjcIP3oiU+EXjs6+2wsr0Dnz/4g==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=SkFQmxhcg42pccdZnihu+/iyEmvoTmNX/3oftOIJcY/0pLndZ59/wJQaOJAeYvCV7xXX5e7Z1Yn00zrkWAB7RRwM2Ql7z3GlGzVwfIK+dXHQrDUE0vGN5SgI3vpt8tFvW9OyUQvFUGvYDRltrw/D3neh1XVm1KMcnFuiBvcB79UMt5y4Do9NEjCxsU3CSJn9eh2JBm6fk23O1gRiP76ww5ah3ZcSqDx1L//k4fTOgKGUeWY0dYaIW9pVWbyh/Y9DZvnzevmrfmOpBpH0Iyte7gsHLWX+VeNvYUCVp0M0WWJNUSajXGJIPz2ope+/eHfu46pwhUwvwup61MO/JO67rQ==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Autocrypt: addr=jbeulich@xxxxxxxx; keydata= xsDiBFk3nEQRBADAEaSw6zC/EJkiwGPXbWtPxl2xCdSoeepS07jW8UgcHNurfHvUzogEq5xk hu507c3BarVjyWCJOylMNR98Yd8VqD9UfmX0Hb8/BrA+Hl6/DB/eqGptrf4BSRwcZQM32aZK 7Pj2XbGWIUrZrd70x1eAP9QE3P79Y2oLrsCgbZJfEwCgvz9JjGmQqQkRiTVzlZVCJYcyGGsD /0tbFCzD2h20ahe8rC1gbb3K3qk+LpBtvjBu1RY9drYk0NymiGbJWZgab6t1jM7sk2vuf0Py O9Hf9XBmK0uE9IgMaiCpc32XV9oASz6UJebwkX+zF2jG5I1BfnO9g7KlotcA/v5ClMjgo6Gl MDY4HxoSRu3i1cqqSDtVlt+AOVBJBACrZcnHAUSuCXBPy0jOlBhxPqRWv6ND4c9PH1xjQ3NP nxJuMBS8rnNg22uyfAgmBKNLpLgAGVRMZGaGoJObGf72s6TeIqKJo/LtggAS9qAUiuKVnygo 3wjfkS9A3DRO+SpU7JqWdsveeIQyeyEJ/8PTowmSQLakF+3fote9ybzd880fSmFuIEJldWxp Y2ggPGpiZXVsaWNoQHN1c2UuY29tPsJgBBMRAgAgBQJZN5xEAhsDBgsJCAcDAgQVAggDBBYC AwECHgECF4AACgkQoDSui/t3IH4J+wCfQ5jHdEjCRHj23O/5ttg9r9OIruwAn3103WUITZee e7Sbg12UgcQ5lv7SzsFNBFk3nEQQCACCuTjCjFOUdi5Nm244F+78kLghRcin/awv+IrTcIWF hUpSs1Y91iQQ7KItirz5uwCPlwejSJDQJLIS+QtJHaXDXeV6NI0Uef1hP20+y8qydDiVkv6l IreXjTb7DvksRgJNvCkWtYnlS3mYvQ9NzS9PhyALWbXnH6sIJd2O9lKS1Mrfq+y0IXCP10eS FFGg+Av3IQeFatkJAyju0PPthyTqxSI4lZYuJVPknzgaeuJv/2NccrPvmeDg6Coe7ZIeQ8Yj t0ARxu2xytAkkLCel1Lz1WLmwLstV30g80nkgZf/wr+/BXJW/oIvRlonUkxv+IbBM3dX2OV8 AmRv1ySWPTP7AAMFB/9PQK/VtlNUJvg8GXj9ootzrteGfVZVVT4XBJkfwBcpC/XcPzldjv+3 HYudvpdNK3lLujXeA5fLOH+Z/G9WBc5pFVSMocI71I8bT8lIAzreg0WvkWg5V2WZsUMlnDL9 mpwIGFhlbM3gfDMs7MPMu8YQRFVdUvtSpaAs8OFfGQ0ia3LGZcjA6Ik2+xcqscEJzNH+qh8V m5jjp28yZgaqTaRbg3M/+MTbMpicpZuqF4rnB0AQD12/3BNWDR6bmh+EkYSMcEIpQmBM51qM EKYTQGybRCjpnKHGOxG0rfFY1085mBDZCH5Kx0cl0HVJuQKC+dV2ZY5AqjcKwAxpE75MLFkr wkkEGBECAAkFAlk3nEQCGwwACgkQoDSui/t3IH7nnwCfcJWUDUFKdCsBH/E5d+0ZnMQi+G0A nAuWpQkjM1ASeQwSHEeAWPgskBQL
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Mon, 27 Nov 2023 17:01:23 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 24.11.2023 02:47, Marek Marczykowski-Górecki wrote:
> Some devices (notably Intel Wifi 6 AX210 card) keep auxiliary registers
> on the same page as MSI-X table. Device model (especially one in
> stubdomain) cannot really handle those, as direct writes to that page is
> refused (page is on the mmio_ro_ranges list). Instead, extend
> msixtbl_mmio_ops to handle such accesses too.
> 
> Doing this, requires correlating read/write location with guest
> of MSI-X table address. Since QEMU doesn't map MSI-X table to the guest,
> it requires msixtbl_entry->gtable, which is HVM-only. Similar feature
> for PV would need to be done separately.
> 
> This will be also used to read Pending Bit Array, if it lives on the same
> page, making QEMU not needing /dev/mem access at all (especially helpful
> with lockdown enabled in dom0). If PBA lives on another page, QEMU will
> map it to the guest directly.
> If PBA lives on the same page, discard writes and log a message.
> Technically, writes outside of PBA could be allowed, but at this moment
> the precise location of PBA isn't saved, and also no known device abuses
> the spec in this way (at least yet).
> 
> To access those registers, msixtbl_mmio_ops need the relevant page
> mapped. MSI handling already has infrastructure for that, using fixmap,
> so try to map first/last page of the MSI-X table (if necessary) and save
> their fixmap indexes. Note that msix_get_fixmap() does reference
> counting and reuses existing mapping, so just call it directly, even if
> the page was mapped before. Also, it uses a specific range of fixmap
> indexes which doesn't include 0, so use 0 as default ("not mapped")
> value - which simplifies code a bit.
> 
> GCC gets confused about 'desc' variable:
> 
>     arch/x86/hvm/vmsi.c: In function ‘msixtbl_range’:
>     arch/x86/hvm/vmsi.c:553:8: error: ‘desc’ may be used uninitialized 
> [-Werror=maybe-uninitialized]
>       553 |     if ( desc )
>           |        ^
>     arch/x86/hvm/vmsi.c:537:28: note: ‘desc’ was declared here
>       537 |     const struct msi_desc *desc;
>           |                            ^~~~

This could do with also indicating the gcc version. Issues like this
tend to get fixed over time.

> It's conditional initialization is actually correct (in the case where
> it isn't initialized, function returns early), but to avoid
> build failure initialize it explicitly to NULL anyway.
> 
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>

Logic looks okay to me now (albeit I'm still not overly happy that we need
to gain such code), but there are a couple of cosmetic issues.

> @@ -213,6 +217,131 @@ static struct msi_desc *msixtbl_addr_to_desc(
>      return NULL;
>  }
>  
> +/*
> + * Returns:
> + *  - UINT_MAX if no handling should be done
> + *  - UINT_MAX-1 if write should be discarded
> + *  - a fixmap idx to use for handling
> + */
> +#define ADJACENT_DONT_HANDLE UINT_MAX
> +#define ADJACENT_DISCARD_WRITE (UINT_MAX - 1)

The comment would imo better talk in terms of the two constants you
define here.

> +static unsigned int adjacent_handle(
> +    const struct msixtbl_entry *entry, unsigned long addr, bool write)
> +{
> +    unsigned int adj_type;
> +    const struct arch_msix *msix;
> +
> +    if ( !entry || !entry->pdev )
> +        return ADJACENT_DONT_HANDLE;
> +
> +    if ( PFN_DOWN(addr) == PFN_DOWN(entry->gtable) && addr < entry->gtable )
> +        adj_type = ADJ_IDX_FIRST;
> +    else if ( PFN_DOWN(addr) == PFN_DOWN(entry->gtable + entry->table_len - 
> 1) &&
> +              addr >= entry->gtable + entry->table_len )
> +        adj_type = ADJ_IDX_LAST;
> +    else
> +        return ADJACENT_DONT_HANDLE;
> +
> +    msix = entry->pdev->msix;
> +    ASSERT(msix);
> +
> +    if ( !msix->adj_access_idx[adj_type] )
> +    {
> +        gprintk(XENLOG_WARNING,
> +                "Page for adjacent(%d) MSI-X table access not initialized 
> for %pp (addr %#lx, gtable %#lx\n",
> +                adj_type, &entry->pdev->sbdf, addr, entry->gtable);
> +
> +        return ADJACENT_DONT_HANDLE;
> +    }
> +
> +    /* If PBA lives on the same page too, discard writes. */
> +    if ( write &&
> +         ((adj_type == ADJ_IDX_LAST &&
> +           msix->table.last == msix->pba.first) ||
> +          (adj_type == ADJ_IDX_FIRST &&
> +           msix->table.first == msix->pba.last)) )
> +    {
> +        gprintk(XENLOG_WARNING,
> +                "MSI-X table and PBA of %pp live on the same page, "
> +                "writing to other registers there is not implemented\n",
> +                &entry->pdev->sbdf);

Here and above I think verbosity needs limiting to the first instance per
device per domain.

> +        return ADJACENT_DISCARD_WRITE;
> +    }
> +
> +    return msix->adj_access_idx[adj_type];
> +}
> +
> +static int adjacent_read(
> +    unsigned int fixmap_idx,
> +    paddr_t address, unsigned int len, uint64_t *pval)
> +{
> +    const void __iomem *hwaddr;
> +
> +    *pval = ~0UL;
> +
> +    ASSERT(fixmap_idx != ADJACENT_DISCARD_WRITE);
> +
> +    hwaddr = fix_to_virt(fixmap_idx) + PAGE_OFFSET(address);
> +
> +    switch ( len )
> +    {
> +    case 1:
> +        *pval = readb(hwaddr);
> +        break;
> +
> +    case 2:
> +        *pval = readw(hwaddr);
> +        break;
> +
> +    case 4:
> +        *pval = readl(hwaddr);
> +        break;
> +
> +    case 8:
> +        *pval = readq(hwaddr);
> +        break;
> +
> +    default:
> +        ASSERT_UNREACHABLE();
> +    }
> +    return X86EMUL_OKAY;
> +}
> +
> +static int adjacent_write(
> +        unsigned int fixmap_idx,
> +        uint64_t address, uint32_t len, uint64_t val)

This uses indentation different from the two cases further up. Types
used also don't match adjacent_read()'s.

> @@ -220,16 +349,31 @@ static int cf_check msixtbl_read(
>      unsigned long offset;
>      struct msixtbl_entry *entry;
>      unsigned int nr_entry, index;
> +    unsigned int adjacent_fixmap;
>      int r = X86EMUL_UNHANDLEABLE;
>  
> -    if ( (len != 4 && len != 8) || (address & (len - 1)) )
> +    if ( !IS_ALIGNED(address, len) )
>          return r;
>  
>      rcu_read_lock(&msixtbl_rcu_lock);
> -
>      entry = msixtbl_find_entry(current, address);
>      if ( !entry )
>          goto out;
> +
> +    adjacent_fixmap = adjacent_handle(entry, address, false);
> +    if ( adjacent_fixmap != ADJACENT_DONT_HANDLE )
> +    {
> +        r = adjacent_read(adjacent_fixmap, address, len, pval);
> +        goto out;
> +    }
> +
> +    if ( address < entry->gtable ||
> +         address >= entry->gtable + entry->table_len )
> +        goto out;
> +
> +    if ( len != 4 && len != 8 )
> +        goto out;
> +
>      offset = address & (PCI_MSIX_ENTRY_SIZE - 1);
>  
>      if ( offset != PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET )
> @@ -282,6 +426,7 @@ static int msixtbl_write(struct vcpu *v, unsigned long 
> address,
>      int r = X86EMUL_UNHANDLEABLE;
>      unsigned long flags;
>      struct irq_desc *desc;
> +    unsigned int adjacent_fixmap;
>  
>      if ( !IS_ALIGNED(address, len) )
>          return X86EMUL_OKAY;
> @@ -291,6 +436,19 @@ static int msixtbl_write(struct vcpu *v, unsigned long 
> address,
>      entry = msixtbl_find_entry(v, address);
>      if ( !entry )
>          goto out;
> +
> +    adjacent_fixmap = adjacent_handle(entry, address, true);
> +    if ( adjacent_fixmap != ADJACENT_DONT_HANDLE )
> +    {
> +        r = adjacent_write(adjacent_fixmap, address, len, val);
> +        goto out;
> +    }
> +    if ( address < entry->gtable ||
> +         address >= entry->gtable + entry->table_len )
> +        goto out;
> +    if ( len != 4 && len != 8 )
> +        goto out;
> +

Can this please follow the read side as far as use of blank lines goes?

> @@ -622,12 +788,15 @@ void msix_write_completion(struct vcpu *v)
>           v->arch.hvm.hvm_io.msix_snoop_gpa )
>      {
>          unsigned int token = hvmemul_cache_disable(v);
> -        const struct msi_desc *desc;
> +        const struct msi_desc *desc = NULL;
> +        const struct msixtbl_entry *entry;
>          uint32_t data;
>  
>          rcu_read_lock(&msixtbl_rcu_lock);
> -        desc = msixtbl_addr_to_desc(msixtbl_find_entry(v, snoop_addr),
> -                                    snoop_addr);
> +        entry = msixtbl_find_entry(v, snoop_addr);
> +        if ( entry && snoop_addr >= entry->gtable &&
> +                      snoop_addr < entry->gtable + entry->table_len )

Nit: Too deep indentation.

> --- a/xen/arch/x86/include/asm/msi.h
> +++ b/xen/arch/x86/include/asm/msi.h
> @@ -207,6 +207,10 @@ struct msg_address {
>                                         PCI_MSIX_ENTRY_SIZE + \
>                                         (~PCI_MSIX_BIRMASK & (PAGE_SIZE - 1)))
>  
> +/* indexes in adj_access_idx[] below */

Nit: Comment style again.

> @@ -1078,6 +1108,16 @@ static void _pci_cleanup_msix(struct arch_msix *msix)
>              WARN();
>          msix->table.first = 0;
>          msix->table.last = 0;
> +        if ( msix->adj_access_idx[ADJ_IDX_FIRST] )
> +        {
> +            msix_put_fixmap(msix, msix->adj_access_idx[ADJ_IDX_FIRST]);
> +            msix->adj_access_idx[ADJ_IDX_FIRST] = 0;
> +        }
> +        if ( msix->adj_access_idx[ADJ_IDX_LAST] )
> +        {
> +            msix_put_fixmap(msix, msix->adj_access_idx[ADJ_IDX_LAST]);
> +            msix->adj_access_idx[ADJ_IDX_LAST] = 0;
> +        }
>  
>          if ( rangeset_remove_range(mmio_ro_ranges, msix->pba.first,
>                                     msix->pba.last) )

This could probably do with another blank line at the head of the
addition.

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.