 
	
| [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v4 2/6] x86/hvm: Allow access to registers on the same page as MSI-X table
 On 24.11.2023 02:47, Marek Marczykowski-Górecki wrote:
> Some devices (notably Intel Wifi 6 AX210 card) keep auxiliary registers
> on the same page as MSI-X table. Device model (especially one in
> stubdomain) cannot really handle those, as direct writes to that page is
> refused (page is on the mmio_ro_ranges list). Instead, extend
> msixtbl_mmio_ops to handle such accesses too.
> 
> Doing this, requires correlating read/write location with guest
> of MSI-X table address. Since QEMU doesn't map MSI-X table to the guest,
> it requires msixtbl_entry->gtable, which is HVM-only. Similar feature
> for PV would need to be done separately.
> 
> This will be also used to read Pending Bit Array, if it lives on the same
> page, making QEMU not needing /dev/mem access at all (especially helpful
> with lockdown enabled in dom0). If PBA lives on another page, QEMU will
> map it to the guest directly.
> If PBA lives on the same page, discard writes and log a message.
> Technically, writes outside of PBA could be allowed, but at this moment
> the precise location of PBA isn't saved, and also no known device abuses
> the spec in this way (at least yet).
> 
> To access those registers, msixtbl_mmio_ops need the relevant page
> mapped. MSI handling already has infrastructure for that, using fixmap,
> so try to map first/last page of the MSI-X table (if necessary) and save
> their fixmap indexes. Note that msix_get_fixmap() does reference
> counting and reuses existing mapping, so just call it directly, even if
> the page was mapped before. Also, it uses a specific range of fixmap
> indexes which doesn't include 0, so use 0 as default ("not mapped")
> value - which simplifies code a bit.
> 
> GCC gets confused about 'desc' variable:
> 
>     arch/x86/hvm/vmsi.c: In function ‘msixtbl_range’:
>     arch/x86/hvm/vmsi.c:553:8: error: ‘desc’ may be used uninitialized 
> [-Werror=maybe-uninitialized]
>       553 |     if ( desc )
>           |        ^
>     arch/x86/hvm/vmsi.c:537:28: note: ‘desc’ was declared here
>       537 |     const struct msi_desc *desc;
>           |                            ^~~~
This could do with also indicating the gcc version. Issues like this
tend to get fixed over time.
> It's conditional initialization is actually correct (in the case where
> it isn't initialized, function returns early), but to avoid
> build failure initialize it explicitly to NULL anyway.
> 
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
Logic looks okay to me now (albeit I'm still not overly happy that we need
to gain such code), but there are a couple of cosmetic issues.
> @@ -213,6 +217,131 @@ static struct msi_desc *msixtbl_addr_to_desc(
>      return NULL;
>  }
>  
> +/*
> + * Returns:
> + *  - UINT_MAX if no handling should be done
> + *  - UINT_MAX-1 if write should be discarded
> + *  - a fixmap idx to use for handling
> + */
> +#define ADJACENT_DONT_HANDLE UINT_MAX
> +#define ADJACENT_DISCARD_WRITE (UINT_MAX - 1)
The comment would imo better talk in terms of the two constants you
define here.
> +static unsigned int adjacent_handle(
> +    const struct msixtbl_entry *entry, unsigned long addr, bool write)
> +{
> +    unsigned int adj_type;
> +    const struct arch_msix *msix;
> +
> +    if ( !entry || !entry->pdev )
> +        return ADJACENT_DONT_HANDLE;
> +
> +    if ( PFN_DOWN(addr) == PFN_DOWN(entry->gtable) && addr < entry->gtable )
> +        adj_type = ADJ_IDX_FIRST;
> +    else if ( PFN_DOWN(addr) == PFN_DOWN(entry->gtable + entry->table_len - 
> 1) &&
> +              addr >= entry->gtable + entry->table_len )
> +        adj_type = ADJ_IDX_LAST;
> +    else
> +        return ADJACENT_DONT_HANDLE;
> +
> +    msix = entry->pdev->msix;
> +    ASSERT(msix);
> +
> +    if ( !msix->adj_access_idx[adj_type] )
> +    {
> +        gprintk(XENLOG_WARNING,
> +                "Page for adjacent(%d) MSI-X table access not initialized 
> for %pp (addr %#lx, gtable %#lx\n",
> +                adj_type, &entry->pdev->sbdf, addr, entry->gtable);
> +
> +        return ADJACENT_DONT_HANDLE;
> +    }
> +
> +    /* If PBA lives on the same page too, discard writes. */
> +    if ( write &&
> +         ((adj_type == ADJ_IDX_LAST &&
> +           msix->table.last == msix->pba.first) ||
> +          (adj_type == ADJ_IDX_FIRST &&
> +           msix->table.first == msix->pba.last)) )
> +    {
> +        gprintk(XENLOG_WARNING,
> +                "MSI-X table and PBA of %pp live on the same page, "
> +                "writing to other registers there is not implemented\n",
> +                &entry->pdev->sbdf);
Here and above I think verbosity needs limiting to the first instance per
device per domain.
> +        return ADJACENT_DISCARD_WRITE;
> +    }
> +
> +    return msix->adj_access_idx[adj_type];
> +}
> +
> +static int adjacent_read(
> +    unsigned int fixmap_idx,
> +    paddr_t address, unsigned int len, uint64_t *pval)
> +{
> +    const void __iomem *hwaddr;
> +
> +    *pval = ~0UL;
> +
> +    ASSERT(fixmap_idx != ADJACENT_DISCARD_WRITE);
> +
> +    hwaddr = fix_to_virt(fixmap_idx) + PAGE_OFFSET(address);
> +
> +    switch ( len )
> +    {
> +    case 1:
> +        *pval = readb(hwaddr);
> +        break;
> +
> +    case 2:
> +        *pval = readw(hwaddr);
> +        break;
> +
> +    case 4:
> +        *pval = readl(hwaddr);
> +        break;
> +
> +    case 8:
> +        *pval = readq(hwaddr);
> +        break;
> +
> +    default:
> +        ASSERT_UNREACHABLE();
> +    }
> +    return X86EMUL_OKAY;
> +}
> +
> +static int adjacent_write(
> +        unsigned int fixmap_idx,
> +        uint64_t address, uint32_t len, uint64_t val)
This uses indentation different from the two cases further up. Types
used also don't match adjacent_read()'s.
> @@ -220,16 +349,31 @@ static int cf_check msixtbl_read(
>      unsigned long offset;
>      struct msixtbl_entry *entry;
>      unsigned int nr_entry, index;
> +    unsigned int adjacent_fixmap;
>      int r = X86EMUL_UNHANDLEABLE;
>  
> -    if ( (len != 4 && len != 8) || (address & (len - 1)) )
> +    if ( !IS_ALIGNED(address, len) )
>          return r;
>  
>      rcu_read_lock(&msixtbl_rcu_lock);
> -
>      entry = msixtbl_find_entry(current, address);
>      if ( !entry )
>          goto out;
> +
> +    adjacent_fixmap = adjacent_handle(entry, address, false);
> +    if ( adjacent_fixmap != ADJACENT_DONT_HANDLE )
> +    {
> +        r = adjacent_read(adjacent_fixmap, address, len, pval);
> +        goto out;
> +    }
> +
> +    if ( address < entry->gtable ||
> +         address >= entry->gtable + entry->table_len )
> +        goto out;
> +
> +    if ( len != 4 && len != 8 )
> +        goto out;
> +
>      offset = address & (PCI_MSIX_ENTRY_SIZE - 1);
>  
>      if ( offset != PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET )
> @@ -282,6 +426,7 @@ static int msixtbl_write(struct vcpu *v, unsigned long 
> address,
>      int r = X86EMUL_UNHANDLEABLE;
>      unsigned long flags;
>      struct irq_desc *desc;
> +    unsigned int adjacent_fixmap;
>  
>      if ( !IS_ALIGNED(address, len) )
>          return X86EMUL_OKAY;
> @@ -291,6 +436,19 @@ static int msixtbl_write(struct vcpu *v, unsigned long 
> address,
>      entry = msixtbl_find_entry(v, address);
>      if ( !entry )
>          goto out;
> +
> +    adjacent_fixmap = adjacent_handle(entry, address, true);
> +    if ( adjacent_fixmap != ADJACENT_DONT_HANDLE )
> +    {
> +        r = adjacent_write(adjacent_fixmap, address, len, val);
> +        goto out;
> +    }
> +    if ( address < entry->gtable ||
> +         address >= entry->gtable + entry->table_len )
> +        goto out;
> +    if ( len != 4 && len != 8 )
> +        goto out;
> +
Can this please follow the read side as far as use of blank lines goes?
> @@ -622,12 +788,15 @@ void msix_write_completion(struct vcpu *v)
>           v->arch.hvm.hvm_io.msix_snoop_gpa )
>      {
>          unsigned int token = hvmemul_cache_disable(v);
> -        const struct msi_desc *desc;
> +        const struct msi_desc *desc = NULL;
> +        const struct msixtbl_entry *entry;
>          uint32_t data;
>  
>          rcu_read_lock(&msixtbl_rcu_lock);
> -        desc = msixtbl_addr_to_desc(msixtbl_find_entry(v, snoop_addr),
> -                                    snoop_addr);
> +        entry = msixtbl_find_entry(v, snoop_addr);
> +        if ( entry && snoop_addr >= entry->gtable &&
> +                      snoop_addr < entry->gtable + entry->table_len )
Nit: Too deep indentation.
> --- a/xen/arch/x86/include/asm/msi.h
> +++ b/xen/arch/x86/include/asm/msi.h
> @@ -207,6 +207,10 @@ struct msg_address {
>                                         PCI_MSIX_ENTRY_SIZE + \
>                                         (~PCI_MSIX_BIRMASK & (PAGE_SIZE - 1)))
>  
> +/* indexes in adj_access_idx[] below */
Nit: Comment style again.
> @@ -1078,6 +1108,16 @@ static void _pci_cleanup_msix(struct arch_msix *msix)
>              WARN();
>          msix->table.first = 0;
>          msix->table.last = 0;
> +        if ( msix->adj_access_idx[ADJ_IDX_FIRST] )
> +        {
> +            msix_put_fixmap(msix, msix->adj_access_idx[ADJ_IDX_FIRST]);
> +            msix->adj_access_idx[ADJ_IDX_FIRST] = 0;
> +        }
> +        if ( msix->adj_access_idx[ADJ_IDX_LAST] )
> +        {
> +            msix_put_fixmap(msix, msix->adj_access_idx[ADJ_IDX_LAST]);
> +            msix->adj_access_idx[ADJ_IDX_LAST] = 0;
> +        }
>  
>          if ( rangeset_remove_range(mmio_ro_ranges, msix->pba.first,
>                                     msix->pba.last) )
This could probably do with another blank line at the head of the
addition.
Jan
 
 | 
|  | Lists.xenproject.org is hosted with RackSpace, monitoring our |