[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v6 3/7] x86/hvm: Allow access to registers on the same page as MSI-X table



On Fri, May 03, 2024 at 10:33:38AM +0200, Roger Pau Monné wrote:
> On Fri, Apr 26, 2024 at 07:54:00PM +0200, Marek Marczykowski-Górecki wrote:
> > Some devices (notably Intel Wifi 6 AX210 card) keep auxiliary registers
> > on the same page as MSI-X table. Device model (especially one in
> > stubdomain) cannot really handle those, as direct writes to that page is
> > refused (page is on the mmio_ro_ranges list). Instead, extend
> > msixtbl_mmio_ops to handle such accesses too.
> > 
> > Doing this, requires correlating read/write location with guest
> > of MSI-X table address. Since QEMU doesn't map MSI-X table to the guest,
>   ^ extra 'of'?
> > it requires msixtbl_entry->gtable, which is HVM-only. Similar feature
> > for PV would need to be done separately.
> > 
> > This will be also used to read Pending Bit Array, if it lives on the same
> > page, making QEMU not needing /dev/mem access at all (especially helpful
> > with lockdown enabled in dom0). If PBA lives on another page, QEMU will
> > map it to the guest directly.
> > If PBA lives on the same page, discard writes and log a message.
> > Technically, writes outside of PBA could be allowed, but at this moment
> > the precise location of PBA isn't saved, and also no known device abuses
> > the spec in this way (at least yet).
> > 
> > To access those registers, msixtbl_mmio_ops need the relevant page
> > mapped. MSI handling already has infrastructure for that, using fixmap,
> > so try to map first/last page of the MSI-X table (if necessary) and save
> > their fixmap indexes. Note that msix_get_fixmap() does reference
> > counting and reuses existing mapping, so just call it directly, even if
> > the page was mapped before. Also, it uses a specific range of fixmap
> > indexes which doesn't include 0, so use 0 as default ("not mapped")
> > value - which simplifies code a bit.
> > 
> > GCC 12.2.1 gets confused about 'desc' variable:
> > 
> >     arch/x86/hvm/vmsi.c: In function ‘msixtbl_range’:
> >     arch/x86/hvm/vmsi.c:553:8: error: ‘desc’ may be used uninitialized 
> > [-Werror=maybe-uninitialized]
> >       553 |     if ( desc )
> >           |        ^
> >     arch/x86/hvm/vmsi.c:537:28: note: ‘desc’ was declared here
> >       537 |     const struct msi_desc *desc;
> >           |                            ^~~~
> > 
> > It's conditional initialization is actually correct (in the case where
> > it isn't initialized, function returns early), but to avoid
> > build failure initialize it explicitly to NULL anyway.
> > 
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> > ---
> > Changes in v6:
> > - use MSIX_CHECK_WARN macro
> > - extend assert on fixmap_idx
> > - add break in default label, after ASSERT_UNREACHABLE(), and move
> >   setting default there
> > - style fixes
> > Changes in v5:
> > - style fixes
> > - include GCC version in the commit message
> > - warn only once (per domain, per device) about failed adjacent access
> > Changes in v4:
> > - drop same_page parameter of msixtbl_find_entry(), distinguish two
> >   cases in relevant callers
> > - rename adj_access_table_idx to adj_access_idx
> > - code style fixes
> > - drop alignment check in adjacent_{read,write}() - all callers already
> >   have it earlier
> > - delay mapping first/last MSI-X pages until preparing device for a
> >   passthrough
> > v3:
> >  - merge handling into msixtbl_mmio_ops
> >  - extend commit message
> > v2:
> >  - adjust commit message
> >  - pass struct domain to msixtbl_page_handler_get_hwaddr()
> >  - reduce local variables used only once
> >  - log a warning if write is forbidden if MSI-X and PBA lives on the same
> >    page
> >  - do not passthrough unaligned accesses
> >  - handle accesses both before and after MSI-X table
> > ---
> >  xen/arch/x86/hvm/vmsi.c        | 200 ++++++++++++++++++++++++++++++++--
> >  xen/arch/x86/include/asm/msi.h |   5 +-
> >  xen/arch/x86/msi.c             |  41 +++++++-
> >  3 files changed, 236 insertions(+), 10 deletions(-)
> > 
> > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> > index 999917983789..230e3a5dee3f 100644
> > --- a/xen/arch/x86/hvm/vmsi.c
> > +++ b/xen/arch/x86/hvm/vmsi.c
> > @@ -180,6 +180,10 @@ static bool msixtbl_initialised(const struct domain *d)
> >      return d->arch.hvm.msixtbl_list.next;
> >  }
> >  
> > +/*
> > + * Lookup an msixtbl_entry on the same page as given addr. It's up to the
> > + * caller to check if address is strictly part of the table - if relevant.
> > + */
> >  static struct msixtbl_entry *msixtbl_find_entry(
> >      struct vcpu *v, unsigned long addr)
> >  {
> > @@ -187,8 +191,8 @@ static struct msixtbl_entry *msixtbl_find_entry(
> >      struct domain *d = v->domain;
> >  
> >      list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list )
> > -        if ( addr >= entry->gtable &&
> > -             addr < entry->gtable + entry->table_len )
> > +        if ( PFN_DOWN(addr) >= PFN_DOWN(entry->gtable) &&
> > +             PFN_DOWN(addr) <= PFN_DOWN(entry->gtable + entry->table_len - 
> > 1) )
> >              return entry;
> >  
> >      return NULL;
> > @@ -213,6 +217,138 @@ static struct msi_desc *msixtbl_addr_to_desc(
> >      return NULL;
> >  }
> >  
> > +/*
> > + * Returns:
> > + *  - ADJACENT_DONT_HANDLE if no handling should be done
> > + *  - ADJACENT_DISCARD_WRITE if write should be discarded
> > + *  - a fixmap idx to use for handling
> > + */
> > +#define ADJACENT_DONT_HANDLE UINT_MAX
> > +#define ADJACENT_DISCARD_WRITE (UINT_MAX - 1)
> 
> I think this could be simpler, there's no need to signal with so fine
> grained detail about the action to be performed.
> 
> Any adjacent access to the MSI-X table should be handled by the logic
> you are adding, so anything that falls in those ranges should
> terminate here.
> 
> adjacent_handle() should IMO just return whether the access is
> replayed against the hardware, or if it's just dropped.

The distinction here is to return X86EMUL_OKAY in case of adjacent write
that is ignored because PBA is somewhere near, but X86EMUL_UNHANDLABLE
for other/error cases (like fixmap indices not initialized).
But maybe this distinction doesn't make sense and X86EMUL_UNHANDLABLE is
okay in either case? 

> > +static unsigned int adjacent_handle(
> > +    const struct msixtbl_entry *entry, unsigned long addr, bool write)
> > +{
> > +    unsigned int adj_type;
> > +    struct arch_msix *msix;
> > +
> > +    if ( !entry || !entry->pdev )
> > +        return ADJACENT_DONT_HANDLE;
> > +
> > +    if ( PFN_DOWN(addr) == PFN_DOWN(entry->gtable) && addr < entry->gtable 
> > )
> > +        adj_type = ADJ_IDX_FIRST;
> > +    else if ( PFN_DOWN(addr) == PFN_DOWN(entry->gtable + entry->table_len 
> > - 1) &&
> > +              addr >= entry->gtable + entry->table_len )
> > +        adj_type = ADJ_IDX_LAST;
> > +    else
> > +        return ADJACENT_DONT_HANDLE;
> > +
> > +    msix = entry->pdev->msix;
> > +    ASSERT(msix);
> > +
> > +    if ( !msix->adj_access_idx[adj_type] )
> > +    {
> > +        if ( MSIX_CHECK_WARN(msix, entry->pdev->domain->domain_id,
> > +                             adjacent_not_initialized) )
> > +            gprintk(XENLOG_WARNING,
> > +                    "Page for adjacent(%d) MSI-X table access not 
> > initialized for %pp (addr %#lx, gtable %#lx\n",
> > +                    adj_type, &entry->pdev->sbdf, addr, entry->gtable);
> > +        return ADJACENT_DONT_HANDLE;
> > +    }
> > +
> > +    /* If PBA lives on the same page too, discard writes. */
> > +    if ( write &&
> > +         ((adj_type == ADJ_IDX_LAST &&
> > +           msix->table.last == msix->pba.first) ||
> > +          (adj_type == ADJ_IDX_FIRST &&
> > +           msix->table.first == msix->pba.last)) )
> > +    {
> > +        if ( MSIX_CHECK_WARN(msix, entry->pdev->domain->domain_id,
> > +                             adjacent_pba) )
> > +            gprintk(XENLOG_WARNING,
> > +                    "MSI-X table and PBA of %pp live on the same page, "
> > +                    "writing to other registers there is not 
> > implemented\n",
> > +                    &entry->pdev->sbdf);
> > +        return ADJACENT_DISCARD_WRITE;
> > +    }
> > +
> > +    return msix->adj_access_idx[adj_type];
> > +}
> > +
> > +static int adjacent_read(
> > +    unsigned int fixmap_idx,
> > +    paddr_t address, unsigned int len, uint64_t *pval)
> > +{
> > +    const void __iomem *hwaddr;
> > +
> > +    ASSERT(fixmap_idx <= FIX_MSIX_IO_RESERV_END);
> > +
> > +    hwaddr = fix_to_virt(fixmap_idx) + PAGE_OFFSET(address);
> 
> IMO adjacent_handle() should be called here (and in adjacent_write()),
> and adjacent_{read,write}() called unconditionally from
> msixtbl_{read,write}() when an access that doesn't fall in the MSI-X
> table is handled.  See comment below in msixtbl_read().

Makes sense.

> > +
> > +    switch ( len )
> > +    {
> > +    case 1:
> > +        *pval = readb(hwaddr);
> > +        break;
> > +
> > +    case 2:
> > +        *pval = readw(hwaddr);
> > +        break;
> > +
> > +    case 4:
> > +        *pval = readl(hwaddr);
> > +        break;
> > +
> > +    case 8:
> > +        *pval = readq(hwaddr);
> > +        break;
> > +
> > +    default:
> > +        ASSERT_UNREACHABLE();
> > +        *pval = ~0UL;
> > +        break;
> > +    }
> > +
> > +    return X86EMUL_OKAY;
> > +}
> > +
> > +static int adjacent_write(
> > +    unsigned int fixmap_idx,
> > +    paddr_t address, unsigned int len, uint64_t val)
> > +{
> > +    void __iomem *hwaddr;
> > +
> > +    if ( fixmap_idx == ADJACENT_DISCARD_WRITE )
> > +        return X86EMUL_OKAY;
> > +
> > +    ASSERT(fixmap_idx <= FIX_MSIX_IO_RESERV_END);
> 
> Since you check the idx is sane, shouldn't you also assert idx >=
> FIX_MSIX_IO_RESERV_BASE?

If moving adjacent_handle() here, I'd simply drop this assert.

> > +
> > +    hwaddr = fix_to_virt(fixmap_idx) + PAGE_OFFSET(address);
> > +
> > +    switch ( len )
> > +    {
> > +    case 1:
> > +        writeb(val, hwaddr);
> > +        break;
> > +
> > +    case 2:
> > +        writew(val, hwaddr);
> > +        break;
> > +
> > +    case 4:
> > +        writel(val, hwaddr);
> > +        break;
> > +
> > +    case 8:
> > +        writeq(val, hwaddr);
> > +        break;
> > +
> > +    default:
> > +        ASSERT_UNREACHABLE();
> > +    }
> > +
> > +    return X86EMUL_OKAY;
> > +}
> > +
> >  static int cf_check msixtbl_read(
> >      const struct hvm_io_handler *handler, uint64_t address, uint32_t len,
> >      uint64_t *pval)
> > @@ -220,9 +356,10 @@ static int cf_check msixtbl_read(
> >      unsigned long offset;
> >      struct msixtbl_entry *entry;
> >      unsigned int nr_entry, index;
> > +    unsigned int adjacent_fixmap;
> >      int r = X86EMUL_UNHANDLEABLE;
> >  
> > -    if ( (len != 4 && len != 8) || (address & (len - 1)) )
> > +    if ( !IS_ALIGNED(address, len) )
> >          return r;
> >  
> >      rcu_read_lock(&msixtbl_rcu_lock);
> > @@ -230,6 +367,21 @@ static int cf_check msixtbl_read(
> >      entry = msixtbl_find_entry(current, address);
> >      if ( !entry )
> >          goto out;
> > +
> > +    adjacent_fixmap = adjacent_handle(entry, address, false);
> 
> This seems overly complicated, but is possible I'm missing some logic.
> 
> IMO it would seem way less convoluted to simply do:
> 
> entry = msixtbl_find_entry(current, address);
> if ( !entry )
>     goto out;
> 
> if ( address < entry->gtable ||
>      address >= entry->gtable + entry->table_len )
> {
>     adjacent_read(...);
>     goto out;
> }
> 
> And put all the logic in adjacent_{read,write}() directly rather than
> having both adjacent_{read,write}() plus adjacent_handle() calls here?
> 
> If the access doesn't fall between the boundaries of the MSI-X table
> it's either going to be a handled adjacent access, or it's going to be
> discarded.

Discarded - should it return X86EMUL_OKAY in that case? Currently it
returns X86EMUL_UNHANDLABLE in case adjacent access isn't handled (for
any reason) either.

> > +    if ( adjacent_fixmap != ADJACENT_DONT_HANDLE )
> > +    {
> > +        r = adjacent_read(adjacent_fixmap, address, len, pval);
> > +        goto out;
> > +    }
> > +
> > +    if ( address < entry->gtable ||
> > +         address >= entry->gtable + entry->table_len )
> > +        goto out;
> > +
> > +    if ( len != 4 && len != 8 )
> > +        goto out;
> > +
> >      offset = address & (PCI_MSIX_ENTRY_SIZE - 1);
> >  
> >      if ( offset != PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET )
> > @@ -282,6 +434,7 @@ static int msixtbl_write(struct vcpu *v, unsigned long 
> > address,
> >      int r = X86EMUL_UNHANDLEABLE;
> >      unsigned long flags;
> >      struct irq_desc *desc;
> > +    unsigned int adjacent_fixmap;
> >  
> >      if ( !IS_ALIGNED(address, len) )
> >          return X86EMUL_OKAY;
> > @@ -291,6 +444,21 @@ static int msixtbl_write(struct vcpu *v, unsigned long 
> > address,
> >      entry = msixtbl_find_entry(v, address);
> >      if ( !entry )
> >          goto out;
> > +
> > +    adjacent_fixmap = adjacent_handle(entry, address, true);
> > +    if ( adjacent_fixmap != ADJACENT_DONT_HANDLE )
> > +    {
> > +        r = adjacent_write(adjacent_fixmap, address, len, val);
> > +        goto out;
> > +    }
> > +
> > +    if ( address < entry->gtable ||
> > +         address >= entry->gtable + entry->table_len )
> > +        goto out;
> > +
> > +    if ( len != 4 && len != 8 )
> > +        goto out;
> > +
> >      nr_entry = array_index_nospec(((address - entry->gtable) /
> >                                     PCI_MSIX_ENTRY_SIZE),
> >                                    MAX_MSIX_TABLE_ENTRIES);
> > @@ -356,8 +524,8 @@ static int cf_check _msixtbl_write(
> >      const struct hvm_io_handler *handler, uint64_t address, uint32_t len,
> >      uint64_t val)
> >  {
> > -    /* Ignore invalid length or unaligned writes. */
> > -    if ( (len != 4 && len != 8) || !IS_ALIGNED(address, len) )
> > +    /* Ignore unaligned writes. */
> > +    if ( !IS_ALIGNED(address, len) )
> >          return X86EMUL_OKAY;
> >  
> >      /*
> > @@ -374,14 +542,22 @@ static bool cf_check msixtbl_range(
> >  {
> >      struct vcpu *curr = current;
> >      unsigned long addr = r->addr;
> > -    const struct msi_desc *desc;
> > +    const struct msixtbl_entry *entry;
> > +    const struct msi_desc *desc = NULL;
> > +    unsigned int adjacent_fixmap;
> >  
> >      ASSERT(r->type == IOREQ_TYPE_COPY);
> >  
> >      rcu_read_lock(&msixtbl_rcu_lock);
> > -    desc = msixtbl_addr_to_desc(msixtbl_find_entry(curr, addr), addr);
> > +    entry = msixtbl_find_entry(curr, addr);
> > +    adjacent_fixmap = adjacent_handle(entry, addr, false);
> > +    if ( adjacent_fixmap == ADJACENT_DONT_HANDLE )
> > +        desc = msixtbl_addr_to_desc(entry, addr);
> 
> I'm not sure you need adjacent_handle() here, I would think that any
> address adjacent to the MSI-X table Xen would want to handle
> unconditionally, and hence msixtbl_range() should return true:

I put it here to not duplicate a set of checks for adjacent access. It
isn't only about the range, but also few other case (like if fixmap
indices are set).

> entry = msixtbl_find_entry(curr, addr);
> if ( !entry )
>     return 0;
> 
> if ( addr < entry->gtable || addr >= entry->gtable + entry->table_len )
>     /* Possibly handle adjacent access. */
>     return 1;
> 
> desc = msixtbl_find_entry(curr, addr);
> ...
> 
> (Missing the _unlock calls in the chunk above)
> 
> There's no other processing that can happen for an adjacent access
> unless it's are handled here.  Otherwise such accesses will be
> discarded anyway?  Hence better short-circuit the MMIO handler search
> earlier.

Can't there be some ioreq server that could theoretically handle them?

> >      rcu_read_unlock(&msixtbl_rcu_lock);
> >  
> > +    if ( adjacent_fixmap != ADJACENT_DONT_HANDLE )
> > +        return 1;
> > +
> >      if ( desc )
> >          return 1;
> >  
> > @@ -622,12 +798,16 @@ void msix_write_completion(struct vcpu *v)
> >           v->arch.hvm.hvm_io.msix_snoop_gpa )
> >      {
> >          unsigned int token = hvmemul_cache_disable(v);
> > -        const struct msi_desc *desc;
> > +        const struct msi_desc *desc = NULL;
> > +        const struct msixtbl_entry *entry;
> >          uint32_t data;
> >  
> >          rcu_read_lock(&msixtbl_rcu_lock);
> > -        desc = msixtbl_addr_to_desc(msixtbl_find_entry(v, snoop_addr),
> > -                                    snoop_addr);
> > +        entry = msixtbl_find_entry(v, snoop_addr);
> > +        if ( entry &&
> > +             snoop_addr >= entry->gtable &&
> > +             snoop_addr < entry->gtable + entry->table_len )
> > +            desc = msixtbl_addr_to_desc(entry, snoop_addr);
> 
> This would be easier if you added the MSI-X table boundary checks in
> msixtbl_addr_to_desc() itself, rather than open-coding here.  That way
> you don't need to modify msix_write_completion() at all.  It also
> makes msixtbl_addr_to_desc() more robust in case it gets called with
> bogus addresses.

Makes sense I think.

> >          rcu_read_unlock(&msixtbl_rcu_lock);
> >  
> >          if ( desc &&
> > diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h
> > index bcfdfd35345d..923b730d48b8 100644
> > --- a/xen/arch/x86/include/asm/msi.h
> > +++ b/xen/arch/x86/include/asm/msi.h
> > @@ -224,6 +224,9 @@ struct arch_msix {
> >      } table, pba;
> >      int table_refcnt[MAX_MSIX_TABLE_PAGES];
> >      int table_idx[MAX_MSIX_TABLE_PAGES];
> > +#define ADJ_IDX_FIRST 0
> > +#define ADJ_IDX_LAST  1
> > +    unsigned int adj_access_idx[2];
> >      spinlock_t table_lock;
> >      bool host_maskall, guest_maskall;
> >      domid_t warned_domid;
> > @@ -231,6 +234,8 @@ struct arch_msix {
> >          uint8_t all;
> >          struct {
> >              bool maskall                   : 1;
> > +            bool adjacent_not_initialized  : 1;
> > +            bool adjacent_pba              : 1;
> >          };
> >      } warned_kind;
> >  };
> > diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
> > index 42c793426da3..c77b81896269 100644
> > --- a/xen/arch/x86/msi.c
> > +++ b/xen/arch/x86/msi.c
> > @@ -913,6 +913,36 @@ static int msix_capability_init(struct pci_dev *dev,
> >          list_add_tail(&entry->list, &dev->msi_list);
> >          *desc = entry;
> >      }
> > +    else
> > +    {
> > +        /*
> > +         * If the MSI-X table doesn't start at the page boundary, map the 
> > first page for
> > +         * passthrough accesses.
> > +         */
> > +        if ( PAGE_OFFSET(table_paddr) )
> > +        {
> > +            int idx = msix_get_fixmap(msix, table_paddr, table_paddr);
> > +
> > +            if ( idx > 0 )
> > +                msix->adj_access_idx[ADJ_IDX_FIRST] = idx;
> > +            else
> > +                gprintk(XENLOG_ERR, "Failed to map first MSI-X table page: 
> > %d\n", idx);
> > +        }
> > +        /*
> > +         * If the MSI-X table doesn't end on the page boundary, map the 
> > last page
> > +         * for passthrough accesses.
> > +         */
> > +        if ( PAGE_OFFSET(table_paddr + msix->nr_entries * 
> > PCI_MSIX_ENTRY_SIZE) )
> 
> This check is correct ....
> 
> > +        {
> > +            uint64_t entry_paddr = table_paddr + msix->nr_entries * 
> > PCI_MSIX_ENTRY_SIZE;
> 
> ... however for correctness you want to use msix->nr_entries - 1 here,
> otherwise you are requesting an address that's past the last MSI-X
> table entry, and hence msix_get_fixmap() could refuse to provide an
> idx if it ever does boundary checking.

Ok.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

Attachment: signature.asc
Description: PGP signature


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.