[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v7 2/6] x86/hvm: Allow access to registers on the same page as MSI-X table
On Wed, May 08, 2024 at 06:09:48PM +0200, Roger Pau Monné wrote: > On Tue, May 07, 2024 at 02:44:02PM +0200, Marek Marczykowski-Górecki wrote: > > Some devices (notably Intel Wifi 6 AX210 card) keep auxiliary registers > > on the same page as MSI-X table. Device model (especially one in > > stubdomain) cannot really handle those, as direct writes to that page is > > refused (page is on the mmio_ro_ranges list). Instead, extend > > msixtbl_mmio_ops to handle such accesses too. > > > > Doing this, requires correlating read/write location with guest > > MSI-X table address. Since QEMU doesn't map MSI-X table to the guest, > > it requires msixtbl_entry->gtable, which is HVM-only. Similar feature > > for PV would need to be done separately. > > > > This will be also used to read Pending Bit Array, if it lives on the same > > page, making QEMU not needing /dev/mem access at all (especially helpful > > with lockdown enabled in dom0). If PBA lives on another page, QEMU will > > map it to the guest directly. > > If PBA lives on the same page, discard writes and log a message. > > Technically, writes outside of PBA could be allowed, but at this moment > > the precise location of PBA isn't saved, and also no known device abuses > > the spec in this way (at least yet). > > > > To access those registers, msixtbl_mmio_ops need the relevant page > > mapped. MSI handling already has infrastructure for that, using fixmap, > > so try to map first/last page of the MSI-X table (if necessary) and save > > their fixmap indexes. Note that msix_get_fixmap() does reference > > counting and reuses existing mapping, so just call it directly, even if > > the page was mapped before. Also, it uses a specific range of fixmap > > indexes which doesn't include 0, so use 0 as default ("not mapped") > > value - which simplifies code a bit. > > > > Based on assumption that all MSI-X page accesses are handled by Xen, do > > not forward adjacent accesses to other hypothetical ioreq servers, even > > if the access wasn't handled for some reason (failure to map pages etc). > > Relevant places log a message about that already. > > > > Signed-off-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx> > > Thanks, just a couple of minor comments, I think the only relevant one > is that you can drop ADJACENT_DONT_HANDLE unless there's something > I'm missing. The rest are mostly cosmetic, but if you have to respin > and agree with them might be worth addressing. > > Sorry for giving this feedback so late in the process, I should have > attempted to review earlier versions. > > > --- > > Changes in v7: > > - simplify logic based on assumption that all access to MSI-X pages are > > handled by Xen (Roger) > > - move calling adjacent_handle() into adjacent_{read,write}() (Roger) > > - move range check into msixtbl_addr_to_desc() (Roger) > > - fix off-by-one when initializing adj_access_idx[ADJ_IDX_LAST] (Roger) > > - no longer distinguish between unhandled write due to PBA nearby and > > other reasons > > - add missing break after ASSERT_UNREACHABLE (Jan) > > Changes in v6: > > - use MSIX_CHECK_WARN macro > > - extend assert on fixmap_idx > > - add break in default label, after ASSERT_UNREACHABLE(), and move > > setting default there > > - style fixes > > Changes in v5: > > - style fixes > > - include GCC version in the commit message > > - warn only once (per domain, per device) about failed adjacent access > > Changes in v4: > > - drop same_page parameter of msixtbl_find_entry(), distinguish two > > cases in relevant callers > > - rename adj_access_table_idx to adj_access_idx > > - code style fixes > > - drop alignment check in adjacent_{read,write}() - all callers already > > have it earlier > > - delay mapping first/last MSI-X pages until preparing device for a > > passthrough > > v3: > > - merge handling into msixtbl_mmio_ops > > - extend commit message > > v2: > > - adjust commit message > > - pass struct domain to msixtbl_page_handler_get_hwaddr() > > - reduce local variables used only once > > - log a warning if write is forbidden if MSI-X and PBA lives on the same > > page > > - do not passthrough unaligned accesses > > - handle accesses both before and after MSI-X table > > --- > > xen/arch/x86/hvm/vmsi.c | 205 ++++++++++++++++++++++++++++++++-- > > xen/arch/x86/include/asm/msi.h | 5 +- > > xen/arch/x86/msi.c | 42 +++++++- > > 3 files changed, 242 insertions(+), 10 deletions(-) > > > > diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c > > index 999917983789..f7b7b4998b5e 100644 > > --- a/xen/arch/x86/hvm/vmsi.c > > +++ b/xen/arch/x86/hvm/vmsi.c > > @@ -180,6 +180,10 @@ static bool msixtbl_initialised(const struct domain *d) > > return d->arch.hvm.msixtbl_list.next; > > } > > > > +/* > > + * Lookup an msixtbl_entry on the same page as given addr. It's up to the > > + * caller to check if address is strictly part of the table - if relevant. > > + */ > > static struct msixtbl_entry *msixtbl_find_entry( > > struct vcpu *v, unsigned long addr) > > { > > @@ -187,8 +191,8 @@ static struct msixtbl_entry *msixtbl_find_entry( > > struct domain *d = v->domain; > > > > list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list ) > > - if ( addr >= entry->gtable && > > - addr < entry->gtable + entry->table_len ) > > + if ( PFN_DOWN(addr) >= PFN_DOWN(entry->gtable) && > > + PFN_DOWN(addr) <= PFN_DOWN(entry->gtable + entry->table_len - > > 1) ) > > return entry; > > > > return NULL; > > @@ -203,6 +207,10 @@ static struct msi_desc *msixtbl_addr_to_desc( > > if ( !entry || !entry->pdev ) > > return NULL; > > > > + if ( addr < entry->gtable || > > + addr >= entry->gtable + entry->table_len ) > > + return NULL; > > + > > nr_entry = (addr - entry->gtable) / PCI_MSIX_ENTRY_SIZE; > > > > list_for_each_entry( desc, &entry->pdev->msi_list, list ) > > @@ -213,6 +221,152 @@ static struct msi_desc *msixtbl_addr_to_desc( > > return NULL; > > } > > > > +/* > > + * Returns: > > + * - ADJACENT_DONT_HANDLE if no handling should be done > > + * - a fixmap idx to use for handling > > + */ > > +#define ADJACENT_DONT_HANDLE UINT_MAX > > Isn't it fine to just return 0 to signal that the access is not > handled? > > fixmap index 0 is reserved anyway (see FIX_RESERVED), so could be used > for this purpose and then you don't need to introduce > ADJACENT_DONT_HANDLE? It was this way before in v2 and you asked me to not use 0 for this purpose... > > +static unsigned int adjacent_handle( > > + const struct msixtbl_entry *entry, unsigned long addr, bool write) > > Now that it has been quite pruned, get_adjacent_idx() or some such > might be more inline with the function logic. makes sense > > > +{ > > + unsigned int adj_type; > > + struct arch_msix *msix; > > + > > + if ( !entry || !entry->pdev ) > > + { > > + ASSERT_UNREACHABLE(); > > + return ADJACENT_DONT_HANDLE; > > + } > > + > > + if ( PFN_DOWN(addr) == PFN_DOWN(entry->gtable) && addr < entry->gtable > > ) > > + adj_type = ADJ_IDX_FIRST; > > + else if ( PFN_DOWN(addr) == PFN_DOWN(entry->gtable + entry->table_len > > - 1) && > > + addr >= entry->gtable + entry->table_len ) > > + adj_type = ADJ_IDX_LAST; > > + else > > + { > > + /* All callers should already do equivalent range checking. */ > > + ASSERT_UNREACHABLE(); > > + return ADJACENT_DONT_HANDLE; > > + } > > + > > + msix = entry->pdev->msix; > > + ASSERT(msix); > > Since you already do it above, would be best to do: > > if ( !msix ) > { > ASSERT_UNREACHABLE(); > return 0; > } Ok. > > + > > + if ( !msix->adj_access_idx[adj_type] ) > > + { > > + if ( MSIX_CHECK_WARN(msix, entry->pdev->domain->domain_id, > > + adjacent_not_initialized) ) > > + gprintk(XENLOG_WARNING, > > + "Page for adjacent(%d) MSI-X table access not > > initialized for %pp (addr %#lx, gtable %#lx\n", > > Do you really need to log an error here? There's an error already > printed in msix_capability_init() if the adjacent pages can't be > mapped. IMO it's better to keep this message, otherwise it might be pretty hard to debug not working device - a message buried somewhere on startup might be hard to correlate with an issue much later. > > + adj_type, &entry->pdev->sbdf, addr, entry->gtable); > > + return ADJACENT_DONT_HANDLE; > > + } > > + > > + /* If PBA lives on the same page too, discard writes. */ > > + if ( write && > > + ((adj_type == ADJ_IDX_LAST && > > + msix->table.last == msix->pba.first) || > > + (adj_type == ADJ_IDX_FIRST && > > + msix->table.first == msix->pba.last)) ) > > + { > > + if ( MSIX_CHECK_WARN(msix, entry->pdev->domain->domain_id, > > + adjacent_pba) ) > > + gprintk(XENLOG_WARNING, > > + "MSI-X table and PBA of %pp live on the same page, " > > + "writing to other registers there is not > > implemented\n", > > + &entry->pdev->sbdf); > > I would usually start those errors with the SBDF, as that makes it > easier to identify error originating from the same device: > > "%pp: MSI-X table and PBA share a page, discard write to adjacent memory > (%#lx)\n", > &entry->pdev->sbdf, addr ok > > + return ADJACENT_DONT_HANDLE; > > + } > > + > > + return msix->adj_access_idx[adj_type]; > > +} > > + > > +static int adjacent_read( > > + const struct msixtbl_entry *entry, > > + paddr_t address, unsigned int len, uint64_t *pval) > > +{ > > + const void __iomem *hwaddr; > > + unsigned int fixmap_idx; > > + > > I would add an ASSERT(IS_ALIGNED(address, len)) (and in > adjacent_write()) just in case, as otherwise the access could cross a > page boundary. ok > > + fixmap_idx = adjacent_handle(entry, address, false); > > + > > + if ( fixmap_idx == ADJACENT_DONT_HANDLE ) > > + { > > + *pval = ~0UL; > > + return X86EMUL_OKAY; > > + } > > FWIW, I find it safer to unconditionally init *pval = ~0UL at the > start of the function, and then the return here and in the default > switch statement case can avoid setting it. It's less easy to return > without the variable being set. It was this way in v5, but Jan asked me to move it to only relevant branch. > > + hwaddr = fix_to_virt(fixmap_idx) + PAGE_OFFSET(address); > > + > > + switch ( len ) > > + { > > + case 1: > > + *pval = readb(hwaddr); > > + break; > > + > > + case 2: > > + *pval = readw(hwaddr); > > + break; > > + > > + case 4: > > + *pval = readl(hwaddr); > > + break; > > + > > + case 8: > > + *pval = readq(hwaddr); > > + break; > > + > > + default: > > + ASSERT_UNREACHABLE(); > > + *pval = ~0UL; > > + break; > > We should likely move this to some kind of helpers, as it's now > exactly the same that's done in adjacent_{read,write}() in > vpci/msix.c (not that you should do it here, just a note). > > > + } > > + > > + return X86EMUL_OKAY; > > +} > > + > > +static int adjacent_write( > > + const struct msixtbl_entry *entry, > > + paddr_t address, unsigned int len, uint64_t val) > > +{ > > + void __iomem *hwaddr; > > + unsigned int fixmap_idx; > > + > > + fixmap_idx = adjacent_handle(entry, address, true); > > + > > + if ( fixmap_idx == ADJACENT_DONT_HANDLE ) > > + return X86EMUL_OKAY; > > + > > + hwaddr = fix_to_virt(fixmap_idx) + PAGE_OFFSET(address); > > + > > + switch ( len ) > > + { > > + case 1: > > + writeb(val, hwaddr); > > + break; > > + > > + case 2: > > + writew(val, hwaddr); > > + break; > > + > > + case 4: > > + writel(val, hwaddr); > > + break; > > + > > + case 8: > > + writeq(val, hwaddr); > > + break; > > + > > + default: > > + ASSERT_UNREACHABLE(); > > + break; > > + } > > + > > + return X86EMUL_OKAY; > > +} > > + > > static int cf_check msixtbl_read( > > const struct hvm_io_handler *handler, uint64_t address, uint32_t len, > > uint64_t *pval) > > @@ -222,7 +376,7 @@ static int cf_check msixtbl_read( > > unsigned int nr_entry, index; > > int r = X86EMUL_UNHANDLEABLE; > > > > - if ( (len != 4 && len != 8) || (address & (len - 1)) ) > > + if ( !IS_ALIGNED(address, len) ) > > return r; > > > > rcu_read_lock(&msixtbl_rcu_lock); > > @@ -230,6 +384,17 @@ static int cf_check msixtbl_read( > > entry = msixtbl_find_entry(current, address); > > if ( !entry ) > > goto out; > > + > > + if ( address < entry->gtable || > > + address >= entry->gtable + entry->table_len ) > > + { > > + r = adjacent_read(entry, address, len, pval); > > + goto out; > > + } > > + > > + if ( len != 4 && len != 8 ) > > + goto out; > > + > > offset = address & (PCI_MSIX_ENTRY_SIZE - 1); > > > > if ( offset != PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET ) > > @@ -291,6 +456,17 @@ static int msixtbl_write(struct vcpu *v, unsigned long > > address, > > entry = msixtbl_find_entry(v, address); > > if ( !entry ) > > goto out; > > + > > + if ( address < entry->gtable || > > + address >= entry->gtable + entry->table_len ) > > + { > > + r = adjacent_write(entry, address, len, val); > > + goto out; > > + } > > + > > + if ( len != 4 && len != 8 ) > > + goto out; > > + > > nr_entry = array_index_nospec(((address - entry->gtable) / > > PCI_MSIX_ENTRY_SIZE), > > MAX_MSIX_TABLE_ENTRIES); > > @@ -356,8 +532,8 @@ static int cf_check _msixtbl_write( > > const struct hvm_io_handler *handler, uint64_t address, uint32_t len, > > uint64_t val) > > { > > - /* Ignore invalid length or unaligned writes. */ > > - if ( (len != 4 && len != 8) || !IS_ALIGNED(address, len) ) > > + /* Ignore unaligned writes. */ > > + if ( !IS_ALIGNED(address, len) ) > > return X86EMUL_OKAY; > > > > /* > > @@ -374,16 +550,25 @@ static bool cf_check msixtbl_range( > > { > > struct vcpu *curr = current; > > unsigned long addr = r->addr; > > - const struct msi_desc *desc; > > + const struct msixtbl_entry *entry; > > + bool ret = false; > > > > ASSERT(r->type == IOREQ_TYPE_COPY); > > > > rcu_read_lock(&msixtbl_rcu_lock); > > - desc = msixtbl_addr_to_desc(msixtbl_find_entry(curr, addr), addr); > > + entry = msixtbl_find_entry(curr, addr); > > + if ( entry ) > > + { > > + if ( addr < entry->gtable || addr >= entry->gtable + > > entry->table_len ) > > + /* Possibly handle adjacent access. */ > > + ret = true; > > + else > > + ret = msixtbl_addr_to_desc(entry, addr) != NULL; > > + } > > You could probably put all this into a single condition: > > if ( entry && > /* Adjacent access. */ > (addr < entry->gtable || addr >= entry->gtable + entry->table_len || > /* Otherwise check if there's a matching msi_desc. */ > msixtbl_addr_to_desc(entry, addr)) ) > ret = true; > > That's IMO easier to read by setting ret once only. Is multi-line "if" mixed with comments really easier to follow? > > > rcu_read_unlock(&msixtbl_rcu_lock); > > > > - if ( desc ) > > - return 1; > > + if ( ret ) > > + return ret; > > > > if ( r->state == STATE_IOREQ_READY && r->dir == IOREQ_WRITE ) > > { > > @@ -429,7 +614,7 @@ static bool cf_check msixtbl_range( > > } > > } > > > > - return 0; > > + return false; > > } > > > > static const struct hvm_io_ops msixtbl_mmio_ops = { > > diff --git a/xen/arch/x86/include/asm/msi.h b/xen/arch/x86/include/asm/msi.h > > index bcfdfd35345d..923b730d48b8 100644 > > --- a/xen/arch/x86/include/asm/msi.h > > +++ b/xen/arch/x86/include/asm/msi.h > > @@ -224,6 +224,9 @@ struct arch_msix { > > } table, pba; > > int table_refcnt[MAX_MSIX_TABLE_PAGES]; > > int table_idx[MAX_MSIX_TABLE_PAGES]; > > +#define ADJ_IDX_FIRST 0 > > +#define ADJ_IDX_LAST 1 > > + unsigned int adj_access_idx[2]; > > spinlock_t table_lock; > > bool host_maskall, guest_maskall; > > domid_t warned_domid; > > @@ -231,6 +234,8 @@ struct arch_msix { > > uint8_t all; > > struct { > > bool maskall : 1; > > + bool adjacent_not_initialized : 1; > > Not sure we need that, since we already warn of failure to map at > initialization time, not sure it's worth also printing another error > at access time. See response earlier above. > > + bool adjacent_pba : 1; > > }; > > } warned_kind; > > }; > > diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c > > index 42c793426da3..87190a88ed5d 100644 > > --- a/xen/arch/x86/msi.c > > +++ b/xen/arch/x86/msi.c > > @@ -913,6 +913,37 @@ static int msix_capability_init(struct pci_dev *dev, > > list_add_tail(&entry->list, &dev->msi_list); > > *desc = entry; > > } > > + else > > + { > > + /* > > + * If the MSI-X table doesn't start at the page boundary, map the > > first page for > > + * passthrough accesses. > > + */ > > + if ( PAGE_OFFSET(table_paddr) ) > > + { > > + int idx = msix_get_fixmap(msix, table_paddr, table_paddr); > > + > > + if ( idx > 0 ) > > + msix->adj_access_idx[ADJ_IDX_FIRST] = idx; > > + else > > + gprintk(XENLOG_ERR, "Failed to map first MSI-X table page: > > %d\n", idx); > > + } > > + /* > > + * If the MSI-X table doesn't end on the page boundary, map the > > last page > > + * for passthrough accesses. > > + */ > > + if ( PAGE_OFFSET(table_paddr + msix->nr_entries * > > PCI_MSIX_ENTRY_SIZE) ) > > + { > > + uint64_t entry_paddr = table_paddr + > > + (msix->nr_entries - 1) * PCI_MSIX_ENTRY_SIZE; > > + int idx = msix_get_fixmap(msix, table_paddr, entry_paddr); > > + > > + if ( idx > 0 ) > > + msix->adj_access_idx[ADJ_IDX_LAST] = idx; > > + else > > + gprintk(XENLOG_ERR, "Failed to map last MSI-X table page: > > %d\n", idx); > > Could you prefix the messages with the SBDF of the device please? > It's in the seg, bus, slot, func local variables AFAICT. Ok (%pp + dev->sbdf). -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab Attachment:
signature.asc
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |