[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v13 01/14] vpci: use per-domain PCI lock to protect vpci structure
On 2/13/24 03:35, Roger Pau Monné wrote: > On Fri, Feb 02, 2024 at 04:33:05PM -0500, Stewart Hildebrand wrote: >> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx> >> >> Use the per-domain PCI read/write lock to protect the presence of the >> pci device vpci field. This lock can be used (and in a few cases is used >> right away) so that vpci removal can be performed while holding the lock >> in write mode. Previously such removal could race with vpci_read for >> example. >> >> When taking both d->pci_lock and pdev->vpci->lock, they should be >> taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid >> possible deadlock situations. >> >> 1. Per-domain's pci_lock is used to protect pdev->vpci structure >> from being removed. >> >> 2. Writing the command register and ROM BAR register may trigger >> modify_bars to run, which in turn may access multiple pdevs while >> checking for the existing BAR's overlap. The overlapping check, if >> done under the read lock, requires vpci->lock to be acquired on both >> devices being compared, which may produce a deadlock. It is not >> possible to upgrade read lock to write lock in such a case. So, in >> order to prevent the deadlock, use d->pci_lock in write mode instead. >> >> All other code, which doesn't lead to pdev->vpci destruction and does >> not access multiple pdevs at the same time, can still use a >> combination of the read lock and pdev->vpci->lock. >> >> 3. Drop const qualifier where the new rwlock is used and this is >> appropriate. >> >> 4. Do not call process_pending_softirqs with any locks held. For that >> unlock prior the call and re-acquire the locks after. After >> re-acquiring the lock there is no need to check if pdev->vpci exists: >> - in apply_map because of the context it is called (no race condition >> possible) >> - for MSI/MSI-X debug code because it is called at the end of >> pdev->vpci access and no further access to pdev->vpci is made >> >> 5. Use d->pci_lock around for_each_pdev and pci_get_pdev() >> while accessing pdevs in vpci code. >> >> 6. Switch vPCI functions to use per-domain pci_lock for ensuring pdevs >> do not go away. The vPCI functions call several MSI-related functions >> which already have existing non-vPCI callers. Change those MSI-related >> functions to allow using either pcidevs_lock() or d->pci_lock for >> ensuring pdevs do not go away. Holding d->pci_lock in read mode is >> sufficient. Note that this pdev protection mechanism does not protect >> other state or critical sections. These MSI-related functions already >> have other race condition and state protection mechanims (e.g. >> d->event_lock and msixtbl RCU), so we deduce that the use of the global >> pcidevs_lock() is to ensure that pdevs do not go away. Existing non-vPCI >> callers of these MSI-related functions will remain (ab)using the global >> pcidevs_lock() to ensure pdevs do not go away so as to minimize changes >> to existing non-vPCI call paths. >> >> 7. Introduce wrapper construct, pdev_list_is_read_locked(), for checking >> that pdevs do not go away. The purpose of this wrapper is to aid >> readability and document the intent of the pdev protection mechanism. > > I would add that when possible, the existing callers haven't been > switched to use the newly introduced per-domain pci_lock, and will > continue to use the global pcidevs lock. This is done to reduce the > risk of the new locking scheme introducing regressions. Those users > will be adjusted in due time. I'll use this wording, thanks > > IIRC Jan had concerns about why some existing use-cases are not > switched straight to use the new per-domain pci_lock in this patch. I hope the clarified commit description addresses this > >> >> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> >> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx> >> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx> >> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx> >> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@xxxxxxx> >> --- >> Changes in v13: >> - hold off adding Roger's R-b tag even though it was provided on v12.2 >> - use a wrapper construct to ease readability of odd-looking ASSERTs >> - new placement of ASSERT in __pci_enable_msix(), __pci_enable_msi(), >> and pci_enable_msi(). Rearrange/add pdev NULL check. >> - expand commit description with details about using either >> pcidevs_lock() or d->pci_lock >> >> Changes in v12.2: >> - drop Roger's R-b >> - drop both locks on error paths in vpci_msix_arch_print() >> - add another ASSERT in vpci_msix_arch_print(), to enforce the >> expectation both locks are held before calling vpci_msix_arch_print() >> - move pdev_done label in vpci_dump_msi() >> - update comments in vpci_dump_msi() to say locks (plural) >> >> Changes in v12.1: >> - use read_trylock() in vpci_msix_arch_print() >> - fixup in-code comments (revert double space, use DomXEN) in >> vpci_{read,write}() >> - minor updates in commit message >> - add Roger's R-b >> >> Changes in v12: >> - s/pci_rwlock/pci_lock/ in commit message >> - expand comment about scope of pci_lock in sched.h >> - in vpci_{read,write}, if hwdom is trying to access a device assigned >> to dom_xen, holding hwdom->pci_lock is sufficient (no need to hold >> dom_xen->pci_lock) >> - reintroduce ASSERT in vmx_pi_update_irte() >> - reintroduce ASSERT in __pci_enable_msi{x}() >> - delete note 6. in commit message about removing ASSERTs since we have >> reintroduced them >> >> Changes in v11: >> - Fixed commit message regarding possible spinlocks >> - Removed parameter from allocate_and_map_msi_pirq(), which was added >> in the prev version. Now we are taking pcidevs_lock in >> physdev_map_pirq() >> - Returned ASSERT to pci_enable_msi >> - Fixed case when we took read lock instead of write one >> - Fixed label indentation >> >> Changes in v10: >> - Moved printk pas locked area >> - Returned back ASSERTs >> - Added new parameter to allocate_and_map_msi_pirq() so it knows if >> it should take the global pci lock >> - Added comment about possible improvement in vpci_write >> - Changed ASSERT(rw_is_locked()) to rw_is_write_locked() in >> appropriate places >> - Renamed release_domain_locks() to release_domain_write_locks() >> - moved domain_done label in vpci_dump_msi() to correct place >> Changes in v9: >> - extended locked region to protect vpci_remove_device and >> vpci_add_handlers() calls >> - vpci_write() takes lock in the write mode to protect >> potential call to modify_bars() >> - renamed lock releasing function >> - removed ASSERT()s from msi code >> - added trylock in vpci_dump_msi >> >> Changes in v8: >> - changed d->vpci_lock to d->pci_lock >> - introducing d->pci_lock in a separate patch >> - extended locked region in vpci_process_pending >> - removed pcidevs_lockis vpci_dump_msi() >> - removed some changes as they are not needed with >> the new locking scheme >> - added handling for hwdom && dom_xen case >> --- >> xen/arch/x86/hvm/vmsi.c | 31 +++++++++++++-------- >> xen/arch/x86/hvm/vmx/vmx.c | 2 +- >> xen/arch/x86/irq.c | 8 +++--- >> xen/arch/x86/msi.c | 20 +++++++++----- >> xen/arch/x86/physdev.c | 2 ++ >> xen/drivers/passthrough/pci.c | 9 +++--- >> xen/drivers/vpci/header.c | 18 ++++++++++++ >> xen/drivers/vpci/msi.c | 30 +++++++++++++++++--- >> xen/drivers/vpci/msix.c | 52 ++++++++++++++++++++++++++++++----- >> xen/drivers/vpci/vpci.c | 24 ++++++++++++++-- >> xen/include/xen/sched.h | 15 +++++++++- >> 11 files changed, 170 insertions(+), 41 deletions(-) >> >> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c >> index 128f23636279..f29089178a59 100644 >> --- a/xen/arch/x86/hvm/vmsi.c >> +++ b/xen/arch/x86/hvm/vmsi.c >> @@ -468,7 +468,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq >> *pirq, uint64_t gtable) >> struct msixtbl_entry *entry, *new_entry; >> int r = -EINVAL; >> >> - ASSERT(pcidevs_locked()); >> + ASSERT(pdev_list_is_read_locked(d)); >> ASSERT(rw_is_write_locked(&d->event_lock)); >> >> if ( !msixtbl_initialised(d) ) >> @@ -538,7 +538,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq >> *pirq) >> struct pci_dev *pdev; >> struct msixtbl_entry *entry; >> >> - ASSERT(pcidevs_locked()); >> + ASSERT(pdev_list_is_read_locked(d)); >> ASSERT(rw_is_write_locked(&d->event_lock)); >> >> if ( !msixtbl_initialised(d) ) >> @@ -684,7 +684,7 @@ static int vpci_msi_update(const struct pci_dev *pdev, >> uint32_t data, >> { >> unsigned int i; >> >> - ASSERT(pcidevs_locked()); >> + ASSERT(rw_is_locked(&pdev->domain->pci_lock)); > > Any reason to not use the newly introduced helper here? I know the > pcidevs will never be locked here given the new lock usage, but still > it would be less confusing if the new helper was used consistently. Nope. Agreed, it would help readability, I'll switch to the helper. In addition to vpci_msi_update(), I assume this comment also applies to the remaining occurrences: xen/arch/x86/hvm/vmsi.c:vpci_msi_arch_update() xen/arch/x86/hvm/vmsi.c:vpci_msi_arch_enable() xen/arch/x86/hvm/vmsi.c:vpci_msi_disable() xen/arch/x86/hvm/vmsi.c:vpci_msix_arch_enable_entry() xen/drivers/vpci/msix.c:msix_find() But not: xen/arch/x86/hvm/vmsi.c:vpci_msix_arch_print() I'll add a suitable comment to this one exception. > > Otherwise we need a comment here as to why the helper can't be used, > in order to avoid confusion in the future. > >>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h >> index 9da91e0e6244..c3adec1aca3c 100644 >> --- a/xen/include/xen/sched.h >> +++ b/xen/include/xen/sched.h >> @@ -462,7 +462,8 @@ struct domain >> #ifdef CONFIG_HAS_PCI >> struct list_head pdev_list; >> /* >> - * pci_lock protects access to pdev_list. >> + * pci_lock protects access to pdev_list. pci_lock also protects >> pdev->vpci >> + * structure from being removed. >> * >> * Any user *reading* from pdev_list, or from devices stored in >> pdev_list, >> * should hold either pcidevs_lock() or pci_lock in read mode. >> Optionally, >> @@ -628,6 +629,18 @@ struct domain >> unsigned int cdf; >> }; >> >> +/* >> + * Check for use in ASSERTs to ensure that: >> + * 1. we can *read* d->pdev_list >> + * 2. pdevs (belonging to this domain) do not go away >> + * 3. pdevs (belonging to this domain) do not get assigned to other >> domains > > I think you can just state that this check ensures there will be no > changes to the entries in d->pdev_list, but not the contents of each > entry. No changes to d->pdev_list already ensures not devices can be > deassigned or removed from the system, and obviously makes the list > safe to iterate against. OK, I'll simplify the comment > > I would also drop the explicitly mention this is intended for ASSERT > usage: there's nothing specific in the code that prevents it from > being used in other places (albeit I think that's unlikely). > >> + * This check is not suitable for protecting other state or critical >> regions. >> + */ >> +#define pdev_list_is_read_locked(d) ({ \ > > I would be tempted to drop at least the '_read_' part from the name, > the name is getting a bit too long for my taste. > >> + struct domain *d_ = (d); \ > > Why do you need this local domain variable? Can't you use the d > parameter directly? > > Such assign will prevent using a const 'd' parameter, and 'd_' itself > should be const IMO (iff we really need this). > > Also sched.h is not the best place, can't you just place it in > pci.h? Yes, I'll move it to xen/include/xen/pci.h
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |