[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v13 01/14] vpci: use per-domain PCI lock to protect vpci structure



On Fri, Feb 02, 2024 at 04:33:05PM -0500, Stewart Hildebrand wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
> 
> Use the per-domain PCI read/write lock to protect the presence of the
> pci device vpci field. This lock can be used (and in a few cases is used
> right away) so that vpci removal can be performed while holding the lock
> in write mode. Previously such removal could race with vpci_read for
> example.
> 
> When taking both d->pci_lock and pdev->vpci->lock, they should be
> taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid
> possible deadlock situations.
> 
> 1. Per-domain's pci_lock is used to protect pdev->vpci structure
> from being removed.
> 
> 2. Writing the command register and ROM BAR register may trigger
> modify_bars to run, which in turn may access multiple pdevs while
> checking for the existing BAR's overlap. The overlapping check, if
> done under the read lock, requires vpci->lock to be acquired on both
> devices being compared, which may produce a deadlock. It is not
> possible to upgrade read lock to write lock in such a case. So, in
> order to prevent the deadlock, use d->pci_lock in write mode instead.
> 
> All other code, which doesn't lead to pdev->vpci destruction and does
> not access multiple pdevs at the same time, can still use a
> combination of the read lock and pdev->vpci->lock.
> 
> 3. Drop const qualifier where the new rwlock is used and this is
> appropriate.
> 
> 4. Do not call process_pending_softirqs with any locks held. For that
> unlock prior the call and re-acquire the locks after. After
> re-acquiring the lock there is no need to check if pdev->vpci exists:
>  - in apply_map because of the context it is called (no race condition
>    possible)
>  - for MSI/MSI-X debug code because it is called at the end of
>    pdev->vpci access and no further access to pdev->vpci is made
> 
> 5. Use d->pci_lock around for_each_pdev and pci_get_pdev()
> while accessing pdevs in vpci code.
> 
> 6. Switch vPCI functions to use per-domain pci_lock for ensuring pdevs
> do not go away. The vPCI functions call several MSI-related functions
> which already have existing non-vPCI callers. Change those MSI-related
> functions to allow using either pcidevs_lock() or d->pci_lock for
> ensuring pdevs do not go away. Holding d->pci_lock in read mode is
> sufficient. Note that this pdev protection mechanism does not protect
> other state or critical sections. These MSI-related functions already
> have other race condition and state protection mechanims (e.g.
> d->event_lock and msixtbl RCU), so we deduce that the use of the global
> pcidevs_lock() is to ensure that pdevs do not go away. Existing non-vPCI
> callers of these MSI-related functions will remain (ab)using the global
> pcidevs_lock() to ensure pdevs do not go away so as to minimize changes
> to existing non-vPCI call paths.
> 
> 7. Introduce wrapper construct, pdev_list_is_read_locked(), for checking
> that pdevs do not go away. The purpose of this wrapper is to aid
> readability and document the intent of the pdev protection mechanism.

I would add that when possible, the existing callers haven't been
switched to use the newly introduced per-domain pci_lock, and will
continue to use the global pcidevs lock.  This is done to reduce the
risk of the new locking scheme introducing regressions.  Those users
will be adjusted in due time.

IIRC Jan had concerns about why some existing use-cases are not
switched straight to use the new per-domain pci_lock in this patch.

> 
> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>
> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
> ---
> Changes in v13:
>  - hold off adding Roger's R-b tag even though it was provided on v12.2
>  - use a wrapper construct to ease readability of odd-looking ASSERTs
>  - new placement of ASSERT in __pci_enable_msix(), __pci_enable_msi(),
>    and pci_enable_msi(). Rearrange/add pdev NULL check.
>  - expand commit description with details about using either
>    pcidevs_lock() or d->pci_lock
> 
> Changes in v12.2:
>  - drop Roger's R-b
>  - drop both locks on error paths in vpci_msix_arch_print()
>  - add another ASSERT in vpci_msix_arch_print(), to enforce the
>    expectation both locks are held before calling vpci_msix_arch_print()
>  - move pdev_done label in vpci_dump_msi()
>  - update comments in vpci_dump_msi() to say locks (plural)
> 
> Changes in v12.1:
>  - use read_trylock() in vpci_msix_arch_print()
>  - fixup in-code comments (revert double space, use DomXEN) in
>    vpci_{read,write}()
>  - minor updates in commit message
>  - add Roger's R-b
> 
> Changes in v12:
>  - s/pci_rwlock/pci_lock/ in commit message
>  - expand comment about scope of pci_lock in sched.h
>  - in vpci_{read,write}, if hwdom is trying to access a device assigned
>    to dom_xen, holding hwdom->pci_lock is sufficient (no need to hold
>    dom_xen->pci_lock)
>  - reintroduce ASSERT in vmx_pi_update_irte()
>  - reintroduce ASSERT in __pci_enable_msi{x}()
>  - delete note 6. in commit message about removing ASSERTs since we have
>    reintroduced them
> 
> Changes in v11:
>  - Fixed commit message regarding possible spinlocks
>  - Removed parameter from allocate_and_map_msi_pirq(), which was added
>  in the prev version. Now we are taking pcidevs_lock in
>  physdev_map_pirq()
>  - Returned ASSERT to pci_enable_msi
>  - Fixed case when we took read lock instead of write one
>  - Fixed label indentation
> 
> Changes in v10:
>  - Moved printk pas locked area
>  - Returned back ASSERTs
>  - Added new parameter to allocate_and_map_msi_pirq() so it knows if
>  it should take the global pci lock
>  - Added comment about possible improvement in vpci_write
>  - Changed ASSERT(rw_is_locked()) to rw_is_write_locked() in
>    appropriate places
>  - Renamed release_domain_locks() to release_domain_write_locks()
>  - moved domain_done label in vpci_dump_msi() to correct place
> Changes in v9:
>  - extended locked region to protect vpci_remove_device and
>    vpci_add_handlers() calls
>  - vpci_write() takes lock in the write mode to protect
>    potential call to modify_bars()
>  - renamed lock releasing function
>  - removed ASSERT()s from msi code
>  - added trylock in vpci_dump_msi
> 
> Changes in v8:
>  - changed d->vpci_lock to d->pci_lock
>  - introducing d->pci_lock in a separate patch
>  - extended locked region in vpci_process_pending
>  - removed pcidevs_lockis vpci_dump_msi()
>  - removed some changes as they are not needed with
>    the new locking scheme
>  - added handling for hwdom && dom_xen case
> ---
>  xen/arch/x86/hvm/vmsi.c       | 31 +++++++++++++--------
>  xen/arch/x86/hvm/vmx/vmx.c    |  2 +-
>  xen/arch/x86/irq.c            |  8 +++---
>  xen/arch/x86/msi.c            | 20 +++++++++-----
>  xen/arch/x86/physdev.c        |  2 ++
>  xen/drivers/passthrough/pci.c |  9 +++---
>  xen/drivers/vpci/header.c     | 18 ++++++++++++
>  xen/drivers/vpci/msi.c        | 30 +++++++++++++++++---
>  xen/drivers/vpci/msix.c       | 52 ++++++++++++++++++++++++++++++-----
>  xen/drivers/vpci/vpci.c       | 24 ++++++++++++++--
>  xen/include/xen/sched.h       | 15 +++++++++-
>  11 files changed, 170 insertions(+), 41 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index 128f23636279..f29089178a59 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -468,7 +468,7 @@ int msixtbl_pt_register(struct domain *d, struct pirq 
> *pirq, uint64_t gtable)
>      struct msixtbl_entry *entry, *new_entry;
>      int r = -EINVAL;
>  
> -    ASSERT(pcidevs_locked());
> +    ASSERT(pdev_list_is_read_locked(d));
>      ASSERT(rw_is_write_locked(&d->event_lock));
>  
>      if ( !msixtbl_initialised(d) )
> @@ -538,7 +538,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq 
> *pirq)
>      struct pci_dev *pdev;
>      struct msixtbl_entry *entry;
>  
> -    ASSERT(pcidevs_locked());
> +    ASSERT(pdev_list_is_read_locked(d));
>      ASSERT(rw_is_write_locked(&d->event_lock));
>  
>      if ( !msixtbl_initialised(d) )
> @@ -684,7 +684,7 @@ static int vpci_msi_update(const struct pci_dev *pdev, 
> uint32_t data,
>  {
>      unsigned int i;
>  
> -    ASSERT(pcidevs_locked());
> +    ASSERT(rw_is_locked(&pdev->domain->pci_lock));

Any reason to not use the newly introduced helper here?  I know the
pcidevs will never be locked here given the new lock usage, but still
it would be less confusing if the new helper was used consistently.

Otherwise we need a comment here as to why the helper can't be used,
in order to avoid confusion in the future.

>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 9da91e0e6244..c3adec1aca3c 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -462,7 +462,8 @@ struct domain
>  #ifdef CONFIG_HAS_PCI
>      struct list_head pdev_list;
>      /*
> -     * pci_lock protects access to pdev_list.
> +     * pci_lock protects access to pdev_list. pci_lock also protects 
> pdev->vpci
> +     * structure from being removed.
>       *
>       * Any user *reading* from pdev_list, or from devices stored in 
> pdev_list,
>       * should hold either pcidevs_lock() or pci_lock in read mode. 
> Optionally,
> @@ -628,6 +629,18 @@ struct domain
>      unsigned int cdf;
>  };
>  
> +/*
> + * Check for use in ASSERTs to ensure that:
> + *   1. we can *read* d->pdev_list
> + *   2. pdevs (belonging to this domain) do not go away
> + *   3. pdevs (belonging to this domain) do not get assigned to other domains

I think you can just state that this check ensures there will be no
changes to the entries in d->pdev_list, but not the contents of each
entry.  No changes to d->pdev_list already ensures not devices can be
deassigned or removed from the system, and obviously makes the list
safe to iterate against.

I would also drop the explicitly mention this is intended for ASSERT
usage: there's nothing specific in the code that prevents it from
being used in other places (albeit I think that's unlikely).

> + * This check is not suitable for protecting other state or critical regions.
> + */
> +#define pdev_list_is_read_locked(d) ({                           \

I would be tempted to drop at least the '_read_' part from the name,
the name is getting a bit too long for my taste.

> +        struct domain *d_ = (d);                                 \

Why do you need this local domain variable?  Can't you use the d
parameter directly?

Such assign will prevent using a const 'd' parameter, and 'd_' itself
should be const IMO (iff we really need this).

Also sched.h is not the best place, can't you just place it in
pci.h?

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.