
Re: [PATCH v10 03/17] vpci: use per-domain PCI lock to protect vpci structure


  • To: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • From: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • Date: Tue, 28 Nov 2023 22:24:12 +0000
  • Accept-language: en-US
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stewart Hildebrand <stewart.hildebrand@xxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Paul Durrant <paul@xxxxxxx>
  • Delivery-date: Tue, 28 Nov 2023 22:24:27 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi Roger,

Thank you for the review.

Roger Pau Monné <roger.pau@xxxxxxxxxx> writes:

> On Thu, Oct 12, 2023 at 10:09:15PM +0000, Volodymyr Babchuk wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>> 
>> Use a previously introduced per-domain read/write lock to check
>> whether vpci is present, so we are sure there are no accesses to the
>> contents of the vpci struct if not. This lock can be used (and in a
>> few cases is used right away) so that vpci removal can be performed
>> while holding the lock in write mode. Previously such removal could
>> race with vpci_read for example.
>> 
>> When taking both d->pci_lock and pdev->vpci->lock, they should be
>> taken in this exact order: d->pci_lock then pdev->vpci->lock to avoid
>> possible deadlock situations.
>> 
>> 1. Per-domain's pci_rwlock is used to protect pdev->vpci structure
>> from being removed.
>> 
>> 2. Writing the command register and ROM BAR register may trigger
>> modify_bars to run, which in turn may access multiple pdevs while
>> checking for the existing BAR's overlap. The overlapping check, if
>> done under the read lock, requires vpci->lock to be acquired on both
>> devices being compared, which may produce a deadlock. It is not
>> possible to upgrade read lock to write lock in such a case. So, in
>> order to prevent the deadlock, use d->pci_lock instead. To prevent
>> deadlock while locking both hwdom->pci_lock and dom_xen->pci_lock,
>> always lock hwdom first.
>> 
>> All other code, which doesn't lead to pdev->vpci destruction and does
>> not access multiple pdevs at the same time, can still use a
>> combination of the read lock and pdev->vpci->lock.
>> 
>> 3. Drop const qualifier where the new rwlock is used and this is
>> appropriate.
>> 
>> 4. Do not call process_pending_softirqs with any locks held. For that,
>> unlock prior to the call and re-acquire the locks afterwards. After
>> re-acquiring the lock there is no need to check if pdev->vpci exists:
>>  - in apply_map because of the context it is called (no race condition
>>    possible)
>>  - for MSI/MSI-X debug code because it is called at the end of
>>    pdev->vpci access and no further access to pdev->vpci is made
>> 
>> 5. Use d->pci_lock around for_each_pdev and pci_get_pdev_by_domain
>> while accessing pdevs in vpci code.
>> 
>> 6. We are removing multiple ASSERT(pcidevs_locked()) instances because
>> they are too strict now: they should be corrected to
>> ASSERT(pcidevs_locked() || rw_is_locked(&d->pci_lock)), but the problem
>> is that the mentioned instances do not have access to the domain
>> pointer, and it is not feasible to pass a domain pointer to a function
>> just for debugging purposes.
>> 
>> There is a possible lock inversion in MSI code, as some parts of it
>> acquire pcidevs_lock() while already holding d->pci_lock.
>
> Is this going to be addressed in a further patch?
>

It is actually addressed in v10 of this patch; I just forgot to remove
this sentence from the patch description. My bad.

This was fixed by adding an extra parameter to
allocate_and_map_msi_pirq(), as it is called from two places: from
vpci_msi_enable(), where we already hold d->pci_lock, and from
physdev_map_pirq(), where no locks are taken.

[...]

>> @@ -2908,7 +2908,8 @@ int allocate_and_map_msi_pirq(struct domain *d, int 
>> index, int *pirq_p,
>>  
>>      msi->irq = irq;
>>  
>> -    pcidevs_lock();
>> +    if ( use_pci_lock )
>> +        pcidevs_lock();
>
> Instead of passing the flag it might be better if the caller can take
> the lock, as to avoid having to pass an extra parameter.
>
> Then we should also assert that either the pcidev_lock or the
> per-domain pci lock is taken?
>

This is a good idea. I'll take the lock in physdev_map_pirq(), as that
is the only place where allocate_and_map_msi_pirq() is called without
holding any lock.

>>      /* Verify or get pirq. */
>>      write_lock(&d->event_lock);
>>      pirq = allocate_pirq(d, index, *pirq_p, irq, type, &msi->entry_nr);
>> @@ -2924,7 +2925,8 @@ int allocate_and_map_msi_pirq(struct domain *d, int 
>> index, int *pirq_p,
>>  
>>   done:
>>      write_unlock(&d->event_lock);
>> -    pcidevs_unlock();
>> +    if ( use_pci_lock )
>> +        pcidevs_unlock();
>>      if ( ret )
>>      {
>>          switch ( type )
>> diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
>> index 20275260b3..466725d8ca 100644
>> --- a/xen/arch/x86/msi.c
>> +++ b/xen/arch/x86/msi.c
>> @@ -602,7 +602,7 @@ static int msi_capability_init(struct pci_dev *dev,
>>      unsigned int i, mpos;
>>      uint16_t control;
>>  
>> -    ASSERT(pcidevs_locked());
>> +    ASSERT(pcidevs_locked() || rw_is_locked(&dev->domain->pci_lock));
>>      pos = pci_find_cap_offset(dev->sbdf, PCI_CAP_ID_MSI);
>>      if ( !pos )
>>          return -ENODEV;
>> @@ -771,7 +771,7 @@ static int msix_capability_init(struct pci_dev *dev,
>>      if ( !pos )
>>          return -ENODEV;
>>  
>> -    ASSERT(pcidevs_locked());
>> +    ASSERT(pcidevs_locked() || rw_is_locked(&dev->domain->pci_lock));
>>  
>>      control = pci_conf_read16(dev->sbdf, msix_control_reg(pos));
>>      /*
>> @@ -988,8 +988,6 @@ static int __pci_enable_msi(struct msi_info *msi, struct 
>> msi_desc **desc,
>>  {
>>      struct msi_desc *old_desc;
>>  
>> -    ASSERT(pcidevs_locked());
>> -
>>      if ( !pdev )
>>          return -ENODEV;
>>  
>> @@ -1043,8 +1041,6 @@ static int __pci_enable_msix(struct msi_info *msi, 
>> struct msi_desc **desc,
>>  {
>>      struct msi_desc *old_desc;
>>  
>> -    ASSERT(pcidevs_locked());
>> -
>>      if ( !pdev || !pdev->msix )
>>          return -ENODEV;
>>  
>> @@ -1154,8 +1150,6 @@ int pci_prepare_msix(u16 seg, u8 bus, u8 devfn, bool 
>> off)
>>  int pci_enable_msi(struct msi_info *msi, struct msi_desc **desc,
>>                 struct pci_dev *pdev)
>>  {
>> -    ASSERT(pcidevs_locked());
>> -
>
> If you have the pdev in all the above function, you could expand the
> assert to test for pdev->domain->pci_lock?
>

Yes, you are right. This is a leftover from the time when
pci_enable_msi() acquired the pdev pointer by itself.

[...]

>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index 767c1ba718..a52e52db96 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -172,6 +172,7 @@ bool vpci_process_pending(struct vcpu *v)
>>          if ( rc == -ERESTART )
>>              return true;
>>  
>> +        write_lock(&v->domain->pci_lock);
>>          spin_lock(&v->vpci.pdev->vpci->lock);
>>          /* Disable memory decoding unconditionally on failure. */
>>          modify_decoding(v->vpci.pdev,
>> @@ -190,6 +191,7 @@ bool vpci_process_pending(struct vcpu *v)
>>               * failure.
>>               */
>>              vpci_remove_device(v->vpci.pdev);
>> +        write_unlock(&v->domain->pci_lock);
>>      }
>>  
>>      return false;
>> @@ -201,8 +203,20 @@ static int __init apply_map(struct domain *d, const 
>> struct pci_dev *pdev,
>>      struct map_data data = { .d = d, .map = true };
>>      int rc;
>>  
>> +    ASSERT(rw_is_write_locked(&d->pci_lock));
>> +
>>      while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == 
>> -ERESTART )
>> +    {
>> +        /*
>> +         * It's safe to drop and reacquire the lock in this context
>> +         * without risking pdev disappearing because devices cannot be
>> +         * removed until the initial domain has been started.
>> +         */
>> +        read_unlock(&d->pci_lock);
>>          process_pending_softirqs();
>> +        read_lock(&d->pci_lock);
>
> You are asserting the lock is taken in write mode just above the usage
> of read_{un,}lock().  Either the assert is wrong, or the usage of
> read_{un,}lock() is wrong.

Oops, it looks like this is a rebasing artifact. The final version of
the code uses write locks, of course. The patch "vpci/header: handle
p2m range sets per BAR" changes this piece of code too and replaces the
read lock with a write lock. I'll move that change into this patch.

[...]

>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 3bec9a4153..112de56fb3 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -38,6 +38,8 @@ extern vpci_register_init_t *const __end_vpci_array[];
>>  
>>  void vpci_remove_device(struct pci_dev *pdev)
>>  {
>> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>> +
>>      if ( !has_vpci(pdev->domain) || !pdev->vpci )
>>          return;
>>  
>> @@ -73,6 +75,8 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>      const unsigned long *ro_map;
>>      int rc = 0;
>>  
>> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>> +
>>      if ( !has_vpci(pdev->domain) )
>>          return 0;
>>  
>> @@ -326,11 +330,12 @@ static uint32_t merge_result(uint32_t data, uint32_t 
>> new, unsigned int size,
>>  
>>  uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
>>  {
>> -    const struct domain *d = current->domain;
>> +    struct domain *d = current->domain;
>>      const struct pci_dev *pdev;
>>      const struct vpci_register *r;
>>      unsigned int data_offset = 0;
>>      uint32_t data = ~(uint32_t)0;
>> +    rwlock_t *lock;
>>  
>>      if ( !size )
>>      {
>> @@ -342,11 +347,21 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, 
>> unsigned int size)
>>       * Find the PCI dev matching the address, which for hwdom also requires
>>       * consulting DomXEN.  Passthrough everything that's not trapped.
>>       */
>> +    lock = &d->pci_lock;
>> +    read_lock(lock);
>>      pdev = pci_get_pdev(d, sbdf);
>>      if ( !pdev && is_hardware_domain(d) )
>> +    {
>> +        read_unlock(lock);
>> +        lock = &dom_xen->pci_lock;
>> +        read_lock(lock);
>>          pdev = pci_get_pdev(dom_xen, sbdf);
>
> I'm unsure whether devices assigned to dom_xen can change ownership
> after boot, so maybe there's no need for all this lock dance, as the
> device cannot disappear?
>
> Maybe just taking the hardware domain lock is enough to prevent
> concurrent accesses in that case, as the hardware domain is the only
> allowed to access devices owned by dom_xen.

This sounds correct. If there are no objections, I will remove the
extra locking.

-- 
WBR, Volodymyr
