|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v8 01/13] pci: introduce per-domain PCI rwlock
Hi Roger,
Roger Pau Monné <roger.pau@xxxxxxxxxx> writes:
> On Thu, Jul 20, 2023 at 12:32:31AM +0000, Volodymyr Babchuk wrote:
>> Add per-domain d->pci_lock that protects access to
>> d->pdev_list. Purpose of this lock is to give guarantees to VPCI code
>> that underlying pdev will not disappear under feet. This is a rw-lock,
>> but this patch adds only write_lock()s. There will be read_lock()
>> users in the next patches.
>>
>> This lock should be taken in write mode every time d->pdev_list is
>> altered. This covers both accesses to d->pdev_list and accesses to
>> pdev->domain_list fields. All write accesses also should be protected
>> by pcidevs_lock() as well. Idea is that any user that wants read
>> access to the list or to the devices stored in the list should use
>> either this new d->pci_lock or old pcidevs_lock(). Usage of any of
>> this two locks will ensure only that pdev of interest will not
>> disappear from under feet and that the pdev still will be assigned to
>> the same domain. Of course, any new users should use pcidevs_lock()
>> when it is appropriate (e.g. when accessing any other state that is
>> protected by the said lock).
>
> I think this needs a note about the ordering:
>
> "In case both the newly introduced per-domain rwlock and the pcidevs
> lock is taken, the later must be acquired first."
Thanks. Added.
>>
>> Any write access to pdev->domain_list should be protected by both
>> pcidevs_lock() and d->pci_lock in the write mode.
>
> You also protect calls to vpci_remove_device() with the per-domain
> pci_lock it seems, and that will need some explanation as it's not
> obvious.
Well, strictly speaking, it is not required in this patch. But it is
needed in the next one. I can lock only "list_del(&pdev->domain_list);"
end extend then locked area in the next patch. On other hand, this patch
already protects vpci_add_handlers() call in the pci_add_device() due to
the code layout, so it may be natural to protect vpci_remove_device() as
well. What is your opinion?
>>
>> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
>> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>
>>
>> ---
>>
>> Changes in v8:
>> - New patch
>>
>> Changes in v8 vs RFC:
>> - Removed all read_locks after discussion with Roger in #xendevel
>> - pci_release_devices() now returns the first error code
>> - extended commit message
>> - added missing lock in pci_remove_device()
>> - extended locked region in pci_add_device() to protect list_del() calls
>> ---
>> xen/common/domain.c | 1 +
>> xen/drivers/passthrough/amd/pci_amd_iommu.c | 9 ++-
>> xen/drivers/passthrough/pci.c | 68 +++++++++++++++++----
>> xen/drivers/passthrough/vtd/iommu.c | 9 ++-
>> xen/include/xen/sched.h | 1 +
>> 5 files changed, 74 insertions(+), 14 deletions(-)
>>
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index caaa402637..5d8a8836da 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -645,6 +645,7 @@ struct domain *domain_create(domid_t domid,
>>
>> #ifdef CONFIG_HAS_PCI
>> INIT_LIST_HEAD(&d->pdev_list);
>> + rwlock_init(&d->pci_lock);
>> #endif
>>
>> /* All error paths can depend on the above setup. */
>> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> index 94e3775506..e2f2e2e950 100644
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -476,8 +476,13 @@ static int cf_check reassign_device(
>>
>> if ( devfn == pdev->devfn && pdev->domain != target )
>> {
>> - list_move(&pdev->domain_list, &target->pdev_list);
>> - pdev->domain = target;
>
> You seem to have inadvertently dropped the above line? (and so devices
> would keep the previous pdev->domain value)
>
Oops, yes. Thank you. I was testing those patches on Intel machine, so
AMD part left not verified.
>> + write_lock(&pdev->domain->pci_lock);
>> + list_del(&pdev->domain_list);
>> + write_unlock(&pdev->domain->pci_lock);
>> +
>> + write_lock(&target->pci_lock);
>> + list_add(&pdev->domain_list, &target->pdev_list);
>> + write_unlock(&target->pci_lock);
>> }
>>
>> /*
>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> index 95846e84f2..5b4632ead2 100644
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -454,7 +454,9 @@ static void __init _pci_hide_device(struct pci_dev *pdev)
>> if ( pdev->domain )
>> return;
>> pdev->domain = dom_xen;
>> + write_lock(&dom_xen->pci_lock);
>> list_add(&pdev->domain_list, &dom_xen->pdev_list);
>> + write_unlock(&dom_xen->pci_lock);
>> }
>>
>> int __init pci_hide_device(unsigned int seg, unsigned int bus,
>> @@ -747,6 +749,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>> ret = 0;
>> if ( !pdev->domain )
>> {
>> + write_lock(&hardware_domain->pci_lock);
>> pdev->domain = hardware_domain;
>> list_add(&pdev->domain_list, &hardware_domain->pdev_list);
>>
>> @@ -760,6 +763,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>> printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret);
>> list_del(&pdev->domain_list);
>> pdev->domain = NULL;
>> + write_unlock(&hardware_domain->pci_lock);
>
> Strictly speaking, this could move one line earlier, as accesses to
> pdev->domain are not protected by the d->pci_lock? Same in other
> instances (above and below), as you seem to introduce a pattern to
> perform accesses to pdev->domain with the rwlock taken.
>
Yes, you are right. I'll move the unlock() call.
>> goto out;
>> }
>> ret = iommu_add_device(pdev);
>> @@ -768,8 +772,10 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>> vpci_remove_device(pdev);
>> list_del(&pdev->domain_list);
>> pdev->domain = NULL;
>> + write_unlock(&hardware_domain->pci_lock);
>> goto out;
>> }
>> + write_unlock(&hardware_domain->pci_lock);
>> }
>> else
>> iommu_enable_device(pdev);
>> @@ -812,11 +818,13 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
>> list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
>> if ( pdev->bus == bus && pdev->devfn == devfn )
>> {
>> + write_lock(&pdev->domain->pci_lock);
>> vpci_remove_device(pdev);
>> pci_cleanup_msi(pdev);
>> ret = iommu_remove_device(pdev);
>> if ( pdev->domain )
>> list_del(&pdev->domain_list);
>> + write_unlock(&pdev->domain->pci_lock);
>
> Here you seem to protect more than strictly required, I would think
> only the list_del() would need to be done holding the rwlock?
>
Yes, I believe this is a spill from a next patch. At first all those
changes were introduced in "vpci: use per-domain PCI lock to protect
vpci structure", but then I decided to split changes into two patches.
>> printk(XENLOG_DEBUG "PCI remove device %pp\n", &pdev->sbdf);
>> free_pdev(pseg, pdev);
>> break;
>> @@ -887,26 +895,62 @@ static int deassign_device(struct domain *d, uint16_t
>> seg, uint8_t bus,
>>
>> int pci_release_devices(struct domain *d)
>> {
>> - struct pci_dev *pdev, *tmp;
>> - u8 bus, devfn;
>> - int ret;
>> + int combined_ret;
>> + LIST_HEAD(failed_pdevs);
>>
>> pcidevs_lock();
>> - ret = arch_pci_clean_pirqs(d);
>> - if ( ret )
>> + write_lock(&d->pci_lock);
>> + combined_ret = arch_pci_clean_pirqs(d);
>
> Why do you need the per-domain rwlock for arch_pci_clean_pirqs()?
> That function doesn't modify the per-domain pdev list.
You are right, I will correct this in the next version.
>
>> + if ( combined_ret )
>> {
>> pcidevs_unlock();
>> - return ret;
>> + write_unlock(&d->pci_lock);
>> + return combined_ret;
>
> Ideally we would like to keep the same order on unlock, so the rwlock
> should be released before the pcidevs lock (unless there's a reason
> not to).
I'll move write_lock() further below, so this will be fixed automatically.
>
>> }
>> - list_for_each_entry_safe ( pdev, tmp, &d->pdev_list, domain_list )
>> +
>> + while ( !list_empty(&d->pdev_list) )
>> {
>> - bus = pdev->bus;
>> - devfn = pdev->devfn;
>> - ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
>> + struct pci_dev *pdev = list_first_entry(&d->pdev_list,
>> + struct pci_dev,
>> + domain_list);
>> + uint16_t seg = pdev->seg;
>> + uint8_t bus = pdev->bus;
>> + uint8_t devfn = pdev->devfn;
>> + int ret;
>> +
>> + write_unlock(&d->pci_lock);
>> + ret = deassign_device(d, seg, bus, devfn);
>> + write_lock(&d->pci_lock);
>> + if ( ret )
>> + {
>> + bool still_present = false;
>> + const struct pci_dev *tmp;
>> +
>> + /*
>> + * We need to check if deassign_device() left our pdev in
>> + * domain's list. As we dropped the lock, we can't be sure
>> + * that list wasn't permutated in some random way, so we
>> + * need to traverse the whole list.
>> + */
>> + for_each_pdev ( d, tmp )
>> + {
>> + if ( tmp == pdev )
>> + {
>> + still_present = true;
>> + break;
>> + }
>> + }
>> + if ( still_present )
>> + list_move(&pdev->domain_list, &failed_pdevs);
>
> You can get rid of the still_present variable, and just do:
>
> for_each_pdev ( d, tmp )
> if ( tmp == pdev )
> {
> list_move(&pdev->domain_list, &failed_pdevs);
> break;
> }
>
>
Yep, thanks.
--
WBR, Volodymyr
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |