|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v9 01/16] pci: introduce per-domain PCI rwlock
On Tue, Aug 29, 2023 at 11:19:42PM +0000, Volodymyr Babchuk wrote:
> Add per-domain d->pci_lock that protects access to
> d->pdev_list. Purpose of this lock is to give guarantees to VPCI code
> that underlying pdev will not disappear under feet. This is a rw-lock,
> but this patch adds only write_lock()s. There will be read_lock()
> users in the next patches.
>
> This lock should be taken in write mode every time d->pdev_list is
> altered. This covers both accesses to d->pdev_list and accesses to
> pdev->domain_list fields.
Why do you mention pdev->domain_list here? I don't think the lock
covers accesses to pdev->domain_list, unless that domain_list field
happens to be part of the linked list in d->pdev_list. I find it kind
of odd to mention here.
> All write accesses also should be protected
> by pcidevs_lock() as well. Idea is that any user that wants read
> access to the list or to the devices stored in the list should use
> either this new d->pci_lock or old pcidevs_lock(). Usage of any of
> this two locks will ensure only that pdev of interest will not
> disappear from under feet and that the pdev still will be assigned to
> the same domain. Of course, any new users should use pcidevs_lock()
> when it is appropriate (e.g. when accessing any other state that is
> protected by the said lock). In case both the newly introduced
> per-domain rwlock and the pcidevs lock is taken, the later must be
> acquired first.
>
> Any write access to pdev->domain_list should be protected by both
> pcidevs_lock() and d->pci_lock in the write mode.
>
> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>
>
> ---
>
> Changes in v9:
> - returned back "pdev->domain = target;" in AMD IOMMU code
> - used "source" instead of pdev->domain in IOMMU functions
> - added comment about lock ordering in the commit message
> - reduced locked regions
> - minor changes non-functional changes in various places
>
> Changes in v8:
> - New patch
>
> Changes in v8 vs RFC:
> - Removed all read_locks after discussion with Roger in #xendevel
> - pci_release_devices() now returns the first error code
> - extended commit message
> - added missing lock in pci_remove_device()
> - extended locked region in pci_add_device() to protect list_del() calls
> ---
> xen/common/domain.c | 1 +
> xen/drivers/passthrough/amd/pci_amd_iommu.c | 9 ++-
> xen/drivers/passthrough/pci.c | 71 +++++++++++++++++----
> xen/drivers/passthrough/vtd/iommu.c | 9 ++-
> xen/include/xen/sched.h | 1 +
> 5 files changed, 78 insertions(+), 13 deletions(-)
>
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 304aa04fa6..9b04a20160 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -651,6 +651,7 @@ struct domain *domain_create(domid_t domid,
>
> #ifdef CONFIG_HAS_PCI
> INIT_LIST_HEAD(&d->pdev_list);
> + rwlock_init(&d->pci_lock);
> #endif
>
> /* All error paths can depend on the above setup. */
> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index bea70db4b7..d219bd9453 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -476,7 +476,14 @@ static int cf_check reassign_device(
>
> if ( devfn == pdev->devfn && pdev->domain != target )
> {
> - list_move(&pdev->domain_list, &target->pdev_list);
> + write_lock(&source->pci_lock);
> + list_del(&pdev->domain_list);
> + write_unlock(&source->pci_lock);
> +
> + write_lock(&target->pci_lock);
> + list_add(&pdev->domain_list, &target->pdev_list);
> + write_unlock(&target->pci_lock);
> +
> pdev->domain = target;
While I don't think this is strictly an issue right now, it would be
better to set pdev->domain before the device is added to domain_list.
A pattern like:
read_lock(d->pci_lock);
for_each_pdev(d, pdev)
foo(pdev->domain);
read_unlock(d->pci_lock);
Wouldn't work currently if the pdev is added to domain_list before the
pdev->domain field is updated to reflect the new owner.
> }
>
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 33452791a8..79ca928672 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -454,7 +454,9 @@ static void __init _pci_hide_device(struct pci_dev *pdev)
> if ( pdev->domain )
> return;
> pdev->domain = dom_xen;
> + write_lock(&dom_xen->pci_lock);
> list_add(&pdev->domain_list, &dom_xen->pdev_list);
> + write_unlock(&dom_xen->pci_lock);
> }
>
> int __init pci_hide_device(unsigned int seg, unsigned int bus,
> @@ -748,7 +750,9 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
> if ( !pdev->domain )
> {
> pdev->domain = hardware_domain;
> + write_lock(&hardware_domain->pci_lock);
> list_add(&pdev->domain_list, &hardware_domain->pdev_list);
> + write_unlock(&hardware_domain->pci_lock);
>
> /*
> * For devices not discovered by Xen during boot, add vPCI handlers
> @@ -758,7 +762,9 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
> if ( ret )
> {
> printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret);
> + write_lock(&hardware_domain->pci_lock);
> list_del(&pdev->domain_list);
> + write_unlock(&hardware_domain->pci_lock);
> pdev->domain = NULL;
> goto out;
> }
> @@ -766,7 +772,9 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
> if ( ret )
> {
> vpci_remove_device(pdev);
> + write_lock(&hardware_domain->pci_lock);
> list_del(&pdev->domain_list);
> + write_unlock(&hardware_domain->pci_lock);
> pdev->domain = NULL;
> goto out;
> }
> @@ -816,7 +824,11 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
> pci_cleanup_msi(pdev);
> ret = iommu_remove_device(pdev);
> if ( pdev->domain )
> + {
> + write_lock(&pdev->domain->pci_lock);
> list_del(&pdev->domain_list);
> + write_unlock(&pdev->domain->pci_lock);
> + }
> printk(XENLOG_DEBUG "PCI remove device %pp\n", &pdev->sbdf);
> free_pdev(pseg, pdev);
> break;
> @@ -887,26 +899,61 @@ static int deassign_device(struct domain *d, uint16_t
> seg, uint8_t bus,
>
> int pci_release_devices(struct domain *d)
> {
> - struct pci_dev *pdev, *tmp;
> - u8 bus, devfn;
> - int ret;
> + int combined_ret;
> + LIST_HEAD(failed_pdevs);
>
> pcidevs_lock();
> - ret = arch_pci_clean_pirqs(d);
> - if ( ret )
> +
> + combined_ret = arch_pci_clean_pirqs(d);
> + if ( combined_ret )
> {
> pcidevs_unlock();
> - return ret;
> + return combined_ret;
> }
> - list_for_each_entry_safe ( pdev, tmp, &d->pdev_list, domain_list )
> +
> + write_lock(&d->pci_lock);
> +
> + while ( !list_empty(&d->pdev_list) )
> {
> - bus = pdev->bus;
> - devfn = pdev->devfn;
> - ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
> + struct pci_dev *pdev = list_first_entry(&d->pdev_list,
> + struct pci_dev,
> + domain_list);
> + uint16_t seg = pdev->seg;
> + uint8_t bus = pdev->bus;
> + uint8_t devfn = pdev->devfn;
> + int ret;
> +
> + write_unlock(&d->pci_lock);
> + ret = deassign_device(d, seg, bus, devfn);
> + write_lock(&d->pci_lock);
> + if ( ret )
> + {
> + const struct pci_dev *tmp;
> +
> + /*
> + * We need to check if deassign_device() left our pdev in
> + * domain's list. As we dropped the lock, we can't be sure
> + * that list wasn't permutated in some random way, so we
> + * need to traverse the whole list.
> + */
> + for_each_pdev ( d, tmp )
> + {
> + if ( tmp == pdev )
> + {
> + list_move_tail(&pdev->domain_list, &failed_pdevs);
> + break;
> + }
> + }
> +
> + combined_ret = combined_ret ?: ret;
> + }
> }
> +
> + list_splice(&failed_pdevs, &d->pdev_list);
> + write_unlock(&d->pci_lock);
> pcidevs_unlock();
>
> - return ret;
> + return combined_ret;
> }
>
> #define PCI_CLASS_BRIDGE_HOST 0x0600
> @@ -1125,7 +1172,9 @@ static int __hwdom_init cf_check
> _setup_hwdom_pci_devices(
> if ( !pdev->domain )
> {
> pdev->domain = ctxt->d;
> + write_lock(&ctxt->d->pci_lock);
> list_add(&pdev->domain_list, &ctxt->d->pdev_list);
> + write_unlock(&ctxt->d->pci_lock);
> setup_one_hwdom_device(ctxt, pdev);
> }
> else if ( pdev->domain == dom_xen )
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index 0e3062c820..3228900c97 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -2806,7 +2806,14 @@ static int cf_check reassign_device_ownership(
>
> if ( devfn == pdev->devfn && pdev->domain != target )
> {
> - list_move(&pdev->domain_list, &target->pdev_list);
> + write_lock(&source->pci_lock);
> + list_del(&pdev->domain_list);
> + write_unlock(&source->pci_lock);
> +
> + write_lock(&target->pci_lock);
> + list_add(&pdev->domain_list, &target->pdev_list);
> + write_unlock(&target->pci_lock);
> +
> pdev->domain = target;
Same comment as in reassign_device() above.
Thanks, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |