
Re: [RFC PATCH 01/10] xen: pci: add per-domain pci list lock


  • To: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 15 Feb 2023 10:06:52 +0100
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Wei Liu <wl@xxxxxxx>, Paul Durrant <paul@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, "Stewart.Hildebrand@xxxxxxx" <Stewart.Hildebrand@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • Delivery-date: Wed, 15 Feb 2023 09:07:14 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 15.02.2023 00:38, Volodymyr Babchuk wrote:
> Stefano Stabellini <sstabellini@xxxxxxxxxx> writes:
>> On Wed, 31 Aug 2022, Volodymyr Babchuk wrote:
>> 1) iommu.c:reassign_device_ownership and pci_amd_iommu.c:reassign_device
>> Both functions, without any pdevs_lock locking, do:
>> list_move(&pdev->domain_list, &target->pdev_list);
>>
>> It seems to me it would need pdevs_lock. Maybe we need to change
>> list_move into list_del (protected by the pdevs_lock of the old domain)
>> and list_add (protected by the pdevs_lock of the new domain).
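
For illustration, a minimal sketch of that split, assuming the per-domain
pdevs_lock introduced by this series and Xen's <xen/list.h> helpers; this
is not the actual patch code:

    /* Instead of a single, unlocked list_move() ... */
    spin_lock(&source->pdevs_lock);
    list_del(&pdev->domain_list);
    spin_unlock(&source->pdevs_lock);

    spin_lock(&target->pdevs_lock);
    list_add(&pdev->domain_list, &target->pdev_list);
    spin_unlock(&target->pdevs_lock);

Note that between the two critical sections the device sits on neither
list; that window is part of what the discussion below turns on.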
> 
> Yes, I did as you suggested. But this leads to another potential
> issue. I'll describe it below, in the deassign_device() part.
> 
> [...]
> 
>>> +    spin_lock(&d->pdevs_lock);
>>>      list_for_each_entry_safe ( pdev, tmp, &d->pdev_list, domain_list )
>>>      {
>>>          bus = pdev->bus;
>>>          devfn = pdev->devfn;
>>>          ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
>>
>> This causes pdevs_lock to be taken twice. deassign_device also takes a
>> pdevs_lock.  Probably we need to change all the
>> spin_lock(&d->pdevs_lock) calls into spin_lock_recursive().
> 
> You are right, I missed that deassign_device() ends up calling
> iommu*_reassign_device(). But therein lies the issue: with recursive
> locks, reassign_device() will not be able to unlock source->pdevs_lock,
> but will try to take target->pdevs_lock as well. This could potentially
> lead to a deadlock, if another call to reassign_device() moves some other
> pdev in the opposite direction at the same time. This is why I want to
> avoid recursive spinlocks if possible.
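
A sketch of the hazard being described, with purely illustrative function
names, assuming each path holds its source domain's pdevs_lock while
acquiring the target's:

    /* CPU0: move pdev1 from domA to domB */
    void cpu0_path(struct domain *domA, struct domain *domB,
                   struct pci_dev *pdev1)
    {
        spin_lock(&domA->pdevs_lock);   /* source lock taken first */
        spin_lock(&domB->pdevs_lock);   /* blocks if CPU1 already holds it */
        list_move(&pdev1->domain_list, &domB->pdev_list);
        spin_unlock(&domB->pdevs_lock);
        spin_unlock(&domA->pdevs_lock);
    }

    /* CPU1: move pdev2 from domB to domA, concurrently */
    void cpu1_path(struct domain *domA, struct domain *domB,
                   struct pci_dev *pdev2)
    {
        spin_lock(&domB->pdevs_lock);   /* source lock taken first */
        spin_lock(&domA->pdevs_lock);   /* blocks while CPU0 holds domA:
                                           ABBA deadlock */
        list_move(&pdev2->domain_list, &domA->pdev_list);
        spin_unlock(&domA->pdevs_lock);
        spin_unlock(&domB->pdevs_lock);
    }

If CPU0 has taken domA's lock and CPU1 has taken domB's lock, neither can
acquire its second lock; dropping the first lock before taking the second
(as in the del/add split above), or acquiring both in a fixed global order,
are the usual ways out.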
> 
> So, I am thinking: why does the IOMMU code move a pdev across domains in
> the first place? We are making the IOMMU driver responsible for managing
> a domain's pdevs, which does not look right, as this is the
> responsibility of pci.c.

The boundary between what is PCI and what is IOMMU is pretty fuzzy, I'm
afraid. After all, pass-through (with IOMMU) is all about PCI devices (on
x86 at least). Despite the filename being pci.c, much of what's there is
actually IOMMU code. Specifically, deassign_device() is the vendor-
independent IOMMU part of the operation; moving that ...

> I want to propose another approach: implement the deassign_device()
> function in the IOMMU drivers.

... into vendor-specific code would mean code duplication, which ought
to be avoided as much as possible.

> Then, instead of calling reassign_device() we might
> do the following:
> 
> 1. deassign_device()
> 2. remove pdev from source->pdev_list
> 3. add pdev to target->pdev_list
> 4. assign_device()

I'm not sure such an ordering would end up correct. It may need to be

1. remove pdev from source->pdev_list
2. deassign_device()
3. assign_device()
4. add pdev to target->pdev_list (or back to source upon failure)

which may still be troublesome: the present placement of the move, in
particular, ensures that ownership (expressed by pdev->domain) changes at
the same time as list membership. With things properly locked it _may_ be
okay to separate (in time) these two steps, but that'll require quite a bit
of care (read: justification that it's correct/safe).
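
To make the sequencing concrete, a sketch of that alternative ordering,
assuming the pdevs_lock from this series; move_device() is a hypothetical
helper, and the assign_device() call below is only a placeholder for
whatever the generic assign path is, not quoted from the tree:

    /* Hypothetical helper, illustration only. */
    static int move_device(struct domain *source, struct domain *target,
                           struct pci_dev *pdev)
    {
        int ret;

        /* 1. remove pdev from source->pdev_list */
        spin_lock(&source->pdevs_lock);
        list_del(&pdev->domain_list);
        spin_unlock(&source->pdevs_lock);

        /* 2. + 3. de-assign, then assign (IOMMU side); the second call
         * merely marks where step 3 would go. */
        ret = deassign_device(source, pdev->seg, pdev->bus, pdev->devfn);
        if ( !ret )
            ret = assign_device(target, pdev->seg, pdev->bus, pdev->devfn);

        /* 4. add pdev to target->pdev_list, or back to source on failure */
        if ( ret )
            target = source;
        spin_lock(&target->pdevs_lock);
        list_add(&pdev->domain_list, &target->pdev_list);
        spin_unlock(&target->pdevs_lock);

        return ret;
    }

Between step 1 and step 4 the device is on no domain's list, while
pdev->domain is still updated inside the vendor reassign hook; those two
points are exactly the "ownership vs. list membership" concern above.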

And of course you could then also ask why it's low-level driver code
changing pdev->domain. I don't see how you would move that to generic code,
as the field shouldn't be left stale for an extended period of time, nor
can it sensibly be transiently set to e.g. NULL.

Additionally, deassign_device() is misnamed - it's really re-assigning
the device (as you can see from it invoking
hd->platform_ops->reassign_device()). You cannot split de-assign from
assign: the device should always be assigned _somewhere_ (DOM_IO if
nothing else; see XSA-400), except late enough during hot-unplug. But that
would be taken care of by the alternative ordering above (combining 2 and
3 into a single step).
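
A sketch of what combining steps 2 and 3 could look like, so the device is
never without an owner and the list move happens together with the
ownership change; lock_pdevs_pair() and reassign_locked() are hypothetical,
and the reassign_device hook invocation is abbreviated from memory rather
than quoted:

    /* Take both domains' pdevs_lock in a fixed global order (pointer
     * comparison is one common convention) to avoid the ABBA hazard
     * sketched earlier.  Hypothetical helper.
     */
    static void lock_pdevs_pair(struct domain *a, struct domain *b)
    {
        if ( a == b )
            spin_lock(&a->pdevs_lock);
        else if ( a < b )
        {
            spin_lock(&a->pdevs_lock);
            spin_lock(&b->pdevs_lock);
        }
        else
        {
            spin_lock(&b->pdevs_lock);
            spin_lock(&a->pdevs_lock);
        }
    }

    /* Hypothetical wrapper: ownership (pdev->domain, updated by the
     * vendor hook) and list membership change in one critical section.
     * Callers resolve "target" to dom_io when quarantining (cf. XSA-400).
     */
    static int reassign_locked(struct domain *source, struct domain *target,
                               struct pci_dev *pdev)
    {
        const struct domain_iommu *hd = dom_iommu(source);
        int ret;

        lock_pdevs_pair(source, target);

        ret = hd->platform_ops->reassign_device(source, target,
                                                pdev->devfn, pdev);
        if ( !ret )
            list_move(&pdev->domain_list, &target->pdev_list);

        spin_unlock(&target->pdevs_lock);
        if ( source != target )
            spin_unlock(&source->pdevs_lock);

        return ret;
    }

Whether the vendor hook may safely be invoked with both per-domain locks
held is exactly the kind of justification asked for above; this shows the
locking shape only, not a claim that it is correct as-is.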

Jan
