Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci
On 04.02.22 15:06, Roger Pau Monné wrote:
> On Fri, Feb 04, 2022 at 12:53:20PM +0000, Oleksandr Andrushchenko wrote:
>>
>> On 04.02.22 14:47, Jan Beulich wrote:
>>> On 04.02.2022 13:37, Oleksandr Andrushchenko wrote:
>>>> On 04.02.22 13:37, Jan Beulich wrote:
>>>>> On 04.02.2022 12:13, Roger Pau Monné wrote:
>>>>>> On Fri, Feb 04, 2022 at 11:49:18AM +0100, Jan Beulich wrote:
>>>>>>> On 04.02.2022 11:12, Oleksandr Andrushchenko wrote:
>>>>>>>> On 04.02.22 11:15, Jan Beulich wrote:
>>>>>>>>> On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
>>>>>>>>>> On 04.02.22 09:52, Jan Beulich wrote:
>>>>>>>>>>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
>>>>>>>>>>>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>>>>>>>>>>>              continue;
>>>>>>>>>>>>          }
>>>>>>>>>>>>
>>>>>>>>>>>> +        spin_lock(&tmp->vpci_lock);
>>>>>>>>>>>> +        if ( !tmp->vpci )
>>>>>>>>>>>> +        {
>>>>>>>>>>>> +            spin_unlock(&tmp->vpci_lock);
>>>>>>>>>>>> +            continue;
>>>>>>>>>>>> +        }
>>>>>>>>>>>>          for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
>>>>>>>>>>>>          {
>>>>>>>>>>>>              const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
>>>>>>>>>>>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
>>>>>>>>>>>>                  rc = rangeset_remove_range(mem, start, end);
>>>>>>>>>>>>                  if ( rc )
>>>>>>>>>>>>                  {
>>>>>>>>>>>> +                    spin_unlock(&tmp->vpci_lock);
>>>>>>>>>>>>                      printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: %d\n",
>>>>>>>>>>>>                             start, end, rc);
>>>>>>>>>>>>                      rangeset_destroy(mem);
>>>>>>>>>>>>                      return rc;
>>>>>>>>>>>>                  }
>>>>>>>>>>>>              }
>>>>>>>>>>>> +        spin_unlock(&tmp->vpci_lock);
>>>>>>>>>>>>      }
>>>>>>>>>>> At the first glance this simply looks like another unjustified (in the
>>>>>>>>>>> description) change, as you're not converting anything here but you
>>>>>>>>>>> actually add locking (and I realize this was there before, so I'm
>>>>>>>>>>> sorry for not pointing this out earlier).
>>>>>>>>>> Well, I thought that the description already has "...the lock can be
>>>>>>>>>> used (and in a few cases is used right away) to check whether vpci
>>>>>>>>>> is present" and this is enough for such uses as here.
>>>>>>>>>>> But then I wonder whether you actually tested this, since I can't
>>>>>>>>>>> help getting the impression that you're introducing a live-lock:
>>>>>>>>>>> The function is called from cmd_write() and rom_write(), which in
>>>>>>>>>>> turn are called out of vpci_write(). Yet that function already
>>>>>>>>>>> holds the lock, and the lock is not (currently) recursive. (For the
>>>>>>>>>>> 3rd caller of the function - init_bars() - otoh the locking looks
>>>>>>>>>>> to be entirely unnecessary.)
>>>>>>>>>> Well, you are correct: if tmp != pdev then it is correct to acquire
>>>>>>>>>> the lock. But if tmp == pdev and rom_only == true then we'll deadlock.
>>>>>>>>>>
>>>>>>>>>> It seems we need to have the locking conditional, e.g. only lock
>>>>>>>>>> if tmp != pdev
>>>>>>>>> Which will address the live-lock, but introduce ABBA deadlock
>>>>>>>>> potential between the two locks.
>>>>>>>> I am not sure I can suggest a better solution here
>>>>>>>> @Roger, @Jan, could you please help here?
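For context, a minimal sketch of the conditional locking idea discussed above (take tmp's lock only when tmp != pdev), and of why it merely trades the live-lock for an ABBA deadlock. The loop body is abbreviated and the field names follow the quoted patch:

    /*
     * Hypothetical rework of the loop in modify_bars(): pdev->vpci_lock is
     * already held by the vpci_write() path, so only other devices' locks
     * are taken here.
     */
    for_each_pdev ( pdev->domain, tmp )
    {
        if ( tmp != pdev )
            spin_lock(&tmp->vpci_lock);

        if ( !tmp->vpci )
        {
            if ( tmp != pdev )
                spin_unlock(&tmp->vpci_lock);
            continue;
        }

        /* ... walk tmp->vpci->header.bars and update the rangeset ... */

        if ( tmp != pdev )
            spin_unlock(&tmp->vpci_lock);
    }

    /*
     * ABBA hazard: CPU1 handles a config write for device A and tries to
     * lock B inside this loop, while CPU2 handles a write for device B and
     * tries to lock A.  Each already holds its own device's vpci_lock, so
     * both can wait forever.
     */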
>>>>>>> Well, first of all I'd like to mention that while it may have been okay
>>>>>>> to not hold pcidevs_lock here for Dom0, it surely needs acquiring when
>>>>>>> dealing with DomU-s' lists of PCI devices. The requirement really applies
>>>>>>> to the other use of for_each_pdev() as well (in vpci_dump_msi()), except
>>>>>>> that there it probably wants to be a try-lock.
>>>>>>>
>>>>>>> Next I'd like to point out that here we have the still pending issue of
>>>>>>> how to deal with hidden devices, which Dom0 can access. See my RFC patch
>>>>>>> "vPCI: account for hidden devices in modify_bars()". Whatever the
>>>>>>> solution here, I think it wants to at least account for the extra need
>>>>>>> there.
>>>>>> Yes, sorry, I should take care of that.
>>>>>>
>>>>>>> Now it is quite clear that pcidevs_lock isn't going to help with
>>>>>>> avoiding the deadlock, as it's imo not an option at all to acquire that
>>>>>>> lock everywhere else you access ->vpci (or else the vpci lock itself
>>>>>>> would be pointless). But a per-domain auxiliary r/w lock may help: Other
>>>>>>> paths would acquire it in read mode, and here you'd acquire it in write
>>>>>>> mode (in the former case around the vpci lock, while in the latter case
>>>>>>> there may then not be any need to acquire the individual vpci locks at
>>>>>>> all). FTAOD: I haven't fully thought through all implications (and hence
>>>>>>> whether this is viable in the first place); I expect you will,
>>>>>>> documenting what you've found in the resulting patch description. Of
>>>>>>> course the double lock acquire/release would then likely want hiding in
>>>>>>> helper functions.
>>>>>> I've been also thinking about this, and whether it's really worth to
>>>>>> have a per-device lock rather than a per-domain one that protects all
>>>>>> vpci regions of the devices assigned to the domain.
>>>>>>
>>>>>> The OS is likely to serialize accesses to the PCI config space anyway,
>>>>>> and the only place I could see a benefit of having per-device locks is
>>>>>> in the handling of MSI-X tables, as the handling of the mask bit is
>>>>>> likely very performance sensitive, so adding a per-domain lock there
>>>>>> could be a bottleneck.
>>>>> Hmm, with method 1 accesses serializing globally is basically
>>>>> unavoidable, but with MMCFG I see no reason why OSes may not (move
>>>>> to) permit(ting) parallel accesses, with serialization perhaps done
>>>>> only at device level. See our own pci_config_lock, which applies to
>>>>> only method 1 accesses; we don't look to be serializing MMCFG
>>>>> accesses at all.
>>>>>
>>>>>> We could alternatively do a per-domain rwlock for vpci and special case
>>>>>> the MSI-X area to also have a per-device specific lock. At which point
>>>>>> it becomes fairly similar to what you propose.
>>>> @Jan, @Roger
>>>>
>>>> 1. d->vpci_lock - rwlock <- this protects vpci
>>>> 2. pdev->vpci->msix_tbl_lock - rwlock <- this protects MSI-X tables
>>>> or should it better be pdev->msix_tbl_lock as MSI-X tables don't
>>>> really depend on vPCI?
>>> If so, perhaps indeed better the latter. But as said in reply to Roger,
>>> I'm not convinced (yet) that doing away with the per-device lock is a
>>> good move. As said there - we're ourselves doing fully parallel MMCFG
>>> accesses, so OSes ought to be fine to do so, too.
>> But with pdev->vpci_lock we face ABBA...
> I think it would be easier to start with a per-domain rwlock that
> guarantees pdev->vpci cannot be removed under our feet. This would be
> taken in read mode in vpci_{read,write} and in write mode when
> removing a device from a domain.
>
> Then there are also other issues regarding vPCI locking that need to
> be fixed, but that lock would likely be a start.
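A rough sketch of how such a per-domain r/w lock could nest with the per-device lock, as suggested above; d->vpci_rwlock is a hypothetical new field, and the helpers that would hide the double acquire/release are omitted:

    /* Normal config-space access path, e.g. vpci_read()/vpci_write(). */
    read_lock(&d->vpci_rwlock);        /* hypothetical per-domain rwlock */
    spin_lock(&pdev->vpci_lock);
    /* ... access pdev->vpci ... */
    spin_unlock(&pdev->vpci_lock);
    read_unlock(&d->vpci_rwlock);

    /*
     * modify_bars() (and device removal): take the lock in write mode to
     * exclude all readers, then walk every device's vpci without taking
     * the individual per-device locks.
     */
    write_lock(&d->vpci_rwlock);
    for_each_pdev ( pdev->domain, tmp )
    {
        if ( !tmp->vpci )
            continue;
        /* ... read tmp->vpci->header.bars ... */
    }
    write_unlock(&d->vpci_rwlock);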
Or let's look at the problem from a different angle: this is the only place
which breaks the use of pdev->vpci_lock, because no other place tries to
acquire the locks of two devices at a time. So, what if we re-work the
offending piece of code instead? That way we keep parallel access and keep
the lock per-device, which might also be a plus.

By re-work I mean that, instead of reading the already mapped regions from
tmp, we can employ a d->pci_mapped_regions range set which holds all the
already mapped ranges. Accesses to that range set would be protected by
pcidevs_lock, which should be rare. So, modify_bars would rely on
pdev->vpci_lock + pcidevs_lock, and ABBA won't be possible at all.
>
> Thanks, Roger.
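A sketch of what that rangeset-based rework could look like; d->pci_mapped_regions is the hypothetical per-domain range set named above, protected by pcidevs_lock, while rangeset_report_ranges(), rangeset_remove_range() and pcidevs_lock()/pcidevs_unlock() are existing interfaces:

    /* Callback: drop one already-mapped range from the set being remapped. */
    static int remove_mapped(unsigned long s, unsigned long e, void *mem)
    {
        return rangeset_remove_range(mem, s, e);
    }

    /*
     * In modify_bars(), with pdev->vpci_lock already held by the
     * vpci_write() path: instead of iterating over every other device's
     * vpci header, subtract the domain's already-mapped ranges from mem.
     * Only pdev's own lock plus pcidevs_lock are involved, so no ABBA.
     */
    pcidevs_lock();
    rc = rangeset_report_ranges(d->pci_mapped_regions, 0, ~0UL,
                                remove_mapped, mem);
    pcidevs_unlock();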