Xen project Mailing List

Re: [PATCH v6 03/13] vpci: move lock outside of struct vpci

To: Oleksandr Andrushchenko <Oleksandr_Andrushchenko@xxxxxxxx>

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Fri, 4 Feb 2022 11:57:30 +0100

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=none; dmarc=none; dkim=none; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=5Ei6D6DYVxWsWl6EIOxM2w/fJrygwoeM3gINgQBcIx0=; b=EhwILJKM0SyCYJawGt+V/iMlyzslTaouP6YaBKlW1bKV/MS8pPwCWfs0Qx8laW5JaygI4+9NToL7SD+crfaEQZFK40JxfE88IaUD6BTDhyO5GmGytN5ERRVmwpb7aWobG3HsKt2E3R5lda5SnQNdAooh2VUkpojKDCFtWp7LyY2yOfKuidmA+rtPHvjF8BJoG0HhWTigShjUQhQhDi7ukl+moXt2Pe9OUoX38t4/Lom7TUaBv6wmFa0Gq6U80VUil+9Avrfj6iv7LnmS7opdyIsErmeOQnfLIGNNOwuWbUvdyHifrz8jdNT7KuQ1TRVJ/SGZk3rp23PREx46nehSow==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=XDHC1C3VT3VITYOGbrTW7ibGcVdU/QiK5JLdgauyKeJbfpCUOBoYH31Z8BqqZagVn2ZTNj5/9lzjtj5OIJikymK9rvi1XHsYjdSb+aStTBNSgvdT2RFOZbDPZLMCalQexS1en6+ngT6UmsD3R8Ym2D3XBEpdW6XXhp0fljyGBy/wUfEcv9uXD+b2uCrzQaPawVsVdaveCu8B1Xa+Ko7g/helQ+bAHzWNmM4R4jpXYubUznD9eZZ1y/Lwd4K7FzLH0PvgXfOeMdeRS/PF1jj0cbs4v4//Ikpb6/ttsPLTsjImjCm/njpY3e141Luj5rvk2iWgIPkOnrzHA9/Pmhg9dQ==

Authentication-results: esa5.hc3370-68.iphmx.com; dkim=pass (signature verified) header.i=@citrix.onmicrosoft.com

Cc: Jan Beulich <jbeulich@xxxxxxxx>, "julien@xxxxxxx" <julien@xxxxxxx>, "sstabellini@xxxxxxxxxx" <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <Oleksandr_Tyshchenko@xxxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Artem Mygaiev <Artem_Mygaiev@xxxxxxxx>, "andrew.cooper3@xxxxxxxxxx" <andrew.cooper3@xxxxxxxxxx>, "george.dunlap@xxxxxxxxxx" <george.dunlap@xxxxxxxxxx>, "paul@xxxxxxx" <paul@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Rahul Singh <rahul.singh@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Fri, 04 Feb 2022 10:58:07 +0000

Ironport-data: A9a23:v155+q5L2q0kRlWVDvRTwAxRtBrBchMFZxGqfqrLsTDasY5as4F+v mIbCj+EaamLYjOke98iPd/jp0hU7cXcmNFlTQQ5qyg9Hi5G8cbLO4+Ufxz6V8+wwmwvb67FA +E2MISowBUcFyeEzvuV3zyIQUBUjclkfJKlYAL/En03FV8MpBsJ00o5wbZj2tIw27BVPivW0 T/Mi5yHULOa82Yc3lI8s8pvfzs24ZweEBtB1rAPTagjUG32zhH5P7pGTU2FFFPqQ5E8IwKPb 72rIIdVXI/u10xF5tuNyt4Xe6CRK1LYFVDmZnF+A8BOjvXez8CbP2lS2Pc0MC9qZzu1c99Z0 f5P7LCBET4TPqzxmPgZegYbDhFXIvgTkFPHCSDXXc27ykTHdz3nwul0DVFwNoodkgp1KTgQr 7pCcmlLN03dwbLtqF64YrAEasALNs7kMZlZonh95TrYEewnUdbIRKCiCdpwgm1h25gTQai2i 8wxYj1JRj3maE1zZngYNY4YsPiZhUHkfGgNwL6SjfVuuDWCpOBr65D1OcfRUsyHQ4NShEnwj kvc42n8NTQLO9WexCSt/2qlg6nEmiaTcIgfDqGi//hmxlia3HUOCQY+XEG+5/K+jyaWS99Zb kAZ5Ccqhawz71CwCMnwWQWip3yJtQJaXMBfe8Ug4QGQzuzP4gCWBkANVDsHY9sj3OcIQjgt2 k6MjsneLzVlu72ISlqQ7r6R6zi1PEA9L2UPeCsFRgst+MT4rcc4iRenZvFnHa2uh9v5AwbZx TyQsTM+jLUei80M/6ij9FWBiDWpzrDLUwo06wP/Tm+jqARja+aNQIil6kPS6/paG7qIVVmKv HUCmM+24fgHCNeGkynlaP4WALij6vKBMTvdqV1iBZ8s83Kq4XHLQGxLyGggfgEzaJ9CIGK3J h+I0e9M2HNNFGKBb/ZbUtqIMcQr8Kj8Ef36Fe3kR8UbN/CdazS71C1pYEeR2UXkn04tjbwzN P+nTCq8MZoJIf85lWTrHo/xxZdun3ljnj2LGfgX2jz6ieL2WZKDdVsS3LJihMgd5bjMngja+ s032yCim0QGC72WjsU6HOcuwbE2wZoTWMieRy9/LLfrzu9a9IcJUaS5LVQJINQNokitvr2Ul kxRo2cBoLYFuVXJKB+RdldoY671UJB0oBoTZHJwZgbzhiZ+MN3wt8/zkqfbm5F9r4SPKtYvF 5E4lzioWKwTGlwrBRxBBXUCkGCSXEvy3l/fV8ZUSDM+Y4RhV2T0FizMJWPSGN01JnPv76MW+ uT4viuCGMZrb1kyXa7+NaP0p3vs7Cd1pQ6HdxaRSjWlUB63q9YCxu2YpqJfHvzg3j2Yl2bDi lbLXUxFzQQPyqdsmOT0aWm/h97BO8N1H1ZAHnmd6rCzNCLA+XGkz5MGW+GNFQ0xnkupkEl7T ekKnfz6LtMdm1NG79h1H7pxlPps7Nrzvb5KiA9jGSyTPVisD7phJFiA3NVO6fIRluMI51PuV xLd4MReNJWIJNjhTAwbKj06Y7nRzvoTgDTTs6g4eR2o+C9t8bObekxOJB3Q2jdFJb54Pdp9k +csscIb8SKljR8uPorUhyxY7T3UfHcBT78mptcRB4qy0lgnzVRLYJr9DC7q4c7QN4UQYxdye jLN3fjMnbVRwEbGYkEfL3mV0LoPn4kKtTBL0EQGewaDlO3ai6JlxxZW6zk2EFhYl00Vz+JpN 2F3HERpPqHSrSxwjc1OUm3wSQFMABqVph74x1cTzTCLSkCpUirGLXEnOPbL90ccqjoOcj9e9 bCe6WDkTTe1I52hgnpsARZo+675UNh81gzeg8T2Tc2KEq4zbSfhnqLzN3EDrAHqAJ9piUDKz QWwED2ctUEv2fYsnpAG

Ironport-hdrordr: A9a23:sLwFjaw5gFpO+aIxxr9jKrPxtuskLtp133Aq2lEZdPULSKKlfp GV88jziyWZtN9wYhEdcdDpAtjnfZr5z+8J3WB3B8bfYOCGghrTEGgG1+rfKlLbakjDH4JmpM Ndmu1FeaLN5DtB/LbHCWuDYq4dKbC8mcjC74qurAYOcegpUdAa0+4QMHfrLqQcfng+OXNWLu v62iIRzADQB0j/I/7LTEUtbqzmnZnmhZjmaRkJC1oO7xSPtyqh7PrfHwKD1hkTfjtTyfN6mF K13zDR1+GGibWW2xXc32jc49B/n8bg8MJKAIiphtIOIjvhpw60bMBKWqGEvhoyvOazgWxa3e XkklMFBYBe+nnRdma6rV/E3BTh6i8n7zvYxVqRkRLY0IXEbQN/L/AEqZNScxPf5UZllsp7yr h302WQsIcSJQ/cnQzmjuK4Fi1Cpw6Rmz4PgOQTh3tQXc81c7lKt7ES+0tTDdMpAD/60oY6C+ NjZfusqsq+SWnqLEwxg1MfguBFBh8Ib1K7qwk5y4OoOgFt7TBEJxBy/r1aop8CnKhNPaWsqd 60dZiAr4s+PPP+W5gNc9vpcfHHeVAlfii8RV56AW6XXJ3vaEi94KIe3t0OlZWXkdozvd0PpK g=

Ironport-sdr: T1E2g3GEJtu+whThmC8irrkPrAySNmfUzpryHD0wToOyBxG4WDP3WWxTttRg6EhHPBebucCYdr z/Qnl7RZ8accrwtIDrRAh5/TIOXY8cz1hSFH/ntmbKfMkP9q8p53r09nav6kSzQyaz+W/FWYhR pS5RnvcLH97MhPRy9/TftIT/7KYP/4cOGEPYxJ4glJ9NRT3LJVxs/MRgHmHDDn6KjF90hxTvkB HrvrXQqE4/YrQ8+mDgr7R3KcCwvEYMa1CXYjy+QI/fcFBlsyAhvqsa12F/bv/XaAyspO9zmeaQ r8Ugf4L1VX1+l3LCgpF3EEIX

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Feb 04, 2022 at 10:12:46AM +0000, Oleksandr Andrushchenko wrote: > Hi, Jan! > > On 04.02.22 11:15, Jan Beulich wrote: > > On 04.02.2022 09:58, Oleksandr Andrushchenko wrote: > >> On 04.02.22 09:52, Jan Beulich wrote: > >>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote: > >>>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, > >>>> uint16_t cmd, bool rom_only) > >>>> continue; > >>>> } > >>>> > >>>> + spin_lock(&tmp->vpci_lock); > >>>> + if ( !tmp->vpci ) > >>>> + { > >>>> + spin_unlock(&tmp->vpci_lock); > >>>> + continue; > >>>> + } > >>>> for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ ) > >>>> { > >>>> const struct vpci_bar *bar = &tmp->vpci->header.bars[i]; > >>>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, > >>>> uint16_t cmd, bool rom_only) > >>>> rc = rangeset_remove_range(mem, start, end); > >>>> if ( rc ) > >>>> { > >>>> + spin_unlock(&tmp->vpci_lock); > >>>> printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: > >>>> %d\n", > >>>> start, end, rc); > >>>> rangeset_destroy(mem); > >>>> return rc; > >>>> } > >>>> } > >>>> + spin_unlock(&tmp->vpci_lock); > >>>> } > >>> At the first glance this simply looks like another unjustified (in the > >>> description) change, as you're not converting anything here but you > >>> actually add locking (and I realize this was there before, so I'm sorry > >>> for not pointing this out earlier). > >> Well, I thought that the description already has "...the lock can be > >> used (and in a few cases is used right away) to check whether vpci > >> is present" and this is enough for such uses as here. > >>> But then I wonder whether you > >>> actually tested this, since I can't help getting the impression that > >>> you're introducing a live-lock: The function is called from cmd_write() > >>> and rom_write(), which in turn are called out of vpci_write(). Yet that > >>> function already holds the lock, and the lock is not (currently) > >>> recursive. (For the 3rd caller of the function - init_bars() - otoh > >>> the locking looks to be entirely unnecessary.) > >> Well, you are correct: if tmp != pdev then it is correct to acquire > >> the lock. But if tmp == pdev and rom_only == true > >> then we'll deadlock. > >> > >> It seems we need to have the locking conditional, e.g. only lock > >> if tmp != pdev > > Which will address the live-lock, but introduce ABBA deadlock potential > > between the two locks. > I am not sure I can suggest a better solution here > @Roger, @Jan, could you please help here? I think we could set the locking order based on the memory address of the locks, ie: if ( &tmp->vpci_lock < &pdev->vpci_lock ) { spin_unlock(&pdev->vpci_lock); spin_lock(&tmp->vpci_lock); spin_lock(&pdev->vpci_lock); if ( !pdev->vpci || &pdev->vpci->header != header ) /* ERROR: vpci removed or recreated. */ } else spin_lock(&tmp->vpci_lock); That however creates a window where the address of the BARs on the current device (pdev) could be changed, so the result of the mapping might be skewed. I think the guest would only have itself to blame for that, as changing the position of the BARs while toggling memory decoding is not something sensible to do. > > > >>>> @@ -222,10 +239,10 @@ static int msix_read(struct vcpu *v, unsigned long > >>>> addr, unsigned int len, > >>>> break; > >>>> } > >>>> > >>>> + msix_put(msix); > >>>> return X86EMUL_OKAY; > >>>> } > >>>> > >>>> - spin_lock(&msix->pdev->vpci->lock); > >>>> entry = get_entry(msix, addr); > >>>> offset = addr & (PCI_MSIX_ENTRY_SIZE - 1); > >>> You're increasing the locked region quite a bit here. If this is really > >>> needed, it wants explaining. And if this is deemed acceptable as a > >>> "side effect", it wants justifying or at least stating imo. Same for > >>> msix_write() then, obviously. > >> Yes, I do increase the locking region here, but the msix variable needs > >> to be protected all the time, so it seems to be obvious that it remains > >> under the lock > > What does the msix variable have to do with the vPCI lock? If you see > > a need to grow the locked region here, then surely this is independent > > of your conversion of the lock, and hence wants to be a prereq fix > > (which may in fact want/need backporting). > First of all, the implementation of msix_get is wrong and needs to be: > > /* > * Note: if vpci_msix found, then this function returns with > * pdev->vpci_lock held. Use msix_put to unlock. > */ > static struct vpci_msix *msix_get(const struct domain *d, unsigned long addr) > { > struct vpci_msix *msix; > > list_for_each_entry ( msix, &d->arch.hvm.msix_tables, next ) Strictly speaking you would also need to introduce a lock here to protect msix_tables. This was all designed when hot-adding (or removing) PCI devices to the domain wasn't supported. > { > const struct vpci_bar *bars; > unsigned int i; > > spin_lock(&msix->pdev->vpci_lock); > if ( !msix->pdev->vpci ) > { > spin_unlock(&msix->pdev->vpci_lock); > continue; > } > > bars = msix->pdev->vpci->header.bars; > for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ ) > if ( bars[msix->tables[i] & PCI_MSIX_BIRMASK].enabled && > VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, i) ) > return msix; > > spin_unlock(&msix->pdev->vpci_lock); > } > > return NULL; > } > > Then, both msix_{read|write} can dereference msix->pdev->vpci early, > this is why Roger suggested we move to msix_{get|put} here. > And yes, we grow the locked region here and yes this might want a > prereq fix. Or just be fixed while at it. Ideally yes, we would need a separate fix that introduced msix_{get,put}, because the currently unlocked regions of msix_{read,write} do access the BAR address fields, and doing so without holding the vpci lock would be racy. I would expect that the writing/reading of the addr field is done in a single instruction, so it's unlikely to be a problem in practice. That's kind of similar to the fact that modify_bars also accesses the addr and size fields of remote BARs without taking the respective lock. Once the lock is moved outside of the vpci struct and it's used to assert that pdev->vpci is present then we do need to hold it while accessing vpci, or else the struct could be removed under our feet. Roger.

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.