Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure
On 14.02.2022 14:13, Oleksandr Andrushchenko wrote:
>
>
> On 14.02.22 14:57, Jan Beulich wrote:
>> On 14.02.2022 12:37, Oleksandr Andrushchenko wrote:
>>>
>>> On 14.02.22 13:25, Roger Pau Monné wrote:
>>>> On Mon, Feb 14, 2022 at 11:15:27AM +0000, Oleksandr Andrushchenko wrote:
>>>>> On 14.02.22 13:11, Roger Pau Monné wrote:
>>>>>> On Mon, Feb 14, 2022 at 10:53:43AM +0000, Oleksandr Andrushchenko wrote:
>>>>>>> On 14.02.22 12:34, Roger Pau Monné wrote:
>>>>>>>> On Mon, Feb 14, 2022 at 09:36:39AM +0000, Oleksandr Andrushchenko
>>>>>>>> wrote:
>>>>>>>>> On 11.02.22 13:40, Roger Pau Monné wrote:
>>>>>>>>>> +
>>>>>>>>>>>>> for ( i = 0; i < msix->max_entries; i++ )
>>>>>>>>>>>>> {
>>>>>>>>>>>>> const struct vpci_msix_entry *entry =
>>>>>>>>>>>>> &msix->entries[i];
>>>>>>>>>>>> Since this function is now called with the per-domain rwlock read
>>>>>>>>>>>> locked it's likely not appropriate to call process_pending_softirqs
>>>>>>>>>>>> while holding such lock (check below).
>>>>>>>>>>> You are right, as it is possible that:
>>>>>>>>>>>
>>>>>>>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>>>>>>>>>>
>>>>>>>>>>> Even more, vpci_process_pending may also
>>>>>>>>>>>
>>>>>>>>>>> read_unlock -> vpci_remove_device -> write_lock
>>>>>>>>>>>
>>>>>>>>>>> in its error path. So, any invocation of process_pending_softirqs
>>>>>>>>>>> must not hold d->vpci_rwlock at least.
>>>>>>>>>>>
>>>>>>>>>>> And also we need to check that pdev->vpci was not removed
>>>>>>>>>>> in between or *re-created*
>>>>>>>>>>>> We will likely need to re-iterate over the list of pdevs assigned
>>>>>>>>>>>> to
>>>>>>>>>>>> the domain and assert that the pdev is still assigned to the same
>>>>>>>>>>>> domain.
>>>>>>>>>>> So, do you mean a pattern like the below should be used at all
>>>>>>>>>>> places where we need to call process_pending_softirqs?
>>>>>>>>>>>
>>>>>>>>>>> read_unlock
>>>>>>>>>>> process_pending_softirqs
>>>>>>>>>>> read_lock
>>>>>>>>>>> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
>>>>>>>>>>> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) )
>>>>>>>>>>> <continue processing>
>>>>>>>>>> Something along those lines. You likely need to continue iterating
>>>>>>>>>> using for_each_pdev.
>>>>>>>>> How do we tell if pdev->vpci is the same? Jan has already brought
>>>>>>>>> this question before [1] and I was about to use some ID for that
>>>>>>>>> purpose:
>>>>>>>>> pdev->vpci->id = d->vpci_id++ and then we use pdev->vpci->id for
>>>>>>>>> checks
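For illustration, the ID scheme described above might be sketched as below; both the id field and the d->vpci_id counter are hypothetical, nothing like them exists in the tree at this point.

/* Hypothetical scheme: neither vpci->id nor d->vpci_id exist today. */
struct vpci {
    uint64_t id;    /* generation ID, assigned when pdev->vpci is created */
    /* ... existing fields ... */
};

/* On creation, with d->vpci_rwlock held for writing: */
static void vpci_set_id(struct domain *d, struct vpci *vpci)
{
    vpci->id = d->vpci_id++;
}

/* After re-acquiring the lock: same object iff the ID did not change. */
static bool is_the_same_vpci(const struct vpci *vpci, uint64_t id)
{
    return vpci->id == id;
}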
>>>>>>>> Given this is a debug message I would be OK with just doing the
>>>>>>>> minimal checks to prevent Xen from crashing (ie: that pdev->vpci
>>>>>>>> exists and that the MSI entry to resume at is not past the current
>>>>>>>> limit). Otherwise just print a message and move on to the next device.
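For illustration, those minimal checks might look along these lines inside the dump loop; the message text is made up, i is the resume index from the quoted loop, and msix/max_entries follow the naming visible in the quoted hunk.

        /* Minimal validation only: enough to keep the dump from crashing. */
        if ( !pdev->vpci || !pdev->vpci->msix )
        {
            printk("%pp: vPCI/MSI-X state gone, skipping\n", &pdev->sbdf);
            continue;     /* on to the next device */
        }

        if ( i >= pdev->vpci->msix->max_entries )
        {
            printk("%pp: MSI-X entry %u past the current limit, skipping\n",
                   &pdev->sbdf, i);
            break;        /* nothing further to dump for this device */
        }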
>>>>>>> Agree, I see no big issue (probably) if we are not able to print
>>>>>>>
>>>>>>> How about this one:
>>>>>>>
>>>>>>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>>>>>>> index 809a6b4773e1..50373f04da82 100644
>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>> @@ -171,10 +171,31 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>>>>>>>                              struct rangeset *mem, uint16_t cmd)
>>>>>>>  {
>>>>>>>      struct map_data data = { .d = d, .map = true };
>>>>>>> +    pci_sbdf_t sbdf = pdev->sbdf;
>>>>>>>      int rc;
>>>>>>>
>>>>>>> +    ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));
>>>>>>> +
>>>>>>>      while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
>>>>>>> +    {
>>>>>>> +        /*
>>>>>>> +         * process_pending_softirqs may trigger vpci_process_pending which
>>>>>>> +         * may need to acquire pdev->domain->vpci_rwlock in read mode.
>>>>>>> +         */
>>>>>>> +        write_unlock(&pdev->domain->vpci_rwlock);
>>>>>>>          process_pending_softirqs();
>>>>>>> +        write_lock(&pdev->domain->vpci_rwlock);
>>>>>>> +
>>>>>>> +        /* Check if pdev still exists and vPCI was not removed or re-created. */
>>>>>>> +        if ( pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn) != pdev ||
>>>>>>> +             vpci is NOT the same )
>>>>>>> +        {
>>>>>>> +            rc = 0;
>>>>>>> +            break;
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>>      rangeset_destroy(mem);
>>>>>>>      if ( !rc )
>>>>>>>          modify_decoding(pdev, cmd, false);
>>>>>>>
>>>>>>> This one also wants process_pending_softirqs to run, so it *might*
>>>>>>> want pdev and vpci checks. But at the same time apply_map runs
>>>>>>> at ( system_state < SYS_STATE_active ), so defer_map won't be
>>>>>>> running yet, thus no vpci_process_pending is possible yet (in the
>>>>>>> sense that it would have nothing to do yet). So, I think we just need:
>>>>>>>
>>>>>>> write_unlock(&pdev->domain->vpci_rwlock);
>>>>>>> process_pending_softirqs();
>>>>>>> write_lock(&pdev->domain->vpci_rwlock);
>>>>>>>
>>>>>>> and this should be enough
>>>>>> Given the context apply_map is called from (dom0 specific init code),
>>>>>> there's no need to check for the pdev to still exist, or whether vpci
>>>>>> has been recreated, as it's not possible. Just add a comment to
>>>>>> explicitly note that the context of the function is special, and thus
>>>>>> there's no possibility of either the device or vpci going away.
>>>>> Does it really need write_unlock/write_lock given the context?...
>>>> I think it's bad practice to call process_pending_softirqs while
>>>> holding any locks. This is a very specific context so it's likely fine
>>>> to not drop the lock, but would still seem incorrect to me.
>>> Ok
>>>>> I think it doesn't as there is no chance defer_map is called, thus
>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>>> Indeed, there's no chance of that because process_pending_softirqs
>>>> will never try to do a scheduling operation that would result in our
>>>> context being scheduled out.
>>> while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
>>> {
>>>     /*
>>>      * FIXME: Given the context apply_map is called from (dom0 specific
>>>      * init code at system_state < SYS_STATE_active) it is not strictly
>>>      * required that pdev->domain->vpci_rwlock is unlocked before calling
>>>      * process_pending_softirqs as there is no contention possible between
>>>      * this code and vpci_process_pending trying to acquire the lock in
>>>      * read mode. But running process_pending_softirqs with any lock held
>>>      * doesn't seem to be a good practice, so drop the lock and re-acquire
>>>      * it right again.
>>>      */
>>>     write_unlock(&pdev->domain->vpci_rwlock);
>>>     process_pending_softirqs();
>>>     write_lock(&pdev->domain->vpci_rwlock);
>>> }
>> I'm afraid that's misleading at best. apply_map() is merely a specific
>> example where you know the lock is going to be taken. But really any
>> softirq handler could be acquiring any lock, so requesting to process
>> softirqs cannot ever be done with any lock held.
>>
>> What you instead want to explain is why, after re-acquiring the lock,
>> no further checking is needed for potentially changed state.
> How about:
>
> /*
> * FIXME: Given the context apply_map is called from (dom0 specific
> * init code at system_state < SYS_STATE_active) there is no contention
> * possible between this code and vpci_process_pending trying to acquire
> * the lock in read mode and destroy pdev->vpci in its error path.
> * Neither pdev may be disposed yet, so it is not required to check if the
> * relevant pdev still exists after re-acquiring the lock.
> */
I'm not sure I follow the first sentence; I guess a comma or two may help,
and/or using "as well as" in place of one of the two "and"s. I also don't
think you mean contention, but rather a race between the named entities?
Jan