[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure
On 14.02.2022 14:27, Oleksandr Andrushchenko wrote: > > > On 14.02.22 15:22, Jan Beulich wrote: >> On 14.02.2022 14:13, Oleksandr Andrushchenko wrote: >>> >>> On 14.02.22 14:57, Jan Beulich wrote: >>>> On 14.02.2022 12:37, Oleksandr Andrushchenko wrote: >>>>> On 14.02.22 13:25, Roger Pau Monné wrote: >>>>>> On Mon, Feb 14, 2022 at 11:15:27AM +0000, Oleksandr Andrushchenko wrote: >>>>>>> On 14.02.22 13:11, Roger Pau Monné wrote: >>>>>>>> On Mon, Feb 14, 2022 at 10:53:43AM +0000, Oleksandr Andrushchenko >>>>>>>> wrote: >>>>>>>>> On 14.02.22 12:34, Roger Pau Monné wrote: >>>>>>>>>> On Mon, Feb 14, 2022 at 09:36:39AM +0000, Oleksandr Andrushchenko >>>>>>>>>> wrote: >>>>>>>>>>> On 11.02.22 13:40, Roger Pau Monné wrote: >>>>>>>>>>>> + >>>>>>>>>>>>>>> for ( i = 0; i < msix->max_entries; i++ ) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> const struct vpci_msix_entry *entry = >>>>>>>>>>>>>>> &msix->entries[i]; >>>>>>>>>>>>>> Since this function is now called with the per-domain rwlock read >>>>>>>>>>>>>> locked it's likely not appropriate to call >>>>>>>>>>>>>> process_pending_softirqs >>>>>>>>>>>>>> while holding such lock (check below). >>>>>>>>>>>>> You are right, as it is possible that: >>>>>>>>>>>>> >>>>>>>>>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock >>>>>>>>>>>>> >>>>>>>>>>>>> Even more, vpci_process_pending may also >>>>>>>>>>>>> >>>>>>>>>>>>> read_unlock -> vpci_remove_device -> write_lock >>>>>>>>>>>>> >>>>>>>>>>>>> in its error path. So, any invocation of process_pending_softirqs >>>>>>>>>>>>> must not hold d->vpci_rwlock at least. >>>>>>>>>>>>> >>>>>>>>>>>>> And also we need to check that pdev->vpci was not removed >>>>>>>>>>>>> in between or *re-created* >>>>>>>>>>>>>> We will likely need to re-iterate over the list of pdevs >>>>>>>>>>>>>> assigned to >>>>>>>>>>>>>> the domain and assert that the pdev is still assigned to the same >>>>>>>>>>>>>> domain. >>>>>>>>>>>>> So, do you mean a pattern like the below should be used at all >>>>>>>>>>>>> places where we need to call process_pending_softirqs? >>>>>>>>>>>>> >>>>>>>>>>>>> read_unlock >>>>>>>>>>>>> process_pending_softirqs >>>>>>>>>>>>> read_lock >>>>>>>>>>>>> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn); >>>>>>>>>>>>> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) ) >>>>>>>>>>>>> <continue processing> >>>>>>>>>>>> Something along those lines. You likely need to continue iterate >>>>>>>>>>>> using >>>>>>>>>>>> for_each_pdev. >>>>>>>>>>> How do we tell if pdev->vpci is the same? Jan has already brought >>>>>>>>>>> this question before [1] and I was about to use some ID for that >>>>>>>>>>> purpose: >>>>>>>>>>> pdev->vpci->id = d->vpci_id++ and then we use pdev->vpci->id for >>>>>>>>>>> checks >>>>>>>>>> Given this is a debug message I would be OK with just doing the >>>>>>>>>> minimal checks to prevent Xen from crashing (ie: pdev->vpci exists) >>>>>>>>>> and that the resume MSI entry is not past the current limit. >>>>>>>>>> Otherwise >>>>>>>>>> just print a message and move on to the next device. >>>>>>>>> Agree, I see no big issue (probably) if we are not able to print >>>>>>>>> >>>>>>>>> How about this one: >>>>>>>>> >>>>>>>>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c >>>>>>>>> index 809a6b4773e1..50373f04da82 100644 >>>>>>>>> --- a/xen/drivers/vpci/header.c >>>>>>>>> +++ b/xen/drivers/vpci/header.c >>>>>>>>> @@ -171,10 +171,31 @@ static int __init apply_map(struct domain *d, >>>>>>>>> const struct pci_dev *pdev, >>>>>>>>> struct rangeset *mem, uint16_t cmd) >>>>>>>>> { >>>>>>>>> struct map_data data = { .d = d, .map = true }; >>>>>>>>> + pci_sbdf_t sbdf = pdev->sbdf; >>>>>>>>> int rc; >>>>>>>>> >>>>>>>>> + ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock)); >>>>>>>>> + >>>>>>>>> while ( (rc = rangeset_consume_ranges(mem, map_range, >>>>>>>>> &data)) == -ERESTART ) >>>>>>>>> + { >>>>>>>>> + >>>>>>>>> + /* >>>>>>>>> + * process_pending_softirqs may trigger vpci_process_pending >>>>>>>>> which >>>>>>>>> + * may need to acquire pdev->domain->vpci_rwlock in read >>>>>>>>> mode. >>>>>>>>> + */ >>>>>>>>> + write_unlock(&pdev->domain->vpci_rwlock); >>>>>>>>> process_pending_softirqs(); >>>>>>>>> + write_lock(&pdev->domain->vpci_rwlock); >>>>>>>>> + >>>>>>>>> + /* Check if pdev still exists and vPCI was not removed or >>>>>>>>> re-created. */ >>>>>>>>> + if (pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, >>>>>>>>> sbdf.devfn) != pdev) >>>>>>>>> + if ( vpci is NOT the same ) >>>>>>>>> + { >>>>>>>>> + rc = 0; >>>>>>>>> + break; >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> rangeset_destroy(mem); >>>>>>>>> if ( !rc ) >>>>>>>>> modify_decoding(pdev, cmd, false); >>>>>>>>> >>>>>>>>> This one also wants process_pending_softirqs to run so it *might* >>>>>>>>> want pdev and vpci checks. But at the same time apply_map runs >>>>>>>>> at ( system_state < SYS_STATE_active ), so defer_map won't be >>>>>>>>> running yet, thus no vpci_process_pending is possible yet (in terms >>>>>>>>> it has something to do yet). So, I think we just need: >>>>>>>>> >>>>>>>>> write_unlock(&pdev->domain->vpci_rwlock); >>>>>>>>> process_pending_softirqs(); >>>>>>>>> write_lock(&pdev->domain->vpci_rwlock); >>>>>>>>> >>>>>>>>> and this should be enough >>>>>>>> Given the context apply_map is called from (dom0 specific init code), >>>>>>>> there's no need to check for the pdev to still exits, or whether vpci >>>>>>>> has been recreated, as it's not possible. Just add a comment to >>>>>>>> explicitly note that the context of the function is special, and thus >>>>>>>> there's no possibility of either the device or vpci going away. >>>>>>> Does it really need write_unlock/write_lock given the context?... >>>>>> I think it's bad practice to call process_pending_softirqs while >>>>>> holding any locks. This is a very specific context so it's likely fine >>>>>> to not drop the lock, but would still seem incorrect to me. >>>>> Ok >>>>>>> I think it doesn't as there is no chance defer_map is called, thus >>>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock >>>>>> Indeed, there's no chance of that because process_pending_softirqs >>>>>> will never try to do a scheduling operation that would result in our >>>>>> context being scheduled out. >>>>> while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == >>>>> -ERESTART ) >>>>> { >>>>> /* >>>>> * FIXME: Given the context apply_map is called from (dom0 >>>>> specific >>>>> * init code at system_state < SYS_STATE_active) it is not >>>>> strictly >>>>> * required that pdev->domain->vpci_rwlock is unlocked before >>>>> calling >>>>> * process_pending_softirqs as there is no contention possible >>>>> between >>>>> * this code and vpci_process_pending trying to acquire the >>>>> lock in >>>>> * read mode. But running process_pending_softirqs with any >>>>> lock held >>>>> * doesn't seem to be a good practice, so drop the lock and >>>>> re-acquire >>>>> * it right again. >>>>> */ >>>>> write_unlock(&pdev->domain->vpci_rwlock); >>>>> process_pending_softirqs(); >>>>> write_lock(&pdev->domain->vpci_rwlock); >>>>> } >>>> I'm afraid that's misleading at best. apply_map() is merely a specific >>>> example where you know the lock is going to be taken. But really any >>>> softirq handler could be acquiring any lock, so requesting to process >>>> softirqs cannot ever be done with any lock held. >>>> >>>> What you instead want to explain is why, after re-acquiring the lock, >>>> no further checking is needed for potentially changed state. >>> How about: >>> >>> /* >>> * FIXME: Given the context apply_map is called from (dom0 specific >>> * init code at system_state < SYS_STATE_active) there is no contention >>> * possible between this code and vpci_process_pending trying to acquire >>> * the lock in read mode and destroy pdev->vpci in its error path. >>> * Neither pdev may be disposed yet, so it is not required to check if the >>> * relevant pdev still exists after re-acquiring the lock. >>> */ >> I'm not sure I follow the first sentence; I guess a comma or two may help, >> and or using "as well as" in place of one of the two "and". I also don't >> think you mean contention, but rather a race between the named entities? > /* > * FIXME: Given the context from which apply_map is called (dom0 specific > * init code at system_state < SYS_STATE_active) there is no race condition > * possible between this code and vpci_process_pending which may try to > acquire > * the lock in read mode and also try to destroy pdev->vpci in its error > path. > * Neither pdev may be disposed yet, so it is not required to check if the > * relevant pdev still exists after re-acquiring the lock. > */ I'm still struggling with the language, sorry. You look to only have replaced "contention"? Reading it again I'd also like to mention that to me (not a native speaker) "Neither pdev may be ..." expresses "None of the pdev-s may be ...", when I think you mean "Nor may pdev be ..." Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |