Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure
On 14.02.2022 14:13, Oleksandr Andrushchenko wrote:
>
>
> On 14.02.22 14:57, Jan Beulich wrote:
>> On 14.02.2022 12:37, Oleksandr Andrushchenko wrote:
>>>
>>> On 14.02.22 13:25, Roger Pau Monné wrote:
>>>> On Mon, Feb 14, 2022 at 11:15:27AM +0000, Oleksandr Andrushchenko wrote:
>>>>> On 14.02.22 13:11, Roger Pau Monné wrote:
>>>>>> On Mon, Feb 14, 2022 at 10:53:43AM +0000, Oleksandr Andrushchenko wrote:
>>>>>>> On 14.02.22 12:34, Roger Pau Monné wrote:
>>>>>>>> On Mon, Feb 14, 2022 at 09:36:39AM +0000, Oleksandr Andrushchenko
>>>>>>>> wrote:
>>>>>>>>> On 11.02.22 13:40, Roger Pau Monné wrote:
>>>>>>>>>> +
>>>>>>>>>>>>> for ( i = 0; i < msix->max_entries; i++ )
>>>>>>>>>>>>> {
>>>>>>>>>>>>> const struct vpci_msix_entry *entry =
>>>>>>>>>>>>> &msix->entries[i];
>>>>>>>>>>>> Since this function is now called with the per-domain rwlock read
>>>>>>>>>>>> locked it's likely not appropriate to call process_pending_softirqs
>>>>>>>>>>>> while holding such lock (check below).
>>>>>>>>>>> You are right, as it is possible that:
>>>>>>>>>>>
>>>>>>>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>>>>>>>>>>
>>>>>>>>>>> Even more, vpci_process_pending may also
>>>>>>>>>>>
>>>>>>>>>>> read_unlock -> vpci_remove_device -> write_lock
>>>>>>>>>>>
>>>>>>>>>>> in its error path. So, any invocation of process_pending_softirqs
>>>>>>>>>>> must not hold d->vpci_rwlock at least.
>>>>>>>>>>>
>>>>>>>>>>> And also we need to check that pdev->vpci was not removed
>>>>>>>>>>> in between or *re-created*
>>>>>>>>>>>> We will likely need to re-iterate over the list of pdevs assigned
>>>>>>>>>>>> to
>>>>>>>>>>>> the domain and assert that the pdev is still assigned to the same
>>>>>>>>>>>> domain.
>>>>>>>>>>> So, do you mean a pattern like the below should be used at all
>>>>>>>>>>> places where we need to call process_pending_softirqs?
>>>>>>>>>>>
>>>>>>>>>>> read_unlock
>>>>>>>>>>> process_pending_softirqs
>>>>>>>>>>> read_lock
>>>>>>>>>>> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
>>>>>>>>>>> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) )
>>>>>>>>>>> <continue processing>
>>>>>>>>>> Something along those lines. You likely need to continue iterating
>>>>>>>>>> using for_each_pdev.
>>>>>>>>> How do we tell if pdev->vpci is the same? Jan has already brought
>>>>>>>>> this question before [1] and I was about to use some ID for that
>>>>>>>>> purpose:
>>>>>>>>> pdev->vpci->id = d->vpci_id++ and then we use pdev->vpci->id for
>>>>>>>>> checks
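For illustration, the ID scheme described above might be sketched as below; both the id field and the d->vpci_id counter are hypothetical, nothing like them exists in the tree at this point.

/* Hypothetical scheme: neither vpci->id nor d->vpci_id exist today. */
struct vpci {
    uint64_t id;    /* generation ID, assigned when pdev->vpci is created */
    /* ... existing fields ... */
};

/* On creation, with d->vpci_rwlock held for writing: */
static void vpci_set_id(struct domain *d, struct vpci *vpci)
{
    vpci->id = d->vpci_id++;
}

/* After re-acquiring the lock: same object iff the ID did not change. */
static bool is_the_same_vpci(const struct vpci *vpci, uint64_t id)
{
    return vpci->id == id;
}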
>>>>>>>> Given this is a debug message I would be OK with just doing the
>>>>>>>> minimal checks to prevent Xen from crashing (ie: that pdev->vpci
>>>>>>>> exists and that the MSI entry to resume at is not past the current
>>>>>>>> limit). Otherwise just print a message and move on to the next device.
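For illustration, those minimal checks might look along these lines inside the dump loop; the message text is made up, i is the resume index from the quoted loop, and msix/max_entries follow the naming visible in the quoted hunk.

        /* Minimal validation only: enough to keep the dump from crashing. */
        if ( !pdev->vpci || !pdev->vpci->msix )
        {
            printk("%pp: vPCI/MSI-X state gone, skipping\n", &pdev->sbdf);
            continue;     /* on to the next device */
        }

        if ( i >= pdev->vpci->msix->max_entries )
        {
            printk("%pp: MSI-X entry %u past the current limit, skipping\n",
                   &pdev->sbdf, i);
            break;        /* nothing further to dump for this device */
        }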
>>>>>>> Agree, I see no big issue (probably) if we are not able to print
>>>>>>>
>>>>>>> How about this one:
>>>>>>>
>>>>>>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>>>>>>> index 809a6b4773e1..50373f04da82 100644
>>>>>>> --- a/xen/drivers/vpci/header.c
>>>>>>> +++ b/xen/drivers/vpci/header.c
>>>>>>> @@ -171,10 +171,31 @@ static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>>>>>>>                              struct rangeset *mem, uint16_t cmd)
>>>>>>>  {
>>>>>>>      struct map_data data = { .d = d, .map = true };
>>>>>>> +    pci_sbdf_t sbdf = pdev->sbdf;
>>>>>>>      int rc;
>>>>>>>
>>>>>>> +    ASSERT(rw_is_write_locked(&pdev->domain->vpci_rwlock));
>>>>>>> +
>>>>>>>      while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
>>>>>>> +    {
>>>>>>> +        /*
>>>>>>> +         * process_pending_softirqs may trigger vpci_process_pending which
>>>>>>> +         * may need to acquire pdev->domain->vpci_rwlock in read mode.
>>>>>>> +         */
>>>>>>> +        write_unlock(&pdev->domain->vpci_rwlock);
>>>>>>>          process_pending_softirqs();
>>>>>>> +        write_lock(&pdev->domain->vpci_rwlock);
>>>>>>> +
>>>>>>> +        /* Check if pdev still exists and vPCI was not removed or re-created. */
>>>>>>> +        if ( pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn) != pdev ||
>>>>>>> +             vpci is NOT the same )
>>>>>>> +        {
>>>>>>> +            rc = 0;
>>>>>>> +            break;
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>>      rangeset_destroy(mem);
>>>>>>>      if ( !rc )
>>>>>>>          modify_decoding(pdev, cmd, false);
>>>>>>>
>>>>>>> This one also wants process_pending_softirqs to run, so it *might*
>>>>>>> want pdev and vpci checks. But at the same time apply_map runs
>>>>>>> at ( system_state < SYS_STATE_active ), so defer_map won't be
>>>>>>> running yet, thus no vpci_process_pending is possible yet (in the
>>>>>>> sense that it would have nothing to do yet). So, I think we just need:
>>>>>>>
>>>>>>> write_unlock(&pdev->domain->vpci_rwlock);
>>>>>>> process_pending_softirqs();
>>>>>>> write_lock(&pdev->domain->vpci_rwlock);
>>>>>>>
>>>>>>> and this should be enough
>>>>>> Given the context apply_map is called from (dom0 specific init code),
>>>>>> there's no need to check for the pdev to still exist, or whether vpci
>>>>>> has been recreated, as it's not possible. Just add a comment to
>>>>>> explicitly note that the context of the function is special, and thus
>>>>>> there's no possibility of either the device or vpci going away.
>>>>> Does it really need write_unlock/write_lock given the context?...
>>>> I think it's bad practice to call process_pending_softirqs while
>>>> holding any locks. This is a very specific context so it's likely fine
>>>> to not drop the lock, but would still seem incorrect to me.
>>> Ok
>>>>> I think it doesn't as there is no chance defer_map is called, thus
>>>>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>>> Indeed, there's no chance of that because process_pending_softirqs
>>>> will never try to do a scheduling operation that would result in our
>>>> context being scheduled out.
>>> while ( (rc = rangeset_consume_ranges(mem, map_range, &data)) == -ERESTART )
>>> {
>>>     /*
>>>      * FIXME: Given the context apply_map is called from (dom0 specific
>>>      * init code at system_state < SYS_STATE_active) it is not strictly
>>>      * required that pdev->domain->vpci_rwlock is unlocked before calling
>>>      * process_pending_softirqs as there is no contention possible between
>>>      * this code and vpci_process_pending trying to acquire the lock in
>>>      * read mode. But running process_pending_softirqs with any lock held
>>>      * doesn't seem to be a good practice, so drop the lock and re-acquire
>>>      * it right again.
>>>      */
>>>     write_unlock(&pdev->domain->vpci_rwlock);
>>>     process_pending_softirqs();
>>>     write_lock(&pdev->domain->vpci_rwlock);
>>> }
>> I'm afraid that's misleading at best. apply_map() is merely a specific
>> example where you know the lock is going to be taken. But really any
>> softirq handler could be acquiring any lock, so requesting to process
>> softirqs cannot ever be done with any lock held.
>>
>> What you instead want to explain is why, after re-acquiring the lock,
>> no further checking is needed for potentially changed state.
> How about:
>
> /*
> * FIXME: Given the context apply_map is called from (dom0 specific
> * init code at system_state < SYS_STATE_active) there is no contention
> * possible between this code and vpci_process_pending trying to acquire
> * the lock in read mode and destroy pdev->vpci in its error path.
> * Neither pdev may be disposed yet, so it is not required to check if the
> * relevant pdev still exists after re-acquiring the lock.
> */
I'm not sure I follow the first sentence; I guess a comma or two may help,
and/or using "as well as" in place of one of the two "and"s. I also don't
think you mean contention, but rather a race between the named entities?
Jan