 
	
| [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] vpci: introduce per-domain lock to protect vpci structure
 
On 11.02.22 13:40, Roger Pau Monné wrote:
> On Fri, Feb 11, 2022 at 07:27:39AM +0000, Oleksandr Andrushchenko wrote:
>> Hi, Roger!
>>
>> On 10.02.22 18:16, Roger Pau Monné wrote:
>>> On Wed, Feb 09, 2022 at 03:36:27PM +0200, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>>
>>>> Introduce a per-domain read/write lock to check whether vpci is present,
>>>> so we are sure there are no accesses to the contents of the vpci struct
>>>> if not. This lock can be used (and in a few cases is used right away)
>>>> so that vpci removal can be performed while holding the lock in write
>>>> mode. Previously such removal could race with vpci_read for example.
>>> Sadly there's still a race in the usage of pci_get_pdev_by_domain wrt
>>> pci_remove_device, and likely when vPCI gets also used in
>>> {de}assign_device I think.
>> Yes, this is indeed an issue, but I was not trying to solve it in
>> context of vPCI locking yet. I think we should discuss how do
>> we approach pdev locking, so I can create a patch for that.
>> that being said, I would like not to solve pdev in  this patch yet
>>
>> ...I do understand we do want to avoid that, but at the moment
>> a single reliable way for making sure pdev is alive seems to
>> be pcidevs_lock....
> I think we will need to make pcidevs_lock a rwlock and take it in read
> mode for pci_get_pdev_by_domain.
>
> We didn't have this scenario before where PCI emulation is done in the
> hypervisor, and hence the locking around those data structures has not
> been designed for those use-cases.
Yes, I do understand that.
I hope pcidevs lock move to rwlock can be done as a separate
patch. While this is not done, do you think we can proceed with
vPCI series and pcidevs locking re-work being done after?
>
>>>> 1. Per-domain's vpci_rwlock is used to protect pdev->vpci structure
>>>> from being removed.
>>>>
>>>> 2. Writing the command register and ROM BAR register may trigger
>>>> modify_bars to run, which in turn may access multiple pdevs while
>>>> checking for the existing BAR's overlap. The overlapping check, if done
>>>> under the read lock, requires vpci->lock to be acquired on both devices
>>>> being compared, which may produce a deadlock. It is not possible to
>>>> upgrade read lock to write lock in such a case. So, in order to prevent
>>>> the deadlock, check which registers are going to be written and acquire
>>>> the lock in the appropriate mode from the beginning.
>>>>
>>>> All other code, which doesn't lead to pdev->vpci destruction and does not
>>>> access multiple pdevs at the same time, can still use a combination of the
>>>> read lock and pdev->vpci->lock.
>>>>
>>>> 3. Optimize if ROM BAR write lock required detection by caching offset
>>>> of the ROM BAR register in vpci->header->rom_reg which depends on
>>>> header's type.
>>>>
>>>> 4. Reduce locked region in vpci_remove_device as it is now possible
>>>> to set pdev->vpci to NULL early right after the write lock is acquired.
>>>>
>>>> 5. Reduce locked region in vpci_add_handlers as it is possible to
>>>> initialize many more fields of the struct vpci before assigning it to
>>>> pdev->vpci.
>>>>
>>>> 6. vpci_{add|remove}_register are required to be called with the write lock
>>>> held, but it is not feasible to add an assert there as it requires
>>>> struct domain to be passed for that. So, add a comment about this 
>>>> requirement
>>>> to these and other functions with the equivalent constraints.
>>>>
>>>> 7. Drop const qualifier where the new rwlock is used and this is 
>>>> appropriate.
>>>>
>>>> 8. This is based on the discussion at [1].
>>>>
>>>> [1] 
>>>> https://urldefense.com/v3/__https://lore.kernel.org/all/20220204063459.680961-4-andr2000@xxxxxxxxx/__;!!GF_29dbcQIUBPA!gObSySzN7s6zSKrcpSEi6vw18fRPls157cuRoqq4KDd7Ic_Nvh_cFlyVXPRpEWBkI38pgsvvfg$
>>>>  [lore[.]kernel[.]org]
>>>>
>>>> Suggested-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
>>>> Suggested-by: Jan Beulich <jbeulich@xxxxxxxx>
>>>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>>
>>>> ---
>>>> This was checked on x86: with and without PVH Dom0.
>>>> ---
>>>>    xen/arch/x86/hvm/vmsi.c   |   2 +
>>>>    xen/common/domain.c       |   3 +
>>>>    xen/drivers/vpci/header.c |   8 +++
>>>>    xen/drivers/vpci/msi.c    |   8 ++-
>>>>    xen/drivers/vpci/msix.c   |  40 +++++++++++--
>>>>    xen/drivers/vpci/vpci.c   | 114 ++++++++++++++++++++++++++++----------
>>>>    xen/include/xen/sched.h   |   3 +
>>>>    xen/include/xen/vpci.h    |   2 +
>>>>    8 files changed, 146 insertions(+), 34 deletions(-)
>>>>
>>>> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
>>>> index 13e2a190b439..351cb968a423 100644
>>>> --- a/xen/arch/x86/hvm/vmsi.c
>>>> +++ b/xen/arch/x86/hvm/vmsi.c
>>>> @@ -893,6 +893,8 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>>>>    {
>>>>        unsigned int i;
>>>>    
>>>> +    ASSERT(!!rw_is_locked(&msix->pdev->domain->vpci_rwlock));
>>>                 ^ no need for the double negation.
>> Ok, will update all asserts which use !!
>>> Also this asserts that the lock is taken, but could be by a different
>>> pCPU.  I guess it's better than nothing.
>> Fair enough. Do you still want the asserts or should I remove them?
> Likely fine to leave them.
Ok
>
>>>> +
>>>>        for ( i = 0; i < msix->max_entries; i++ )
>>>>        {
>>>>            const struct vpci_msix_entry *entry = &msix->entries[i];
>>> Since this function is now called with the per-domain rwlock read
>>> locked it's likely not appropriate to call process_pending_softirqs
>>> while holding such lock (check below).
>> You are right, as it is possible that:
>>
>> process_pending_softirqs -> vpci_process_pending -> read_lock
>>
>> Even more, vpci_process_pending may also
>>
>> read_unlock -> vpci_remove_device -> write_lock
>>
>> in its error path. So, any invocation of process_pending_softirqs
>> must not hold d->vpci_rwlock at least.
>>
>> And also we need to check that pdev->vpci was not removed
>> in between or *re-created*
>>> We will likely need to re-iterate over the list of pdevs assigned to
>>> the domain and assert that the pdev is still assigned to the same
>>> domain.
>> So, do you mean a pattern like the below should be used at all
>> places where we need to call process_pending_softirqs?
>>
>> read_unlock
>> process_pending_softirqs
>> read_lock
>> pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
>> if ( pdev && pdev->vpci && is_the_same_vpci(pdev->vpci) )
>> <continue processing>
> Something along those lines. You likely need to continue iterate using
> for_each_pdev.
I'll try to play around it and see what will it look like
>
>>>> +{
>>>> +    /*
>>>> +     * Writing the command register and ROM BAR register may trigger
>>>> +     * modify_bars to run which in turn may access multiple pdevs while
>>>> +     * checking for the existing BAR's overlap. The overlapping check, if 
>>>> done
>>>> +     * under the read lock, requires vpci->lock to be acquired on both 
>>>> devices
>>>> +     * being compared, which may produce a deadlock. It is not possible to
>>>> +     * upgrade read lock to write lock in such a case. So, in order to 
>>>> prevent
>>>> +     * the deadlock, check which registers are going to be written and 
>>>> acquire
>>>> +     * the lock in the appropriate mode from the beginning.
>>>> +     */
>>>> +    if ( !vpci_offset_cmp(start, size, PCI_COMMAND, 2) )
>>>> +        return true;
>>>> +
>>>> +    if ( !vpci_offset_cmp(start, size, pdev->vpci->header.rom_reg, 4) )
>>> No need for the comparison if rom_reg is unset. Also you can OR both
>>> conditions into a single if.
>> If we open code vpci_offset_cmp with a single if all this is going
>> to be a bit clumsy:
>>
>>       if ( r1_offset < r2_offset + r2_size &&
>>            r2_offset < r1_offset + r1_size )
>>           return 0;
>> This is a single check.
>> Now we need to check two registers with the code above and
>> also check that pdev->vpci->header.rom_reg != 0
>>
>> I think it would be more readable if we have a tiny helper function
>>
>> static bool vpci_offset_cmp(unsigned int r1_offset, unsigned int r1_size,
>>                              unsigned int r2_offset, unsigned int r2_size)
>> {
>>       /* Return 0 if registers overlap. */
>>       if ( r1_offset < r2_offset + r2_size &&
>>            r2_offset < r1_offset + r1_size )
>>           return false;
>>       return true;
>> }
Do you mean this helper to be converted into
static bool overlaps(unsigned int r1_offset, unsigned int r1_size,
                             unsigned int r2_offset, unsigned int r2_size)
>>
>> So, then we can have something like
>>
>> static bool vpci_header_write_lock(const struct pci_dev *pdev,
>>                                      unsigned int start, unsigned int size)
>> {
>>       if ( !vpci_offset_cmp(start, size, PCI_COMMAND, 2) ||
>>            (pdev->vpci->header.rom_reg &&
>>             !vpci_offset_cmp(start, size, pdev->vpci->header.rom_reg, 4)) )
>>           return true;
>>
>>       return false;
>> }
> Just create an 'overlaps' static function in header.c.
Please see above
>
> Thanks, Roger.
Thank you,
Oleksandr
 
 
 | 
|  | Lists.xenproject.org is hosted with RackSpace, monitoring our |