[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal
On 22.11.21 16:37, Jan Beulich wrote: > On 22.11.2021 15:21, Oleksandr Andrushchenko wrote: >> On 19.11.21 15:34, Oleksandr Andrushchenko wrote: >>> On 19.11.21 15:25, Jan Beulich wrote: >>>> On 19.11.2021 14:16, Oleksandr Andrushchenko wrote: >>>>> On 19.11.21 15:00, Jan Beulich wrote: >>>>>> On 19.11.2021 13:34, Oleksandr Andrushchenko wrote: >>>>>>> Possible locking and other work needed: >>>>>>> ======================================= >>>>>>> >>>>>>> 1. pcidevs_{lock|unlock} is too heavy and is per-host >>>>>>> 2. pdev->vpci->lock cannot be used as vpci is freed by >>>>>>> vpci_remove_device >>>>>>> 3. We may want a dedicated per-domain rw lock to be implemented: >>>>>>> >>>>>>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h >>>>>>> index 28146ee404e6..ebf071893b21 100644 >>>>>>> --- a/xen/include/xen/sched.h >>>>>>> +++ b/xen/include/xen/sched.h >>>>>>> @@ -444,6 +444,7 @@ struct domain >>>>>>> >>>>>>> #ifdef CONFIG_HAS_PCI >>>>>>> struct list_head pdev_list; >>>>>>> + rwlock_t vpci_rwlock; >>>>>>> + bool vpci_terminating; <- atomic? >>>>>>> #endif >>>>>>> then vpci_remove_device is a writer (cold path) and >>>>>>> vpci_process_pending and >>>>>>> vpci_mmio_{read|write} are readers (hot path). >>>>>> Right - you need such a lock for other purposes anyway, as per the >>>>>> discussion with Julien. >>>>> What about bool vpci_terminating? Do you see it as an atomic type or just >>>>> bool? >>>> Having seen only ... >>>> >>>>>>> do_physdev_op(PHYSDEVOP_pci_device_remove) will need >>>>>>> hypercall_create_continuation >>>>>>> to be implemented, so when re-start removal if need be: >>>>>>> >>>>>>> vpci_remove_device() >>>>>>> { >>>>>>> d->vpci_terminating = true; >>>> ... this use so far, I can't tell yet. But at a first glance a boolean >>>> looks to be what you need. >>>> >>>>>>> remove vPCI register handlers <- this will cut off PCI_COMMAND >>>>>>> emulation among others >>>>>>> if ( !write_trylock(d->vpci_rwlock) ) >>>>>>> return -ERESTART; >>>>>>> xfree(pdev->vpci); >>>>>>> pdev->vpci = NULL; >>>>>>> } >>>>>>> >>>>>>> Then this d->vpci_rwlock becomes a dedicated vpci per-domain lock for >>>>>>> other operations which may require it, e.g. virtual bus topology can >>>>>>> use it when assigning vSBDF etc. >>>>>>> >>>>>>> 4. vpci_remove_device needs to be removed from vpci_process_pending >>>>>>> and do nothing for Dom0 and crash DomU otherwise: >>>>>> Why is this? I'm not outright opposed, but I don't immediately see why >>>>>> trying to remove the problematic device wouldn't be a reasonable course >>>>>> of action anymore. vpci_remove_device() may need to become more careful >>>>>> as to not crashing, >>>>> vpci_remove_device does not crash, vpci_process_pending does >>>>>> though. >>>>> Assume we are in an error state in vpci_process_pending *on one of the >>>>> vCPUs* >>>>> and we call vpci_remove_device. vpci_remove_device tries to acquire the >>>>> lock and it can't just because there are some other vpci code is running >>>>> on other vCPU. >>>>> Then what do we do here? We are in SoftIRQ context now and we can't spin >>>>> trying to acquire d->vpci_rwlock forever. Neither we can blindly free vpci >>>>> structure because it is seen by all vCPUs and may crash them. >>>>> >>>>> If vpci_remove_device is in hypercall context it just returns -ERESTART >>>>> and >>>>> hypercall continuation helps here. But not in SoftIRQ context. >>>> Maybe then you want to invoke this cleanup from RCU context (whether >>>> vpci_remove_device() itself or a suitable clone there of is TBD)? (I >>>> will admit though that I didn't check whether that would satisfy all >>>> constraints.) >>>> >>>> Then again it also hasn't become clear to me why you use write_trylock() >>>> there. The lock contention you describe doesn't, on the surface, look >>>> any different from situations elsewhere. >>> I use write_trylock in vpci_remove_device because if we can't >>> acquire the lock then we defer device removal. This would work >>> well if called from a hypercall which will employ hypercall continuation. >>> But SoftIRQ getting -ERESTART is something that we can't probably >>> handle by restarting as hypercall can, thus I only see that >>> vpci_process_pending >>> will need to spin and wait until vpci_remove_device succeeds. >> Does anybody have any better solution for preventing SoftIRQ from >> spinning on vpci_remove_device and -ERESTART? > Well, at this point I can suggest only a marginal improvement: Instead of > spinning inside the softirq handler, you want to re-raise the softirq and > exit the handler. That way at least higher "priority" softirqs won't be > starved. > > Beyond that - maybe the guest (or just a vcpu of it) needs pausing in such > an event, with the work deferred to a tasklet? > > Yet I don't think my earlier question regarding the use of write_trylock() > was really answered. What you said in reply doesn't explain (to me at > least) why write_lock() is not an option. I was thinking that we do not want to freeze in case we are calling vpci_remove_device from SoftIRQ context, thus we try to lock and if we can't we return -ERESTART indicating that the removal needs to be deferred. If we use write_lock, then SoftIRQ -> write_lock will spin there waiting for readers to release the lock. write_lock actually makes things a lot easier, but I just don't know if it is ok to use it. If so, then vpci_remove_device becomes synchronous and there is no need in hypercall continuation and other heavy machinery for re-scheduling SoftIRQ... > > Jan > Thank you, Oleksandr
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |