Re: [PATCH v4 02/11] vpci: cancel pending map/unmap on vpci removal
On 16.11.21 10:01, Jan Beulich wrote:
> On 16.11.2021 08:32, Oleksandr Andrushchenko wrote:
>> On 15.11.21 18:56, Jan Beulich wrote:
>>> On 05.11.2021 07:56, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>
>>>>
>>>> When a vPCI is removed for a PCI device it is possible that we have
>>>> scheduled a delayed work for map/unmap operations for that device.
>>>> For example, the following scenario can illustrate the problem:
>>>>
>>>> pci_physdev_op
>>>>     pci_add_device
>>>>         init_bars -> modify_bars -> defer_map ->
>>>>             raise_softirq(SCHEDULE_SOFTIRQ)
>>>>     iommu_add_device <- FAILS
>>>>     vpci_remove_device -> xfree(pdev->vpci)
>>>>
>>>> leave_hypervisor_to_guest
>>>>     vpci_process_pending: v->vpci.mem != NULL; v->vpci.pdev->vpci == NULL
>>>>
>>>> For the hardware domain we continue execution as the worse that
>>>> could happen is that MMIO mappings are left in place when the
>>>> device has been deassigned
>>> Is continuing safe in this case? I.e. isn't there the risk of a NULL
>>> deref?
>> I think it is safe to continue
> And why do you think so? I.e. why is there no race for Dom0 when there
> is one for DomU?
Well, then we need to use a lock to synchronize the two.
I guess this needs to be the pcidevs lock, unfortunately.
>
>>>> For unprivileged domains that get a failure in the middle of a vPCI
>>>> {un}map operation we need to destroy them, as we don't know in which
>>>> state the p2m is. This can only happen in vpci_process_pending for
>>>> DomUs as they won't be allowed to call pci_add_device.
>>> You saying "we need to destroy them" made me look for a new domain_crash()
>>> that you add, but there is none. What is this about?
>> Yes, I guess we need to implicitly destroy the domain,
> What do you mean by "implicitly"?

@@ -151,14 +151,18 @@ bool vpci_process_pending(struct vcpu *v)
         vpci_cancel_pending(v->vpci.pdev);

         if ( rc )
+        {
             /*
              * FIXME: in case of failure remove the device from the domain.
              * Note that there might still be leftover mappings. While this is
+             * safe for Dom0, for DomUs the domain needs to be killed in order
+             * to avoid leaking stale p2m mappings on failure.
              */
             vpci_remove_device(v->vpci.pdev);
+
+            if ( !is_hardware_domain(v->domain) )
+                domain_crash(v->domain);

>
>>>> @@ -165,6 +164,18 @@ bool vpci_process_pending(struct vcpu *v)
>>>>       return false;
>>>>   }
>>>>
>>>> +void vpci_cancel_pending(const struct pci_dev *pdev)
>>>> +{
>>>> +    struct vcpu *v = current;
>>>> +
>>>> +    /* Cancel any pending work now. */
>>> Doesn't "any" include pending work on all vCPU-s of the guest, not
>>> just current? Is current even relevant (as in: part of the correct
>>> domain), considering ...
>>>
>>>> --- a/xen/drivers/vpci/vpci.c
>>>> +++ b/xen/drivers/vpci/vpci.c
>>>> @@ -51,6 +51,8 @@ void vpci_remove_device(struct pci_dev *pdev)
>>>>           xfree(r);
>>>>       }
>>>>       spin_unlock(&pdev->vpci->lock);
>>>> +
>>>> +    vpci_cancel_pending(pdev);
>>> ... this code path, when coming here from pci_{add,remove}_device()?
>>>
>>> I can agree that there's a problem here, but I think you need to
>>> properly (i.e. in a race free manner) drain pending work.
>> Yes, the code is inconsistent in this respect. I am thinking about:
>>
>> void vpci_cancel_pending(const struct pci_dev *pdev)
>> {
>>     struct domain *d = pdev->domain;
>>     struct vcpu *v;
>>
>>     /* Cancel any pending work now. */
>>     domain_lock(d);
>>     for_each_vcpu ( d, v )
>>     {
>>         vcpu_pause(v);
>>         if ( v->vpci.mem && v->vpci.pdev == pdev)
> Nit: Same style issue as in the original patch.
Will fix
>
>>         {
>>             rangeset_destroy(v->vpci.mem);
>>             v->vpci.mem = NULL;
>>         }
>>         vcpu_unpause(v);
>>     }
>>     domain_unlock(d);
>> }
>>
>> which seems to solve all the concerns. Is this what you mean?
> Something along these lines. I expect you will want to make use of
> domain_pause_except_self(),
Yes, this is what is needed here, thanks.
The only question is that

int domain_pause_except_self(struct domain *d)
{
[snip]
    /* Avoid racing with other vcpus which may want to be pausing us */
    if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
        return -ERESTART;

so it is not clear what we do in case of -ERESTART: do we want to spin?
Otherwise we leave the job undone, effectively not canceling the pending
work. Any idea other than spinning?
> and I don't understand the purpose of
> acquiring the domain lock.
You are right, no need.
>
> Jan
>
Thank you,
Oleksandr
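[Editor's note] For reference, below is a minimal sketch of the cancellation path the thread converges on, folding in the feedback above: no domain_lock(), and the current vCPU is not paused since it is the one running this code. It only assumes the existing vcpu_pause()/vcpu_unpause(), for_each_vcpu() and rangeset_destroy() interfaces plus the v->vpci fields introduced by this series; whether to switch to domain_pause_except_self() and how to handle its -ERESTART remains the open question discussed above, so treat this as an illustration rather than the patch that was eventually committed.

/* Sketch only -- illustrates the approach discussed above, not the final code. */
#include <xen/pci.h>
#include <xen/rangeset.h>
#include <xen/sched.h>
#include <xen/vpci.h>

void vpci_cancel_pending(const struct pci_dev *pdev)
{
    struct domain *d = pdev->domain;
    struct vcpu *v;

    /* Cancel any pending work for this device on every vCPU of its domain. */
    for_each_vcpu ( d, v )
    {
        /*
         * The current vCPU is the one executing this function, so it cannot
         * be inside vpci_process_pending() at the same time; only the other
         * vCPUs need to be paused (cf. the domain_pause_except_self()
         * suggestion above).
         */
        if ( v != current )
            vcpu_pause(v);

        if ( v->vpci.mem && v->vpci.pdev == pdev )
        {
            rangeset_destroy(v->vpci.mem);
            v->vpci.mem = NULL;
        }

        if ( v != current )
            vcpu_unpause(v);
    }
}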