[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] x86/vpmu: add cpu hot unplug notifier for vpmu
>>> On 17.05.17 at 16:32, <boris.ostrovsky@xxxxxxxxxx> wrote: > On 05/17/2017 10:09 AM, Jan Beulich wrote: >>>>> On 17.05.17 at 15:58, <boris.ostrovsky@xxxxxxxxxx> wrote: >>> On 05/17/2017 08:54 AM, Jan Beulich wrote: >>>>>>> On 17.05.17 at 14:40, <luwei.kang@xxxxxxxxx> wrote: >>>>>>>>> On 16.05.17 at 19:29, <luwei.kang@xxxxxxxxx> wrote: >>>>>>> Currently, hot unplug a cpu with vpmu enabled may cause system hang >>>>>>> due to send IPI to a die physical cpu. This patch add a cpu hot unplug >>>>>>> notifer to save vpmu context before cpu offline. >>>>>>> >>>>>>> Consider one scene, hotplug physical cpu N with vpmu is enabled. >>>>>> I think you mean "scenario" and "hot unplug". >>>>>> >>>>>>> The vcpu which running on this physical cpu before will be switch to >>>>>>> other online cpu. Before load the vpmu context to new physical cpu, a >>>>>>> IPI will be send to cpu N to save the vpmu context. >>>>>>> System will hang in function on_select_cpus because of that physical >>>>>>> cpu is offline and can not do any response. >>>>>> Doesn't this make clear that you would better also make sure >>>>>> ->last_pcpu doesn't hold to the then stale CPU anymore? For >>>>>> example, vpmu_load() compares it with smp_processor_id() (the subsequent >>>>>> use > >>>>> is guarded by a VPMU_CONTEXT_LOADED flag >>>>>> check), allowing badness if the same or another CPU with the same number >>>>> comes up again quickly enough. Similarly >>>>>> vpmu_arch_destroy() uses it without checking VPMU_CONTEXT_LOADED. >>>>> I think it may can't make sure "->last_pcpu" doesn't hold to the >>>>> then >>>>> stale CPU. The purpose of this notifier is to save the vpmu context >>>>> before >>>>> cpu offline. Avoid save vpmu context by send IPI to that offline cpu. >>>>> There >>>>> is no reason to change the value except it saving (vpmu_save()) in >>>>> another >>>>> physical cpu. >>>> I'm afraid I don't understand most of your reply. >>>> >>>>> Regarding vpmu_arch_destroy(), it indeed will cause same issue. What >>>>> about add " this_cpu(cpu) = NULL" in cpu_callback() to clean the >>>>> last_vcpu >>>>> pointer of this physical cpu. >>>> That's being done by vpmu_save_force() already afaict (assuming >>>> you mean this_cpu(last_vcpu)), albeit for whatever reason open >>>> coding this_cpu(). >>>> >>>>> In addition, add VPMU_CONTEXT_LOADED check before execute >>>>> on_selected_cpus(cpumask_of(vcpu_vpmu(v)->last_pcpu), vpmu_save_force, v, >>>>> 1) > in >>>>> vpmu_arch_destroy(). Because of force save operation has been finished in >>>>> notifier function. >>>> I'm not sure whether that would be correct. Boris? >>> >>> I believe we still have a race with vpmu_load(): it can be past >>> VPMU_CONTEXT_LOADED test and committed to the remote call when the >>> remote VCPU becomes offlined. >> The offlined entity is a pCPU, and such offlining happens in stop- >> machine context iirc. > > Oh, then I think this should work --- remote calls are predicated on > VPMU_CONTEXT_LOADED being set and the callback will clear it. > > Nevertheless, I'd still make sure that last_pcpu doesn't point to an > offlined processor. Well, we can't reasonably use some random online one; my intention rather was to write some visibly invalid value there, so that comparing with any possibly online CPU would always produce false. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |