RE: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen
Hi, Keir,

Concerning the last running vcpu on the dying cpu, I have some thoughts.

Yes, there would be a short time after stop_machine_run when this vcpu has v->processor == dying_cpu. But we set the __VPF_migrating flag for that vcpu and raised a schedule softirq on the dying cpu anyway. This softirq should run immediately after the stop_machine context, am I right? If so, by the time the schedule softirq has executed, this last vcpu has been migrated away from the dying cpu, although saving its context is delayed until play_dead->sync_lazy_context.

If another cpu issues a schedule request to this dying cpu (vcpu_sleep_nosync->cpu_raise_softirq(vc->processor....)) during this time, the request will be serviced by the above code sequence. So it is safe in such cases.

Am I missing something important? I am not fully confident in these statements, though.

Shan Haitao

-----Original Message-----
From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
Sent: September 11, 2008 22:15
To: Shan, Haitao; Haitao Shan; Tian, Kevin
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen

I applied the patch with the following changes:
 * I rewrote your changes to fixup_irqs(). We should force lazy EOIs *after* we have serviced any straggling interrupts. Also we should actually clear the EOI stack so it is empty next time the CPU comes online.
 * I simplified your changes to schedule.c in light of the fact we run in stop_machine context. Hence we can be quite relaxed about locking, for example.
 * I removed your change to __csched_vcpu_is_migrateable() and instead put a similar check in csched_load_balance(). I think this is clearer and also cheaper.

I note that the VCPU currently running on the offlined CPU continues to run there even after __cpu_disable(), and until that CPU does a final run through the scheduler soon after.
I hope it does not matter that there is one vcpu with v->processor == offlined_cpu for a short while (e.g., what if another CPU does vcpu_sleep_nosync(v) -> cpu_raise_softirq(v->processor, ...)). I *think* it's actually okay, but I'm not totally certain. Really I guess this patch needs some stress testing (lots of online/offline cycles while pausing/unpausing domains, etc). Perhaps we could plumb through a Xen sysctl and make a small dom0 utility for this purpose?

 -- Keir

On 11/9/08 12:33, "Shan, Haitao" <haitao.shan@xxxxxxxxx> wrote:

> Thanks!
> Concerning cpu online/offline development, I have a small question here.
> Since cpu_online_map is very important, code in different subsystems may use
> it extensively. If such code is not designed with cpu online/offline in mind,
> it may introduce race conditions, just like the one fixed in cpu calibration
> rendezvous.
> Currently, we solve it in a find-and-fix manner. Do you have any idea that can
> solve the problem in a cleaner way?
> Thanks in advance.
>
> Shan Haitao
>
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx]
> Sent: September 11, 2008 19:13
> To: Shan, Haitao; Haitao Shan
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-devel] Re: [PATCH 1/4] CPU online/offline support in Xen
>
> It looks much better. I'll read through, maybe tweak, and most likely then
> check it in.
>
> Thanks,
> Keir
>
> On 11/9/08 09:02, "Shan, Haitao" <haitao.shan@xxxxxxxxx> wrote:
>
>> Hi, Keir,
>>
>> Attached is the updated patch using the methods as you described in
>> another mail.
>> What do you think of the one?
>>
>> Signed-off-by: Shan Haitao <haitao.shan@xxxxxxxxx>
>>
>> Best Regards
>> Haitao Shan
>>
>> Haitao Shan wrote:
>>> Agree. Placing migration in stop_machine context will definitely make
>>> our jobs easier. I will start making a new patch tomorrow.
>>> :)
>>> I placed the migration code outside the stop_machine_run context, partly
>>> because I am not quite sure how long it will take to migrate all the
>>> vcpus away. If it takes too much time, all useful work is blocked
>>> since all cpus are in the stop_machine context. Of course, I borrowed
>>> the ideas from the kernel, which also led me to that decision.
>>>
>>> 2008/9/10 Keir Fraser <keir.fraser@xxxxxxxxxxxxx>:
>>>> I feel this is more complicated than it needs to be.
>>>>
>>>> How about clearing VCPUs from the offlined CPU's runqueue from the
>>>> very end of __cpu_disable()? At that point all other CPUs are safely
>>>> in softirq context with IRQs disabled, and we are running on the
>>>> correct CPU (being offlined). We could have a hook into the
>>>> scheduler subsystem at that point to break affinities, assign to
>>>> different runqueues, etc. We would just need to be careful not to
>>>> try an IPI. :-) This approach would not need a cpu_schedule_map
>>>> (which is really increasing code fragility imo, by creating possible
>>>> extra confusion about which cpumask is the right one to use in a
>>>> given situation).
>>>>
>>>> My feeling, unless I've missed something, is that this would make
>>>> the patch quite a bit smaller and with a smaller spread of code
>>>> changes.
>>>>
>>>> -- Keir
>>>>
>>>> On 9/9/08 09:59, "Shan, Haitao" <haitao.shan@xxxxxxxxx> wrote:
>>>>
>>>>> This patch implements cpu offline feature.
>>>>>
>>>>> Best Regards
>>>>> Haitao Shan
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>>> http://lists.xensource.com/xen-devel
>
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
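
The sequence Shan Haitao describes at the top of the thread can be walked through with a small toy model. The sketch below is Python rather than Xen's C, and every name in it (Machine, Cpu, Vcpu, cpu_disable, do_softirq, play_dead, offline_with_racing_sleep) is a hypothetical stand-in, not Xen code. It assumes, as the message states, that the schedule softirq migrates the flagged vcpu and that the lazily held context is only flushed in the final play_dead step. It is strictly sequential, so it says nothing about true concurrent interleavings.

```python
from dataclasses import dataclass, field
from typing import Optional

SCHEDULE_SOFTIRQ = "schedule"  # stand-in for the scheduler softirq


@dataclass
class Vcpu:
    name: str
    processor: int               # models v->processor
    migrating: bool = False      # models the migrating flag from the thread
    context_saved: bool = True   # False while state sits lazily on a CPU


@dataclass
class Cpu:
    cpu_id: int
    online: bool = True
    pending: set = field(default_factory=set)  # pending softirqs
    current: Optional[Vcpu] = None             # vcpu running here
    lazy: Optional[Vcpu] = None                # vcpu whose context is held lazily


class Machine:
    def __init__(self, ncpus):
        self.cpus = [Cpu(i) for i in range(ncpus)]

    def raise_softirq(self, cpu_id, irq):
        self.cpus[cpu_id].pending.add(irq)

    def cpu_disable(self, dying):
        # Assumed to run in stop_machine context: mark the CPU offline,
        # flag its current vcpu, and raise a schedule softirq on it.
        cpu = self.cpus[dying]
        cpu.online = False
        if cpu.current is not None:
            cpu.current.migrating = True
        self.raise_softirq(dying, SCHEDULE_SOFTIRQ)

    def do_softirq(self, cpu_id):
        # Service pending softirqs; a schedule pass moves a flagged vcpu
        # to the first online CPU but leaves its context unsaved.
        cpu = self.cpus[cpu_id]
        while cpu.pending:
            cpu.pending.pop()
            v = cpu.current
            if v is not None and v.migrating:
                v.processor = next(c.cpu_id for c in self.cpus if c.online)
                v.migrating = False
                cpu.current = None

    def play_dead(self, cpu_id):
        # Final step on the dying CPU: flush the lazily held context.
        cpu = self.cpus[cpu_id]
        if cpu.lazy is not None:
            cpu.lazy.context_saved = True
            cpu.lazy = None

    def vcpu_sleep_nosync(self, v):
        # A remote CPU pokes whichever CPU v.processor names right now.
        self.raise_softirq(v.processor, SCHEDULE_SOFTIRQ)


def offline_with_racing_sleep(race_before_softirq):
    """Offline CPU 1 while a remote sleep request races with the sequence."""
    m = Machine(2)
    v = Vcpu("d1v0", processor=1, context_saved=False)
    m.cpus[1].current = v
    m.cpus[1].lazy = v
    m.cpu_disable(1)
    if race_before_softirq:
        m.vcpu_sleep_nosync(v)   # lands on the dying CPU
    m.do_softirq(1)
    if not race_before_softirq:
        m.vcpu_sleep_nosync(v)   # v.processor already names an online CPU
    m.play_dead(1)
    return m, v


if __name__ == "__main__":
    for early in (True, False):
        m, v = offline_with_racing_sleep(early)
        print(early, v.processor, v.context_saved, sorted(m.cpus[1].pending))
```

In both orderings the model ends with the vcpu assigned to an online CPU, its context saved, and no softirq left pending on the offlined CPU. That only mirrors the argument in the message; it is not evidence about the real code, which would still need the stress testing Keir suggests.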