RE: [Xen-devel] cpuidle causing Dom0 soft lockups
Hi Jan,

Could you please try the attached patch. The patch tries to avoid entering a
deep C state when there is a vCPU that has its local irqs disabled and is
polling an event channel. When tested on my 64-CPU box, the issue is gone
with this patch.

Best Regards
Ke

>-----Original Message-----
>From: Yu, Ke
>Sent: Wednesday, February 03, 2010 1:07 AM
>To: Jan Beulich; Keir Fraser
>Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
>Subject: RE: [Xen-devel] cpuidle causing Dom0 soft lockups
>
>>-----Original Message-----
>>From: Jan Beulich [mailto:JBeulich@xxxxxxxxxx]
>>Sent: Tuesday, February 02, 2010 3:55 PM
>>To: Keir Fraser; Yu, Ke
>>Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
>>Subject: Re: [Xen-devel] cpuidle causing Dom0 soft lockups
>>
>>>>> Keir Fraser <keir.fraser@xxxxxxxxxxxxx> 21.01.10 12:03 >>>
>>>On 21/01/2010 10:53, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote:
>>>> I can see your point. But how can you consider shipping with something
>>>> apparently severely broken? As said before, the fact that this
>>>> manifests itself by hanging a many-vCPU Dom0 very likely implies that
>>>> there are (so far unnoticed) problems with smaller Dom0-s too. If I
>>>> had a machine at hand that supports C3, I'd try to do some
>>>> measurements with smaller domains...
>>>
>>>Well, it's a fallback I guess. If we can't make progress on solving it
>>>then I suppose I agree.
>>
>>Just FYI, we have now also seen an issue on a 24-CPU system that went
>>away with cpuidle=0 (and static analysis of the hang hinted in that
>>direction). All I can judge so far is that this likely has something to
>>do with our kernel's intensive use of the poll hypercall (i.e. we see
>>vCPUs not waking up from the call despite there being pending unmasked
>>or polled-for events).
>>
>>Jan
>
>Hi Jan,
>
>We have just identified the cause of this issue and are trying to find an
>appropriate way to fix it.
>
>The issue is the result of the following sequence:
>
>1. Every dom0 vCPU has a 250 Hz timer (i.e. a 4 ms period), and the vCPU's
>timer_interrupt handler acquires a global ticket spin lock, xtime_lock.
>When xtime_lock is held by another vCPU, the acquiring vCPU polls an event
>channel and blocks, so the pCPU it runs on goes idle. Later, when the lock
>holder releases xtime_lock, it notifies the event channel to wake the
>vCPU; the pCPU then wakes from its idle state and schedules the vCPU.
>
>From the above, the latency of the vCPU timer interrupt -- "latency" here
>meaning the time from starting to acquire the lock until the lock is
>finally acquired -- consists of the following items:
>
>T1 - CPU execution time (e.g. the timer interrupt's lock-holding time and
>the event channel notification time)
>T2 - CPU idle wake-up time, i.e. the time for the CPU to come back from a
>deep C state (e.g. C3) to C0; usually on the order of tens to hundreds of
>microseconds
>
>2. Now consider a large number of CPUs, e.g. 64 pCPUs with 64 vCPUs in
>dom0, and assume the lock acquisition order is vCPU0 -> vCPU1 -> vCPU2 ->
>... -> vCPU63. Then vCPU63 spends 64*(T1+T2) acquiring xtime_lock; if
>T1+T2 is 100us, the total latency is 64 * 100us = 6.4 ms. Since the timer
>runs at 250 Hz, i.e. a 4 ms period, by the time the event channel
>notification is issued and the pCPU schedules vCPU63, the hypervisor finds
>the timer is already over-due and sends vCPU63 another VIRQ_TIMER (see
>schedule()->vcpu_periodic_timer_work() for details). vCPU63 is therefore
>permanently busy handling timer interrupts and never gets to update the
>watchdog, which causes the soft lockup.
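For context on step 1, the guest-side wait is built on the poll hypercall:
instead of spinning, a vCPU that loses the ticket race blocks in the
hypervisor until its event channel is kicked. The sketch below shows the
general shape only; struct sched_poll and SCHEDOP_poll are the real
hypercall interface (xen/include/public/sched.h), but the per-CPU kick port
and the ticket helpers are illustrative assumptions, not the dom0 kernel's
actual identifiers:

    /* Sketch of a pv ticket-lock slow path over SCHEDOP_poll.  Assumes
     * each CPU has registered one event channel (lock_kick_port) that
     * the previous lock holder kicks on release. */
    static void spin_lock_slowpath(arch_spinlock_t *lock, unsigned int ticket)
    {
        struct sched_poll poll;
        evtchn_port_t port = lock_kick_port[smp_processor_id()]; /* assumed */

        set_xen_guest_handle(poll.ports, &port);
        poll.nr_ports = 1;
        poll.timeout  = 0;  /* 0 = no timeout: block until an event arrives */

        /* The vCPU blocks here and its pCPU goes idle; bringing the pCPU
         * back out of a deep C state is the T2 component described above. */
        while ( !ticket_is_current(lock, ticket) )      /* assumed helper */
            HYPERVISOR_sched_op(SCHEDOP_poll, &poll);
    }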
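As for the over-due path the mail points at, the following is a simplified
paraphrase of the hypervisor's vcpu_periodic_timer_work() (xen/common/
schedule.c), trimmed for illustration. It shows why a vCPU that is more
than one period late takes a VIRQ_TIMER the moment it is scheduled in:

    /* Simplified paraphrase of vcpu_periodic_timer_work(), which runs
     * from schedule() when a vCPU is context-switched in. */
    static void vcpu_periodic_timer_work(struct vcpu *v)
    {
        s_time_t now = NOW();
        s_time_t periodic_next_event;

        if ( v->periodic_period == 0 )          /* periodic timer unused */
            return;

        periodic_next_event = v->periodic_last_event + v->periodic_period;

        if ( now >= periodic_next_event )
        {
            /* The tick is already over-due: deliver VIRQ_TIMER at once.
             * If lock contention keeps the vCPU more than 4 ms behind on
             * every tick, it does nothing but handle timer interrupts. */
            send_timer_event(v);
            v->periodic_last_event = now;
            periodic_next_event = now + v->periodic_period;
        }

        set_timer(&v->periodic_timer, periodic_next_event);
    }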
>So from the above sequence we can see:
>
>- The cpuidle driver adds extra latency (T2), making the issue easier to
>trigger.
>- A large number of CPUs multiplies the latency.
>- The ticket spin lock imposes a fixed lock-acquisition order, so the
>worst-case waiter repeatedly pays the full 64*(T1+T2), again making the
>issue easier to trigger.
>
>The fundamental cause is that the vCPU timer interrupt handler does not
>scale, because of the global xtime_lock.
>
>From the cpuidle point of view, one thing we are trying is to change the
>cpuidle driver so that it does not enter a deep C state while there is a
>vCPU with local irqs disabled that is polling an event channel. That
>eliminates the T2 latency.
>
>Anyway, cpuidle is only one side of the problem. We can anticipate that
>once the CPU count is large enough that NR_CPU * T1 > 4 ms, the issue will
>occur again. So another approach is to make dom0 scale by not using
>xtime_lock, although that is pretty hard currently; yet another is to
>limit the number of dom0 vCPUs to some reasonable level.
>
>Regards
>Ke

Attachment: cpuidle-hint-v2.patch
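The attached patch implements the hint described above: keep the pCPU out
of deep C states while an irqs-off, event-channel-polling vCPU is waiting
on it. A minimal sketch of the idea follows; the helper name
sched_has_urgent_vcpu() and its placement are assumptions for illustration,
not necessarily what the patch itself does:

    /* Illustrative idle-driver check: if the scheduler reports an
     * "urgent" vCPU on this pCPU (blocked in an event-channel poll with
     * local irqs disabled), stay in a shallow C state so the T2 wake-up
     * latency drops out of the 64*(T1+T2) sum. */
    static unsigned int choose_cstate(unsigned int governor_choice)
    {
        if ( sched_has_urgent_vcpu() )      /* assumed scheduler helper */
            return ACPI_STATE_C1;           /* cheap to enter and to exit */

        return governor_choice;             /* the governor's normal pick */
    }

The polling vCPU itself is unchanged; the difference is only that the pCPU
now resumes from C1, whose exit latency is far below the tens-to-hundreds
of microseconds quoted for C3 above.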
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel