RE: [Xen-devel] dom0 hang
Oh, yes, do_softirq() is overkill here; we actually only want to handle the TIMER_SOFTIRQ, and for any other softirq it should simply return. I will cook a patch for this. Thanks for identifying this issue.

Best Regards
Ke

>-----Original Message-----
>From: Mukesh Rathor [mailto:mukesh.rathor@xxxxxxxxxx]
>Sent: Friday, July 03, 2009 9:19 AM
>To: mukesh.rathor@xxxxxxxxxx
>Cc: George Dunlap; Tian, Kevin; xen-devel@xxxxxxxxxxxxxxxxxxx; Yu, Ke; Kurt C. Hackel
>Subject: Re: [Xen-devel] dom0 hang
>
>
>Hi Kevin/Yu:
>
>acpi_processor_idle()
>{
>    sched_tick_suspend();
>    /*
>     * sched_tick_suspend may raise TIMER_SOFTIRQ by __stop_timer,
>     * which will break the later assumption of no softirq pending,
>     * so add do_softirq
>     */
>    if ( softirq_pending(smp_processor_id()) )
>        do_softirq();          <===============
>
>    local_irq_disable();
>    if ( softirq_pending(smp_processor_id()) )
>    {
>        local_irq_enable();
>        sched_tick_resume();
>        cpufreq_dbs_timer_resume();
>        return;
>    }
>
>Wouldn't the do_softirq() call the scheduler with the tick suspended, and
>wouldn't the scheduler then context switch to another vcpu (with *_BOOST),
>which would result in the stuck vcpu I described?
>
>thanks
>Mukesh
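A minimal sketch of the narrowed handling described above, assuming Xen's process_pending_softirqs() (which runs pending softirqs with SCHEDULE_SOFTIRQ masked, so the scheduler is never entered while the tickers are suspended) is available in the tree in question. The function name idle_entry_sketch is illustrative only; this is not the actual patch.

/*
 * Illustrative sketch only, not the actual Xen patch: service softirqs
 * raised by sched_tick_suspend() without entering the scheduler, so a
 * SCHEDULE_SOFTIRQ can never context-switch away while the per-cpu
 * tickers are still stopped.  Assumes process_pending_softirqs(), which
 * runs pending softirqs with SCHEDULE_SOFTIRQ masked.
 */
#include <xen/sched.h>      /* sched_tick_suspend(), sched_tick_resume() */
#include <xen/softirq.h>    /* softirq_pending(), process_pending_softirqs() */

static void idle_entry_sketch(void)
{
    cpufreq_dbs_timer_suspend();   /* mirrors cpufreq_dbs_timer_resume() below */
    sched_tick_suspend();

    /*
     * sched_tick_suspend() may raise TIMER_SOFTIRQ via __stop_timer().
     * Handle it here instead of calling do_softirq(), which could run
     * the scheduler while the tickers are suspended.
     */
    if ( softirq_pending(smp_processor_id()) )
        process_pending_softirqs();

    local_irq_disable();
    if ( softirq_pending(smp_processor_id()) )
    {
        /* Something else became pending: abort the idle entry. */
        local_irq_enable();
        sched_tick_resume();
        cpufreq_dbs_timer_resume();
        return;
    }

    /* ... select and enter the C-state here ... */
}

If that helper is not available in the tree in question, the same effect would need an open-coded path that services TIMER_SOFTIRQ alone.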
>
>
>Mukesh Rathor wrote:
>> ah, i totally missed csched_tick():
>>         if ( !is_idle_vcpu(current) )
>>             csched_vcpu_acct(cpu);
>>
>> yeah, looks like that's what is going on. i'm still waiting to
>> reproduce. at first glance, looking at c/s 19460, it seems like the
>> suspend/resume, or at least the resume, should happen in
>> csched_schedule().....
>>
>> thanks,
>> Mukesh
>>
>>
>> George Dunlap wrote:
>>> [Oops, adding back in the distro list, also adding Kevin Tian and Yu Ke,
>>> who wrote c/s 19460]
>>>
>>> The functionality I was talking about, subtracting credits and
>>> clearing BOOST, happens in csched_vcpu_acct() (which is different from
>>> csched_acct()). csched_vcpu_acct() is called from csched_tick(), which
>>> should still happen every 10ms on every cpu.
>>>
>>> The patch I referred to (c/s 19460) disables and re-enables tickers in
>>> xen/arch/x86/acpi/cpu_idle.c:acpi_processor_idle() every time the
>>> processor idles. I can't see anywhere else that tickers are disabled,
>>> so it's probably something not properly re-enabling them again.
>>>
>>> Try applying the attached patch to see if that changes anything. (I'm
>>> on the road, so I can't repro the lockup issue.) If that doesn't
>>> work, try disabling C-states and see if that helps. Then at least
>>> we'll know where the problem lies.
>>>
>>> -George
>>>
>>> On Thu, Jul 2, 2009 at 10:10 PM, Mukesh Rathor<mukesh.rathor@xxxxxxxxxx> wrote:
>>>> that seems to only suspend csched_pcpu.ticker, which is csched_tick,
>>>> and that only sorts the local runq.
>>>>
>>>> again, we are concerned about csched_priv.master_ticker, which calls
>>>> csched_acct -- correct? so i can trace that?
>>>>
>>>> thanks,
>>>> mukesh
>>>>
>>>>
>>>> George Dunlap wrote:
>>>>> Ah, I see that there have been some changes to the tick stuff with the
>>>>> c-state work (e.g., c/s 19460). It looks like they're supposed to keep
>>>>> going, but perhaps the tick_suspend() and tick_resume() aren't being
>>>>> called properly. Let me take a closer look.
>>>>>
>>>>> -George
>>>>>
>>>>> On Thu, Jul 2, 2009 at 8:14 PM, Mukesh Rathor<mukesh.rathor@xxxxxxxxxx> wrote:
>>>>>> George Dunlap wrote:
>>>>>>> On Thu, Jul 2, 2009 at 4:19 AM, Mukesh Rathor<mukesh.rathor@xxxxxxxxxx> wrote:
>>>>>>>> dom0 hang:
>>>>>>>> vcpu0 is trying to wake up a task and in try_to_wake_up() calls
>>>>>>>> task_rq_lock(). Since the task has its cpu set to 1, it gets the runq lock
>>>>>>>> for vcpu1. Next it calls resched_task(), which results in sending an IPI
>>>>>>>> to vcpu1. For that, vcpu0 gets into the HYPERVISOR_event_channel_op
>>>>>>>> HCALL and is waiting to return. Meanwhile, vcpu1 got running, and is
>>>>>>>> spinning on its runq lock in "schedule():spin_lock_irq(&rq->lock);",
>>>>>>>> which vcpu0 is holding (and is waiting to return from the HCALL).
>>>>>>>>
>>>>>>>> As I had noticed before, vcpu0 never gets scheduled in xen. So,
>>>>>>>> looking further into xen:
>>>>>>>>
>>>>>>>> xen:
>>>>>>>> Both vcpus are on the same runq, in this case cpu1's. But the
>>>>>>>> priority of vcpu1 has been set to CSCHED_PRI_TS_BOOST. As a result,
>>>>>>>> the scheduler always picks vcpu1, and vcpu0 is starved. Also, I see in
>>>>>>>> kdb that the scheduler timer is not set on cpu0; that would have
>>>>>>>> allowed csched_load_balance() to kick in on cpu0. [Also, on
>>>>>>>> cpu1, the accounting timer, csched_tick, is not set. Although
>>>>>>>> csched_tick() is running on cpu0, it only checks the runq for cpu0.]
>>>>>>>>
>>>>>>>> Looks like c/s 19500 changed csched_schedule():
>>>>>>>>
>>>>>>>> -    ret.time = MILLISECS(CSCHED_MSECS_PER_TSLICE);
>>>>>>>> +    ret.time = (is_idle_vcpu(snext->vcpu) ?
>>>>>>>> +                -1 : MILLISECS(CSCHED_MSECS_PER_TSLICE));
>>>>>>>>
>>>>>>>> The quickest fix for us would be to just back that out.
>>>>>>>>
>>>>>>>>
>>>>>>>> BTW, just a comment on the following (all in sched_credit.c):
>>>>>>>>
>>>>>>>>     if ( svc->pri == CSCHED_PRI_TS_UNDER &&
>>>>>>>>          !(svc->flags & CSCHED_FLAG_VCPU_PARKED) )
>>>>>>>>     {
>>>>>>>>         svc->pri = CSCHED_PRI_TS_BOOST;
>>>>>>>>     }
>>>>>>>>
>>>>>>>> combined with
>>>>>>>>
>>>>>>>>     if ( snext->pri > CSCHED_PRI_TS_OVER )
>>>>>>>>         __runq_remove(snext);
>>>>>>>>
>>>>>>>> Setting CSCHED_PRI_TS_BOOST as the pri of a vcpu seems dangerous to me:
>>>>>>>> since csched_schedule() never checks the time accumulated by a
>>>>>>>> vcpu at pri CSCHED_PRI_TS_BOOST, that is the same as pinning a vcpu to a
>>>>>>>> pcpu. If that vcpu never makes progress, the system has essentially
>>>>>>>> lost a physical cpu. Optionally, csched_schedule() should always
>>>>>>>> check the cpu time accumulated and reduce the priority over time.
>>>>>>>> I can't tell right off if it already does that, or something like
>>>>>>>> that :)... my 2 cents.
>>>>>>> Hmm... what's supposed to happen is that eventually a timer tick will
>>>>>>> interrupt vcpu1. If cpu1 is set to be "active", then it will be
>>>>>>> debited 10ms worth of credit. Eventually it will go into OVER and
>>>>>>> lose BOOST. If it's "inactive", then when the tick happens it will
>>>>>>> be set to "active" and be debited 10ms right away, setting it directly
>>>>>>> into OVER (and thus also losing BOOST).
>>>>>>>
>>>>>>> Can you see if the timer ticks are still happening, and perhaps put
>>>>>>> some tracing in to verify that what I described above is happening?
>>>>>>>
>>>>>>> -George
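To make the accounting George describes concrete, here is a small, self-contained toy model in plain C (not Xen's sched_credit.c; the names and numbers are illustrative only): each 10ms tick debits credit from the running vcpu, and once its credit is exhausted it drops out of BOOST. If the per-cpu ticker never fires, the demotion never happens and a BOOSTed vcpu can monopolise the pcpu indefinitely, which is the starvation scenario in this thread.

/* toy_boost_model.c -- simplified illustration, not Xen scheduler code. */
#include <stdio.h>

enum pri { PRI_OVER = -1, PRI_UNDER = 0, PRI_BOOST = 1 };

struct vcpu_model {
    int credit;          /* remaining credit, in ms */
    enum pri pri;
};

/* One accounting tick: debit a full 10ms from the running vcpu and demote
 * it once its credit is gone (the BOOST-clearing that the periodic
 * per-cpu tick performs in the real scheduler). */
static void tick(struct vcpu_model *running)
{
    running->credit -= 10;
    if ( running->credit < 0 )
        running->pri = PRI_OVER;   /* loses BOOST, stops monopolising the pcpu */
}

int main(void)
{
    struct vcpu_model v1 = { 30, PRI_BOOST };
    int ticks = 0;

    /* With the ticker running, BOOST is cleared after a few ticks. */
    while ( v1.pri == PRI_BOOST )
    {
        tick(&v1);
        ticks++;
    }
    printf("with ticks: demoted to OVER after %d ticks (%d ms)\n",
           ticks, ticks * 10);

    /* With the ticker suspended, tick() is never called, so a BOOSTed
     * vcpu keeps being picked forever and the other vcpu starves. */
    return 0;
}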
>>>>>> George,
>>>>>>
>>>>>> Is that in csched_acct()? Looks like that's somehow gotten removed. If
>>>>>> true, then maybe that's the fundamental problem to chase.
>>>>>>
>>>>>> Here's what the timer queue (dtrq) looks like when hung, not in any
>>>>>> schedule function:
>>>>>>
>>>>>> [0]xkdb> dtrq
>>>>>> CPU[00]: NOW:0x00003f2db9af369e
>>>>>>   1: exp=0x00003ee31cb32200 fn:csched_tick        data:0000000000000000
>>>>>>   2: exp=0x00003ee347ece164 fn:time_calibration   data:0000000000000000
>>>>>>   3: exp=0x00003ee69a28f04b fn:mce_work_fn        data:0000000000000000
>>>>>>   4: exp=0x00003f055895e25f fn:plt_overflow       data:0000000000000000
>>>>>>   5: exp=0x00003ee353810216 fn:rtc_update_second  data:ffff83007f0226d8
>>>>>>
>>>>>> CPU[01]: NOW:0x00003f2db9af369e
>>>>>>   1: exp=0x00003ee30b847988 fn:s_timer_fn         data:0000000000000000
>>>>>>   2: exp=0x00003f1b309ebd45 fn:pmt_timer_callback data:ffff83007f022a68
>>>>>>
>>>>>>
>>>>>> thanks
>>>>>> Mukesh
>>>>>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>> http://lists.xensource.com/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel