[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] RE: [Xen-devel] cpuidle causing Dom0 soft lockups
>From: Jan Beulich [mailto:JBeulich@xxxxxxxxxx] >Sent: 2010年2月24日 0:45 > >>>> Keir Fraser <keir.fraser@xxxxxxxxxxxxx> 23.02.10 11:57 >>> >>On 23/02/2010 10:37, "Jan Beulich" <JBeulich@xxxxxxxxxx> wrote: >> >>>> Right. According to the code, there should be no way to >this BUG_ON. >>>> If it happens, that reveal either bugs of code or the necessity of >>>> adding code to migrate urgent vcpu count. Do you have more >>>> information on how this BUG_ON happens? >>> >>> Obviously there are vCPU-s that get inserted on a run queue with >>> is_urgent set (which according to my reading of Keir's description >>> shouldn't happen). In particular, this >> >>Is it possible for a polling VCPU to become runnable without it being >>cleared from poll_mask? I suspect maybe that is the problem, >and that needs >>dealing with, or the proper handling needs to be added to >sched_credit.c. > >I don't think that's the case, at least not exclusively. Using > >--- a/xen/common/sched_credit.c >+++ b/xen/common/sched_credit.c >@@ -201,6 +201,7 @@ __runq_insert(unsigned int cpu, struct c > > BUG_ON( __vcpu_on_runq(svc) ); > BUG_ON( cpu != svc->vcpu->processor ); >+WARN_ON(svc->vcpu->is_urgent);//temp > > list_for_each( iter, runq ) > { >--- a/xen/common/schedule.c >+++ b/xen/common/schedule.c >@@ -139,6 +139,7 @@ static inline void vcpu_runstate_change( > >ASSERT(spin_is_locked(&per_cpu(schedule_data,v->processor).sche >dule_lock)); > > vcpu_urgent_count_update(v); >+WARN_ON(v->is_urgent && new_state <= RUNSTATE_runnable);//temp > > trace_runstate_change(v, new_state); > >I get pairs of warnings (i.e. each for the same vCPU): > >(XEN) Xen WARN at schedule.c:142 >(XEN) Xen call trace: >(XEN) [<ffff82c48011c8d5>] schedule+0x375/0x510 >(XEN) [<ffff82c48011deb8>] __do_softirq+0x58/0x80 >(XEN) [<ffff82c4801e61e6>] process_softirqs+0x6/0x10 > >(XEN) Xen WARN at sched_credit.c:204 >(XEN) Xen call trace: >(XEN) [<ffff82c4801186b9>] csched_vcpu_wake+0x169/0x1a0 >(XEN) [<ffff82c4801497f2>] update_runstate_area+0x102/0x110 >(XEN) [<ffff82c48011cdcf>] vcpu_wake+0x13f/0x390 >(XEN) [<ffff82c48014b1a0>] context_switch+0x760/0xed0 >(XEN) [<ffff82c48014913d>] vcpu_kick+0x1d/0x80 >(XEN) [<ffff82c480107feb>] evtchn_set_pending+0xab/0x1b0 >(XEN) [<ffff82c4801083a9>] evtchn_send+0x129/0x150 >(XEN) [<ffff82c480108950>] do_event_channel_op+0x4c0/0xf50 >(XEN) [<ffff82c4801461b5>] reprogram_timer+0x55/0x90 >(XEN) [<ffff82c4801461b5>] reprogram_timer+0x55/0x90 >(XEN) [<ffff82c48011fd44>] timer_softirq_action+0x1a4/0x360 >(XEN) [<ffff82c4801e6169>] syscall_enter+0xa9/0xae > >In schedule() this is always "prev" transitioning to RUNSTATE_runnable >(i.e. _VPF_blocked not set), yet the second call trace shows that >_VPF_blocked must have been set at that point (otherwise >vcpu_unblock(), tail-called from vcpu_kick(), would not have called >vcpu_wake()). If the order wasn't always that shown, or if the two >traces got intermixed, this could hint at a race - but they are always >that way, which so far I cannot make sense of. > Such race surely exists, since two paths are each updating multiple fileds which however are not all protected with same scheduler lock. vcpu_unblock manipulates _VPF_blocked w/o acuquiring scheduler lock. Then below sequence is possible: enter schedule() with _VPF_blocked ... <-vcpu_unblock clears _VPF_blocked. poll_mask hasn't been cleared yet ... vcpu_runstate_change( prev, (test_bit(_VPF_blocked, &prev->pause_flags) ? RUNSTATE_blocked : (vcpu_runnable(prev) ? RUNSTATE_runnable : RUNSTATE_offline)), now); Then RUNSTATE_runnable is chosen. Then vcpu_urgent_count_update will set is_urgent flag since poll_mask has bit set. Then vcpu_unblock clears poll_mask and then invoke vcpu_wake. vcpu_wake wait for scheduler lock and then will see is_urgent flag set for a runnable vcpu. Here one necessary check is missed in vcpu_urgent_count_update, as only polling vcpu in blocked state should be cared in cpuidle. If there's any runnable vcpu in temporary polling state, idle vcpu won't get scheduled. is_urgent can be only set when new runstate is blocked, and when vcpu is in poll_mask. As runstate change always happen with scheduler lock, this can effectively ensure set/clear of is_urgent. Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |