|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v6][RFC]xen: sched: convert RTDS from time to event driven model
On 2/26/2016 4:11 AM, Dario Faggioli wrote: It looks like the current code doesn't add a vcpu to the replenishment queue when vcpu_insert() is called. When the scheduler is initialized, all the vcpus are added to the replenishment queue after waking up from sleep. This needs to be changed (add vcpu to replq in vcpu_insert()) to make it consistent in a sense that when rt_vcpu_insert() is called, it needs to have a corresponding replenishment event queued. This way the ASSERT() is for both cases in __runq_insert() to enforce the fact that "when a vcpu is inserted to runq/depletedq, a replenishment event is waiting for it".I am ok with this (calling replq_insert() in rt_vcpu_insert()). Not just doing that unconditionally though, as a vcpu can, e.g., be paused when the insert_vcpu hook is called, and in that case, I don't think we want to enqueue the replenishment event, do we? Yes.
The v6 doesn't have the if statement commented out when I submitted it. But I tried commenting it out, the assertion failed. It fails in V6 with: rt_vcpu_sleep(): removing replenishment event for all cases rt_context_saved(): not removing replenishment events rt_vcpu_wake(): not checking if the event is already queued. or with: rt_vcpu_sleep(): not removing replenishment event at all rt_context_saved(): removing replenishment events if not runnable rt_vcpu_wake(): not checking if the event is already queued. Also with:rt_vcpu_sleep(): removing replenishment event if the vcpu is on runq/depletedq rt_context_saved(): removing replenishment events if not runnable rt_vcpu_wake(): not checking if the event is already queued.I added debug prints in all these functions and noticed that it could be caused by racing between spurious wakeups and context switching. See the following events for the last modification above: (XEN) cpu1 picked idle (XEN) d0 attempted to change d0v1's CR4 flags 00000620 -> 00040660 (XEN) cpu2 picked idle (XEN) vcpu1 sleeps on cpu (XEN) cpu0 picked idle (XEN) vcpu1 context saved not runnable (XEN) vcpu1 wakes up nowhere (XEN) cpu0 picked vcpu1 (XEN) vcpu1 sleeps on cpu (XEN) cpu0 picked idle (XEN) vcpu1 context saved not runnable (XEN) vcpu1 wakes up nowhere (XEN) cpu0 picked vcpu1 (XEN) cpu0 picked idle (XEN) vcpu1 context saved not runnable (XEN) cpu0 picked vcpu0 (XEN) vcpu1 wakes up nowhere (XEN) cpu1 picked vcpu1 *** vcpu1 is on a cpu (XEN) cpu1 picked idle *** vcpu1 is waiting to be context switched (XEN) vcpu2 wakes up nowhere (XEN) cpu0 picked vcpu0 (XEN) cpu2 picked vcpu2 (XEN) cpu0 picked vcpu0 (XEN) cpu0 picked vcpu0 (XEN) d0 attempted to change d0v2's CR4 flags 00000620 -> 00040660 (XEN) cpu0 picked vcpu0 (XEN) vcpu2 sleeps on cpu (XEN) vcpu1 wakes up nowhere *** vcpu1 wakes up without sleep? (XEN) Assertion '!__vcpu_on_replq(svc)' failed at sched_rt.c:526 (XEN) ----[ Xen-4.7-unstable x86_64 debug=y Tainted: C ]---- (XEN) CPU: 0 (XEN) RIP: e008:[<ffff82d08012a151>] sched_rt.c#rt_vcpu_wake+0x11f/0x17b ... (XEN) Xen call trace: (XEN) [<ffff82d08012a151>] sched_rt.c#rt_vcpu_wake+0x11f/0x17b (XEN) [<ffff82d08012bf2c>] vcpu_wake+0x213/0x3d4 (XEN) [<ffff82d08012c447>] vcpu_unblock+0x4b/0x4d (XEN) [<ffff82d080169cea>] vcpu_kick+0x20/0x6f (XEN) [<ffff82d080169d65>] vcpu_mark_events_pending+0x2c/0x2f (XEN) [<ffff82d08010762a>] event_2l.c#evtchn_2l_set_pending+0xa9/0xb9 (XEN) [<ffff82d080108312>] send_guest_vcpu_virq+0x9d/0xba (XEN) [<ffff82d080196cbe>] send_timer_event+0xe/0x10 (XEN) [<ffff82d08012a7b5>] schedule.c#vcpu_singleshot_timer_fn+0x9/0xb (XEN) [<ffff82d080131978>] timer.c#execute_timer+0x4e/0x6c (XEN) [<ffff82d080131ab9>] timer.c#timer_softirq_action+0xdd/0x213 (XEN) [<ffff82d08012df32>] softirq.c#__do_softirq+0x82/0x8d (XEN) [<ffff82d08012df8a>] do_softirq+0x13/0x15 (XEN) [<ffff82d080243ad1>] cpufreq.c#process_softirqs+0x21/0x30So, it looks like spurious wakeup for vcpu1 happens before it was completely context switched off a cpu. But rt_vcpu_wake() didn't see it on cpu with curr_on_cpu() so it fell through the first two RETURNs. I guess the replenishment queue check is necessary for this situation? Thanks, Tianyang _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |