[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v6][RFC]xen: sched: convert RTDS from time to event driven model



On Fri, 2016-02-26 at 13:33 -0500, Chen, Tianyang wrote:
> Attached.
> 
Hey,

here I am... sorry it took a bit.

I've had a look, and I've been able to come up with some code that I at
least do not dislike too much! ;-P

Have a look at the attached patch (you should apply it on top of the
sched_rt.c file that you sent me in attachment).

About what was happening...

> On 2016-02-26 13:09, Dario Faggioli wrote:
> > On Fri, 2016-02-26 at 12:28 -0500, Tianyang Chen wrote:
> > > 
> > > I added debug prints in all these functions and noticed that it
> > > could
> > > be 
> > > caused by racing between spurious wakeups and context switching.
> > > 
It's not really spurious wakeup, it's regular wakeup happening
immediately after the vcpu blocked, when we still haven't been able to
execute rt_context_saved().

It's not really a race, and in fact we have the _RTDS_scheduled and
_RTDS_delayed_runq_add flags that are meant to deal with exactly these
situations.

In fact (I added some more debug printk):

> > > (XEN) vcpu1 wakes up nowhere
> > > (XEN) cpu1 picked vcpu1     *** vcpu1 is on a cpu
> > > (XEN) cpu1 picked idle      *** vcpu1 is waiting to be context
> > > switched
> > > (XEN) vcpu2 wakes up nowhere
> > > (XEN) cpu0 picked vcpu0
> > > (XEN) cpu2 picked vcpu2
> > > (XEN) cpu0 picked vcpu0
> > > (XEN) cpu0 picked vcpu0
> > > (XEN) d0 attempted to change d0v2's CR4 flags 00000620 ->
> > > 00040660
> > > (XEN) cpu0 picked vcpu0
> > > (XEN) vcpu2 sleeps on cpu
> > > (XEN) vcpu1 wakes up nowhere      *** vcpu1 wakes up without
> > > sleep?
> > > 
> > > (XEN) Assertion '!__vcpu_on_replq(svc)' failed at sched_rt.c:526
> > > (XEN) ----[ Xen-4.7-unstable  x86_64  debug=y  Tainted:    C ]---
> > > -
>
(XEN) [   56.113897] vcpu13 wakes up nowhere
(XEN) [   56.113906] cpu4 picked vcpu13
(XEN) [   56.113962] vcpu13 blocks
(XEN) [   56.113965] cpu4 picked idle
(XEN) [   56.113969] vcpu13 unblocks
(XEN) [   56.113972] vcpu13 wakes up nowhere
(XEN) [   56.113980] vcpu13 woken while still in scheduling tail
(XEN) [   56.113985] vcpu13 context saved and added back to queue
(XEN) [   56.113992] cpu4 picked vcpu13

So, as you can see, at 56.113962 vcpu13 blocks (note, _blocks_ !=
_sleeps_), and cpu4 goes idle. Ideally, rt_context_saved() would run
and remove the replenishment event of vcpu13 from the replenishment
queue.

But, at 56.113969, vcpu13 wakes up already, before rt_context_saved()
had a chance to run (which happens at 56.113985). It is not a spurious
wakeup, as the vcpu was actually blocked and is being unblocked.

Since rt_context_saved() hasn't run yet, the replenishment event is
still in the queue, and hence any ASSERT asking for
!__vcpu_on_replq(vcpu13) is doomed to making Xen explode! :-/

That does not happen in the execution trace above, because that was
collected with my patch applied, where I either leave the replenishment
event alone, if it still valid (i.e., no deadline miss happened during
the blocked period) or, if a new one needs to be enqueued, I dequeue
the old one first.

The end result is not particularly pretty, but I am up for doing even
worse, for the sake of keeping things like this:

 ASSERT( !__vcpu_on_replq(svc) );

at the beginning of replq_insert() (and its appropriate counterpart at
the beginning of replq_remove()).

In fact, I consider them really really helpful, when reasoning and
trying to figure out how the code works... There is nothing that I hate
more than an 'enqueue' function for which you don't know if it is ok to
call it when the entity being enqueued is in the queue already (and
what actually happens if it is).

These ASSERTs, greatly help, from that point of view, clearly stating
that, _no_, in this case it's absolutely not right to call the enqueue
function if the event is in the queue already. :-)

Hope I made myself clear.

I gave some testing to the attached patch, and it seems to work to me,
but I'd appreciate if you could do more of that.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Attachment: sched_rt_wakeup_schedtail.patch
Description: Text Data

Attachment: signature.asc
Description: This is a digitally signed message part

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.