Re: [Xen-users] whether xen scheduler supports preemption

So, first of all... Can you use plain text instead of HTML for e-mails?

On mer, 2013-06-26 at 21:16 +0800, åä wrote:
> Thank you very much for your detail explanation! See below.
You're welcome. Although, at this point, I'm curious about why you're
interested in this... What is it that you want to achieve?

> >... Yes, that is at least most of it. In fact, when a vcpu wakes up, it
> >is added to a specific runq, and the 'tickling' mechanism is there right
> >to ensure that the said vcpu starts to run as soon as possible, either
> >if there are idle pcpus, or the running vcpus have lower priority, the
> >latter case being the definition of preemption.
> When a vcpu wakes up, it is added to a specific runq. Whether the specific 
> runq is the runnable queue?
Well, the vcpu wakes up, so yes, it is the runnable queue of a specific
pCPU. Which 'specific pCPU' depends, and I suggest you look more
deeply into the scheduler code. Off the top of my head, I'd say it is the
runqueue of the pCPU where the vCPU was when it went to sleep.

> either if there are idle pcpus, or the running vcpus have lower priority?
In credit1, it works like this:
 - you (the vCPU) wake up and I (Xen scheduler) queue you on the runq
   of the pCPU where you were before going to sleep;
 - if that pCPU is busy, I poke other pCPUs to see if you can run there
   (that's the meaning of 'tickling');
 - if the above is not possible, I check if preemption is required. If
   yes, I preempt the vCPU running on the runq, if not, you have to wait
   for your turn (or for some other pCPU becoming idle and picking you
   up) in the runq.
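
To make those three steps a bit more concrete, here is a toy model of
them in Python. Mind that this is only a sketch of my description above,
not the Xen code: VCpu, PCpu, wake() and the integer priorities are all
names I made up for illustration, and sched_credit.c handles more cases
than this.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VCpu:
    name: str
    prio: int  # hypothetical: a bigger number means higher priority


@dataclass
class PCpu:
    running: Optional[VCpu] = None  # None means this pCPU is idle
    runq: List[VCpu] = field(default_factory=list)


def wake(v: VCpu, last: int, pcpus: List[PCpu]) -> str:
    """Toy wake-up path; 'last' is the pCPU v was on before sleeping."""
    home = pcpus[last]
    # Step 1: queue the vCPU on the runq of the pCPU it slept on.
    home.runq.append(v)
    # Step 2: look for an idle pCPU (home first, then the others).
    order = [last] + [i for i in range(len(pcpus)) if i != last]
    for i in order:
        if pcpus[i].running is None:
            home.runq.remove(v)
            pcpus[i].running = v
            return f"runs on idle pcpu{i}"
    # Step 3: nobody is idle, so check whether preemption is required.
    cur = home.running
    if v.prio > cur.prio:
        home.runq.remove(v)
        home.runq.append(cur)  # the preempted vCPU goes back to waiting
        home.running = v
        return f"preempts {cur.name} on pcpu{last}"
    # Otherwise: wait for your turn in the runq.
    return f"waits in runq of pcpu{last}"


demo = [PCpu(running=VCpu("a", 1)), PCpu()]
print(wake(VCpu("w", 2), 0, demo))  # pcpu1 is idle, so no preemption
```

With both pCPUs busy, the same call either preempts or waits, depending
only on the priorities.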

Does that make sense?

> I do not understand your meaning. You mean that if there are idle pcpus, the 
> waked up vcpu will be scheduled on the idle pcpus to run. 
For sure, the scheduler will try as hard as it can to achieve this, yes.

> If not, it will preempted the current running vcpus if the waked up vcpu has 
> the higher priority compared to the the current vcpu. Whether my 
> understanding is right?
I believe it is. Actually, I believe this is either the definition or,
in any case, the only sensible thing that a reasonable enough
preemptible scheduler should do. :-)

For the deep technicalities of how this is implemented in credit1,
please refer to my hopefully accurate explanation above, or, even
better, to sched_credit.c.

> > If you, for instance, avoid raising the SCHEDULE_SOFTIRQ for busy
> > pcpus
> > (I would still tickle the idle ones, or you'll get funny results! :-O),
> > you definitely are making the (credit) scheduler less preemptible.
> I can not understand here. still tickle the idle ones, or you'll get funny 
> results! What's the meaning?
The meaning is that, given the explanation above, inhibiting preemption
by, for instance, not tickling the busy pCPUs might actually work. On
the other hand, if you have idle pCPUs, having them running the woken-up
task is not a preemption, right? Well, if you do not tickle those pCPUs
you won't get there, and you not only will get rid of preemption on busy
pCPUs, you will also have idle pCPUs that remain idle, even if there
are vCPUs waiting to be executed.
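
A small sketch of the difference, again in Python and again with made-up
names (pcpus_to_tickle() and its two flags are hypothetical knobs, not
something you will find in Xen):

```python
from typing import List, Optional


def pcpus_to_tickle(woken_prio: int,
                    running: List[Optional[int]],
                    tickle_idle: bool = True,
                    tickle_busy: bool = True) -> List[int]:
    """Pick the pCPUs to poke when a vCPU with priority woken_prio wakes up.

    running[i] is the priority of the vCPU on pCPU i, or None if it is idle.
    """
    targets = []
    for i, prio in enumerate(running):
        if prio is None:
            # Idle pCPU: running the woken vCPU here is not a preemption.
            if tickle_idle:
                targets.append(i)
        elif woken_prio > prio and tickle_busy:
            # Busy pCPU with lower-priority work: tickling it may preempt.
            targets.append(i)
    return targets


running = [5, None]  # pCPU 0 is busy, pCPU 1 is idle
print(pcpus_to_tickle(9, running))                     # both get poked
print(pcpus_to_tickle(9, running, tickle_busy=False))  # only the idle one
print(pcpus_to_tickle(9, running, tickle_idle=False, tickle_busy=False))
# The last call pokes nobody: pCPU 1 stays idle while the vCPU waits.
```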

This means you're killing not only preemption, but also work
conserving-ness, and that might not be among your original goals (or was

> >Of course, wake-ups is not the only cause of SCHEDULE_SOFTIRQ being
> >raised. E.g., it fires periodically at the scheduling time slice
> >boundaries. If you want to avoid vcpus being interrupted by others with
> >higher priority for this case too, you probably have more paths to tweak
> >than just the csched_vcpu_wake() function.
> >
> Yes, I can not remember the number of raising SCHEDULE_SOFTIRQ interrupt. 
> Long time ago, I check the places of raising SCHEDULE_SOFTIRQ interrupt. It 
> is about seven places.
Fine. Then, to be sure, I'd check all of them and see what they end up
doing. I know they're all calling csched_schedule(); what I mean is I'd
check the conditions and the parameters, to verify which ones of these 7
possible situations could lead to preemption.

What you can be quite sure of is that there's not going to be a
preemption without a call to csched_schedule() being involved, so you
may even try to instrument the code at that level... It really all
depends on your final purpose.
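
Just to give an idea of what I mean by instrumenting at that level, here
is a sketch in Python of how one could post-process a trace with one
record per scheduling decision. Everything in it is an assumption of
mine: the Decision record, its fields, and the definition of preemption
used by is_preemption() are made up for the example; Xen does not hand
you such a trace as-is.

```python
from typing import List, NamedTuple


class Decision(NamedTuple):
    """One hypothetical trace record per scheduling decision on a pCPU."""
    prev: str            # vCPU running before the decision ("idle" if none)
    prev_runnable: bool  # could prev have kept running?
    nxt: str             # vCPU chosen to run next


def is_preemption(d: Decision) -> bool:
    # Assumed definition: a vCPU that could still run was replaced by another.
    return d.prev != "idle" and d.prev_runnable and d.nxt != d.prev


def count_preemptions(trace: List[Decision]) -> int:
    return sum(1 for d in trace if is_preemption(d))


trace = [
    Decision("v1", True, "v2"),    # v1 could go on but v2 took over
    Decision("v2", False, "v3"),   # v2 blocked: not a preemption
    Decision("idle", True, "v1"),  # idle pCPU picks up work: not a preemption
    Decision("v1", True, "v1"),    # same vCPU keeps running
]
print(count_preemptions(trace))
```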

> >And here I'm failing at understanding what you mean again... When a
> >SCHEDULE_SOFTIRQ is raised for a given pcpu, that pcpu will deal with
> >it, well, ASAP (look at how softirqs & tasklets work in the hypervisor
> >source code). What do you mean by "give up the physical cpu"?
> I mean after raising the SCHEDULE_SOFTIRQ interrupt, the handler function 
> schedule() will execute in time or need to wait the current vcpu scheduled 
> out. Which part decides the priority among them? 
Mmm... I spot some confusion here. Why should the scheduling out of a
vcpu be involved in all this? I mean, raising a SCHEDULE_SOFTIRQ and,
most importantly, handling it, happens in Xen code. That means there is a
pCPU executing hypervisor code, independently of which one is the vCPU
that is or was running on that same pCPU. Well, this same hypervisor
code will get to execute, at some point, csched_schedule(), make the
scheduling decision and, if that is the case, deschedule the running vCPU
and schedule another one (and there you have a preemption).

Actually, we really can't wait for a vCPU to be descheduled to execute
the Xen scheduler, since it's the Xen scheduler itself that deschedules
vCPUs! :-O

Perhaps, with "scheduled out" you mean something like block, i.e., you
want to know if Xen is able to interrupt the vCPUs or if it always runs
them to completion or blocking. In which case, the former: we interrupt
the vCPUs, just like a (preemptible) OS scheduler interrupts the OS's
tasks. Whether or not that will result in a preemption depends both
on the scheduler and on the circumstances.

Sounds better now?

> Can you give me some guidance, where is the code for  softirqs & tasklets.
Well, grep and find are usually good friends, when the question is where
is the code! :-P


$ grep -r tasklet xen.git/xen/

$ grep -r softirq xen.git/xen/

Both produce a lot of output here. Also, I'd try something like this...
You know, programmers usually don't have much imagination:

$ find ./xen.git/xen/ -iname 'tasklet*'

$ find ./xen.git/xen/ -iname 'softirq*'

> Another question:
> In the schedule() function of schedule.c file, at first, it will set the flag 
> tasklet_work_scheduled according to whether has the tasklet_work. What is the 
> tasklet work?
After having inspected at least some of the sources above, look for the
do_tasklet() function, and review what it does. If it's the concept of
tasklet and softirq that you're unfamiliar with, well, very quickly, it's
just one way of deferring work in an OS (or, in our case, a hypervisor,
but still).

Linux makes use of this kind of thing pretty heavily (although the
names, the implementation, and the number of different variants of them
change with kernel versions). I trust/hope you can find enough
documentation about that online. :-)

> In the csched_schedule() of  sched_credit.c file, it will give the idle vcpu 
> boost priority if the tasklet_work_scheduled is set. 
> I have some difficult for understanding this part. Maybe my confusion is not 
> knowing the tasklet work. Can you give some explanation why designing like 
> this?
Again, a tasklet is deferred work. That means there is this pretty
function you want to call, but you can't call it right now. A typical
example is when you have interrupts disabled and the pretty function
in question wants interrupts enabled, or you just don't want to
keep interrupts disabled for too long, or any other reason.

Ok, what you do is make a note about calling that function later, and
that's exactly what a tasklet does. The reason why we execute them in the
idle domain's context is, well, because we have to execute them
somewhere!  :-)

Seriously, our scheduler schedules vCPUs, not 'functions', so you either
call a function from where you are (and we already said you can't) or,
when you're done, the scheduler will pick a vCPU and get on with it, and
your function will never be called. What we do, hence, is make sure it
is one of the idle domain's vCPUs that is scheduled, as well as make
sure that such a vCPU will call your function as part of 'its workload'.

Check out the idle_loop() function, it's in xen/arch/x86/domain.c.
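
If it helps before diving into the C code, this is the whole idea as a
toy in Python. The names tasklet_schedule() and do_tasklet() are
borrowed from Xen, but the bodies here are mine and heavily simplified;
pick_next() and idle_vcpu_step() are made up to stand in for the
scheduler's decision and for one pass of the idle loop.

```python
from collections import deque
from typing import Callable, Deque, List

tasklets: Deque[Callable[[], None]] = deque()  # pending deferred work


def tasklet_schedule(fn: Callable[[], None]) -> None:
    """Make a note to call fn later instead of calling it right now."""
    tasklets.append(fn)


def do_tasklet() -> None:
    """Run one pending deferred function, if there is any."""
    if tasklets:
        tasklets.popleft()()


def pick_next(runnable: List[str]) -> str:
    """Toy scheduling decision: pending tasklet work makes the idle vCPU win."""
    if tasklets or not runnable:
        return "idle"
    return runnable[0]


def idle_vcpu_step() -> None:
    """What the idle vCPU does as 'its workload' in this toy."""
    do_tasklet()


log: List[str] = []
tasklet_schedule(lambda: log.append("deferred work ran"))
print(pick_next(["guest-vcpu"]))  # the idle vCPU is chosen over the guest
idle_vcpu_step()                  # ...and it runs the deferred function
print(log, pick_next(["guest-vcpu"]))
```

Once the queue is empty, the guest vCPU is picked again as usual.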


