Xen project Mailing List

Re: [Xen-users] whether xen scheduler supports preemption

From: Dario Faggioli <dario.faggioli@xxxxxxxxxx>

Date: Thu, 27 Jun 2013 12:30:05 +0200

Delivery-date: Thu, 27 Jun 2013 10:30:44 +0000

List-id: Xen user discussion <xen-users.lists.xen.org>

So, first of all... Can you use plain text instead of HTML for e-mails?

On Wed, 2013-06-26 at 21:16 +0800, åä wrote:
> Thank you very much for your detailed explanation! See below.
>
You're welcome. Although, at this point, I'm curious about why you're interested in this... What is it that you want to achieve?

> > ... Yes, that is at least most of it. In fact, when a vcpu wakes up, it
> > is added to a specific runq, and the 'tickling' mechanism is there right
> > to ensure that the said vcpu starts to run as soon as possible, either
> > if there are idle pcpus, or the running vcpus have lower priority, the
> > latter case being the definition of preemption.
> When a vcpu wakes up, it is added to a specific runq. Is that specific
> runq the runnable queue?
>
Well, the vCPU wakes up, so yes, it is the runnable queue of a specific pCPU. Which 'specific pCPU' it is depends, and I suggest you look more deeply into the scheduler code. Off the top of my head, I'd say it is the runqueue of the pCPU where the vCPU was when it went to sleep.

> either if there are idle pcpus, or the running vcpus have lower priority?
>
In credit1, it works like this:
- you (the vCPU) wake up and I (the Xen scheduler) queue you on the runq of the pCPU where you were before going to sleep;
- if that pCPU is busy, I poke other pCPUs to see if you can run there (that's the meaning of 'tickling');
- if the above is not possible, I check whether preemption is required. If yes, I preempt the vCPU running on that pCPU; if not, you have to wait in the runq for your turn (or for some other pCPU to become idle and pick you up).

Does that make sense?

> I do not understand your meaning. You mean that if there are idle pcpus, the
> woken-up vcpu will be scheduled on the idle pcpus to run.
>
For sure, the scheduler will try as hard as it can to achieve this, yes.

> If not, it will preempt the currently running vcpu if the woken-up vcpu has
> a higher priority than the current vcpu.
> Is my understanding right?
>
I believe it is. Actually, I believe this is either the definition or, in any case, the only sensible thing that a reasonable enough preemptible scheduler should do. :-) For the deep technicalities of how this is implemented in credit1, please refer to my (hopefully accurate) explanation above or, even better, to sched_credit.c.

> > If you, for instance, avoid raising the SCHEDULE_SOFTIRQ for busy pcpus
> > (I would still tickle the idle ones, or you'll get funny results! :-O),
> > you definitely are making the (credit) scheduler less preemptible.
> I cannot understand this part: "still tickle the idle ones, or you'll get funny
> results!" What does it mean?
>
The meaning is that, given the explanation above, inhibiting preemption by, for instance, not tickling the busy pCPUs might actually work. On the other hand, if you have idle pCPUs, having them run the woken-up task is not a preemption, right? Well, if you do not tickle those pCPUs, you won't get there: not only will you get rid of preemption on busy pCPUs, you will also have idle pCPUs that remain idle, even if there are vCPUs waiting to be executed. This means you're killing not only preemption, but also work conservation, and that might not be among your original goals (or was it?).

> > Of course, wake-ups are not the only cause of SCHEDULE_SOFTIRQ being
> > raised. E.g., it fires periodically at the scheduling time slice
> > boundaries. If you want to avoid vcpus being interrupted by others with
> > higher priority for this case too, you probably have more paths to tweak
> > than just the csched_vcpu_wake() function.
> Yes, I cannot remember the number of places raising the SCHEDULE_SOFTIRQ
> interrupt. A long time ago, I checked the places raising the SCHEDULE_SOFTIRQ
> interrupt. It is about seven places.
>
Fine. Then, to be sure, I'd check all of them and see what they end up doing.
I know they all end up calling csched_schedule(); what I mean is that I'd check the conditions and the parameters, to verify which of these 7 possible situations could lead to preemption. What you can be quite sure of is that there's not going to be a preemption without a call to csched_schedule() being involved, so you may even try to instrument the code at that level. It really all depends on your final purpose.

> > And here I'm failing at understanding what you mean again... When a
> > SCHEDULE_SOFTIRQ is raised for a given pcpu, that pcpu will deal with
> > it, well, ASAP (look at how softirqs & tasklets work in the hypervisor
> > source code). What do you mean by "give up the physical cpu"?
> I mean: after the SCHEDULE_SOFTIRQ interrupt is raised, will the handler
> function schedule() execute in time, or does it need to wait for the current
> vcpu to be scheduled out? Which part decides the priority among them?
>
Mmm... I spot some confusion here. Why should the scheduling out of a vCPU be involved in all this? I mean, raising a SCHEDULE_SOFTIRQ and, most importantly, handling it happens in Xen code. That means there is a pCPU executing hypervisor code, independently of which vCPU is or was running on that same pCPU. This same hypervisor code will, at some point, execute csched_schedule(), make the scheduling decision and, if that is the case, deschedule the running vCPU and schedule another one (and there you have a preemption). Actually, we really can't wait for a vCPU to be descheduled to execute the Xen scheduler, since it's the Xen scheduler itself that deschedules vCPUs! :-O

Perhaps, with "scheduled out" you mean something like blocking, i.e., you want to know whether Xen is able to interrupt the vCPUs or whether it always runs them to completion or until they block. If so, it's the former: we interrupt the vCPUs, just like a (preemptible) OS scheduler interrupts the OS's tasks.
Whether or not that will result in a preemption depends both on the scheduler and on the circumstances.

Sounds better now?

> Can you give me some guidance on where the code for softirqs & tasklets is?
>
Well, grep and find are usually good friends when the question is where the code is! :-P Both

  $ grep -r tasklet xen.git/xen/

and

  $ grep -r softirq xen.git/xen/

produce a lot of output here. Also, I'd try something like this... You know, programmers usually don't have much imagination:

  $ find ./xen.git/xen/ -iname 'tasklet*'
  ./xen/include/xen/tasklet.h
  ./xen/common/tasklet.c
  $ find ./xen.git/xen/ -iname 'softirq*'
  ./xen/include/asm-x86/softirq.h
  ./xen/include/xen/softirq.h
  ./xen/include/asm-arm/softirq.h
  ./xen/common/softirq.c

> Another question:
> In the schedule() function of schedule.c, it first sets the flag
> tasklet_work_scheduled according to whether there is tasklet work. What is
> tasklet work?
>
After having inspected at least some of the sources above, look for the do_tasklet() function and review what it does. If it's the concept of tasklets and softirqs that you're unfamiliar with, well, very quickly: it's just one way of deferring work in an OS (or, in our case, a hypervisor, but still). Linux makes use of these kinds of things pretty heavily (although the names, the implementation, and the number of different variants change with kernel versions). I trust/hope you can find enough documentation about that online. :-)

> In csched_schedule() in sched_credit.c, the idle vcpu is given a boost
> priority if tasklet_work_scheduled is set.
> I have some difficulty understanding this part. Maybe my confusion comes from
> not knowing what tasklet work is. Can you explain why it is designed like
> this?
>
Again, a tasklet is deferred work. That means there is this pretty function you want to call, but you can't call it right now.
A typical example is that you have interrupts disabled and the pretty function in question wants interrupts enabled, or that you don't want to keep interrupts disabled for too long, or any other reason. OK, what you do is make a note to call that function later, and that's exactly what a tasklet does.

The reason why we execute them in the idle domain's context is, well, because we have to execute them somewhere! :-) Seriously, our scheduler schedules vCPUs, not 'functions', so either you call a function from where you are (and we already said you can't), or, when you're done, the scheduler will pick a vCPU and get on with it, and your function will never be called. What we hence do is make sure that one of the idle domain's vCPUs is scheduled, and that such vCPU calls your function as part of 'its workload'. Check out the idle_loop() function; it's in xen/arch/x86/domain.c.

Regards,
Dario
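The credit1 wake-up path described above (queue on the last pCPU, tickle an idle pCPU if there is one, otherwise preempt only if the woken vCPU has higher priority) can be sketched as a small C toy model. Everything in it is hypothetical: `toy_vcpu_wake()`, `toy_schedule()`, `toy_do_softirq()`, the integer priorities and the `tickle_idle` switch only illustrate the description in this thread and are not the code in sched_credit.c (credit1's real priorities, load balancing and tickling rules are more involved). Note that switching only ever happens inside the softirq handler, mirroring the remark that there is no preemption without csched_schedule() being involved, and that toy idle pCPUs only act when tickled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of the wake-up/tickle/preempt flow described in the thread.
 * Hypothetical names throughout; this is NOT Xen's sched_credit.c. */
#define NR_PCPUS 4
#define RUNQ_MAX 8
#define SCHEDULE_SOFTIRQ_BIT 0x1u

struct toy_vcpu { int id; int prio; };      /* larger prio = more urgent */

struct toy_pcpu {
    struct toy_vcpu *curr;                  /* NULL means the pCPU is idle */
    struct toy_vcpu *runq[RUNQ_MAX];
    int runq_len;
    unsigned int softirq_pending;           /* set bit = softirq raised */
};

static struct toy_pcpu pcpus[NR_PCPUS];

static void toy_reset(void) { memset(pcpus, 0, sizeof(pcpus)); }

/* "Tickling" a pCPU = raising its scheduling softirq. */
static void toy_tickle(int cpu) { pcpus[cpu].softirq_pending |= SCHEDULE_SOFTIRQ_BIT; }

static int runq_best(const struct toy_pcpu *p)
{
    int best = -1;                          /* -1 if the runq is empty */
    for (int i = 0; i < p->runq_len; i++)
        if (best < 0 || p->runq[i]->prio > p->runq[best]->prio)
            best = i;
    return best;
}

static struct toy_vcpu *runq_take(struct toy_pcpu *p, int i)
{
    struct toy_vcpu *v = p->runq[i];
    p->runq[i] = p->runq[--p->runq_len];    /* runq order is not preserved */
    return v;
}

static void runq_add(struct toy_pcpu *p, struct toy_vcpu *v)
{
    assert(p->runq_len < RUNQ_MAX);
    p->runq[p->runq_len++] = v;
}

/* Wake-up: queue on the last pCPU; tickle an idle pCPU if allowed and
 * available; otherwise tickle the last pCPU only if preemption is due. */
static void toy_vcpu_wake(struct toy_vcpu *v, int last_cpu, bool tickle_idle)
{
    struct toy_pcpu *p = &pcpus[last_cpu];
    runq_add(p, v);
    if (tickle_idle) {
        for (int k = 0; k < NR_PCPUS; k++) {  /* own pCPU first */
            int c = (last_cpu + k) % NR_PCPUS;
            if (pcpus[c].curr == NULL) { toy_tickle(c); return; }
        }
    }
    if (p->curr != NULL && v->prio > p->curr->prio)
        toy_tickle(last_cpu);               /* preemption request */
}

/* SCHEDULE_SOFTIRQ handler: the only place where vCPUs are switched. */
static void toy_schedule(int cpu)
{
    struct toy_pcpu *p = &pcpus[cpu];
    int i = runq_best(p);

    if (p->curr == NULL) {                  /* idle: local work, else steal */
        if (i >= 0) { p->curr = runq_take(p, i); return; }
        for (int c = 0; c < NR_PCPUS; c++) {
            int j = (c == cpu) ? -1 : runq_best(&pcpus[c]);
            if (j >= 0) { p->curr = runq_take(&pcpus[c], j); return; }
        }
        return;
    }
    if (i >= 0 && p->runq[i]->prio > p->curr->prio) {
        struct toy_vcpu *prev = p->curr;    /* preemption */
        p->curr = runq_take(p, i);
        runq_add(p, prev);                  /* preempted vCPU waits again */
    }
}

/* What a pCPU does with its pending softirqs (scheduling one only). */
static void toy_do_softirq(int cpu)
{
    if (pcpus[cpu].softirq_pending & SCHEDULE_SOFTIRQ_BIT) {
        pcpus[cpu].softirq_pending &= ~SCHEDULE_SOFTIRQ_BIT;
        toy_schedule(cpu);
    }
}
```

For instance, with pCPUs 0-2 busy and pCPU 3 idle, `toy_vcpu_wake(&v, 0, true)` followed by `toy_do_softirq(3)` leaves `v` running on pCPU 3. With `tickle_idle` set to false and a low-priority `v`, no softirq is raised at all, so `v` waits on pCPU 0's runq while pCPU 3 stays idle: the loss of work conservation discussed above.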

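The tasklet explanation above (deferred work that gets run by making sure the idle vCPU is scheduled, because the scheduler only schedules vCPUs) can be sketched the same way. Again, this is a hypothetical toy model: `toy_tasklet_schedule()`, `toy_do_tasklets()` and `toy_schedule_round()` only mirror the roles the thread attributes to queueing a tasklet, to do_tasklet()/idle_loop(), and to the tasklet_work_scheduled check in schedule()/csched_schedule(). They are not Xen's tasklet API, and a single global queue stands in for per-pCPU state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of deferred work ("tasklets") and of the idle-vCPU boost.
 * Hypothetical names throughout; this is NOT Xen's tasklet API. */
typedef void (*toy_work_fn)(void *data);

struct toy_tasklet { toy_work_fn fn; void *data; };

#define TOY_MAX_TASKLETS 16
static struct toy_tasklet toy_queue[TOY_MAX_TASKLETS];
static size_t toy_queue_len;

/* Called where the work cannot run yet (e.g. interrupts disabled):
 * it only records that fn(data) must be called later. */
static void toy_tasklet_schedule(toy_work_fn fn, void *data)
{
    assert(toy_queue_len < TOY_MAX_TASKLETS);
    toy_queue[toy_queue_len++] = (struct toy_tasklet){ fn, data };
}

static bool toy_tasklet_work_pending(void) { return toy_queue_len > 0; }

/* Body of the idle vCPU's loop: drain the deferred work in FIFO order. */
static void toy_do_tasklets(void)
{
    for (size_t i = 0; i < toy_queue_len; i++)
        toy_queue[i].fn(toy_queue[i].data);
    toy_queue_len = 0;
}

enum toy_choice { RUN_GUEST_VCPU, RUN_IDLE_VCPU };

/* One scheduling decision. The scheduler only knows how to run vCPUs,
 * so pending work is served by forcing ("boosting") the idle vCPU. */
static enum toy_choice toy_schedule_round(bool guest_runnable)
{
    bool tasklet_work_scheduled = toy_tasklet_work_pending();
    enum toy_choice next = (tasklet_work_scheduled || !guest_runnable)
                               ? RUN_IDLE_VCPU : RUN_GUEST_VCPU;
    if (next == RUN_IDLE_VCPU)
        toy_do_tasklets();                  /* the idle vCPU's "workload" */
    return next;
}

/* Example deferred function (hypothetical): bump a counter. */
static void toy_add_one(void *data) { (*(int *)data)++; }
```

In this sketch, queueing `toy_add_one` does not run it; the next `toy_schedule_round(true)` picks the idle vCPU even though a guest vCPU is runnable, the deferred call happens there, and the following round goes back to the guest vCPU.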

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users
