Re: [Xen-devel] About vcpu wakeup and runq tickling in credit
On 15/11/12 12:10, Dario Faggioli wrote:
> On Tue, 2012-10-23 at 16:16 +0100, George Dunlap wrote:
>>> As it comes to possible solutions, I think that, for instance,
>>> tickling all the CPUs in both v_W's and v_C's affinity masks could
>>> solve this, but that would also potentially increase the overhead
>>> (by asking _a_lot_ of CPUs to reschedule), and again, it's hard to
>>> say if/when it's worth...
>>
>> Well, in my code, opt_tickle_idle_one is on by default, which means
>> only one other cpu will be woken up.  If there were an easy way to
>> make it wake up a CPU in v_C's affinity as well (supposing that there
>> was no overlap), that would probably be a win.  Of course, that's
>> only necessary if:
>>  * v_C is lower priority than v_W
>>  * There are no idlers that intersect both v_C's and v_W's affinity
>>    masks.
>> It's probably a good idea though to try to set up a scenario where
>> this might be an issue and see how often it actually happens.
>
> Ok, I think I managed to reproduce this.  Look at the following trace,
> considering that d51 has vcpu-affinity with pcpus 8-15, while d0 has
> no affinity at all (its vcpus can run everywhere):
>
>   166.853945095 ---|-|-------x-| d51v1 runstate_change d0v7 blocked->runnable
>  ]166.853945884 ---|-|-------x-| d51v1 28004(2:8:4) 2 [ 0 7 ]
>   .
>  ]166.853986385 ---|-|-------x-| d51v1 2800e(2:8:e) 2 [ 33 4bf97be ]
>  ]166.853986522 ---|-|-------x-| d51v1 2800f(2:8:f) 3 [ 0 a050 1c9c380 ]
>  ]166.853986636 ---|-|-------x-| d51v1 2800a(2:8:a) 4 [ 33 1 0 7 ]
>   .
>   166.853986775 ---|-|-------x-| d51v1 runstate_change d51v1 running->runnable
>   166.853986905 ---|-|-------x-| d?v? runstate_change d0v7 runnable->running
>   . . .
>  ]166.854195353 ---|-|-------x-| d0v7 28006(2:8:6) 2 [ 0 7 ]
>  ]166.854196484 ---|-|-------x-| d0v7 2800e(2:8:e) 2 [ 0 33530 ]
>  ]166.854196584 ---|-|-------x-| d0v7 2800f(2:8:f) 3 [ 33 33530 1c9c380 ]
>  ]166.854196691 ---|-|-------x-| d0v7 2800a(2:8:a) 4 [ 0 7 33 1 ]
>   166.854196809 ---|-|-------x-| d0v7 runstate_change d0v7 running->blocked
>   166.854197175 ---|-|-------x-| d?v? runstate_change d51v1 runnable->running
>
> So, if I'm not reading the trace wrong, when d0v7 wakes up (very first
> event) it preempts d51v1.  Now, even though almost all of pcpus 8-15
> are idle, none of them gets tickled and comes to pick d51v1 up, which
> then has to wait in the runq until d0v7 goes back to sleep.
>
> I suspect this could be because, at d0v7 wakeup time, we try to tickle
> some pcpu which is in d0v7's affinity, but not in d51v1's (as in the
> second 'if() {}' block in __runq_tickle() we only care about
> new->vcpu->cpu_affinity, and in this case, new is d0v7).
>
> I know, looking at the timestamps it doesn't look like a big deal in
> this case, and I'm still working on producing numbers that can better
> show whether or not this is a real problem.
>
> Anyway, independently of the results of these tests, why do I care so
> much?  Well, if you substitute the concept of "vcpu-affinity" with
> "node-affinity" above (which is what I am doing in my NUMA-aware
> scheduling patches) you'll see why this is bothering me quite a bit.
> In fact, in that case, waking up a random pcpu with which d0v7 has
> node-affinity, but d51v1 has not, would cause d51v1 to be pulled to
> that cpu (since node-affinity is only a preference)!
>
> So, in the vcpu-affinity case, if pcpu 3 gets tickled, when it peeks
> at pcpu 13's runq for work to steal it does not find anything suitable
> and gives up, leaving d51v1 in the runq even though there are idle
> pcpus on which it could run, which is already bad.
>
> In the node-affinity case, pcpu 3 will actually manage to steal d51v1
> and run it, even though there are idle pcpus with which it has
> node-affinity, thus defeating most of the benefits of the whole
> NUMA-aware scheduling thing (at least for some workloads).
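To make the gap concrete, here is a minimal standalone sketch of the
decision Dario describes.  This is not the actual __runq_tickle() code:
CPU masks are plain uint64_t bitmaps rather than cpumask_t, the waking
vcpu is simply assumed to preempt, the "tickle one idler" policy is
reduced to "lowest set bit", and pick_idler() is a made-up helper name.

/*
 * Toy model of the idler-tickling decision discussed above (NOT the
 * real Xen code; everything here is simplified for illustration).
 */
#include <stdint.h>
#include <stdio.h>

/* Bits 8..15 set: d51v1's vcpu-affinity in the trace. */
#define MASK_8_15   (0xffull << 8)
/* All 16 pcpus: d0v7 has no affinity restriction. */
#define MASK_ALL    (0xffffull)

/* Pick one idler to tickle the way the second if() block in
 * __runq_tickle() does: intersect the idlers with *one* affinity mask
 * (in the real code, always new's). */
static int pick_idler(uint64_t idlers, uint64_t affinity)
{
    uint64_t candidates = idlers & affinity;

    if (candidates == 0)
        return -1;                       /* nobody suitable to tickle */
    return __builtin_ctzll(candidates);  /* "one idler": lowest set bit
                                            (GCC/clang builtin) */
}

int main(void)
{
    /* pcpu 13 runs d51v1; say pcpus 0-12 and 14-15 are idle. */
    uint64_t idlers       = MASK_ALL & ~(1ull << 13);
    uint64_t new_affinity = MASK_ALL;    /* d0v7, the waking vcpu        */
    uint64_t cur_affinity = MASK_8_15;   /* d51v1, about to be preempted */

    /* Current behaviour: only new's affinity is considered. */
    printf("tickled (new's mask):    pcpu %d\n",
           pick_idler(idlers, new_affinity));

    /* What Dario is asking about: an idler d51v1 could actually run on. */
    printf("idler in cur's mask too: pcpu %d\n",
           pick_idler(idlers, cur_affinity));

    return 0;
}

In this toy setup the first call picks pcpu 0, which is useless for
d51v1, while the second shows that an idler d51v1 could run on (pcpu 8)
was available the whole time.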
Maybe what we should do is base the wake-up on who is likely to run on
the current cpu: i.e., if "current" is likely to be preempted, look at
idlers based on "current"'s mask; if "new" is likely to be put on the
queue, look at idlers based on "new"'s mask.

What do you think?

 -George
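A rough sketch of that rule, in the same simplified terms as the model
above (again, not the actual __runq_tickle() code: 'struct fake_vcpu'
and idlers_to_tickle() are invented names, and the real code would work
on cpumask_t and the credit scheduler's priority values):

#include <stdint.h>
#include <stdio.h>

struct fake_vcpu {
    int      pri;       /* larger value == higher priority */
    uint64_t affinity;  /* bitmap of pcpus this vcpu may run on */
};

/* Which idlers are worth tickling when 'new' wakes up on a pcpu that is
 * currently running 'cur'?  Base the choice on whichever vcpu is likely
 * to be left waiting on the runqueue. */
static uint64_t idlers_to_tickle(uint64_t idlers,
                                 const struct fake_vcpu *new,
                                 const struct fake_vcpu *cur)
{
    if (new->pri > cur->pri)
        /* 'new' will preempt, so 'cur' ends up runnable on the runq:
         * look for idlers 'cur' can actually run on. */
        return idlers & cur->affinity;

    /* 'new' ends up on the runq: look for idlers in its own mask. */
    return idlers & new->affinity;
}

int main(void)
{
    /* The scenario from the trace: d0v7 (boosted, runs anywhere) wakes
     * up and preempts d51v1 (pinned to pcpus 8-15) on pcpu 13. */
    struct fake_vcpu d0v7  = { .pri = 1, .affinity = 0xffffull };
    struct fake_vcpu d51v1 = { .pri = 0, .affinity = 0xffull << 8 };
    uint64_t idlers = 0xffffull & ~(1ull << 13);

    /* Prints 0xdf00: the idle pcpus in 8-15, any of which could pick
     * d51v1 up. */
    printf("idlers worth tickling: %#llx\n",
           (unsigned long long)idlers_to_tickle(idlers, &d0v7, &d51v1));
    return 0;
}

The design choice is just the asymmetry described above: tickle on
behalf of whoever loses the pcpu, rather than always on behalf of the
waking vcpu.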
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel