[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH-for-4.17] xen/sched: fix race in RTDS scheduler
On 21.10.22 11:37, Dario Faggioli wrote: Ok, and now, something not really related to the bug being fixed here, but about the code that is being touched: On Fri, 2022-10-21 at 08:10 +0200, Juergen Gross wrote:diff --git a/xen/common/sched/rt.c b/xen/common/sched/rt.c index d6de25531b..960a8033e2 100644 --- a/xen/common/sched/rt.c +++ b/xen/common/sched/rt.c @@ -1087,6 +1087,7 @@ rt_schedule(const struct scheduler *ops, struct sched_unit *currunit, else if ( !unit_runnable_state(snext->unit) ) { q_remove(snext); + replq_remove(ops, snext); snext = rt_unit(sched_idle_unit(sched_cpu)); }So, adding a few more context here, the code looks like this: snext = runq_pick(ops, cpumask_of(sched_cpu), cur_cpu); if ( snext == NULL ) snext = rt_unit(sched_idle_unit(sched_cpu)); else if ( !unit_runnable_state(snext->unit) ) { q_remove(snext); snext = rt_unit(sched_idle_unit(sched_cpu)); } Basically, we've tried to pick-up the highest priority task from the runqueue. If snext is NULL, the runqueue was just empty, so we pick up idle (and then, later, we'll check whether the currently running unit is still runnable; and if it is, we'll continue to run it, of course). However, it can happen that --e.g., due to core-scheduling-- we picked up a unit that, despite being in the runqueue, is not runnable. At this point what we do is removing it from the runqueue (to avoid picking it up again) and we go for idle. Now, I may be missing/misremembering something, but it looks to me that it's possible that there are other runnable units in the runqueue. And if that's the case, why do we just pick idle and move on, instead of continuing trying? Juergen... Am I missing or misremembering any fundamental reason why we cannot continue to scan the runqueue until the first runnable unit (if any) is found? No. This code was introduced in the RFC V2 series of core scheduling. And it was not the result of a previous discussion on xen-devel. Of course, this is not really related with the bug this patch is fixing, which is correct and should be applied, no matter what the outcome of this subthread will be. :-) I can write another patch trying to fix that, but that shouldn't be 4.17 material IMHO. Juergen Attachment:
OpenPGP_0xB0DE9DD628BF132F.asc Attachment:
OpenPGP_signature
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |