[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] xen/sched: fix sched_move_domain()



On Tue, Nov 14, 2023 at 1:30 PM Juergen Gross <jgross@xxxxxxxx> wrote:
>
> When moving a domain out of a cpupool running with the credit2
> scheduler and having multiple run-queues, the following ASSERT() can
> be observed:
>
> (XEN) Xen call trace:
> (XEN)    [<ffff82d04023a700>] R credit2.c#csched2_unit_remove+0xe3/0xe7
> (XEN)    [<ffff82d040246adb>] S sched_move_domain+0x2f3/0x5b1
> (XEN)    [<ffff82d040234cf7>] S cpupool.c#cpupool_move_domain_locked+0x1d/0x3b
> (XEN)    [<ffff82d040236025>] S cpupool_move_domain+0x24/0x35
> (XEN)    [<ffff82d040206513>] S domain_kill+0xa5/0x116
> (XEN)    [<ffff82d040232b12>] S do_domctl+0xe5f/0x1951
> (XEN)    [<ffff82d0402276ba>] S timer.c#timer_lock+0x69/0x143
> (XEN)    [<ffff82d0402dc71b>] S pv_hypercall+0x44e/0x4a9
> (XEN)    [<ffff82d0402012b7>] S lstar_enter+0x137/0x140
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion 'svc->rqd == c2rqd(sched_unit_master(unit))' failed at 
> common/sched/credit2.c:1159
> (XEN) ****************************************
>
> This is happening as sched_move_domain() is setting a different cpu
> for a scheduling unit without telling the scheduler. When this unit is
> removed from the scheduler, the ASSERT() will trigger.
>
> In non-debug builds the result is usually a clobbered pointer, leading
> to another crash a short time later.
>
> Fix that by swapping the two involved actions (setting another cpu and
> removing the unit from the scheduler).
>
> Cc: Henry Wang <Henry.Wang@xxxxxxx>
> Link: https://github.com/Dasharo/dasharo-issues/issues/488
> Fixes: 70fadc41635b ("xen/cpupool: support moving domain between cpupools 
> with different granularity")
> Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
> ---
> V2:
> - added Link: (reporter didn't want to be added by name)
> ---
>  xen/common/sched/core.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> index 12deefa745..e9f7486197 100644
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -738,12 +738,13 @@ int sched_move_domain(struct domain *d, struct cpupool 
> *c)
>      new_p = cpumask_first(d->cpupool->cpu_valid);

There's a comment just above here which should probably be updated;
something like "Remove all units from the old scheduler, and
temporarily move them to the same processor to make locking easier
when moving the new units to nwe processors."

With that change:

Reviewed-by: George Dunlap <george.dunlap@xxxxxxxxx>

I could change that on check-if you'd like.

I take it at this point this is just for the 4.19 branch, and this
will be a candidate for backport to 4.18.1?

 -George



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.