Re: Xen crash in scheduler during cpu hotplug
On 15.05.24 15:22, Andrew Cooper wrote:
> On 15/05/2024 1:39 pm, Jan Beulich wrote:
>> On 15.05.2024 13:58, Andrew Cooper wrote:
>>> Just so it doesn't get lost. In XenRT, we've found:
>>>
>>> (XEN) ----[ Xen-4.19.0-1-d  x86_64  debug=y  Tainted:   H    ]----
>>> (XEN) CPU:    45
>>> (XEN) RIP:    e008:[<ffff82d040244cbf>] common/sched/credit.c#csched_load_balance+0x41/0x877
>>> (XEN) RFLAGS: 0000000000010092   CONTEXT: hypervisor
>>> (XEN) rax: ffff82d040981618   rbx: ffff82d040981618   rcx: 0000000000000000
>>> (XEN) rdx: 0000003ff68cd000   rsi: 000000000000002d   rdi: ffff83103723d450
>>> (XEN) rbp: ffff83207caa7d48   rsp: ffff83207caa7b98   r8:  0000000000000000
>>> (XEN) r9:  ffff831037253cf0   r10: ffff83103767c3f0   r11: 0000000000000009
>>> (XEN) r12: ffff831037237990   r13: ffff831037237990   r14: ffff831037253720
>>> (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 0000000000f526e0
>>> (XEN) cr3: 000000005bc2f000   cr2: 0000000000000010
>>> (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>> (XEN) Xen code around <ffff82d040244cbf> (common/sched/credit.c#csched_load_balance+0x41/0x877):
>>> (XEN)  48 8b 0c 10 48 8b 49 08 <48> 8b 79 10 48 89 bd b8 fe ff ff 49 8b 4e 28 48 <snip>
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82d040244cbf>] R common/sched/credit.c#csched_load_balance+0x41/0x877
I think I've found the reason.
In schedule_cpu_add() the cpupool and granularity are set only after
releasing the scheduling lock. I think those must be inside the locked
region.
Can you give this one a try (not tested at all)?
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 0cb33831d2..babac7aad6 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -3176,6 +3176,8 @@ int schedule_cpu_add(unsigned int cpu, struct cpupool *c)
 
     sr->scheduler = new_ops;
     sr->sched_priv = ppriv;
+    sr->granularity = cpupool_get_granularity(c);
+    sr->cpupool = c;
 
     /*
      * Reroute the lock to the per pCPU lock as /last/ thing. In fact,
@@ -3188,8 +3190,6 @@ int schedule_cpu_add(unsigned int cpu, struct cpupool *c)
     /* _Not_ pcpu_schedule_unlock(): schedule_lock has changed! */
     spin_unlock_irqrestore(old_lock, flags);
 
-    sr->granularity = cpupool_get_granularity(c);
-    sr->cpupool = c;
 
     /* The  cpu is added to a pool, trigger it to go pick up some work */
     cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
Juergen