[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v5 04/12] xen/arm: Prevent crash during disable_nonboot_cpus on suspend
Hi Volodymyr, On Sat, Aug 23, 2025 at 3:36 AM Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx> wrote: > > > Hi Mykola, > > Mykola Kvach <xakep.amatop@xxxxxxxxx> writes: > > While I approve the change, the commit message is somewhat > unclear. Maybe "Don't release IRQs on suspend" will be better? Do you mean commit message title ? > > > From: Mykola Kvach <mykola_kvach@xxxxxxxx> > > > > If we call disable_nonboot_cpus on ARM64 with system_state set > > to SYS_STATE_suspend, the following assertion will be triggered: > > > > ``` > > (XEN) [ 25.582712] Disabling non-boot CPUs ... > > (XEN) [ 25.587032] Assertion '!in_irq() && (local_irq_is_enabled() || > > num_online_cpus() <= 1)' failed at common/xmalloc_tlsf.c:714 > > [...] > > (XEN) [ 25.975069] Xen call trace: > > (XEN) [ 25.978353] [<00000a000022e098>] xfree+0x130/0x1a4 (PC) > > (XEN) [ 25.984314] [<00000a000022e08c>] xfree+0x124/0x1a4 (LR) > > (XEN) [ 25.990276] [<00000a00002747d4>] release_irq+0xe4/0xe8 > > (XEN) [ 25.996152] [<00000a0000278588>] > > time.c#cpu_time_callback+0x44/0x60 > > (XEN) [ 26.003150] [<00000a000021d678>] notifier_call_chain+0x7c/0xa0 > > (XEN) [ 26.009717] [<00000a00002018e0>] > > cpu.c#cpu_notifier_call_chain+0x24/0x48 > > (XEN) [ 26.017148] [<00000a000020192c>] cpu.c#_take_cpu_down+0x28/0x34 > > (XEN) [ 26.023801] [<00000a0000201944>] cpu.c#take_cpu_down+0xc/0x18 > > (XEN) [ 26.030281] [<00000a0000225c5c>] > > stop_machine.c#stopmachine_action+0xbc/0xe4 > > (XEN) [ 26.038057] [<00000a00002264bc>] > > tasklet.c#do_tasklet_work+0xb8/0x100 > > (XEN) [ 26.045229] [<00000a00002268a4>] do_tasklet+0x68/0xb0 > > (XEN) [ 26.051018] [<00000a000026e120>] domain.c#idle_loop+0x7c/0x194 > > (XEN) [ 26.057585] [<00000a0000277e30>] start_secondary+0x21c/0x220 > > (XEN) [ 26.063978] [<00000a0000361258>] 00000a0000361258 > > ``` > > > > This happens because before invoking take_cpu_down via the stop_machine_run > > function on the target CPU, stop_machine_run requests > > the STOPMACHINE_DISABLE_IRQ state on that CPU. Releasing memory in > > the release_irq function then triggers the assertion: > > > > /* > > * Heap allocations may need TLB flushes which may require IRQs to be > > * enabled (except when only 1 PCPU is online). > > */ > > > > This patch adds system state checks to guard calls to request_irq > > and release_irq. These calls are now skipped when system_state is > > SYS_STATE_{resume,suspend}, preventing unsafe operations during > > suspend/resume handling. > > If any call to release_irq() during suspend will trigger ASSERT, and it > is fine to leave IRQs as is during suspend, maybe it will be easier to > put > > + if ( system_state == SYS_STATE_suspend ) > + return; > > straight into release_irq() code? This will be easier than playing > whack-a-mole when some other patch will add another release_irq() call > somewhere. I’m fine with adding this check directly into release_irq(), as long as the other maintainers agree with this approach. > > > > > > Signed-off-by: Mykola Kvach <mykola_kvach@xxxxxxxx> > > --- > > Changes in V4: > > - removed the prior tasklet-based workaround in favor of a more > > straightforward and safer solution > > - reworked the approach by adding explicit system state checks around > > request_irq and release_irq calls, skips these calls during suspend > > and resume states to avoid unsafe memory operations when IRQs are > > disabled > > --- > > xen/arch/arm/gic.c | 6 ++++++ > > xen/arch/arm/tee/ffa_notif.c | 2 +- > > xen/arch/arm/time.c | 18 ++++++++++++------ > > 3 files changed, 19 insertions(+), 7 deletions(-) > > > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > > index a018bd7715..9856cb1592 100644 > > --- a/xen/arch/arm/gic.c > > +++ b/xen/arch/arm/gic.c > > @@ -388,6 +388,9 @@ void gic_dump_info(struct vcpu *v) > > > > void init_maintenance_interrupt(void) > > { > > + if ( system_state == SYS_STATE_resume ) > > + return; > > + > > request_irq(gic_hw_ops->info->maintenance_irq, 0, > > maintenance_interrupt, > > "irq-maintenance", NULL); > > } > > @@ -461,6 +464,9 @@ static int cpu_gic_callback(struct notifier_block *nfb, > > switch ( action ) > > { > > case CPU_DYING: > > + if ( system_state == SYS_STATE_suspend ) > > + break; > > + > > /* This is reverting the work done in init_maintenance_interrupt */ > > release_irq(gic_hw_ops->info->maintenance_irq, NULL); > > break; > > diff --git a/xen/arch/arm/tee/ffa_notif.c b/xen/arch/arm/tee/ffa_notif.c > > index 00efaf8f73..06f715a82b 100644 > > --- a/xen/arch/arm/tee/ffa_notif.c > > +++ b/xen/arch/arm/tee/ffa_notif.c > > @@ -347,7 +347,7 @@ void ffa_notif_init_interrupt(void) > > { > > int ret; > > > > - if ( notif_enabled && notif_sri_irq < NR_GIC_SGI ) > > + if ( notif_enabled && notif_sri_irq < NR_GIC_SGI && system_state != > > SYS_STATE_resume ) > > { > > /* > > * An error here is unlikely since the primary CPU has already > > diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c > > index ad984fdfdd..b2e07ade43 100644 > > --- a/xen/arch/arm/time.c > > +++ b/xen/arch/arm/time.c > > @@ -320,10 +320,13 @@ void init_timer_interrupt(void) > > WRITE_SYSREG(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2); > > disable_physical_timers(); > > > > - request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt, > > - "hyptimer", NULL); > > - request_irq(timer_irq[TIMER_VIRT_PPI], 0, vtimer_interrupt, > > - "virtimer", NULL); > > + if ( system_state != SYS_STATE_resume ) > > + { > > + request_irq(timer_irq[TIMER_HYP_PPI], 0, htimer_interrupt, > > + "hyptimer", NULL); > > + request_irq(timer_irq[TIMER_VIRT_PPI], 0, vtimer_interrupt, > > + "virtimer", NULL); > > + } > > > > check_timer_irq_cfg(timer_irq[TIMER_HYP_PPI], "hypervisor"); > > check_timer_irq_cfg(timer_irq[TIMER_VIRT_PPI], "virtual"); > > @@ -338,8 +341,11 @@ static void deinit_timer_interrupt(void) > > { > > disable_physical_timers(); > > > > - release_irq(timer_irq[TIMER_HYP_PPI], NULL); > > - release_irq(timer_irq[TIMER_VIRT_PPI], NULL); > > + if ( system_state != SYS_STATE_suspend ) > > + { > > + release_irq(timer_irq[TIMER_HYP_PPI], NULL); > > + release_irq(timer_irq[TIMER_VIRT_PPI], NULL); > > + } > > } > > > > /* Wait a set number of microseconds */ > > -- > WBR, Volodymyr Best regards, Mykola
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |