|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v8 13/13] xen/arm: Add support for system suspend triggered by hardware domain
Hi Volodymyr,
Thank you for the feedback.
On Fri, May 8, 2026 at 1:25 AM Volodymyr Babchuk
<Volodymyr_Babchuk@xxxxxxxx> wrote:
>
> Hi Mykola,
>
> Mykola Kvach <xakep.amatop@xxxxxxxxx> writes:
>
> [...]
>
> >> > + status = can_system_suspend();
> >> > + if ( status )
> >> > + {
> >> > + system_state = SYS_STATE_resume;
> >> > + goto resume_scheduler;
> >>
> >> When we have an error and we get the resume_scheduler path, we apply back
> >> the
> >> context of the guest saved previously in do_psci_1_0_system_suspend(), so
> >> am I
> >> correct saying the guest won’t get any PSCI error back and we resume the
> >> guest
> >> from the guest resume entrypoint?
> >>
> >> In case, should we have a different path that returns a PSCI error
> >> (PSCI_*) into the guest
> >> x0, and skips the context restore?
> >
> > You are right about the current control flow: once the virtual
> > SYSTEM_SUSPEND request has been accepted and the domain has been parked, a
> > later failure in the Xen-wide suspend path resumes the domain through the
> > normal
> > domain resume path, rather than returning a PSCI error from the original
> > call.
> >
> > This is intentional in the current design. The virtual PSCI SYSTEM_SUSPEND
> > path parks the domain and saves its resume context. The actual Xen-wide host
> > suspend is a separate step that is attempted only after all domains are
> > suspended.
> >
> > So a failure in the later Xen-wide suspend step is treated as an abort of
> > the
> > host suspend attempt after the domain suspend was already accepted. The
> > domain
> > is then resumed through the existing domain resume path, similarly to the
> > toolstack/xl suspend-resume flow, rather than by re-entering the guest PSCI
> > call path and modifying the saved vCPU context again.
> >
> > I agree this design is not obvious from the patch. I will clarify the commit
> > message and comments. If you or the maintainers think that failures before
> > the
> > physical SYSTEM_SUSPEND call succeeds should be reported back through the
> > original virtual PSCI call, then this would require a different flow. I was
> > trying to avoid that extra complexity in this series.
>
> I think that there is no sense to reporting an error back to guest. PSCI
> allows resume at any stage, so it is acceptable to have such brief "suspend"
>
> >
> >>
> >> > + }
> >> > +
> >> > + /*
> >> > + * Non-boot CPUs have to be disabled on suspend and enabled on
> >> > resume
> >> > + * (hotplug-based mechanism). Disabling non-boot CPUs will lead to
> >> > PSCI
> >> > + * CPU_OFF to be called by each non-boot CPU. Depending on the
> >> > underlying
> >> > + * platform capabilities, this may lead to the physical powering
> >> > down of
> >> > + * CPUs.
> >> > + */
> >> > + status = disable_nonboot_cpus();
> >> > + if ( status )
> >> > + {
> >> > + system_state = SYS_STATE_resume;
> >> > + goto resume_nonboot_cpus;
> >> > + }
> >> > +
> >> > + time_suspend();
> >> > +
> >> > + status = iommu_suspend();
> >> > + if ( status )
> >> > + {
> >> > + system_state = SYS_STATE_resume;
> >> > + goto resume_time;
> >> > + }
> >> > +
> >> > + console_start_sync();
> >> > + status = console_suspend();
> >> > + if ( status )
> >> > + {
> >> > + dprintk(XENLOG_ERR, "Failed to suspend the console, err=%d\n",
> >> > status);
> >> > + system_state = SYS_STATE_resume;
> >> > + goto resume_end_sync;
> >> > + }
> >> > +
> >> > + local_irq_save(flags);
> >> > + status = gic_suspend();
> >> > + if ( status )
> >> > + {
> >> > + system_state = SYS_STATE_resume;
> >> > + goto resume_irqs;
> >> > + }
> >> > +
> >> > + set_init_ttbr(xen_pgtable);
> >> > +
> >> > + /*
> >> > + * Enable identity mapping before entering suspend to simplify
> >> > + * the resume path
> >> > + */
> >> > + update_boot_mapping(true);
> >> > +
> >> > + if ( prepare_resume_ctx(&cpu_context) )
> >> > + {
> >> > + status = call_psci_system_suspend();
> >> > + /*
> >> > + * If suspend is finalized properly by above system suspend
> >> > PSCI call,
> >> > + * the code below in this 'if' branch will never execute.
> >> > Execution
> >> > + * will continue from hyp_resume which is the hypervisor's
> >> > resume point.
> >> > + * In hyp_resume CPU context will be restored and since
> >> > link-register is
> >> > + * restored as well, it will appear to return from
> >> > prepare_resume_ctx.
> >> > + * The difference in returning from prepare_resume_ctx on
> >> > system suspend
> >> > + * versus resume is in function's return value: on suspend, the
> >> > return
> >> > + * value is a non-zero value, on resume it is zero. That is why
> >> > the
> >> > + * control flow will not re-enter this 'if' branch on resume.
> >> > + */
> >> > + if ( status )
> >> > + dprintk(XENLOG_WARNING, "PSCI system suspend failed,
> >> > err=%d\n",
> >> > + status);
> >> > + }
> >> > +
> >> > + system_state = SYS_STATE_resume;
> >> > + update_boot_mapping(false);
> >> > +
> >> > + gic_resume();
> >> > +
> >> > + resume_irqs:
> >> > + local_irq_restore(flags);
> >> > +
> >> > + console_resume();
> >> > + resume_end_sync:
> >> > + console_end_sync();
> >> > +
> >> > + iommu_resume();
> >> > +
> >> > + resume_time:
> >> > + time_resume();
> >> > +
> >> > + resume_nonboot_cpus:
> >> > + /*
> >> > + * The rcu_barrier() has to be added to ensure that the per cpu
> >> > area is
> >> > + * freed before a non-boot CPU tries to initialize it
> >> > (_free_percpu_area()
> >> > + * has to be called before the init_percpu_area()). This scenario
> >> > occurs
> >> > + * when non-boot CPUs are hot-unplugged on suspend and hotplugged
> >> > on resume.
> >> > + */
> >> > + rcu_barrier();
> >> > + enable_nonboot_cpus();
> >> > +
> >> > + resume_scheduler:
> >> > + scheduler_enable();
> >> > + thaw_domains();
> >> > +
> >> > + system_state = SYS_STATE_active;
> >> > +
> >> > + printk("Resume (status %d)\n", status);
> >> > +
> >> > + domain_resume(d);
> >> > +}
> >> > +
> >> > +static DECLARE_TASKLET(system_suspend_tasklet, system_suspend, NULL);
> >> > +
> >> > +void host_system_suspend(struct domain *d)
> >> > +{
> >> > + system_suspend_tasklet.data = (void *)d;
> >> > + /*
> >> > + * The suspend procedure has to be finalized by the pCPU#0
> >> > (non-boot pCPUs
> >> > + * will be disabled during the suspend).
> >> > + */
> >> > + tasklet_schedule_on_cpu(&system_suspend_tasklet, 0);
> >> > +}
> >> > +
> >> > /*
> >> > * Local variables:
> >> > * mode: C
> >> > diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
> >> > index bd87ec430d..8fb9172186 100644
> >> > --- a/xen/arch/arm/vpsci.c
> >> > +++ b/xen/arch/arm/vpsci.c
> >> > @@ -5,6 +5,7 @@
> >> >
> >> > #include <asm/current.h>
> >> > #include <asm/domain.h>
> >> > +#include <asm/suspend.h>
> >> > #include <asm/vgic.h>
> >> > #include <asm/vpsci.h>
> >> > #include <asm/event.h>
> >> > @@ -232,8 +233,7 @@ static int32_t do_psci_1_0_system_suspend(register_t
> >> > epoint, register_t cid)
> >> > if ( is_64bit_domain(d) && is_thumb )
> >> > return PSCI_INVALID_ADDRESS;
> >> >
> >> > - /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
> >> > - if ( is_hardware_domain(d) )
> >> > + if ( !IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && is_hardware_domain(d) )
> >> > return PSCI_NOT_SUPPORTED;
> >> >
> >> > /* Ensure that all CPUs other than the calling one are offline */
> >> > @@ -266,6 +266,9 @@ static int32_t do_psci_1_0_system_suspend(register_t
> >> > epoint, register_t cid)
> >> > "SYSTEM_SUSPEND requested, epoint=%#"PRIregister",
> >> > cid=%#"PRIregister"\n",
> >> > epoint, cid);
> >> >
> >> > + if ( is_control_domain(d) )
> >>
> >> Why is_control_domain() here and not is_hardware_domain() ?
> >
> > The use of is_control_domain() is intentional.
> >
> > The intended model is that Xen-wide host suspend is orchestrated by the
> > privileged management/control domain. The control domain coordinates the
> > toolstack side, asks other domains to enter suspend, and then issues the
> > final
> > SYSTEM_SUSPEND request to Xen.
> >
> > This does not have to be the same entity as the hardware domain. If the
> > hardware domain is separate, it is one of the domains that the control
> > domain
> > parks before the final host suspend step.
> >
> > The hwdom-specific checks in this patch have a different purpose: they avoid
> > the old hwdom_shutdown() path for SHUTDOWN_suspend and allow the hardware
> > domain to be parked as part of the suspend sequence. They do not define the
> > policy for who is allowed to trigger Xen-wide host suspend.
> >
> > That said, this policy may not be optimal for all configurations, especially
> > when the control and hardware domain roles are split. I would appreciate
> > your
> > view, as well as the maintainers' views, on whether the trigger should
> > remain
> > control-domain based, be tied to the hardware domain instead, or be
> > expressed
> > through a separate host-suspend capability/helper.
>
>
> Hardware domain owns all the hardware. Hardware shall be put to
> power-down/suspended state before suspending the SoC, so it can be
> resumed afterwards. You can't just pause hardware domain in the same way
> as pausing all other domains.
>
> (Of course, we'll have the same issues with domain that have
> passed-through hardware, but in this case Dom0 shall orchestrate proper
> suspend sequence for these)
Yes, I agree that the hardware domain must not be externally
paused as a replacement for its own suspend path.
What I meant to describe is a guest-driven suspend sequence.
The control domain/toolstack may orchestrate the sequence,
but each domain that needs to quiesce hardware, including
the hardware domain and any domain with passed-through
devices, is expected to enter its own suspend path first and
quiesce its devices before issuing the virtual PSCI
SYSTEM_SUSPEND call.
Xen only treats other domains as ready for host suspend after
they have voluntarily reached SHUTDOWN_suspend. In the split
control/hardware-domain case, the final host-wide suspend
request from the control domain is accepted only after the
other domains, including the hardware domain, are already in
SHUTDOWN_suspend.
So my wording saying that the control domain "parks" the
hardware domain was imprecise. The control domain orchestrates
the sequence; it does not externally pause the hardware domain
as a substitute for its own suspend path.
Best regards,
Mykola
>
> [...]
>
> --
> WBR, Volodymyr
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |