
Re: [PATCH v8 13/13] xen/arm: Add support for system suspend triggered by hardware domain


  • To: Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>
  • From: Mykola Kvach <xakep.amatop@xxxxxxxxx>
  • Date: Fri, 8 May 2026 11:37:02 +0300
  • Cc: Luca Fancellu <Luca.Fancellu@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Mykola Kvach <Mykola_Kvach@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Rahul Singh <Rahul.Singh@xxxxxxx>
  • Delivery-date: Fri, 08 May 2026 08:37:19 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi Volodymyr,

Thank you for the feedback.

On Fri, May 8, 2026 at 1:25 AM Volodymyr Babchuk
<Volodymyr_Babchuk@xxxxxxxx> wrote:
>
> Hi Mykola,
>
> Mykola Kvach <xakep.amatop@xxxxxxxxx> writes:
>
> [...]
>
> >> > +    status = can_system_suspend();
> >> > +    if ( status )
> >> > +    {
> >> > +        system_state = SYS_STATE_resume;
> >> > +        goto resume_scheduler;
> >>
> >> When we have an error and we get the resume_scheduler path, we apply back the
> >> context of the guest saved previously in do_psci_1_0_system_suspend(), so am I
> >> correct saying the guest won't get any PSCI error back and we resume the guest
> >> from the guest resume entrypoint?
> >>
> >> In that case, should we have a different path that returns a PSCI error
> >> (PSCI_*) into the guest x0, and skips the context restore?
> >
> > You are right about the current control flow: once the virtual
> > SYSTEM_SUSPEND request has been accepted and the domain has been parked, a
> > later failure in the Xen-wide suspend path resumes the domain through the
> > normal domain resume path, rather than returning a PSCI error from the
> > original call.
> >
> > This is intentional in the current design. The virtual PSCI SYSTEM_SUSPEND
> > path parks the domain and saves its resume context. The actual Xen-wide host
> > suspend is a separate step that is attempted only after all domains are
> > suspended.
> >
> > So a failure in the later Xen-wide suspend step is treated as an abort of the
> > host suspend attempt after the domain suspend was already accepted. The domain
> > is then resumed through the existing domain resume path, similarly to the
> > toolstack/xl suspend-resume flow, rather than by re-entering the guest PSCI
> > call path and modifying the saved vCPU context again.
> >
> > I agree this design is not obvious from the patch. I will clarify the commit
> > message and comments. If you or the maintainers think that failures before the
> > physical SYSTEM_SUSPEND call succeeds should be reported back through the
> > original virtual PSCI call, then this would require a different flow. I was
> > trying to avoid that extra complexity in this series.
>
> I think there is no sense in reporting an error back to the guest. PSCI
> allows resume at any stage, so it is acceptable to have such a brief "suspend".
>
> >
> >>
> >> > +    }
> >> > +
> >> > +    /*
> >> > +     * Non-boot CPUs have to be disabled on suspend and enabled on resume
> >> > +     * (hotplug-based mechanism). Disabling non-boot CPUs will lead to PSCI
> >> > +     * CPU_OFF to be called by each non-boot CPU. Depending on the underlying
> >> > +     * platform capabilities, this may lead to the physical powering down of
> >> > +     * CPUs.
> >> > +     */
> >> > +    status = disable_nonboot_cpus();
> >> > +    if ( status )
> >> > +    {
> >> > +        system_state = SYS_STATE_resume;
> >> > +        goto resume_nonboot_cpus;
> >> > +    }
> >> > +
> >> > +    time_suspend();
> >> > +
> >> > +    status = iommu_suspend();
> >> > +    if ( status )
> >> > +    {
> >> > +        system_state = SYS_STATE_resume;
> >> > +        goto resume_time;
> >> > +    }
> >> > +
> >> > +    console_start_sync();
> >> > +    status = console_suspend();
> >> > +    if ( status )
> >> > +    {
> >> > +        dprintk(XENLOG_ERR, "Failed to suspend the console, err=%d\n", status);
> >> > +        system_state = SYS_STATE_resume;
> >> > +        goto resume_end_sync;
> >> > +    }
> >> > +
> >> > +    local_irq_save(flags);
> >> > +    status = gic_suspend();
> >> > +    if ( status )
> >> > +    {
> >> > +        system_state = SYS_STATE_resume;
> >> > +        goto resume_irqs;
> >> > +    }
> >> > +
> >> > +    set_init_ttbr(xen_pgtable);
> >> > +
> >> > +    /*
> >> > +     * Enable identity mapping before entering suspend to simplify
> >> > +     * the resume path
> >> > +     */
> >> > +    update_boot_mapping(true);
> >> > +
> >> > +    if ( prepare_resume_ctx(&cpu_context) )
> >> > +    {
> >> > +        status = call_psci_system_suspend();
> >> > +        /*
> >> > +         * If suspend is finalized properly by above system suspend PSCI call,
> >> > +         * the code below in this 'if' branch will never execute. Execution
> >> > +         * will continue from hyp_resume which is the hypervisor's resume point.
> >> > +         * In hyp_resume CPU context will be restored and since link-register is
> >> > +         * restored as well, it will appear to return from prepare_resume_ctx.
> >> > +         * The difference in returning from prepare_resume_ctx on system suspend
> >> > +         * versus resume is in function's return value: on suspend, the return
> >> > +         * value is a non-zero value, on resume it is zero. That is why the
> >> > +         * control flow will not re-enter this 'if' branch on resume.
> >> > +         */
> >> > +        if ( status )
> >> > +            dprintk(XENLOG_WARNING, "PSCI system suspend failed, err=%d\n",
> >> > +                    status);
> >> > +    }
> >> > +
> >> > +    system_state = SYS_STATE_resume;
> >> > +    update_boot_mapping(false);
> >> > +
> >> > +    gic_resume();
> >> > +
> >> > + resume_irqs:
> >> > +    local_irq_restore(flags);
> >> > +
> >> > +    console_resume();
> >> > + resume_end_sync:
> >> > +    console_end_sync();
> >> > +
> >> > +    iommu_resume();
> >> > +
> >> > + resume_time:
> >> > +    time_resume();
> >> > +
> >> > + resume_nonboot_cpus:
> >> > +    /*
> >> > +     * The rcu_barrier() has to be added to ensure that the per cpu area is
> >> > +     * freed before a non-boot CPU tries to initialize it (_free_percpu_area()
> >> > +     * has to be called before the init_percpu_area()). This scenario occurs
> >> > +     * when non-boot CPUs are hot-unplugged on suspend and hotplugged on resume.
> >> > +     */
> >> > +    rcu_barrier();
> >> > +    enable_nonboot_cpus();
> >> > +
> >> > + resume_scheduler:
> >> > +    scheduler_enable();
> >> > +    thaw_domains();
> >> > +
> >> > +    system_state = SYS_STATE_active;
> >> > +
> >> > +    printk("Resume (status %d)\n", status);
> >> > +
> >> > +    domain_resume(d);
> >> > +}
> >> > +
> >> > +static DECLARE_TASKLET(system_suspend_tasklet, system_suspend, NULL);
> >> > +
> >> > +void host_system_suspend(struct domain *d)
> >> > +{
> >> > +    system_suspend_tasklet.data = (void *)d;
> >> > +    /*
> >> > +     * The suspend procedure has to be finalized by the pCPU#0 (non-boot pCPUs
> >> > +     * will be disabled during the suspend).
> >> > +     */
> >> > +    tasklet_schedule_on_cpu(&system_suspend_tasklet, 0);
> >> > +}
> >> > +
> >> > /*
> >> >  * Local variables:
> >> >  * mode: C
> >> > diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
> >> > index bd87ec430d..8fb9172186 100644
> >> > --- a/xen/arch/arm/vpsci.c
> >> > +++ b/xen/arch/arm/vpsci.c
> >> > @@ -5,6 +5,7 @@
> >> >
> >> > #include <asm/current.h>
> >> > #include <asm/domain.h>
> >> > +#include <asm/suspend.h>
> >> > #include <asm/vgic.h>
> >> > #include <asm/vpsci.h>
> >> > #include <asm/event.h>
> >> > @@ -232,8 +233,7 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
> >> >     if ( is_64bit_domain(d) && is_thumb )
> >> >         return PSCI_INVALID_ADDRESS;
> >> >
> >> > -    /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
> >> > -    if ( is_hardware_domain(d) )
> >> > +    if ( !IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && is_hardware_domain(d) )
> >> >         return PSCI_NOT_SUPPORTED;
> >> >
> >> >     /* Ensure that all CPUs other than the calling one are offline */
> >> > @@ -266,6 +266,9 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
> >> >             "SYSTEM_SUSPEND requested, epoint=%#"PRIregister", 
> >> > cid=%#"PRIregister"\n",
> >> >             epoint, cid);
> >> >
> >> > +    if ( is_control_domain(d) )
> >>
> >> Why is_control_domain() here and not is_hardware_domain() ?
> >
> > The use of is_control_domain() is intentional.
> >
> > The intended model is that Xen-wide host suspend is orchestrated by the
> > privileged management/control domain. The control domain coordinates the
> > toolstack side, asks other domains to enter suspend, and then issues the final
> > SYSTEM_SUSPEND request to Xen.
> >
> > This does not have to be the same entity as the hardware domain. If the
> > hardware domain is separate, it is one of the domains that the control domain
> > parks before the final host suspend step.
> >
> > The hwdom-specific checks in this patch have a different purpose: they avoid
> > the old hwdom_shutdown() path for SHUTDOWN_suspend and allow the hardware
> > domain to be parked as part of the suspend sequence. They do not define the
> > policy for who is allowed to trigger Xen-wide host suspend.
> >
> > That said, this policy may not be optimal for all configurations, especially
> > when the control and hardware domain roles are split. I would appreciate your
> > view, as well as the maintainers' views, on whether the trigger should remain
> > control-domain based, be tied to the hardware domain instead, or be expressed
> > through a separate host-suspend capability/helper.
>
>
> The hardware domain owns all the hardware. Hardware shall be put into a
> powered-down/suspended state before suspending the SoC, so it can be
> resumed afterwards. You can't just pause the hardware domain in the same way
> as pausing all other domains.
>
> (Of course, we'll have the same issues with domains that have
> passed-through hardware, but in this case Dom0 shall orchestrate a proper
> suspend sequence for these)

Yes, I agree that the hardware domain must not be externally
paused as a replacement for its own suspend path.

What I meant to describe is a guest-driven suspend sequence.
The control domain/toolstack may orchestrate the sequence,
but each domain that needs to quiesce hardware, including
the hardware domain and any domain with passed-through
devices, is expected to enter its own suspend path first and
quiesce its devices before issuing the virtual PSCI
SYSTEM_SUSPEND call.

Xen only treats other domains as ready for host suspend after
they have voluntarily reached SHUTDOWN_suspend. In the split
control/hardware-domain case, the final host-wide suspend
request from the control domain is accepted only after the
other domains, including the hardware domain, are already in
SHUTDOWN_suspend.

So my wording saying that the control domain "parks" the
hardware domain was imprecise. The control domain orchestrates
the sequence; it does not externally pause the hardware domain
as a substitute for its own suspend path.

Best regards,
Mykola

>
> [...]
>
> --
> WBR, Volodymyr



 

