Re: [PATCH v8 13/13] xen/arm: Add support for system suspend triggered by hardware domain
Hi Mykola,
Mykola Kvach <xakep.amatop@xxxxxxxxx> writes:
[...]
>> > + status = can_system_suspend();
>> > + if ( status )
>> > + {
>> > + system_state = SYS_STATE_resume;
>> > + goto resume_scheduler;
>>
>> When we have an error and take the resume_scheduler path, we apply back the
>> context of the guest saved previously in do_psci_1_0_system_suspend(), so am
>> I correct saying the guest won't get any PSCI error back and we resume the
>> guest from the guest resume entrypoint?
>>
>> In that case, should we have a different path that returns a PSCI error
>> (PSCI_*) into the guest's x0 and skips the context restore?
>
> You are right about the current control flow: once the virtual
> SYSTEM_SUSPEND request has been accepted and the domain has been parked, a
> later failure in the Xen-wide suspend path resumes the domain through the
> normal domain resume path, rather than returning a PSCI error from the
> original call.
>
> This is intentional in the current design. The virtual PSCI SYSTEM_SUSPEND
> path parks the domain and saves its resume context. The actual Xen-wide host
> suspend is a separate step that is attempted only after all domains are
> suspended.
>
> So a failure in the later Xen-wide suspend step is treated as an abort of the
> host suspend attempt after the domain suspend was already accepted. The domain
> is then resumed through the existing domain resume path, similarly to the
> toolstack/xl suspend-resume flow, rather than by re-entering the guest PSCI
> call path and modifying the saved vCPU context again.
>
> I agree this design is not obvious from the patch. I will clarify the commit
> message and comments. If you or the maintainers think that failures before the
> physical SYSTEM_SUSPEND call succeeds should be reported back through the
> original virtual PSCI call, then this would require a different flow. I was
> trying to avoid that extra complexity in this series.
I think there is no sense in reporting an error back to the guest. PSCI
allows resume at any stage, so it is acceptable to have such a brief
"suspend".
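As an aside, for anyone following the control flow: the staged error handling
in system_suspend() above is the classic goto-based unwind idiom, where each
failing step jumps to the label that undoes only the steps that already
succeeded, and a successful suspend simply falls through the same resume
labels on wakeup. A minimal standalone sketch of the idiom (all names below
are invented for illustration; this is not Xen code):

```c
#include <stdio.h>

/* Invented stand-ins for the real suspend/resume steps. */
static int step_a_active;

static int suspend_step_a(void) { step_a_active = 1; return 0; }
static int suspend_step_b(void) { return -1; /* simulated failure */ }
static void resume_step_a(void) { step_a_active = 0; }

static int do_fake_suspend(void)
{
    int status = suspend_step_a();

    if ( status )
        goto out;

    status = suspend_step_b();
    if ( status )
        goto resume_a;      /* unwind only what already succeeded */

    /* ... the real code would enter suspend here and resume below ... */

 resume_a:
    resume_step_a();
 out:
    return status;
}
```

With the simulated failure in step B, do_fake_suspend() returns -1 and step A
has been rolled back on the way out, which is exactly the shape of the
resume_irqs/resume_time/... labels in the patch.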
>
>>
>> > + }
>> > +
>> > + /*
>> > + * Non-boot CPUs have to be disabled on suspend and enabled on resume
>> > + * (hotplug-based mechanism). Disabling non-boot CPUs will lead to PSCI
>> > + * CPU_OFF to be called by each non-boot CPU. Depending on the underlying
>> > + * platform capabilities, this may lead to the physical powering down of
>> > + * CPUs.
>> > + */
>> > + status = disable_nonboot_cpus();
>> > + if ( status )
>> > + {
>> > + system_state = SYS_STATE_resume;
>> > + goto resume_nonboot_cpus;
>> > + }
>> > +
>> > + time_suspend();
>> > +
>> > + status = iommu_suspend();
>> > + if ( status )
>> > + {
>> > + system_state = SYS_STATE_resume;
>> > + goto resume_time;
>> > + }
>> > +
>> > + console_start_sync();
>> > + status = console_suspend();
>> > + if ( status )
>> > + {
>> > + dprintk(XENLOG_ERR, "Failed to suspend the console, err=%d\n", status);
>> > + system_state = SYS_STATE_resume;
>> > + goto resume_end_sync;
>> > + }
>> > +
>> > + local_irq_save(flags);
>> > + status = gic_suspend();
>> > + if ( status )
>> > + {
>> > + system_state = SYS_STATE_resume;
>> > + goto resume_irqs;
>> > + }
>> > +
>> > + set_init_ttbr(xen_pgtable);
>> > +
>> > + /*
>> > + * Enable identity mapping before entering suspend to simplify
>> > + * the resume path
>> > + */
>> > + update_boot_mapping(true);
>> > +
>> > + if ( prepare_resume_ctx(&cpu_context) )
>> > + {
>> > + status = call_psci_system_suspend();
>> > + /*
>> > + * If suspend is finalized properly by above system suspend PSCI call,
>> > + * the code below in this 'if' branch will never execute. Execution
>> > + * will continue from hyp_resume which is the hypervisor's resume point.
>> > + * In hyp_resume CPU context will be restored and since link-register is
>> > + * restored as well, it will appear to return from prepare_resume_ctx.
>> > + * The difference in returning from prepare_resume_ctx on system suspend
>> > + * versus resume is in function's return value: on suspend, the return
>> > + * value is a non-zero value, on resume it is zero. That is why the
>> > + * control flow will not re-enter this 'if' branch on resume.
>> > + */
>> > + if ( status )
>> > + dprintk(XENLOG_WARNING, "PSCI system suspend failed, err=%d\n",
>> > + status);
>> > + }
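(A side note for readers of the comment above: prepare_resume_ctx() is
essentially a setjmp()-style checkpoint, just with the return convention
inverted: non-zero on the initial suspend leg, zero when control re-enters
via hyp_resume. A rough portable analogy, purely illustrative and with
invented names, might look like this:

```c
#include <setjmp.h>

static jmp_buf resume_ctx;

/* Stand-in for the firmware call: on real hardware control would
 * re-enter at hyp_resume; here longjmp() models that re-entry. */
static void fake_psci_system_suspend(void)
{
    longjmp(resume_ctx, 1);
}

static int suspend_and_resume(void)
{
    /* setjmp() returns 0 on the direct call (the suspend leg) and
     * non-zero when re-entered via longjmp() (the resume leg) --
     * the opposite convention of prepare_resume_ctx(). */
    if ( setjmp(resume_ctx) == 0 )
    {
        fake_psci_system_suspend();
        return -1;  /* reached only if the "firmware call" failed */
    }
    return 1;       /* re-entered: the resume leg */
}
```

Calling suspend_and_resume() returns 1, i.e. the code after the checkpoint
runs once on the "resume" path only, which is why the real code cannot
re-enter the 'if' branch on resume.)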
>> > +
>> > + system_state = SYS_STATE_resume;
>> > + update_boot_mapping(false);
>> > +
>> > + gic_resume();
>> > +
>> > + resume_irqs:
>> > + local_irq_restore(flags);
>> > +
>> > + console_resume();
>> > + resume_end_sync:
>> > + console_end_sync();
>> > +
>> > + iommu_resume();
>> > +
>> > + resume_time:
>> > + time_resume();
>> > +
>> > + resume_nonboot_cpus:
>> > + /*
>> > + * The rcu_barrier() has to be added to ensure that the per cpu area is
>> > + * freed before a non-boot CPU tries to initialize it (_free_percpu_area()
>> > + * has to be called before the init_percpu_area()). This scenario occurs
>> > + * when non-boot CPUs are hot-unplugged on suspend and hotplugged on
>> > + * resume.
>> > + */
>> > + rcu_barrier();
>> > + enable_nonboot_cpus();
>> > +
>> > + resume_scheduler:
>> > + scheduler_enable();
>> > + thaw_domains();
>> > +
>> > + system_state = SYS_STATE_active;
>> > +
>> > + printk("Resume (status %d)\n", status);
>> > +
>> > + domain_resume(d);
>> > +}
>> > +
>> > +static DECLARE_TASKLET(system_suspend_tasklet, system_suspend, NULL);
>> > +
>> > +void host_system_suspend(struct domain *d)
>> > +{
>> > + system_suspend_tasklet.data = (void *)d;
>> > + /*
>> > + * The suspend procedure has to be finalized by the pCPU#0 (non-boot pCPUs
>> > + * will be disabled during the suspend).
>> > + */
>> > + tasklet_schedule_on_cpu(&system_suspend_tasklet, 0);
>> > +}
>> > +
>> > /*
>> > * Local variables:
>> > * mode: C
>> > diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
>> > index bd87ec430d..8fb9172186 100644
>> > --- a/xen/arch/arm/vpsci.c
>> > +++ b/xen/arch/arm/vpsci.c
>> > @@ -5,6 +5,7 @@
>> >
>> > #include <asm/current.h>
>> > #include <asm/domain.h>
>> > +#include <asm/suspend.h>
>> > #include <asm/vgic.h>
>> > #include <asm/vpsci.h>
>> > #include <asm/event.h>
>> > @@ -232,8 +233,7 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
>> > if ( is_64bit_domain(d) && is_thumb )
>> > return PSCI_INVALID_ADDRESS;
>> >
>> > - /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
>> > - if ( is_hardware_domain(d) )
>> > + if ( !IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && is_hardware_domain(d) )
>> > return PSCI_NOT_SUPPORTED;
>> >
>> > /* Ensure that all CPUs other than the calling one are offline */
>> > @@ -266,6 +266,9 @@ static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
>> > "SYSTEM_SUSPEND requested, epoint=%#"PRIregister", cid=%#"PRIregister"\n",
>> > epoint, cid);
>> >
>> > + if ( is_control_domain(d) )
>>
>> Why is_control_domain() here and not is_hardware_domain() ?
>
> The use of is_control_domain() is intentional.
>
> The intended model is that Xen-wide host suspend is orchestrated by the
> privileged management/control domain. The control domain coordinates the
> toolstack side, asks other domains to enter suspend, and then issues the final
> SYSTEM_SUSPEND request to Xen.
>
> This does not have to be the same entity as the hardware domain. If the
> hardware domain is separate, it is one of the domains that the control domain
> parks before the final host suspend step.
>
> The hwdom-specific checks in this patch have a different purpose: they avoid
> the old hwdom_shutdown() path for SHUTDOWN_suspend and allow the hardware
> domain to be parked as part of the suspend sequence. They do not define the
> policy for who is allowed to trigger Xen-wide host suspend.
>
> That said, this policy may not be optimal for all configurations, especially
> when the control and hardware domain roles are split. I would appreciate your
> view, as well as the maintainers' views, on whether the trigger should remain
> control-domain based, be tied to the hardware domain instead, or be expressed
> through a separate host-suspend capability/helper.
The hardware domain owns all the hardware. Hardware must be put into a
powered-down/suspended state before suspending the SoC, so that it can be
resumed afterwards. You can't just pause the hardware domain in the same way
as you pause all other domains.
(Of course, we'll have the same issue with domains that have passed-through
hardware, but in that case Dom0 shall orchestrate a proper suspend sequence
for them.)
[...]
--
WBR, Volodymyr