Xen project Mailing List

Re: [PATCH v8 13/13] xen/arm: Add support for system suspend triggered by hardware domain

To: Luca Fancellu <Luca.Fancellu@xxxxxxx>

From: Mykola Kvach <xakep.amatop@xxxxxxxxx>

Date: Fri, 8 May 2026 23:49:13 +0300

Arc-authentication-results: i=1; mx.google.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=w2Ts6xCNp5C0CnS/K72bmyoG4TpDMeFoFiGVyriUgr4=; fh=eEq7B6rg7ssf3icSookiwOw/XyVLUuCIMVm7gUWrf/I=; b=RrM2/BL83Q/3pp05YkVIvRrj3xnbnJKI2wAJ7/18oDJ+Fxxvz0TWbDWHuY4XV6ABaK o3gwk/xRse/2JRUN1CgQ8Lkqw4j33K5E9GgMIJcvlIqvtqTlX5a7YR6/kO+0I6TI5vv9 5snW7LNV6xWLhbeqj3NJ0gec90U22nJtAQBgmiriDTBqBlI07avujzTBZjOIoLZt46u/ NMM7rOA9OLIXCP93EbLAJyAUOBhfHVLAqxq9nqRGv2yEVR+O8ZcnlIB7jK6DDW25XToB 1C18Uxa7pHe1LtPY7Akw/N6QwcbGui2vTtOYEcgrpQP7V99VRJ7yaRVwnpVZ6oBlbiar P+Qw==; darn=lists.xenproject.org

Arc-seal: i=1; a=rsa-sha256; t=1778273365; cv=none; d=google.com; s=arc-20240605; b=jChQCcQvG+DQy0UP7icr+wS2Sgo4PHEevts1xhdxIj5qFrXGKrDUsrQ37h+coRWfs/ o6pHQXPuOipUQVphuyt3mmSsZwZyMXhSDxriHo6cmtWUgBrjNgK4y+r6frxcuj5Zpa1I An0yrpeWJtM2fpVDkls8To5AIH3BFj9f2z5Mzz526CeCngGiJGWtVGbmtYzAuwDIvzWe GDZvnRy2h5ELFmAKTjsmvB9RlFKPCG5IndgChyup1mdIzr0VrQUo7TyJlCyNHu07h/ht HZrVNhuR2uUoFOspaxtV2PqE0TP1nzi6HcEdQy8zYZ5zkNVdHMWMCeW9nxKuALgq/PV2 670A==

Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=20251104 header.d=gmail.com header.i="@gmail.com" header.h="Content-Transfer-Encoding:Cc:To:Subject:Message-ID:Date:From:In-Reply-To:References:MIME-Version"

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Mykola Kvach <mykola_kvach@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Rahul Singh <Rahul.Singh@xxxxxxx>

Delivery-date: Fri, 08 May 2026 20:49:35 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, May 8, 2026 at 5:31 PM Luca Fancellu <Luca.Fancellu@xxxxxxx> wrote: > > Hi Mykola, > > >>> +/* Xen suspend. data identifies the domain that initiated suspend. */ > >>> +static void system_suspend(void *data) > >>> +{ > >>> + int status; > >>> + unsigned long flags; > >>> + struct domain *d = (struct domain *)data; > >>> + > >>> + BUG_ON(system_state != SYS_STATE_active); > >>> + > >>> + system_state = SYS_STATE_suspend; > >>> + > >>> + printk("Xen suspending...\n"); > >>> + > >>> + freeze_domains(); > >>> + scheduler_disable(); > >>> + > >>> + status = can_system_suspend(); > >>> + if ( status ) > >>> + { > >>> + system_state = SYS_STATE_resume; > >>> + goto resume_scheduler; > >> > >> When we have an error and we get the resume_scheduler path, we apply back > >> the > >> context of the guest saved previously in do_psci_1_0_system_suspend(), so > >> am I > >> correct saying the guest won’t get any PSCI error back and we resume the > >> guest > >> from the guest resume entrypoint? > >> > >> In case, should we have a different path that returns a PSCI error > >> (PSCI_*) into the guest > >> x0, and skips the context restore? > > > > You are right about the current control flow: once the virtual > > SYSTEM_SUSPEND request has been accepted and the domain has been parked, a > > later failure in the Xen-wide suspend path resumes the domain through the > > normal > > domain resume path, rather than returning a PSCI error from the original > > call. > > > > This is intentional in the current design. The virtual PSCI SYSTEM_SUSPEND > > path parks the domain and saves its resume context. The actual Xen-wide host > > suspend is a separate step that is attempted only after all domains are > > suspended. > > > > So a failure in the later Xen-wide suspend step is treated as an abort of > > the > > host suspend attempt after the domain suspend was already accepted. The > > domain > > is then resumed through the existing domain resume path, similarly to the > > toolstack/xl suspend-resume flow, rather than by re-entering the guest PSCI > > call path and modifying the saved vCPU context again. > > > > I agree this design is not obvious from the patch. I will clarify the commit > > message and comments. If you or the maintainers think that failures before > > the > > physical SYSTEM_SUSPEND call succeeds should be reported back through the > > original virtual PSCI call, then this would require a different flow. I was > > trying to avoid that extra complexity in this series. > > Ok I understand, I’m wondering if inside do_psci_1_0_system_suspend() we > could do something > like: > > […] > if ( is_control_domain(d) && !other_domains_ready_for_suspend(d) ) > return PSCI_DENIED; Yes. I have reworked this locally and will include it in v9. The control-domain readiness check is now done in the vPSCI SYSTEM_SUSPEND path before building the guest resume context and before calling domain_shutdown(..., SHUTDOWN_suspend), so in that case the call returns PSCI_DENIED early rather than parking the domain first. > > […] > > But I’m ok also to only document this behaviour. > > > >>> > >>> diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c > >>> index bd87ec430d..8fb9172186 100644 > >>> --- a/xen/arch/arm/vpsci.c > >>> +++ b/xen/arch/arm/vpsci.c > >>> @@ -5,6 +5,7 @@ > >>> > >>> #include <asm/current.h> > >>> #include <asm/domain.h> > >>> +#include <asm/suspend.h> > >>> #include <asm/vgic.h> > >>> #include <asm/vpsci.h> > >>> #include <asm/event.h> > >>> @@ -232,8 +233,7 @@ static int32_t do_psci_1_0_system_suspend(register_t > >>> epoint, register_t cid) > >>> if ( is_64bit_domain(d) && is_thumb ) > >>> return PSCI_INVALID_ADDRESS; > >>> > >>> - /* SYSTEM_SUSPEND is not supported for the hardware domain yet */ > >>> - if ( is_hardware_domain(d) ) > >>> + if ( !IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && is_hardware_domain(d) ) > >>> return PSCI_NOT_SUPPORTED; > >>> > >>> /* Ensure that all CPUs other than the calling one are offline */ > >>> @@ -266,6 +266,9 @@ static int32_t do_psci_1_0_system_suspend(register_t > >>> epoint, register_t cid) > >>> "SYSTEM_SUSPEND requested, epoint=%#"PRIregister", > >>> cid=%#"PRIregister"\n", > >>> epoint, cid); > >>> > >>> + if ( is_control_domain(d) ) > >> > >> Why is_control_domain() here and not is_hardware_domain() ? > > > > The use of is_control_domain() is intentional. > > > > The intended model is that Xen-wide host suspend is orchestrated by the > > privileged management/control domain. The control domain coordinates the > > toolstack side, asks other domains to enter suspend, and then issues the > > final > > SYSTEM_SUSPEND request to Xen. > > > > This does not have to be the same entity as the hardware domain. If the > > hardware domain is separate, it is one of the domains that the control > > domain > > parks before the final host suspend step. > > > > The hwdom-specific checks in this patch have a different purpose: they avoid > > the old hwdom_shutdown() path for SHUTDOWN_suspend and allow the hardware > > domain to be parked as part of the suspend sequence. They do not define the > > policy for who is allowed to trigger Xen-wide host suspend. > > > > That said, this policy may not be optimal for all configurations, especially > > when the control and hardware domain roles are split. I would appreciate > > your > > view, as well as the maintainers' views, on whether the trigger should > > remain > > control-domain based, be tied to the hardware domain instead, or be > > expressed > > through a separate host-suspend capability/helper. > > In the commit message and title I saw HW domain, so maybe the commit should > be updated > to say control domain instead? Yes, that was stale wording from an older version. I have fixed it locally for v9. > > At this point however I’m wondering about this code above: > ``` > if ( !IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && is_hardware_domain(d) ) > return PSCI_NOT_SUPPORTED; > ``` > and in do_psci_1_0_features(), shouldn’t we use consistently > is_control_domain()? Yes. I have reworked this locally so the policy is now explicit and consistent. The control domain is the only domain whose SYSTEM_SUSPEND request may drive the Xen-wide host suspend path. For that domain, both PSCI_FEATURES(SYSTEM_SUSPEND) and the real SYSTEM_SUSPEND path now consult the same helper, so the advertised capability and the execution path stay consistent. That helper is an explicit host-suspend capability check: it requires firmware support for PSCI SYSTEM_SUSPEND and it also keeps host suspend disabled when Xen detects a missing suspend/resume path in required host-side components. For a non-control hardware domain, SYSTEM_SUSPEND remains a virtual guest suspend operation and does not by itself trigger Xen-wide host suspend. Other guests keep the existing virtual behaviour as well. > > > > >> > >>> + host_system_suspend(d); > >>> + > >>> return rc; > >>> } > >>> > >>> @@ -290,7 +293,10 @@ static int32_t do_psci_1_0_features(uint32_t > >>> psci_func_id) > >>> return 0; > >>> case PSCI_1_0_FN32_SYSTEM_SUSPEND: > >>> case PSCI_1_0_FN64_SYSTEM_SUSPEND: > >>> - return is_hardware_domain(current->domain) ? PSCI_NOT_SUPPORTED > >>> : 0; > >>> + if ( IS_ENABLED(CONFIG_SYSTEM_SUSPEND) || > >>> + !is_hardware_domain(current->domain) ) > >> > >> Should this have also the condition that “is hardware domain and psci_ver > >> >= PSCI_VERSION(1, 0)”? > >> Otherwise if the host machine doestn’t support PSCI 1.0 we would return OK > >> here but the call would > >> fail later in call_psci_system_suspend()? > > > > Good point. > > > > I agree that, for the domain allowed to trigger Xen-wide suspend, Xen should > > not advertise SYSTEM_SUSPEND if the host suspend path cannot be used. > > > > I think this should be checked as an explicit host SYSTEM_SUSPEND > > capability, > > rather than only as psci_ver >= PSCI_VERSION(1, 0). The same capability > > check > > also needs to be enforced in the actual SYSTEM_SUSPEND handler before > > parking > > the domain, because a caller may invoke SYSTEM_SUSPEND directly without > > first > > querying PSCI_FEATURES. > > > > For ordinary guests, the physical PSCI version is not relevant because they > > cannot trigger host suspend; their SYSTEM_SUSPEND path is virtual. > > > > I will make this consistent in v9: PSCI_FEATURES will advertise > > SYSTEM_SUSPEND > > for the host-suspend-triggering domain only when the host SYSTEM_SUSPEND > > backend > > is available, and the actual SYSTEM_SUSPEND path will enforce the same > > check. > > ok I have reworked this locally accordingly. Rather than open-coding a psci_ver >= PSCI_VERSION(1, 0) test, the new code uses an explicit host SYSTEM_SUSPEND capability predicate. For the control domain, PSCI_FEATURES(SYSTEM_SUSPEND) and the actual SYSTEM_SUSPEND handler now share that same check, so they stay consistent even when host suspend is blocked by firmware capability or by Xen-side runtime suspend/resume limitations. > > >>> > >>> diff --git a/xen/drivers/passthrough/arm/smmu.c > >>> b/xen/drivers/passthrough/arm/smmu.c > >>> index 22d306d0cb..45f29ef8ec 100644 > >>> --- a/xen/drivers/passthrough/arm/smmu.c > >>> +++ b/xen/drivers/passthrough/arm/smmu.c > >>> @@ -2947,6 +2947,13 @@ static void arm_smmu_iommu_domain_teardown(struct > >>> domain *d) > >>> xfree(xen_domain); > >>> } > >>> > >>> +#ifdef CONFIG_SYSTEM_SUSPEND > >>> +static int arm_smmu_suspend(void) > >>> +{ > >>> + return -ENOSYS; > >>> +} > >>> +#endif > >> > >> Maybe we want to gate the feature also to !CONFIG_ARM_SMMU ? I would wait > >> for the maintainers > >> view on this. > > > > I feel that gating this strictly on !CONFIG_ARM_SMMU might not be the most > > optimal approach here. > > > > CONFIG_ARM_SMMU is a build-time option and does not mean that an old > > SMMUv1/v2 > > device is actually present. Using it would disable system suspend even on > > platforms where only SMMUv3 is used, because CONFIG_ARM_SMMU is enabled by > > default for Arm. > > > > The condition should be runtime-based: whether the active/probed IOMMU > > devices > > have system suspend/resume support. For the old ARM SMMU driver this is not > > implemented today, so a platform with an SMMUv1/v2 instance should not > > expose > > or attempt host suspend. > > > > I think we should handle this by tracking whether any old ARM SMMUv1/v2 > > device > > was actually probed, or by adding a generic IOMMU suspend capability check. > > Then > > the host suspend availability check can reject system suspend only when > > such an > > unsupported IOMMU is present, instead of disabling the feature for all > > Arm builds > > with CONFIG_ARM_SMMU enabled. > > > > I would be interested to hear if you or the maintainers see a better way to > > express this capability. > > ok, let’s address Jan comment now and we can see what the maintainers think > about this. Ack, thanks. I have already reworked the Jan-related part locally for v9: the control-domain readiness/capability checks now happen in the vPSCI SYSTEM_SUSPEND path before the domain finishes suspend, and PSCI_FEATURES plus the real call now use the same host-suspend capability predicate. I’ll fold that into the next revision, and then we can see what the maintainers prefer for the remaining runtime-gating details. Best regards, Mykola > > Cheers, > Luca > >

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.