[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v8 13/13] xen/arm: Add support for system suspend triggered by hardware domain


  • To: Luca Fancellu <Luca.Fancellu@xxxxxxx>
  • From: Mykola Kvach <xakep.amatop@xxxxxxxxx>
  • Date: Fri, 8 May 2026 23:49:13 +0300
  • Arc-authentication-results: i=1; mx.google.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=w2Ts6xCNp5C0CnS/K72bmyoG4TpDMeFoFiGVyriUgr4=; fh=eEq7B6rg7ssf3icSookiwOw/XyVLUuCIMVm7gUWrf/I=; b=RrM2/BL83Q/3pp05YkVIvRrj3xnbnJKI2wAJ7/18oDJ+Fxxvz0TWbDWHuY4XV6ABaK o3gwk/xRse/2JRUN1CgQ8Lkqw4j33K5E9GgMIJcvlIqvtqTlX5a7YR6/kO+0I6TI5vv9 5snW7LNV6xWLhbeqj3NJ0gec90U22nJtAQBgmiriDTBqBlI07avujzTBZjOIoLZt46u/ NMM7rOA9OLIXCP93EbLAJyAUOBhfHVLAqxq9nqRGv2yEVR+O8ZcnlIB7jK6DDW25XToB 1C18Uxa7pHe1LtPY7Akw/N6QwcbGui2vTtOYEcgrpQP7V99VRJ7yaRVwnpVZ6oBlbiar P+Qw==; darn=lists.xenproject.org
  • Arc-seal: i=1; a=rsa-sha256; t=1778273365; cv=none; d=google.com; s=arc-20240605; b=jChQCcQvG+DQy0UP7icr+wS2Sgo4PHEevts1xhdxIj5qFrXGKrDUsrQ37h+coRWfs/ o6pHQXPuOipUQVphuyt3mmSsZwZyMXhSDxriHo6cmtWUgBrjNgK4y+r6frxcuj5Zpa1I An0yrpeWJtM2fpVDkls8To5AIH3BFj9f2z5Mzz526CeCngGiJGWtVGbmtYzAuwDIvzWe GDZvnRy2h5ELFmAKTjsmvB9RlFKPCG5IndgChyup1mdIzr0VrQUo7TyJlCyNHu07h/ht HZrVNhuR2uUoFOspaxtV2PqE0TP1nzi6HcEdQy8zYZ5zkNVdHMWMCeW9nxKuALgq/PV2 670A==
  • Authentication-results: eu.smtp.expurgate.cloud; dkim=pass header.s=20251104 header.d=gmail.com header.i="@gmail.com" header.h="Content-Transfer-Encoding:Cc:To:Subject:Message-ID:Date:From:In-Reply-To:References:MIME-Version"
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Mykola Kvach <mykola_kvach@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Rahul Singh <Rahul.Singh@xxxxxxx>
  • Delivery-date: Fri, 08 May 2026 20:49:35 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, May 8, 2026 at 5:31 PM Luca Fancellu <Luca.Fancellu@xxxxxxx> wrote:
>
> Hi Mykola,
>
> >>> +/* Xen suspend. data identifies the domain that initiated suspend. */
> >>> +static void system_suspend(void *data)
> >>> +{
> >>> +    int status;
> >>> +    unsigned long flags;
> >>> +    struct domain *d = (struct domain *)data;
> >>> +
> >>> +    BUG_ON(system_state != SYS_STATE_active);
> >>> +
> >>> +    system_state = SYS_STATE_suspend;
> >>> +
> >>> +    printk("Xen suspending...\n");
> >>> +
> >>> +    freeze_domains();
> >>> +    scheduler_disable();
> >>> +
> >>> +    status = can_system_suspend();
> >>> +    if ( status )
> >>> +    {
> >>> +        system_state = SYS_STATE_resume;
> >>> +        goto resume_scheduler;
> >>
> >> When we have an error and we get the resume_scheduler path, we apply back 
> >> the
> >> context of the guest saved previously in do_psci_1_0_system_suspend(), so 
> >> am I
> >> correct saying the guest won’t get any PSCI error back and we resume the 
> >> guest
> >> from the guest resume entrypoint?
> >>
> >> In case, should we have a different path that returns a PSCI error 
> >> (PSCI_*) into the guest
> >> x0, and skips the context restore?
> >
> > You are right about the current control flow: once the virtual
> > SYSTEM_SUSPEND request has been accepted and the domain has been parked, a
> > later failure in the Xen-wide suspend path resumes the domain through the 
> > normal
> > domain resume path, rather than returning a PSCI error from the original 
> > call.
> >
> > This is intentional in the current design. The virtual PSCI SYSTEM_SUSPEND
> > path parks the domain and saves its resume context. The actual Xen-wide host
> > suspend is a separate step that is attempted only after all domains are
> > suspended.
> >
> > So a failure in the later Xen-wide suspend step is treated as an abort of 
> > the
> > host suspend attempt after the domain suspend was already accepted. The 
> > domain
> > is then resumed through the existing domain resume path, similarly to the
> > toolstack/xl suspend-resume flow, rather than by re-entering the guest PSCI
> > call path and modifying the saved vCPU context again.
> >
> > I agree this design is not obvious from the patch. I will clarify the commit
> > message and comments. If you or the maintainers think that failures before 
> > the
> > physical SYSTEM_SUSPEND call succeeds should be reported back through the
> > original virtual PSCI call, then this would require a different flow. I was
> > trying to avoid that extra complexity in this series.
>
> Ok I understand, I’m wondering if inside do_psci_1_0_system_suspend() we 
> could do something
> like:
>
> […]
> if ( is_control_domain(d) && !other_domains_ready_for_suspend(d) )
>     return PSCI_DENIED;

Yes. I have reworked this locally and will include it in v9.

The control-domain readiness check is now done in the vPSCI SYSTEM_SUSPEND
path before building the guest resume context and before calling
domain_shutdown(..., SHUTDOWN_suspend), so in that case the call returns
PSCI_DENIED early rather than parking the domain first.

>
> […]
>
> But I’m ok also to only document this behaviour.
>
>
> >>>
> >>> diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
> >>> index bd87ec430d..8fb9172186 100644
> >>> --- a/xen/arch/arm/vpsci.c
> >>> +++ b/xen/arch/arm/vpsci.c
> >>> @@ -5,6 +5,7 @@
> >>>
> >>> #include <asm/current.h>
> >>> #include <asm/domain.h>
> >>> +#include <asm/suspend.h>
> >>> #include <asm/vgic.h>
> >>> #include <asm/vpsci.h>
> >>> #include <asm/event.h>
> >>> @@ -232,8 +233,7 @@ static int32_t do_psci_1_0_system_suspend(register_t 
> >>> epoint, register_t cid)
> >>>    if ( is_64bit_domain(d) && is_thumb )
> >>>        return PSCI_INVALID_ADDRESS;
> >>>
> >>> -    /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
> >>> -    if ( is_hardware_domain(d) )
> >>> +    if ( !IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && is_hardware_domain(d) )
> >>>        return PSCI_NOT_SUPPORTED;
> >>>
> >>>    /* Ensure that all CPUs other than the calling one are offline */
> >>> @@ -266,6 +266,9 @@ static int32_t do_psci_1_0_system_suspend(register_t 
> >>> epoint, register_t cid)
> >>>            "SYSTEM_SUSPEND requested, epoint=%#"PRIregister", 
> >>> cid=%#"PRIregister"\n",
> >>>            epoint, cid);
> >>>
> >>> +    if ( is_control_domain(d) )
> >>
> >> Why is_control_domain() here and not is_hardware_domain() ?
> >
> > The use of is_control_domain() is intentional.
> >
> > The intended model is that Xen-wide host suspend is orchestrated by the
> > privileged management/control domain. The control domain coordinates the
> > toolstack side, asks other domains to enter suspend, and then issues the 
> > final
> > SYSTEM_SUSPEND request to Xen.
> >
> > This does not have to be the same entity as the hardware domain. If the
> > hardware domain is separate, it is one of the domains that the control 
> > domain
> > parks before the final host suspend step.
> >
> > The hwdom-specific checks in this patch have a different purpose: they avoid
> > the old hwdom_shutdown() path for SHUTDOWN_suspend and allow the hardware
> > domain to be parked as part of the suspend sequence. They do not define the
> > policy for who is allowed to trigger Xen-wide host suspend.
> >
> > That said, this policy may not be optimal for all configurations, especially
> > when the control and hardware domain roles are split. I would appreciate 
> > your
> > view, as well as the maintainers' views, on whether the trigger should 
> > remain
> > control-domain based, be tied to the hardware domain instead, or be 
> > expressed
> > through a separate host-suspend capability/helper.
>
> In the commit message and title I saw HW domain, so maybe the commit should 
> be updated
> to say control domain instead?

Yes, that was stale wording from an older version. I have fixed it
locally for v9.

>
> At this point however I’m wondering about this code above:
> ```
>     if ( !IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && is_hardware_domain(d) )
>         return PSCI_NOT_SUPPORTED;
> ```
> and in do_psci_1_0_features(), shouldn’t we use consistently 
> is_control_domain()?

Yes. I have reworked this locally so the policy is now explicit and
consistent.

The control domain is the only domain whose SYSTEM_SUSPEND request may
drive the Xen-wide host suspend path. For that domain, both
PSCI_FEATURES(SYSTEM_SUSPEND) and the real SYSTEM_SUSPEND path now
consult the same helper, so the advertised capability and the execution
path stay consistent.

That helper is an explicit host-suspend capability check: it requires
firmware support for PSCI SYSTEM_SUSPEND and it also keeps host suspend
disabled when Xen detects a missing suspend/resume path in required
host-side components.

For a non-control hardware domain, SYSTEM_SUSPEND remains a virtual
guest suspend operation and does not by itself trigger Xen-wide host
suspend. Other guests keep the existing virtual behaviour as well.

>
> >
> >>
> >>> +        host_system_suspend(d);
> >>> +
> >>>    return rc;
> >>> }
> >>>
> >>> @@ -290,7 +293,10 @@ static int32_t do_psci_1_0_features(uint32_t 
> >>> psci_func_id)
> >>>        return 0;
> >>>    case PSCI_1_0_FN32_SYSTEM_SUSPEND:
> >>>    case PSCI_1_0_FN64_SYSTEM_SUSPEND:
> >>> -        return is_hardware_domain(current->domain) ? PSCI_NOT_SUPPORTED 
> >>> : 0;
> >>> +        if ( IS_ENABLED(CONFIG_SYSTEM_SUSPEND) ||
> >>> +             !is_hardware_domain(current->domain) )
> >>
> >> Should this have also the condition that “is hardware domain and psci_ver 
> >> >= PSCI_VERSION(1, 0)”?
> >> Otherwise if the host machine doestn’t support PSCI 1.0 we would return OK 
> >> here but the call would
> >> fail later in call_psci_system_suspend()?
> >
> > Good point.
> >
> > I agree that, for the domain allowed to trigger Xen-wide suspend, Xen should
> > not advertise SYSTEM_SUSPEND if the host suspend path cannot be used.
> >
> > I think this should be checked as an explicit host SYSTEM_SUSPEND 
> > capability,
> > rather than only as psci_ver >= PSCI_VERSION(1, 0). The same capability 
> > check
> > also needs to be enforced in the actual SYSTEM_SUSPEND handler before 
> > parking
> > the domain, because a caller may invoke SYSTEM_SUSPEND directly without 
> > first
> > querying PSCI_FEATURES.
> >
> > For ordinary guests, the physical PSCI version is not relevant because they
> > cannot trigger host suspend; their SYSTEM_SUSPEND path is virtual.
> >
> > I will make this consistent in v9: PSCI_FEATURES will advertise 
> > SYSTEM_SUSPEND
> > for the host-suspend-triggering domain only when the host SYSTEM_SUSPEND 
> > backend
> > is available, and the actual SYSTEM_SUSPEND path will enforce the same 
> > check.
>
> ok

I have reworked this locally accordingly.

Rather than open-coding a psci_ver >= PSCI_VERSION(1, 0) test, the new
code uses an explicit host SYSTEM_SUSPEND capability predicate. For the
control domain, PSCI_FEATURES(SYSTEM_SUSPEND) and the actual
SYSTEM_SUSPEND handler now share that same check, so they stay
consistent even when host suspend is blocked by firmware capability or
by Xen-side runtime suspend/resume limitations.

>
> >>>
> >>> diff --git a/xen/drivers/passthrough/arm/smmu.c 
> >>> b/xen/drivers/passthrough/arm/smmu.c
> >>> index 22d306d0cb..45f29ef8ec 100644
> >>> --- a/xen/drivers/passthrough/arm/smmu.c
> >>> +++ b/xen/drivers/passthrough/arm/smmu.c
> >>> @@ -2947,6 +2947,13 @@ static void arm_smmu_iommu_domain_teardown(struct 
> >>> domain *d)
> >>> xfree(xen_domain);
> >>> }
> >>>
> >>> +#ifdef CONFIG_SYSTEM_SUSPEND
> >>> +static int arm_smmu_suspend(void)
> >>> +{
> >>> + return -ENOSYS;
> >>> +}
> >>> +#endif
> >>
> >> Maybe we want to gate the feature also to !CONFIG_ARM_SMMU ? I would wait 
> >> for the maintainers
> >> view on this.
> >
> > I feel that gating this strictly on !CONFIG_ARM_SMMU might not be the most
> > optimal approach here.
> >
> > CONFIG_ARM_SMMU is a build-time option and does not mean that an old 
> > SMMUv1/v2
> > device is actually present. Using it would disable system suspend even on
> > platforms where only SMMUv3 is used, because CONFIG_ARM_SMMU is enabled by
> > default for Arm.
> >
> > The condition should be runtime-based: whether the active/probed IOMMU 
> > devices
> > have system suspend/resume support. For the old ARM SMMU driver this is not
> > implemented today, so a platform with an SMMUv1/v2 instance should not 
> > expose
> > or attempt host suspend.
> >
> > I think we should handle this by tracking whether any old ARM SMMUv1/v2 
> > device
> > was actually probed, or by adding a generic IOMMU suspend capability check. 
> > Then
> > the host suspend availability check can reject system suspend only when 
> > such an
> > unsupported IOMMU is present, instead of disabling the feature for all
> > Arm builds
> > with CONFIG_ARM_SMMU enabled.
> >
> > I would be interested to hear if you or the maintainers see a better way to
> > express this capability.
>
> ok, let’s address Jan comment now and we can see what the maintainers think 
> about this.

Ack, thanks.

I have already reworked the Jan-related part locally for v9: the
control-domain readiness/capability checks now happen in the vPSCI
SYSTEM_SUSPEND path before the domain finishes suspend, and PSCI_FEATURES
plus the real call now use the same host-suspend capability predicate.

I’ll fold that into the next revision, and then we can see what the
maintainers prefer for the remaining runtime-gating details.

Best regards,
Mykola

>
> Cheers,
> Luca
>
>



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.