|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v8 13/13] xen/arm: Add support for system suspend triggered by hardware domain
On Fri, May 8, 2026 at 5:31 PM Luca Fancellu <Luca.Fancellu@xxxxxxx> wrote:
>
> Hi Mykola,
>
> >>> +/* Xen suspend. data identifies the domain that initiated suspend. */
> >>> +static void system_suspend(void *data)
> >>> +{
> >>> + int status;
> >>> + unsigned long flags;
> >>> + struct domain *d = (struct domain *)data;
> >>> +
> >>> + BUG_ON(system_state != SYS_STATE_active);
> >>> +
> >>> + system_state = SYS_STATE_suspend;
> >>> +
> >>> + printk("Xen suspending...\n");
> >>> +
> >>> + freeze_domains();
> >>> + scheduler_disable();
> >>> +
> >>> + status = can_system_suspend();
> >>> + if ( status )
> >>> + {
> >>> + system_state = SYS_STATE_resume;
> >>> + goto resume_scheduler;
> >>
> >> When we have an error and we get the resume_scheduler path, we apply back
> >> the
> >> context of the guest saved previously in do_psci_1_0_system_suspend(), so
> >> am I
> >> correct saying the guest won’t get any PSCI error back and we resume the
> >> guest
> >> from the guest resume entrypoint?
> >>
> >> In case, should we have a different path that returns a PSCI error
> >> (PSCI_*) into the guest
> >> x0, and skips the context restore?
> >
> > You are right about the current control flow: once the virtual
> > SYSTEM_SUSPEND request has been accepted and the domain has been parked, a
> > later failure in the Xen-wide suspend path resumes the domain through the
> > normal
> > domain resume path, rather than returning a PSCI error from the original
> > call.
> >
> > This is intentional in the current design. The virtual PSCI SYSTEM_SUSPEND
> > path parks the domain and saves its resume context. The actual Xen-wide host
> > suspend is a separate step that is attempted only after all domains are
> > suspended.
> >
> > So a failure in the later Xen-wide suspend step is treated as an abort of
> > the
> > host suspend attempt after the domain suspend was already accepted. The
> > domain
> > is then resumed through the existing domain resume path, similarly to the
> > toolstack/xl suspend-resume flow, rather than by re-entering the guest PSCI
> > call path and modifying the saved vCPU context again.
> >
> > I agree this design is not obvious from the patch. I will clarify the commit
> > message and comments. If you or the maintainers think that failures before
> > the
> > physical SYSTEM_SUSPEND call succeeds should be reported back through the
> > original virtual PSCI call, then this would require a different flow. I was
> > trying to avoid that extra complexity in this series.
>
> Ok I understand, I’m wondering if inside do_psci_1_0_system_suspend() we
> could do something
> like:
>
> […]
> if ( is_control_domain(d) && !other_domains_ready_for_suspend(d) )
> return PSCI_DENIED;
Yes. I have reworked this locally and will include it in v9.
The control-domain readiness check is now done in the vPSCI SYSTEM_SUSPEND
path before building the guest resume context and before calling
domain_shutdown(..., SHUTDOWN_suspend), so in that case the call returns
PSCI_DENIED early rather than parking the domain first.
>
> […]
>
> But I’m ok also to only document this behaviour.
>
>
> >>>
> >>> diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
> >>> index bd87ec430d..8fb9172186 100644
> >>> --- a/xen/arch/arm/vpsci.c
> >>> +++ b/xen/arch/arm/vpsci.c
> >>> @@ -5,6 +5,7 @@
> >>>
> >>> #include <asm/current.h>
> >>> #include <asm/domain.h>
> >>> +#include <asm/suspend.h>
> >>> #include <asm/vgic.h>
> >>> #include <asm/vpsci.h>
> >>> #include <asm/event.h>
> >>> @@ -232,8 +233,7 @@ static int32_t do_psci_1_0_system_suspend(register_t
> >>> epoint, register_t cid)
> >>> if ( is_64bit_domain(d) && is_thumb )
> >>> return PSCI_INVALID_ADDRESS;
> >>>
> >>> - /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
> >>> - if ( is_hardware_domain(d) )
> >>> + if ( !IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && is_hardware_domain(d) )
> >>> return PSCI_NOT_SUPPORTED;
> >>>
> >>> /* Ensure that all CPUs other than the calling one are offline */
> >>> @@ -266,6 +266,9 @@ static int32_t do_psci_1_0_system_suspend(register_t
> >>> epoint, register_t cid)
> >>> "SYSTEM_SUSPEND requested, epoint=%#"PRIregister",
> >>> cid=%#"PRIregister"\n",
> >>> epoint, cid);
> >>>
> >>> + if ( is_control_domain(d) )
> >>
> >> Why is_control_domain() here and not is_hardware_domain() ?
> >
> > The use of is_control_domain() is intentional.
> >
> > The intended model is that Xen-wide host suspend is orchestrated by the
> > privileged management/control domain. The control domain coordinates the
> > toolstack side, asks other domains to enter suspend, and then issues the
> > final
> > SYSTEM_SUSPEND request to Xen.
> >
> > This does not have to be the same entity as the hardware domain. If the
> > hardware domain is separate, it is one of the domains that the control
> > domain
> > parks before the final host suspend step.
> >
> > The hwdom-specific checks in this patch have a different purpose: they avoid
> > the old hwdom_shutdown() path for SHUTDOWN_suspend and allow the hardware
> > domain to be parked as part of the suspend sequence. They do not define the
> > policy for who is allowed to trigger Xen-wide host suspend.
> >
> > That said, this policy may not be optimal for all configurations, especially
> > when the control and hardware domain roles are split. I would appreciate
> > your
> > view, as well as the maintainers' views, on whether the trigger should
> > remain
> > control-domain based, be tied to the hardware domain instead, or be
> > expressed
> > through a separate host-suspend capability/helper.
>
> In the commit message and title I saw HW domain, so maybe the commit should
> be updated
> to say control domain instead?
Yes, that was stale wording from an older version. I have fixed it
locally for v9.
>
> At this point however I’m wondering about this code above:
> ```
> if ( !IS_ENABLED(CONFIG_SYSTEM_SUSPEND) && is_hardware_domain(d) )
> return PSCI_NOT_SUPPORTED;
> ```
> and in do_psci_1_0_features(), shouldn’t we use consistently
> is_control_domain()?
Yes. I have reworked this locally so the policy is now explicit and
consistent.
The control domain is the only domain whose SYSTEM_SUSPEND request may
drive the Xen-wide host suspend path. For that domain, both
PSCI_FEATURES(SYSTEM_SUSPEND) and the real SYSTEM_SUSPEND path now
consult the same helper, so the advertised capability and the execution
path stay consistent.
That helper is an explicit host-suspend capability check: it requires
firmware support for PSCI SYSTEM_SUSPEND and it also keeps host suspend
disabled when Xen detects a missing suspend/resume path in required
host-side components.
For a non-control hardware domain, SYSTEM_SUSPEND remains a virtual
guest suspend operation and does not by itself trigger Xen-wide host
suspend. Other guests keep the existing virtual behaviour as well.
>
> >
> >>
> >>> + host_system_suspend(d);
> >>> +
> >>> return rc;
> >>> }
> >>>
> >>> @@ -290,7 +293,10 @@ static int32_t do_psci_1_0_features(uint32_t
> >>> psci_func_id)
> >>> return 0;
> >>> case PSCI_1_0_FN32_SYSTEM_SUSPEND:
> >>> case PSCI_1_0_FN64_SYSTEM_SUSPEND:
> >>> - return is_hardware_domain(current->domain) ? PSCI_NOT_SUPPORTED
> >>> : 0;
> >>> + if ( IS_ENABLED(CONFIG_SYSTEM_SUSPEND) ||
> >>> + !is_hardware_domain(current->domain) )
> >>
> >> Should this have also the condition that “is hardware domain and psci_ver
> >> >= PSCI_VERSION(1, 0)”?
> >> Otherwise if the host machine doestn’t support PSCI 1.0 we would return OK
> >> here but the call would
> >> fail later in call_psci_system_suspend()?
> >
> > Good point.
> >
> > I agree that, for the domain allowed to trigger Xen-wide suspend, Xen should
> > not advertise SYSTEM_SUSPEND if the host suspend path cannot be used.
> >
> > I think this should be checked as an explicit host SYSTEM_SUSPEND
> > capability,
> > rather than only as psci_ver >= PSCI_VERSION(1, 0). The same capability
> > check
> > also needs to be enforced in the actual SYSTEM_SUSPEND handler before
> > parking
> > the domain, because a caller may invoke SYSTEM_SUSPEND directly without
> > first
> > querying PSCI_FEATURES.
> >
> > For ordinary guests, the physical PSCI version is not relevant because they
> > cannot trigger host suspend; their SYSTEM_SUSPEND path is virtual.
> >
> > I will make this consistent in v9: PSCI_FEATURES will advertise
> > SYSTEM_SUSPEND
> > for the host-suspend-triggering domain only when the host SYSTEM_SUSPEND
> > backend
> > is available, and the actual SYSTEM_SUSPEND path will enforce the same
> > check.
>
> ok
I have reworked this locally accordingly.
Rather than open-coding a psci_ver >= PSCI_VERSION(1, 0) test, the new
code uses an explicit host SYSTEM_SUSPEND capability predicate. For the
control domain, PSCI_FEATURES(SYSTEM_SUSPEND) and the actual
SYSTEM_SUSPEND handler now share that same check, so they stay
consistent even when host suspend is blocked by firmware capability or
by Xen-side runtime suspend/resume limitations.
>
> >>>
> >>> diff --git a/xen/drivers/passthrough/arm/smmu.c
> >>> b/xen/drivers/passthrough/arm/smmu.c
> >>> index 22d306d0cb..45f29ef8ec 100644
> >>> --- a/xen/drivers/passthrough/arm/smmu.c
> >>> +++ b/xen/drivers/passthrough/arm/smmu.c
> >>> @@ -2947,6 +2947,13 @@ static void arm_smmu_iommu_domain_teardown(struct
> >>> domain *d)
> >>> xfree(xen_domain);
> >>> }
> >>>
> >>> +#ifdef CONFIG_SYSTEM_SUSPEND
> >>> +static int arm_smmu_suspend(void)
> >>> +{
> >>> + return -ENOSYS;
> >>> +}
> >>> +#endif
> >>
> >> Maybe we want to gate the feature also to !CONFIG_ARM_SMMU ? I would wait
> >> for the maintainers
> >> view on this.
> >
> > I feel that gating this strictly on !CONFIG_ARM_SMMU might not be the most
> > optimal approach here.
> >
> > CONFIG_ARM_SMMU is a build-time option and does not mean that an old
> > SMMUv1/v2
> > device is actually present. Using it would disable system suspend even on
> > platforms where only SMMUv3 is used, because CONFIG_ARM_SMMU is enabled by
> > default for Arm.
> >
> > The condition should be runtime-based: whether the active/probed IOMMU
> > devices
> > have system suspend/resume support. For the old ARM SMMU driver this is not
> > implemented today, so a platform with an SMMUv1/v2 instance should not
> > expose
> > or attempt host suspend.
> >
> > I think we should handle this by tracking whether any old ARM SMMUv1/v2
> > device
> > was actually probed, or by adding a generic IOMMU suspend capability check.
> > Then
> > the host suspend availability check can reject system suspend only when
> > such an
> > unsupported IOMMU is present, instead of disabling the feature for all
> > Arm builds
> > with CONFIG_ARM_SMMU enabled.
> >
> > I would be interested to hear if you or the maintainers see a better way to
> > express this capability.
>
> ok, let’s address Jan comment now and we can see what the maintainers think
> about this.
Ack, thanks.
I have already reworked the Jan-related part locally for v9: the
control-domain readiness/capability checks now happen in the vPSCI
SYSTEM_SUSPEND path before the domain finishes suspend, and PSCI_FEATURES
plus the real call now use the same host-suspend capability predicate.
I’ll fold that into the next revision, and then we can see what the
maintainers prefer for the remaining runtime-gating details.
Best regards,
Mykola
>
> Cheers,
> Luca
>
>
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |