[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v9 12/13] xen/arm: Add vPSCI SYSTEM_SUSPEND policy





On 5/18/26 17:16, Mykola Kvach wrote:
Hi Oleksandr,

Hello Mykola


Thank you for the review.

On Sun, May 17, 2026 at 5:37 PM Oleksandr Tyshchenko
<olekstysh@xxxxxxxxx> wrote:



On 5/12/26 20:07, Mykola Kvach wrote:
From: Mykola Kvach <mykola_kvach@xxxxxxxx>


Hello Mykola


Introduce CONFIG_HAS_HWDOM_SYSTEM_SUSPEND as an architecture-selected
capability for platforms where the hardware domain can be parked with
SHUTDOWN_suspend without calling hwdom_shutdown().

Expose PSCI SYSTEM_SUSPEND as a vPSCI operation for all domains. For
non-control domains, including the hardware domain when it is not acting as a
control domain, the call is handled as a guest/domain suspend request and
parks the domain in SHUTDOWN_suspend.

Control domains need additional sequencing because their SYSTEM_SUSPEND
request is used to coordinate host-wide suspend. A non-last awake control
domain may be parked in SHUTDOWN_suspend without requiring the host suspend
path to be available. The last awake control domain is treated as the point
where the request becomes a host-suspend request, and it may only proceed
when all non-control domains are already in SHUTDOWN_suspend and the host
suspend path is available.

Keep the control-domain sequencing and domain-readiness checks out of
PSCI_FEATURES. They are per-attempt runtime conditions rather than stable PSCI
function availability. Advertise SYSTEM_SUSPEND as implemented by vPSCI and
enforce the sequencing policy in the call handler.

Select HAS_HWDOM_SYSTEM_SUSPEND independently from CONFIG_SYSTEM_SUSPEND so
that SHUTDOWN_suspend from the hardware domain can be treated as a domain
suspend state rather than as a hardware-domain initiated host shutdown. This
does not by itself imply that host-wide suspend is available.

Add host_system_suspend_allowed() to combine the host PSCI SYSTEM_SUSPEND
capability with runtime blockers reported by Xen-owned subsystems. Add
runtime blockers for registered serial, IOMMU, GIC and SMMUv3 MSI IRQ paths
lacking suspend/resume support. These blockers are runtime based, so they
only apply to drivers or paths that Xen actually uses on the platform. For
SMMUv3, the blocker applies only when Xen actually uses the MSI IRQ path,
since resume does not restore the SMMU *_IRQ_CFGn MSI registers yet.

Signed-off-by: Mykola Kvach <mykola_kvach@xxxxxxxx>

Apart from Jan's and Luca's findings, patch looks good to me. Just a few
questions to clarify, apologies if these have already been discussed
elsewhere.

---
   xen/arch/arm/Kconfig                  |   1 +
   xen/arch/arm/gic.c                    |   6 ++
   xen/arch/arm/include/asm/psci.h       |   3 +
   xen/arch/arm/include/asm/suspend.h    |  10 ++-
   xen/arch/arm/psci.c                   |   7 ++
   xen/arch/arm/suspend.c                |  40 +++++++++
   xen/arch/arm/vpsci.c                  | 114 +++++++++++++++++++++++---
   xen/common/Kconfig                    |   3 +
   xen/common/domain.c                   |   7 +-
   xen/drivers/char/serial.c             |  12 +++
   xen/drivers/passthrough/arm/iommu.c   |   4 +
   xen/drivers/passthrough/arm/smmu-v3.c |   4 +
   xen/include/xen/serial.h              |   1 +
   xen/include/xen/suspend.h             |   2 +
   14 files changed, 201 insertions(+), 13 deletions(-)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 79622b46a1..54a5bfb9ae 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -19,6 +19,7 @@ config ARM
       select HAS_ALTERNATIVE if HAS_VMAP
       select HAS_DEVICE_TREE_DISCOVERY
       select HAS_DOM0LESS
+     select HAS_HWDOM_SYSTEM_SUSPEND if !MPU
       select HAS_GRANT_CACHE_FLUSH if GRANT_TABLE
       select HAS_STACK_PROTECTOR
       select HAS_UBSAN
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 7727ffed5a..a5efcaeb4c 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -26,6 +26,7 @@
   #include <asm/device.h>
   #include <asm/io.h>
   #include <asm/gic.h>
+#include <asm/suspend.h>
   #include <asm/vgic.h>
   #include <asm/acpi.h>

@@ -44,6 +45,11 @@ static void __init __maybe_unused build_assertions(void)
   void register_gic_ops(const struct gic_hw_operations *ops)
   {
       gic_hw_ops = ops;
+
+#ifdef CONFIG_SYSTEM_SUSPEND
+    if ( !ops->suspend || !ops->resume )
+        host_system_suspend_disable("GIC driver lacks suspend/resume support");
+#endif
   }

   static void clear_cpu_lr_mask(void)
diff --git a/xen/arch/arm/include/asm/psci.h b/xen/arch/arm/include/asm/psci.h
index bb3c73496e..142fa1bfe5 100644
--- a/xen/arch/arm/include/asm/psci.h
+++ b/xen/arch/arm/include/asm/psci.h
@@ -24,6 +24,9 @@ void call_psci_cpu_off(void);
   void call_psci_system_off(void);
   void call_psci_system_reset(void);
   int call_psci_system_suspend(void);
+#ifdef CONFIG_SYSTEM_SUSPEND
+bool psci_system_suspend_allowed(void);
+#endif

   /* Range of allocated PSCI function numbers */
   #define     PSCI_FNUM_MIN_VALUE                 _AC(0,U)
diff --git a/xen/arch/arm/include/asm/suspend.h 
b/xen/arch/arm/include/asm/suspend.h
index 2d9fc331fc..87db12eac3 100644
--- a/xen/arch/arm/include/asm/suspend.h
+++ b/xen/arch/arm/include/asm/suspend.h
@@ -38,7 +38,15 @@ extern struct resume_cpu_context resume_cpu_context;

   int prepare_resume_ctx(void);
   void hyp_resume(void);
-#endif /* CONFIG_SYSTEM_SUSPEND */
+bool host_system_suspend_allowed(void);
+void host_system_suspend_disable(const char *reason);
+
+#else /* !CONFIG_SYSTEM_SUSPEND */
+
+static inline bool host_system_suspend_allowed(void) { return false; }
+static inline void host_system_suspend_disable(const char *reason) {}
+
+#endif

   #endif /* ARM_SUSPEND_H */

diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
index e05dae1133..e9d78668fd 100644
--- a/xen/arch/arm/psci.c
+++ b/xen/arch/arm/psci.c
@@ -41,6 +41,13 @@ static bool __ro_after_init has_psci_system_suspend;

   #define PSCI_RET(res)   ((int32_t)(res).a0)

+#ifdef CONFIG_SYSTEM_SUSPEND
+bool psci_system_suspend_allowed(void)
+{
+    return has_psci_system_suspend;
+}
+#endif
+
   int call_psci_cpu_on(int cpu)
   {
       struct arm_smccc_res res;
diff --git a/xen/arch/arm/suspend.c b/xen/arch/arm/suspend.c
index 6ea4a0f9cc..a571035d2c 100644
--- a/xen/arch/arm/suspend.c
+++ b/xen/arch/arm/suspend.c
@@ -1,9 +1,49 @@
   /* SPDX-License-Identifier: GPL-2.0-only */

+#include <asm/psci.h>
   #include <asm/suspend.h>

+#include <xen/lib.h>
+#include <xen/serial.h>
+
   struct resume_cpu_context resume_cpu_context;

+/*
+ * Non-PSCI infrastructure can make host suspend impossible even when the PSCI
+ * SYSTEM_SUSPEND conduit is present, e.g. when a Xen-owned driver has no valid
+ * suspend/resume path.
+ *
+ * This gate is checked only when the last awake control domain attempts to
+ * turn a guest SYSTEM_SUSPEND request into a host-suspend request.
+ */
+static bool host_system_suspend_runtime_allowed = true;
+
+static bool host_serial_suspend_allowed(void)
+{
+    if ( serial_suspend_supported() )
+        return true;
+
+    printk_once(XENLOG_INFO
+                "Host SYSTEM_SUSPEND blocked: serial driver lacks suspend/resume 
support\n");
+
+    return false;
+}
+
+bool host_system_suspend_allowed(void)
+{
+    return psci_system_suspend_allowed() &&
+           host_serial_suspend_allowed() &&
+           host_system_suspend_runtime_allowed;
+}
+
+void host_system_suspend_disable(const char *reason)
+{
+    host_system_suspend_runtime_allowed = false;
+
+    printk(XENLOG_INFO "Host SYSTEM_SUSPEND blocked: %s\n",
+           reason ? reason : "unsupported suspend/resume path");
+}
+
   /*
    * Local variables:
    * mode: C
diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
index ac6af6118f..48a963ae3e 100644
--- a/xen/arch/arm/vpsci.c
+++ b/xen/arch/arm/vpsci.c
@@ -5,6 +5,7 @@

   #include <asm/current.h>
   #include <asm/domain.h>
+#include <asm/suspend.h>
   #include <asm/vgic.h>
   #include <asm/vpsci.h>
   #include <asm/event.h>
@@ -219,6 +220,89 @@ static void do_psci_0_2_system_reset(void)
       domain_shutdown(d,SHUTDOWN_reboot);
   }

+/*
+ * Serialise SYSTEM_SUSPEND policy decisions with the domain suspend 
transition,
+ * so multiple control domains cannot all observe each other as still awake.
+ */
+static DEFINE_SPINLOCK(vpsci_system_suspend_lock);
+
+static bool domain_in_suspend_state(struct domain *d)
+{
+    bool suspended;
+
+    spin_lock(&d->shutdown_lock);
+    suspended = d->is_shut_down && d->shutdown_code == SHUTDOWN_suspend;
+    spin_unlock(&d->shutdown_lock);
+
+    return suspended;
+}
+
+static int32_t domain_psci_system_suspend_policy(struct domain *d)
+{
+    struct domain *other;
+    bool last_awake_control_domain = true;
+    bool awake_non_control_domain = false;
+
+    /* Only control domains participate in sequencing policy. */
+    if ( !is_control_domain(d) )
+        return 0;
+
+    rcu_read_lock(&domlist_read_lock);
+
+    for_each_domain ( other )
+    {
+        bool suspended;
+
+        if ( other == d )
+            continue;
+
+        suspended = domain_in_suspend_state(other);
+        if ( suspended )
+            continue;
+
+        if ( is_control_domain(other) )
+        {
+            last_awake_control_domain = false;
+            break;
+        }
+
+        awake_non_control_domain = true;
+    }
+
+    rcu_read_unlock(&domlist_read_lock);
+
+    /*
+     * Another control domain is still awake. This request is only the first
+     * phase of the sequencing: park this control domain and leave the host
+     * running. Host-wide suspend gates must not block this intermediate state.
+     */
+    if ( !last_awake_control_domain )
+        return 0;
+
+    /*
+     * This is the last awake control domain. It must not be parked unless the
+     * request can proceed as a host-suspend request; otherwise Xen would lose
+     * the last domain that can coordinate the system suspend.
+     */
+    if ( awake_non_control_domain )
+    {
+        printk(XENLOG_DEBUG
+               "SYSTEM_SUSPEND denied: last awake control domain dom%u requested 
host suspend while non-control domains are still awake\n",
+               d->domain_id);
+        return PSCI_DENIED;
+    }
+
+    /*
+     * Host-wide gates are relevant only for the last-control-domain case. They
+     * must not block parking of a non-last control domain, but they must 
reject
+     * the last control domain when host suspend is not available.
+     */
+    if ( !host_system_suspend_allowed() )
+        return PSCI_NOT_SUPPORTED;

In do_psci_1_0_features(), the guest is told that SYSTEM_SUSPEND is
implemented:

case PSCI_1_0_FN64_SYSTEM_SUSPEND:
      return 0;  /* "This function IS implemented" */

However, when the guest actually calls SYSTEM_SUSPEND, the policy check
above can reject it by return PSCI_NOT_SUPPORTED; /* "This function is
NOT implemented" */

These two responses are contradictory. The guest probes via
PSCI_FEATURES, is told suspend is available, and then gets
PSCI_NOT_SUPPORTED when it attempts to use it (if let's say the SMMUv3
driver has blocked suspend).

  From Arm Power State Coordination Interface (DEN0022F.b):

5.3.2 Implementation responsibilities
... Any function that is not implemented must return NOT_SUPPORTED in
accordance with the SMC Calling Conventions [4]. In addition,
PSCI_FEATURES must also return NOT_SUPPORTED for any non-implemented
function...

My questing is should do_psci_1_0_features() consult the same policy so
that the feature is only advertised when it can actually be used?

Or am I missing something here?

For the PSCI_FEATURES part, I agree there is a problem with returning
PSCI_NOT_SUPPORTED after advertising SYSTEM_SUSPEND as implemented.

I would still prefer not to make PSCI_FEATURES depend on the full
domain_psci_system_suspend_policy(), because that policy depends on
runtime state: whether this is the last awake control domain and whether
other non-control domains are still awake. Reporting the feature as
available or unavailable based on that state would make PSCI_FEATURES
unstable between attempts.

However, once vPSCI advertises SYSTEM_SUSPEND as implemented, the call
handler should not return PSCI_NOT_SUPPORTED for an attempt-time policy
failure. I think PSCI_DENIED is a better fit for the last-control-domain
case when the request cannot proceed as host suspend because Xen currently
cannot support the host-wide suspend path. I will rework that part.

Thank you for the explanation, I agree. Returning PSCI_DENIED instead of NOT_SUPPORTED at call time is clearer than the current state and eliminates the contradiction with PSCI_FEATURES.




+
+    return 0;
+}
+
   static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
   {
       int32_t rc;
@@ -232,10 +316,6 @@ static int32_t do_psci_1_0_system_suspend(register_t 
epoint, register_t cid)
       if ( is_64bit_domain(d) && is_thumb )
           return PSCI_INVALID_ADDRESS;

-    /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
-    if ( is_hardware_domain(d) )
-        return PSCI_NOT_SUPPORTED;
-
       /* Ensure that all CPUs other than the calling one are offline */
       domain_lock(d);
       for_each_vcpu ( d, v )
@@ -252,16 +332,29 @@ static int32_t do_psci_1_0_system_suspend(register_t 
epoint, register_t cid)
       if ( rc )
           return PSCI_DENIED;

-    rc = domain_shutdown(d, SHUTDOWN_suspend);
+    spin_lock(&vpsci_system_suspend_lock);
+
+    rc = domain_psci_system_suspend_policy(d);
+    if ( !rc )
+    {
+        rc = domain_shutdown(d, SHUTDOWN_suspend);
+        if ( rc )
+            rc = PSCI_DENIED;
+        else
+        {
+            rctx->ctxt = ctxt;
+            rctx->wake_cpu = current;
+        }
+    }
+
+    spin_unlock(&vpsci_system_suspend_lock);
+
       if ( rc )
       {
           free_vcpu_guest_context(ctxt);
-        return PSCI_DENIED;
+        return rc;
       }

-    rctx->ctxt = ctxt;
-    rctx->wake_cpu = current;
-
       gprintk(XENLOG_DEBUG,
               "SYSTEM_SUSPEND requested, epoint=%#"PRIregister", 
cid=%#"PRIregister"\n",
               epoint, cid);
@@ -287,10 +380,9 @@ static int32_t do_psci_1_0_features(uint32_t psci_func_id)
       case PSCI_0_2_FN32_SYSTEM_RESET:
       case PSCI_1_0_FN32_PSCI_FEATURES:
       case ARM_SMCCC_VERSION_FID:
-        return 0;
       case PSCI_1_0_FN32_SYSTEM_SUSPEND:
       case PSCI_1_0_FN64_SYSTEM_SUSPEND:
-        return is_hardware_domain(current->domain) ? PSCI_NOT_SUPPORTED : 0;
+        return 0;
       default:
           return PSCI_NOT_SUPPORTED;
       }
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 5ff71480ee..816a1a4ecb 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -140,6 +140,9 @@ config HAS_EX_TABLE
   config HAS_FAST_MULTIPLY
       bool

+config HAS_HWDOM_SYSTEM_SUSPEND
+     bool
+
   config HAS_IOPORTS
       bool

diff --git a/xen/common/domain.c b/xen/common/domain.c
index bb9e210c28..d3edfb2a13 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1375,6 +1375,11 @@ void __domain_crash(struct domain *d)
       domain_shutdown(d, SHUTDOWN_crash);
   }

+static inline bool want_hwdom_shutdown(uint8_t reason)
+{
+    return !IS_ENABLED(CONFIG_HAS_HWDOM_SYSTEM_SUSPEND) ||
+           reason != SHUTDOWN_suspend;
+}

   int domain_shutdown(struct domain *d, u8 reason)
   {
@@ -1391,7 +1396,7 @@ int domain_shutdown(struct domain *d, u8 reason)
           d->shutdown_code = reason;
       reason = d->shutdown_code;

-    if ( is_hardware_domain(d) )
+    if ( is_hardware_domain(d) && want_hwdom_shutdown(reason) )
           hwdom_shutdown(reason);

       if ( d->is_shutting_down )
diff --git a/xen/drivers/char/serial.c b/xen/drivers/char/serial.c
index adb312d796..cc2b5b5dee 100644
--- a/xen/drivers/char/serial.c
+++ b/xen/drivers/char/serial.c
@@ -497,6 +497,8 @@ const struct vuart_info *serial_vuart_info(int idx)

   #ifdef CONFIG_SYSTEM_SUSPEND

+static bool __read_mostly serial_suspend_available = true;
+
   void serial_suspend(void)
   {
       int i;
@@ -513,6 +515,11 @@ void serial_resume(void)
               com[i].driver->resume(&com[i]);
   }

+bool serial_suspend_supported(void)
+{
+    return serial_suspend_available;
+}
+
   #endif /* CONFIG_SYSTEM_SUSPEND */

   void __init serial_register_uart(int idx, struct uart_driver *driver,
@@ -521,6 +528,11 @@ void __init serial_register_uart(int idx, struct 
uart_driver *driver,
       /* Store UART-specific info. */
       com[idx].driver = driver;
       com[idx].uart   = uart;
+
+#ifdef CONFIG_SYSTEM_SUSPEND
+    if ( !driver->suspend || !driver->resume )
+        serial_suspend_available = false;


I wonder why different suspend-blocker API are used for serial vs other
subsystems (GIC, IOMMU). Why serial_register_uart() could not simply
call host_system_suspend_disable() like every other subsystem?

This would avoid adding three unnecessary symbols
(serial_suspend_available, serial_suspend_supported(),
host_serial_suspend_allowed()).

Or am I missing something here?

For the serial case, the split was intentional. The other callers are in
Arm-specific code, while serial.c is common code. I wanted to avoid making
the common serial layer call an Arm host-suspend policy helper directly,
or requiring arch-specific stub(s) only for this case.

So the current shape keeps serial.c exposing only whether the registered
UART drivers have suspend/resume callbacks, while the Arm host suspend
policy decides whether that blocks host SYSTEM_SUSPEND.

That said, if the preferred direction is to make host_system_suspend_disable()
a generic suspend helper with no-op stub(s) for architectures that do not use
it, I can rework serial_register_uart() to call it directly and drop the
extra serial_suspend_supported() path.

Thank you for the explanation, now it is clear why the approaches diverge. The concern about keeping serial.c free of arch-specific calls makes sense. That said, a single no-op stub for non-Arm arches might be simpler than maintaining three extra symbols and a separate pattern. But, I will leave it to you and the maintainers to decide which approach is preferred.


Best regards,
Mykola




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.