[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v9 1/4] xen/arm: Implement PSCI SYSTEM_SUSPEND call for guests
Hi Volodymyr, On Sat, Aug 23, 2025 at 2:30 AM Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx> wrote: > > > Hi Mykola, > > Mykola Kvach <xakep.amatop@xxxxxxxxx> writes: > > > From: Mykola Kvach <mykola_kvach@xxxxxxxx> > > > > This patch adds support for the PSCI SYSTEM_SUSPEND function in the vPSCI > > (virtual PSCI) interface, allowing guests to request suspend via the PSCI > > v1.0 SYSTEM_SUSPEND call (both 32-bit and 64-bit variants). > > > > The implementation: > > - Adds SYSTEM_SUSPEND function IDs to PSCI definitions > > - Implements trapping and handling of SYSTEM_SUSPEND in vPSCI > > - Allows only non-hardware domains to invoke SYSTEM_SUSPEND; for the > > hardware domain, PSCI_NOT_SUPPORTED is returned to avoid halting the > > system in hwdom_shutdown() called from domain_shutdown > > - Ensures all secondary VCPUs of the calling domain are offline before > > allowing suspend due to PSCI spec > > > > GIC and virtual timer context must be saved when the domain suspends. > > This is done by moving the respective code in ctxt_switch_from > > before the return that happens if the domain suspended. > > > > domain_resume_nopause() is introduced to allow resuming a domain from > > SYSTEM_SUSPEND without pausing it first. This avoids problematic > > domain_pause() calls when resuming from suspend on Arm, particularly > > in error paths. The function is only used on Arm; other architectures > > continue to use the original domain_resume(). > > > > Usage: > > > > For Linux-based guests, suspend can be initiated with: > > echo mem > /sys/power/state > > or via: > > systemctl suspend > > > > Resuming the guest is performed from control domain using: > > xl resume <domain> > > > > Signed-off-by: Mykola Kvach <mykola_kvach@xxxxxxxx> > > --- > > Changes in V9: > > - no functional changes > > - cosmetic chnages after review > > - enhance commit message and add extra comment to the code after review > > > > Changes in V8: > > - GIC and virtual timer context must be saved when the domain suspends > > - rework locking > > - minor changes after code review > > > > Changes in V7: > > - add proper locking > > - minor changes after code review > > > > Changes in V6: > > - skip execution of ctxt_switch_from for vcpu that is in paused domain > > - add implementation of domain_resume without domain_pause > > - add helper function to determine if vcpu is suspended or not > > - ignore upper 32 bits of argument values when the domain is 64-bit > > and calls the SMC32 SYSTEM_SUSPEND function > > - cosmetic changes after review > > > > Changes in V5: > > - don't use standby mode, restore execution in a provided by guest point > > - move checking that all CPUs, except current one, are offline to after > > pausing the vCPUs > > - provide ret status from arch_domain_shutdown and handle it in > > domain_shutdown > > - adjust VPSCI_NR_FUNCS to reflect the number of newly added PSCI functions > > > > Changes in V4: > > Dropped all changes related to watchdog, domain is marked as shutting > > down in domain_shutdown and watchdog timeout handler won't trigger > > because of it. > > > > Previous versions included code to manage Xen watchdog timers during > > suspend, > > but this was removed. When a guest OS starts the Xen watchdog (either via > > the > > kernel driver or xenwatchdogd), it is responsible for managing that state > > across suspend/resume. On Linux, the Xen kernel driver properly stops the > > watchdog during suspend. However, when xenwatchdogd is used instead, suspend > > handling is incomplete, potentially leading to watchdog-triggered resets on > > resume. Xen leaves watchdog handling to the guest OS and its services. > > > > Dropped all changes related to VCPU context, because instead domain_shutdown > > is used, so we don't need any extra changes for suspending domain. > > > > Changes in V3: > > Dropped all domain flags and related code (which touched common functions > > like > > vcpu_unblock), keeping only the necessary changes for Xen suspend/resume, > > i.e. > > suspend/resume is now fully supported only for the hardware domain. > > Proper support for domU suspend/resume will be added in a future patch. > > This patch does not yet include VCPU context reset or domain context > > restoration in VCPU. > > --- > > xen/arch/arm/domain.c | 17 +++-- > > xen/arch/arm/include/asm/perfc_defn.h | 1 + > > xen/arch/arm/include/asm/psci.h | 2 + > > xen/arch/arm/include/asm/vpsci.h | 2 +- > > xen/arch/arm/vpsci.c | 101 ++++++++++++++++++++++---- > > xen/common/domain.c | 31 ++++++-- > > xen/include/xen/sched.h | 6 ++ > > 7 files changed, 131 insertions(+), 29 deletions(-) > > > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > > index 310c578909..9e9649c4e2 100644 > > --- a/xen/arch/arm/domain.c > > +++ b/xen/arch/arm/domain.c > > @@ -90,6 +90,16 @@ static void ctxt_switch_from(struct vcpu *p) > > if ( is_idle_vcpu(p) ) > > return; > > > > + /* Arch timer */ > > + p->arch.cntkctl = READ_SYSREG(CNTKCTL_EL1); > > + virt_timer_save(p); > > + > > + /* VGIC */ > > + gic_save_state(p); > > + > > I'd like to see comment here on why we don't need to do rest of the > ctx save code. I saw that this is described in the commit message, but > comment there will be very helpful for future contributors. Honestly, regarding the vCPU context, the only thing we need to prevent from being saved (overwritten) in this call is the SCTLR_EL1 register. This was already handled in one of the previous patches, but the change was not accepted during review. As for other registers, saving them is not strictly required according to the PSCI specification. It states that at startup we must set the AIF or DAIF bits for AArch32 or AArch64, respectively (DEN0022F.b 1.3, "6.4.3.3 Exceptions"). SCTLR.{I, C, M} must be set to {0, 0, 0} (DEN0022F.b 1.3, "6.4.3.4 MMU, cache and branch predictor enable"). Finally, the context ID and the entry point must also be saved in the guest registers, but in this function they are not modified. I'll add a comment as you proposed, to make this clearer for future contributors. > > > + if ( test_bit(_VPF_suspended, &p->pause_flags) ) > > + return; > > + > > p2m_save_state(p); > > > > /* CP 15 */ > > @@ -106,10 +116,6 @@ static void ctxt_switch_from(struct vcpu *p) > > p->arch.tpidrro_el0 = READ_SYSREG(TPIDRRO_EL0); > > p->arch.tpidr_el1 = READ_SYSREG(TPIDR_EL1); > > > > - /* Arch timer */ > > - p->arch.cntkctl = READ_SYSREG(CNTKCTL_EL1); > > - virt_timer_save(p); > > - > > if ( is_32bit_domain(p->domain) && cpu_has_thumbee ) > > { > > p->arch.teecr = READ_SYSREG(TEECR32_EL1); > > @@ -158,9 +164,6 @@ static void ctxt_switch_from(struct vcpu *p) > > > > /* XXX MPU */ > > > > - /* VGIC */ > > - gic_save_state(p); > > - > > isb(); > > } > > > > diff --git a/xen/arch/arm/include/asm/perfc_defn.h > > b/xen/arch/arm/include/asm/perfc_defn.h > > index effd25b69e..8dfcac7e3b 100644 > > --- a/xen/arch/arm/include/asm/perfc_defn.h > > +++ b/xen/arch/arm/include/asm/perfc_defn.h > > @@ -33,6 +33,7 @@ PERFCOUNTER(vpsci_system_reset, "vpsci: > > system_reset") > > PERFCOUNTER(vpsci_cpu_suspend, "vpsci: cpu_suspend") > > PERFCOUNTER(vpsci_cpu_affinity_info, "vpsci: cpu_affinity_info") > > PERFCOUNTER(vpsci_features, "vpsci: features") > > +PERFCOUNTER(vpsci_system_suspend, "vpsci: system_suspend") > > > > PERFCOUNTER(vcpu_kick, "vcpu: notify other vcpu") > > > > diff --git a/xen/arch/arm/include/asm/psci.h > > b/xen/arch/arm/include/asm/psci.h > > index 4780972621..48a93e6b79 100644 > > --- a/xen/arch/arm/include/asm/psci.h > > +++ b/xen/arch/arm/include/asm/psci.h > > @@ -47,10 +47,12 @@ void call_psci_system_reset(void); > > #define PSCI_0_2_FN32_SYSTEM_OFF PSCI_0_2_FN32(8) > > #define PSCI_0_2_FN32_SYSTEM_RESET PSCI_0_2_FN32(9) > > #define PSCI_1_0_FN32_PSCI_FEATURES PSCI_0_2_FN32(10) > > +#define PSCI_1_0_FN32_SYSTEM_SUSPEND PSCI_0_2_FN32(14) > > > > #define PSCI_0_2_FN64_CPU_SUSPEND PSCI_0_2_FN64(1) > > #define PSCI_0_2_FN64_CPU_ON PSCI_0_2_FN64(3) > > #define PSCI_0_2_FN64_AFFINITY_INFO PSCI_0_2_FN64(4) > > +#define PSCI_1_0_FN64_SYSTEM_SUSPEND PSCI_0_2_FN64(14) > > > > /* PSCI v0.2 affinity level state returned by AFFINITY_INFO */ > > #define PSCI_0_2_AFFINITY_LEVEL_ON 0 > > diff --git a/xen/arch/arm/include/asm/vpsci.h > > b/xen/arch/arm/include/asm/vpsci.h > > index 0cca5e6830..69d40f9d7f 100644 > > --- a/xen/arch/arm/include/asm/vpsci.h > > +++ b/xen/arch/arm/include/asm/vpsci.h > > @@ -23,7 +23,7 @@ > > #include <asm/psci.h> > > > > /* Number of function implemented by virtual PSCI (only 0.2 or later) */ > > -#define VPSCI_NR_FUNCS 12 > > +#define VPSCI_NR_FUNCS 14 > > > > /* Functions handle PSCI calls from the guests */ > > bool do_vpsci_0_1_call(struct cpu_user_regs *regs, uint32_t fid); > > diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c > > index 7ba9ccd94b..67d369a8a2 100644 > > --- a/xen/arch/arm/vpsci.c > > +++ b/xen/arch/arm/vpsci.c > > @@ -10,28 +10,18 @@ > > > > #include <public/sched.h> > > > > -static int do_common_cpu_on(register_t target_cpu, register_t entry_point, > > - register_t context_id) > > +static int do_setup_vcpu_ctx(struct vcpu *v, register_t entry_point, > > + register_t context_id) > > { > > - struct vcpu *v; > > struct domain *d = current->domain; > > struct vcpu_guest_context *ctxt; > > int rc; > > bool is_thumb = entry_point & 1; > > - register_t vcpuid; > > - > > - vcpuid = vaffinity_to_vcpuid(target_cpu); > > - > > - if ( (v = domain_vcpu(d, vcpuid)) == NULL ) > > - return PSCI_INVALID_PARAMETERS; > > > > /* THUMB set is not allowed with 64-bit domain */ > > if ( is_64bit_domain(d) && is_thumb ) > > return PSCI_INVALID_ADDRESS; > > > > - if ( !test_bit(_VPF_down, &v->pause_flags) ) > > - return PSCI_ALREADY_ON; > > - > > if ( (ctxt = alloc_vcpu_guest_context()) == NULL ) > > return PSCI_DENIED; > > > > @@ -78,11 +68,32 @@ static int do_common_cpu_on(register_t target_cpu, > > register_t entry_point, > > if ( rc < 0 ) > > return PSCI_DENIED; > > > > - vcpu_wake(v); > > - > > return PSCI_SUCCESS; > > } > > > > +static int do_common_cpu_on(register_t target_cpu, register_t entry_point, > > + register_t context_id) > > +{ > > + int rc; > > + struct vcpu *v; > > + struct domain *d = current->domain; > > + register_t vcpuid; > > + > > + vcpuid = vaffinity_to_vcpuid(target_cpu); > > + > > + if ( (v = domain_vcpu(d, vcpuid)) == NULL ) > > + return PSCI_INVALID_PARAMETERS; > > + > > + if ( !test_bit(_VPF_down, &v->pause_flags) ) > > + return PSCI_ALREADY_ON; > > + > > + rc = do_setup_vcpu_ctx(v, entry_point, context_id); > > + if ( rc == PSCI_SUCCESS ) > > + vcpu_wake(v); > > + > > + return rc; > > +} > > + > > static int32_t do_psci_cpu_on(uint32_t vcpuid, register_t entry_point) > > { > > int32_t ret; > > @@ -197,6 +208,48 @@ static void do_psci_0_2_system_reset(void) > > domain_shutdown(d,SHUTDOWN_reboot); > > } > > > > +static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t > > cid) > > +{ > > + int32_t rc; > > + struct vcpu *v; > > + struct domain *d = current->domain; > > + > > + /* SYSTEM_SUSPEND is not supported for the hardware domain yet */ > > + if ( is_hardware_domain(d) ) > > + return PSCI_NOT_SUPPORTED; > > + > > + /* Ensure that all CPUs other than the calling one are offline */ > > + domain_lock(d); > > + for_each_vcpu ( d, v ) > > + { > > + if ( v != current && is_vcpu_online(v) ) > > + { > > + domain_unlock(d); > > + return PSCI_DENIED; > > + } > > + } > > + domain_unlock(d); > > + > > + rc = domain_shutdown(d, SHUTDOWN_suspend); > > + if ( rc ) > > + return PSCI_DENIED; > > + > > + rc = do_setup_vcpu_ctx(current, epoint, cid); > > + if ( rc != PSCI_SUCCESS ) > > + { > > + domain_resume_nopause(d); > > + return rc; > > + } > > + > > + set_bit(_VPF_suspended, ¤t->pause_flags); > > + > > + dprintk(XENLOG_DEBUG, > > + "Dom %u: SYSTEM_SUSPEND requested, epoint=%#lx, cid=%#lx\n", > > + d->domain_id, epoint, cid); > > + > > + return rc; > > +} > > + > > static int32_t do_psci_1_0_features(uint32_t psci_func_id) > > { > > /* /!\ Ordered by function ID and not name */ > > @@ -214,6 +267,8 @@ static int32_t do_psci_1_0_features(uint32_t > > psci_func_id) > > case PSCI_0_2_FN32_SYSTEM_OFF: > > case PSCI_0_2_FN32_SYSTEM_RESET: > > case PSCI_1_0_FN32_PSCI_FEATURES: > > + case PSCI_1_0_FN32_SYSTEM_SUSPEND: > > + case PSCI_1_0_FN64_SYSTEM_SUSPEND: > > case ARM_SMCCC_VERSION_FID: > > return 0; > > default: > > @@ -344,6 +399,24 @@ bool do_vpsci_0_2_call(struct cpu_user_regs *regs, > > uint32_t fid) > > return true; > > } > > > > + case PSCI_1_0_FN32_SYSTEM_SUSPEND: > > + case PSCI_1_0_FN64_SYSTEM_SUSPEND: > > + { > > + register_t epoint = PSCI_ARG(regs, 1); > > + register_t cid = PSCI_ARG(regs, 2); > > + > > + if ( is_64bit_domain(current->domain) && > > + fid == PSCI_1_0_FN32_SYSTEM_SUSPEND ) > > + { > > + epoint &= GENMASK(31, 0); > > + cid &= GENMASK(31, 0); > > + } > > + > > + perfc_incr(vpsci_system_suspend); > > + PSCI_SET_RESULT(regs, do_psci_1_0_system_suspend(epoint, cid)); > > + return true; > > + } > > + > > default: > > return false; > > } > > diff --git a/xen/common/domain.c b/xen/common/domain.c > > index 5241a1629e..624c3e2c27 100644 > > --- a/xen/common/domain.c > > +++ b/xen/common/domain.c > > @@ -1343,16 +1343,13 @@ int domain_shutdown(struct domain *d, u8 reason) > > return 0; > > } > > > > -void domain_resume(struct domain *d) > > +#ifndef CONFIG_ARM > > +static > > +#endif > > +void domain_resume_nopause(struct domain *d) > > It took me some time to understand that this function makes assumption > that domain is already paused. As it behaves like *_unlocked() > functions, maybe it is better to call it something like > domain_resume_paused() ? Thank you for your suggestion. The "nopause" in the function name is intended to highlight the difference from the previous implementation, where domain_resume() would explicitly pause the domain before resuming it. Pausing the domain is not required here. This function is executed on the last running virtual CPU of the domain, so there is no need to wait for other vCPUs to finish execution in order to perform a safe operation. Since we are already inside the HVC trap handler, it is not possible for other code to run or for race conditions to occur in shutdown states. The state of other vCPUs cannot change at this point. For other parts of the code that do not execute in the context of the current domain, a domain pause is performed before such operations to ensure that all virtual CPUs are blocked and not running. This guarantees that operations which require all vCPUs to be paused/unpaused are performed safely. > > > { > > struct vcpu *v; > > > > - /* > > - * Some code paths assume that shutdown status does not get reset under > > - * their feet (e.g., some assertions make this assumption). > > - */ > > - domain_pause(d); > > - > > spin_lock(&d->shutdown_lock); > > > > d->is_shutting_down = d->is_shut_down = 0; > > @@ -1360,13 +1357,33 @@ void domain_resume(struct domain *d) > > > > for_each_vcpu ( d, v ) > > { > > + /* > > + * No need to conditionally clear _VPF_suspended here: > > + * - This bit is only set on Arm64, and only after a successful > > suspend. > > + * - domain_resume_nopause() may also be called from paths other > > than > > + * the suspend/resume flow, such as "soft-reset" actions (e.g., > > + * on_poweroff), as part of the Xenstore control/shutdown > > protocol. > > + * These require guest acknowledgement to complete the operation. > > + * So clearing the bit unconditionally is safe. > > + */ > > + clear_bit(_VPF_suspended, &v->pause_flags); > > + > > if ( v->paused_for_shutdown ) > > vcpu_unpause(v); > > v->paused_for_shutdown = 0; > > } > > > > spin_unlock(&d->shutdown_lock); > > +} > > > > +void domain_resume(struct domain *d) > > +{ > > + /* > > + * Some code paths assume that shutdown status does not get reset under > > + * their feet (e.g., some assertions make this assumption). > > + */ > > + domain_pause(d); > > + domain_resume_nopause(d); > > domain_unpause(d); > > } > > > > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h > > index fd5c9f9333..c1848d8ea6 100644 > > --- a/xen/include/xen/sched.h > > +++ b/xen/include/xen/sched.h > > @@ -814,6 +814,9 @@ void domain_destroy(struct domain *d); > > int domain_kill(struct domain *d); > > int domain_shutdown(struct domain *d, u8 reason); > > void domain_resume(struct domain *d); > > +#ifdef CONFIG_ARM > > +void domain_resume_nopause(struct domain *d); > > +#endif > > Probably I need comment from other reviewers here. Do we really need to > make this ARM-specific? I understand that right now it will be used only It is done to avoid violation of MISRA C Rule 8.7: Functions and objects should not be defined with external linkage if they are referenced in only one translation unit > by ARM, but still... Also, I am not big fan of that > > > +#ifndef CONFIG_ARM > > +static > > +#endif > > in the function definition. This is done to avoid violation of MISRA C Rule 8.8: The static storage class specifier shall be used in all declarations of objects and functions that have internal linkage > > > -- > WBR, Volodymyr Best regards, Mykola
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |