[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v2 0/6] xen: simplify suspend/resume handling
On 28/03/2019 13:37, Juergen Gross wrote: > On 28/03/2019 14:33, Julien Grall wrote: >> Hi, >> >> On 3/28/19 1:01 PM, Volodymyr Babchuk wrote: >>> Hello Juergen, >>> >>> On Thu, 28 Mar 2019 at 14:09, Juergen Gross <jgross@xxxxxxxx> wrote: >>>> >>>> Especially in the scheduler area (schedule.c, cpupool.c) there is a >>>> rather complex handling involved when doing suspend and resume. >>>> >>>> This can be simplified a lot by not performing a complete cpu down and >>>> up cycle for the non-boot cpus, but keeping the pure software related >>>> state and freeing it only in case a cpu didn't come up again during >>>> resume. >>>> >>>> In summary not only the complexity can be reduced, but the failure >>>> tolerance will be even better with this series: With a dedicated hook >>>> for failing cpus when resuming it is now possible to survive e.g. a >>>> cpupool being left without any cpu after resume by moving its domains >>>> to cpupool0. >>>> >>>> Juergen Gross (6): >>>> xen/sched: call cpu_disable_scheduler() via cpu notifier >>>> xen: add helper for calling notifier_call_chain() to common/cpu.c >>>> xen: add new cpu notifier action CPU_RESUME_FAILED >>>> xen: don't free percpu areas during suspend >>>> xen/cpupool: simplify suspend/resume handling >>>> xen/sched: don't disable scheduler on cpus during suspend >>>> >>>> xen/arch/arm/smpboot.c | 4 - >>>> xen/arch/x86/percpu.c | 3 +- >>>> xen/arch/x86/smpboot.c | 3 - >>>> xen/common/cpu.c | 61 +++++++------- >>>> xen/common/cpupool.c | 131 ++++++++++++----------------- >>>> xen/common/schedule.c | 203 >>>> +++++++++++++++++++-------------------------- >>>> xen/include/xen/cpu.h | 29 ++++--- >>>> xen/include/xen/sched-if.h | 1 - >>>> 8 files changed, 190 insertions(+), 245 deletions(-) >>>> >>> >>> I tested your patch series on ARM64 platform. We had issue with hard >>> affinity - there was assertion failure in sched_credit2 code during >>> suspension if one of the vCPUs is pinned to non-0 pCPU. >> When you report an error, please make clear what commit you are using >> and whether you have patches applied on top. >> >> In this case, we have no support of suspend/resume on Arm today. So bug >> report around suspend/resume is a bit confusing to have. It is also more >> difficult to help when you don't have the full picture as a bug may be >> in your code and upstream Xen. >> >> I saw Juergen suggested a fix, please carry it in whatever series you have. >> >>> (XEN) **************************************** >>> (XEN) Panic on CPU 0: >>> (XEN) PSCI cpu off failed for CPU0 err=-3 >>> (XEN) **************************************** >> >> PSCI CPU off failing is never a good news. Here, the command has been >> denied by PSCI monitor. But... why does CPU off is actually called on >> CPU0? Shouldn't we have turned off the platform instead? > > Could it be that a scheduler lock is no longer reachable as the percpu > memory of another cpu has been released and allocated again? That would > be one of the possible results of my series. The data abort shown before the panic is potentially the percpu issue. But I don't think it will have the effect to try to turn off CPU0. This looks more an issue in the machine_halt/machine_restart path. Indeed CPU off may rightfully return -3 (DENIED) if the Trusted-OS reside on this CPU. We technically should have checked before that the CPU could be turned off. But it looks like we are missing this code. I vaguely remember to already have pointed out that issue in the past. Cheers, > > > Juergen > -- Julien Grall _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |