[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 0/6] xen: simplify suspend/resume handling


  • To: Julien Grall <julien.grall@xxxxxxx>, Volodymyr Babchuk <vlad.babchuk@xxxxxxxxx>
  • From: Juergen Gross <jgross@xxxxxxxx>
  • Date: Thu, 28 Mar 2019 14:37:34 +0100
  • Autocrypt: addr=jgross@xxxxxxxx; prefer-encrypt=mutual; keydata= mQENBFOMcBYBCACgGjqjoGvbEouQZw/ToiBg9W98AlM2QHV+iNHsEs7kxWhKMjrioyspZKOB ycWxw3ie3j9uvg9EOB3aN4xiTv4qbnGiTr3oJhkB1gsb6ToJQZ8uxGq2kaV2KL9650I1SJve dYm8Of8Zd621lSmoKOwlNClALZNew72NjJLEzTalU1OdT7/i1TXkH09XSSI8mEQ/ouNcMvIJ NwQpd369y9bfIhWUiVXEK7MlRgUG6MvIj6Y3Am/BBLUVbDa4+gmzDC9ezlZkTZG2t14zWPvx XP3FAp2pkW0xqG7/377qptDmrk42GlSKN4z76ELnLxussxc7I2hx18NUcbP8+uty4bMxABEB AAG0H0p1ZXJnZW4gR3Jvc3MgPGpncm9zc0BzdXNlLmNvbT6JATkEEwECACMFAlOMcK8CGwMH CwkIBwMCAQYVCAIJCgsEFgIDAQIeAQIXgAAKCRCw3p3WKL8TL8eZB/9G0juS/kDY9LhEXseh mE9U+iA1VsLhgDqVbsOtZ/S14LRFHczNd/Lqkn7souCSoyWsBs3/wO+OjPvxf7m+Ef+sMtr0 G5lCWEWa9wa0IXx5HRPW/ScL+e4AVUbL7rurYMfwCzco+7TfjhMEOkC+va5gzi1KrErgNRHH kg3PhlnRY0Udyqx++UYkAsN4TQuEhNN32MvN0Np3WlBJOgKcuXpIElmMM5f1BBzJSKBkW0Jc Wy3h2Wy912vHKpPV/Xv7ZwVJ27v7KcuZcErtptDevAljxJtE7aJG6WiBzm+v9EswyWxwMCIO RoVBYuiocc51872tRGywc03xaQydB+9R7BHPuQENBFOMcBYBCADLMfoA44MwGOB9YT1V4KCy vAfd7E0BTfaAurbG+Olacciz3yd09QOmejFZC6AnoykydyvTFLAWYcSCdISMr88COmmCbJzn sHAogjexXiif6ANUUlHpjxlHCCcELmZUzomNDnEOTxZFeWMTFF9Rf2k2F0Tl4E5kmsNGgtSa aMO0rNZoOEiD/7UfPP3dfh8JCQ1VtUUsQtT1sxos8Eb/HmriJhnaTZ7Hp3jtgTVkV0ybpgFg w6WMaRkrBh17mV0z2ajjmabB7SJxcouSkR0hcpNl4oM74d2/VqoW4BxxxOD1FcNCObCELfIS auZx+XT6s+CE7Qi/c44ibBMR7hyjdzWbABEBAAGJAR8EGAECAAkFAlOMcBYCGwwACgkQsN6d 1ii/Ey9D+Af/WFr3q+bg/8v5tCknCtn92d5lyYTBNt7xgWzDZX8G6/pngzKyWfedArllp0Pn fgIXtMNV+3t8Li1Tg843EXkP7+2+CQ98MB8XvvPLYAfW8nNDV85TyVgWlldNcgdv7nn1Sq8g HwB2BHdIAkYce3hEoDQXt/mKlgEGsLpzJcnLKimtPXQQy9TxUaLBe9PInPd+Ohix0XOlY+Uk QFEx50Ki3rSDl2Zt2tnkNYKUCvTJq7jvOlaPd6d/W0tZqpyy7KVay+K4aMobDsodB3dvEAs6 ScCnh03dDAFgIq5nsB11j3KPKdVoPlfucX2c7kGNH+LUMbzqV6beIENfNexkOfxHf4kBrQQY AQgAIBYhBIUSZ3Lo9gSUpdCX97DendYovxMvBQJa3fDQAhsCAIEJELDendYovxMvdiAEGRYI AB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCWt3w0AAKCRCAXGG7T9hjvk2LAP99B/9FenK/ 1lfifxQmsoOrjbZtzCS6OKxPqOLHaY47BgEAqKKn36YAPpbk09d2GTVetoQJwiylx/Z9/mQI CUbQMg1pNQf9EjA1bNcMbnzJCgt0P9Q9wWCLwZa01SnQWFz8Z4HEaKldie+5bHBL5CzVBrLv 81tqX+/j95llpazzCXZW2sdNL3r8gXqrajSox7LR2rYDGdltAhQuISd2BHrbkQVEWD4hs7iV 1KQHe2uwXbKlguKPhk5ubZxqwsg/uIHw0qZDk+d0vxjTtO2JD5Jv/CeDgaBX4Emgp0NYs8IC UIyKXBtnzwiNv4cX9qKlz2Gyq9b+GdcLYZqMlIBjdCz0yJvgeb3WPNsCOanvbjelDhskx9gd 6YUUFFqgsLtrKpCNyy203a58g2WosU9k9H+LcheS37Ph2vMVTISMszW9W8gyORSgmw==
  • Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Tim Deegan <tim@xxxxxxx>, Dario Faggioli <dfaggioli@xxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Thu, 28 Mar 2019 13:37:42 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 28/03/2019 14:33, Julien Grall wrote:
> Hi,
> 
> On 3/28/19 1:01 PM, Volodymyr Babchuk wrote:
>> Hello Juergen,
>>
>> On Thu, 28 Mar 2019 at 14:09, Juergen Gross <jgross@xxxxxxxx> wrote:
>>>
>>> Especially in the scheduler area (schedule.c, cpupool.c) there is a
>>> rather complex handling involved when doing suspend and resume.
>>>
>>> This can be simplified a lot by not performing a complete cpu down and
>>> up cycle for the non-boot cpus, but keeping the pure software related
>>> state and freeing it only in case a cpu didn't come up again during
>>> resume.
>>>
>>> In summary not only the complexity can be reduced, but the failure
>>> tolerance will be even better with this series: With a dedicated hook
>>> for failing cpus when resuming it is now possible to survive e.g. a
>>> cpupool being left without any cpu after resume by moving its domains
>>> to cpupool0.
>>>
>>> Juergen Gross (6):
>>>    xen/sched: call cpu_disable_scheduler() via cpu notifier
>>>    xen: add helper for calling notifier_call_chain() to common/cpu.c
>>>    xen: add new cpu notifier action CPU_RESUME_FAILED
>>>    xen: don't free percpu areas during suspend
>>>    xen/cpupool: simplify suspend/resume handling
>>>    xen/sched: don't disable scheduler on cpus during suspend
>>>
>>>   xen/arch/arm/smpboot.c     |   4 -
>>>   xen/arch/x86/percpu.c      |   3 +-
>>>   xen/arch/x86/smpboot.c     |   3 -
>>>   xen/common/cpu.c           |  61 +++++++-------
>>>   xen/common/cpupool.c       | 131 ++++++++++++-----------------
>>>   xen/common/schedule.c      | 203
>>> +++++++++++++++++++--------------------------
>>>   xen/include/xen/cpu.h      |  29 ++++---
>>>   xen/include/xen/sched-if.h |   1 -
>>>   8 files changed, 190 insertions(+), 245 deletions(-)
>>>
>>
>> I tested your patch series on ARM64 platform. We had issue with hard
>> affinity - there was assertion failure in sched_credit2 code during
>> suspension if one of the vCPUs is pinned to non-0 pCPU.
> When you report an error, please make clear what commit you are using
> and whether you have patches applied on top.
> 
> In this case, we have no support of suspend/resume on Arm today. So bug
> report around suspend/resume is a bit confusing to have. It is also more
> difficult to help when you don't have the full picture as a bug may be
> in your code and upstream Xen.
> 
> I saw Juergen suggested a fix, please carry it in whatever series you have.
> 
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) PSCI cpu off failed for CPU0 err=-3
>> (XEN) ****************************************
> 
> PSCI CPU off failing is never a good news. Here, the command has been
> denied by PSCI monitor. But... why does CPU off is actually called on
> CPU0? Shouldn't we have turned off the platform instead?

Could it be that a scheduler lock is no longer reachable as the percpu
memory of another cpu has been released and allocated again? That would
be one of the possible results of my series.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.