Re: [Xen-devel] [PATCH] xen: use freeze/restore/thaw PM events for suspend/resume/chkpt
I didn't test the patch against the latest xen_suspend patch series you
sent out. I couldn't find it in any of the trees. And since you said
earlier that the xen_hvm_suspend fix would be (re)fixed to PM_FREEZE
after my patch, I refrained from touching it.

But I did test with a 2.6.38-rc1 32-bit kernel in PVHVM mode. It "seemed"
to work fine for save/restore/checkpoint. I could see the PM event
messages in dmesg (freeze, thaw, restore related timing stats).

On Wed, Feb 16, 2011 at 3:43 AM, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> On Wed, 2011-02-16 at 06:51 +0000, Shriram Rajagopalan wrote:
>> Use PM_FREEZE, PM_THAW and PM_RESTORE power events for
>> suspend/resume/checkpoint functionality, instead of PM_SUSPEND
>> and PM_RESUME. Use of these pm events fixes the Xen Guest hangup
>> when taking checkpoints. When a suspend event is cancelled
>> (while taking checkpoints once/continuously), we use PM_THAW
>> instead of PM_RESUME. PM_RESTORE is used when suspend is not
>> cancelled. See Documentation/power/devices.txt and linux/pm.h
>> for more info about freeze, thaw and restore. The sequence of
>> pm events in a suspend-resume scenario is shown below.
>>
>>    dpm_suspend_start(PMSG_FREEZE);
>>
>>    dpm_suspend_noirq(PMSG_FREEZE);
>>
>>    sysdev_suspend(PMSG_FREEZE);
>>    cancelled = suspend_hypercall()
>>    sysdev_resume();
>>
>>    dpm_resume_noirq(cancelled ? PMSG_THAW : PMSG_RESTORE);
>>
>>    dpm_resume_end(cancelled ? PMSG_THAW : PMSG_RESTORE);
>
> With this patch I get
>
> [ 18.902808] PM: Device pcspkr failed to freeze: error -22
> [ 18.902835] xen suspend: dpm_suspend_start -22
>
> apparently due to a lack of CONFIG_HIBERNATE which is a prerequisite for
> using the freeze methods (see pm_ops function).
>
> As I mentioned earlier I think some of the CONFIG_PM_SLEEP tests in
> drivers/xen/manage.c need to be adjusted for the new suspend scheme (and
> I suspect they are a little wrong for the old one too).
>
> Since CONFIG_HIBERNATE is a "suspend to disk" option I think this needs
> running past the core pm guys to determine the correct approach, it
> might be to make PMSG_FREEZE support enabled by some less specific
> configuration option.
>
> Enabling CONFIG_HIBERNATE does seem to be sufficient to make this work
> though.
>
> Ian.
>

On a related note, my initial kernel config had somehow enabled
CONFIG_MICROCODE. So, with a PV kernel (2.6.38-rc1), I got the following
WARNING stack trace for checkpoint & restore (i.e. freeze/thaw or
freeze/restore):

Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.255561] PM: freeze of devices complete after 0.123 msecs
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.255603] PM: late freeze of devices complete after 0.035 msecs
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] ------------[ cut here ]------------
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] WARNING: at ...arch/x86/kernel/microcode_core.c:454 mc_sysdev_resume+0x30/0x5c()
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] Modules linked in:
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] Pid: 6, comm: migration/0 Not tainted 2.6.38-rc1-xenu #12
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] Call Trace:
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] [<ffffffff810417db>] ? warn_slowpath_common+0x80/0x98
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] [<ffffffff8107c601>] ? cpu_stopper_thread+0x10d/0x172
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] [<ffffffff81041808>] ? warn_slowpath_null+0x15/0x17
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] [<ffffffff810276c5>] ? mc_sysdev_resume+0x30/0x5c
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] [<ffffffff812294f9>] ? __sysdev_resume+0x74/0xc4
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] [<ffffffff812295ae>] ? sysdev_resume+0x65/0xa6
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] [<ffffffff81204736>] ? xen_suspend+0xc4/0xcb
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] [<ffffffff8107c6f1>] ? stop_machine_cpu_stop+0x7d/0xb6
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] [<ffffffff8107c674>] ? stop_machine_cpu_stop+0x0/0xb6
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] [<ffffffff8107c5d7>] ? cpu_stopper_thread+0xe3/0x172
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] [<ffffffff813ab106>] ? schedule+0x4e7/0x516
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] [<ffffffff81006cf2>] ? check_events+0x12/0x20
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] [<ffffffff81006cdf>] ? xen_restore_fl_direct_end+0x0/0x1
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] [<ffffffff8107c4f4>] ? cpu_stopper_thread+0x0/0x172
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] [<ffffffff81057438>] ? kthread+0x7d/0x85
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] [<ffffffff8100b724>] ? kernel_thread_helper+0x4/0x10
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] [<ffffffff8100ab36>] ? int_ret_from_sys_call+0x7/0x1b
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] [<ffffffff813ac6a1>] ? retint_restore_args+0x5/0x6
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] [<ffffffff8100b720>] ? kernel_thread_helper+0x0/0x10
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256614] ---[ end trace 24fdc8979bd6c62e ]---
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.256346] PM: early restore of devices complete after 0.047 msecs
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.270496] PM: restore of devices complete after 13.106 msecs
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.279878] Setting capacity to 41943040
Feb 16 06:02:35 rshriram-vm1 kernel: [ 147.293516] Setting capacity to 41943040
Feb 16 06:04:29 rshriram-vm1 init: hvc0 main process ended, respawning
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.776082] PM: freeze of devices complete after 0.161 msecs
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.776127] PM: late freeze of devices complete after 0.037 msecs
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] ------------[ cut here ]------------
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] WARNING: at ...arch/x86/kernel/microcode_core.c:454 mc_sysdev_resume+0x30/0x5c()
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] Modules linked in:
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] Pid: 6, comm: migration/0 Tainted: G W 2.6.38-rc1-xenu #12
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] Call Trace:
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff810417db>] ? warn_slowpath_common+0x80/0x98
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff81006cdf>] ? xen_restore_fl_direct_end+0x0/0x1
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff8107c601>] ? cpu_stopper_thread+0x10d/0x172
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff81041808>] ? warn_slowpath_null+0x15/0x17
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff810276c5>] ? mc_sysdev_resume+0x30/0x5c
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff812294f9>] ? __sysdev_resume+0x74/0xc4
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff81006cdf>] ? xen_restore_fl_direct_end+0x0/0x1
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff812295ae>] ? sysdev_resume+0x65/0xa6
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff81204736>] ? xen_suspend+0xc4/0xcb
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff8107c6f1>] ? stop_machine_cpu_stop+0x7d/0xb6
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff8107c674>] ? stop_machine_cpu_stop+0x0/0xb6
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff8107c5d7>] ? cpu_stopper_thread+0xe3/0x172
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff813ab106>] ? schedule+0x4e7/0x516
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff81006cf2>] ? check_events+0x12/0x20
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff81006cdf>] ? xen_restore_fl_direct_end+0x0/0x1
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff8107c4f4>] ? cpu_stopper_thread+0x0/0x172
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff81057438>] ? kthread+0x7d/0x85
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff8100b724>] ? kernel_thread_helper+0x4/0x10
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff8100ab36>] ? int_ret_from_sys_call+0x7/0x1b
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff813ac6a1>] ? retint_restore_args+0x5/0x6
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] [<ffffffff8100b720>] ? kernel_thread_helper+0x0/0x10
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777141] ---[ end trace 24fdc8979bd6c62f ]---
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777060] PM: early thaw of devices complete after 0.045 msecs
Feb 16 06:15:30 rshriram-vm1 kernel: [ 906.777060] PM: thaw of devices complete after 0.067 msecs

The sysdev_resume() call we make in drivers/xen/manage.c results in
calling [sysdev_drivers]->(resume)(). Looking at the microcode_core.c
driver, the mc_sysdev_resume() function raises this warning if more than
one CPU is online during system resume.
If sysdev_resume() took an argument, as sysdev_suspend() does, and called
the appropriate [sysdev_drivers]->(thaw)() or (restore)(), we could
supply PM_THAW or PM_RESTORE and avoid this sort of warning. I am not
sure whether this would fit in with the intended functionality of the
sysdev_resume() function in drivers/base/sys.c.

Of course, disabling CONFIG_MICROCODE makes the warning go away, but I
was thinking along the lines of a stock kernel config that has lots of
things enabled. Correct me if I am wrong about this.

shriram
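
To make the sysdev_resume() idea above a bit more concrete, here is a
rough user-space model of what I have in mind. It is only a sketch and
not kernel code: struct model_driver, enum model_event,
model_resume_event() and model_sysdev_resume() are all hypothetical names
made up for this mail, and the real sysdev structures in
drivers/base/sys.c look different. The only part taken from the patch
description is the choice of THAW when the suspend was cancelled and
RESTORE otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes for this model (not the kernel's pm_message_t). */
enum model_event { MODEL_THAW, MODEL_RESTORE };

/* Hypothetical stand-in for a sysdev driver with per-event callbacks. */
struct model_driver {
    const char *name;
    int (*thaw)(void);     /* may be NULL: nothing to do on thaw */
    int (*restore)(void);  /* may be NULL: nothing to do on restore */
};

/* Pick the event the same way the patch description does. */
static enum model_event model_resume_event(int cancelled)
{
    return cancelled ? MODEL_THAW : MODEL_RESTORE;
}

/*
 * Model of a sysdev_resume() variant that takes an event argument.
 * Calls the matching callback of each driver and skips drivers that
 * do not provide one. Returns the number of callbacks invoked, or a
 * negative value as soon as a callback fails.
 */
static int model_sysdev_resume(const struct model_driver *drv, size_t n,
                               enum model_event ev)
{
    int called = 0;

    assert(drv != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int (*cb)(void) = (ev == MODEL_THAW) ? drv[i].thaw : drv[i].restore;
        int err;

        if (cb == NULL)
            continue;
        err = cb();
        if (err < 0)
            return err;
        called++;
    }
    return called;
}

/* Example callbacks that only count how often they run. */
static int thaw_calls, restore_calls;
static int count_thaw(void)    { thaw_calls++; return 0; }
static int count_restore(void) { restore_calls++; return 0; }

/* Two example drivers: one handles both events, one only restore. */
static const struct model_driver model_drivers[] = {
    { "both",         count_thaw, count_restore },
    { "restore-only", NULL,       count_restore },
};
```

With this model, model_sysdev_resume(model_drivers, 2,
model_resume_event(1)) would run only the thaw callback of the first
driver, so a driver whose resume path assumes a full resume could simply
leave its thaw callback empty. Whether that is acceptable for the real
sysdev drivers is exactly the part I am unsure about.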