
Re: [Xen-devel] [PATCH] xen: use freeze/restore/thaw PM events for suspend/resume/chkpt



I didn't test the patch against the latest xen_suspend patch series you sent out; I
couldn't find it in any of the trees. And since you said earlier that the xen_hvm_suspend
fix would be (re)fixed to PM_FREEZE after my patch, I refrained from touching it.
I did test with a 2.6.38-rc1 32-bit kernel in PVHVM mode, and it seemed to work fine for
save/restore/checkpoint: I could see the PM event messages in dmesg (freeze, thaw, and
restore, with their timing stats).

On Wed, Feb 16, 2011 at 3:43 AM, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> On Wed, 2011-02-16 at 06:51 +0000, Shriram Rajagopalan wrote:
>> Use PM_FREEZE, PM_THAW and PM_RESTORE power events for
>> suspend/resume/checkpoint functionality, instead of PM_SUSPEND
>> and PM_RESUME. Use of these pm events fixes the Xen Guest hangup
>> when taking checkpoints. When a suspend event is cancelled
>> (while taking checkpoints once/continuously), we use PM_THAW
>> instead of PM_RESUME. PM_RESTORE is used when suspend is not
>> cancelled. See Documentation/power/devices.txt and linux/pm.h
>> for more info about freeze, thaw and restore. The sequence of
>> pm events in a suspend-resume scenario is shown below.
>>
>>         dpm_suspend_start(PMSG_FREEZE);
>>
>>                 dpm_suspend_noirq(PMSG_FREEZE);
>>
>>                         sysdev_suspend(PMSG_FREEZE);
>>                         cancelled = suspend_hypercall();
>>                         sysdev_resume();
>>
>>                 dpm_resume_noirq(cancelled ? PMSG_THAW : PMSG_RESTORE);
>>
>>         dpm_resume_end(cancelled ? PMSG_THAW : PMSG_RESTORE);
>
> With this patch I get
>
> [   18.902808] PM: Device pcspkr failed to freeze: error -22
> [   18.902835] xen suspend: dpm_suspend_start -22
>
> apparently due to a lack of CONFIG_HIBERNATE which is a prerequisite for
> using the freeze methods (see pm_ops function).
>
> As I mentioned earlier I think some of the CONFIG_PM_SLEEP tests in
> drivers/xen/manage.c need to be adjusted for the new suspend scheme (and
> I suspect they are a little wrong for the old one too).
>
> Since CONFIG_HIBERNATE is a "suspend to disk" option I think this needs
> running past the core pm guys to determine the correct approach; it
> might be to make PMSG_FREEZE support enabled by some less specific
> configuration option.
>
> Enabling CONFIG_HIBERNATE does seem to be sufficient to make this work
> though.
>
> Ian.
>
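To make the failure Ian describes concrete, here is a rough user-space model of a PM dispatch that rejects freeze-type events when hibernation support is absent. Everything named sketch_* is hypothetical and only illustrates the idea. The runtime flag hibernate_enabled stands in for what the kernel presumably decides at build time, and returning -EINVAL is an assumption chosen to match the "error -22" in the log above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PM events discussed in this thread. */
enum sketch_pm_event {
    SKETCH_SUSPEND,
    SKETCH_RESUME,
    SKETCH_FREEZE,
    SKETCH_THAW,
    SKETCH_RESTORE
};

/* Hypothetical callback table; not the kernel's real dev_pm_ops layout. */
struct sketch_pm_ops {
    int (*suspend)(void);
    int (*resume)(void);
    int (*freeze)(void);
    int (*thaw)(void);
    int (*restore)(void);
};

/*
 * Pick the callback for an event. Freeze, thaw and restore are only
 * accepted when hibernate_enabled is non-zero (assumption: this models
 * a build without hibernation support); otherwise return -EINVAL.
 * A missing callback is treated as success.
 */
static int sketch_pm_op(const struct sketch_pm_ops *ops,
                        enum sketch_pm_event ev, int hibernate_enabled)
{
    int (*cb)(void) = NULL;

    assert(ops != NULL);

    switch (ev) {
    case SKETCH_SUSPEND:
        cb = ops->suspend;
        break;
    case SKETCH_RESUME:
        cb = ops->resume;
        break;
    case SKETCH_FREEZE:
    case SKETCH_THAW:
    case SKETCH_RESTORE:
        if (!hibernate_enabled)
            return -EINVAL;
        cb = (ev == SKETCH_FREEZE) ? ops->freeze :
             (ev == SKETCH_THAW)   ? ops->thaw   : ops->restore;
        break;
    }

    return cb ? cb() : 0;
}
```

With hibernate_enabled set to 0, a SKETCH_FREEZE request fails before any driver callback runs, which is the shape of the pcspkr failure quoted above. With it set to 1 the same request goes through.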
On a related note, my initial kernel config had somehow enabled CONFIG_MICROCODE.
So, with a PV kernel (2.6.38-rc1), I got the following WARNING stack traces for
checkpoint and restore (i.e. freeze/thaw or freeze/restore):

Feb 16 06:02:35 rshriram-vm1 kernel: [  147.255561] PM: freeze of devices complete after 0.123 msecs
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.255603] PM: late freeze of devices complete after 0.035 msecs
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614] ------------[ cut here ]------------
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614] WARNING: at ...arch/x86/kernel/microcode_core.c:454 mc_sysdev_resume+0x30/0x5c()
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614] Modules linked in:
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614] Pid: 6, comm: migration/0 Not tainted 2.6.38-rc1-xenu #12
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614] Call Trace:
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff810417db>] ? warn_slowpath_common+0x80/0x98
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff8107c601>] ? cpu_stopper_thread+0x10d/0x172
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff81041808>] ? warn_slowpath_null+0x15/0x17
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff810276c5>] ? mc_sysdev_resume+0x30/0x5c
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff812294f9>] ? __sysdev_resume+0x74/0xc4
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff812295ae>] ? sysdev_resume+0x65/0xa6
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff81204736>] ? xen_suspend+0xc4/0xcb
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff8107c6f1>] ? stop_machine_cpu_stop+0x7d/0xb6
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff8107c674>] ? stop_machine_cpu_stop+0x0/0xb6
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff8107c5d7>] ? cpu_stopper_thread+0xe3/0x172
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff813ab106>] ? schedule+0x4e7/0x516
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff81006cf2>] ? check_events+0x12/0x20
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff81006cdf>] ? xen_restore_fl_direct_end+0x0/0x1
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff8107c4f4>] ? cpu_stopper_thread+0x0/0x172
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff81057438>] ? kthread+0x7d/0x85
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff8100b724>] ? kernel_thread_helper+0x4/0x10
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff8100ab36>] ? int_ret_from_sys_call+0x7/0x1b
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff813ac6a1>] ? retint_restore_args+0x5/0x6
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614]  [<ffffffff8100b720>] ? kernel_thread_helper+0x0/0x10
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256614] ---[ end trace 24fdc8979bd6c62e ]---
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.256346] PM: early restore of devices complete after 0.047 msecs
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.270496] PM: restore of devices complete after 13.106 msecs
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.279878] Setting capacity to 41943040
Feb 16 06:02:35 rshriram-vm1 kernel: [  147.293516] Setting capacity to 41943040
Feb 16 06:04:29 rshriram-vm1 init: hvc0 main process ended, respawning

Feb 16 06:15:30 rshriram-vm1 kernel: [  906.776082] PM: freeze of devices complete after 0.161 msecs
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.776127] PM: late freeze of devices complete after 0.037 msecs
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141] ------------[ cut here ]------------
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141] WARNING: at ...arch/x86/kernel/microcode_core.c:454 mc_sysdev_resume+0x30/0x5c()
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141] Modules linked in:
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141] Pid: 6, comm: migration/0 Tainted: G        W   2.6.38-rc1-xenu #12
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141] Call Trace:
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff810417db>] ? warn_slowpath_common+0x80/0x98
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff81006cdf>] ? xen_restore_fl_direct_end+0x0/0x1
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff8107c601>] ? cpu_stopper_thread+0x10d/0x172
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff81041808>] ? warn_slowpath_null+0x15/0x17
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff810276c5>] ? mc_sysdev_resume+0x30/0x5c
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff812294f9>] ? __sysdev_resume+0x74/0xc4
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff81006cdf>] ? xen_restore_fl_direct_end+0x0/0x1
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff812295ae>] ? sysdev_resume+0x65/0xa6
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff81204736>] ? xen_suspend+0xc4/0xcb
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff8107c6f1>] ? stop_machine_cpu_stop+0x7d/0xb6
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff8107c674>] ? stop_machine_cpu_stop+0x0/0xb6
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff8107c5d7>] ? cpu_stopper_thread+0xe3/0x172
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff813ab106>] ? schedule+0x4e7/0x516
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff81006cf2>] ? check_events+0x12/0x20
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff81006cdf>] ? xen_restore_fl_direct_end+0x0/0x1
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff8107c4f4>] ? cpu_stopper_thread+0x0/0x172
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff81057438>] ? kthread+0x7d/0x85
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff8100b724>] ? kernel_thread_helper+0x4/0x10
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff8100ab36>] ? int_ret_from_sys_call+0x7/0x1b
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff813ac6a1>] ? retint_restore_args+0x5/0x6
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141]  [<ffffffff8100b720>] ? kernel_thread_helper+0x0/0x10
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777141] ---[ end trace 24fdc8979bd6c62f ]---
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777060] PM: early thaw of devices complete after 0.045 msecs
Feb 16 06:15:30 rshriram-vm1 kernel: [  906.777060] PM: thaw of devices complete after 0.067 msecs

The sysdev_resume() call we make in drivers/xen/manage.c ends up calling [sysdev_drivers]->(resume)().
Looking at the microcode_core.c driver, the mc_sysdev_resume() function
raises this warning if more than one CPU is online during system resume.

If sysdev_resume() took an argument, as sysdev_suspend() does, and called the
appropriate [sysdev_drivers]->(thaw)() or (restore)(), we could pass PM_THAW or PM_RESTORE
and avoid this sort of warning.
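As a rough illustration of the idea, here is a user-space sketch of a resume entry point that takes the event and dispatches to a per-event callback. All sketch_* and mc_like_* names are hypothetical. The real struct sysdev_driver has no thaw or restore members as far as this thread indicates, and the fallback to resume() for drivers that lack a specific callback is my own assumption about how such a change could stay compatible with existing drivers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event argument for the proposed sysdev_resume(). */
enum sketch_resume_kind {
    SKETCH_KIND_RESUME,
    SKETCH_KIND_THAW,
    SKETCH_KIND_RESTORE
};

/* Hypothetical driver table with per-event callbacks. */
struct sketch_sysdev_driver {
    int (*resume)(int cpu);
    int (*thaw)(int cpu);
    int (*restore)(int cpu);
};

/*
 * Dispatch on the event. Drivers without a thaw/restore callback
 * fall back to resume() (assumption, to keep old drivers working).
 */
static int sketch_sysdev_resume(const struct sketch_sysdev_driver *drv,
                                enum sketch_resume_kind kind, int cpu)
{
    int (*cb)(int) = NULL;

    assert(drv != NULL);

    if (kind == SKETCH_KIND_THAW)
        cb = drv->thaw;
    else if (kind == SKETCH_KIND_RESTORE)
        cb = drv->restore;

    if (cb == NULL)
        cb = drv->resume;

    return cb ? cb(cpu) : 0;
}

/* Toy driver modelled loosely on the warning described above. */
static int sketch_online_cpus = 1;
static int sketch_warnings;
static int sketch_reloads;

/* Full resume: complains if more than one CPU is online. */
static int mc_like_resume(int cpu)
{
    (void)cpu;
    if (sketch_online_cpus > 1)
        sketch_warnings++;
    sketch_reloads++;
    return 0;
}

/* Thaw after a cancelled suspend: nothing was lost, so do nothing. */
static int mc_like_thaw(int cpu)
{
    (void)cpu;
    return 0;
}

/* Restore: redo the per-CPU work without the single-CPU check. */
static int mc_like_restore(int cpu)
{
    (void)cpu;
    sketch_reloads++;
    return 0;
}

static const struct sketch_sysdev_driver mc_like_driver = {
    .resume  = mc_like_resume,
    .thaw    = mc_like_thaw,
    .restore = mc_like_restore,
};
```

With sketch_online_cpus set to 2, calling sketch_sysdev_resume() with SKETCH_KIND_RESUME bumps the warning counter, while SKETCH_KIND_THAW and SKETCH_KIND_RESTORE do not.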

I am not sure whether this would fit in with the intended functionality of the
sysdev_resume() function in drivers/base/sys.c.

Of course, disabling CONFIG_MICROCODE makes the warning go away, but I was
thinking along the lines of a stock kernel config that has lots of things enabled.
Correct me if I am wrong about this.

shriram