
Re: [Xen-devel] pvops: Does PVOPS guest os support online "suspend/resume"

On Tue, Aug 13, 2013 at 02:38:18PM +0000, Gonglei (Arei) wrote:
> Hi,
> I rechecked the different kernels today and found that I had made a mistake 
> before. Sorry for misleading you all :)
> All in all, the problems come down to the two items below:
> 1 the kernel 2.6.32 PVOPS guest OS (I tested RHEL6.1 and RHEL6.3) does have 
> bugs in ONLINE suspend/resume (checkpoint), which were,
> as Shriram mentioned, fixed in:
> http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/xen/manage.c?id=b3e96c0c756211e805c6941d4a6e5f6e1995cb6b
> 2 kernels above 3.0 (I tested Ubuntu12.10 with kernel 3.5 and Ubuntu13.04 
> with kernel 3.8) seem to have another "bug":
>   1) if we set MULTIPLE VCPUS for the guest OS, it has problems 
> resuming (to be precise, thawing).
>      In detail:
>          <1>set the guest os with 4 vcpus
>              in dom1.cfg: vcpus=4
>          <2>xl create dom1.cfg
>              execute command "top -d 1" in guest dom1's vnc window
>          <3>xl save -c dom1 /opt/dom1.save
>          <4>after step <3>, we checked the guest dom1's vnc window, and found 
> that:
>              kernel threads migration/1, migration/2, migration/3 got their 
> cpu usage up to 100%
>              the guest os couldn't respond to any request such as mouse 
> movement or keyboard input.
>              no "thaw" messages were printed in dom1's serial output.
>   2) if we set only 1 vcpu for the guest os, it thaws and works 
> fine.
>   3) another odd thing is that if we use the saved file generated in 2-1) 
> to restore the guest, and then do online suspend/resume (xl save -c, 
> checkpoint),
> it works fine; no problems occur.
> This problem occurs on guest OSes with kernel 3.5/3.8 (maybe other kernels as 
> well, not tested). I hope that the steps I followed were correct.

Please do check with the upstream kernel. There were some CPU hotplug issues in 
older kernels, and it would be good to rule them out to make sure that this is 
not one of them.

Please do test with v3.11-rc5.

> Have you ever encountered such a "suspend/resume checkpoint on multi-vcpu 
> guest os" problem?
> -------
> PS: BTW, I'm wondering why using freeze/thaw instead of suspend/resume would 
> solve the problem with kernels below 3.0?
>  It seems that blkfront_resume is still called if we use thaw method here, 
> because blkfront has no available pm_op.
>     static int device_resume(struct device *dev, pm_message_t state, bool async)
>     {
>          ...
>          if (dev->bus) {
>                    if (dev->bus->pm) {
>                             info = "bus ";
>                             callback = pm_op(dev->bus->pm, state);
>                    } else if (dev->bus->resume) {
>                             info = "legacy bus ";
>                             callback = dev->bus->resume;  // is blkfront_resume called here?
>                             goto End;

One easy way to figure this out is to stick printks in here to see if that 
blkfront code
is indeed called. You can also use 'dump_stack()' to get a nice stack-trace.
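
To make the question concrete before instrumenting a real kernel, here is a
minimal userspace sketch of the branch quoted above. It is only a model: the
struct layouts, the enum values, and the names model_device_resume() and
model_legacy_resume() are assumptions made up for illustration, not the
kernel's definitions. It shows that when a bus has no pm ops but does have a
legacy resume callback, that callback is picked regardless of whether the
event is a thaw or a resume. Whether the bus in a given guest kernel really
has no pm ops is exactly what the printk()/dump_stack() approach would
confirm.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Userspace model only. The type and function names echo the quoted
 * device_resume() snippet, but these are simplified stand-ins and NOT
 * the kernel's definitions.
 */
enum pm_event { EVENT_RESUME, EVENT_THAW };
enum path { PATH_NONE, PATH_BUS_PM, PATH_LEGACY_BUS };

struct device;
typedef int (*pm_callback_t)(struct device *dev);

struct dev_pm_ops {
    pm_callback_t resume;
    pm_callback_t thaw;
};

struct bus_type {
    const struct dev_pm_ops *pm;  /* NULL when the bus has no pm ops */
    pm_callback_t resume;         /* legacy resume callback */
};

struct device {
    const char *name;
    const struct bus_type *bus;
    int legacy_resume_calls;      /* counts calls, for the demo only */
};

/* Pick the pm callback matching the event, like pm_op() in the snippet. */
static pm_callback_t pm_op(const struct dev_pm_ops *ops, enum pm_event ev)
{
    switch (ev) {
    case EVENT_RESUME:
        return ops->resume;
    case EVENT_THAW:
        return ops->thaw;
    }
    return NULL;
}

/* Hypothetical stand-in for a legacy bus resume handler. */
static int model_legacy_resume(struct device *dev)
{
    dev->legacy_resume_calls++;
    return 0;
}

/* Simplified model of the bus branch; returns which path was taken. */
static enum path model_device_resume(struct device *dev, enum pm_event ev)
{
    pm_callback_t callback = NULL;
    enum path taken = PATH_NONE;

    assert(dev != NULL);

    if (dev->bus) {
        if (dev->bus->pm) {
            taken = PATH_BUS_PM;
            callback = pm_op(dev->bus->pm, ev);
        } else if (dev->bus->resume) {
            /* The event is not consulted here: thaw and resume both
             * land on the same legacy callback. */
            taken = PATH_LEGACY_BUS;
            callback = dev->bus->resume;
        }
    }

    /* Trace point, playing the role a printk() would play in the kernel. */
    fprintf(stderr, "%s: event=%d path=%d\n", dev->name, (int)ev, (int)taken);

    if (callback)
        callback(dev);
    return taken;
}
```

With a bus whose pm pointer is NULL and whose resume points at
model_legacy_resume, calling model_device_resume() first with EVENT_THAW and
then with EVENT_RESUME takes the legacy path both times and bumps
legacy_resume_calls on each call.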
