Re: [Xen-devel] Prepping a bugfix push



On Friday, 04 December 2009 at 08:37, Jeremy Fitzhardinge wrote:
> On 12/04/09 07:50, Ian Campbell wrote:
> >On Fri, 2009-12-04 at 07:46 +0000, Ian Campbell wrote:
> >>I've been doing regular suspend/resumes not checkpoint ones as Brendan
> >>is doing, I did try a couple of checkpointed ones yesterday and they
> >>failed, IIRC with a similar softlockup to this one.
> >So what is happening is that the device event channels are getting torn
> >down by the resume handler and never completely reinstated in the
> >cancelled suspend (aka checkpoint) case.
> 
> Hm.
> 
> >In 2.6.18 there was a separate ->suspend_cancel() callback for each
> >driver, called instead of the ->resume() callback in exactly these
> >circumstances. The cancel callback doesn't do any of the teardown, in
> >fact for blkfront it doesn't even exist.
> >
> >(As a proof of concept, commenting out the entire contents of
> >blkfront_resume and netfront_resume makes checkpointing work OK for me,
> >at the cost of breaking regular resume, of course)
> >
> >pv-ops uses the generic power management infrastructure which does not
> >have a concept of cancelling a suspend. Perhaps it should? Otherwise a
> >different solution will be required, I'm not sure what that might be
> >yet.
> 
> Well, the obvious one is to treat it as a full suspend followed by
> immediate resume.  That is, just remove all the special case handling
> for checkpoint, and let it do the normal resume stuff when the
> hypercall returns.
> 
> I think the PM core can fail to suspend; it just resumes anything
> that has been suspended so far.

Hmm. I just tried changing the SUSPEND_CANCEL elfnote to 0 in pvops,
and now save -c takes a very long time. From the xend log:

[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:3025) XendDomainInfo.resumeDomain(19)
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2319) Destroying device model
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2326) Releasing devices
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2332) Removing vif/0
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:1213) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2332) Removing vbd/51713
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:1213) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/51713
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:2332) Removing console/0
[2009-12-04 08:57:58 4917] DEBUG (XendDomainInfo:1213) XendDomainInfo.destroyDevice: deviceClass = console, device = console/0
[2009-12-04 08:57:58 4917] INFO (XendDomainInfo:3260) Dev 51713 still active, looping...

That last line repeats for a very long time, and eventually gives
up. The domain is still broken when the save completes.
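
For illustration only, here is a toy Python model of the callback
distinction Ian describes above: a cancelled suspend that runs the full
->resume() path tears the event channel down, while a 2.6.18-style
->suspend_cancel() leaves it alone. Nothing here is real kernel or xend
code. ToyFrontend, finish_suspend and the backend_reconnects flag are
hypothetical names, and the idea that the backend only renegotiates
after a real (non-cancelled) suspend is an assumption made to keep the
sketch small.

```python
class ToyFrontend:
    """Hypothetical stand-in for a PV frontend such as blkfront."""

    def __init__(self, name):
        self.name = name
        self.evtchn_bound = True  # event channel is up before suspend

    def resume(self, backend_reconnects):
        # Full resume path: drop the old event channel first...
        self.evtchn_bound = False
        # ...and only get a new one if the backend renegotiates
        # (assumed to happen only after a real suspend).
        if backend_reconnects:
            self.evtchn_bound = True

    def suspend_cancel(self):
        # 2.6.18-style cancel callback: no teardown at all.
        pass


def finish_suspend(devices, cancelled, has_cancel_callback):
    """Run the post-suspend callbacks and report event channel state."""
    for dev in devices:
        if cancelled and has_cancel_callback:
            dev.suspend_cancel()
        else:
            dev.resume(backend_reconnects=not cancelled)
    return {dev.name: dev.evtchn_bound for dev in devices}


if __name__ == "__main__":
    # Checkpoint with only a resume callback available (the pv-ops case).
    print(finish_suspend([ToyFrontend("vbd")], True, False))
    # Checkpoint with a separate cancel callback (the 2.6.18 case).
    print(finish_suspend([ToyFrontend("vbd")], True, True))
```

In this model the first call leaves the device without a bound event
channel, which matches the symptom described above; the second leaves
it untouched.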
