
Re: [Xen-devel] 3.4.70+ kernel WARNING spew dysfunction on failed migration



create ^
title it libxl should implement non-suspend-cancel based resume path
owner Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
thanks

To summarise what I just said to Ian J in the corridor (and let's have a
bug to record it):

There are two mechanisms by which a suspend can be aborted and the
original domain resumed.

The older method is that the toolstack resets a bunch of state (see
tools/python/xen/xend/XendDomainInfo.py resumeDomain) and then restarts
the domain. The domain will see HYPERVISOR_suspend return 0 and will
continue without any realisation that it is actually running in the
original domain and not in a new one. This method is supposed to be
implemented by libxl_domain_resume(suspend_cancel=0) but it is not.

The other method is newer. In this case the toolstack arranges that
HYPERVISOR_suspend returns 1 and restarts the domain (I believe). The
domain will observe this, realise that it has been restarted in the same
domain, and behave accordingly. This method is implemented, correctly
AFAIK, by libxl_domain_resume(suspend_cancel=1).
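
As a rough illustration of the difference as seen from inside the guest,
here is a toy Python model that branches on the value HYPERVISOR_suspend
returns. It is only a sketch: after_suspend() and the step descriptions
are hypothetical names made up for this example, not real kernel code.

```python
def after_suspend(hypercall_ret):
    """Toy model of a guest's reaction once HYPERVISOR_suspend returns.

    hypercall_ret == 1: the suspend was cancelled; the guest knows it is
    still in the original domain.
    hypercall_ret == 0: the guest cannot tell it is in the original
    domain, so it behaves as if it had been resumed in a new one.
    """
    if hypercall_ret == 1:
        # Newer method: nothing about the domain has changed.
        return ["keep existing state", "continue"]
    # Older method: the guest assumes a new domain, which is why the
    # toolstack has to reset state before restarting it.
    return ["re-establish state as if in a new domain", "continue"]


if __name__ == "__main__":
    for ret in (0, 1):
        print(ret, after_suspend(ret))
```

The point of the model is only that the return value is the guest's sole
hint about which path it is on.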

However the newer method is not available in all kernels. It does date
from the Linux 2.6.18 days and is implemented in all Linux pvops
kernels, but I can't speak for others (e.g. BSD). The toolstack is
supposed to check for the XEN_ELFNOTE_SUSPEND_CANCEL ELF note when
building the domain. The presence/absence of this flag needs to be
remembered so that it can be consulted on resume (this also implies
preserving that knowledge over migration).
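
The bookkeeping this implies might look something like the Python sketch
below. DomainRecord, to_stream(), from_stream() and
choose_suspend_cancel() are hypothetical names invented for
illustration; none of them are libxl APIs. Only the 0/1 meaning of
suspend_cancel is taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class DomainRecord:
    """Hypothetical per-domain state kept by the toolstack."""
    domid: int
    # Recorded at build time: did the kernel image carry
    # XEN_ELFNOTE_SUSPEND_CANCEL?
    has_suspend_cancel_note: bool

    def to_stream(self):
        # Hypothetical serialisation so the flag survives migration.
        return {"domid": self.domid,
                "suspend_cancel_note": self.has_suspend_cancel_note}

    @classmethod
    def from_stream(cls, data):
        return cls(domid=data["domid"],
                   has_suspend_cancel_note=data["suspend_cancel_note"])


def choose_suspend_cancel(rec):
    """Pick the suspend_cancel argument to use when resuming."""
    # 1: guest understands a cancelled suspend (newer method).
    # 0: fall back to the older reset-and-restart method.
    return 1 if rec.has_suspend_cancel_note else 0


if __name__ == "__main__":
    rec = DomainRecord(domid=5, has_suspend_cancel_note=True)
    restored = DomainRecord.from_stream(rec.to_stream())
    print(choose_suspend_cancel(restored))
```

Carrying the flag in whatever record follows the domain is the simplest
way to make sure the receiving side can still make the same choice.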

xl currently uses libxl_domain_resume(suspend_cancel=0) on migration
failure, which as it stands won't work for *any* domain. Arguably
switching to suspend_cancel=1 for now will mean that some subset of
kernels will work, and those which don't will not have regressed, until
we can correctly implement suspend_cancel=0 and the necessary tracking
of XEN_ELFNOTE_SUSPEND_CANCEL.

I've also just noticed that on failure to save (as opposed to migrate)
xl does use suspend_cancel=1.

Ian.



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

