Re: [Xen-devel] question about migration
On 04/01/16 15:31, Ian Jackson wrote:
> Andrew Cooper writes ("Re: [Xen-devel] question about migration"):
>> On 25/12/2015 03:06, Wen Congyang wrote:
>>> Another problem:
>>> If migration fails after the guest is suspended, we will resume it in the
>>> source.
>>> In this case, we cannot shut it down, because no process handles the
>>> shutdown event.
>>> The log in /var/log/xen/xl-hvm_nopv.log:
>>> Waiting for domain hvm_nopv (domid 1) to die [pid 5508]
>>> Domain 1 has shut down, reason code 2 0x2
>>> Domain has suspended.
>>> Done. Exiting now
>>>
>>> The xl has exited...
> ...
>> Hmm yes. This is a libxl bug in libxl_evenable_domain_death(). CC'ing
>> the toolstack maintainers.
> AIUI this is a response to Wen's comments above.
>
>> It waits for the @releasedomain watch, but doesn't interpret the results
>> correctly. In particular, if it can still make successful hypercalls
>> with the provided domid, that domain was not the subject of
>> @releasedomain. (I also observe that domain_death_xswatch_callback() is
>> very inefficient. It only needs to make a single hypercall, not query
>> the entire state of all domains.)
> I don't understand precisely what you allege this bug to be, but:
>
> * libxl_evenable_domain_death may generate two events, a
>   DOMAIN_SHUTDOWN and a DOMAIN_DEATH, or only one, a DOMAIN_DEATH.
>   This is documented in libxl.h (although it refers to DESTROY rather
>   than DEATH - see patch below to fix the doc).
>
> * @releaseDomain usually triggers twice for each domain: once when it
>   goes to SHUTDOWN and once when it is actually destroyed. (This is
>   obviously necessary to implement the above.)

So it does. I clearly had an accident with `git grep` when I came to the
opposite conclusion. Apologies for the noise generated from this.

>
> * @releaseDomain does not have a specific domain which is the "subject
>   of @releaseDomain". Arguably this is unhelpful, but it is not
>   libxl's fault. It arises from the VIRQ generated by Xen. Note that
>   xenstored needs to search its own list of active domains to see what
>   has happened; it generates the @releaseDomain event and throws away
>   the domid.

The semantics of @releaseDomain are quite mad, but this is how it has
always been. The current semantics are a scalability limitation which
someone in XenServer will likely get around to in due course (we support
1000 VMs per host).

> * It is not possible to resume the domain in the source after it has
>   suspended.

This functionality exists and is already used in several circumstances,
both by libxl and by other toolstacks.

xl has an added split-brain problem here that plain daemonic toolstacks
don't have; specifically, there are two completely independent processes
playing with the domain state at the same time.

The daemonic xl needs to ignore DOMAIN_SHUTDOWN and tidy up only after
DOMAIN_DEATH. Under these circumstances, a failed migrate which resumes
the domain won't result in qemu being cleaned up.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
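
As a rough illustration of the policy described in the reply above (keep watching after DOMAIN_SHUTDOWN, tidy up only after DOMAIN_DEATH), here is a toy model in Python. It is only a sketch: `Event`, `DomainWatcher` and `run_failed_migration` are hypothetical names invented for this example and are not libxl or xl APIs, and the shutdown reasons are plain strings rather than real reason codes.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Event(Enum):
    # Hypothetical stand-ins for the two event kinds discussed in the thread.
    DOMAIN_SHUTDOWN = auto()  # domain stopped running; it may still be resumed
    DOMAIN_DEATH = auto()     # domain has actually been destroyed


@dataclass
class DomainWatcher:
    """Toy model of a per-domain monitoring process (not real xl code)."""
    domid: int
    watching: bool = True
    cleaned_up: bool = False
    log: list = field(default_factory=list)

    def handle(self, event, reason=None):
        if not self.watching:
            # A watcher that has already exited never sees later events.
            return
        if event is Event.DOMAIN_SHUTDOWN:
            # Note the shutdown but keep watching: a failed migration may
            # resume the domain on the source host afterwards.
            self.log.append(f"domain {self.domid} shut down ({reason}); still watching")
        elif event is Event.DOMAIN_DEATH:
            # Only now is it safe to tidy up and stop watching.
            self.log.append(f"domain {self.domid} destroyed; cleaning up")
            self.cleaned_up = True
            self.watching = False


def run_failed_migration():
    """Replay the sequence from the report: suspend, failed migrate, later shutdown."""
    watcher = DomainWatcher(domid=1)
    watcher.handle(Event.DOMAIN_SHUTDOWN, reason="suspend")   # migration suspends the guest
    # The migration fails here and the guest is resumed on the source.
    watcher.handle(Event.DOMAIN_SHUTDOWN, reason="poweroff")  # a later, real shutdown
    watcher.handle(Event.DOMAIN_DEATH)
    return watcher
```

In this model the watcher survives the suspend, so the later shutdown and destruction are still observed; a watcher that exited on the first DOMAIN_SHUTDOWN would have missed both, which matches the behaviour shown in the quoted log.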