Re: [Xen-API] [ACS4.4, XenServer] Problem starting system VMs
Hi,

I think I’ve tracked this down. I believe it’s a bug in the XenServer’s event mechanism, specifically a bug where some shared state causes parallel calls to event.from to interfere with each other. From CloudStack’s point of view this manifests as

* spurious SESSION_INVALID exceptions in waitForTask, which triggers cleanup (Task.destroy), which prevents the VM.start from completing, leaving the VM paused
* empty lists of events being returned in non-timeout cases

I’ve prototyped a fix together with a test case (which fails before and passes after) and made a pull request containing both:

https://github.com/xapi-project/xen-api/pull/1719

I’d appreciate review from xapi experts, particularly Jon Ludlam (cc:d). I’ve also cc:d the main xapi development list.

Cheers,
Dave

On 29 Apr 2014, at 05:15, Mike Tutkowski <mike.tutkowski@xxxxxxxxxxxxx> wrote:

> Actually, the only issue I'm noticing now is the SSVM being automatically
> paused shortly after being created (while creating a new cloud).
>
> If I go to XenCenter and forcefully shut the VM down, CloudStack restarts
> it OK.
>
>
> On Mon, Apr 28, 2014 at 7:34 PM, Mike Tutkowski <mike.tutkowski@xxxxxxxxxxxxx> wrote:
>
>> Figured I'd CC Anthony and Edison to see if they have any input on this
>> (it looks like most of the changes on the relevant file
>> (Xenserver625StorageProcessor.java) were performed by one or the other).
>>
>>
>> On Mon, Apr 28, 2014 at 12:40 PM, Mike Tutkowski <mike.tutkowski@xxxxxxxxxxxxx> wrote:
>>
>>> Thanks for the reply, guys.
>>>
>>> Just wanted to point out that this is on 4.4 for me (although the issue
>>> may also be present on master).
>>>
>>> I have a sufficient number of IP addresses for both system and user VMs,
>>> so that should be OK (but good thought, Punith).
>>>
>>> I plan to continue debugging this later this afternoon, but have been in
>>> meetings all morning.
>>>
>>> Thanks!
>>>
>>>
>>> On Mon, Apr 28, 2014 at 10:41 AM, Dave Scott <Dave.Scott@xxxxxxxxxx> wrote:
>>>
>>>> Hi,
>>>>
>>>> (sorry to reply to my own email!)
>>>>
>>>> On 28 Apr 2014, at 11:42, Dave Scott <Dave.Scott@xxxxxxxxxx> wrote:
>>>>
>>>>>
>>>>> Hi Mike,
>>>>>
>>>>> On 28 Apr 2014, at 04:44, Mike Tutkowski <mike.tutkowski@xxxxxxxxxxxxx> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I recently installed 6.2 with XS62ESP1 and XS62ESP1004 (so that
>>>>>> Xenserver625StorageProcessor would be utilized).
>>>>>>
>>>>>> When I create a cloud from scratch, my SSVM starts up fine, but CPVM ends
>>>>>> up in the Paused state. I have to force a shutdown of that VM and then
>>>>>> CloudStack restarts it and it works. This consistently happens. The system
>>>>>> VMs are being deployed to the local storage of the one XS host I have in my
>>>>>> one and only cluster.
>>>>>>
>>>>>> Any thoughts on that?
>>>>>
>>>>> I'm seeing the same symptom on my test cloud with 6.2 and XS62ESP1004.
>>>>> I think there's a problem with XenAPI session and task handling in the
>>>>> cloudstack master branch, although I've not tracked it down yet. In my
>>>>> management server log I see:
>>>>>
>>>>> WARN [c.c.h.x.r.CitrixResourceBase] (DirectAgent-5:ctx-47dccee1) Unable to start VM(v-2-VM) on host(1c4a31e9-469e-45c3-a0ad-9792ac7b20f6) due to You gave an invalid session reference. It may have been invalidated by a server restart, or timed out. You should get a new session handle, using one of the session.login_ calls. This error does not invalidate the current connection. The handle parameter echoes the bad value given.
>>>>> You gave an invalid session reference. It may have been invalidated by a server restart, or timed out. You should get a new session handle, using one of the session.login_ calls. This error does not invalidate the current connection. The handle parameter echoes the bad value given.
>>>>>     at com.xensource.xenapi.Types.checkResponse(Types.java:218)
>>>>>     at com.xensource.xenapi.Connection.dispatch(Connection.java:395)
>>>>>     at com.cloud.hypervisor.xen.resource.XenServerConnectionPool$XenServerConnection.dispatch(XenServerConnectionPool.java:463)
>>>>>     at com.xensource.xenapi.Event.from(Event.java:270)
>>>>>     at org.apache.cloudstack.hypervisor.xenserver.XenServerResourceNewBase.waitForTask(XenServerResourceNewBase.java:113)
>>>>>     at com.cloud.hypervisor.xen.resource.CitrixResourceBase.startVM(CitrixResourceBase.java:3455)
>>>>>
>>>>> Somehow the XenAPI session being used by the Event.from in the
>>>>> XenServerResourceNewBase.waitForTask (used for recent 6.2 XenServers only)
>>>>> is being logged-out somewhere. When this happens, the cloudstack cleanup
>>>>> code calls Task.cancel and Task.destroy, and then the XenServer
>>>>> Async.VM.start fails trying to update Task.progress before it internally
>>>>> calls VM.unpause.
>>>>>
>>>>> I made a hack to disable caching of Connection/sessions:
>>>>>
>>>>> https://github.com/djs55/cloudstack/commit/a388b71279086e42710e26340df0632d0d8135e4
>>>>
>>>> For reference / experimentation, I've made a slightly more plausible
>>>> patch:
>>>>
>>>> https://github.com/djs55/cloudstack/commit/9d40f56c6384d04a5f0fb22e5b97530c0164e0b2
>>>>
>>>> It catches the SESSION_INVALID in the XenServerConnection and
>>>> transparently logs back in. This would prevent the higher level bits of the
>>>> XenServer plugin from having to deal with sessions being expired beneath
>>>> them.
>>>>
>>>> Cheers,
>>>> Dave
>>>>
>>>>>
>>>>> I suspect this now leaks Connections/sessions, but the symptom goes away.
>>>>>
>>>>> So far my thoughts are:
>>>>>
>>>>> 1. we need to find who's calling session.logout and why -- this will
>>>>> help fix the problem in the short term
>>>>>
>>>>> 2. The XenServer XenAPI bindings are harder to use than they should be (IMHO).
>>>>> In particular I think the bindings should take care of handling
>>>>> SESSION_INVALID exceptions and re-authenticating transparently, to avoid
>>>>> polluting the cloudstack code with rarely-used exception handlers.
>>>>>
>>>>> 3. the semantics of XenAPI task.destroy could be improved: instead of
>>>>> immediately removing the task (which then causes cleanup code to fail
>>>>> randomly it seems), it should be more like Unix waitpid with WNOHANG i.e.
>>>>> set a bit which says, "I'm done with this. Destroy it when you are finished
>>>>> with it."
>>>>>
>>>>>
>>>>>>
>>>>>> Also, if I try to kick off a user VM to local storage, I get the
>>>>>> general-purpose InsufficientCapacityException and the virtual router does
>>>>>> not even start up.
>>>>>
>>>>> No idea about this one :)
>>>>>
>>>>> Cheers,
>>>>> Dave
>>>>>
>>>>>>
>>>>>> Can anyone create a similar cloud to what I've described here with XS 6.2,
>>>>>> XS62ESP1, and XS62ESP1004? I re-ran this test using a XS 6.1 host and it
>>>>>> works just fine.
>>>>>>
>>>>>> At the moment, this is blocking a test case I'm trying to execute to verify
>>>>>> code I had to write in Xenserver625StorageProcessor.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> --
>>>>>> *Mike Tutkowski*
>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
>>>>>> e: mike.tutkowski@xxxxxxxxxxxxx
>>>>>> o: 303.746.7302
>>>>>> Advancing the way the world uses the
>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>>>>> *(tm)*

_______________________________________________
Xen-api mailing list
Xen-api@xxxxxxxxxxxxx
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api
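
The workaround discussed in the thread (catch SESSION_INVALID in the connection wrapper and transparently log back in) can be sketched roughly as below. This is a minimal Python sketch under stated assumptions, not the code from the linked CloudStack commits and not the real XenAPI bindings: SessionInvalid, ReloginConnection and FakeServer are hypothetical names invented for illustration, and the fake server only simulates a session being logged out beneath the caller.

```python
class SessionInvalid(Exception):
    """Hypothetical stand-in for a XenAPI SESSION_INVALID failure."""


class ReloginConnection:
    """Hypothetical wrapper: retries a call once with a fresh session."""

    def __init__(self, login, dispatch):
        self._login = login        # callable returning a new session reference
        self._dispatch = dispatch  # callable(session, method, *args)
        self._session = login()

    def call(self, method, *args):
        try:
            return self._dispatch(self._session, method, *args)
        except SessionInvalid:
            # The session was rejected: log in again and retry exactly once,
            # so callers such as a task-waiting loop never see the error.
            self._session = self._login()
            return self._dispatch(self._session, method, *args)


class FakeServer:
    """Hypothetical in-memory server used only to exercise the wrapper."""

    def __init__(self):
        self.valid = set()
        self.counter = 0

    def login(self):
        self.counter += 1
        ref = f"session-{self.counter}"
        self.valid.add(ref)
        return ref

    def logout(self, ref):
        self.valid.discard(ref)

    def dispatch(self, session, method, *args):
        if session not in self.valid:
            raise SessionInvalid(session)
        return (method, args)


if __name__ == "__main__":
    server = FakeServer()
    conn = ReloginConnection(server.login, server.dispatch)
    server.logout("session-1")            # simulate the session vanishing
    print(conn.call("event.from", "task"))
    print("logins performed:", server.counter)
```

The sketch retries once only, so a session that keeps being rejected still surfaces as an error to the caller. It masks the symptom rather than addressing the event.from interference that the top message in this thread identifies as the suspected root cause.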