
Re: [Xen-devel] xl shutdown --wait "racy"



Wednesday, April 16, 2014, 5:02:50 PM, you wrote:

> On Wed, 2014-04-16 at 16:55 +0200, Sander Eikelenboom wrote:
>> Wednesday, April 16, 2014, 4:33:30 PM, you wrote:
>> 
>> > On Wed, 2014-04-16 at 16:26 +0200, Sander Eikelenboom wrote:
>> >> Wednesday, April 16, 2014, 4:13:59 PM, you wrote:
>> >> 
>> >> > On Wed, 2014-04-16 at 16:08 +0200, Sander Eikelenboom wrote:
>> >> >> Hi Ian (C|J) Konrad,
>> >> >> 
>> >> >> I'm currently trying to work around the pci-(detach|assignable-remove)
>> >> >> issues I reported earlier.
>> >> >> 
>> >> >> The workaround I thought of was:
>> >> >> - shutting down the guest
>> >> >> - starting it without one of the original devices passed through
>> >> >> - using xl pci-assignable-remove and binding the device to the dom0 driver.
>> >> >> 
>> >> >> But while doing this I noticed that "xl shutdown --wait" does wait, but
>> >> >> returns:
>> >> >> - before the domain is removed from, for instance, "xl list"; it is
>> >> >>   still listed there in "--ps--" state.
>> >> >> - before pciback has done its restore-config-space magic.
>> >> >> 
>> >> >> So it seems the wait loop is exiting somewhat prematurely. Is this
>> >> >> expected?
>> >> 
>> >> > It is waiting for the domain to be shut down (state 's'), not for the
>> >> > domain to be destroyed. So it's doing what it said it would (I
>> >> > appreciate you might not find this distinction helpful under the
>> >> > circumstances...)
>> >> 
>> >> It's at least not entirely what I expected ;-)
>> >> 
>> >> Is it because there can be different "follow-up actions" due to the
>> >> "on_poweroff=" config option?
>> 
>> > Not really, those are somewhat unrelated.
>> 
>> > Shutdown and destroy are two distinct events. Once a domain has shut down
>> > (called the shutdown hypercall etc) it goes into state "shutdown" and an
>> > event is generated from the hypervisor to the toolstack. The toolstack's
>> > response to this is to actually destroy the domain, that is to tear down
>> > the resources it is using etc.
>> 
>> > on_* only matter for the destroy phase since they tell the toolstack
>> > what it should do (restart, preserve, really destroy etc).
>> 
>> Hmm ok, it should be called "--wait_until_halfway" then ;-)

> ;-)

>> On the more serious side .. would patches be accepted that:
>> 
>> a) differentiate when it returns from waiting based on the on_*
>> 
>>         preserve: this could probably stay as is .. after the shutdown event
>>         destroy:
>>         restart:
>>         rename-restart:
>>         coredump-destroy:
>>         coredump-restart:
>> 
>>         For the other ones, I don't know if there actually are events in
>>         libxl that could be 'easily' coupled?

> Might be tricky, since on_* is processed by the daemonised xl which is
> monitoring the domain, not the xl shutdown process.

>> b) make it possible for the xl command line to overrule the on_* from the
>> config file

> I guess you mean the xl shutdown command. This will also be tricky, for
> the same reasons as a.

>> c) also introduce a -w/--wait for xl destroy

> Yes.

> I'll add:

> d) Make "xl shutdown --wait" actually wait for the domain to be
> destroyed.

> Probably, assuming that is possible (I'm concerned about races in the
> implementation of this...). Might also interact weirdly with on_* I
> suppose.

Well, what if we could pass down the events that "wait_for_domain_deaths" is
allowed to return on? Right now it seems to return on *either* event, shutdown
or complete death, and only prints a different message for each.

Is there a special event that's triggered on timeout (as defined in
/etc/defaults/xendomains: XENDOMAINS_STOP_MAXWAIT=300)?

The solution seems to be to let the caller of "wait_for_domain_deaths" specify
the events it should return on: always return on timeout, return on any event
when none are specified, and return only on the specified events otherwise.

Could you elaborate on how you think this would get "racy" ?

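/* Current xl code: block until nr shutdown-or-death events have been seen.
 * Note that a shutdown event and a death event both bump the counter. */
static void wait_for_domain_deaths(libxl_evgen_domain_death **deathws, int nr)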
{
    int rc, count = 0;
    LOG("Waiting for %d domains", nr);
    while(1 && count < nr) {
        libxl_event *event;
        rc = libxl_event_wait(ctx, &event, LIBXL_EVENTMASK_ALL, 0,0);
        if (rc) {
            LOG("Failed to get event, quitting (rc=%d)", rc);
            exit(-1);
        }

        switch (event->type) {
        case LIBXL_EVENT_TYPE_DOMAIN_DEATH:
            LOG("Domain %d has been destroyed", event->domid);
            libxl_evdisable_domain_death(ctx, deathws[event->for_user]);
            count++;
            break;
        case LIBXL_EVENT_TYPE_DOMAIN_SHUTDOWN:
            LOG("Domain %d has been shut down, reason code %d",
                event->domid, event->u.domain_shutdown.shutdown_reason);
            libxl_evdisable_domain_death(ctx, deathws[event->for_user]);
            count++;
            break;
        default:
            LOG("Unexpected event type %d", event->type);
            break;
        }
        libxl_event_free(ctx, event);
    }
}
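
To make the proposal above a bit more concrete, here is a rough,
self-contained sketch of the selection logic only. It does not use libxl at
all: the WAIT_EV_* flags, struct fake_event and wait_for_events() are names
made up for illustration, and the "events" come from an array instead of
libxl_event_wait(). Whether a timeout can be delivered as an event in the
first place is an assumption, not something I have checked.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DOMAINS 64  /* arbitrary limit for this sketch */

/* Hypothetical event kinds, usable as a bitmask by the caller. */
enum wait_ev {
    WAIT_EV_SHUTDOWN = 1u << 0,
    WAIT_EV_DEATH    = 1u << 1,
    WAIT_EV_TIMEOUT  = 1u << 2,   /* assumed: timeout shows up as an event */
};

/* Hypothetical event record: which watched domain, and what happened. */
struct fake_event {
    int slot;             /* index of the domain in the caller's list */
    enum wait_ev kind;
};

/*
 * Consume events until nr_domains domains have reported an event the caller
 * asked for. return_on is a mask of WAIT_EV_SHUTDOWN / WAIT_EV_DEATH;
 * 0 means "any of the two", which matches today's behaviour.
 * A timeout always ends the wait, whatever the mask says.
 *
 * Returns the number of domains that completed (possibly fewer than
 * nr_domains if the event stream runs out), or -1 on timeout.
 */
static int wait_for_events(const struct fake_event *evs, size_t nr_evs,
                           int nr_domains, unsigned return_on)
{
    unsigned char done[MAX_DOMAINS] = {0};
    int count = 0;
    size_t i;

    assert(nr_domains >= 0 && nr_domains <= MAX_DOMAINS);

    if (return_on == 0)
        return_on = WAIT_EV_SHUTDOWN | WAIT_EV_DEATH;

    for (i = 0; i < nr_evs && count < nr_domains; i++) {
        const struct fake_event *ev = &evs[i];

        if (ev->kind == WAIT_EV_TIMEOUT)
            return -1;                      /* always return on timeout */

        if (ev->slot < 0 || ev->slot >= nr_domains)
            continue;                       /* not one of our domains */

        if (!(ev->kind & return_on) || done[ev->slot])
            continue;                       /* not requested, or already counted */

        done[ev->slot] = 1;                 /* count each domain only once */
        count++;
    }
    return count;
}
```

With this, "xl shutdown --wait" could keep passing 0 (current behaviour),
while a hypothetical "xl destroy --wait" or a stricter shutdown wait would
pass WAIT_EV_DEATH and so skip over the intermediate shutdown event.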


> Ian.




_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

