[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] Fwd: Re: [PATCH v3 3/5] libxl: call hotplug scripts from libxl for vbd



Thanks for your reply.  I see you didn't reply to all of my comments.
Do you plan to address the others later ?

Roger Pau Monne writes ("Fwd: Re: [Xen-devel] [PATCH v3 3/5] libxl: call 
hotplug scripts from libxl for vbd"):
> Ian Jackson escribió:
> > So we no longer have any function which will synchronously and
> > immediately destroy a domain and throw away everything to do with it.
> > Is it actually the case that you think such a function cannot be
> > provided ?
> 
> It can be provided, but we should create another command or add an
> option to "destroy" (like -f), that doesn't wait for backend
> disconnection and just nukes both frontend/backend xenstore entries.
> This will also prevent us from executing hotplug scripts, and could lead
> to unexpected results.

Quite so.  But it is important to be able to disregard a
malfunctioning driver domain.  That would make it possible to shut
down all the guests using it, destroy the driver domain itself, and
restart it in a hopefully good state, for example.

> With this series we wait for the backend to disconnect for 10s, and if
> it doesn't respond we nuke the frontend and wait for another 10s. If
> after that it fails to disconnect we nuke the remaining backend xenstore
> entries.

I think we need to think about these timeouts.  At the very least
they're policy which should probably not be hardcoded in libxl; and
arguably people might want to be able to specify infinite ("I trust my
guest or driver domain and never want to pull the rug out from under
its feet") or zero ("this guest or driver domain has gone wrong and I
want it killed, now").

> >> +# Hack to prevent the execution of hotplug scripts from udev if the domain
> >> +# has been launched from libxl
> >> +if [ -n "${HOTPLUG_GATE}" ]&&  \
> >> +   `xenstore-read "$HOTPLUG_GATE">/dev/null 2>&1`; then
> >> +    exit 0
> >> +fi
> >
> > Oh I see.  Hmm.  Why should this string not be fixed ?  I'm not sure I
> > like the idea of being able to pass some random other xenstore path.
> 
> Anyway, I will have to pass an environmental variable when calling the
> script from udev, I can rename this to UDEV_CALL=1 if that sounds better.

Perhaps it would be better to do things the other way around, and have
an env var for the case where we're _not_ calling the script from
udev ?  After all, udev config is configuration (at least on my
distros) which the user may not update when the software is updated.

> > Also: this xenstore path should be a relative path, ie one relative to
> > the xenstore home of domain running this part of the tools.  That way
> > the scripts can be easily and automatically disabled for dom0 and
> > enabled in driver domains.
> 
> Something like:
> 
> /libxl/<domid>/disable_udev?
> 
> So that it embraces the whole domain instead of just applying to a
> single device?

Yes, except you want
  /local/domain/<domid>/libxl/disable_udev
or some such so that <domid> can access it via the relative path
  libxl/disable_udev
without needing to know its own domid.

> >> +    switch(disk->backend) {
> >> +    case LIBXL_DISK_BACKEND_PHY:
> >> +    case LIBXL_DISK_BACKEND_TAP:
> >> +        rc = libxl__initiate_device_add(egc, ao,&device);
> >> +        if (rc) goto out_free;
> >> +        break;
> >> +    case LIBXL_DISK_BACKEND_QDISK:
> >> +    default:
> >> +        libxl__ao_complete(egc, ao, 0);
> >> +        break;
> >> +    }
> >
> > Does this really need no confirmation from qemu ?
> 
> Qemu is not even started here, and it doesn't use any kind of hotplug
> scripts, so I think I can safely say yes.

Uh, what, LIBXL_DISK_BACKEND_QDISK but qemu is not started ?  How is
that possible ?

> >> +    rc = libxl__device_hotplug(gc, aorm->dev, DISCONNECT);
> >
> > This is not correct.  You need to do something more with the exit
> > status of the hotplug script than simply throwing it away as you do in
> > hotplug_exited.  libxl__device_hotplug needs to take a callback from
> > the caller so that it can notify the caller when the hotplug script
> > has exited.
> >
> > You need to not allow the ao to complete until the hotplug script has
> > exited.  Otherwise you will enter hotplug_exited after the
> > libxl__hotplug_state has been destroyed with the ao.
> 
> Ok, I think I've changed most of this stuff, and unified device
> addition/destruction on the same event, since it's just a matter of wait
> -> execute hotplug.

I'm not sure I quite understand your response, but OK :-).

> > If it's too slow and you need to time out, you need to send the
> > hotplug script a signal, and then wait for it to exit.  If necessary
> > that signal could be SIGKILL; SIGKILL can be relied on to cause the
> > hotplug script to actually terminate and thus the hotplug_exited
> > callback to occur.
> 
> Where is the best place to handle this? From what I see, we have no
> helper for doing this, so I guess I will have to use waitpid or
> something similar?

You must not use waitpid yourself.  libxl__ev_child_fork will do all
of that and simply call yo back when the child exits.

You can call kill on the pid at any time (provided that 1. you have
the ctx lock held and 2. the libxl__ev_child state struct is still
"inuse" ie still has the pid in it).

> > And I think the new interface is entirely wrong.  How does the caller
> > know when to complete the ao if libxl__initiate_device_remove never
> > calls back ?
> >
> > You need to split this into two functions: one with the current
> > interface which is a simple wrapper, used for all the existing call
> > sites, and one which never completes any ao but which always makes a
> > callback when the operation is complete.  That callback should be used
> > by the caller of libxl__initiate_device_remove to do whatever it needs
> > to, which might include completing the ao.
> 
> I understand what you are saying, but I have trouble thinking of a
> correct way to do this, since multiple events can be added to the same
> ao, and the libxl__initiate_device_remove "callback" will be called more
> than once, how do I know when I have to call ao_complete?

You have to maintain a data structure of some kind that tells you
whether you're expecting more callbacks and what they are.

> >> +/* Action to pass to hotplug caller functions */
> >> +typedef enum {
> >> +    CONNECT = 1,
> >> +    DISCONNECT = 2
> >> +} libxl__hotplug_action;
> >
> > These names are rather too generic, I think.
> 
> I've changed them to HOTPLUG_CONNECT and HOTPLUG_DISCONNECT.

Sounds good.

> >> +/*
> >> + * libxl__hotplug_exec is an internal function that should not be used
> >> + * directly, all hotplug script calls have to implement and use
> >> + * libxl__device_hotplug.
> >> + */
> >> +_hidden int libxl__device_hotplug(libxl__gc *gc, libxl__device *dev,
> >> +                                  libxl__hotplug_action action);
> >> +_hidden int libxl__hotplug_exec(libxl__gc *gc, char *arg0, char **args,
> >> +                                char **env);
> >
> > Why does libxl__hotplug_exec need to be declared here at all then ?
> > Could it be static in libxl_hotplug.c ?
> 
> No, because libxl_linux uses libxl_hotplug_launch, which is supposed to
> be a common launcher for both Linux and NetBSD.

Ah right.

> >> +    f_env = flexarray_make(9, 1);
> >> +    if (!f_env) {
> >> +        LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
> >> +                   "unable to create environment array");
> >> +        return NULL;
> >> +    }
> >
> > Isn't this an allocation failure which should be dealt with by
> > libxl__alloc_failed ?
> 
> Done, I think we should set a macro for
> 
> libxl__alloc_failed(ctx, __func__, 0,0);
> 
> Because I'm using that on more than one place.

Sure.  A helper macro for that would be fine.

> > For that matter, you've called libxl__device_kind_to_string twice.
> > Calling it only once would make this easier to read.
> >
> > I won't comment any more on places where I think GCSPRINTF,
> > LOG/LOGE/LOGEV and CTX might usefully replace what you have written.
> 
> I'm trying to spot most of them, but it's quite difficult. Are we going
> to do a massive replace of this at some point?

I'm not sure.  I don't currently have any plans to do so before 4.2.

But in new code it would be better to use the shorter convenience
macros simply to reduce the amount of nearly- meaningless repetition.

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.