
Re: [Xen-devel] [PATCH v16 2/7] remus: introduce remus device



Hongyang Yang writes ("Re: [PATCH v16 2/7] remus: introduce remus device"):
> On 07/16/2014 02:28 AM, Ian Jackson wrote:
> > Yang Hongyang writes ("[PATCH v16 2/7] remus: introduce remus device"):
> >>       libxl__xc_domain_saverestore_async_callback_done(egc, &dss->shs, 1);

Thanks for all your replies.  Everything that seemed good to me I
haven't responded to individually, so what is left is things that I
wanted to comment further on.

I've been trying to respond to these emails quickly, so that you can
make sure that the code changes arising from all of my comments, and
from the resulting conversations, are incorporated in your series.

FYI I don't intend to review v17 because I want to finish all these
conversations, and resolve the outstanding questions.  When we have
answered the questions and the answers have been incorporated in your
series, I will re-review it.  I hope that makes sense.

> > But this is a bit mystifying to me.  It seems that when you have
> > finished committing to the checkpoint, you delay by the
> > inter-checkpoint delay before returning.  Is that right ?  I think
> > this could at the very least do with some commentary explaining why...
> 
> This delay is the interval between checkpoints.  Remus does
> checkpoints periodically.

Yes, but why does this inter-checkpoint delay appear as a delay before
returning from the libxc checkpoint callback ?  There may be a good
reason, based on the libxc API semantics or something, but it isn't
evident from the doc comments or the code.

> >> @@ -1752,6 +1849,34 @@ static void domain_suspend_done(libxl__egc *egc,
> >>           xc_suspend_evtchn_release(CTX->xch, CTX->xce, domid,
> >>                              dss->guest_evtchn.port, 
> >> &dss->guest_evtchn_lockfd);
> >>
> >> +    if (dss->remus) {
> >
> > The usual libxl idiom for this kind of conditional processing can be
> > seen in libxl_create.c.  Search, for example, for
> > "domcreate_bootloader_done".  You'll see that what we do is
> > this:
> >
> >          if (restore_fd >= 0) {
> >              LOG(DEBUG, "restoring, not running bootloader\n");
> >                      domcreate_bootloader_done(egc, &dcs->bl, 0);
> >
> > So I think what you should do here is:
> >
> >        if (!dss->remus) {
> >            remus_teardown_done(egc, &dss->rds, 0);
> 
> If it's not a remus suspend, there's no rds...so we can not call
> remus_teardown_done.

AFAICT dss has the rds as a member.  So you can pass &dss->rds (and
then use CONTAINER_OF to recover the dss).  Your existing function
libxl__remus_teardown_done only uses the passed-in rds with
CONTAINER_OF to find the dss.  So the contents of dss->rds don't
matter (although in fact they will be blank rather than undefined,
because the dss will have been allocated with a zeroing allocator).
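For concreteness, a minimal sketch of what I mean (function and field
names follow your patch; the exact shape is of course up to you):

    /* in domain_suspend_done(): */
    if (!dss->remus) {
        /* not a remus suspend: skip the device teardown and go
         * straight to the completion path, passing the embedded
         * (unused, but zero-initialised) rds */
        remus_teardown_done(egc, &dss->rds, 0);
        return;
    }

    /* in the teardown completion function: */
    libxl__domain_suspend_state *dss = CONTAINER_OF(rds, *dss, rds);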

> >> +typedef enum libxl__remus_device_kind {
> >> +    LIBXL__REMUS_DEVICE_NIC,
> >> +    LIBXL__REMUS_DEVICE_DISK,
> >> +} libxl__remus_device_kind;
> >
> > AFAICT this enum is used only for filtering which subkind ops vtables
> > we attempt setup() on.  Maybe it would be better to remove this enum
> > and replace its uses in libxl__remus_devices_setup with references to
> > (lists of) vtables ?
...
> > Eg
> >      for (i = 0; i < rds->num_nics; i++) {
> >              libxl__remus_device_init(egc, rds,
> >                                       libxl__remus_device_subkinds_nic,
> >                                       &rds->nics[i]);
...
> Thanks for the detailed comments.  This enum is also used for match()
> and for other purposes, so I will keep it.

My proposal above would do away with the need for match to look at the
kind at all.  It would only iterate over the subkinds which are
relevant to that kind.
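
To illustrate (a sketch only; the individual ops structure names are
just indicative):

    /* one NULL-terminated list of subkind ops per kind */
    static const libxl__remus_device_subkind_ops
        *libxl__remus_device_subkinds_nic[] = {
        &remus_device_nic, NULL,
    };
    static const libxl__remus_device_subkind_ops
        *libxl__remus_device_subkinds_disk[] = {
        &remus_device_drbd_disk, NULL,
    };

Setup would then only ever offer a nic to the nic subkinds and a disk
to the disk subkinds, so match() never needs to inspect a kind field.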

What are the other purposes you mention ?  I didn't notice them.

> > Although, now I have read later in this file, it seems that this
> > subkind-specific state is stored in the per-remus-operation state
> > structure, not in the libxl_ctx.
> >
> > I see (for example) that libxl_netbuffer.c uses a netlink socket
> > per remus operation.  Would it be possible to share a netlink socket
> > between operations ?  I seem to remember this was a static global
> > variable in a previous version of the series, which suggests it would
> > be possible.
> >
> > If it is, why not put it in the libxl_ctx ?
> 
> It is in the libxl_ctx. init() will only be called one time per remus
> operation.

Oh, yes, I see, it is in the ctx.  But it is also unconditionally
allocated by nic_init.  So the code is wrong: if you run two remus
operations on the same ctx, each of them will allocate the
libxl__remus_netbuf_state.

I haven't looked at the teardown path but I suspect that it will leave
one of the operations going without a useable ns.

If you store this state in the ctx, and therefore share it amongst
multiple concurrent remus operations, you must set it up idempotently,
and only destroy it when there are no remus operations outstanding
(which I think means either refcounting, or - more sensibly - just
leaving it allocated until the ctx is destroyed).
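For example, something along these lines (a sketch only; the ctx field
name is hypothetical):

    /* in nic_init(), guard the ctx-wide state so that a second
     * concurrent remus operation reuses it rather than reallocating
     * it: */
    if (!CTX->remus_netbuf_state) {
        CTX->remus_netbuf_state =
            libxl__zalloc(NOGC, sizeof(*CTX->remus_netbuf_state));
        /* open the netlink socket here, once */
    }

and then free that state only when the ctx itself is destroyed, rather
than in the per-operation teardown path.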

> >> +    libxl__ao *ao;
> >> +    libxl__egc *egc;
> >> +    uint32_t domid;
> >> +    libxl__remus_callback *callback;
> >> +    /* the last ops must be NULL */
> >> +    const libxl__remus_device_subkind_ops **ops;
> >
> > Surely it shouldn't be the non-remus code's responsibility to provide
> > the vtable list ?  In fact I don't really understand why this needs to
> > be a parameter at all.  Isn't it always the same ?
> 
> We make the remus device abstraction layer more generic, because COLO
> will reuse the remus device layer, so it will have different ops.

I hadn't realised that you plan to use different ops for your high
availability system.  I'm tempted to ask all sorts of questions about
your plans for the future but this is a bit of a can of worms.

So we have two ways to approach this patch series.  One would be to
ask all the questions about how this will be done in the future in
your HA stack, to understand the motivations behind making this a
parameter here.  This would involve getting stuck into a review of the
design of your HA stack.

The other would be to put that aside and simply review these patches
on their own terms.

I think the former, proactive, approach is going to be too impractical
here.  So I am going to adopt the latter, unless you want to try to
persuade me otherwise.

On that basis I think that for now, at least, this should be fixed.
If it needs to be made general later, that can be addressed at that
time.

> >> +    /* for calling scripts, eg. setup|teardown|match scripts */
> >> +    libxl__async_exec_state aes;
> >> +    /*
> >> +     * for async func calls, in the implementation of device ops, we
> >> +     * may use fork to do async ops. this is owned by device-specific
> >> +     * ops methods
> >> +     */
> >> +    libxl__ev_child child;
> >
> > I still think these variables would be better in the subkind private
> > data structs.  Having them here is a layering violation.  The fact
> > that each of the subkinds needs these is a coincidence, not part of
> > the design.  So it is fine to write them out twice for that reason.
...
> If we move these 2 variables into device specific structs, that is into
> concrete_data, there will be some problems:
>    concrete_data will be initialised when setup() is called.
>    match() is called before setup(), but in match() we may call hotplug
> scripts or do async func calls, so these two vars may be needed.

I see.  This is rather unfortunate.

Why not merge match and setup ?  After all if match returns true we
always call setup...
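
I mean something along these lines (a sketch; the error value is the
one from your patch, the comment wording is mine):

    /* instead of separate match() and setup(): */
    /* Try to set up `dev' with this subkind.  Calls the completion
     * callback with rc==0 on success, ERROR_REMUS_DEVOPS_NOT_MATCH if
     * the device is not of this subkind, or another error code if
     * something actually went wrong. */
    void (*setup)(libxl__remus_device *dev);

That way concrete_data can simply be initialised at the start of the
combined setup, and the aes/child members can live in the subkind's
private struct.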


> >> +/*----- helper functions -----*/
> >> +
> >> +static int init_device_subkind(libxl__remus_device_state *rds)
> >> +{
> >> +    int rc;
> >> +    const libxl__remus_device_subkind_ops **ops;
> >> +
> >> +    for (ops = rds->ops; *ops; ops++) {
> >> +        rc = (*ops)->init(rds);
> >
> > Why not call init immediately before match() ?  That would do away
> > with this loop entirely.
> >
> > Furthermore, this seems to imply that init is idempotent.  This isn't
> > spelled out in your API comments.
> >
> > If init is idempotent, and is going to be called before match() every
> > time, why not fold it into match ?
> 
> An ops will be called many times in match(), but init()/destroy() only
> need to be called once, so I think it's better to separate them.

init is in fact called more than once.  Is it supposed to be
idempotent ?  If it is supposed to be idempotent then it can check at
the start whether anything needs to be done.  That would make the
lifetime of the per-subkind state a matter entirely for the subkind,
which would simplify the API.  And, I think, the code.

> And also if a subkind's init fails, we can exit immediately, and do
> not need to wait for match to complete.

I don't understand why this is important.  Performance isn't important
here, is it ?  And since we have to have the code for dealing with
asynchronous matching anyway, we don't gain any code simplicity from
separating init out.
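
What I am suggesting is roughly this (a sketch; the idempotence is the
new part, and the match() invocation should of course use whatever
arguments your match() actually takes):

    /* wherever match() is currently invoked: */
    rc = (*ops)->init(rds);    /* idempotent: a no-op if already done */
    if (rc) {
        device_match_cb(egc, dev, rc);
        return;
    }
    (*ops)->match(*ops, dev);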

> >> +static void device_match_cb(libxl__egc *egc,
> >> +                            libxl__remus_device *dev,
> >> +                            int rc)
> >> +{
...
> >> +    if (rc) {
> >> +        /* the ops does not match, try next ops */
> >
> > But what if rc==ERROR_MY_HEAD_EXPLODED ?
> 
> This case is treated as a failure; see below:
>    if (!dev->ops || rc != ERROR_REMUS_DEVOPS_NOT_MATCH) {

That contradicts the comment, though.
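
To spell out what I think the logic (and the comment) should say
(sketch only):

    if (rc == ERROR_REMUS_DEVOPS_NOT_MATCH) {
        /* this ops does not handle the device: try the next ops */
    } else if (rc) {
        /* a real failure (my head exploded): abort the whole setup */
    } else {
        /* matched: go on to set the device up */
    }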

> >> +void libxl__remus_devices_##api(libxl__egc *egc,                  \
> >> +                                libxl__remus_device_state *rds)   \
> >> +{                                                                 \
> >> +    int i;                                                        \
> >> +    libxl__remus_device *dev;                                     \
> >> +                                                                  \
> >> +    STATE_AO_GC(rds->ao);                                         \
> >> +                                                                  \
> >> +    rds->num_devices = 0;                                         \
> >> +    rds->saved_rc = 0;                                            \
> >> +                                                                  \
> >> +    if (rds->num_set_up == 0)                                     \
> >> +        goto out;                                                 \
> >
> > I think if you did as I suggest with check_all_devices_handled, you
> > would end up calling check_all_devices_handled on exit from this
> > function and therefore wouldn't need the special case.
> 
> This special case is needed because if rds->num_set_up == 0,
> rds->callback will not be called in the loop below.

I think if you did as I suggest with check_all_devices_handled, you
would end up calling check_all_devices_handled *on every exit path*
from this function and therefore wouldn't need the special case.
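
Roughly like this (a sketch; check_all_devices_handled is the helper I
suggested earlier, so its exact signature is up to you):

    for (i = 0; i < rds->num_set_up; i++) {
        /* ... kick off the asynchronous operation for device i ... */
    }

 out:
    /* runs on every exit path; if nothing was started it sees that no
     * operations are outstanding and calls rds->callback immediately,
     * so the num_set_up == 0 case needs no special handling */
    check_all_devices_handled(egc, rds);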

> > This function is strikingly similar to the previous unfolded async
> > loops.
> >
> > And it's also similar to libxl__ao_device and libxl__multidev_*.
> >
> > Can any of this be factored out ?  Can we in fact reuse
> > libxl__ao_device and libxl__multidev_* ?
> 
> I don't have a good solution to reuse libxl__ao_device and libxl__multidev_*
> currently, so I will keep the original code for now, but will refactor it.

I'm sorry, I don't understand what you mean by not having a `good
solution'.  If multidev can be used then it would be much better if
you used it.  There would be less duplication.
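
For reference, the shape of a multidev-based version would be roughly
this (a very rough sketch from memory of how libxl_device.c uses
libxl__multidev_*; the multidev member of rds and the callback name
are hypothetical, and the details would need checking against
libxl_internal.h):

    libxl__multidev_begin(ao, &rds->multidev);
    rds->multidev.callback = remus_devices_setup_done;
    for (i = 0; i < rds->num_nics; i++) {
        libxl__ao_device *aodev = libxl__multidev_prepare(&rds->multidev);
        /* start the per-device async work; completion is reported
         * through aodev, as in the other libxl device code */
    }
    libxl__multidev_prepared(egc, &rds->multidev, 0);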

Thanks,
Ian.
