[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v16 2/7] remus: introduce remus device



On Fri, Jul 18, 2014 at 10:24 AM, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> wrote:
Hongyang Yang writes ("Re: [PATCH v16 2/7] remus: introduce remus device"):
> Â ÂThere're two comments left that I wanted to comment further on.
> Apart from this two comments, I think I have addressed all your
> comments on v16, but before I send a v18 out, Shriram and I will
> have more detailed review on it, in case I have missed some of
> your comments.

Right. ÂThanks.

> My latest working tree that we will review on is hosted at github:
> https://github.com/macrosheep/xen/tree/remus-v17.6
> Refer to the latest 10 commits.
> What I think is the big changes compared to v16:
> Â - Merge match() and setup() api.
> Â - Reuse libxl__multidev and libxl__ao_device

Oh, good.


> >> Thanks for the detailed comments. this enum also used for match() and
> >> other uses, so I will keep it.
> >
> > My proposal above would do away with the need for match to look at the
> > kind at all. ÂIt would only iterate over the subkinds which are
> > relevant to that kind.
> >
> > What are the other purposes you mention ? ÂI didn't notice them.
>
> I intend to keep this *kind* because I will use them for filter out
> what kind of device the remus abstract layer need to handle.

Ah, I see, thanks for the explanation. ÂI'm still not entirely
convinced though:

> and this flag will be set by upper layer(remus save/restore) depend
> on the comand switch:
> int libxl_domain_remus_start(libxl_ctx *ctx, libxl_domain_remus_info *info,
> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â uint32_t domid, int send_fd, int recv_fd,
> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â const libxl_asyncop_how *ao_how)
> {
> ...
> Â Â Âif (info->netbuf) {
> Â Â Â Â Âif (!libxl__netbuffer_enabled(gc)) {
> Â Â Â Â Â Â ÂLOG(ERROR, "Remus: No support for network buffering");
> Â Â Â Â Â Â Âgoto out;
> Â Â Â Â Â}
> Â Â Â Â Ârds->device_kind_flags |= LIBXL__REMUS_DEVICE_NIC;
> Â Â Â}
>
> Â Â Âif (info->diskbuf)
> Â Â Â Â Ârds->device_kind_flags |= LIBXL__REMUS_DEVICE_DISK;

So what is actually going on here is a mechanism for disabling remus
processing for certain kinds of device.

But wouldn't one perhaps want some devices of the same kind to have
remus enabled, and some disabled ?

No. By default, Remus would enable buffering for both disks and networks.
Leaving any one of these devices out would not guarantee consistency.Â

What is happening above is that Remus is trying to enable network buffering
if the info->netbuf flag is set (which is the default, unless the user explicitly disables
network buffering). ÂA similar check is performed for disks as well.

I think things would be more clearer if the if blocks became something like this:

if (info->netbuf) {
Â...
} else
 LOG(WARN, "Network buffering disabled. Failover may not be successful")

ÂA similar if-else block for the disk buffering should help.
ÂIn fact, I would suggest pushing these warning statements out to the xl cmdline
Âcode, so that even if the user were to filter out all the non-ERROR messages
Âthis warning would always be displayed in the console. [My knowledge of the
libxl log filtering is poor. So correct me if I am wrong about this.]


I'm not sure that this grouping makes much sense. ÂBut perhaps you can
explain why one might want to disable all network buffering and enable
all disk buffering, but why one would never want to enable buffering
only for a subset of nics, or a subset of disks.


Answered above. There is no way to guarantee consistency upon recovery.
I can certainly see use cases for this selective protection. But I consider these as
"exceptions" to the normal usage scenarios. Nice to haves, IMO and not a priority ATM.
Â

If this _is_ desirable, I think this kind of explicit conversion from
one type to another is undesirable if it can be eliminated.

In particular, perhaps what this really demonstrates is that the
commonality between libxl_domain_remus_info.netbuf and .diskbuf ought
to be properly represented in the idl for libxl_domain_remus_info.

Perhaps we should replace these by an array of defbools indexed by the
device kind ? ÂThe device kind would have to be in the IDL as part of
the API.



I am not upto speed on the implementation front. So I'll defer it to Hongyang.
Â
> >> and also if a subkind init failed, we can exit immediately, and not
> >> need to wait for match complete.
> >
> > I don't understand why this is important. ÂPerformance isn't important
> > here, is it ? ÂAnd since we have to have the code for dealing with
> > asynchronous matching anyway, we don't gain any code simplicity from
> > separating init out.
>
> a subkind init()/cleanup() only need to be called once per Remus instance.
> there're cases that this subkind is used for multiple devices, it we don't
> seperate it out, it will be called many times. I agree that I can make
> it capable to be called multiple times, but I think that's not what the
> design is. it is not supposed to be idempotent I think. and I think
> separate it makes the logic more comprehensive.

The cleanup needs to be idempotent anyway.

Making a setup function idempotent is normally easy. ÂYou just need to
spot at the start whether the setup has already been done.

Thanks,
Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.