
Re: [Xen-devel] [PATCH 20 of 29 RFC] libxl: introduce libxl hotplug public API functions

On Thu, 2012-02-09 at 16:18 +0000, Stefano Stabellini wrote:
> On Thu, 9 Feb 2012, Ian Campbell wrote:
> > On Thu, 2012-02-09 at 16:00 +0000, Stefano Stabellini wrote:
> > > On Thu, 9 Feb 2012, Ian Campbell wrote:
> > > > On Thu, 2012-02-09 at 15:32 +0000, Stefano Stabellini wrote:
> > > > > On Thu, 9 Feb 2012, Ian Jackson wrote:
> > > > > > Stefano Stabellini writes ("Re: [Xen-devel] [PATCH 20 of 29 RFC] 
> > > > > > libxl: introduce libxl hotplug public API functions"):
> > > > > > > - we can reuse the "state" based mechanism to establish a 
> > > > > > > connection:
> > > > > > > again not a great protocol, but very well known and understood.
> > > > > > 
> > > > > > I don't think we have, in general, a good understanding of these
> > > > > > "state" based protocols ...
> > > > > 
> > > > > What?! We have netback, netfront, blkback, blkfront, pciback, 
> > > > > pcifront,
> > > > > kbdfront, fbfront, xenconsole, and these are only the ones in Linux!!
> > > > 
> > > > And no one I know is able to describe, accurately, exactly what the
> > > > state diagram for even one of those actually looks like or indeed should
> > > > look like. It became quite evident in these threads about hotplug script
> > > > handling etc that no one really knows for sure what (is supposed to)
> > > > happens when.
> > > 
> > > I thought that most of the thread was about the interface with the block
> > > scripts, that is an entirely different matter and completely obscure.
> > > If I am mistaken, please point me at the right email.
> > 
> > We are talking about reusing the existing xenbus state machine schema
> > for a new purpose. Ian J pointed out that these are not generally well
> > understood, you replied that it was and cited some examples. I pointed
> > out why these were not examples of why this stuff was well understood at
> > all, in fact quite the opposite.
> 
> Sorry but I don't understand how these examples are supposed to be
> "quite the opposite".
> I quite like the idea of being able to read a single source file of less
> than 400 LOC to understand how a protocol works
> (drivers/input/misc/xen-kbdfront.c).

That is not a protocol specification, merely one implementation of it.
What does the BSD driver do? Is it exactly the same as Linux? Should BSD
driver authors be expected to reverse engineer the protocol from the
Linux code? What/who arbitrates when the two behave differently?

> In fact I don't think that understanding the protocol has been an issue
> for the GSoC student that had to write a new one.

Being able to reverse engineer something which works is not proof that
these things are "well understood" in the general case.

> I think we are under the influence of a "reinventing the wheel" virus.

I think we are in danger of making the same mistakes again as have been
made with the device protocols and this is what I want to avoid.

Now, perhaps this style of state machine protocol is a reasonable design
choice in this case, but since we are starting afresh here, this specific
new instance should be well documented _up_front_, not left in the "oh,
just read the Linux code" state we have now for many of our devices,
which has led to multiple slightly divergent implementations of the
same basic concept.

> > > > Justin just posted a good description for blkif.h which included a state
> > > > machine description. We need the same for pciif.h, netif.h etc etc.
> > >  
> > > The state machine is the same for block and network.
> > 
> > No, it's not. This is exactly what IanJ and I are talking about.
> 
> Could you please elaborate?
> I am sure you know that the xenstore state machine is handled the same
> way for all the backends in QEMU (see hw/xen_backend.c).
> And the same thing is true for the frontends and the backends in Linux.

A substantial proportion of the threads about this hotplug script stuff
has been about the fact that no one is quite sure what really happens
when for all implementations nor what the common semantics are.

e.g. how do you ask a backend to shut down (do you set it to state 5?
state 6? do you nuke the xenstore dir?). Nor is anyone sure when the
correct point to call the hotplug scripts actually is, or even what
actually happens with them right now across the different backend
drivers or kernel types.
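
For reference, the state numbers above are the values of enum
xenbus_state in xen/include/public/io/xenbus.h. A minimal sketch of that
numbering follows. The Python names and the describe() helper are
hypothetical, for illustration only, and the sketch deliberately says
nothing about which transitions are legal, since that is exactly the part
which is not written down:

```python
from enum import IntEnum


class XenbusState(IntEnum):
    """Numeric values mirror enum xenbus_state in Xen's public io/xenbus.h.

    The Python spellings of the names are an assumption of this sketch.
    """
    UNKNOWN = 0
    INITIALISING = 1
    INIT_WAIT = 2
    INITIALISED = 3
    CONNECTED = 4
    CLOSING = 5
    CLOSED = 6
    RECONFIGURING = 7
    RECONFIGURED = 8


def describe(raw: str) -> str:
    """Map the decimal string held in a xenstore "state" node to a name.

    Hypothetical helper; raises ValueError for an unknown number.
    """
    return XenbusState(int(raw)).name


if __name__ == "__main__":
    # The two candidate "please shut down" values mentioned above.
    for raw in ("5", "6"):
        print(raw, describe(raw))
```

So "state 5" is Closing and "state 6" is Closed; which of the two a
toolstack should write, and what each backend does in response, is the
open question.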

The actual state transitions which netback and blkback go through are
not the same: the netback protocol uses InitWait and the blkback one
does not (or is it vice versa? I can't remember, and it isn't
documented). Some Linux frontends handled the kexec reconnect sequencing
differently, by disconnecting or reconnecting the actual underlying
devices at subtly different times and/or handling the transition from
Closing back to Init or InitWait differently.

And this is just for Linux talking to Linux.

I know for sure that the Windows frontends follow a different state
transition path to Linux (and that it has interacted badly with the
kexec differences in the Linux backends discussed above). I bet BSD has
some subtle differences in behaviour too.

The fact is that none of our device state machine protocols are well
documented (although blkif.h is about to be). If this stuff were well
understood we would already have such documentation because it would be
trivial to write -- but it is not. If you disagree then please document
the netif state machine protocol in the form of a patch to netif.h.

