Re: [Xen-devel] [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors

Venu Busireddy writes ("Re: [PATCH v3 1/2] libxl: Implement the handler to 
handle unrecoverable AER errors"):
> On 2017-08-08 15:33:01 +0100, Wei Liu wrote:
> > I think a bigger question is whether you agree with Ian's comments
> > regarding API design and whether you have more questions?
> Ian suggested that I document the use of the API (about the event loop),
> and I believe I addressed it. I don't have any more questions. Just
> waiting for Ian's "Ack", or more comments.

I'm afraid that I still have reservations about the design questions.
Evidently I didn't make my questions clear enough.

The most important question that seems unanswered to me is this:

  Why is this only sometimes the right thing to do ?  On what basis
  might a user choose ?

To which you answered:

  This is not an "only sometimes" thing. User doesn't choose it. We always
  want to watch for AER errors.

But this leads to more fundamental questions.

If this behaviour is always required, why do we have an API call to
request it ?  It sounds like not calling this new function of yours is
always a mistake.  Ie this function (which has an obscure name) is
like "IAC DONT RANDOMLY-LOSE" (see RFC748, from 1st April 1978)
except that you are making DO RANDOMLY-LOSE the default (in violation
of the RFC, should anyone talk to the server over telnet...)

If you are inventing a new kind of monitoring process that must be run
for all domains, that is a thing that libxl does not have right now.
At least, it doesn't have it in this form.  (xl has the reboot
monitor, and this is done differently in libvirt.)

It was indeed a design principle of libxl that it should (at least,
wherever possible) be possible to run a domain _without_ a monitoring
process imposed by libxl.

So: why is the thing this API call requests not done automatically
by pciback or by Xen ?

And: if you are inventing a new monitoring process that must be run
for every domain, you should call this out much more explicitly as a
fundamental design change.

We will then have to think about more questions: should this process
be run automatically by libxl, without special application request
(like the way that libxl runs qemu) ?

If not, how do we ensure that exactly one of these processes is
running for each guest ?

If your new design involves new behaviour in callers of libxl, do you
intend to send patches for libvirt to enable it ?

Looking at the code:

You handle errors by logging and continuing.  Why is that correct ?

If we are to keep the current API for the client, it needs to have
better doc comments.

Is the xenstore watch implementation vulnerable to unexpected paths
appearing in watch events ?
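
To illustrate the kind of check I have in mind, here is a minimal
sketch.  It assumes the handler knows the path it registered the watch
on and wants to ignore any event whose path is neither that node nor
something beneath it.  The function name is hypothetical; it is not an
existing libxl or libxenstore call, and the real patch may need a
stricter or different test.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libxl): returns true only if
 * event_path is the watched node itself or lies beneath it.
 * A bare prefix match is not enough: "/a/bc" must not match "/a/b".
 */
static bool watch_event_path_expected(const char *watched,
                                      const char *event_path)
{
    assert(watched != NULL && event_path != NULL);

    size_t n = strlen(watched);

    /* An empty watched path would match everything; refuse it. */
    if (n == 0 || strncmp(event_path, watched, n) != 0)
        return false;

    /* Accept an exact match, or a child separated by '/'. */
    return event_path[n] == '\0' || event_path[n] == '/';
}
```

A handler using something like this would discard (and perhaps log)
any event for which it returns false, rather than acting on the path.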

Why is the API not a never-completing ao ?  Or, why is it not an
evreg ?

But the fundamental design questions need answering first.
