Xen project Mailing List

Re: [Xen-devel] [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors

To: Venu Busireddy <venu.busireddy@xxxxxxxxxx>

From: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>

Date: Thu, 21 Sep 2017 18:12:54 +0100

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, xen-devel@xxxxxxxxxxxxx

Delivery-date: Thu, 21 Sep 2017 17:13:50 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Venu Busireddy writes ("Re: [PATCH v3 1/2] libxl: Implement the handler to handle unrecoverable AER errors"):
> On 2017-08-08 15:33:01 +0100, Wei Liu wrote:
> > I think a bigger question is whether you agree with Ian's comments
> > regarding API design and whether you have more questions?
>
> Ian suggested that I document the use of the API (about the event loop),
> and I believe I addressed it. I don't have any more questions. Just
> waiting for Ian's "Ack", or more comments.

I'm afraid that I still have reservations about the design questions.
Evidently I didn't make my questions clear enough.

The most important question that seems unanswered to me is this:

  Why is this only sometimes the right thing to do ?  On what basis
  might a user choose ?

To which you answered:

  This is not an "only sometimes" thing. User doesn't choose it. We
  always want to watch for AER errors.

But this leads to more fundamental questions.  If this behaviour is
always required, why do we have an API call to request it ?  It sounds
like not calling this new function of yours is always a mistake.

Ie this function (which has an obscure name) is like "IAC DONT
RANDOMLY-LOSE" (see RFC748, from 1st April 1978) except that you are
making DO RANDOMLY-LOSE the default (in violation of the RFC, should
anyone talk to the server over telnet...)

If you are inventing a new kind of monitoring process that must be run
for all domains, that is a thing that libxl does not have right now.
At least, it doesn't have it in this form.  (xl has the reboot
monitor, and this is done differently in libvirt.)

It was indeed a design principle of libxl that it should (at least,
wherever possible) be possible to run a domain _without_ a monitoring
process imposed by libxl.

So: why is what this API call requests, not done automatically by
pciback or by Xen ?

And: if you are inventing a new monitoring process that must be run
for every domain, you should call this out much more explicitly as a
fundamental design change.

We will then have to think about more questions: should this process
be run automatically by libxl, without special application request
(like the way that libxl runs qemu) ?  If not, how do we ensure that
exactly one of these processes is running for each guest ?

If your new design involves new behaviour in callers of libxl, do you
intend to send patches for libvirt to enable it ?

Looking at the code:

You handle errors by logging and continuing.  Why is that correct ?

If we are to keep the current API for the client, it needs to have
better doc comments.

Is the xenstore watch implementation vulnerable to unexpected paths
appearing in watch events ?

Why is the API not a never-completing ao ?  Or, why is it not an
evreg ?

But the fundamental design questions need answering first.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
