[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 1/2] libxl: Implement the handler to handle unrecoverable AER errors.

On 2017-07-28 17:39:52 +0100, Ian Jackson wrote:
> Venu Busireddy writes ("[PATCH v2 1/2] libxl: Implement the handler to handle 
> unrecoverable AER errors."):
> > Implement the callback function to handle unrecoverable AER errors, and
> > also the public APIs that can be used to register/unregister the handler.
> > When an AER error occurs, the handler will forcibly remove the erring
> > PCIe device from the guest.
> Why is this only sometimes the right thing to do ?  On what basis
> might a user choose ?

This is not an "only sometimes" thing. User doesn't choose it. We always
want to watch for AER errors.

> If this is always the right thing to do then maybe we need to recast
> this as a general "please run monitoring for this domain" call ?

What does "recast" imply? Which "call" are you referring to?

> > +int libxl_reg_aer_events_handler(libxl_ctx *, uint32_t) 
> > +void libxl_unreg_aer_events_handler(libxl_ctx *, uint32_t) 
> I think these names are very unintuitive.  They describe the
> implementation, not the effect.

These names can be changed to anything you want. Please suggest any names
of your choice, and I will change them. That ensures that we don't spend
more review revisions in fine tuning those names.

> The API seems awkward.  Inside libxl, events are only processed while
> the application is inside libxl.  So for these functions to be
> effective, the calling application must arrange to be running the
> libxl event loop.  This should be documented, at least.

I am afraid I don't follow you. This scheme is tested and it works. So, I
do not follow you when you say, "...for these functions to be effective,..."

> What happens if more than one process calls this at once ?

Each call is handled within a separate 'xl' process's context. I don't
see a problem there. Do you have anything specific in mind?

> I looked at the message referred to in the 0/2
>   https://lists.xen.org/archives/html/xen-devel/2017-06/msg03274.html
>   https://lists.xen.org/archives/html/xen-devel/2017-06/msg03269.html
> and they says that the approach taken is to kill the guest.
> But the approach in these patches is not to kill the guest.
> What am I missing ?

My error. The first revision of these patches were coded to kill the
guest, and hence the commit message in that patch is still referring to
killing the guest. I will repost the xen_pciback patch with the correct
commit message, once we conclude the xen patch.

> > +typedef struct {
> > +    uint32_t domid;
> > +    libxl__ev_xswatch watch;
> > +} libxl_aer_watch;
> > +static libxl_aer_watch aer_watch;
> The global variable is completely unacceptable, I'm afraid.

It was much easier to code and understand this way. I can avoid the
global variable. I will fix this in the next revision.



> > +static void aer_backend_watch_callback(libxl__egc *egc,
> > +                                       libxl__ev_xswatch *watch,
> > +                                       const char *watch_path,
> > +                                       const char *event_path)
> > +{
> ...
> I haven't read this code in detail at this stage.
> Thanks,
> Ian.

Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.