[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/6] xen: Add support for hiding and unhiding pcie passthrough devices

On 2017-07-06 02:45:18 -0600, Jan Beulich wrote:
> >>> On 05.07.17 at 21:38, <venu.busireddy@xxxxxxxxxx> wrote:
> > On 2017-07-04 09:46:58 -0600, Jan Beulich wrote:
> >> >>> On 27.06.17 at 19:14, <venu.busireddy@xxxxxxxxxx> wrote:
> >> 
> >> First of all, please Cc all maintainers of code you modify.
> > 
> > I was using the names spit out by the scripts/get_maintainer.pl script
> > for the patch file. I didn't know that the script had a "-f" option, and
> > without it, the script spits out only two names, which I included. I now
> > have Cc'ed all the names that the "-f" option produced. Interestingly,
> > Daniel's name is not in the "-f" output, and hence, I am still confused
> > what the correct list is!
> I can't talk about the script, except that it is known to have
> limitations. Generally, changes to the public interface should be
> Cc-ed to all REST maintainers.

Understood. I assume that when in doubt, it is better to send to REST
maintainers than fewer people? I mean, is it better to send to more
people than omit some?

> >> > Add support for hiding and unhiding (by introducing two new hypercall
> >> > subops) pci devices that trigger AER fatal errors while assigned to
> >> > guests in passthrough mode. Hiding of the device is done by assigning
> >> > it to dom_xen dummy domain.
> >> 
> >> Would you mind explaining why simply de-assigning the device
> >> (with an existing operation) isn't suitable here? (This explanation
> >> would presumably belong either in the description here or in the
> >> cover letter.)
> > 
> > My initial thinking (for the first revision) was that the guest and
> > the device together are party to the evil things, and hence the guest
> > should be killed. But I agree that unassigning the device should be
> > sufficient. Once the device is removed, the guest can't do much that
> > any other guest can't. Therefore, I will change this patchset to simply
> > unassign the device from the guest.
> > 
> > Is that acceptable?
> I think so, but I may be missing parts of your reasoning as to why
> hiding the device may be a good thing.

Here is the rationale behind hiding the erring device.

If a device is misbehaving, one of the following two things could be

a) The error is caused by the misconfiguration of the guest driver or
   the firmware. This may not be a big problem.

b) The error is caused by the owner of the domain re-flashing the firmware
   of the device and inserting a rogue firmware. This is a big problem.

And the problem is that we can't differentiate between a) and b).

If it is case b), then we certainly need to investigate and make sure
that the firmware is the correct version and/or reload a new firmware to
over-write the old one (just to be safe). Either way, the device needs to
be unassignable until the root cause is investigated. Hiding the device
is the safest way to ensure that the device is unassignable. Otherwise,
the administrator may inadvertently reboot the domain to which the
device was assigned, or, the domain itself may reboot upon errors, and in
either case, the device gets reassigned to the domain upon reboot! Hiding
the device prevents this.

However, if you think that all of this is too much paranoia, I am fine
with not hiding the device, and we simply de-assign the device from the
domain. I leave the decision to you.

> >> > --- a/xen/include/public/domctl.h
> >> > +++ b/xen/include/public/domctl.h
> >> > @@ -1222,6 +1222,9 @@ struct xen_domctl {
> >> >  #define XEN_DOMCTL_gdbsx_pausevcpu             1001
> >> >  #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
> >> >  #define XEN_DOMCTL_gdbsx_domstatus             1003
> >> > +#define XEN_DOMCTL_hide_device                 2001
> >> > +#define XEN_DOMCTL_unhide_device               2002
> >> > +#define XEN_DOMCTL_test_hidden_device          2003
> >> 
> >> Why these strange numbers?
> > 
> > I saw the numbers jump from 79 to 1000 thru 1003, and likewise used
> > different starting numbers. Would you prefer 80 thru 82, or 1004 thru
> > 1006? Of course, depends on whether we support the hide/unhide operations.
> The gdbsx ones were chosen this way long ago, perhaps to have
> them out of the way from all "normal" ones.

As also suggested by Konrad, I will use 80 thru 82.


Xen-devel mailing list



Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.