
Re: [Xen-devel] pci-passthrough loses msi-x interrupts ability after domain destroy



Hi,

On Fri, Sep 22, 2017 at 03:25:00PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 22, 2017 at 09:35:40AM +0200, Sander Eikelenboom wrote:
> > On 22/09/17 04:09, Christopher Clark wrote:
> > > On Thu, Sep 21, 2017 at 1:27 PM, Sander Eikelenboom
> > > <linux@xxxxxxxxxxxxxx> wrote:
> > >>
> > >> On Thu, September 21, 2017, 10:39:52 AM, Roger Pau Monné wrote:
> > >>
> > >>> On Wed, Sep 20, 2017 at 03:50:35PM -0400, Jérôme Oufella wrote:
> > >>>>
> > >>>> I'm using PCI pass-through to map a PCIe (intel i210) controller into
> > >>>> a HVM domain. The system uses xen-pciback to hide the appropriate PCI
> > >>>> device from Dom0.
> > >>>>
> > >>>> When creating the HVM domain after a hypervisor cold boot, the HVM
> > >>>> domain can access and use the PCIe controller without problem.
> > >>>>
> > >>>> However, if the HVM domain is destroyed then restarted, it won't be
> > >>>> able to use the pass-through PCI device anymore. The PCI device is
> > >>>> seen and can be mapped, however, the interrupts will not be passed to
> > >>>> the HVM domain anymore (this is visible under a Linux guest as
> > >>>> /proc/interrupts counters remain 0). The behavior on a Windows10 guest
> > >>>> is the same.
> > >>>>
> > >>>> A few interesting hints I noticed:
> > >>>>
> > >>>> - On Dom0, comparing 'lspci -vv' output for that PCIe device between
> > >>>> the "working" and the "muted interrupts" states, I noted a
> > >>>> difference in the MSI-X caps:
> > >>>>
> > >>>> - Capabilities: [70] MSI-X: Enable- Count=5 Masked- <-- IRQs will work if domain started
> > >>>> + Capabilities: [70] MSI-X: Enable- Count=5 Masked+ <-- IRQs won't work if domain started
> > >>>>                                             ^^^^^^^
> > >>
> > >>> IMHO it seems that either your device is not able to perform a reset
> > >>> successfully, or Linux is not correctly performing such reset. I don't
> > >>> think there's a lot that can be done from the Xen side.
> > >>
> > >> Unfortunately, for a lot of PCI devices the simple reset performed by
> > >> default isn't enough, and almost none of them support a real PCI FLR.
> > >>
> > >> In the distant past Konrad made a patchset that implemented a bus
> > >> reset and a reset of config space. (It piggybacked on the already
> > >> existing libxl mechanism of trying to call a sysfs "do_flr" attribute,
> > >> which triggers pciback to perform the bus reset and rewrite the
> > >> config space for the device.)
> > >>
> > >> I have used that patchset ever since for my PCI passthrough needs and
> > >> it works pretty well. I can shut down and restart VMs with PCI devices
> > >> passed through (also AMD Radeon graphics cards).
> > > 
> > > Just to confirm the utility of that piece of work: OpenXT also uses an
> > > extended version of that same patch to perform device reset for
> > > passthrough.
> > > 
> > > I've attached a copy of that OpenXT patch to this message and it can
> > > also be obtained from our git repository:
> > > https://github.com/OpenXT/xenclient-oe/blob/f8d3b282a87231d9ae717b13d506e8e7e28c78c4/recipes-kernel/linux/4.9/patches/thorough-reset-interface-to-pciback-s-sysfs.patch
> > > This version creates a sysfs node named "reset_device" and the OpenXT
> > > libxl toolstack is patched to use that node instead of "do_flr".
> > 
> > Nice to hear there are more users of this patch. On #xen on IRC there
> > were also users from time to time who tried PCI passthrough and ran into
> > this issue (and probably abandoned the idea, since having to restart
> > your host before being able to use your passed-through device again
> > defeats much of the use case).
> >  
> > > Konrad's original work encountered pushback on upstream acceptance at
> > > the time it was developed. I'm not sure I've found where that
> > > discussion ended. Is there any prospect of a more comprehensive reset
> > > mechanism being accepted into xen-pciback, or elsewhere in the kernel?
> > 
> > Yeah, it was NACKed by David Vrabel and the discussion somewhat bled to
> > death.
> > From what I remember the main issue was with the naming: since it doesn't
> > do an FLR, the sysfs hook shouldn't be called "do_flr".
> > 
> > Some other, perhaps minor, issues I can think of are:
> > - No way to exempt PCI devices from this new way of resetting them.
> >   Perhaps there could be PCI devices/topologies where this way of
> >   resetting causes more problems than it solves and could cause a
> >   regression. Unfortunately, auto-detecting what works doesn't seem to
> >   be possible. On the other hand (though only with my n=10) I haven't
> >   encountered such a device yet.
> > 
> > - The communication path between libxl and the kernel via sysfs.
> >   I think the preference was for either:
> >   a) having it use a more commonly used Xen communication channel, or
> >   b) having it all self-contained in pci-back. (From my memory and the
> >      OpenXT patch description, there could be some locking issue when
> >      trying to implement it this way, but the vfio guys had that solved
> >      for their reset implementation, judging from one of the comments in
> >      their source code; patches by Alex Williamson, if I remember
> >      correctly.)
> > 
> > - Not an issue back when the patch was made, but, as in my earlier
> >   question to Roger, the hypervisor seems to be getting more involved
> >   with PCI devices through the PVH dom0 work. If and how does that
> >   relate to pci-back, PCI passthrough and (the location of) the
> >   resetting mechanisms?
> > 
> > 
> > So I think David's NACK was mostly about the patchset having some hackish
> > cosmetics.
> 
> He didn't like 'do_flr', which made sense as the patchset did not do an FLR.
> It did a bus reset for more than one device (if those devices were assigned
> to pciback).
> 
> > 
> > On the upside, one can conclude that this patchset is now pretty well
> > tested over the years ;)
> > 
> > Since David has left, perhaps Juergen/Boris/Konrad could express their
> > views (again)?
> > (CC'ed them as well)
> 
> I've asked Govinda (CC-ed) to refresh the patchset against the latest kernel,
> repost it, and see where it goes.
> 

Nice. Looking forward to seeing the refreshed patchset hit the mailing list! :)


Thanks,

-- Pasi

> > 
> > > As noted in the original LKML threads, vfio has similar relevant pci
> > > device reset retry logic. (Thanks to Rich Persaud for this pointer:)
> > > http://elixir.free-electrons.com/linux/v4.14-rc1/source/drivers/vfio/pci/vfio_pci.c#L1353
> > > 
> > > libvirt also performs similar reset logic, using a direct low level
> > > interface to config space (Thanks to Marek for this pointer, libvirt
> > > is used by Qubes:)
> > > https://github.com/libvirt/libvirt/blob/master/src/util/virpci.c#L929
> > > I think this indicates that it would be possible to extend libxl to
> > > do something similar, but that seems less satisfactory compared to
> > > performing the work in a kernel-provided implementation.
> > > 
> > > Is there a way forward to providing this functionality within Xen
> > > software or Linux?
> > > 
> > > Christopher
> > > --
> > > 
> > > openxt.org
> > > 
> > 
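
For anyone who wants to check whether a device is in the "muted
interrupts" state described at the top of the thread before starting a
guest, the MSI-X line from 'lspci -vv' can be parsed with a few lines of
Python. This is only a rough sketch, not part of any patch in this
thread: the function names parse_msix() and looks_stuck() are made up for
illustration, and it assumes the capability line has the same layout as
the one Jérôme quoted.

```python
import re

# Matches the MSI-X capability line as printed by `lspci -vv`, e.g.
#   Capabilities: [70] MSI-X: Enable- Count=5 Masked+
# The layout is assumed to match the example quoted in this thread.
MSIX_RE = re.compile(r"MSI-X:\s+Enable([+-])\s+Count=(\d+)\s+Masked([+-])")


def parse_msix(lspci_output):
    """Return the MSI-X state found in lspci text, or None if absent."""
    match = MSIX_RE.search(lspci_output)
    if match is None:
        return None
    return {
        "enabled": match.group(1) == "+",
        "count": int(match.group(2)),
        "masked": match.group(3) == "+",
    }


def looks_stuck(lspci_output):
    """True if MSI-X is disabled but still function-masked (Enable- Masked+).

    This is the state reported in this thread after the domain was
    destroyed; it is a heuristic, not a definitive test.
    """
    state = parse_msix(lspci_output)
    return state is not None and state["masked"] and not state["enabled"]
```

Feed it the output of 'lspci -vv -s <BDF>' run as root in Dom0 (without
root, lspci typically does not show the capability details).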

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

