
Re: [Xen-devel] Status of FLR in Xen 4.4



On Fri, Sep 27, 2013 at 02:27:46PM +0100, Gordan Bobic wrote:
> On Fri, 27 Sep 2013 14:26:31 +0200, Matthias
> <matthias.kannenberg@xxxxxxxxxxxxxx> wrote:
> >Hi Gordan,
> >
> >I tried your patch on my dom0 kernel and I think it helped, in
> >the sense that I can now reboot the domUs without crashing the
> >whole host. However, the Linux domU still gets a black screen, and the
> >Windows 7 domU only boots as far as a black screen with a (movable)
> >cursor, but no further. This might only be a coincidence, though; I
> >have to double-check this.
> 
> What patch? Nothing I posted to the list is fit for public
> consumption yet. You shouldn't be using it unless you really,
> REALLY know exactly what it does and know exactly what you
> are trying to achieve.
> 
> >I tried some other stuff, too:
> >
> >1) after domU shutdown, rebind both functions to the dom0 drivers, do a
> >sysfs reset and re-add to assignable devices -> crashes dom0
> 
> My experience shows that letting dom0 drivers ever touch the hardware
> is a recipe for disaster.
> 
> >2) after domU shutdown, rebind both functions to the dom0 drivers and
> >re-add to assignable devices -> dom0 sometimes crashes when the domU
> >using the devices comes up, sometimes not, but no success either way
> >3) sysfs reset of the devices within the domU seems to be passed through
> >dom0 (see commands in qemu-log) but has no effect
> 
> It's up to the drivers to do the sensible thing. Nvidia drivers
> handle this a little more sanely, but if the drivers cannot handle
> clobbering the device's state into a known state, you are pretty
> much fighting a losing battle.
> 
> >Also, I analysed your code and compared it to the Python tools of xm;
> >it is the same approach and I don't see any obvious
> >differences.
> 
> I am starting to suspect you aren't actually talking about my code
> but somebody else's...
> 
> >Then I tried to replicate the secondary bus reset on the
> >command line for testing purposes via
> >
> > printf '\x40' | dd of=/sys/devices/pci0000:00/0000:00:0b.0/config bs=1 seek=$((0x3e)) count=1 conv=notrunc
> >
> >but I think I got the endianness or the offset slightly wrong, because
> >after that xl refuses to give the device (00:0b.0 is the bus of my
> >2-function VGA card, which I have assigned to my domU) to the domU, and
> >dom0 later crashes.
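
For reference, in a PCI-to-PCI bridge (type 1) header the Bridge Control
register is 16 bits wide at offset 0x3E, and Secondary Bus Reset is bit 6
(0x0040). Writing the single byte 0x40 there does set that bit, but it also
overwrites the other seven bits of the low byte and never clears the reset
bit afterwards. A minimal sketch of a read-modify-write pulse follows,
assuming 00:0b.0 really is a bridge. The function name and the delay values
are assumptions for illustration, not something taken from this thread or
from the xm tools.

```python
import os
import struct
import time

HEADER_TYPE = 0x0E           # header type register (low 7 bits: 0 = device, 1 = bridge)
BRIDGE_CONTROL = 0x3E        # 16-bit Bridge Control register in a type 1 header
SECONDARY_BUS_RESET = 0x0040 # bit 6 of Bridge Control


def secondary_bus_reset(config_path, hold_s=0.01, settle_s=1.0):
    """Pulse Secondary Bus Reset on the bridge whose config file is given.

    hold_s and settle_s are assumed delays; tune them for the hardware.
    Returns the original Bridge Control value.
    """
    fd = os.open(config_path, os.O_RDWR)
    try:
        # Refuse to touch offset 0x3E unless this is a bridge header.
        header = os.pread(fd, 1, HEADER_TYPE)[0] & 0x7F
        if header != 0x01:
            raise ValueError("not a PCI-to-PCI bridge (header type %#x)" % header)

        # Read the current value so the other control bits are preserved.
        (ctrl,) = struct.unpack("<H", os.pread(fd, 2, BRIDGE_CONTROL))

        # Assert the reset, hold it, then de-assert it again.
        os.pwrite(fd, struct.pack("<H", ctrl | SECONDARY_BUS_RESET), BRIDGE_CONTROL)
        time.sleep(hold_s)
        os.pwrite(fd, struct.pack("<H", ctrl & ~SECONDARY_BUS_RESET), BRIDGE_CONTROL)

        # Give devices behind the bridge time to come back.
        time.sleep(settle_s)
    finally:
        os.close(fd)
    return ctrl
```

Calling secondary_bus_reset("/sys/devices/pci0000:00/0000:00:0b.0/config")
as root would be the equivalent of the dd command above. It resets every
function behind the bridge, so nothing on that bus should be in use by dom0
or a running domU at the time.
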
> >
> >So I'm a little lost at that point and would welcome some
> >suggestions.
> >
> >Does FLR reset work for any of you with VGA cards?
> 
> If you are talking about VGA cards with _proper_ FLR implementations
> at the PCI level - there is no such thing. In all cases it is down to
> the domU driver to handle the card in whatever state it is in. This
> works reasonably well with supported Nvidia cards (i.e.
> Quadro [K][2456]000 and Grid K[12] and equivalent modified GeForce
> cards (Fermi 4xx and Kepler 6xx/7xx series)). I never managed to
> get it working properly on any other GPUs.
> 
> Even with Nvidia cards rebooting can lead to issues. For example,
> I have two GPUs passed to two different domUs. One is a GTX470
> modified to Q5000. The other is a GTX480 modified to Q6000. The
> domU with Q5000 always handled reboots reasonably reliably. The
> one with a Q6000 did not. I have since switched the one with a Q6000
> to a QK5000 (modified GTX680), and now the reboots seem to work
> reasonably reliably, but I have found that there is still a
> crash if the monitor on the card changes between shutdown and
> restart - I'm guessing the card remembers its state and if it
> isn't consistent when it returns, the driver gets confused. I have
> other issues (see recent thread about Nvidia passthrough from
> David), but they seem to be specific to my setup.

About this state thing: if one were to capture the card's state before
doing any PCI passthrough and then tried to write it back exactly,
would that eliminate some of these issues?

I know that pciback does that for the PCI configuration values
(or at least it should) whenever a device has been de-assigned
from a guest, or unplugged.

But I presume that the rest (the BAR contents) are not saved or
restored in any way. What is the worst that could happen if one
wrote all of the MMIO values back exactly as they were?

(Probably a recipe for disaster, but who knows).
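
A read-only way to see how much state actually changes is to snapshot the
configuration space before passthrough and diff it after the guest shuts
down. The sketch below is only an illustration: the helper names are made
up, it reads the sysfs config file, and it covers configuration space only,
not the MMIO behind the BARs.

```python
def snapshot_config(config_path, length=256):
    """Return up to `length` bytes of PCI configuration space."""
    with open(config_path, "rb") as f:
        return f.read(length)


def diff_snapshots(before, after):
    """List (offset, old_byte, new_byte) for every byte that differs.

    Only the range common to both snapshots is compared.
    """
    return [
        (offset, old, new)
        for offset, (old, new) in enumerate(zip(before, after))
        if old != new
    ]
```

Taking one snapshot before the first domU start and another after domU
shutdown, then printing diff_snapshots(before, after), would show which
registers were left in a different state.
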
> 
> It's not perfect, but it's the only workable solution I have
> found.
> 
> Gordan
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel
