Re: [Xen-devel] Status of FLR in Xen 4.4
On Fri, 27 Sep 2013 09:48:34 -0400, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
On Fri, Sep 27, 2013 at 02:27:46PM +0100, Gordan Bobic wrote:
On Fri, 27 Sep 2013 14:26:31 +0200, Matthias <matthias.kannenberg@xxxxxxxxxxxxxx> wrote:

> Hi Gordon,
>
> I tried your patch on my dom0 kernel and I think it somehow helped, in
> the sense that I can now reboot the domUs without crashing the whole
> host. But the Linux domU still gets a black screen, and the Windows 7
> domU only boots as far as a black screen with a (movable) cursor, but
> no further. This might only be a coincidence, though; I have to
> double-check this.

What patch? Nothing I posted to the list is fit for public consumption yet. You shouldn't be using it unless you really, REALLY know exactly what it does and know exactly what you are trying to achieve.

> I tried some other stuff, too:
>
> 1) after domU shutdown, rebind both functions to the dom0 drivers, do a
> sysfs reset and re-add to assignable devices -> crashes dom0

My experience shows that letting dom0 drivers ever touch the hardware is a recipe for disaster.

> 2) after domU shutdown, rebind both functions to the dom0 drivers and
> re-add to assignable devices -> dom0 sometimes crashes when the domU
> using the devices comes up, sometimes not, but no success either way
>
> 3) sysfs reset of the devices within domU seems to be passed through
> dom0 (see commands in qemu-log) but has no effect

It's up to the drivers to do the sensible thing. Nvidia drivers handle this a little more sanely, but if the drivers cannot handle clobbering the device's state into a known state, you are pretty much fighting a losing battle.

> Also, I analysed your code and compared it to the stuff in the python
> tools of xm. It is the same approach and I don't see any obvious
> differences.

I am starting to suspect you aren't actually talking about my code but somebody else's...

> Then I tried to replicate the secondary bus reset on the command line
> for testing purposes via
>
>   printf 'x40' | dd of=/sys/devices/pci0000:00/0000:00:0b.0/config bs=1 seek=$((0x3e)) count=1 conv=notrunc
>
> but I think I got some endianness or offset slightly wrong, because
> after that xl refuses to give the device to the domU (00:0b.0 is the
> bus of the 2-function VGA card I have assigned to my domU) and later
> crashes dom0.
>
> So I'm a little lost at that point and would welcome some
> suggestions.
>
> Does FLR reset work for any of you with VGA cards?

If you are talking about VGA cards with _proper_ FLR implementations on PCI level - there is no such thing. In all cases it is down to the domU driver to handle the card in whatever state it is. This works reasonably well with supported Nvidia cards (i.e. Quadro [K][2456]000 and Grid K[12] and equivalent modified GeForce cards (Fermi 4xx and Kepler 6xx/7xx series)). I never managed to get it working properly on any other GPUs.

Even with Nvidia cards rebooting can lead to issues. For example, I have two GPUs passed to two different domUs. One is a GTX470 modified to Q5000. The other is a GTX480 modified to Q6000. The domU with the Q5000 always handled reboots reasonably reliably. The one with the Q6000 did not. I have since switched the one with the Q6000 to a QK5000 (modified GTX680), and now the reboots seem to work reasonably reliably, but I have found that there is still a crash if the monitor on the card changes between shutdown and restart - I'm guessing the card remembers its state, and if it isn't consistent when it returns, the driver gets confused.

I have other issues (see the recent thread about Nvidia passthrough from David), but they seem to be specific to my setup.

This state thing. If one were to capture the card's state before doing any PCI passthrough and tried to write it back exactly, would that eliminate some of these issues? I know that pciback does that to the PCI configuration values (or at least it should) whenever a device has been de-assigned from a guest - or unplugged. But I presume that the rest (the BAR contents) are not in any way saved/restored. What would be the worst that could happen if one wrote all of the MMIO values back exactly as they were? (Probably a recipe for disaster, but who knows.)

It's not perfect, but it's the only workable solution I have found.

That doesn't cover the entire state of the device. What about the rest of the device memory and the states of all the proprietary registers?

Since there are open source FB and accelerated drivers available for Radeon cards, enough is publicly known about them to be able to achieve suitable resetting. How difficult that might be to achieve, I have no idea. I have seen the open source Radeon Xorg driver successfully reset the GPU when the GPU stopped responding, without taking Xorg or any of the running apps down in the process, so something similar to what it does might just be good enough.

Whether it is a good idea to adopt anything but a fully hands-off approach to any passthrough hardware is a different question entirely.

Gordan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
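
For reference, the secondary bus reset attempted earlier in the thread with printf and dd can be sketched in Python as a read-modify-write. On a PCI-to-PCI bridge (type 1 header), Bridge Control is a 16-bit little-endian register at offset 0x3e and Secondary Bus Reset is bit 6; the one-byte dd write replaces the whole low byte of that register and never clears the reset bit again. This is only a rough sketch under assumptions: the function name and the delay values are made up for illustration, the path passed in is assumed to be the sysfs config file of the bridge above the card, and nothing here saves or restores the configuration space of the devices behind the bridge. As noted above, poking a bridge like this can easily take dom0 down.

```python
import struct
import time

HEADER_TYPE_OFFSET = 0x0E      # low 7 bits: 0 = endpoint, 1 = PCI-to-PCI bridge
BRIDGE_CONTROL_OFFSET = 0x3E   # 16-bit Bridge Control register (type 1 header only)
SECONDARY_BUS_RESET = 0x0040   # bit 6 of Bridge Control


def _read_u16(f, offset):
    """Read a little-endian 16-bit value from config space."""
    f.seek(offset)
    return struct.unpack("<H", f.read(2))[0]


def _write_u16(f, offset, value):
    """Write a little-endian 16-bit value to config space."""
    f.seek(offset)
    f.write(struct.pack("<H", value))


def secondary_bus_reset(config_path, hold_s=0.1, settle_s=0.5):
    """Pulse Secondary Bus Reset on the bridge whose config file is given.

    hold_s and settle_s are illustrative guesses, not values from any spec
    quoted in this thread. Returns the original Bridge Control value.
    """
    with open(config_path, "r+b", buffering=0) as f:
        f.seek(HEADER_TYPE_OFFSET)
        header_type = f.read(1)[0] & 0x7F
        if header_type != 1:
            # On a normal device, offset 0x3e is not Bridge Control at all.
            raise ValueError("not a PCI-to-PCI bridge (header type %d)" % header_type)

        original = _read_u16(f, BRIDGE_CONTROL_OFFSET)
        # Assert the reset bit while preserving every other bit.
        _write_u16(f, BRIDGE_CONTROL_OFFSET, original | SECONDARY_BUS_RESET)
        time.sleep(hold_s)
        # De-assert the reset bit and give the devices time to come back.
        _write_u16(f, BRIDGE_CONTROL_OFFSET, original & ~SECONDARY_BUS_RESET)
        time.sleep(settle_s)
    return original
```

It would be called as root with something like secondary_bus_reset("/sys/devices/pci0000:00/0000:00:0b.0/config"), using the bridge path from the command quoted above. The header-type check is there so that the write is refused on anything that is not a bridge.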