
Re: [Xen-users] Odd domU Reboot Bug (possibly VGA passthrough related)



On Wed, 3 Jul 2013 10:18:57 +0100, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
On Tue, 2013-07-02 at 21:44 +0100, Gordan Bobic wrote:
On 07/02/2013 09:42 AM, Ian Campbell wrote:
> On Mon, 2013-07-01 at 19:22 +0100, Gordan Bobic wrote:
>> The thing that bothers me is that NVRM seems to be what's complaining,
>> but the GPU being passed through is firmly under control of xen-pciback.
>
> Do the xl -vvv logs or the logs under /var/log/xen/ say anything about
> rebinding the device at all?

Nothing at all.

> AIUI pci-assignable-add is supposed to unbind the original driver and
> bind to pciback and nothing is supposed to rebind until
> pci-assignable-remove, but perhaps something is (inadvertently)
> happening on domain shutdown too?
>
> If you examine /sys you should be able to see which driver is bound to
> the device, which might give a clue.

I'm quite certain it never unbinds - lspci -vvv shows the device still
being handled by the pciback driver.
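
For reference, a minimal sketch of the /sys check suggested above,
assuming a Linux dom0 with sysfs mounted at /sys. The address
0000:01:00.0 and the helper name bound_driver are placeholders, not
taken from this thread; substitute the BDF that lspci reports for the
passed-through GPU.

```python
import os


def bound_driver(bdf, sysfs_root="/sys"):
    """Return the name of the driver bound to PCI device `bdf`, or None.

    sysfs exposes the binding as a symlink:
        <sysfs_root>/bus/pci/devices/<bdf>/driver -> .../drivers/<name>
    The link is absent when no driver is bound.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "driver")
    try:
        target = os.readlink(link)
    except OSError:
        # No such device, or no driver currently bound to it.
        return None
    return os.path.basename(target)


if __name__ == "__main__":
    # Placeholder address (assumption); use the real BDF of the GPU.
    print(bound_driver("0000:01:00.0"))
```

Running this before domU start, while it is running, and after shutdown
would show whether the binding ever moves away from pciback.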

Very strange that the NV driver is getting involved then.

That may have been just a fluke - it doesn't happen every time. Once the
PCI memory space starts getting stomped all over all bets are off WRT
what might happen.

Speaking of which - does qemu-xen in 4.2.x allocate the BARs
consistently / deterministically? I'm wondering if this could be caused
by the first initialization getting one set of BAR ranges, but the
second time it gets mapped somewhere else, and something between
qemu-xen, the driver and the card itself gets confused and goes
wrong.

Which also leads me to wondering if always ensuring that pBAR = vBAR might
be a good and desirable thing for everything (which might also improve
passthrough compatibility with VGA and other BAR-heavy devices).
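
One way to test the BAR question, as a rough sketch only: snapshot the
device's BARs from the sysfs "resource" file (one "start end flags" line
per region, in hex) and compare snapshots. Read in dom0 this gives the
physical BARs; the same parsing inside a Linux guest would give the
virtual ones, which is an assumption about the guest OS. The helper
names parse_bars and changed_bars are made up for illustration.

```python
def parse_bars(resource_text):
    """Parse the text of /sys/bus/pci/devices/<BDF>/resource.

    Returns {region_index: (start, size)} for every populated region.
    Regions that are all zeros are unused and are skipped.
    """
    bars = {}
    for index, line in enumerate(resource_text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue  # not a "start end flags" line
        start, end, _flags = (int(f, 16) for f in fields)
        if start == 0 and end == 0:
            continue  # unused region
        bars[index] = (start, end - start + 1)
    return bars


def changed_bars(before, after):
    """Return the sorted region indexes whose (start, size) differ."""
    indexes = set(before) | set(after)
    return sorted(i for i in indexes if before.get(i) != after.get(i))
```

Taking one snapshot after the first domU boot and another after the
second, then passing both to changed_bars, would show whether any region
moved between the two starts.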

> If you just nuke the NV driver from dom0 altogether does that help? What
> about if you hide the device via the kernel command line rather than
> dynamically (assuming that works in your setup)?

I added the xen-pciback module to the initramfs and made sure it loads.
I still have to add the USB controllers manually, though, because the
USB driver appears to be built into my kernel. Either way, this doesn't
change the situation: it still works fine after a fresh reboot, but not
after a full VM shutdown.

But did you remove the nv.ko from dom0 altogether, ensuring it is never
loaded?

If you are referring to nvidia.ko, no, I didn't - I need it for dom0
to work properly. nvidiafb.ko is explicitly blacklisted (as is
nvidia.ko but the nvidia Xorg driver loads it anyway).

The pattern of events is quite consistent:
[...]
Time to start experimenting with different slots again, it seems...

I'm afraid most of the intricacies of this stuff are completely beyond
me. Your theory about bridges and slots sounds plausible so far as I am
qualified to comment though.

Last time I was fighting this with PCI memory stomps, making sure that
the VGA card was the only thing on the PCIe bridge chain seemed to help,
and the symptoms were very similar WRT AER errors getting thrown all
over the place. Potentially another problem that might implicitly go
away if pBAR=vBAR were to become the default...

Gordan

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users


 

