
Re: [Xen-users] ATI VGA Passthrough / Xen 4.2 / Linux 3.8.10



Hello Gordan, Casey,

On May 9, 2013, at 2:05 PM, Gordan Bobic <gordan@xxxxxxxxxx> wrote:

> On 05/09/2013 06:35 PM, Casey DeLorme wrote:
>> Thanks for posting the results Gordan, unfortunate that it isn't
>> working as well as we hoped.
>>
>>
>>   I haven't given up _quite_ yet.
>>
>>   I discovered yesterday that it _looks like_ one of my PCIe slots is
>>   actually duff (two different GPUs both fail to detect properly in it
>>   but work fine in other slots).
>>
>>   If it turns out to be a duff slot, there's no telling what else
>>   might be duff on the motherboard and how it might affect various
>>   things, even though several days of full load stability testing
>>   passed.
>>
>>   So some more bare-metal testing seems to be called for - right now I
>>   am not prepared to disregard the possibility that maybe I have a
>>   hardware issue somewhere that despite EDAC and ECC on everything,
>>   remains undetected and unreported in the logs.
>>
>>
>> I hope you manage to resolve it, though I feel the NF200 will be the
>> larger challenge.
>
> I hope I'll resolve it, too, but right now I am not convinced that the NF200
> is actually the cause of my problems. My gut feeling says that if I can get
> it working for 5 minutes at a time, something less fundamental than the NF200
> PCIe routers is the cause of the problems.

I don't know if I'd be so quick to jump to that conclusion.... I'll explain :)

So the reason I asked about ACS enforcement is because I'm currently
trying to pass my Radeon 6990 into a VM.  I tried this a while back,
but only with ESXi.  After futzing with it for a day or two, I had to
quit because while I had VT-d, and the ESXi install said Passthrough
was supported, I ended up in a "this host requires a reboot before
this device can be assigned to a vm" loop of some sort.  Hours of
investigation revealed that the PEX 8647 (or whatever it is, Google
knows :P) which is the PCIe switch built in to the board of the 6990
is *supposed* to support ACS... but it's seemingly switched off.
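
For anyone who wants to check the same thing on their own switch, here
is a minimal sketch in Python.  It assumes the usual lspci -vvv layout,
where a port that implements ACS shows an "Access Control Services"
capability with an ACSCap line (what the port can do) and an ACSCtl line
(what is currently switched on).  The function name acs_status and the
SAMPLE text are made up for illustration; the sample is not output
captured from any card in this thread.

```python
import re


def acs_status(lspci_text):
    """Return (capable, enabled) sets of ACS flag names found in the
    `lspci -vvv` text for one device. Either item is None when the
    corresponding line is missing, i.e. no ACS capability is exposed."""

    def flags(label):
        m = re.search(r"^\s*" + label + r":\s*(.*)$", lspci_text, re.MULTILINE)
        if m is None:
            return None
        # lspci marks a set bit with a trailing '+' and a clear bit with '-'.
        return {tok[:-1] for tok in m.group(1).split() if tok.endswith("+")}

    return flags("ACSCap"), flags("ACSCtl")


# Illustrative text only (hypothetical device, not real captured output).
SAMPLE = """\
    Capabilities: [148 v1] Access Control Services
        ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl+ DirectTrans+
        ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
"""

capable, enabled = acs_status(SAMPLE)
print("capable:", sorted(capable))
print("enabled:", sorted(enabled))
```

An empty "enabled" set next to a non-empty "capable" set would match the
"supported but switched off" situation; (None, None) would mean the
capability is not exposed at all.  To try it on real hardware, feed it
the output of lspci -vvv -s <bus:dev.fn> for the switch port, run as
root so the capability list is readable.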

I'd love to attempt to flash the chip if anyone can provide guidance.
Any fellow nerds care to help destroy---I mean fix!  Yeah, fix...---a
PCIe chip that requires an NDA to get the tools for... I'm down.
Maybe I should email AMD or PLX.  Back to the point though! ;)

So what might intrigue you the most here is that while I'm stuck with
a VGA device sitting behind this non-ACS-compliant switch, my
results are almost identical to yours.  Whether I pass one of the VGA
devices to the DomU with or without the corresponding HDMI audio
doesn't seem to matter; I get this:

"it is so intermittent. It works well enough to boot up and work with
a gaming type load for a few minutes. Then something happens that
causes the VGA card to require a reset, and it all falls apart."

Seriously :P

It eventually likes to BSOD, usually on atikmpag.sys I think.  Plenty
of "an attempt was made to reset the display adapter and failed" blah
blah blah.  This happens 100% of the time if I try to boot with both
devices attached.  The first time I boot it up, the driver isn't
installed so it'll work until just before auto-login reaches the
desktop, but after that I can't boot at all with both VGA devices
attached. I'd love to explore more, but I'm running out of places to
look for solutions to my problem that don't involve my credit card and
some new hardware.  In a fit of delicious irony, my problem is almost
identical to yours---if only I'd bought some cheaper stuff it'd
probably all work just great :D

The only single-GPU cards I have are the Radeon 5850s in my AMD box,
and I'm a little reluctant to tear that machine apart because it gets
used a lot.  I think my next step is to look for a video card that
properly supports FLR (Function Level Reset), though I'm also
considering a hard-hack: think of a 12v relay and a PCIe extender
cable.  If a D3-to-D0 power-state reset actually powers off the slot
momentarily, but the PSU plugs on the card prevent the reset from
working, then I could rig up a switch that ties those plugs' power
state to the slot's.  It's radical, yes, but it's the most inventive
solution I can think of so far.  I'm super curious to see if anyone
more knowledgeable than myself thinks it would work, because it'd be
super cheap to build!  As the saying goes though, I'll
"cross that bridge when I come to it." :)

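On the FLR question, a rough sketch (Python again) of how one might
screen a card from its lspci -vvv output.  It assumes the usual layout
in which the PCI Express capability's DevCap block carries an FLReset+
or FLReset- flag, and it only looks inside DevCap, because a similarly
named flag can also appear under DevCtl.  The helper name supports_flr
and the sample text are hypothetical, not taken from any card mentioned
here.

```python
import re


def supports_flr(lspci_text):
    """Return True/False according to the FLReset flag in the DevCap
    block of `lspci -vvv` text, or None if no such flag is found."""
    in_devcap = False
    for line in lspci_text.splitlines():
        stripped = line.strip()
        # A line that starts with "Label:" opens a new block; unlabeled
        # continuation lines belong to the block opened before them.
        label = re.match(r"^([A-Za-z0-9]+):", stripped)
        if label:
            in_devcap = label.group(1) == "DevCap"
        if in_devcap:
            m = re.search(r"\bFLReset([+-])", stripped)
            if m:
                return m.group(1) == "+"
    return None


# Illustrative text only (hypothetical endpoint, not real captured output).
SAMPLE_DEVCAP = """\
    Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
"""

print(supports_flr(SAMPLE_DEVCAP))
```

A result of None only means the flag was not found in the text given,
which is also what happens when lspci is run without enough privileges
to read the capability list.
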
>>          2) My motherboard's PCIe slots are behind NF200 PCIe bridges (yes,
>>       EVGA have decided in their infinite wisdom to put all 7 PCIe slots
>>       behind NF200s, none are directly attached to the Intel NB).
>>
>>         I'm so sorry :P. NF200 has probably caused a lot of xen tinkerers
>>         to utter a few dozen cuss words apiece.
>>
>>         I can believe that. What is the solution, though?
>>
>>         The thing that drives me really nuts about the issues I'm seeing
>>       (which may or may not be specifically related to the NF200) is that
>>       it is so intermittent. It works well enough to boot up and work with
>>       a gaming type load for a few minutes. Then something happens that
>>       causes the VGA card to require a reset, and it all falls apart.
>>
>>       My solution was to buy another motherboard, I had no luck at all
>>       passing the devices behind the NF200, and similar to your situation
>>       all but one PCIe slot on that board was behind that bridge.
>>
>>
>>   Did you not manage to get it working at all? Or was it just
>>   intermittent like in my case? I can typically get about 5 minutes of
>>   gaming out of my ATI card before it all goes wrong.
>>
>>   Ironically, I was thinking about an Asus Sabertooth with an 8-core AMD,
>>   but opted to go for broke and get a couple of 6-core Xeons and an
>>   EVGA SR-2. It turns out, a solution that is 4x more expensive isn't
>>   actually better... :(
>>
>>
>> I was unable to get it working at all.  The NF200 simply threw errors
>> that 100% prevented me from passing the device.  I think it was missing
>> a number of specific features required for passthrough, and I vaguely
>> remember running lspci -vvv to verify what was missing.  Perhaps not all
>> NF200's are created equal?
>
> The only logged issue I had with the NF200s was the lack of ACS, and that
> check can be disabled as I mentioned on this thread (at least if you are
> using the xm stack). After I disabled that, PCI passthrough has been working
> OK. It's just VGA passthrough BSOD-ing after some minutes that is causing me
> problems.

In reading up on the wiki, there does indeed seem to be a lot more
info regarding the use of xl and PCI Passthrough today than the last
time I looked.  It seems that these types of configuration options are
set on a domain-by-domain basis, or even per device; the docs say that
things like VPCI vs. direct PASS mapping of slot layout(?) are actually
configured at the device level, either in your DomU config file (like:
pci = ['0:d:0.0,pci-just-forking-work-damn-you']) or via xl (like: xl
pci-attach 1 0:d:0.0 pci-just-forking-work-damn-you).
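
To make the per-device syntax a bit more concrete, here is a small
sketch in Python that normalises a PCI address such as 0:d:0.0 into the
padded DDDD:BB:DD.F form and builds a pci = [...] line for a DomU config
file.  The helper names normalize_bdf and pci_config_line are
hypothetical, and the key=value options are passed through unchecked.
Which keys a given xl version accepts is an assumption to verify against
that version's xl.cfg man page; "permissive" in the usage line is only
an example of the shape.

```python
import re

# [DDDD:]BB:DD.F with hexadecimal fields; the domain part is optional.
_BDF_RE = re.compile(
    r"^(?:([0-9a-fA-F]{1,4}):)?([0-9a-fA-F]{1,2}):([0-9a-fA-F]{1,2})\.([0-7])$"
)


def normalize_bdf(bdf):
    """Return a PCI address as zero-padded DDDD:BB:DD.F."""
    m = _BDF_RE.match(bdf.strip())
    if m is None:
        raise ValueError("not a PCI address: %r" % (bdf,))
    domain, bus, dev, fn = m.groups()
    dev_n = int(dev, 16)
    if dev_n > 0x1F:  # the device number is a 5-bit field
        raise ValueError("device number out of range: %r" % (bdf,))
    return "%04x:%02x:%02x.%x" % (
        int(domain or "0", 16), int(bus, 16), dev_n, int(fn, 16)
    )


def pci_config_line(bdfs, **options):
    """Build a `pci = [ ... ]` config line; the same options (assumed
    valid for the toolstack in use) are appended to every device."""
    suffix = "".join(",%s=%s" % (k, v) for k, v in sorted(options.items()))
    entries = ", ".join("'%s%s'" % (normalize_bdf(b), suffix) for b in bdfs)
    return "pci = [ %s ]" % entries


# A GPU function and its HDMI audio function, as an example.
print(pci_config_line(["0:d:0.0", "0:d:0.1"]))
print(pci_config_line(["0:d:0.0"], permissive=1))
```

The padded form is also the name each device has under
/sys/bus/pci/devices/ in dom0, which makes it easy to cross-check.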

With that in mind, even though I've taken your advice and added the
config info to my xend files, it's entirely possible (especially in
light of what Casey said) that I'm just Doing It Wrong(TM).  It'd
likely be beneficial for us both to compare notes in that regard.  If
either of you would be willing to help, I could probably use some
pointers... I've kinda run out of logs to look at with my current
knowledge on the subject :P

>>          What about with PCIe devices behind NF200 bridges? I know the
>>       NF200s don't support PCI ACS, but that is a security feature (which
>>       I have disabled enforcement of to get this far), and AFAIK shouldn't
>>       actually affect the basic PCI passthrough capability.
>>
>>         Question: how'd you disable ACS?  I think it may be causing me
>>       some issues.
>>
>>         Put:
>>
>>         (pci-passthrough-strict-check no)
>>         (pci-dev-assign-strict-check no)
>>
>>         in /etc/xen/xend-config.sxp
>>
>>         If it was causing you issues, however, I'd expect you to find
>>       errors in logs pointing at it.
>>
>>       As I understand it, xend-config.sxp [1] is for the xm toolstack and
>>       the deprecated Xend service.
>>
>>
>>   xm toolstack and xend are what I am using. I have read reports of issues
>>   with VGA passthrough using the xl stack so I didn't even attempt to
>>   use it.
>>
>>
>> The xm toolstack was deprecated in version 4.1.  I read that it had not
>> been updated in months due to a lack of maintainers.
>
> I heard that xl is still feature-incomplete and experimental, and problematic 
> with VGA passthrough.
>
>> I did try xm back
>> when I started; the passthrough worked but had the same problems I had
>> when I began testing xl.  I have been using xl since then.  My logic was
>> simply "why become dependent on a tool that is no longer maintained and
>> may be removed from the next release?"
>
> I'm not wedded to any particular tool stack, I'm happy to use whatever works. 
> But since libvirt and virt-manager are still using xm, and since I have seen 
> recent reports of xl being problematic for VGA passthrough as well as there 
> being no apparent way to disable ACS requirements with the xl stack, that 
> rules it out for me completely at the moment.

The xm stack was rather trying for me.  It's like it only wanted to
throw errors at me when I did PCI stuff, whereas xl has seemingly
been more than happy to do whatever I tell it.  Though I admit chances
are pretty good I was just running around haphazardly, using the wrong
version of Python or something.  Given our nearly identical results
thus far, I'd wager that the toolstack itself isn't really the source
of our problems.  If that's true, though, the easy solution is likely
out the window :(

>> Does anyone know whether the xm toolstack has been modified since 4.1 to
>> accommodate changes with Xen 4.2?  If it has not, it might be worth
>> considering xl.
>
> Does anyone know how to disable the ACS bridge requirement with the xl stack?

I'll second that question!

>>       Perhaps I am confused, or things changed while I wasn't looking, but
>>       for me enabling Xend breaks the xl toolstack.  My understanding is it
>>       was for the xm toolstack only and deprecated with 4.2.  Any chance
>>       you can share how you configured it to work?  Apparently it is
>>       required to get libvirt working, which I also did not know was
>>       compatible with Xen 4.2.
>>
>>
>>   It is possible I'm the one doing it wrong. I'm on EL6, and using
>>   virt-manager (at least for things it is willing to do), and that
>>   defaults to the xm stack and xend.
>>
>>   For what it's worth, it works for the most part - apart from VGA
>>   passthrough crashing within 5 minutes of gaming.
>>
>>
>> If you are using xm then it makes sense, as libvirt seems to require
>> xm/xend to be loaded in order to function.
>>
>> There are more upgrade notes
>> <http://wiki.xen.org/wiki/MigrationGuideToXen4.1%2B#Toolstack_upgrade_notes>
>> about xend now, so that is new to me.  According to the Xen Man Pages the
>> xend-config.sxp file doesn't have the flags you added; can you link to
>> resources that mentioned them?  I have not seen xl equivalents for your
>> xend configuration, so I guess xm still has some features xl does not.
>
> This mentions it, among others:
> http://wiki.xen.org/wiki/Xen_PCI_Passthrough
>
> Google for
> xen pci-passthrough-strict-check pci-dev-assign-strict-check
>
> and you should find some relevant things easily enough.
>
> Gordan
>
> _______________________________________________
> Xen-users mailing list
> Xen-users@xxxxxxxxxxxxx
> http://lists.xen.org/xen-users

Best Regards,
Andrew
