
Re: [Xen-users] ATI VGA Passthrough / Xen 4.2 / Linux 3.8.10


  • To: xen-users@xxxxxxxxxxxxx
  • From: Gordan Bobic <gordan@xxxxxxxxxx>
  • Date: Fri, 10 May 2013 23:39:35 +0100
  • Delivery-date: Fri, 10 May 2013 22:40:49 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

On 05/10/2013 09:19 PM, Andrew Bobulsky wrote:

             2) I actually have it working - for 5 minutes or so at a
             time. If the problem was the lack of ACS, it wouldn't work
             at all.


        I just can't help but wonder if it /is/ the problem, though.
        It's the only thing I can pin down that our situations have in
        common as far as its being the only "non-compatible" portion of
        the implementation, aside from the nearly identical behavior, of
        course. Maybe the AMD driver does some stupid stuff that ACS can
        mitigate?  I just wish I knew more :(


    Now you got me thinking... I noticed that when the GPU starts to
    head toward the crash, this appears in the syslog:

    May  6 16:35:51 normandy kernel: pcieport 0000:00:03.0: AER:
    Multiple Uncorrected (Non-Fatal) error received: id=0000

    It certainly makes me wonder.

    Has anyone else seen this error?

    The device ID in question is:

    00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI
    Express Root Port 3 (rev 22)

    which does not bode well...

    Duff hardware?


Hmmm... I'll poke through my syslog at the next crash.  I tried:

        cat /var/log/syslog | grep pcieport
        cat /var/log/syslog.1 | grep pcieport
        dmesg | grep pcieport


Nothing came back from any of those.  I'll see if I can identify any
unique errors myself though!

Worth paying attention to. :)
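
For anyone else wanting to check their own logs for the same thing, here is a minimal sketch. It assumes the kernel tags these messages with "AER:" as in the syslog line quoted above; the function name aer_lines is made up for illustration.

```shell
# Hypothetical helper: keep only AER-related kernel messages from stdin.
# Assumes the messages carry the "AER:" tag, as in the syslog line quoted above.
aer_lines() {
    grep 'AER:'
}
```

Typical use would be "dmesg | aer_lines" or "aer_lines < /var/log/syslog". Log file locations vary between distributions, so adjust the path as needed.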

                 So what might intrigue you the most here is that while
                 I'm stuck with a VGA device sitting behind this non-ACS
                 compliant switch... My results are almost identical to
                 yours.  Passing one of the VGA devices to the DomU, with
                 or without the corresponding HDMI audio doesn't seem to
                 matter, I get this:

                 "it is so intermittent. It works well enough to boot
                 up and work with a gaming type load for a few minutes.
                 Then something happens that causes the VGA card to
                 require a reset, and it all falls apart."

                 Seriously :P


             And you are convinced this is to do with the availability
             of ACS?


        Like I said, it's the only thing that I can pinpoint as being a
        hindrance to compatibility.  I guess my request here is if
        anyone can help me determine whether or not that's true?


    What motherboard are you using? Has anyone successfully used it for
    VGA passthrough? I don't think the possibility of both of us having
    similarly duff hardware has been systematically excluded yet.


I think I said it, but I'll link here anyway:
http://www.gigabyte.us/products/product-page.aspx?pid=2957#ov

Indeed, you did. Apologies, it's been a long week. :p

As to whether or not anyone's used it for passthrough before... I've got
no clue.  Probably not too many people, seeing as how I'm essentially
running a custom BIOS :P

BIOSes are getting so crap (except maybe on Asus boards) these days that I'm amazed anything works at all. You wouldn't believe the amount of BIOS bugginess people are encountering on the SR-2, and that's now an EOL product that should by now have had most of its bugs fixed (yeah - right).

                 It eventually likes to BSOD, usually on atikmpag.sys I
                 think.  Plenty of "an attempt was made to reset the
                 display adapter and failed" blah blah blah.


             Yes, all too familiar.

                 This happens 100% of the time if I try to boot with both
                 devices attached.


             Both devices?


        Yes---that is to say both of the VGA controllers from the 6990. The
        relevant portion of my lspci looks like this:
        http://pastebin.com/raw.php?i=GwekPNAW


    OK, I get it. I seem to remember reading in the archives that dual
    VGA passthrough is problematic (my experience over the years shows
    that multiple GPUs are a false economy of highly questionable benefit).


That's actually pretty much completely accurate.  It drives me
particularly up the wall because I hate running things in full screen,
and crossfire basically doesn't work at all without that :P

I like my full screen gaming - but throw something obscure like an IBM T221 into the mix and things start to get rather non-trivial. T221 is 3840x2400 which is too much for DL-DVI to drive. But it's a 10+ year old monitor design and it actually takes 3xSL-DVI (but there's an adapter available that makes it drivable using 2xDL-DVI instead).

Then you have to stitch the screens together (workable with 2xDL-DVI on XP; on Vista and 7 you need a Quadro or an Eyefinity card for the driver features to do it). What I found back when my old 4870X2 was bleeding edge was that with dual monitors attached, the 2nd GPU never did anything at all (stayed stone cold, performance unaffected by Crossfire).

Since then I've learned my lesson - buy the biggest single GPU you can afford - it's as good as it's going to get. Everything else is going to be hit-and-miss. Debugging other people's products may be fun when you're 14, but I'm two decades too old to not have something better to do with my time. Nowadays I appreciate things that "just work" - the unfortunate thing I'm finding, however, is that there tend to be no things that "just work" that include all the features that I want - which in turn leads to endless debugging of other people's software to get it to do what I want, because apparently, nobody else has tried it before. :-/

        Note: devices 09 and 0a are my "primary" 6990's vga controllers.
        Also, my crossfire bridge is disconnected.  I'm working with the
        other card, devices 0d and 0e.  I've included the USB card as
        well in the list because I'm using it, but it causes me no
        problems whatsoever.  For what it's worth, that USB card works
        great in ESXi as well... Highpoint enabled ACS on their PEX
        chips :D

             Just out of interest:

             1) Are you using a multi-socket motherboard?


        Nope!  It's a Gigabyte GA-EX58-EXTREME.  It's LGA1366 with an i7
        920 in it.  VT-d support is provided through a hacked BIOS image
        that I found on the web a couple years or so ago.


    Having to use a hacked BIOS for VT-d support is not a good sign or a
    good starting point...


Technically, you're right.  AFAIK though, this particular generation of
i7 chips allows for VT-d to be managed entirely by the chipset/bios.

That's just it - I don't like things only manageable by binary blobs with no source code. I'd much rather just have a clean interface (e.g. from /sys/) to just write the relevant registers straight to the hardware to enable/disable features. Otherwise you're at the mercy of motherboard manufacturers who have no interest in supporting a product for people who have already bought it (sale's made, why should they care).

  There's no particular req (however artificial) coming out of the CPUs
for this generation that stipulates VT-d can't be patched in... so I
figured, "why not?"  I was modding my BIOS anyway and decided to use
this one as a base because it had both VT-d and fully updated option
ROMs for all my onboard stuff.  The world of BIOS modding is a /very/
neat one; I highly suggest every nerd spend a few days there at some
point in his life ;)

Last time I checked, this was mostly limited to people using BIOS editors to unhide features. Have things actually progressed to the point where you can add in a specific assembly payload to initialize things differently?

To the point though, it seems very well behaved on everything that
/isn't/ my 6990 :-(

Didn't you mention you had another ATI GPU in another rig that you could borrow temporarily? It might be worth a shot to see if it's the dual GPUs that are foiling you. Especially since they are inevitably on the same PCIe bridge. A standalone single GPU might just work.

Ironically, my Quadro has been refusing to play ball completely today (it worked passably well yesterday, although not as well as my 6450 card, which today seems to be working well enough to get to the login screen without BSOD-ing). Different slot this time, though, so we'll see how it fares in a bit.

[noirqbalance, limiting guest to 3.5GB of RAM]

[screen corruption, white/black lines]

Yeah.  I'm convinced now.  They might be a different color, but they're
in chrome (which uses a GPU accelerated 2d canvas) and they seem to
precede the crash pretty reliably.

Yes, similar here, although I don't use Chrome - I get them in most things, including on the desktop once it has all started to go wrong.

                 though I'm considering a hard-hack: think of a 12v relay
                 and a PCIe extender cable---if a D3D0 reset actually
                 powers off the slot momentarily but the PSU plugs on the
                 card prevent it from working, then I could rig up a
                 switch that ties those plugs' power state into the slot
                 itself---it's radical, yes, but possibly the most
                 inventive solution I can think of so far.  I'm super
                 curious to see if anyone more knowledgeable than myself
                 thinks it would work, because it'd be super cheap to
                 build!  As the saying goes though, I'll "cross that
                 bridge when I come to it." :)


             Interesting. In theory, I think this _should_ work provided
             your PCIe bridges support hot-plugging.

             To be certain, you'd have to switch both the PCIe slot and
             (if your card uses it) the external power inputs.


        That'd be the idea.  Assuming it works the way I think it does,
        I could tap a 12v (I'm pretty sure it's 12v in there) relay into
        the Vcc and GND pins of the PCIe slot and use the relay's output
        to switch the Vcc from the plug-in cables off of the PSU.  Bears
        testing with a slightly less expensive card, but I wouldn't be
        surprised to see it work!  It'd require some case modding for
        sure though, as the extension cable will get in the way of
        properly seating the card.  It could be possible to build a tap
        that could be "slipped in" to a card's PCIe slot...  Short of
        proper FLR support, this could actually very cheaply be built
        into the expansion card itself.  I'd suspect that simply adding
        FLR would be cheaper on the card manufacturers though. :)


    Just get a case with more slot cutouts on the back than your
    motherboard has slots. Then feed the ribbon to the bottom so the
    card sits in the slot on the case that is below your motherboard -
    no modding required. :)


But... but!  I guess that'd require a mini(?) or MicroATX board.  I'm a
full size to XL ATX (or whatever the monster-sized boards are) kind of
guy.  Guess I just want more slots to pass GPUs to VMs, eh? :)

You don't need a smaller motherboard - you need a bigger case. :)

With your board, you could probably do this with a PC-P80 Armorsuit (one of the few off the shelf cases that will take my SR-2 due to a weird, needlessly oversized form factor - I mean seriously, who needs 7 PCIe x16 slots??).

Hmm... Something just occurred to me - on the SR-2 this could be implemented _TRIVIALLY_! The SR-2 has jumpers to disable/enable each of the PCIe slots. So in theory, all I'd have to do is put together a simple USB controlled switch that would toggle between connecting pins 1-2 and 2-3, and attach it using a normal 3-pin jumper-type header to the jumper block in question. Or (boringly), just wire it up to a suitable button on the front of the case.

I might just have to try this and see what happens (and hope it doesn't make the magic smoke escape from something).
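
On the software side, the host would presumably need to drop the device before the slot is disabled and re-detect it afterwards. A rough sketch using the standard Linux sysfs "remove" and "rescan" files follows. The function names and the SYSFS_PCI override are made up for illustration, and whether toggling the jumper behaves like a clean hot-unplug (or plays well with a device assigned to a DomU) is untested.

```shell
# Sketch only: function names and the SYSFS_PCI override are illustrative.
# The "remove" and "rescan" files are the standard Linux sysfs PCI interface.
SYSFS_PCI=${SYSFS_PCI:-/sys/bus/pci}

# Detach a device (e.g. 0000:0d:00.0) from the kernel before cutting power.
pci_remove() {
    echo 1 > "$SYSFS_PCI/devices/$1/remove"
}

# Ask the kernel to re-enumerate the bus once the slot is powered again.
pci_rescan() {
    echo 1 > "$SYSFS_PCI/rescan"
}
```

Run as root, the sequence would be "pci_remove 0000:0d:00.0", toggle the slot, then "pci_rescan". The address 0000:0d:00.0 is only an example and has to match the card actually being reset.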

There's supposed to be some cases out there that allow for mounting of
expansion cards on the end of flexible extenders.  Haven't heard about
them in a couple years, but either way chances are pretty good that such
cases aren't exactly affordable... they likely target enterprise
customers or simply have limited runs... economy of scale and all that.
  Probably the "slip-in" type of adapter/approach would be best, but I
don't wanna get ahead of myself on a simple idea that may not even work :P

Usually rack-mount cases.
But it's amazing what you can achieve with a dremel and a power drill in a few minutes. ;)

                 With that in mind, even though I've taken your advice
                 and added the config info to my xend files, it's
                 entirely possible---especially in light of what Casey
                 said---that I'm just Doing It Wrong(TM).  It'd likely be
                 beneficial for us both to compare notes on that regard.
                 If either of you would be willing to help, I could
                 probably use some pointers... I've kinda run out of logs
                 to look at with my current knowledge on the subject :P


             Certainly - what notes do you propose we compare?


        I'm not completely sure.  If you can point me to the proper files to
        verify that my device has the same PCIe-level compatibility
        issues as yours (verify that ACS isn't available to the device
        and so on) then I'd call that a step in the right direction.
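
One way to check for ACS is to look for the capability in verbose lspci output. The sketch below assumes lspci -vv is run as root (otherwise extended capabilities may not be listed) and that the capability appears under the name "Access Control Services"; the helper name has_acs is made up for illustration.

```shell
# Hypothetical helper: succeed if "lspci -vv" output on stdin lists the
# Access Control Services capability.
has_acs() {
    grep -q 'Access Control Services'
}
```

For example, "lspci -vv -s 00:03.0 | has_acs && echo ACS present", repeated for each bridge between the GPU and the root port. A listed capability only means the bridge supports ACS; the ACSCtl line in the same output shows which features are actually enabled.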


    Another thing - Do "lspci -vt" - can you put the card in a slot
    where it doesn't share a bridge with any other PCIe devices?


I don't think so.  You should see the built-in bridge... it's implied
slightly up the hierarchy from the two side-by-side 6990 devices, which
itself attaches to the root port at the top:
http://pastebin.com/raw.php?i=4dGmneYi

But the 2 GPUs are inevitably on the same bridge. I think trying a single GPU would definitely be a good next step in troubleshooting.
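
To confirm which devices sit behind the same bridge without reading the lspci tree by eye, the resolved sysfs path of a device can be used, since it lists every upstream bridge as a parent directory. A minimal sketch, with the helper name parent_bridge made up for illustration and the device addresses guessed from the "0d and 0e" mentioned earlier:

```shell
# Hypothetical helper: given a device's resolved sysfs path, print the
# address of the bridge directly upstream of it (the parent directory name).
parent_bridge() {
    basename "$(dirname "$1")"
}
```

Usage would be along the lines of parent_bridge "$(readlink -f /sys/bus/pci/devices/0000:0d:00.0)", repeated for 0000:0e:00.0. If both print the same address, the two devices hang off the same bridge.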

Wish me luck!

To both of us! :)

Gordan




 

