
Re: [Xen-devel] Radeon DRM dom0 issues



On Tue, Feb 11, 2014 at 10:35:18AM -0500, Michael D Labriola wrote:
> Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote on 01/24/2014 09:49:38 AM:
> 
> > From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> > To: Michael D Labriola <mlabriol@xxxxxxxx>, 
> > Cc: Konrad Rzeszutek Wilk <konrad@xxxxxxxxxx>, 
> > michael.d.labriola@xxxxxxxxx, xen-devel@xxxxxxxxxxxxx,
> > xen-devel-bounces@xxxxxxxxxxxxx
> > Date: 01/24/2014 09:50 AM
> > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> > 
> > On Thu, Jan 23, 2014 at 11:54:37AM -0500, Michael D Labriola wrote:
> > > xen-devel-bounces@xxxxxxxxxxxxx wrote on 01/21/2014 04:59:05 PM:
> > > 
> > > > From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> > > > To: Michael D Labriola <mlabriol@xxxxxxxx>, 
> > > > Cc: Konrad Rzeszutek Wilk <konrad@xxxxxxxxxx>, 
> > > > michael.d.labriola@xxxxxxxxx, xen-devel@xxxxxxxxxxxxx
> > > > Date: 01/21/2014 04:59 PM
> > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> > > > Sent by: xen-devel-bounces@xxxxxxxxxxxxx
> > > > 
> > > > On Mon, Jan 20, 2014 at 03:15:24PM -0500, Michael D Labriola wrote:
> > > > > > Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote on 01/20/2014 10:38:27 AM:
> > > > > 
> > > > > > From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> > > > > > To: Michael D Labriola <mlabriol@xxxxxxxx>, 
> > > > > > Cc: Konrad Rzeszutek Wilk <konrad@xxxxxxxxxx>, 
> > > > > > michael.d.labriola@xxxxxxxxx, xen-devel@xxxxxxxxxxxxx
> > > > > > Date: 01/20/2014 10:38 AM
> > > > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> > > > > > 
> > > > > > > On Mon, Jan 20, 2014 at 10:26:22AM -0500, Michael D Labriola wrote:
> > > > > > > > Konrad Rzeszutek Wilk <konrad@xxxxxxxxxx> wrote on 01/20/2014 10:14:36 AM:
> > > > > > > 
> > > > > > > > From: Konrad Rzeszutek Wilk <konrad@xxxxxxxxxx>
> > > > > > > > To: Michael D Labriola <mlabriol@xxxxxxxx>, 
> > > > > > > > Cc: xen-devel@xxxxxxxxxxxxx, michael.d.labriola@xxxxxxxxx
> > > > > > > > Date: 01/20/2014 10:14 AM
> > > > > > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> > > > > > > > 
> > > > > > > > > On Mon, Jan 20, 2014 at 09:58:32AM -0500, Michael D Labriola wrote:
> > > > > > > > > > Anyone here running a dom0 w/ Radeon DRM?  I'm having consistent
> > > > > > > > > > crashes with multiple older R600 series (HD 6470 and HD 6570) and
> > > > > > > > > > unusably slow graphics with a newer HD7000 (can see each line
> > > > > > > > > > refresh individually on radeonfb tty).  All 3 systems seem to work
> > > > > > > > > > fine bare metal.
> > > > > > > > 
> > > > > > > > > I hadn't been using DRM, just Xserver. Is that what you mean?
> > > > > > > 
> > > > > > > > The R600 problems happen when in X, using OpenGL, on my dom0.  The
> > > > > > > > RadeonSI sluggishness is when using the KMS framebuffer device for
> > > > > > > > a plain text console login.
> > > > > > 
> > > > > > > So the sluggishness is probably due to PAT not being enabled. This
> > > > > > > patch should be applied:
> > > > > > 
> > > > > > lkml.org/lkml/2011/11/8/406
> > > > > > 
> > > > > > (or http://marc.info/?l=linux-kernel&m=132888833209874&w=2)
> > > > > > 
> > > > > > and these two reverted:
> > > > > > 
> > > > > >  "xen/pat: Disable PAT support for now."
> > > > > >  "xen/pat: Disable PAT using pat_enabled value."
> > > > > > 
> > > > > > Which is to say do:
> > > > > > 
> > > > > > git revert c79c49826270b8b0061b2fca840fc3f013c8a78a
> > > > > > git revert 8eaffa67b43e99ae581622c5133e20b0f48bcef1
> > > > > 
> > > > > > Thanks!  I cherry-picked that patch out of your testing tree,
> > > > > > reverted those 2 commits, recompiled and installed.  Definitely
> > > > > > fixed the HD 7000 sluggishness and appears to have fixed the R600
> > > > > > crashes (although it's only been running a few hours).
> > > > > > 
> > > > > > How come that patch didn't get into mainline?  It looks pretty
> > > > > > innocuous to me...
> > > > 
> > > > <Sigh> the x86 maintainers wanted a different route. And I hadn't 
> had
> > > > the chance nor time to implement it.
> > > 
> > > > I see.  Well, I've got a handful of boxes in my lab that need that
> > > > patch to be usable.  If you do come up with a more mainline-able
> > > > solution, I'd gladly test it for you.  ;-)
> > 
> > Thank you!
> 
> Uh, oh.  Looks like those reverts and patches didn't entirely fix my 
> problem.  My box with the HD5450 (r600 gallium3d) started going bonkers 
> again yesterday.  After being solid as a rock for 2 weeks as my primary 
> workstation, X has crashed a half dozen or so times so far this week. I've 
> been in Xen with 2 paravirtual Linux guests running almost constantly for 
> this whole period.  I don't understand what's changed, but my system has 
> been entirely unstable now.  I did recompile my kernel... but all I did 
> was merge the v3.13.1 stable commit into my working tree and turn a few 
> things on (netfilter, wifi, a couple drivers turned on here and there).  I 
> just went and verified that those patches are still applied in my tree 
> (i.e., I didn't accidentally undo them).  I'm scratching my head (and 
> staring at a TTY login).
> 
> When X crashes, my kernel log prints a couple dozen iterations of this. 3d 
> acceleration no longer functions unless I reboot.  If memory serves, the 
> unpatched behavior upon X crash was that the kernel continued to spew 
> these errors until the whole box locked up.  At least that's not happening 
> any more... ;-)
> 
> [  702.070084] [TTM] radeon 0000:01:00.0: Unable to get page 2
> [  702.075971] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
> [  704.720699] [TTM] radeon 0000:01:00.0: Unable to get page 0
> [  704.726635] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
> [  704.733910] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (8192, 2, 4096, -12)
> 
> and here's a slightly different variant that happened while I was typing 
> this email (on a different machine, luckily):
> 
> [ 3107.713039] sdf: detected capacity change from 31625052160 to 0
> [ 3114.491717] usb 9-1: USB disconnect, device number 2
> [64348.271534] [TTM] radeon 0000:01:00.0: Unable to get page 3
> [64348.277312] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
> [64348.284470] [TTM] radeon 0000:01:00.0: Unable to get page 0
> [64348.290257] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
> [64348.297561] [TTM] Buffer eviction failed
> [64349.550518] [TTM] radeon 0000:01:00.0: Unable to get page 0
> [64349.556417] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
> [64349.563714] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (16384, 2, 4096, -12)
> 
> Any ideas?

Yes. I believe you have a memory leak. As in, some driver (or X) is
eating up memory and not giving enough of it back. That means the TTM
layer is hitting its ceiling on how much memory it can allocate (the
-12 in your log is -ENOMEM).

Now finding the culprit is going to be a bit hard.

You could use:

[root@phenom 1]# cat /sys/kernel/debug/dri/1/ttm_dma_page_pool 
         pool      refills   pages freed    inuse available     name
           wc          259           224      808        4 nouveau 0000:05:00.0
       cached      3403058      13561071    51158        3 radeon 0000:01:00.0
       cached           25             0       96        4 nouveau 0000:05:00.0

to check whether my theory is correct. If it is, the radeon pool should
show a huge 'inuse' count and almost no 'available'.
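
To make that check less of an eyeball exercise, here is a minimal Python
sketch that parses text in the column layout shown above and flags pools
with a large 'inuse' count and a small 'available' count. The function
names (parse_ttm_pools, suspicious_pools) and the thresholds (10000 pages
in use, 16 available) are assumptions made up for illustration, not
anything the kernel defines, and the column layout is taken only from the
sample output above, so it may differ on other kernel versions.

```python
from typing import List, NamedTuple

# Sample text copied from the output above; on a real system you would
# read the debugfs file instead (see the note after the code).
SAMPLE = """\
         pool      refills   pages freed    inuse available     name
           wc          259           224      808        4 nouveau 0000:05:00.0
       cached      3403058      13561071    51158        3 radeon 0000:01:00.0
       cached           25             0       96        4 nouveau 0000:05:00.0
"""


class PoolRow(NamedTuple):
    pool: str
    refills: int
    pages_freed: int
    inuse: int
    available: int
    name: str  # driver name plus PCI address


def parse_ttm_pools(text: str) -> List[PoolRow]:
    """Parse pool rows, skipping the header and any blank lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have a numeric second column; the header does not.
        if len(fields) < 6 or not fields[1].isdigit():
            continue
        rows.append(PoolRow(
            pool=fields[0],
            refills=int(fields[1]),
            pages_freed=int(fields[2]),
            inuse=int(fields[3]),
            available=int(fields[4]),
            name=" ".join(fields[5:]),
        ))
    return rows


def suspicious_pools(rows: List[PoolRow],
                     min_inuse: int = 10000,     # assumed threshold
                     max_available: int = 16     # assumed threshold
                     ) -> List[PoolRow]:
    """Return pools that hold many pages and have few left available."""
    return [r for r in rows
            if r.inuse >= min_inuse and r.available <= max_available]


if __name__ == "__main__":
    for row in suspicious_pools(parse_ttm_pools(SAMPLE)):
        print(f"{row.name}: pool={row.pool} inuse={row.inuse} "
              f"available={row.available}")
```

To run it against a live system, replace SAMPLE with the contents of the
ttm_dma_page_pool file. That needs root and a mounted debugfs, and the
dri/<N> index (1 in the example above) depends on the machine.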


But that only confirms that, yes, you have heavy memory usage and it
is hitting the ceiling.

Now to actually figure out which application is hanging on to these
pages - that I am not sure about. I think there is some DRM info tool
that shows how many pages each application is using. You could leave it
running and see which app is gulping up the memory. But I am not sure
which tool that is (or whether one exists).

Well, let's take it one step at a time - see if my theory is correct first.

> 
> 
> ---
> Michael D Labriola
> Electric Boat
> mlabriol@xxxxxxxx
> 401-848-8871 (desk)
> 401-848-8513 (lab)
> 401-316-9844 (cell)
