Re: [Xen-devel] Radeon DRM dom0 issues
On Wed, Feb 19, 2014 at 12:04 PM, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
> On Tue, Feb 11, 2014 at 10:35:18AM -0500, Michael D Labriola wrote:
>> Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote on 01/24/2014 09:49:38 AM:
>>
>> > From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
>> > To: Michael D Labriola <mlabriol@xxxxxxxx>,
>> > Cc: Konrad Rzeszutek Wilk <konrad@xxxxxxxxxx>,
>> > michael.d.labriola@xxxxxxxxx, xen-devel@xxxxxxxxxxxxx,
>> > xen-devel-bounces@xxxxxxxxxxxxx
>> > Date: 01/24/2014 09:50 AM
>> > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
>> >
>> > On Thu, Jan 23, 2014 at 11:54:37AM -0500, Michael D Labriola wrote:
>> > > xen-devel-bounces@xxxxxxxxxxxxx wrote on 01/21/2014 04:59:05 PM:
>> > >
>> > > > From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
>> > > > To: Michael D Labriola <mlabriol@xxxxxxxx>,
>> > > > Cc: Konrad Rzeszutek Wilk <konrad@xxxxxxxxxx>,
>> > > > michael.d.labriola@xxxxxxxxx, xen-devel@xxxxxxxxxxxxx
>> > > > Date: 01/21/2014 04:59 PM
>> > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
>> > > > Sent by: xen-devel-bounces@xxxxxxxxxxxxx
>> > > >
>> > > > On Mon, Jan 20, 2014 at 03:15:24PM -0500, Michael D Labriola wrote:
>> > > > > Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote on 01/20/2014 10:38:27 AM:
>> > > > >
>> > > > > > From: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
>> > > > > > To: Michael D Labriola <mlabriol@xxxxxxxx>,
>> > > > > > Cc: Konrad Rzeszutek Wilk <konrad@xxxxxxxxxx>,
>> > > > > > michael.d.labriola@xxxxxxxxx, xen-devel@xxxxxxxxxxxxx
>> > > > > > Date: 01/20/2014 10:38 AM
>> > > > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
>> > > > > >
>> > > > > > On Mon, Jan 20, 2014 at 10:26:22AM -0500, Michael D Labriola wrote:
>> > > > > > > Konrad Rzeszutek Wilk <konrad@xxxxxxxxxx> wrote on 01/20/2014 10:14:36 AM:
>> > > > > > >
>> > > > > > > > From: Konrad Rzeszutek Wilk <konrad@xxxxxxxxxx>
>> > > > > > > > To: Michael D Labriola <mlabriol@xxxxxxxx>,
>> > > > > > > > Cc: xen-devel@xxxxxxxxxxxxx, michael.d.labriola@xxxxxxxxx
>> > > > > > > > Date: 01/20/2014 10:14 AM
>> > > > > > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
>> > > > > > > >
>> > > > > > > > On Mon, Jan 20, 2014 at 09:58:32AM -0500, Michael D Labriola wrote:
>> > > > > > > > > Anyone here running a dom0 w/ Radeon DRM? I'm having consistent
>> > > > > > > > > crashes with multiple older R600 series (HD 6470 and HD 6570) and
>> > > > > > > > > unusably slow graphics with a newer HD7000 (can see each line
>> > > > > > > > > refresh individually on radeonfb tty). All 3 systems seem to work
>> > > > > > > > > fine bare metal.
>> > > > > > > >
>> > > > > > > > I hadn't been using DRM, just Xserver. Is that what you mean?
>> > > > > > >
>> > > > > > > The R600 problems happen when in X, using OpenGL, on my dom0. The
>> > > > > > > RadeonSI sluggishness is when using the KMS framebuffer device for a
>> > > > > > > plain text console login.
>> > > > > >
>> > > > > > So the sluggishness is probably due to PAT not being enabled. This
>> > > > > > patch should be applied:
>> > > > > >
>> > > > > > lkml.org/lkml/2011/11/8/406
>> > > > > >
>> > > > > > (or http://marc.info/?l=linux-kernel&m=132888833209874&w=2)
>> > > > > >
>> > > > > > and these two reverted:
>> > > > > >
>> > > > > > "xen/pat: Disable PAT support for now."
>> > > > > > "xen/pat: Disable PAT using pat_enabled value."
>> > > > > >
>> > > > > > Which is to say do:
>> > > > > >
>> > > > > > git revert c79c49826270b8b0061b2fca840fc3f013c8a78a
>> > > > > > git revert 8eaffa67b43e99ae581622c5133e20b0f48bcef1
>> > > > >
>> > > > > Thanks! I cherry-picked that patch out of your testing tree, reverted
>> > > > > those 2 commits, recompiled and installed. Definitely fixed the HD 7000
>> > > > > sluggishness and appears to have fixed the R600 crashes (although it's
>> > > > > only been running a few hours).
>> > > > >
>> > > > > How come that patch didn't get into mainline? It looks pretty innocuous
>> > > > > to me...
>> > > >
>> > > > <Sigh> the x86 maintainers wanted a different route. And I hadn't had
>> > > > the chance nor time to implement it.
>> > >
>> > > I see. Well, I've got a handful of boxes in my lab that need that patch
>> > > to be usable. If you do come up with a more mainline-able solution, I'd
>> > > gladly test it for you. ;-)
>> >
>> > Thank you!
>>
>> Uh, oh. Looks like those reverts and patches didn't entirely fix my
>> problem. My box with the HD5450 (r600 gallium3d) started going bonkers
>> again yesterday. After being solid as a rock for 2 weeks as my primary
>> workstation, X has crashed a half dozen or so times so far this week. I've
>> been in Xen with 2 paravirtual linux guests running almost constantly for
>> this whole period. I don't understand what's changed, but my system has
>> been entirely unstable now. I did recompile my kernel... but all I did
>> was merge the v3.13.1 stable commit into my working tree and turn a few
>> things on (netfilter, wifi, a couple drivers turned on here and there). I
>> just went and verified that those patches are still applied in my tree
>> (i.e., I didn't accidentally undo them). I'm scratching my head (and
>> staring at a TTY login).
>>
>> When X crashes, my kernel log prints a couple dozen iterations of this. 3d
>> acceleration no longer functions unless I reboot. If memory serves, the
>> unpatched behavior upon X crash was that the kernel continued to spew
>> these errors until the whole box locked up. At least that's not happening
>> any more... ;-)
>>
>> [ 702.070084] [TTM] radeon 0000:01:00.0: Unable to get page 2
>> [ 702.075971] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
>> [ 704.720699] [TTM] radeon 0000:01:00.0: Unable to get page 0
>> [ 704.726635] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
>> [ 704.733910] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (8192, 2, 4096, -12)
>>
>> and here's a slightly different variant that happened while I was typing
>> this email (on a different machine, luckily):
>>
>> [ 3107.713039] sdf: detected capacity change from 31625052160 to 0
>> [ 3114.491717] usb 9-1: USB disconnect, device number 2
>> [64348.271534] [TTM] radeon 0000:01:00.0: Unable to get page 3
>> [64348.277312] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
>> [64348.284470] [TTM] radeon 0000:01:00.0: Unable to get page 0
>> [64348.290257] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
>> [64348.297561] [TTM] Buffer eviction failed
>> [64349.550518] [TTM] radeon 0000:01:00.0: Unable to get page 0
>> [64349.556417] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
>> [64349.563714] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (16384, 2, 4096, -12)
>>
>> Any ideas?
>
> Yes. I believe you have a memory leak. As in, some driver (or X) is
> eating up the memory and not giving up enough. That means the TTM
> layer is hitting its ceiling of how much memory it can allocate.
>
> Now finding the culprit is going to be a bit hard.
>
> You could use:
>
> [root@phenom 1]# cat /sys/kernel/debug/dri/1/ttm_dma_page_pool
>     pool  refills  pages freed  inuse  available  name
>       wc      259          224    808          4  nouveau 0000:05:00.0
>   cached  3403058     13561071  51158          3  radeon 0000:01:00.0
>   cached       25            0     96          4  nouveau 0000:05:00.0
>
> to figure out if my thinking is really true. You should have a huge
> 'inuse' count and almost no 'available'.

My /sys/kernel/debug/dri directory has a 0 and a 64 entry, which appear to
always have the same contents. Is that normal?

My /sys/kernel/debug/dri/0/ttm_dma_page_pool file doesn't exist bare
metal... only in Xen. Is that normal?

    pool  refills  pages freed  inuse  available  name
  cached    15190        59551   1205          4  radeon 0000:01:00.0

If I watch that file while creating xterms, moving them around, etc, I can
see the number available fluctuate between 3 and 6. This is true even on my
box w/ the newer R7 card in it, which hasn't gotten that GEM error message
(yet?).

>
> But that will get us just to confirm that yes - you have a big usage
> of memory and it is hitting the ceiling.
>
> Now to actually figure out which application is hanging on to these - that
> I am not sure about. I think there is some drm info tool to investigate
> how many pages each application is using. You can leave it running and
> see which app is gulping up the memory. But I am not sure which
> tool that is (if there was one).
>
> Well, let's do one step at a time - see if my theory is correct first.

--
Michael D Labriola
21 Rip Van Winkle Cir
Warwick, RI 02886
401-316-9844 (cell)
401-848-8871 (work)
401-234-1306 (home)

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
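
The check suggested in the thread (a huge 'inuse' count with almost no 'available' in ttm_dma_page_pool) could be scripted roughly as below. This is only a sketch: it assumes the whitespace-separated column layout shown in the pasted output, and the names parse_ttm_pool and looks_exhausted as well as both thresholds are hypothetical, not values taken from the kernel or from anyone in the thread.

```python
from typing import List, NamedTuple


class PoolRow(NamedTuple):
    pool: str
    refills: int
    pages_freed: int
    inuse: int
    available: int
    name: str


# Sample text copied from the output quoted in the thread.
SAMPLE = """\
pool refills pages freed inuse available name
wc 259 224 808 4 nouveau 0000:05:00.0
cached 3403058 13561071 51158 3 radeon 0000:01:00.0
cached 25 0 96 4 nouveau 0000:05:00.0
"""


def parse_ttm_pool(text: str) -> List[PoolRow]:
    """Parse the contents of ttm_dma_page_pool into rows.

    Assumes each data row is: pool, refills, pages freed, inuse,
    available, then the device name (which may contain spaces).
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the header (its second field is not a number).
        if len(fields) < 6 or not fields[1].isdigit():
            continue
        pool, refills, freed, inuse, available = fields[:5]
        rows.append(PoolRow(pool, int(refills), int(freed),
                            int(inuse), int(available),
                            " ".join(fields[5:])))
    return rows


def looks_exhausted(row: PoolRow, min_inuse: int = 10000,
                    max_available: int = 8) -> bool:
    """Heuristic only: many pages in use and very few available.

    Both thresholds are arbitrary illustrative values.
    """
    return row.inuse >= min_inuse and row.available <= max_available
```

On a live dom0, the text passed to parse_ttm_pool would be the contents of /sys/kernel/debug/dri/0/ttm_dma_page_pool (the path used in the thread), which normally requires root and a mounted debugfs. Sampling it periodically and watching whether 'inuse' keeps growing would be one way to test the leak theory.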