Xen project Mailing List

Re: [Xen-devel] System freeze with IGD passthrough

To: Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxx>

From: "G.R." <firemeteor@xxxxxxxxxxxxxxxxxxxxx>

Date: Thu, 20 Dec 2012 00:04:01 +0800

Delivery-date: Wed, 19 Dec 2012 16:04:35 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Wed, Dec 19, 2012 at 2:20 PM, G.R. <firemeteor@xxxxxxxxxxxxxxxxxxxxx> wrote:
> Adding Jean, the author to the opregion patch.
>
> Jean, I believe the warning is due to the offset within the page.
> To accommodate the offset, you would need to reserve another page for it.
> Will the extra page cause any unexpected problem?
>
> The original thread is about an instability issue that directly freeze the
> host.
> I believe this warning above should not has such effect.
> What do you think? And any suggestion?
>

Jean appears to be no longer reachable.
The warning I found turns out not to be relevant.
According to the OpRegion spec, the tail part is reserved and should never be touched by the guest.
But anyway, I have a local fix that gets rid of the warning by reserving one more page and mapping it when the host OpRegion is not page aligned.
I'll send it in a separate thread.

Back to the topic.
I updated to xen 4.2.1 and tried three times tonight.
Two of the tries led to a total freeze with no error log available, after a couple of minutes of game play.
The last try ended with a GPU hang after 10+ minutes of game play.
This is a guest-only hang, but I still have no way to check the GPU error state even though it has been collected:

[ 1553.588076] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1553.592112] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 1582.004075] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1597.220075] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[ 1613.220074] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung

I'm wondering if the two symptoms are due to the same underlying cause.
But I guess a GPU hang caused by a guest driver issue should not freeze the host. Is that true?

I'm going to try more with different configurations -- different kernel version, with / without PVOPS, native run vs VM etc.
But this is kind of blind, since I have no clue at all.
If you suspect anything in particular, pointers would be highly appreciated.

Thanks,
Timothy

> Thanks,
> Timothy
>
> On Wed, Dec 19, 2012 at 1:28 AM, G.R. <firemeteor@xxxxxxxxxxxxxxxxxxxxx>
> wrote:
>> Hi Stefano,
>>
>> I recently tried to play some 3D games on my linux guest.
>> The game starts without problem but it freezes the entire system after
>> a some time (a minute or so?).
>> Here I mean both the host and domU are not responsive anymore.
>> The ssh freezes and i had to shutdown the machine using power button
>> directly.
>>
>> I did not find anything obvious from the host log. But from the guest,
>> I can find this:
>>
>> Dec 18 20:28:38 debvm kernel: [ 0.899860] resource map sanity check conflict: 0xfeff5018 0xfeff7017 0xfeff7000 0xffffffff reserved
>> Dec 18 20:28:38 debvm kernel: [ 0.899862] ------------[ cut here ]------------
>> Dec 18 20:28:38 debvm kernel: [ 0.899869] WARNING: at arch/x86/mm/ioremap.c:171 __ioremap_caller+0x2c4/0x33c()
>> Dec 18 20:28:38 debvm kernel: [ 0.899870] Hardware name: HVM domU
>> Dec 18 20:28:38 debvm kernel: [ 0.899872] Info: mapping multiple BARs. Your kernel is fine.
>> Dec 18 20:28:38 debvm kernel: [ 0.899873] Modules linked in:
>> Dec 18 20:28:38 debvm kernel: [ 0.899878] Pid: 1, comm: swapper/0 Not tainted 3.6.9 #4
>> Dec 18 20:28:38 debvm kernel: [ 0.899892] Call Trace:
>> Dec 18 20:28:38 debvm kernel: [ 0.899896] [<ffffffff8103d194>] ? warn_slowpath_common+0x76/0x8a
>> Dec 18 20:28:38 debvm kernel: [ 0.899898] [<ffffffff8103d240>] ? warn_slowpath_fmt+0x45/0x4a
>> Dec 18 20:28:38 debvm kernel: [ 0.899900] [<ffffffff81032a6c>] ? __ioremap_caller+0x2c4/0x33c
>> Dec 18 20:28:38 debvm kernel: [ 0.899902] [<ffffffff812c3be3>] ? intel_opregion_setup+0x9c/0x201
>> Dec 18 20:28:38 debvm kernel: [ 0.899904] [<ffffffff812bcb75>] ? intel_setup_gmbus+0x175/0x19d
>> Dec 18 20:28:38 debvm kernel: [ 0.899907] [<ffffffff8128a37a>] ? i915_driver_load+0x548/0x90d
>> Dec 18 20:28:38 debvm kernel: [ 0.899910] [<ffffffff812ff804>] ? setup_hpet_msi_remapped+0x20/0x20
>> Dec 18 20:28:38 debvm kernel: [ 0.899912] [<ffffffff81272706>] ? drm_get_pci_dev+0x152/0x259
>> Dec 18 20:28:38 debvm kernel: [ 0.899915] [<ffffffff813d4883>] ? _raw_spin_lock_irqsave+0x21/0x45
>> Dec 18 20:28:38 debvm kernel: [ 0.899918] [<ffffffff811d9ecc>] ? local_pci_probe+0x5a/0xa0
>> Dec 18 20:28:38 debvm kernel: [ 0.899920] [<ffffffff811d9fcf>] ? pci_device_probe+0xbd/0xe7
>> Dec 18 20:28:38 debvm kernel: [ 0.899922] [<ffffffff812cd887>] ? driver_probe_device+0x1b0/0x1b0
>> Dec 18 20:28:38 debvm kernel: [ 0.899923] [<ffffffff812cd887>] ? driver_probe_device+0x1b0/0x1b0
>> Dec 18 20:28:38 debvm kernel: [ 0.899925] [<ffffffff812cd769>] ? driver_probe_device+0x92/0x1b0
>> Dec 18 20:28:38 debvm kernel: [ 0.899926] [<ffffffff812cd8da>] ? __driver_attach+0x53/0x73
>> Dec 18 20:28:38 debvm kernel: [ 0.899928] [<ffffffff812cc06f>] ? bus_for_each_dev+0x46/0x77
>> Dec 18 20:28:38 debvm kernel: [ 0.899930] [<ffffffff812ccf8f>] ? bus_add_driver+0xd5/0x1f4
>> Dec 18 20:28:38 debvm kernel: [ 0.899931] [<ffffffff812cde14>] ? driver_register+0x89/0x101
>> Dec 18 20:28:38 debvm kernel: [ 0.899933] [<ffffffff811d9336>] ? __pci_register_driver+0x49/0xa3
>> Dec 18 20:28:38 debvm kernel: [ 0.899935] [<ffffffff816d55c7>] ? ttm_init+0x63/0x63
>> Dec 18 20:28:38 debvm kernel: [ 0.899937] [<ffffffff81002085>] ? do_one_initcall+0x75/0x12c
>> Dec 18 20:28:38 debvm kernel: [ 0.899940] [<ffffffff816a6cc2>] ? kernel_init+0x13c/0x1c0
>> Dec 18 20:28:38 debvm kernel: [ 0.899941] [<ffffffff816a6565>] ? do_early_param+0x83/0x83
>> Dec 18 20:28:38 debvm kernel: [ 0.899943] [<ffffffff813d9f44>] ? kernel_thread_helper+0x4/0x10
>> Dec 18 20:28:38 debvm kernel: [ 0.899945] [<ffffffff816a6b86>] ? start_kernel+0x3e1/0x3e1
>> Dec 18 20:28:38 debvm kernel: [ 0.899947] [<ffffffff813d9f40>] ? gs_change+0x13/0x13
>> Dec 18 20:28:38 debvm kernel: [ 0.899950] ---[ end trace db461543ce599b44 ]---
>>
>> I'm not sure if this has anything to do with the freeze. This seems to
>> show up on every boot after I upgraded to xen version 4.2.1-rc2. Both
>> debian kernel 3.2.32 / 3.6.9 suffers from the same log. But whole
>> system freeze happens only during gaming, which is much less frequent.
>> So I'm not sure if the two are related. But anyway, could you comment
>> about what does this log mean?
>>
>> I can find the one of the mentioned address in the qemu_dm log:
>> pt_pci_write_config: [00:02:0] address=00fc val=0xfeff5000 len=4
>> igd_write_opregion: Map OpRegion: cd996018 -> feff5018
>> igd_write_opregion: [00:02:0] addr=fc len=2 val=feff5000
>>
>> PS: I also run xbmc on domU and it playbacks video under HW
>> acceleration (VAAPI) without any problem. XBMC by itself is also an
>> graphics intensive program. But this runs on an pure HVM guest, while
>> the failing case is on PVHVM.
>>
>> PS2: I also suffered another instability yesterday. It happens when I
>> was compiling kernel in side the domU. The host reboots suddenly.
>> Since I'm not using graphics at that time (Xorg session is idle, I
>> connected through SSH), this may be a different issue.
>>
>> Thanks,
>> Timothy

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
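For readers following the page arithmetic behind the "one more page" remark, here is a minimal sketch. It is not the actual patch. It assumes 4 KiB pages and takes the mapped range from the guest log in the message (0xfeff5018 to 0xfeff7017 inclusive, i.e. 0x2000 bytes); the helper name pages_spanned is made up for illustration.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def pages_spanned(start: int, size: int, page_size: int = PAGE_SIZE) -> int:
    """Return how many pages the byte range [start, start + size) touches."""
    first_page = start // page_size
    last_page = (start + size - 1) // page_size
    return last_page - first_page + 1


# Range reported by the "resource map sanity check conflict" line.
guest_start = 0xFEFF5018
guest_end = 0xFEFF7017  # inclusive end address
region_size = guest_end - guest_start + 1

# Same size, but starting on a page boundary versus at offset 0x18.
aligned_pages = pages_spanned(0xFEFF5000, region_size)
unaligned_pages = pages_spanned(guest_start, region_size)

print(hex(region_size), aligned_pages, unaligned_pages)
```

With these numbers the region is 0x2000 bytes: it fits in two pages when page aligned, but touches a third page (the one starting at 0xfeff7000, which the guest reports as reserved) once it starts at offset 0x18.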

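On the i915_error_state file mentioned in the hangcheck log: the "/debug/..." path is relative to wherever debugfs is mounted, which on most systems is /sys/kernel/debug (mountable as root with "mount -t debugfs debugfs /sys/kernel/debug"). The sketch below assumes the guest is still reachable, for example over SSH, and that debugfs is mounted at one of the two candidate locations. The function names are hypothetical.

```python
import os
from typing import Iterable, Optional

# Assumption: debugfs is mounted at one of these locations inside the guest.
CANDIDATE_PATHS = (
    "/sys/kernel/debug/dri/0/i915_error_state",
    "/debug/dri/0/i915_error_state",
)


def find_error_state(paths: Iterable[str] = CANDIDATE_PATHS) -> Optional[str]:
    """Return the first candidate path that exists as a file, else None."""
    for path in paths:
        if os.path.isfile(path):
            return path
    return None


def save_error_state(dest: str, paths: Iterable[str] = CANDIDATE_PATHS) -> bool:
    """Copy the error state to dest; return False if no candidate was found."""
    src = find_error_state(paths)
    if src is None:
        return False
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        fout.write(fin.read())
    return True


if __name__ == "__main__":
    # Only reports whether the file is visible; reading it usually needs root.
    print(find_error_state())
```

Copying the file out right after the first "GPU hung" message keeps the captured state available even if the guest has to be rebooted afterwards.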