
Re: [Xen-devel] WARNING: at drivers/xen/gntdev.c:426 unmap_if_in_range+0x5d/0x60 [xen_gntdev]()


  • To: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • From: "Christopher S. Aker" <caker@xxxxxxxxxxxx>
  • Date: Sun, 14 Dec 2014 12:48:18 -0500
  • Delivery-date: Sun, 14 Dec 2014 17:48:51 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>

> On Dec 11, 2014, at 10:12 AM, Christopher S. Aker <caker@xxxxxxxxxxxx> wrote:
> 
> Xen: 4.4.2-pre (28573:f6f6236af933) + xsa111, xsa112, xsa114
> Dom0: 3.17.4
> 
> Things go badly after a day or four.  We've hit this on a number of 
> previously healthy hosts, since moving from 3.10.x dom0 to 3.17.4:
> 
> printk: 5441 messages suppressed.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> (XEN) printk: 4857 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 4846 callbacks suppressed
> (XEN) printk: 4699 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 1569 callbacks suppressed
> (XEN) printk: 1809 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 2327 callbacks suppressed
> (XEN) printk: 2779 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 2509 callbacks suppressed
> (XEN) printk: 2022 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 2282 callbacks suppressed
> (XEN) printk: 2778 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 2385 callbacks suppressed
> (XEN) printk: 1560 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 1714 callbacks suppressed
> (XEN) printk: 1713 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 1619 callbacks suppressed
> (XEN) printk: 1852 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 1895 callbacks suppressed
> (XEN) printk: 2058 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 1797 callbacks suppressed
> (XEN) printk: 1530 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 1440 callbacks suppressed
> (XEN) printk: 1306 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> 
> (...this repeats a few hundred times over the course of 30 minutes...)

We've also had reports that this adversely affects guests (filesystem errors) 
while the dom0 is cranking out these messages, before it actually crashes.

> net_ratelimit: 1221 callbacks suppressed
> (XEN) printk: 1719 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 1747 callbacks suppressed
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 1496 callbacks suppressed
> br0: port 80(vif242.0) entered disabled state
> device vif242.0 left promiscuous mode
> br0: port 80(vif242.0) entered disabled state
> device vif249.0 entered promiscuous mode
> xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi) persistent 
> grants
> xen-blkback:ring-ref 9, event-channel 10, protocol 1 (x86_64-abi) persistent 
> grants
> (XEN) printk: 1107 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 648 callbacks suppressed
> m2p_remove_override: pfn 10828f2 mfn 8000000005b4284e, failed to modify 
> kernel mappings
> ------------[ cut here ]------------
> WARNING: CPU: 6 PID: 23911 at drivers/xen/gntdev.c:426 
> unmap_if_in_range+0x5d/0x60 [xen_gntdev]()
> Modules linked in: xt_u32 xt_physdev ebt_comment ebt_arp ebt_set ebt_limit 
> ebt_ip6 ebt_ip ip_set_hash_net ip_set ip6table_mangle ip6_tables ebtable_nat 
> xen_acpi_processor xen_pciback xen_gntalloc xen_gntdev bonding ebtable_filter 
> 8021q mrp ixgbe mdio ptp pps_core
> CPU: 6 PID: 23911 Comm: qemu-dm Not tainted 3.17.4-1 #1
> Hardware name: Supermicro X9DRE-TF+/X9DR7-TF+/X9DRE-TF+/X9DR7-TF+, BIOS 3.0a 
> 12/04/2013
> 0000000000000009 ffff880043dafcc8 ffffffff81876bcb 0000000000000001
> 0000000000000000 ffff880043dafd08 ffffffff81069777 ffff880043dafd18
> ffff880020154690 00007f8add804000 00007f8add80f000 ffff880020154660
> Call Trace:
> [<ffffffff81876bcb>] dump_stack+0x46/0x58
> [<ffffffff81069777>] warn_slowpath_common+0x87/0xb0
> [<ffffffff810697b5>] warn_slowpath_null+0x15/0x20
> [<ffffffffa012d29d>] unmap_if_in_range+0x5d/0x60 [xen_gntdev]
> [<ffffffffa012d46e>] mn_invl_range_start+0x4e/0xa0 [xen_gntdev]
> [<ffffffff811615cb>] __mmu_notifier_invalidate_range_start+0x5b/0x90
> [<ffffffff811469a9>] unmap_vmas+0x79/0x90
> [<ffffffff8114bb13>] unmap_region+0xa3/0x120
> [<ffffffff8116b339>] ? new_sync_read+0x79/0xb0
> [<ffffffff8114bfb1>] ? vma_rb_erase+0x121/0x210
> [<ffffffff8114dba0>] do_munmap+0x2a0/0x3b0
> [<ffffffff8114dcf9>] vm_munmap+0x49/0x70
> [<ffffffff8114ecd6>] SyS_munmap+0x26/0x40
> [<ffffffff81880169>] system_call_fastpath+0x16/0x1b
> ---[ end trace 25ca87f9adc0ad78 ]---
> INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 32, t=60002 
> jiffies, g=26177592, c=26177591, q=1229)
> Task dump for CPU 0:
> swapper/0       R  running task    14072     0      0 0x00000008
> 00000000ffffffed 0000000000000000 0000000000000001 ffffffffffffffff
> ffffffff810013aa 000000000000e030 0000000000000246 ffffffff81e03e30
> 000000000000e02b 0000000000000000 0000000000000000 ffffffff8100a0c0
> Call Trace:
> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [<ffffffff8100a0c0>] ? xen_safe_halt+0x10/0x20
> [<ffffffff8101d73f>] ? default_idle+0x1f/0xb0
> [<ffffffff8101dfea>] ? arch_cpu_idle+0xa/0x10
> [<ffffffff8109ead4>] ? cpu_startup_entry+0x284/0x330
> [<ffffffff8186ec7d>] ? rest_init+0x6d/0x70
> [<ffffffff81eea081>] ? start_kernel+0x41d/0x42a
> [<ffffffff81ee9a51>] ? set_init_arg+0x58/0x58
> [<ffffffff81ee95f0>] ? x86_64_start_reservations+0x2a/0x2c
> [<ffffffff81eed774>] ? xen_start_kernel+0x540/0x542
> INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 34, t=240007 
> jiffies, g=26177592, c=26177591, q=4592)
> Task dump for CPU 0:
> swapper/0       R  running task    14072     0      0 0x00000008
> 00000000ffffffed 0000000000000000 0000000000000001 ffffffffffffffff
> ffffffff810013aa 000000000000e030 0000000000000246 ffffffff81e03e30
> 000000000000e02b 0000000000000000 0000000000000000 ffffffff8100a0c0
> Call Trace:
> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [<ffffffff8100a0c0>] ? xen_safe_halt+0x10/0x20
> [<ffffffff8101d73f>] ? default_idle+0x1f/0xb0
> [<ffffffff8101dfea>] ? arch_cpu_idle+0xa/0x10
> [<ffffffff8109ead4>] ? cpu_startup_entry+0x284/0x330
> [<ffffffff8186ec7d>] ? rest_init+0x6d/0x70
> [<ffffffff81eea081>] ? start_kernel+0x41d/0x42a
> [<ffffffff81ee9a51>] ? set_init_arg+0x58/0x58
> [<ffffffff81ee95f0>] ? x86_64_start_reservations+0x2a/0x2c
> [<ffffffff81eed774>] ? xen_start_kernel+0x540/0x542
> INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 34, t=420012 
> jiffies, g=26177592, c=26177591, q=8255)
> Task dump for CPU 0:
> swapper/0       R  running task    14072     0      0 0x00000008
> 00000000ffffffed 0000000000000000 0000000000000001 ffffffffffffffff
> ffffffff810013aa 000000000000e030 0000000000000246 ffffffff81e03e30
> 000000000000e02b 0000000000000000 0000000000000000 ffffffff8100a0c0
> Call Trace:
> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [<ffffffff8100a0c0>] ? xen_safe_halt+0x10/0x20
> [<ffffffff8101d73f>] ? default_idle+0x1f/0xb0
> [<ffffffff8101dfea>] ? arch_cpu_idle+0xa/0x10
> [<ffffffff8109ead4>] ? cpu_startup_entry+0x284/0x330
> [<ffffffff8186ec7d>] ? rest_init+0x6d/0x70
> [<ffffffff81eea081>] ? start_kernel+0x41d/0x42a
> [<ffffffff81ee9a51>] ? set_init_arg+0x58/0x58
> [<ffffffff81ee95f0>] ? x86_64_start_reservations+0x2a/0x2c
> [<ffffffff81eed774>] ? xen_start_kernel+0x540/0x542
> 
> Then the dom0 is unresponsive, and requires a reboot.

We've hit this on a number of hosts, so it's not an isolated incident.  Any 
suggestions would be appreciated.  We're stuck without a stable dom0 at the 
moment: 3.10.x has the netback issue, and kernels beyond 3.10 each have their 
own problems.  I suppose we should forge ahead and try 3.18...
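
In case it helps anyone else triage, the earliest symptom we see is the 
maptrack message in the hypervisor console log, well before the WARNING and 
the RCU stalls.  A rough watchdog sketch (assuming "xl dmesg" is available on 
the dom0; the function itself just counts matches on stdin):

```shell
# Hedged sketch, not a fix: count maptrack-exhaustion messages in
# hypervisor console output so monitoring can flag a host before it
# wedges.  Reads log text on stdin and prints the match count.
count_maptrack_failures() {
    grep -c 'Failed to obtain maptrack handle' || true
}

# On a live host one might run:
#   xl dmesg | count_maptrack_failures
```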

Thanks!
-Chris


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

