
Re: [Xen-devel] WARNING: at drivers/xen/gntdev.c:426 unmap_if_in_range+0x5d/0x60 [xen_gntdev]()


  • To: xen devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • From: "Christopher S. Aker" <caker@xxxxxxxxxxxx>
  • Date: Sun, 14 Dec 2014 12:48:18 -0500
  • Delivery-date: Sun, 14 Dec 2014 17:48:51 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>

> On Dec 11, 2014, at 10:12 AM, Christopher S. Aker <caker@xxxxxxxxxxxx> wrote:
> 
> Xen: 4.4.2-pre (28573:f6f6236af933) + xsa111, xsa112, xsa114
> Dom0: 3.17.4
> 
> Things go badly after a day or four.  We've hit this on a number of 
> previously healthy hosts, since moving from 3.10.x dom0 to 3.17.4:
> 
> printk: 5441 messages suppressed.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> grant_table.c:567:d0 Failed to obtain maptrack handle.
> (XEN) printk: 4857 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 4846 callbacks suppressed
> (XEN) printk: 4699 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 1569 callbacks suppressed
> (XEN) printk: 1809 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 2327 callbacks suppressed
> (XEN) printk: 2779 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 2509 callbacks suppressed
> (XEN) printk: 2022 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 2282 callbacks suppressed
> (XEN) printk: 2778 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 2385 callbacks suppressed
> (XEN) printk: 1560 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 1714 callbacks suppressed
> (XEN) printk: 1713 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 1619 callbacks suppressed
> (XEN) printk: 1852 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 1895 callbacks suppressed
> (XEN) printk: 2058 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 1797 callbacks suppressed
> (XEN) printk: 1530 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 1440 callbacks suppressed
> (XEN) printk: 1306 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> 
> (...this repeats a few hundred times over the course of 30 minutes...)

We've also had reports that this adversely affects guests (filesystem errors) 
while the dom0 is cranking out these messages, before it actually crashes.

> net_ratelimit: 1221 callbacks suppressed
> (XEN) printk: 1719 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 1747 callbacks suppressed
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 1496 callbacks suppressed
> br0: port 80(vif242.0) entered disabled state
> device vif242.0 left promiscuous mode
> br0: port 80(vif242.0) entered disabled state
> device vif249.0 entered promiscuous mode
> xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi) persistent 
> grants
> xen-blkback:ring-ref 9, event-channel 10, protocol 1 (x86_64-abi) persistent 
> grants
> (XEN) printk: 1107 messages suppressed.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> (XEN) grant_table.c:567:d0 Failed to obtain maptrack handle.
> net_ratelimit: 648 callbacks suppressed
> m2p_remove_override: pfn 10828f2 mfn 8000000005b4284e, failed to modify 
> kernel mappings
> ------------[ cut here ]------------
> WARNING: CPU: 6 PID: 23911 at drivers/xen/gntdev.c:426 
> unmap_if_in_range+0x5d/0x60 [xen_gntdev]()
> Modules linked in: xt_u32 xt_physdev ebt_comment ebt_arp ebt_set ebt_limit 
> ebt_ip6 ebt_ip ip_set_hash_net ip_set ip6table_mangle ip6_tables ebtable_nat 
> xen_acpi_processor xen_pciback xen_gntalloc xen_gntdev bonding ebtable_filter 
> 8021q mrp ixgbe mdio ptp pps_core
> CPU: 6 PID: 23911 Comm: qemu-dm Not tainted 3.17.4-1 #1
> Hardware name: Supermicro X9DRE-TF+/X9DR7-TF+/X9DRE-TF+/X9DR7-TF+, BIOS 3.0a 
> 12/04/2013
> 0000000000000009 ffff880043dafcc8 ffffffff81876bcb 0000000000000001
> 0000000000000000 ffff880043dafd08 ffffffff81069777 ffff880043dafd18
> ffff880020154690 00007f8add804000 00007f8add80f000 ffff880020154660
> Call Trace:
> [<ffffffff81876bcb>] dump_stack+0x46/0x58
> [<ffffffff81069777>] warn_slowpath_common+0x87/0xb0
> [<ffffffff810697b5>] warn_slowpath_null+0x15/0x20
> [<ffffffffa012d29d>] unmap_if_in_range+0x5d/0x60 [xen_gntdev]
> [<ffffffffa012d46e>] mn_invl_range_start+0x4e/0xa0 [xen_gntdev]
> [<ffffffff811615cb>] __mmu_notifier_invalidate_range_start+0x5b/0x90
> [<ffffffff811469a9>] unmap_vmas+0x79/0x90
> [<ffffffff8114bb13>] unmap_region+0xa3/0x120
> [<ffffffff8116b339>] ? new_sync_read+0x79/0xb0
> [<ffffffff8114bfb1>] ? vma_rb_erase+0x121/0x210
> [<ffffffff8114dba0>] do_munmap+0x2a0/0x3b0
> [<ffffffff8114dcf9>] vm_munmap+0x49/0x70
> [<ffffffff8114ecd6>] SyS_munmap+0x26/0x40
> [<ffffffff81880169>] system_call_fastpath+0x16/0x1b
> ---[ end trace 25ca87f9adc0ad78 ]---
> INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 32, t=60002 
> jiffies, g=26177592, c=26177591, q=1229)
> Task dump for CPU 0:
> swapper/0       R  running task    14072     0      0 0x00000008
> 00000000ffffffed 0000000000000000 0000000000000001 ffffffffffffffff
> ffffffff810013aa 000000000000e030 0000000000000246 ffffffff81e03e30
> 000000000000e02b 0000000000000000 0000000000000000 ffffffff8100a0c0
> Call Trace:
> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [<ffffffff8100a0c0>] ? xen_safe_halt+0x10/0x20
> [<ffffffff8101d73f>] ? default_idle+0x1f/0xb0
> [<ffffffff8101dfea>] ? arch_cpu_idle+0xa/0x10
> [<ffffffff8109ead4>] ? cpu_startup_entry+0x284/0x330
> [<ffffffff8186ec7d>] ? rest_init+0x6d/0x70
> [<ffffffff81eea081>] ? start_kernel+0x41d/0x42a
> [<ffffffff81ee9a51>] ? set_init_arg+0x58/0x58
> [<ffffffff81ee95f0>] ? x86_64_start_reservations+0x2a/0x2c
> [<ffffffff81eed774>] ? xen_start_kernel+0x540/0x542
> INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 34, t=240007 
> jiffies, g=26177592, c=26177591, q=4592)
> Task dump for CPU 0:
> swapper/0       R  running task    14072     0      0 0x00000008
> 00000000ffffffed 0000000000000000 0000000000000001 ffffffffffffffff
> ffffffff810013aa 000000000000e030 0000000000000246 ffffffff81e03e30
> 000000000000e02b 0000000000000000 0000000000000000 ffffffff8100a0c0
> Call Trace:
> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [<ffffffff8100a0c0>] ? xen_safe_halt+0x10/0x20
> [<ffffffff8101d73f>] ? default_idle+0x1f/0xb0
> [<ffffffff8101dfea>] ? arch_cpu_idle+0xa/0x10
> [<ffffffff8109ead4>] ? cpu_startup_entry+0x284/0x330
> [<ffffffff8186ec7d>] ? rest_init+0x6d/0x70
> [<ffffffff81eea081>] ? start_kernel+0x41d/0x42a
> [<ffffffff81ee9a51>] ? set_init_arg+0x58/0x58
> [<ffffffff81ee95f0>] ? x86_64_start_reservations+0x2a/0x2c
> [<ffffffff81eed774>] ? xen_start_kernel+0x540/0x542
> INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected by 34, t=420012 
> jiffies, g=26177592, c=26177591, q=8255)
> Task dump for CPU 0:
> swapper/0       R  running task    14072     0      0 0x00000008
> 00000000ffffffed 0000000000000000 0000000000000001 ffffffffffffffff
> ffffffff810013aa 000000000000e030 0000000000000246 ffffffff81e03e30
> 000000000000e02b 0000000000000000 0000000000000000 ffffffff8100a0c0
> Call Trace:
> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [<ffffffff8100a0c0>] ? xen_safe_halt+0x10/0x20
> [<ffffffff8101d73f>] ? default_idle+0x1f/0xb0
> [<ffffffff8101dfea>] ? arch_cpu_idle+0xa/0x10
> [<ffffffff8109ead4>] ? cpu_startup_entry+0x284/0x330
> [<ffffffff8186ec7d>] ? rest_init+0x6d/0x70
> [<ffffffff81eea081>] ? start_kernel+0x41d/0x42a
> [<ffffffff81ee9a51>] ? set_init_arg+0x58/0x58
> [<ffffffff81ee95f0>] ? x86_64_start_reservations+0x2a/0x2c
> [<ffffffff81eed774>] ? xen_start_kernel+0x540/0x542
> 
> Then the dom0 is unresponsive, and requires a reboot.

We've hit this on a number of hosts, so it's not an isolated incident.  Any 
suggestions would be appreciated.  We're stuck without a stable dom0 at the 
moment: 3.10.x has the netback issue, and kernels beyond 3.10 each have their 
own problems.  I suppose we should forge ahead and try 3.18...
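
In case it helps anyone else triage, the earliest symptom we see is the 
maptrack message in the hypervisor console log, well before the WARNING and 
the RCU stalls.  A rough watchdog sketch (assuming "xl dmesg" is available on 
the dom0; the function itself just counts matches on stdin):

```shell
# Hedged sketch, not a fix: count maptrack-exhaustion messages in
# hypervisor console output so monitoring can flag a host before it
# wedges.  Reads log text on stdin and prints the match count.
count_maptrack_failures() {
    grep -c 'Failed to obtain maptrack handle' || true
}

# On a live host one might run:
#   xl dmesg | count_maptrack_failures
```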

Thanks!
-Chris


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

