
[Xen-devel] Frequent NIC lock-ups requiring power cycle



We have a server running CentOS 7 with the 4.9.75-29.el7.x86_64 kernel and a Broadcom PCI network card (BCM-95720A2003G).

The server receives a fair amount of traffic (~3-4TB per month) with an even split between uploads/downloads and ~40 HTTP requests per second. Not a trivial amount of traffic, but nothing crazy either.

We've had recurring problems where our NIC locks up and the server must be power cycled to restore network connectivity. We've seen this with the on-board Intel NIC and with the PCI Broadcom network card listed above, on both CentOS 6 and CentOS 7, and on different physical machines (albeit all of them Dell C6100s with the XS23-TY3 motherboard).

Our system logs show the following at the time of the crash (see below).

The issue appears to be related to the Xen kernel and/or the network driver. Given that we've had this same issue across different brands of network card, I'm guessing it's more likely related to the kernel.

Anyone have any suggestions for how we might resolve or mitigate this issue?

Jun 29 23:03:52 server1 kernel: ------------[ cut here ]------------
Jun 29 23:03:52 server1 kernel: WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x217/0x220
Jun 29 23:03:52 server1 kernel: NETDEV WATCHDOG: p55p1 (tg3): transmit queue 0 timed out
Jun 29 23:03:52 server1 kernel: Modules linked in: br_netfilter xen_blkfront dm_crypt tun drbd lru_cache libcrc32c bridge stp llc ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_$
Jun 29 23:03:52 server1 kernel: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.75-29.el7.x86_64 #1
Jun 29 23:03:52 server1 kernel: Hardware name: Dell     XS23-TY3    /      , BIOS 1.62 06/24/2011
Jun 29 23:03:52 server1 kernel: ffff88007ca03dd0 ffffffff813f6f05 ffff88007ca03e20 0000000000000000
Jun 29 23:03:52 server1 kernel: ffff88007ca03e10 ffffffff810a7341 0000013c11b64726 0000000000000000
Jun 29 23:03:52 server1 kernel: ffff880074dbe000 0000000000000005 0000000000000000 ffff880074dbe000
Jun 29 23:03:52 server1 kernel: Call Trace:
Jun 29 23:03:52 server1 kernel: <IRQ>
Jun 29 23:03:52 server1 kernel: [<ffffffff813f6f05>] dump_stack+0x63/0x8e
Jun 29 23:03:52 server1 kernel: [<ffffffff810a7341>] __warn+0xd1/0xf0
Jun 29 23:03:52 server1 kernel: [<ffffffff810a73af>] warn_slowpath_fmt+0x4f/0x60
Jun 29 23:03:52 server1 kernel: [<ffffffff811198ea>] ? hrtimer_interrupt+0xca/0x190
Jun 29 23:03:52 server1 kernel: [<ffffffff81787387>] dev_watchdog+0x217/0x220
Jun 29 23:03:52 server1 kernel: [<ffffffff81787170>] ? dev_deactivate_queue.constprop.27+0x60/0x60
Jun 29 23:03:52 server1 kernel: [<ffffffff81116c05>] call_timer_fn+0x35/0x120
Jun 29 23:03:52 server1 kernel: [<ffffffff8111770c>] run_timer_softirq+0x1dc/0x460
Jun 29 23:03:52 server1 kernel: [<ffffffff810228a5>] ? xen_clocksource_read+0x15/0x20
Jun 29 23:03:52 server1 kernel: [<ffffffff81035639>] ? sched_clock+0x9/0x10
Jun 29 23:03:52 server1 kernel: [<ffffffff810d7672>] ? sched_clock_cpu+0x72/0xa0
Jun 29 23:03:52 server1 kernel: [<ffffffff81883881>] __do_softirq+0xd1/0x283
Jun 29 23:03:52 server1 kernel: [<ffffffff810ad479>] irq_exit+0xe9/0x100
Jun 29 23:03:52 server1 kernel: [<ffffffff814ded65>] xen_evtchn_do_upcall+0x35/0x50
Jun 29 23:03:52 server1 kernel: [<ffffffff81880c5e>] xen_do_hypervisor_callback+0x1e/0x40
Jun 29 23:03:52 server1 kernel: <EOI>
Jun 29 23:03:52 server1 kernel: [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
Jun 29 23:03:52 server1 kernel: [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
Jun 29 23:03:52 server1 kernel: [<ffffffff81022710>] ? xen_safe_halt+0x10/0x20
Jun 29 23:03:52 server1 kernel: [<ffffffff8187efae>] ? default_idle+0x1e/0xd0
Jun 29 23:03:52 server1 kernel: [<ffffffff810368ff>] ? arch_cpu_idle+0xf/0x20
Jun 29 23:03:52 server1 kernel: [<ffffffff8187f3cc>] ? default_idle_call+0x2c/0x40
Jun 29 23:03:52 server1 kernel: [<ffffffff810ecd2c>] ? cpu_startup_entry+0x1ac/0x240
Jun 29 23:03:52 server1 kernel: [<ffffffff818729b7>] ? rest_init+0x77/0x80
Jun 29 23:03:52 server1 kernel: [<ffffffff81fb0148>] ? start_kernel+0x4ac/0x4b9
Jun 29 23:03:52 server1 kernel: [<ffffffff81fafa8a>] ? set_init_arg+0x55/0x55
Jun 29 23:03:52 server1 kernel: [<ffffffff81faf5d7>] ? x86_64_start_reservations+0x24/0x26
Jun 29 23:03:52 server1 kernel: [<ffffffff81fb6cf7>] ? xen_start_kernel+0x56a/0x576
Jun 29 23:03:52 server1 kernel: ---[ end trace e79c6881e97dc64a ]---
Jun 29 23:03:52 server1 kernel: tg3 0000:03:00.0 p55p1: transmit timed out, resetting
Jun 29 23:03:52 server1 kernel: tg3 0000:03:00.0 p55p1: 0x00000000: 0x165f14e4, 0x00100546, 0x02000000, 0x00800040
Jun 29 23:03:52 server1 kernel: tg3 0000:03:00.0 p55p1: 0x00000010: 0xf9fd000c, 0x00000000, 0xf9ff000c, 0x00000000
Jun 29 23:03:52 server1 kernel: tg3 0000:03:00.0 p55p1: 0x00000460: 0x00000008, 0x00002620, 0x01ff0106, 0x00000000
Jun 29 23:03:52 server1 kernel: tg3 0000:03:00.0 p55p1: 0x00000470: 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff
Jun 29 23:03:52 server1 kernel: tg3 0000:03:00.0 p55p1: 0x00000480: 0x42000000, 0x7fffffff, 0x06000004, 0x7fffffff
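
A minimal sketch for pulling the watchdog events out of a log like the one above, for example to correlate lock-ups across machines. It assumes syslog lines in exactly the format shown; the function name `find_watchdog_timeouts` and the regex are illustrative, not part of any existing tool.

```python
import re

# Matches lines such as:
#   "Jun 29 23:03:52 server1 kernel: NETDEV WATCHDOG: p55p1 (tg3): transmit queue 0 timed out"
# The pattern is an assumption derived from the log excerpt above.
WATCHDOG_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+kernel:\s+"
    r"NETDEV WATCHDOG:\s+(?P<iface>\S+)\s+\((?P<driver>[^)]+)\):\s+"
    r"transmit queue\s+(?P<queue>\d+)\s+timed out"
)


def find_watchdog_timeouts(lines):
    """Return one dict per NETDEV WATCHDOG line found in an iterable of log lines.

    Each dict has the keys: timestamp, host, iface, driver, queue (int).
    Lines that do not match the pattern are ignored.
    """
    events = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            event = match.groupdict()
            event["queue"] = int(event["queue"])
            events.append(event)
    return events
```

Passing an open log file (or any iterable of lines) to `find_watchdog_timeouts` gives a list that can be grouped by host, interface, or driver to see whether the timeouts cluster around particular times or machines.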