[Xen-devel] Frequent NIC lock-ups requiring power cycle
We have a server running CentOS 7 with the 4.9.75-29.el7.x86_64 kernel and a Broadcom PCI network card BCM-95720A2003G.
The server receives a fair amount of traffic (~3-4TB per month) with an even split between uploads/downloads and ~40 HTTP requests per second. Not a trivial amount of traffic, but nothing crazy either.

We've had recurring problems where our NIC locks up and the server must be power cycled in order to restore network connectivity. We've seen this with both the on-board Intel NIC and a PCI Broadcom network card (listed above), on CentOS 6 and CentOS 7, and on different physical machines (albeit all of them Dell C6100s with the XS23-TY3 mobo).

Our system logs show the following at the time of the crash (see below). The issue appears to be related to the Xen kernel and/or the network driver. Given that we've had this same issue across different brands of network cards, I'm guessing it's more related to the kernel.

Anyone have any suggestions for how we might resolve or mitigate this issue?

Jun 29 23:03:52 server1 kernel: ------------[ cut here ]------------
Jun 29 23:03:52 server1 kernel: WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x217/0x220
Jun 29 23:03:52 server1 kernel: NETDEV WATCHDOG: p55p1 (tg3): transmit queue 0 timed out
Jun 29 23:03:52 server1 kernel: Modules linked in: br_netfilter xen_blkfront dm_crypt tun drbd lru_cache libcrc32c bridge stp llc ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_$
Jun 29 23:03:52 server1 kernel: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.75-29.el7.x86_64 #1
Jun 29 23:03:52 server1 kernel: Hardware name: Dell XS23-TY3 / , BIOS 1.62 06/24/2011
Jun 29 23:03:52 server1 kernel: ffff88007ca03dd0 ffffffff813f6f05 ffff88007ca03e20 0000000000000000
Jun 29 23:03:52 server1 kernel: ffff88007ca03e10 ffffffff810a7341 0000013c11b64726 0000000000000000
Jun 29 23:03:52 server1 kernel: ffff880074dbe000 0000000000000005 0000000000000000 ffff880074dbe000
Jun 29 23:03:52 server1 kernel: Call Trace:
Jun 29 23:03:52 server1 kernel: <IRQ>
Jun 29 23:03:52 server1 kernel: [<ffffffff813f6f05>] dump_stack+0x63/0x8e
Jun 29 23:03:52 server1 kernel: [<ffffffff810a7341>] __warn+0xd1/0xf0
Jun 29 23:03:52 server1 kernel: [<ffffffff810a73af>] warn_slowpath_fmt+0x4f/0x60
Jun 29 23:03:52 server1 kernel: [<ffffffff811198ea>] ? hrtimer_interrupt+0xca/0x190
Jun 29 23:03:52 server1 kernel: [<ffffffff81787387>] dev_watchdog+0x217/0x220
Jun 29 23:03:52 server1 kernel: [<ffffffff81787170>] ? dev_deactivate_queue.constprop.27+0x60/0x60
Jun 29 23:03:52 server1 kernel: [<ffffffff81116c05>] call_timer_fn+0x35/0x120
Jun 29 23:03:52 server1 kernel: [<ffffffff8111770c>] run_timer_softirq+0x1dc/0x460
Jun 29 23:03:52 server1 kernel: [<ffffffff810228a5>] ? xen_clocksource_read+0x15/0x20
Jun 29 23:03:52 server1 kernel: [<ffffffff81035639>] ? sched_clock+0x9/0x10
Jun 29 23:03:52 server1 kernel: [<ffffffff810d7672>] ? sched_clock_cpu+0x72/0xa0
Jun 29 23:03:52 server1 kernel: [<ffffffff81883881>] __do_softirq+0xd1/0x283
Jun 29 23:03:52 server1 kernel: [<ffffffff810ad479>] irq_exit+0xe9/0x100
Jun 29 23:03:52 server1 kernel: [<ffffffff814ded65>] xen_evtchn_do_upcall+0x35/0x50
Jun 29 23:03:52 server1 kernel: [<ffffffff81880c5e>] xen_do_hypervisor_callback+0x1e/0x40
Jun 29 23:03:52 server1 kernel: <EOI>
Jun 29 23:03:52 server1 kernel: [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
Jun 29 23:03:52 server1 kernel: [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
Jun 29 23:03:52 server1 kernel: [<ffffffff81022710>] ? xen_safe_halt+0x10/0x20
Jun 29 23:03:52 server1 kernel: [<ffffffff8187efae>] ? default_idle+0x1e/0xd0
Jun 29 23:03:52 server1 kernel: [<ffffffff810368ff>] ? arch_cpu_idle+0xf/0x20
Jun 29 23:03:52 server1 kernel: [<ffffffff8187f3cc>] ? default_idle_call+0x2c/0x40
Jun 29 23:03:52 server1 kernel: [<ffffffff810ecd2c>] ? cpu_startup_entry+0x1ac/0x240
Jun 29 23:03:52 server1 kernel: [<ffffffff818729b7>] ? rest_init+0x77/0x80
Jun 29 23:03:52 server1 kernel: [<ffffffff81fb0148>] ? start_kernel+0x4ac/0x4b9
Jun 29 23:03:52 server1 kernel: [<ffffffff81fafa8a>] ? set_init_arg+0x55/0x55
Jun 29 23:03:52 server1 kernel: [<ffffffff81faf5d7>] ? x86_64_start_reservations+0x24/0x26
Jun 29 23:03:52 server1 kernel: [<ffffffff81fb6cf7>] ? xen_start_kernel+0x56a/0x576
Jun 29 23:03:52 server1 kernel: ---[ end trace e79c6881e97dc64a ]---
Jun 29 23:03:52 server1 kernel: tg3 0000:03:00.0 p55p1: transmit timed out, resetting
Jun 29 23:03:52 server1 kernel: tg3 0000:03:00.0 p55p1: 0x00000000: 0x165f14e4, 0x00100546, 0x02000000, 0x00800040
Jun 29 23:03:52 server1 kernel: tg3 0000:03:00.0 p55p1: 0x00000010: 0xf9fd000c, 0x00000000, 0xf9ff000c, 0x00000000
Jun 29 23:03:52 server1 kernel: tg3 0000:03:00.0 p55p1: 0x00000460: 0x00000008, 0x00002620, 0x01ff0106, 0x00000000
Jun 29 23:03:52 server1 kernel: tg3 0000:03:00.0 p55p1: 0x00000470: 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff
Jun 29 23:03:52 server1 kernel: tg3 0000:03:00.0 p55p1: 0x00000480: 0x42000000, 0x7fffffff, 0x06000004, 0x7fffffff

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
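
To check whether the timeouts follow a particular interface or driver across the affected machines, a minimal sketch along these lines could tally the NETDEV WATCHDOG lines from syslog. The function name and the sample input are illustrative assumptions; only the message format is taken from the log above.

```python
import re
from collections import Counter

# Matches the watchdog message format seen in the log above, e.g.
# "NETDEV WATCHDOG: p55p1 (tg3): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def count_watchdog_timeouts(lines):
    """Count transmit-queue timeouts per (interface, driver, queue).

    `lines` is any iterable of syslog lines (hypothetical helper,
    not part of any existing tool).
    """
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            key = (match["iface"], match["driver"], int(match["queue"]))
            counts[key] += 1
    return counts


# Sample input copied from the log in this report; in practice the
# lines would come from a syslog file on each affected host.
sample = [
    "Jun 29 23:03:52 server1 kernel: NETDEV WATCHDOG: p55p1 (tg3): transmit queue 0 timed out",
    "Jun 29 23:03:52 server1 kernel: tg3 0000:03:00.0 p55p1: transmit timed out, resetting",
]

for (iface, driver, queue), n in sorted(count_watchdog_timeouts(sample).items()):
    print(f"{iface} ({driver}) queue {queue}: {n} timeout(s)")
```

Running the same tally on hosts using the on-board Intel NIC would show whether the timeouts appear under a different driver name as well, which would support the guess that the kernel rather than a single driver is involved.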