
[Xen-bugs] [Bug 1486] New: dom0 crashes under heavy network load



http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1486

           Summary: dom0 crashes under heavy network load
           Product: Xen
           Version: unstable
          Platform: x86-64
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: Hypervisor
        AssignedTo: xen-bugs@xxxxxxxxxxxxxxxxxxx
        ReportedBy: uk@xxxxxxxxxxxxx
                CC: uk@xxxxxxxxxxxxx


On a Dell PE-R710 with the bnx2 network driver (also tested with an e1000 card, which
crashes as well when the onboard bnx2 is disabled, so I do not think this is a NIC
driver issue), dom0 crashes completely under heavy, constant network and disk load
(produced in dom0 and in one domU). The crash is reproducible faster with an additional
rsync, which also generates disk I/O.
In my test scenario, 60 domUs were started, each with 6 virtual disks and 2 virtual
network interfaces, i.e. 8 backend devices per domU.
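
For reference, a minimal sketch of what one such domU configuration could look like
(xm-style config file; the guest name, LVM paths, bridges and MAC addresses are
illustrative placeholders, not the actual values used):

# illustrative only: one guest with 6 disk backends and 2 network backends
name   = "testdomu01"
memory = 512
vcpus  = 1
disk   = [ 'phy:/dev/vg0/domu01-disk1,xvda,w',
           'phy:/dev/vg0/domu01-disk2,xvdb,w',
           'phy:/dev/vg0/domu01-disk3,xvdc,w',
           'phy:/dev/vg0/domu01-disk4,xvdd,w',
           'phy:/dev/vg0/domu01-disk5,xvde,w',
           'phy:/dev/vg0/domu01-disk6,xvdf,w' ]
vif    = [ 'mac=00:16:3e:00:01:01,bridge=xenbr0',
           'mac=00:16:3e:00:01:02,bridge=xenbr1' ]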

Test scenario, using netcat to produce constant network load (just zero bytes in
this case):
my.dom0 #: nc -l -p 1234 | pv > /dev/null
external.host #: cat /dev/zero | pv | nc ip.of.my.dom0 1234
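
The same pattern sketches how the load toward the domU (mentioned above) can be
generated; ip.of.my.domu is a placeholder analogous to ip.of.my.dom0:
my.domu #: nc -l -p 1234 | pv > /dev/null
external.host #: cat /dev/zero | pv | nc ip.of.my.domu 1234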

Then I ran an additional rsync loop to produce network and disk I/O:
my.dom0 #:
for i in $(seq 1 1000); do
    echo "============== run $i ============" >> rsync-runs.txt
    rm -rfv /var/spool/test/*    # clear the target before each run
    rsync -avP --numeric-ids --password-file=/etc/rsyncd.secrets \
        user@xxxxxxxxxxxxx::source/* /var/spool/test/
done
...which copies roughly 1 GB of data per run.

The crash occurs after a few minutes or, sometimes, only after several hours; with the
e1000 card it took 84 rsync runs (I do not know exactly how long that was, as the
machine crashed overnight). I think I can crash the machine faster using the bnx2 card.

Here, the unstable kernel 2.6.27.5 from xenbits was used, but this issue also
affects older versions.

Stacktrace:
Jul  9 19:34:20 xh132 kernel: ------------[ cut here ]------------
Jul  9 19:34:20 xh132 kernel: WARNING: at net/sched/sch_generic.c:219
dev_watchdog+0x13c/0x1e9()
Jul  9 19:34:20 xh132 kernel: NETDEV WATCHDOG: eth0 (bnx2): transmit timed out
Jul  9 19:34:20 xh132 kernel: Modules linked in: iptable_filter(N) ip_tables(N)
x_tables(N) bridge(N) stp(N) llc(N) loop(N) dm_mod(N) 8021q(N) bonding(N)
dcdbas(N)
Jul  9 19:34:20 xh132 kernel: Supported: No
Jul  9 19:34:20 xh132 kernel: Pid: 0, comm: swapper Tainted: G         
2.6.27.5-xen0-he+4 #7
Jul  9 19:34:20 xh132 kernel:
Jul  9 19:34:20 xh132 kernel: Call Trace:
Jul  9 19:34:20 xh132 kernel: <IRQ>  [<ffffffff8022b3d7>]
warn_slowpath+0xb4/0xde
Jul  9 19:34:20 xh132 kernel: [<ffffffff80552b00>] __down_read+0xb6/0x110
Jul  9 19:34:20 xh132 kernel: [<ffffffff804d6999>] neigh_lookup+0xb0/0xc0
Jul  9 19:34:20 xh132 kernel: [<ffffffff804cafd2>] skb_queue_tail+0x17/0x3e
Jul  9 19:34:20 xh132 kernel: [<ffffffff8020d6de>] get_nsec_offset+0x9/0x2c
Jul  9 19:34:20 xh132 kernel: [<ffffffff8020d7ff>] local_clock+0x48/0x99
Jul  9 19:34:20 xh132 kernel: [<ffffffff8020d6de>] get_nsec_offset+0x9/0x2c
Jul  9 19:34:20 xh132 kernel: [<ffffffff8020d7ff>] local_clock+0x48/0x99
Jul  9 19:34:20 xh132 kernel: [<ffffffff8020d96f>] sched_clock+0x15/0x36
Jul  9 19:34:20 xh132 kernel: [<ffffffff80241ef5>] sched_clock_cpu+0x290/0x2b9
Jul  9 19:34:20 xh132 kernel: [<ffffffff8020dfea>] timer_interrupt+0x409/0x41d
Jul  9 19:34:20 xh132 kernel: [<ffffffff804ded1f>] dev_watchdog+0x13c/0x1e9
Jul  9 19:34:20 xh132 kernel: [<ffffffffa0038b31>] br_fdb_cleanup+0x0/0xd5
[bridge]
Jul  9 19:34:20 xh132 kernel: [<ffffffff802347c8>] __mod_timer+0xc7/0xd5
Jul  9 19:34:20 xh132 kernel: [<ffffffff804debe3>] dev_watchdog+0x0/0x1e9
Jul  9 19:34:20 xh132 kernel: [<ffffffff80234131>]
run_timer_softirq+0x16c/0x211
Jul  9 19:34:20 xh132 kernel: [<ffffffff8024f132>] handle_percpu_irq+0x53/0x6f
Jul  9 19:34:20 xh132 kernel: [<ffffffff8022fee0>] __do_softirq+0x92/0x13b
Jul  9 19:34:20 xh132 kernel: [<ffffffff8020b37c>] call_softirq+0x1c/0x28
Jul  9 19:34:20 xh132 kernel: [<ffffffff8020d1c3>] do_softirq+0x55/0xbb
Jul  9 19:34:20 xh132 kernel: [<ffffffff8020ae3e>]
do_hypervisor_callback+0x1e/0x30
Jul  9 19:34:20 xh132 kernel: <EOI>  [<ffffffff8020d6af>]
xen_safe_halt+0xb3/0xd9
Jul  9 19:34:20 xh132 kernel: [<ffffffff802105b3>] xen_idle+0x2e/0x67
Jul  9 19:34:20 xh132 kernel: [<ffffffff80208dfe>] cpu_idle+0x57/0x75
Jul  9 19:34:20 xh132 kernel:
Jul  9 19:34:20 xh132 kernel: ---[ end trace a04b8dccc5213f7d ]---
Jul  9 19:34:20 xh132 kernel: bnx2: eth0 NIC Copper Link is Down
Jul  9 19:34:20 xh132 kernel: bonding: bond0: link status down for active
interface eth0, disabling it in 200 ms.
Jul  9 19:34:20 xh132 kernel: bonding: bond0: link status definitely down for
interface eth0, disabling it
Jul  9 19:34:20 xh132 kernel: device eth0 left promiscuous mode
Jul  9 19:34:20 xh132 kernel: bonding: bond0: now running without any active
interface !

Please let me know if you need any further information; I hope you can help.

Many thanks in advance, 
best regards,
Ulf Kreutzberg

