[Xen-devel] Lockup in netback - Xen 4.1.2 (XS 6.0.2 hotfix 7)
Hi all,

I am seeing an intermittent lockup of my machine's networking as soon as I apply a network load. On a pool of 80, the first one will generally lock up within 15-20 minutes of beginning the workload. The symptom is a long list of the following in /var/log/messages:

Aug 16 18:32:49 localhost kernel: netback[1]: TXP193 is DMA mapped
Aug 16 18:32:49 localhost kernel: netback[1]: TXP211 is DMA mapped
Aug 16 18:32:49 localhost kernel: netback[1]: TXP232 is DMA mapped
Aug 16 18:32:49 localhost kernel: netback[1]: TXP157 is DMA mapped
Aug 16 18:32:49 localhost kernel: netback[0]: TXP44 is DMA mapped

This seems to clog up the networking pipeline, which leads to a stall in my NIC driver:

Aug 16 18:32:58 localhost kernel: ------------[ cut here ]------------
Aug 16 18:32:58 localhost kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x241/0x250()
Aug 16 18:32:58 localhost kernel: Hardware name: C51G,MCP51
Aug 16 18:32:58 localhost kernel: NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out
Aug 16 18:32:58 localhost kernel: Modules linked in: nfs nfs_acl auth_rpcgss sch_htb lockd sunrpc 8021q openvswitch ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables binfmt_misc nls_utf8 isofs video output sbs sbshc fan container battery ac parport_pc lp parport nvram thermal rtc_cmos processor evdev sg tg3 button thermal_sys rtc_core sata_sil24 rtc_lib serio_raw tpm_tis tpm tpm_bios i2c_nforce2 pcspkr i2c_core ide_generic dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod sata_nv pata_acpi ata_generic libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd usbcore fbcon font tileblit bitblit softcursor
Aug 16 18:32:58 localhost kernel: Pid: 0, comm: swapper Not tainted 2.6.32.12-0.7.1.xs6.0.2.553.170674xen #1
Aug 16 18:32:58 localhost kernel: Call Trace:
Aug 16 18:32:58 localhost kernel: [<c031a1a1>] ? dev_watchdog+0x241/0x250
Aug 16 18:32:58 localhost kernel: [<c031a1a1>] ? dev_watchdog+0x241/0x250
Aug 16 18:32:58 localhost kernel: [<c012e0bc>] warn_slowpath_common+0x7c/0xa0
Aug 16 18:32:58 localhost kernel: [<c031a1a1>] ? dev_watchdog+0x241/0x250
Aug 16 18:32:58 localhost kernel: [<c012e126>] warn_slowpath_fmt+0x26/0x30
Aug 16 18:32:58 localhost kernel: [<c031a1a1>] dev_watchdog+0x241/0x250
Aug 16 18:32:58 localhost kernel: [<c02188f6>] ? blk_rq_timed_out_timer+0xe6/0x110
Aug 16 18:32:58 localhost kernel: [<c0137fe1>] run_timer_softirq+0x151/0x200
Aug 16 18:32:58 localhost kernel: [<c0319f60>] ? dev_watchdog+0x0/0x250
Aug 16 18:32:58 localhost kernel: [<c013359a>] __do_softirq+0xba/0x180
Aug 16 18:32:58 localhost kernel: [<c015b657>] ? handle_IRQ_event+0x37/0x100
Aug 16 18:32:58 localhost kernel: [<c015e774>] ? move_native_irq+0x14/0x50
Aug 16 18:32:58 localhost kernel: [<c01336d5>] do_softirq+0x75/0x80
Aug 16 18:32:58 localhost kernel: [<c01339bb>] irq_exit+0x2b/0x40
Aug 16 18:32:58 localhost kernel: [<c029c7b7>] evtchn_do_upcall+0x1e7/0x330
Aug 16 18:32:58 localhost kernel: [<c010470f>] hypervisor_callback+0x43/0x4b
Aug 16 18:32:58 localhost kernel: [<c0107095>] ? xen_safe_halt+0xb5/0x150
Aug 16 18:32:58 localhost kernel: [<c010adae>] xen_idle+0x1e/0x50
Aug 16 18:32:58 localhost kernel: [<c0102a7b>] cpu_idle+0x3b/0x60
Aug 16 18:32:58 localhost kernel: [<c0373c43>] rest_init+0x53/0x60
Aug 16 18:32:58 localhost kernel: [<c04f5cea>] start_kernel+0x29a/0x340
Aug 16 18:32:58 localhost kernel: [<c04f55f0>] ? unknown_bootoption+0x0/0x1f0
Aug 16 18:32:58 localhost kernel: [<c04f507c>] i386_start_kernel+0x7c/0x90
Aug 16 18:32:58 localhost kernel: ---[ end trace 76ea5a31a8fc2f33 ]---

After the NIC driver fails, netback un-stalls itself:

Aug 16 18:33:00 localhost kernel: tg3 0000:01:00.0: tg3_stop_block timed out, ofs=1400 enable_bit=2
Aug 16 18:33:00 localhost kernel: pci 0000:00:02.0: eth0: Link is down
Aug 16 18:33:00 localhost kernel: netback[1]: DMA mapped TXP 203 released
Aug 16 18:33:00 localhost kernel: netback[1]: DMA mapped TXP 212 released
Aug 16 18:33:00 localhost kernel: netback[2]: DMA mapped TXP 94 released
Aug 16 18:33:00 localhost kernel: netback[1]: DMA mapped TXP 159 released

To get packets moving again I have to open a serial console to the host, rmmod the tg3 driver, modprobe it, ifconfig up the interface and restart OVS.

I've tried a variety of things to debug the problem:

- Turning off all hardware acceleration on the NIC from ethtool
- Different OVS versions
- Using a single dom0 vcpu
- Turning off irqbalance and MSI
- Trying the latest stable kernel in my VMs (3.5.3)
- Trying a newer TG3 driver from the Citrix crew (http://forums.citrix.com/thread.jspa?threadID=311744)

But to no avail. I don't ever see the "is DMA mapped" messages under normal operation, so it seems that whatever is causing dom0 to believe that the memory in the netback/front rings is DMA mapped is the problem.

If anyone has any suggestions on how to approach or solve this problem, I am open to ideas. I've spent a couple of weeks on and off on it with no resolution. I'm attaching a tar with all the log messages from the system in case they help.

Thanks in advance,
David

Attachment:
crash_newdriver-1_logs.tgz

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
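
For anyone working through the attached logs, one possible way to quantify the stall is to tally the netback messages per instance. The following is a minimal Python sketch, not part of the original report. The two regular expressions are derived only from the message shapes quoted above, and the function names (summarize, still_mapped) are hypothetical. It also assumes that a "released" message refers to the same TXP index as an earlier "is DMA mapped" message, which the excerpts do not confirm.

```python
import re
from collections import defaultdict

# Message shapes taken from the excerpts quoted in the report, e.g.
#   netback[1]: TXP193 is DMA mapped
#   netback[1]: DMA mapped TXP 203 released
MAPPED = re.compile(r"netback\[(\d+)\]: TXP\s*(\d+) is DMA mapped")
RELEASED = re.compile(r"netback\[(\d+)\]: DMA mapped TXP\s*(\d+) released")


def summarize(lines):
    """Collect, per netback instance, the TXP indices reported as
    'is DMA mapped' and those reported as 'released'."""
    stats = defaultdict(lambda: {"mapped": set(), "released": set()})
    for line in lines:
        m = MAPPED.search(line)
        if m:
            stats[int(m.group(1))]["mapped"].add(int(m.group(2)))
            continue
        m = RELEASED.search(line)
        if m:
            stats[int(m.group(1))]["released"].add(int(m.group(2)))
    return dict(stats)


def still_mapped(stats):
    """Assumption: a release cancels a mapping with the same index.
    Return the indices per instance that were never reported released."""
    return {nb: sorted(s["mapped"] - s["released"]) for nb, s in stats.items()}
```

To run it against a live host, pass an open file, for example still_mapped(summarize(open("/var/log/messages"))). Since the excerpts above are partial, the mapped and released indices in them do not line up; the full logs in the tarball would be needed to see whether every stuck TXP is eventually released.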