Re: [Xen-devel] Enabling NLB is crashing VM's/DRBD
Hi,

We are running the Debian-provided xen-hypervisor-4.0-amd64 (4.0.1-4). The kernel is 2.6.32-5-xen-amd64 (2.6.32-46), also from Debian.
The previously posted log lines were from the dom0's /var/log/messages. The only thing I'm seeing from xm dmesg is the following:

(XEN) grant_table.c:1717:d0 Bad grant reference

I've also picked up on some more entries from syslog that were not present in messages. Here is what's present in syslog. Time seems to be sync'd to the second on both machines:

Nov 28 10:55:03 nodeA kernel: [1239467.400293] eth0: port 11(nlb2.e0) entering disabled state
Nov 28 10:55:03 nodeA kernel: [1239467.400516] eth0: port 11(nlb2.e0) entering disabled state
Nov 28 10:55:04 nodeA kernel: [1239467.731442] frontend_changed: backend/vif/73/0: prepare for reconnect
Nov 28 10:55:04 nodeA logger: /etc/xen/scripts/vif-bridge: offline XENBUS_PATH=backend/vif/73/0
Nov 28 10:55:05 nodeA logger: /etc/xen/scripts/vif-bridge: brctl delif eth0 nlb2.e0 failed
Nov 28 10:55:05 nodeA logger: /etc/xen/scripts/vif-bridge: ifconfig nlb2.e0 down failed
Nov 28 10:55:06 nodeA logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge offline for nlb2.e0, bridge eth0.
Nov 28 10:55:06 nodeA logger: /etc/xen/scripts/vif-bridge: online XENBUS_PATH=backend/vif/73/0
Nov 28 10:55:08 nodeA kernel: [1239471.758583] device nlb2.e0 entered promiscuous mode
Nov 28 10:55:10 nodeA kernel: [1239473.795967] block drbd23: sock was shut down by peer
Nov 28 10:55:27 nodeA kernel: [1239473.795973] block drbd23: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Nov 28 10:55:27 nodeA kernel: [1239473.795980] block drbd23: short read expecting header on sock: r=0
Nov 28 10:55:27 nodeA kernel: [1239474.009951] block drbd31: sock was shut down by peer

Nov 28 10:55:09 nodeB kernel: [1239622.217505] block drbd23: PingAck did not arrive in time.
Nov 28 10:55:09 nodeB kernel: [1239622.217542] block drbd23: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Nov 28 10:55:09 nodeB kernel: [1239622.217551] block drbd23: asender terminated
Nov 28 10:55:09 nodeB kernel: [1239622.217554] block drbd23: Terminating drbd23_asender
Nov 28 10:55:09 nodeB kernel: [1239622.217795] block drbd23: short read expecting header on sock: r=-512
Nov 28 10:55:09 nodeB kernel: [1239622.217887] block drbd23: Creating new current UUID
Nov 28 10:55:09 nodeB kernel: [1239622.218118] block drbd23: Connection closed
Nov 28 10:55:09 nodeB kernel: [1239622.218125] block drbd23: conn( NetworkFailure -> Unconnected )
Nov 28 10:55:09 nodeB kernel: [1239622.218135] block drbd23: receiver terminated
Nov 28 10:55:09 nodeB kernel: [1239622.218137] block drbd23: Restarting drbd23_receiver
Nov 28 10:55:09 nodeB kernel: [1239622.218140] block drbd23: receiver (re)started
Nov 28 10:55:09 nodeB kernel: [1239622.218143] block drbd23: conn( Unconnected -> WFConnection )
Nov 28 10:55:09 nodeB kernel: [1239622.353589] block drbd30: PingAck did not arrive in time.
Nov 28 10:55:09 nodeB kernel: [1239622.353627] block drbd30: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Nov 28 10:55:09 nodeB kernel: [1239622.353637] block drbd30: asender terminated
Nov 28 10:55:09 nodeB kernel: [1239622.353639] block drbd30: Terminating drbd30_asender
Nov 28 10:55:09 nodeB kernel: [1239622.353668] block drbd30: short read expecting header on sock: r=-512
Nov 28 10:55:09 nodeB kernel: [1239622.353754] block drbd30: Creating new current UUID
Nov 28 10:55:09 nodeB kernel: [1239622.388101] block drbd30: Connection closed
Nov 28 10:55:09 nodeB kernel: [1239622.388107] block drbd30: conn( NetworkFailure -> Unconnected )
Nov 28 10:55:09 nodeB kernel: [1239622.388111] block drbd30: receiver terminated
Nov 28 10:55:09 nodeB kernel: [1239622.388113] block drbd30: Restarting drbd30_receiver
Nov 28 10:55:09 nodeB kernel: [1239622.388116] block drbd30: receiver (re)started
Nov 28 10:55:09 nodeB kernel: [1239622.388119] block drbd30: conn( Unconnected -> WFConnection )

I've also looked at the qemu, xend-hotplug, and xend logs and do not see any telling errors. In xend.log I just see lines pertaining to VMs being rebooted.

As for GPLPV: I haven't been able to reproduce the "network crashing" and rebooting in the lab, and probably won't be able to until I can get a more robust, production-like environment set up. Unfortunately, I can't risk more customer downtime by attempting to set up NLB without the GPLPV drivers in production. If I can manage to reproduce this in staging, I will of course attempt it without the GPLPV drivers.

-Greg

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel