Re: [Xen-devel] Enabling NLB is crashing VM's/DRBD
On Fri, Nov 30, 2012 at 11:11:11AM +1300, Greg Zapp wrote:
> Hi,
>
> We are running Debian's provided xen-hypervisor-4.0-amd64(4.0.1-4).
> The kernel is 2.6.32-5-xen-amd64(2.6.32-46) from Debian.
>
> The previously posted log lines were from the dom0's /var/log/messages.
> The only thing I'm seeing from xm dmesg is the following:
> (XEN) grant_table.c:1717:d0 Bad grant reference
>
> I've also picked up on some more entries from syslog that were not present
> in messages. Here is what's present in syslog. Time seems to be sync'd
> to the second on both machines:
> Nov 28 10:55:03 nodeA kernel: [1239467.400293] eth0: port 11(nlb2.e0) entering disabled state
> Nov 28 10:55:03 nodeA kernel: [1239467.400516] eth0: port 11(nlb2.e0) entering disabled state

How's your networking set up? I hope the Windows NLB VMs aren't using
the same bridge/VLAN as DRBD is using?

-- Pasi

> Nov 28 10:55:04 nodeA kernel: [1239467.731442] frontend_changed: backend/vif/73/0: prepare for reconnect
> Nov 28 10:55:04 nodeA logger: /etc/xen/scripts/vif-bridge: offline XENBUS_PATH=backend/vif/73/0
> Nov 28 10:55:05 nodeA logger: /etc/xen/scripts/vif-bridge: brctl delif eth0 nlb2.e0 failed
> Nov 28 10:55:05 nodeA logger: /etc/xen/scripts/vif-bridge: ifconfig nlb2.e0 down failed
> Nov 28 10:55:06 nodeA logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge offline for nlb2.e0, bridge eth0.
> Nov 28 10:55:06 nodeA logger: /etc/xen/scripts/vif-bridge: online XENBUS_PATH=backend/vif/73/0
> Nov 28 10:55:08 nodeA kernel: [1239471.758583] device nlb2.e0 entered promiscuous mode
> Nov 28 10:55:10 nodeA kernel: [1239473.795967] block drbd23: sock was shut down by peer
> Nov 28 10:55:27 nodeA kernel: [1239473.795973] block drbd23: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
> Nov 28 10:55:27 nodeA kernel: [1239473.795980] block drbd23: short read expecting header on sock: r=0
> Nov 28 10:55:27 nodeA kernel: [1239474.009951] block drbd31: sock was shut down by peer
>
> Nov 28 10:55:09 nodeB kernel: [1239622.217505] block drbd23: PingAck did not arrive in time.
> Nov 28 10:55:09 nodeB kernel: [1239622.217542] block drbd23: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> Nov 28 10:55:09 nodeB kernel: [1239622.217551] block drbd23: asender terminated
> Nov 28 10:55:09 nodeB kernel: [1239622.217554] block drbd23: Terminating drbd23_asender
> Nov 28 10:55:09 nodeB kernel: [1239622.217795] block drbd23: short read expecting header on sock: r=-512
> Nov 28 10:55:09 nodeB kernel: [1239622.217887] block drbd23: Creating new current UUID
> Nov 28 10:55:09 nodeB kernel: [1239622.218118] block drbd23: Connection closed
> Nov 28 10:55:09 nodeB kernel: [1239622.218125] block drbd23: conn( NetworkFailure -> Unconnected )
> Nov 28 10:55:09 nodeB kernel: [1239622.218135] block drbd23: receiver terminated
> Nov 28 10:55:09 nodeB kernel: [1239622.218137] block drbd23: Restarting drbd23_receiver
> Nov 28 10:55:09 nodeB kernel: [1239622.218140] block drbd23: receiver (re)started
> Nov 28 10:55:09 nodeB kernel: [1239622.218143] block drbd23: conn( Unconnected -> WFConnection )
> Nov 28 10:55:09 nodeB kernel: [1239622.353589] block drbd30: PingAck did not arrive in time.
> Nov 28 10:55:09 nodeB kernel: [1239622.353627] block drbd30: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> Nov 28 10:55:09 nodeB kernel: [1239622.353637] block drbd30: asender terminated
> Nov 28 10:55:09 nodeB kernel: [1239622.353639] block drbd30: Terminating drbd30_asender
> Nov 28 10:55:09 nodeB kernel: [1239622.353668] block drbd30: short read expecting header on sock: r=-512
> Nov 28 10:55:09 nodeB kernel: [1239622.353754] block drbd30: Creating new current UUID
> Nov 28 10:55:09 nodeB kernel: [1239622.388101] block drbd30: Connection closed
> Nov 28 10:55:09 nodeB kernel: [1239622.388107] block drbd30: conn( NetworkFailure -> Unconnected )
> Nov 28 10:55:09 nodeB kernel: [1239622.388111] block drbd30: receiver terminated
> Nov 28 10:55:09 nodeB kernel: [1239622.388113] block drbd30: Restarting drbd30_receiver
> Nov 28 10:55:09 nodeB kernel: [1239622.388116] block drbd30: receiver (re)started
> Nov 28 10:55:09 nodeB kernel: [1239622.388119] block drbd30: conn( Unconnected -> WFConnection )
>
> I've also looked at the qemu, xend-hotplug, and xend logs and do not see
> any telling errors. In xend.log I just see lines pertaining to VMs being
> rebooted.
>
> As for GPLPV: I haven't been able to reproduce the "network crashing" and
> rebooting in the lab and probably won't be able to until I can get a more
> robust production-like environment set up. Unfortunately I can't risk more
> customer downtime by attempting to set up NLB without the GPLPV drivers in
> production. If I can manage to reproduce this in staging I will of course
> attempt without the GPLPV drivers.
>
> -Greg

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel