
Re: [Xen-devel] Enabling NLB is crashing VM's/DRBD



Hi,

We are running the Debian-provided xen-hypervisor-4.0-amd64 (4.0.1-4).  The kernel is 2.6.32-5-xen-amd64 (2.6.32-46), also from Debian.

The previously posted log lines were from the dom0's /var/log/messages.  The only thing I'm seeing from xm dmesg is the following:
(XEN) grant_table.c:1717:d0 Bad grant reference

I've also found some more entries in syslog that were not present in messages; they are listed below.  The clocks appear to be synchronized to the second on both machines:
Nov 28 10:55:03 nodeA kernel: [1239467.400293] eth0: port 11(nlb2.e0) entering disabled state
Nov 28 10:55:03 nodeA kernel: [1239467.400516] eth0: port 11(nlb2.e0) entering disabled state
Nov 28 10:55:04 nodeA kernel: [1239467.731442] frontend_changed: backend/vif/73/0: prepare for reconnect
Nov 28 10:55:04 nodeA logger: /etc/xen/scripts/vif-bridge: offline XENBUS_PATH=backend/vif/73/0
Nov 28 10:55:05 nodeA logger: /etc/xen/scripts/vif-bridge: brctl delif eth0 nlb2.e0 failed
Nov 28 10:55:05 nodeA logger: /etc/xen/scripts/vif-bridge: ifconfig nlb2.e0 down failed
Nov 28 10:55:06 nodeA logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge offline for nlb2.e0, bridge eth0.
Nov 28 10:55:06 nodeA logger: /etc/xen/scripts/vif-bridge: online XENBUS_PATH=backend/vif/73/0
Nov 28 10:55:08 nodeA kernel: [1239471.758583] device nlb2.e0 entered promiscuous mode
Nov 28 10:55:10 nodeA kernel: [1239473.795967] block drbd23: sock was shut down by peer
Nov 28 10:55:27 nodeA kernel: [1239473.795973] block drbd23: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
Nov 28 10:55:27 nodeA kernel: [1239473.795980] block drbd23: short read expecting header on sock: r=0
Nov 28 10:55:27 nodeA kernel: [1239474.009951] block drbd31: sock was shut down by peer


Nov 28 10:55:09 nodeB kernel: [1239622.217505] block drbd23: PingAck did not arrive in time.
Nov 28 10:55:09 nodeB kernel: [1239622.217542] block drbd23: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Nov 28 10:55:09 nodeB kernel: [1239622.217551] block drbd23: asender terminated
Nov 28 10:55:09 nodeB kernel: [1239622.217554] block drbd23: Terminating drbd23_asender
Nov 28 10:55:09 nodeB kernel: [1239622.217795] block drbd23: short read expecting header on sock: r=-512
Nov 28 10:55:09 nodeB kernel: [1239622.217887] block drbd23: Creating new current UUID
Nov 28 10:55:09 nodeB kernel: [1239622.218118] block drbd23: Connection closed
Nov 28 10:55:09 nodeB kernel: [1239622.218125] block drbd23: conn( NetworkFailure -> Unconnected )
Nov 28 10:55:09 nodeB kernel: [1239622.218135] block drbd23: receiver terminated
Nov 28 10:55:09 nodeB kernel: [1239622.218137] block drbd23: Restarting drbd23_receiver
Nov 28 10:55:09 nodeB kernel: [1239622.218140] block drbd23: receiver (re)started
Nov 28 10:55:09 nodeB kernel: [1239622.218143] block drbd23: conn( Unconnected -> WFConnection )
Nov 28 10:55:09 nodeB kernel: [1239622.353589] block drbd30: PingAck did not arrive in time.
Nov 28 10:55:09 nodeB kernel: [1239622.353627] block drbd30: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Nov 28 10:55:09 nodeB kernel: [1239622.353637] block drbd30: asender terminated
Nov 28 10:55:09 nodeB kernel: [1239622.353639] block drbd30: Terminating drbd30_asender
Nov 28 10:55:09 nodeB kernel: [1239622.353668] block drbd30: short read expecting header on sock: r=-512
Nov 28 10:55:09 nodeB kernel: [1239622.353754] block drbd30: Creating new current UUID
Nov 28 10:55:09 nodeB kernel: [1239622.388101] block drbd30: Connection closed
Nov 28 10:55:09 nodeB kernel: [1239622.388107] block drbd30: conn( NetworkFailure -> Unconnected )
Nov 28 10:55:09 nodeB kernel: [1239622.388111] block drbd30: receiver terminated
Nov 28 10:55:09 nodeB kernel: [1239622.388113] block drbd30: Restarting drbd30_receiver
Nov 28 10:55:09 nodeB kernel: [1239622.388116] block drbd30: receiver (re)started
Nov 28 10:55:09 nodeB kernel: [1239622.388119] block drbd30: conn( Unconnected -> WFConnection )
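
Because the two excerpts above come from different machines, it can help to interleave them by syslog timestamp. The following is only a minimal sketch of one way to do that: the names parse_line and merge_logs are hypothetical, and the year is an assumed placeholder because classic syslog lines do not carry one. It also assumes an English locale for month names. Syslog timestamps have one-second resolution and can lag the kernel event (the nodeA drbd23 lines stamped 10:55:27 carry kernel timestamps within microseconds of the 10:55:10 line), so the resulting order is approximate.

```python
import re
from datetime import datetime

# Matches classic syslog lines such as:
#   "Nov 28 10:55:03 nodeA kernel: [1239467.400293] eth0: ..."
LINE_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) (.*)$")


def parse_line(line, year=2012):
    """Return (timestamp, host, message), or None if the line is not syslog-shaped.

    The year is an assumption: syslog lines in this format omit it.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
    return stamp, m.group(2), m.group(3)


def merge_logs(*logs, year=2012):
    """Merge several iterables of syslog lines into one list ordered by timestamp."""
    entries = []
    for log in logs:
        for line in log:
            parsed = parse_line(line, year)
            if parsed is not None:
                entries.append(parsed)
    # list.sort is stable, so lines sharing a second keep their input order.
    entries.sort(key=lambda entry: entry[0])
    return entries
```

Usage would be along the lines of merge_logs(open("nodeA.syslog"), open("nodeB.syslog")), where both file names are placeholders, followed by printing each (timestamp, host, message) tuple.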

I've also looked at the qemu, xend-hotplug, and xend logs and do not see any telling errors.  In xend.log I just see lines pertaining to VMs being rebooted.

As for GPLPV: I haven't been able to reproduce the "network crashing" and rebooting in the lab, and probably won't be able to until I can get a more robust, production-like environment set up.  Unfortunately, I can't risk more customer downtime by attempting to set up NLB without the GPLPV drivers in production.  If I can manage to reproduce this in staging, I will of course retry without the GPLPV drivers.

-Greg
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

