
Re: [Xen-devel] Enabling NLB is crashing VM's/DRBD



On Fri, Nov 30, 2012 at 11:11:11AM +1300, Greg Zapp wrote:
>    Hi,
> 
>    We are running Debian's xen-hypervisor-4.0-amd64 (4.0.1-4). The kernel
>    is Debian's 2.6.32-5-xen-amd64 (2.6.32-46).
> 
>    The previously posted log lines were from the dom0's /var/log/messages.
>    The only thing I'm seeing from xm dmesg is the following:
>    (XEN) grant_table.c:1717:d0 Bad grant reference
> 
>    I've also found some more entries in syslog that were not present in
>    messages. Here is what's in syslog. The clocks appear to be in sync to
>    the second on both machines:
>    Nov 28 10:55:03 nodeA kernel: [1239467.400293] eth0: port 11(nlb2.e0) entering disabled state
>    Nov 28 10:55:03 nodeA kernel: [1239467.400516] eth0: port 11(nlb2.e0) entering disabled state

How's your networking set up?

I hope the Windows NLB VMs aren't using the same bridge/VLAN as DRBD is
using?


-- Pasi


>    Nov 28 10:55:04 nodeA kernel: [1239467.731442] frontend_changed: backend/vif/73/0: prepare for reconnect
>    Nov 28 10:55:04 nodeA logger: /etc/xen/scripts/vif-bridge: offline XENBUS_PATH=backend/vif/73/0
>    Nov 28 10:55:05 nodeA logger: /etc/xen/scripts/vif-bridge: brctl delif eth0 nlb2.e0 failed
>    Nov 28 10:55:05 nodeA logger: /etc/xen/scripts/vif-bridge: ifconfig nlb2.e0 down failed
>    Nov 28 10:55:06 nodeA logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge offline for nlb2.e0, bridge eth0.
>    Nov 28 10:55:06 nodeA logger: /etc/xen/scripts/vif-bridge: online XENBUS_PATH=backend/vif/73/0
>    Nov 28 10:55:08 nodeA kernel: [1239471.758583] device nlb2.e0 entered promiscuous mode
>    Nov 28 10:55:10 nodeA kernel: [1239473.795967] block drbd23: sock was shut down by peer
>    Nov 28 10:55:27 nodeA kernel: [1239473.795973] block drbd23: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
>    Nov 28 10:55:27 nodeA kernel: [1239473.795980] block drbd23: short read expecting header on sock: r=0
>    Nov 28 10:55:27 nodeA kernel: [1239474.009951] block drbd31: sock was shut down by peer
> 
>    Nov 28 10:55:09 nodeB kernel: [1239622.217505] block drbd23: PingAck did not arrive in time.
>    Nov 28 10:55:09 nodeB kernel: [1239622.217542] block drbd23: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
>    Nov 28 10:55:09 nodeB kernel: [1239622.217551] block drbd23: asender terminated
>    Nov 28 10:55:09 nodeB kernel: [1239622.217554] block drbd23: Terminating drbd23_asender
>    Nov 28 10:55:09 nodeB kernel: [1239622.217795] block drbd23: short read expecting header on sock: r=-512
>    Nov 28 10:55:09 nodeB kernel: [1239622.217887] block drbd23: Creating new current UUID
>    Nov 28 10:55:09 nodeB kernel: [1239622.218118] block drbd23: Connection closed
>    Nov 28 10:55:09 nodeB kernel: [1239622.218125] block drbd23: conn( NetworkFailure -> Unconnected )
>    Nov 28 10:55:09 nodeB kernel: [1239622.218135] block drbd23: receiver terminated
>    Nov 28 10:55:09 nodeB kernel: [1239622.218137] block drbd23: Restarting drbd23_receiver
>    Nov 28 10:55:09 nodeB kernel: [1239622.218140] block drbd23: receiver (re)started
>    Nov 28 10:55:09 nodeB kernel: [1239622.218143] block drbd23: conn( Unconnected -> WFConnection )
>    Nov 28 10:55:09 nodeB kernel: [1239622.353589] block drbd30: PingAck did not arrive in time.
>    Nov 28 10:55:09 nodeB kernel: [1239622.353627] block drbd30: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
>    Nov 28 10:55:09 nodeB kernel: [1239622.353637] block drbd30: asender terminated
>    Nov 28 10:55:09 nodeB kernel: [1239622.353639] block drbd30: Terminating drbd30_asender
>    Nov 28 10:55:09 nodeB kernel: [1239622.353668] block drbd30: short read expecting header on sock: r=-512
>    Nov 28 10:55:09 nodeB kernel: [1239622.353754] block drbd30: Creating new current UUID
>    Nov 28 10:55:09 nodeB kernel: [1239622.388101] block drbd30: Connection closed
>    Nov 28 10:55:09 nodeB kernel: [1239622.388107] block drbd30: conn( NetworkFailure -> Unconnected )
>    Nov 28 10:55:09 nodeB kernel: [1239622.388111] block drbd30: receiver terminated
>    Nov 28 10:55:09 nodeB kernel: [1239622.388113] block drbd30: Restarting drbd30_receiver
>    Nov 28 10:55:09 nodeB kernel: [1239622.388116] block drbd30: receiver (re)started
>    Nov 28 10:55:09 nodeB kernel: [1239622.388119] block drbd30: conn( Unconnected -> WFConnection )
> 
>    I've also looked at the qemu, xend-hotplug, and xend logs and don't see
>    any telling errors. In xend.log I only see lines about VMs being
>    rebooted.
> 
>    As for GPLPV: I haven't been able to reproduce the "network crashing" and
>    rebooting in the lab, and probably won't be able to until I can set up a
>    more robust, production-like environment. Unfortunately I can't risk more
>    customer downtime by attempting to set up NLB without the GPLPV drivers in
>    production. If I can manage to reproduce this in staging, I will of course
>    try it without the GPLPV drivers.
> 
>    -Greg

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

