[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [Xen-devel] Enabling NLB is crashing VM's/DRBD
Hi All,

We have a somewhat serious issue around NLB (Windows Network Load Balancing) on Windows Server 2012 and Xen. First let me describe our environment, and then I'll explain what's going wrong.

We run two Debian squeeze boxes with the latest provided amd64 Xen kernel and about 100GB of RAM each. The boxes are connected via InfiniBand, and DRBD runs over that link (IPoIB). Each VPS runs on a mirrored DRBD device, and each DRBD device sits on two logical volumes: one for data and one for metadata. The hypervisors exclusively run Windows VMs (Server 2008 R2 and 2012), and the VMs use the GPLPV drivers (PCI, VBD, Net, etc.). We are using network-bridge.

So here is the trouble. Somebody was trying to set up Windows NLB. Adding a host would not only freeze that VM, it would also disconnect the DRBD devices. Everything recovers, but the DRBD devices resync and a bunch of VMs on the one side (the side with the VM that hangs) get rebooted by Xen. Here is what we are seeing in messages:

eth0: port 3(nlb2.e0) entering disabled state
eth0: port 3(nlb2.e0) entering disabled state
frontend_changed: backend/vif/65/0: prepare for reconnect
device nlb.e0 entered promiscuous mode
block drbd29: sock was shut down by peer
block drbd29: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
block drbd24: sock was shut down by peer
block drbd24: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
block drbd29: Creating new current UUID
block drbd30: sock was shut down by peer
block drbd30: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
.... and on and on with the DRBD devices disconnecting
block drbd29: md_sync_timer expired! Worker calls drbd_md_sync().
block drbd21: md_sync_timer expired! Worker calls drbd_md_sync().
.... lots of that
block drbd24: Terminating drbd24_asender
block drbd21: asender terminated
block drbd21: Terminating drbd21_asender
....
eth0: port 3(nlb2.e0) entering forwarding state
....
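In case it helps anyone correlating the two failures: a minimal triage sketch (not part of the original logs; it assumes the exact message formats shown in the excerpt above, with the syslog timestamp/host prefix already stripped) that counts bridge port state flips and DRBD "shut down by peer" events per device, so you can line up the NLB join attempt with the DRBD drops:

```python
import re
from collections import Counter

# Message formats are assumed from the kernel log excerpt above;
# adjust the regexes if your messages differ.
PORT_RE = re.compile(r"port \d+\((\S+)\) entering (\w+) state")
DRBD_RE = re.compile(r"block (drbd\d+): sock was shut down by peer")

def summarize(lines):
    """Count bridge port state changes and DRBD socket drops."""
    port_events = Counter()   # (interface, new_state) -> count
    drbd_drops = Counter()    # drbd device -> count
    for line in lines:
        m = PORT_RE.search(line)
        if m:
            port_events[(m.group(1), m.group(2))] += 1
            continue
        m = DRBD_RE.search(line)
        if m:
            drbd_drops[m.group(1)] += 1
    return port_events, drbd_drops

if __name__ == "__main__":
    # In practice you would feed this the relevant slice of
    # /var/log/messages; this sample mirrors the excerpt above.
    sample = [
        "eth0: port 3(nlb2.e0) entering disabled state",
        "eth0: port 3(nlb2.e0) entering disabled state",
        "block drbd29: sock was shut down by peer",
        "block drbd24: sock was shut down by peer",
        "eth0: port 3(nlb2.e0) entering forwarding state",
    ]
    ports, drops = summarize(sample)
    print(ports)
    print(drops)
```

Run against the full log, a burst of drbd drops clustered around the nlb2.e0 disabled/forwarding transitions would confirm the two events are coupled rather than coincidental.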
block drbd1: Handshake successful: Agreed network protocol version 91
block drbd1: conn( WFConnection -> WFReportParams )
block drbd38: Handshake successful: Agreed network protocol version 91
block drbd38: conn( WFConnection -> WFReportParams )
block drbd38: Starting asender thread (from drbd38_receiver [16250])
block drbd1: Starting asender thread (from drbd1_receiver [18278])
...

Then there is lots of output from the DRBD devices reconnecting and syncing. This happened three times, each time the user attempted to add the second node to NLB. In the lab, on Server 2012, I can reproduce the network adapter dying (it becomes disabled and is unusable until reboot) unless I follow specific steps, but I cannot reproduce the DRBD disconnects. I can get NLB working, but I'm mostly concerned about one person's ability to effectively crash eight other VMs. It looks like whatever is going on is somehow affecting my DRBD connection. Has anyone seen anything like this before?

Thanks,
Greg

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel