Re: [Xen-devel] Enabling NLB is crashing VM's/DRBD
On Thu, Nov 29, 2012 at 01:12:50PM +1300, Greg Zapp wrote:
> Hi All,

Hello,

> We have a somewhat serious issue around NLB on Windows 2012 and Xen.
> First, let me describe our environment and then I'll let you know what's
> wrong.
>
> 2 x Debian squeeze boxes running the latest provided AMD64 Xen kernel and
> about 100 GB of RAM.

You haven't provided enough information:

- What Xen version are you running?
- What dom0 kernel version are you running?

> These boxes are connected via InfiniBand, and DRBD is running over
> this (IPoIB).
> Each VPS runs on a mirrored DRBD device.
> Each DRBD device sits on two logical volumes: one for data and one for
> metadata.
> The hypervisors exclusively run Windows VMs (Server 2008 R2 and 2012).
> The VMs are using the GPLPV drivers (PCI, VBD, Net, etc.).
> We are using network-bridge.
>
> So here is the trouble. We had somebody trying to set up Windows NLB.
> When adding a host, it would cause the VM to freeze but also disconnect
> the DRBD devices. Everything recovers, but the DRBD devices resync and a
> bunch of VMs on the one side (the side with the VM that hangs) get
> rebooted by Xen. Here is what we are seeing in messages:
>
> eth0: port 3(nlb2.e0) entering disabled state
> eth0: port 3(nlb2.e0) entering disabled state
> frontend_changed: backend/vif/65/0: prepare for reconnect
> device nlb.e0 entered promiscuous mode
> block drbd29: sock was shut down by peer
> block drbd29: peer( Secondary -> Unknown ) conn( Connected -> BrokenPipe )
> pdsk( UpToDate -> DUnknown )
> block drbd24: sock was shut down by peer
> block drbd24: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe )
> pdsk( UpToDate -> DUnknown )
> block drbd29: Creating new current UUID
> block drbd30: sock was shut down by peer
> block drbd30: peer( Primary -> Unknown ) conn( Connected -> BrokenPipe )
> pdsk( UpToDate -> DUnknown )
> .... and on and on with the DRBD devices disconnecting
> block drbd29: md_sync_timer expired! Worker calls drbd_md_sync().
> block drbd21: md_sync_timer expired! Worker calls drbd_md_sync().
> .... lots of that
> block drbd24: Terminating drbd24_asender
> block drbd21: asender terminated
> block drbd21: Terminating drbd21_asender
> ....
> eth0: port 3(nlb2.e0) entering forwarding state
> ....
> block drbd1: Handshake successful: Agreed network protocol version 91
> block drbd1: conn( WFConnection -> WFReportParams )
> block drbd38: Handshake successful: Agreed network protocol version 91
> block drbd38: conn( WFConnection -> WFReportParams )
> block drbd38: Starting asender thread (from drbd38_receiver [16250])
> block drbd1: Starting asender thread (from drbd1_receiver [18278])
> ... then lots of output from the DRBD devices reconnecting and syncing.
>
> This happened three times, each time the user was attempting to add the
> second node into NLB. I can reproduce the network adapter dying (it
> becomes disabled and is unusable until a reboot) in the lab on Server
> 2012 unless I follow specific steps, but not the DRBD dying. I can get
> NLB working, but I'm mostly concerned about one person's ability to
> effectively crash 8 other VMs. It looks like whatever is going on is
> somehow affecting my DRBD connection. Has anyone seen anything like
> this before?

Does it happen without the GPLPV drivers? Try using plain Intel e1000
emulated NICs in the Windows VMs.

Any errors in the dom0 kernel dmesg? How about in the Xen dmesg?
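To capture both, run something like this on each dom0, ideally right
after a reproduction (since you're using network-bridge you're presumably
on xend, so xm; xl dmesg also works on Xen 4.1 and newer):

    # dom0 kernel log from around the time of the incident
    dmesg | tail -n 200

    # Xen hypervisor log
    xm dmesg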
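For the e1000 test above, something like this in the domU config file
should work (the MAC and bridge name are just examples; with
network-bridge the bridge is typically eth0):

    # Use a qemu-emulated Intel e1000 NIC instead of the PV vif
    vif = [ 'type=ioemu, model=e1000, mac=00:16:3e:00:00:01, bridge=eth0' ]

Keep in mind that while the GPLPV drivers are installed and active in the
guest they may still take over the network device, so you might also have
to boot the guest without the GPLPV option to be sure the emulated NIC is
the one actually in use.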
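Also, just to confirm the disk layout: with one LV for data and one for
metadata, each resource in drbd.conf would normally look roughly like the
sketch below (all hostnames, device names and addresses here are made up;
yours will differ). Is that what you have, with the replication addresses
on the IPoIB interfaces?

    resource vps29 {
      protocol C;
      on hv1 {
        device    /dev/drbd29;
        disk      /dev/vg0/vps29_data;      # data LV
        meta-disk /dev/vg0/vps29_meta[0];   # external metadata LV
        address   10.0.0.1:7829;            # IPoIB address
      }
      on hv2 {
        device    /dev/drbd29;
        disk      /dev/vg0/vps29_data;
        meta-disk /dev/vg0/vps29_meta[0];
        address   10.0.0.2:7829;
      }
    }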
-- 
Pasi

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel