 
	
Re: [Xen-devel] Crashing / unable to start domUs due to high number of luns?
On 1/31/2012 5:30 PM, Konrad Rzeszutek Wilk wrote:
> On Tue, Jan 31, 2012 at 01:42:23PM -0800, Nathan March wrote:
>> Was running approximately 15 guests, although this persisted after
>> migrating them off. Nothing in dmesg (dom0 dmesg or xm dmesg) looked
>> abnormal at all, no references to vifs. Aside from the inability to
>> start a VM, I couldn't find any sort of error anywhere.
>>
>> All the hosts show the same irq counts:
>>
>> [   34.903763] NR_IRQS:4352 nr_irqs:4352 16
>>
>> Unfortunately I'm not able to reproduce this now, but I've posted
>> several different copies of /proc/interrupts here:
>> http://pastebin.com/n7PWNeaZ
>>
>> Full xm / kernel dmesg is uploaded here: http://pastebin.com/AtCvFBDS
>>
>> [2012-01-31 13:07:56 12353] DEBUG (XendDomainInfo:3071) XendDomainInfo.destroy: domid=35
>> [2012-01-31 13:07:58 12353] DEBUG (XendDomainInfo:2401) Destroying device model
>>
>> I tried turning up udev's log level but that didn't reveal anything.
>> Reading the xenstore for the vif doesn't show anything unusual either:
>>
>> ukxen1 ~ # xenstore-ls /local/domain/0/backend/vif/35
>> 0 = ""
>>  bridge = "vlan91"
>>  domain = "nathanxenuk1"
>>  handle = "0"
>>  uuid = "2128d0b7-d50f-c2ad-4243-8a42bb598b81"
>>  script = "/etc/xen/scripts/vif-bridge"
>>  state = "1"
>>  frontend = "/local/domain/35/device/vif/0"
>>  mac = "00:16:3d:03:00:44"
>>  online = "1"
>>  frontend-id = "35"
>>
>> The bridge device (vlan91) exists, and trying a different bridge makes
>> no difference. Removing the vif completely results in the same error
>> for the VBD. Adding debugging to the hotplug/network scripts didn't
>> reveal anything; it looks like they aren't even being executed yet.
>> Nothing is logged to xen-hotplug.log.
>
> OK, so that would imply the kernel hasn't been able to do the right
> thing. Hmm. What do you see when this happens with
> udevadm monitor --kernel --udev --property ?

The remaining server I thought was doing this apparently isn't (I was
probably mistaken), so the two that were definitely doing it have been
rebooted and I can't reproduce this at the moment.

I've been abusing a free server all morning with a loop that spawns and
shuts down a VM repeatedly and flushes / rescans multipath, to see if I
can reproduce this again (rough sketch below). No luck so far
unfortunately, but I'll keep trying.

The only thing I can think of that this may be related to is that Gentoo
defaulted to a 10MB /dev, which we filled up a few months back. We upped
the size to 50MB in the mount options and everything's been completely
stable since (~33 days). None of the /dev filesystems on the dom0s is
higher than 25% usage. Aside from adding the new LUNs, no changes have
been made in the past month.

To test whether removing some devices would solve anything, I tried an
"iscsiadm -m node --logout" and it promptly hard locked the entire box.
After a reboot, I was unable to reproduce the problem on that particular
dom0.

I've still got one dom0 that's exhibiting the problem; is anyone able to
suggest any further debugging steps?

- Nathan

--
Nathan March <nathan@xxxxxx>
Gossamer Threads Inc. http://www.gossamer-threads.com/
Tel: (604) 687-5804 Fax: (604) 687-5806

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
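
A minimal sketch of the kind of spawn/shutdown stress loop described
above, assuming the xm toolstack; the guest name "testvm", the config
path, and the sleep interval are placeholders, not from the original
setup:

    #!/bin/sh
    # Repeatedly create and destroy a guest while flushing and
    # rescanning multipath / iSCSI, trying to trigger the failure.
    while true; do
        xm create /etc/xen/testvm.cfg || break  # stop once creation fails
        sleep 30                                # give the guest time to boot
        xm shutdown -w testvm                   # -w: wait for shutdown to finish
        multipath -F                            # flush unused multipath maps
        iscsiadm -m session --rescan            # rescan LUNs on all iSCSI sessions
        multipath                               # re-create the multipath maps
    done

If creation does start failing, that window is when the output of
"udevadm monitor --kernel --udev --property" asked for above would be
most useful to capture.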
 