Re: [Xen-devel] Crashing / unable to start domUs due to high number of luns?
On Tue, Jan 31, 2012 at 01:42:23PM -0800, Nathan March wrote:
> Hi All,
>
> We've got a xen setup based around a dell iscsi device with each xen host
> having 2 lun's, we then run multipath on top of that. After adding a couple
> new virtual disks the other day, a couple of our online stable VM's suddenly
> hard locked up. Attaching to the console gave me nothing, looked like they
> lost their disk devices.
>
> Attempting to restart them on the same dom0 failed with hotplug errors, as
> did attempting to start them on a few different dom0's. After doing a
> "multipath -F" to remove unused devices and manually bringing in just the
> selected LUN's via "multipath diskname", I was able to successfully start
> them. This initially made me think perhaps I was hitting some sort of
> udev / multipath / iscsi device lun limit (136 luns, 8 paths per lun =
> 1088 iscsi connections). Just to be clear, the problem occurred on multiple
> dom0's at the same time so it definitely seems iscsi related.
>
> Now, a day later, I'm debugging this further and I'm again unable to start
> VM's, even with all extra multipath devices removed. I rebooted one of the
> dom0's and was able to successfully migrate our production VM's off a broken
> server, so I've now got an empty dom0 that's unable to start any vm's.
>
> Starting a VM results in the following in xend.log:
>
> [2012-01-31 13:06:16 12353] DEBUG (DevController:144) Waiting for 0.
> [2012-01-31 13:06:16 12353] DEBUG (DevController:628)
>     hotplugStatusCallback /local/domain/0/backend/vif/35/0/hotplug-status.
> [2012-01-31 13:07:56 12353] ERROR (SrvBase:88) Request wait_for_devices failed.
> Traceback (most recent call last):
>   File "/usr/lib64/python2.6/site-packages/xen/web/SrvBase.py", line 85, in perform
>     return op_method(op, req)
>   File "/usr/lib64/python2.6/site-packages/xen/xend/server/SrvDomain.py", line 85, in op_wait_for_devices
>     return self.dom.waitForDevices()
>   File "/usr/lib64/python2.6/site-packages/xen/xend/XendDomainInfo.py", line 1237, in waitForDevices
>     self.getDeviceController(devclass).waitForDevices()
>   File "/usr/lib64/python2.6/site-packages/xen/xend/server/DevController.py", line 140, in waitForDevices
>     return map(self.waitForDevice, self.deviceIDs())
>   File "/usr/lib64/python2.6/site-packages/xen/xend/server/DevController.py", line 155, in waitForDevice
>     (devid, self.deviceClass))
> VmError: Device 0 (vif) could not be connected. Hotplug scripts not working.

Was there anything in the kernel (dmesg) about vifs? What does your
/proc/interrupts look like? Can you provide the dmesg that you get during
startup? I am mainly looking for:

  NR_IRQS:16640 nr_irqs:1536 16

How many guests are you running when this happens? One theory is that you
are running out of dom0 interrupts, though I *think* that was made dynamic
in 3.0. That would also explain your iSCSI network going wonky in the
guest - was there anything in the dmesg when the guest started going bad?

> [2012-01-31 13:07:56 12353] DEBUG (XendDomainInfo:3071)
>     XendDomainInfo.destroy: domid=35
> [2012-01-31 13:07:58 12353] DEBUG (XendDomainInfo:2401) Destroying
>     device model
>
> I tried turning up udev's log level but that didn't reveal anything.
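For reference, something along these lines run on the affected dom0 should be
enough to collect that information. This is only a sketch - the exact dmesg
wording varies between kernel versions, and on a xend-based setup the guest
count comes from "xm list":

  # IRQ limits the dom0 kernel booted with (NR_IRQS / nr_irqs line):
  dmesg | grep -i nr_irqs

  # Rough count of IRQ lines currently in use in dom0:
  grep -c . /proc/interrupts

  # The xen-dyn-event / xen-percpu entries are the interesting ones:
  tail -n 30 /proc/interrupts

  # Number of guests currently running (header line included):
  xm list | wc -l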
> Reading the xenstore for the vif doesn't show anything unusual either:
>
> ukxen1 ~ # xenstore-ls /local/domain/0/backend/vif/35
> 0 = ""
>  bridge = "vlan91"
>  domain = "nathanxenuk1"
>  handle = "0"
>  uuid = "2128d0b7-d50f-c2ad-4243-8a42bb598b81"
>  script = "/etc/xen/scripts/vif-bridge"
>  state = "1"
>  frontend = "/local/domain/35/device/vif/0"
>  mac = "00:16:3d:03:00:44"
>  online = "1"
>  frontend-id = "35"
>
> The bridge device (vlan91) exists; trying a different bridge makes no
> difference. Removing the VIF completely results in the same error for the
> VBD. Adding debugging to the hotplug/network scripts didn't reveal anything,
> it looks like they aren't even being executed yet. Nothing is logged to
> xen-hotplug.log.

OK, so that would imply the kernel hasn't been able to do the right thing.
Hmm. What do you see when this happens with

  udevadm monitor --kernel --udev --property

?

> The only thing I can think of that this may be related to is that Gentoo
> defaulted to a 10MB /dev which we filled up a few months back. We upped the
> size to 50MB in the mount options and everything's been completely stable
> since (~33 days). None of the /dev on the dom0's is higher than 25% usage.
> Aside from adding the new luns, no changes have been made in the past month.
>
> To try and test if removing some devices would solve anything, I tried doing
> an "iscsiadm -m node --logout" and it promptly hard locked the entire box.
> After a reboot, I was unable to reproduce the problem on that particular
> dom0.
>
> I've still got one dom0 that's exhibiting the problem - is anyone able to
> suggest any further debugging steps?
>
> - Nathan
>
>
> (XEN) Xen version 4.1.1 (root@) (gcc version 4.3.4 (Gentoo 4.3.4 p1.1,
> pie-10.1.5) ) Mon Aug 29 16:24:12 PDT 2011
>
> ukxen1 xen # xm info
> host : ukxen1
> release : 3.0.3
> version : #4 SMP Thu Dec 22 12:44:22 PST 2011
> machine : x86_64
> nr_cpus : 24
> nr_nodes : 2
> cores_per_socket : 6
> threads_per_core : 2
> cpu_mhz : 2261
> hw_caps : bfebfbff:2c100800:00000000:00003f40:029ee3ff:00000000:00000001:00000000
> virt_caps : hvm hvm_directio
> total_memory : 98291
> free_memory : 91908
> free_cpus : 0
> xen_major : 4
> xen_minor : 1
> xen_extra : .1
> xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
> xen_scheduler : credit
> xen_pagesize : 4096
> platform_params : virt_start=0xffff800000000000
> xen_changeset : unavailable
> xen_commandline : console=vga dom0_mem=1024M dom0_max_vcpus=1 dom0_vcpus_pin=true
> cc_compiler : gcc version 4.3.4 (Gentoo 4.3.4 p1.1, pie-10.1.5)
> cc_compile_by : root
> cc_compile_domain :
> cc_compile_date : Mon Aug 29 16:24:12 PDT 2011
> xend_config_format : 4
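Re the udev monitoring suggestion above - a minimal way to capture what (if
anything) the kernel and udev emit while reproducing the failure could look
like this (the guest config path is only a placeholder; adjust it to whichever
domU fails to start):

  # Terminal 1 on the affected dom0: log kernel and udev events with
  # their properties while the domU start is attempted.
  udevadm monitor --kernel --udev --property > /tmp/udev-events.log 2>&1

  # Terminal 2: reproduce the failure (config path is an example).
  xm create /etc/xen/broken-guest.cfg

  # An empty log would suggest no uevents for the vif/vbd backends are
  # being generated at all (i.e. a kernel-side problem); if events do
  # appear, their ACTION/DEVPATH lines can be compared against what the
  # hotplug scripts (e.g. /etc/xen/scripts/vif-bridge) expect to see.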