[Xen-users] Xen intermittently fails to release HVM domU VBDs, preventing Heartbeat node fail-over
Hi,

Intermittently, upon domU shutdown, Xen appears to fail to release the domU's VBD handles. Consequently, LVs in the VG remain open, and the VG cannot be deactivated. This effectively prevents manual failover. We suspect a bug in Xen or qemu-dm.

Regards,
Erich

System environment information:
- SLES 10 SP2 (x86_64)
- Kernel 2.6.16.60-0.21-xen
- Xen 3.2.0_16718_14-0.4
- LVM 2.02.17-7.19
- Heartbeat 2.1.3-0.9

Configuration details:
- Xen HVM domU
- Xen VBD backed by LVM LV in dom0
- Xen resources and LVM VG managed by Heartbeat
- On node failover, Heartbeat stops domU, deactivates LVM VG, activates VG on peer, starts domU on peer

Relevant error messages concurrent with the issue:

Xend log (note error occurring during domain_destroy()):

[2008-06-20 13:55:32 16615] DEBUG (XendDomainInfo:1965) XendDomainInfo.destroyDomain(21)
[2008-06-20 13:55:32 16615] DEBUG (XendDomainInfo:1965) XendDomainInfo.destroyDomain(24)
[2008-06-20 13:55:32 16615] DEBUG (XendDomainInfo:1588) Removing vif/0
[2008-06-20 13:55:32 16615] DEBUG (XendDomainInfo:590) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2008-06-20 13:55:32 16615] DEBUG (XendDomainInfo:1588) Removing vbd/51712
[2008-06-20 13:55:32 16615] DEBUG (XendDomainInfo:590) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/51712
[2008-06-20 13:55:33 16615] ERROR (XendDomainInfo:1977) XendDomainInfo.destroy: xc.domain_destroy failed.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1972, in destroyDomain
    xc.domain_destroy(self.domid)
Error: (3, 'No such process')
[2008-06-20 13:55:33 16615] DEBUG (XendDomainInfo:1588) Removing vbd/51728
[2008-06-20 13:55:33 16615] DEBUG (XendDomainInfo:590) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/51728
[2008-06-20 13:55:33 16615] DEBUG (XendDomainInfo:1575) Destroying device model
[2008-06-20 13:55:33 16615] DEBUG (XendDomainInfo:1588) Removing vkbd/0
[2008-06-20 13:55:33 16615] DEBUG (XendDomainInfo:590) XendDomainInfo.destroyDevice: deviceClass = vkbd, device = vkbd/0
[2008-06-20 13:55:33 16615] DEBUG (XendDomainInfo:1588) Removing vfb/0
[2008-06-20 13:55:33 16615] DEBUG (XendDomainInfo:590) XendDomainInfo.destroyDevice: deviceClass = vfb, device = vfb/0
[2008-06-20 13:55:33 16615] INFO (XendDomainInfo:1295) Domain has shutdown: name=hostemplate id=23 reason=poweroff.
[2008-06-20 13:55:33 16615] DEBUG (XendDomainInfo:1582) Releasing devices
[2008-06-20 13:55:33 16615] DEBUG (XendDomainInfo:1588) Removing console/0

Xend debug log:

Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/Hald.py", line 55, in shutdown
    os.kill(self.pid, signal.SIGINT)
OSError: [Errno 3] No such process

Heartbeat debug log (note that the domain terminated successfully from Heartbeat's point of view, but the subsequent VG deactivation failed):

Jun 20 13:55:34 xenha01 lrmd: [13229]: info: RA output: (res_xen_xen-ad:stop:stdout) Domain xen-ad terminated
All domains terminated

- and subsequently

Jun 20 13:55:38 xenha01 tengine: [10035]: info: send_rsc_command: Initiating action 17: res_lvm_xendomains01_stop_0 on xenha01
Jun 20 13:55:38 xenha01 crmd: [13232]: info: do_lrm_rsc_op: Performing op=res_lvm_xendomains01_stop_0 key=17:13:78f873ed-af08-4add-bb36-3798cd1a4a22)
Jun 20 13:55:38 xenha01 lrmd: [13229]: info: rsc:res_lvm_xendomains01: stop
Jun 20 13:55:38 xenha01 crmd: [13232]: info: process_lrm_event: LRM operation res_lvm_xendomains01_stop_0 (call=190, rc=1) complete
Jun 20 13:55:38 xenha01 tengine: [10035]: WARN: update_failcount: Updating failcount for res_lvm_xendomains01 on 519aa0b0-a947-47e9-ace9-d52030ef98a9 after failed stop: rc=1
Jun 20 13:55:38 xenha01 tengine: [10035]: info: match_graph_event: Action res_lvm_xendomains01_stop_0 (17) confirmed on xenha01 (rc=4)
Jun 20 13:55:38 xenha01 pengine: [10036]: ERROR: unpack_rsc_op: Remapping res_lvm_xendomains01_stop_0 (rc=1) on xenha01 to an ERROR
Jun 20 13:55:38 xenha01 pengine: [10036]: WARN: unpack_rsc_op: Processing failed op res_lvm_xendomains01_stop_0 on xenha01: Error

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
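For anyone trying to confirm the symptom, a rough diagnostic sketch (not part of the setup described above) is to list the LVs that LVM still reports as open after the domU has stopped. It assumes that the sixth character of the `lv_attr` field printed by `lvs` is `o` while the device is open, and that `lvs` is on the PATH in dom0. The function names and the VG name are placeholders chosen for illustration.

```python
import subprocess


def open_lvs(lvs_output):
    """Return the names of LVs whose lv_attr marks the device as open.

    Expects the text printed by:
        lvs --noheadings -o lv_name,lv_attr <vg>
    """
    names = []
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, attr = fields[0], fields[1]
        # Assumption: the sixth lv_attr character is 'o' when the device is open.
        if len(attr) >= 6 and attr[5] == "o":
            names.append(name)
    return names


def open_lvs_in_vg(vg_name):
    """Run lvs against one VG (name supplied by the caller) and report open LVs."""
    result = subprocess.run(
        ["lvs", "--noheadings", "-o", "lv_name,lv_attr", vg_name],
        check=True,
        capture_output=True,
        text=True,
    )
    return open_lvs(result.stdout)
```

Calling `open_lvs_in_vg("<your VG name>")` as root in dom0 right after the domU stop, and before Heartbeat attempts the VG stop, should return an empty list in the healthy case. A non-empty list would indicate that something still holds the devices open.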