[Xen-users] block-attach crashes domU
I have run into a strange situation where a domain will not boot with a certain disk specified in the config file, and trying to block-attach that disk after the domain starts results in the domain disappearing from the list, presumably because it simply crashed.

I am running CentOS 5.4 with kernel 2.6.18-160.el5xen x86_64. For months everything worked perfectly with these domains using an AoE SAN for the back-end. I have used this sort of setup for several years and it is great, and these domains in particular have been running for several months. Then 3 of the 4 domUs I run were heavily slammed and became unresponsive, and I ended up having to do an xm destroy on them. After that they refuse to come back up. One of my domUs has not been rebooted and it continues to work great with all 4 disk devices attached.

Here is my domU config file:

name = "db2"
uuid = "f253cab5-c3de-c1f7-e735-5d4f0bfcd3ff"
maxmem = 16384
memory = 2048
vcpus = 4
bootloader = "/usr/bin/pygrub"
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [ ]
disk = [ "phy:/dev/etherd/e1.12,xvda,w",
         "phy:/dev/etherd/e2.12,xvdb,w",
         "phy:/dev/etherd/e3.1,xvdc,w",
         "phy:/dev/etherd/e4.1,xvdd,w" ]
vif = [ "mac=00:16:3e:5b:5c:dd,bridge=dmz" ]

If I boot the domU with this config file I get the following on boot:

Red Hat nash version 5.1.19.6 starting
Mounting proc filesystem
Mounting sysfs filesystem
Creating /dev
Creating initial device nodes
Setting up hotplug.
Creating block device nodes.
Loading ehci-hcd.ko module
Loading ohci-hcd.ko module
Loading uhci-hcd.ko module
USB Universal Host Controller Interface driver v3.0
Loading jbd.ko module
Loading ext3.ko module
Loading raid1.ko module
md: raid1 personality registered for level 1
Loading xenblk.ko module
Registering block device major 202
 xvda: xvda1 xvda2 xvda3 xvda4 < xvda5 >
 xvdb: xvdb1 xvdb2 xvdb3 xvdb4 < xvdb5 >
 xvdc: xvdc1
kobject_add failed for xvda with -EEXIST, don't try to register things with the same name in the same directory.
Call Trace:
 [<ffffffff803404ea>] kobject_add+0x170/0x19b
 [<ffffffff8025cfd5>] exact_lock+0x0/0x14
 [<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff802fb4e2>] register_disk+0x43/0x190
 [<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80336c3a>] add_disk+0x34/0x3d
 [<ffffffff88084ec9>] :xenblk:backend_changed+0x110/0x193
 [<ffffffff803b32fa>] xenbus_read_driver_state+0x26/0x3b
 [<ffffffff803b4bdb>] xenwatch_thread+0x0/0x135
 [<ffffffff803b402d>] xenwatch_handle_callback+0x15/0x48
 [<ffffffff803b4cf7>] xenwatch_thread+0x11c/0x135
 [<ffffffff8029bb44>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233bcd>] kthread+0xfe/0x132
 [<ffffffff80260b2c>] child_rip+0xa/0x12
 [<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233acf>] kthread+0x0/0x132
 [<ffffffff80260b22>] child_rip+0x0/0x12

Unable to handle kernel NULL pointer dereference at 0000000000000010 RIP:
 [<ffffffff802fe512>] create_dir+0x11/0x1cf
PGD 7f1c9067 PUD 7f1ca067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /block/ram0/dev
CPU 1
Modules linked in: xenblk raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 9, comm: xenwatch Not tainted 2.6.18-164.el5xen #1
RIP: e030:[<ffffffff802fe512>] [<ffffffff802fe512>] create_dir+0x11/0x1cf
RSP: e02b:ffff880000fbfda0 EFLAGS: 00010282
RAX: ffff88007f31b870 RBX: ffff88007f3cd4f0 RCX: ffff880000fbfdd8
RDX: ffff88007f3cd4f8 RSI: 0000000000000000 RDI: ffff88007f3cd4f0
RBP: ffff88007f3cd4f0 R08: 0000000000000001 R09: ffff88000114c000
R10: ffffffff8029b92c R11: ffff880000fbfbb0 R12: ffff88007f3cd4f0
R13: ffff880000fbfdd8 R14: 0000000000000000 R15: ffff88007f31b870
FS: 0000000000000000(0000) GS:ffffffff805ca080(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000

The /dev/etherd/e4.1 backend to the xvdd device is present in the dom0 and works perfectly. I can access it from within the dom0 with no problem. Something is confused. I would really like to avoid rebooting the dom0s if at all possible.

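Since the kernel message complains about registering the same name twice, one cheap thing to rule out is a duplicate backend path or frontend name in the disk = list itself. The following is only a minimal sketch of such a check, assuming every entry has the three-field "backend,frontend,mode" form used in the config above. The function name find_duplicate_disks is made up for illustration and is not part of any Xen tool.

```python
from collections import Counter


def find_duplicate_disks(disk_entries):
    """Return (duplicate_backends, duplicate_frontends) as sorted lists.

    Assumes each entry looks like "phy:/dev/etherd/e1.12,xvda,w",
    i.e. exactly three comma-separated fields.
    """
    backends = Counter()
    frontends = Counter()
    for entry in disk_entries:
        parts = [p.strip() for p in entry.split(",")]
        if len(parts) != 3:
            raise ValueError("unexpected disk entry: %r" % entry)
        backend, frontend, _mode = parts
        backends[backend] += 1
        frontends[frontend] += 1
    dup_backends = sorted(name for name, n in backends.items() if n > 1)
    dup_frontends = sorted(name for name, n in frontends.items() if n > 1)
    return dup_backends, dup_frontends


if __name__ == "__main__":
    # The disk list copied from the db2 config file.
    disks = [
        "phy:/dev/etherd/e1.12,xvda,w",
        "phy:/dev/etherd/e2.12,xvdb,w",
        "phy:/dev/etherd/e3.1,xvdc,w",
        "phy:/dev/etherd/e4.1,xvdd,w",
    ]
    dup_backends, dup_frontends = find_duplicate_disks(disks)
    print("duplicate backends:", dup_backends or "none")
    print("duplicate frontends:", dup_frontends or "none")
```

For the disk list in this config both results come back empty, so the config file itself does not use any name twice, even though the kernel reports the clash on xvda.
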
I have found that if I remove "phy:/dev/etherd/e4.1,xvdd,w" from the disk = line, the domU boots fine. But if I then try to block-attach the missing device, the domU dies instantly. I have been looking for logs that might explain why it died but I cannot find anything relevant. I have googled the "don't try to register things with the same name in the same directory" error and found a few references to it, but none in the context of Xen.

Any advice would be greatly appreciated.

--
Tracy Reed
http://tracyreed.org

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users