
Re: [Xen-users] block-attach crashes domU



On Thu, Jan 21, 2010 at 10:21:26PM -0800, Tracy Reed wrote:
> I have run into a strange situation where a domain will not boot with
> a certain disk specified in the config file and trying to block-attach
> it after it starts results in the domain disappearing from the list
> and presumably simply crashing. 
> 
> I am running CentOS 5.4 with kernel 2.6.18-160.el5xen x86_64
> 
> For months everything worked perfectly with these domains using
> an AoE SAN for the back-end. I have used this sort of setup for
> several years and it is great. But these domains in particular have
> been running for several months. Then 3 of the 4 domU's I run were
> really heavily slammed and became unresponsive and I ended up having
> to do an xm destroy on them. After that they refuse to come back
> up. One of my domU's has not been rebooted and it continues to work
> great with all 4 disk devices attached.
> 
> Here is my domU config file:
> 
> name = "db2"
> uuid = "f253cab5-c3de-c1f7-e735-5d4f0bfcd3ff"
> maxmem = 16384
> memory = 2048
> vcpus = 4
> bootloader = "/usr/bin/pygrub"
> on_poweroff = "destroy"
> on_reboot = "restart"
> on_crash = "restart"
> vfb = [  ]
> disk = [ "phy:/dev/etherd/e1.12,xvda,w", "phy:/dev/etherd/e2.12,xvdb,w", 
> "phy:/dev/etherd/e3.1,xvdc,w", "phy:/dev/etherd/e4.1,xvdd,w" ]
> vif = [ "mac=00:16:3e:5b:5c:dd,bridge=dmz" ]
> 
> If I boot the domU with this config file I get the following on boot:
> 
> Red Hat nash version 5.1.19.6 starting
> Mounting proc filesystem
> Mounting sysfs filesystem
> Creating /dev
> Creating initial device nodes
> Setting up hotplug.
> Creating block device nodes.
> Loading ehci-hcd.ko module
> Loading ohci-hcd.ko module
> Loading uhci-hcd.ko module
> USB Universal Host Controller Interface driver v3.0
> Loading jbd.ko module
> Loading ext3.ko module
> Loading raid1.ko module
> md: raid1 personality registered for level 1
> Loading xenblk.ko module
> Registering block device major 202
>  xvda: xvda1 xvda2 xvda3 xvda4 < xvda5 >
>  xvdb: xvdb1 xvdb2 xvdb3 xvdb4 < xvdb5 >
>  xvdc: xvdc1
> kobject_add failed for xvda with -EEXIST, don't try to register things with 
> the same name in the same directory.
> 
> Call Trace:
>  [<ffffffff803404ea>] kobject_add+0x170/0x19b
>  [<ffffffff8025cfd5>] exact_lock+0x0/0x14
>  [<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
>  [<ffffffff802fb4e2>] register_disk+0x43/0x190
>  [<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
>  [<ffffffff80336c3a>] add_disk+0x34/0x3d
>  [<ffffffff88084ec9>] :xenblk:backend_changed+0x110/0x193
>  [<ffffffff803b32fa>] xenbus_read_driver_state+0x26/0x3b
>  [<ffffffff803b4bdb>] xenwatch_thread+0x0/0x135
>  [<ffffffff803b402d>] xenwatch_handle_callback+0x15/0x48
>  [<ffffffff803b4cf7>] xenwatch_thread+0x11c/0x135
>  [<ffffffff8029bb44>] autoremove_wake_function+0x0/0x2e
>  [<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
>  [<ffffffff80233bcd>] kthread+0xfe/0x132
>  [<ffffffff80260b2c>] child_rip+0xa/0x12
>  [<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
>  [<ffffffff80233acf>] kthread+0x0/0x132
>  [<ffffffff80260b22>] child_rip+0x0/0x12
> 
> Unable to handle kernel NULL pointer dereference at 0000000000000010 RIP: 
>  [<ffffffff802fe512>] create_dir+0x11/0x1cf
> PGD 7f1c9067 PUD 7f1ca067 PMD 0 
> Oops: 0000 [1] SMP 
> last sysfs file: /block/ram0/dev
> CPU 1 
> Modules linked in: xenblk raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
> Pid: 9, comm: xenwatch Not tainted 2.6.18-164.el5xen #1
> RIP: e030:[<ffffffff802fe512>]  [<ffffffff802fe512>] create_dir+0x11/0x1cf
> RSP: e02b:ffff880000fbfda0  EFLAGS: 00010282
> RAX: ffff88007f31b870 RBX: ffff88007f3cd4f0 RCX: ffff880000fbfdd8
> RDX: ffff88007f3cd4f8 RSI: 0000000000000000 RDI: ffff88007f3cd4f0
> RBP: ffff88007f3cd4f0 R08: 0000000000000001 R09: ffff88000114c000
> R10: ffffffff8029b92c R11: ffff880000fbfbb0 R12: ffff88007f3cd4f0
> R13: ffff880000fbfdd8 R14: 0000000000000000 R15: ffff88007f31b870
> FS:  0000000000000000(0000) GS:ffffffff805ca080(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000
> 
> The /dev/etherd/e4.1 backend to the xvdd device is present in the dom0
> and works perfectly. I can access it from within the dom0 with no
> problem.
> 
> Something is confused. I would really like to avoid rebooting the
> dom0's if at all possible.
> 
> I have found that if I remove the "phy:/dev/etherd/e4.1,xvdd,w" from
> the disk = line the domU boots fine. But if I try to block-attach the
> missing device the domU dies instantly. 
> 
> I have been looking for logs that might explain something about why it
> died but I cannot find anything relevant. I have googled the "don't
> try to register things with the same name in the same directory" error
> and found a few references to it but none in the context of xen.
> 
> Any advice would be greatly appreciated.
> 

Does it work if you attach some local LVM volume or file image (non-AOE) as 
xvdd? 
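A minimal sketch of that test, assuming a scratch file image at a hypothetical path (/var/lib/xen/images/xvdd-test.img is not from the thread). It keeps the three AoE disks that already boot fine and backs xvdd with a loopback file instead of /dev/etherd/e4.1. xm config files are evaluated as Python, so the fragment below is the disk = line from db2's config with only the fourth entry changed.

```python
# disk = line for db2 with xvdd moved off AoE for testing.
# The image path is a hypothetical scratch file, not part of the original setup.
disk = [ "phy:/dev/etherd/e1.12,xvda,w",
         "phy:/dev/etherd/e2.12,xvdb,w",
         "phy:/dev/etherd/e3.1,xvdc,w",
         "file:/var/lib/xen/images/xvdd-test.img,xvdd,w" ]  # was phy:/dev/etherd/e4.1
```

The image can be created in dom0 as a 1 GiB sparse file with `dd if=/dev/zero of=/var/lib/xen/images/xvdd-test.img bs=1M count=0 seek=1024`. Alternatively, leave the config alone and attach it to the running guest with `xm block-attach db2 file:/var/lib/xen/images/xvdd-test.img xvdd w`. If this attach also kills the domU, the problem is on the guest/xenblk side rather than in the AoE backend.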

Do you get errors in dom0 "dmesg"? How about dom0 /var/log/messages? 
Do you get errors in dom0 "xm log" ? How about "xm dmesg"?

-- Pasi



_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users

