
[Xen-users] block-attach crashes domU



I have run into a strange situation where a domain will not boot with
a certain disk specified in the config file, and trying to block-attach
that disk after the domain starts results in the domain disappearing
from the list, presumably because it simply crashed.

I am running CentOS 5.4 with kernel 2.6.18-160.el5xen x86_64.

For months everything worked perfectly with these domains using
an AoE SAN for the back-end. I have used this sort of setup for
several years and it is great, and these domains in particular have
been running for several months. Then 3 of the 4 domUs I run came
under really heavy load and became unresponsive, and I ended up having
to do an xm destroy on them. Since then they refuse to come back
up. One of my domUs has not been rebooted and it continues to work
great with all 4 disk devices attached.

Here is my domU config file:

name = "db2"
uuid = "f253cab5-c3de-c1f7-e735-5d4f0bfcd3ff"
maxmem = 16384
memory = 2048
vcpus = 4
bootloader = "/usr/bin/pygrub"
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [  ]
disk = [ "phy:/dev/etherd/e1.12,xvda,w", "phy:/dev/etherd/e2.12,xvdb,w", 
"phy:/dev/etherd/e3.1,xvdc,w", "phy:/dev/etherd/e4.1,xvdd,w" ]
vif = [ "mac=00:16:3e:5b:5c:dd,bridge=dmz" ]
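Since the kernel complains about registering the same name twice, one quick sanity check is to confirm that the disk line itself does not repeat a frontend name or a backend device. The following is only a minimal sketch: the helper names parse_disk and find_duplicates are hypothetical, and it assumes every entry has the plain three-field "backend,frontend,mode" form used above.

```python
from collections import Counter

def parse_disk(spec):
    """Split an xm disk spec 'backend,frontend,mode' into its three fields."""
    backend, frontend, mode = spec.split(",")
    return backend, frontend, mode

def find_duplicates(disks):
    """Return (duplicate frontend names, duplicate backend paths)."""
    parsed = [parse_disk(d) for d in disks]
    front = Counter(frontend for _, frontend, _ in parsed)
    back = Counter(backend for backend, _, _ in parsed)
    dup_front = sorted(name for name, count in front.items() if count > 1)
    dup_back = sorted(name for name, count in back.items() if count > 1)
    return dup_front, dup_back

# The disk list copied from the config file above.
disk = ["phy:/dev/etherd/e1.12,xvda,w", "phy:/dev/etherd/e2.12,xvdb,w",
        "phy:/dev/etherd/e3.1,xvdc,w", "phy:/dev/etherd/e4.1,xvdd,w"]

print(find_duplicates(disk))
```

For this config both lists come back empty, so the duplicate name does not come from the disk line itself.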

If I boot the domU with this config file I get the following on boot:

Red Hat nash version 5.1.19.6 starting
Mounting proc filesystem
Mounting sysfs filesystem
Creating /dev
Creating initial device nodes
Setting up hotplug.
Creating block device nodes.
Loading ehci-hcd.ko module
Loading ohci-hcd.ko module
Loading uhci-hcd.ko module
USB Universal Host Controller Interface driver v3.0
Loading jbd.ko module
Loading ext3.ko module
Loading raid1.ko module
md: raid1 personality registered for level 1
Loading xenblk.ko module
Registering block device major 202
 xvda: xvda1 xvda2 xvda3 xvda4 < xvda5 >
 xvdb: xvdb1 xvdb2 xvdb3 xvdb4 < xvdb5 >
 xvdc: xvdc1
kobject_add failed for xvda with -EEXIST, don't try to register things with the same name in the same directory.

Call Trace:
 [<ffffffff803404ea>] kobject_add+0x170/0x19b
 [<ffffffff8025cfd5>] exact_lock+0x0/0x14
 [<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff802fb4e2>] register_disk+0x43/0x190
 [<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80336c3a>] add_disk+0x34/0x3d
 [<ffffffff88084ec9>] :xenblk:backend_changed+0x110/0x193
 [<ffffffff803b32fa>] xenbus_read_driver_state+0x26/0x3b
 [<ffffffff803b4bdb>] xenwatch_thread+0x0/0x135
 [<ffffffff803b402d>] xenwatch_handle_callback+0x15/0x48
 [<ffffffff803b4cf7>] xenwatch_thread+0x11c/0x135
 [<ffffffff8029bb44>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233bcd>] kthread+0xfe/0x132
 [<ffffffff80260b2c>] child_rip+0xa/0x12
 [<ffffffff8029b92c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233acf>] kthread+0x0/0x132
 [<ffffffff80260b22>] child_rip+0x0/0x12

Unable to handle kernel NULL pointer dereference at 0000000000000010 RIP: 
 [<ffffffff802fe512>] create_dir+0x11/0x1cf
PGD 7f1c9067 PUD 7f1ca067 PMD 0 
Oops: 0000 [1] SMP 
last sysfs file: /block/ram0/dev
CPU 1 
Modules linked in: xenblk raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 9, comm: xenwatch Not tainted 2.6.18-164.el5xen #1
RIP: e030:[<ffffffff802fe512>]  [<ffffffff802fe512>] create_dir+0x11/0x1cf
RSP: e02b:ffff880000fbfda0  EFLAGS: 00010282
RAX: ffff88007f31b870 RBX: ffff88007f3cd4f0 RCX: ffff880000fbfdd8
RDX: ffff88007f3cd4f8 RSI: 0000000000000000 RDI: ffff88007f3cd4f0
RBP: ffff88007f3cd4f0 R08: 0000000000000001 R09: ffff88000114c000
R10: ffffffff8029b92c R11: ffff880000fbfbb0 R12: ffff88007f3cd4f0
R13: ffff880000fbfdd8 R14: 0000000000000000 R15: ffff88007f31b870
FS:  0000000000000000(0000) GS:ffffffff805ca080(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000

The /dev/etherd/e4.1 backend to the xvdd device is present in the dom0
and works perfectly. I can access it from within the dom0 with no
problem.

Something seems to have been left in a confused state. I would really
like to avoid rebooting the dom0s if at all possible.

I have found that if I remove the "phy:/dev/etherd/e4.1,xvdd,w" entry
from the disk = line, the domU boots fine. But if I then try to
block-attach the missing device, the domU dies instantly.
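
For reference, the attach maps from the disk spec to the command line roughly as sketched below. This assumes the usual "xm block-attach <domain> <backend> <frontend> <mode>" argument order; the helper name block_attach_argv is hypothetical and the sketch only builds the argument list, it does not run anything.

```python
def block_attach_argv(domain, disk_spec):
    """Build the argv for 'xm block-attach' from a 'backend,frontend,mode' spec.

    Assumes the argument order: domain, backend device, frontend device, mode.
    """
    backend, frontend, mode = disk_spec.split(",")
    return ["xm", "block-attach", domain, backend, frontend, mode]

# The disk entry that was removed from the config, attached to domain "db2".
argv = block_attach_argv("db2", "phy:/dev/etherd/e4.1,xvdd,w")
print(" ".join(argv))
```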

I have been looking for logs that might explain why the domU died,
but I cannot find anything relevant. I have googled the "don't
try to register things with the same name in the same directory" error
and found a few references to it, but none in the context of Xen.

Any advice would be greatly appreciated.

-- 
Tracy Reed
http://tracyreed.org
