Re: [Xen-users] Error: Device 0 (vif) could not be connected. Hotplug scripts not working



Ok, now this is starting to get interesting. I previously had xen-netback statically compiled into the kernel.
It's hard to debug static drivers, so I changed it to compile as a module. And lo and behold, the kernel oops disappeared.

It is not a stable solution, though. I still get the same oops occasionally, which suggests a race condition.
I'm running 2.6.32.14 from Jeremy's xen/stable-2.6.32.x branch.


On 01.06.2010 11:53, Helmut Wieser wrote:
Doesn't seem to make a difference.
I even downgraded to udevd 141, no change.

I found my problem described on xen-devel and applied the patch from http://lists.xensource.com/archives/html/xen-devel/2010-05/msg01462.html
However, the patch is incomplete and did not help with my configuration.
I also compiled 2.6.32.14 and still have the same issue.

This is the relevant part of my drivers/xen/netback/xenbus.c:
static int netback_uevent(struct xenbus_device *xdev, struct kobj_uevent_env *env)
{
        struct backend_info *be;
        struct xen_netif *netif;
        char *val;

        DPRINTK("netback_uevent");

        be = dev_get_drvdata(&xdev->dev);
        if (!be)
                return 0;
        netif = be->netif;

        val = xenbus_read(XBT_NIL, xdev->nodename, "script", NULL);
        if (IS_ERR(val)) {
                int err = PTR_ERR(val);
                xenbus_dev_fatal(xdev, err, "reading script");
                return err;
        }
        else {
                if (add_uevent_var(env, "script=%s", val)) {
                        kfree(val);
                        return -ENOMEM;
                }
                kfree(val);
        }

        if (add_uevent_var(env, "vif=%s", netif->dev->name))
                return -ENOMEM;

        return 0;
}

This is the dmesg when I start a hvm domU for the first time:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000110
IP: [<ffffffff8123610a>] netback_uevent+0x8e/0xbf
PGD 1e2bc067 PUD 1dd03067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/vif-1-0/uevent
CPU 7
Modules linked in: bridge stp llc ipv6 xen_netfront firewire_sbp2 snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd tpm_tis soundcore tpm serio_raw snd_page_alloc pcspkr tpm_bios wmi firewire_ohci usb_storage firewire_core crc_itu_t tg3 floppy [last unloaded: scsi_wait_scan]
Pid: 2141, comm: udevd Not tainted 2.6.32.14 #6 HP Z600 Workstation
RIP: e030:[<ffffffff8123610a>]  [<ffffffff8123610a>] netback_uevent+0x8e/0xbf
RSP: e02b:ffff88001d21fda8  EFLAGS: 00010246
RAX: 00200000000000c1 RBX: ffff88001cccde00 RCX: 0000000000800046
RDX: ffff88001d7e3b00 RSI: ffffea00006739a8 RDI: 00200000000002c0
RBP: ffff88001d21fdc8 R08: 0000000000000000 R09: ffffffff815c7cf0
R10: ffff88001e292904 R11: ffff88001e292154 R12: ffff88001e292000
R13: 0000000000000000 R14: ffff88001d7e3b80 R15: ffff88001e7be000
FS:  00007f7591154790(0000) GS:ffff880002ca2000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000110 CR3: 000000001d240000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process udevd (pid: 2141, threadinfo ffff88001d21e000, task ffff88001e6a16e0)
Stack:
 ffff88001cccde40 ffff88001e292000 ffff88001cccde00 ffffffff815fe0e8
<0> ffff88001d21fdf8 ffffffff8122badc ffff88001cccde40 ffff88001e292000
<0> ffff88001fc49f60 ffff88001cccde50 ffff88001d21fe28 ffffffff81266772
Call Trace:
 [<ffffffff8122badc>] xenbus_uevent_backend+0x90/0xab
 [<ffffffff81266772>] dev_uevent+0x102/0x146
 [<ffffffff81267459>] show_uevent+0x81/0xd8
 [<ffffffff81266434>] dev_attr_show+0x22/0x49
 [<ffffffff810a0e41>] ? __get_free_pages+0x9/0x46
 [<ffffffff8112561c>] sysfs_read_file+0xac/0x12e
 [<ffffffff810d459f>] vfs_read+0xa6/0x103
 [<ffffffff810d46b2>] sys_read+0x45/0x69
 [<ffffffff81012a82>] system_call_fastpath+0x16/0x1b
Code: c6 79 48 53 81 31 c0 4c 89 e7 e8 ea 03 f8 ff 85 c0 74 10 4c 89 f7 41 bc f4 ff ff ff e8 99 3e e9 ff eb 2d 4c 89 f7 e8 8f 3e e9 ff <49> 8b 95 10 01 00 00 4c 89 e7 31 c0 48 c7 c6 83 48 53 81 41 bc
RIP  [<ffffffff8123610a>] netback_uevent+0x8e/0xbf
 RSP <ffff88001d21fda8>
CR2: 0000000000000110
---[ end trace 4f88c9bf70342ee1 ]---

I don't get it, because the patch is supposed to prevent NULL pointer dereferences. Either xdev itself is corrupt, or it is returning corrupt data.
I'm stumped.
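
One reading of the trace above, offered as a hypothesis rather than a confirmed diagnosis: CR2 is 0x110 and R13 is zero, which looks like a load from a small offset off a NULL pointer. That would fit be->netif still being NULL when netif->dev->name is read, for example if udevd reads the uevent file before the vif has been created. The listing checks only be, not be->netif. The user-space sketch below illustrates the extra guard. All mock_* names are hypothetical stand-ins, not the kernel's real structures or API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins for the kernel structures; the real
 * definitions live in the netback driver and differ in layout. */
struct mock_net_device { char name[16]; };
struct mock_netif      { struct mock_net_device *dev; };
struct mock_backend    { struct mock_netif *netif; };

/* Mirrors the tail of netback_uevent(): write "vif=<name>" into buf only
 * when the interface exists. Returns 0 on success or when there is
 * nothing to report yet, -1 if buf is too small. */
static int mock_emit_vif(const struct mock_backend *be, char *buf, size_t len)
{
        int n;

        assert(buf != NULL && len > 0);
        buf[0] = '\0';

        /* Assumption: be can be valid while be->netif is still NULL, so
         * check every pointer on the path before dereferencing it. */
        if (!be || !be->netif || !be->netif->dev)
                return 0;

        n = snprintf(buf, len, "vif=%s", be->netif->dev->name);
        if (n < 0 || (size_t)n >= len)
                return -1;
        return 0;
}
```

In the driver itself the equivalent change would presumably be to test be->netif alongside be before touching netif->dev; whether that is sufficient against the race is something only the driver maintainers can confirm.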


On 01.06.2010 09:03, Helmut Wieser wrote:
No joy. I couldn't find out what CONFIG_XEN_SYSFS does, but it doesn't seem to be part of 2.6.31.13. I set all the other options you suggested, apart from wireless.

I'll try to use Zhang Enming's kernel config next.

Oh, and of course I get the infamous oops from bug 1612. I just never noticed it because my console doesn't work with gfx passthru.
Here's the output:

[  167.571125]   alloc irq_desc for 826 on node 0
[  167.571131]   alloc kstat_irqs on node 0
[  167.724755] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110
[  167.724943] IP: [<ffffffff8124ad7c>] netback_uevent+0x90/0xd5
[  167.725066] PGD 1d5e6067 PUD 1ddd2067 PMD 0
[  167.725296] Oops: 0000 [#1] SMP
[  167.725472] last sysfs file: /sys/devices/vif-1-0/uevent
[  167.725544] CPU 2
[  167.725653] Modules linked in: bridge stp xenfs blktap pci_hotplug xen_blkfront xen_netfront xen_evtchn loop firewire_sbp2 snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep usbhid snd_pcm snd_timer snd hid wmi processor soundcore acpi_processor pcspkr psmouse snd_page_alloc serio_raw button evdev ext3 jbd mbcache usb_storage sr_mod sd_mod crc_t10dif cdrom tg3 firewire_ohci floppy thermal ahci firewire_core libphy libata thermal_sys crc_itu_t scsi_mod uhci_hcd ehci_hcd usbcore nls_base [last unloaded: scsi_wait_scan]
[  167.728264] Pid: 1955, comm: udevd Not tainted 2.6.31.13 #3 HP Z600 Workstation
[  167.728356] RIP: e030:[<ffffffff8124ad7c>]  [<ffffffff8124ad7c>] netback_uevent+0x90/0xd5
[  167.728497] RSP: e02b:ffff880002095d98  EFLAGS: 00010246
[  167.728569] RAX: 0000000000000000 RBX: ffff88000231ea00 RCX: 000000000080007c
[  167.728644] RDX: ffff88001d3ab440 RSI: 00000000a3c9a148 RDI: 01000000000002c0
[  167.728722] RBP: ffff88000244e000 R08: 0000000000000000 R09: 0000000000000000
[  167.728797] R10: ffffffff8100eddf R11: 00000000a3c9a148 R12: 0000000000000000
[  167.728873] R13: ffff88001d3ab680 R14: ffff88001d5b9000 R15: ffffffff81502b30
[  167.728955] FS:  00007fdb8f1c5790(0000) GS:ffffc9000002e000(0000) knlGS:0000000000000000
[  167.729051] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[  167.729128] CR2: 0000000000000110 CR3: 000000001dcfc000 CR4: 0000000000002660
[  167.729212] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  167.729303] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  167.729405] Process udevd (pid: 1955, threadinfo ffff880002094000, task ffff88001d5206c0)
[  167.729525] Stack:
[  167.729609]  ffffffff814608a6 00000000a3c9a148 0000000000000002 ffff88000231ea40
[  167.729835] <0> ffff88000244e000 ffff88000231ea50 ffff88000231ea50 ffffffff81283264
[  167.730179] <0> 00007fdb00000010 00000000a3c9a148 ffff880002095f50 0000000000000000
[  167.730566] Call Trace:
[  167.730639]  [<ffffffff81283264>] ? dev_uevent+0x1a2/0x207
[  167.730714]  [<ffffffff81284725>] ? show_uevent+0x92/0xfd
[  167.730790]  [<ffffffff81282e8b>] ? dev_attr_show+0x2e/0x6b
[  167.730873]  [<ffffffff810ce1b0>] ? get_zeroed_page+0x21/0x76
[  167.730957]  [<ffffffff811647fb>] ? sysfs_read_file+0xbb/0x156
[  167.731051]  [<ffffffff8100e301>] ? xen_force_evtchn_callback+0x1d/0x37
[  167.731133]  [<ffffffff8100edf2>] ? check_events+0x12/0x20
[  167.731209]  [<ffffffff81105f91>] ? vfs_read+0xb1/0x123
[  167.731285]  [<ffffffff8100eddf>] ? xen_restore_fl_direct_end+0x0/0x1
[  167.731366]  [<ffffffff81384013>] ? _spin_unlock_irqrestore+0x24/0x3e
[  167.731445]  [<ffffffff811060eb>] ? sys_read+0x55/0x90
[  167.731527]  [<ffffffff81013e42>] ? system_call_fastpath+0x16/0x1b
[  167.731619] Code: c7 c6 33 1f 45 81 31 c0 48 89 ef e8 17 f0 f7 ff 85 c0 74 0f 4c 89 ef bd f4 ff ff ff e8 66 3e eb ff eb 2b 4c 89 ef e8 5c 3e eb ff <49> 8b 94 24 10 01 00 00 48 89 ef 31 c0 48 c7 c6 3d 1f 45 81 e8
[  167.734689] RIP  [<ffffffff8124ad7c>] netback_uevent+0x90/0xd5
[  167.734818]  RSP <ffff880002095d98>
[  167.734890] CR2: 0000000000000110
[  167.734966] ---[ end trace d844f79248755c84 ]---
[  167.927689] device vif1.0 entered promiscuous mode
[  167.938021] eth0: port 2(vif1.0) entering forwarding state
[  168.100416] ip_tables: (C) 2000-2006 Netfilter Core Team
[  168.316187] nf_conntrack version 0.5.0 (4044 buckets, 16176 max)
[  168.316778] CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use
[  168.316873] nf_conntrack.acct=1 kernel parameter, acct=1 nf_conntrack module option or
[  168.316966] sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
[  168.427110] physdev match: using --physdev-out in the OUTPUT, FORWARD and POSTROUTING chains for non-bridged traffic is not supported anymore.
[  168.620471] tun: Universal TUN/TAP device driver, 1.6
[  168.620550] tun: (C) 1999-2004 Max Krasnyansky <maxk@xxxxxxxxxxxx>
[  168.668587] device tap1.0 entered promiscuous mode
[  168.668694] eth0: port 3(tap1.0) entering forwarding state
[  168.688096]   alloc irq_desc for 825 on node 0
[  168.688170]   alloc kstat_irqs on node 0

But the machine comes up for the first time and everything seems to be working fine.

I use udevd 151.


On 31.05.2010 16:27, Niels Dettenbach wrote:
On Monday, 31 May 2010, 16:13:14, Helmut Wieser wrote:
  
No, this doesn't help.
I'm currently trying to ditch the Debian kernel and compile one of 
Jeremy's kernels with a config close to the one from Debian.
    
...you may try this:

 1.) make sure udev is <=151 (I use 141 currently)


 2.) set the following in your Xen kernel config (if not already set): 

CONFIG_SYSFS_DEPRECATED=y
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_ACPI_SYSFS_POWER=y
CONFIG_WIRELESS_EXT_SYSFS=y    *
CONFIG_GPIO_SYSFS=y  *
CONFIG_VIDEO_PVRUSB2_SYSFS=y
CONFIG_RTC_INTF_SYSFS=y
CONFIG_XEN_SYSFS=y
CONFIG_SYSFS=y

(* only if it applies to your hardware)

(not sure if it's optimal, but it seems to work for me with 3.4x and 4.x)

=> reboot


 3.) run
	mount -t sysfs sys /sys

=> if you already have a sysfs mounted, you might try to unmount it before this 
step

I have a line
sys                     /sys            sysfs           auto 0 0

in my fstab which seems to help...
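
Whether sysfs is actually mounted on /sys can also be checked from a program. This is a minimal sketch, assuming a glibc system that provides <mntent.h>; the function name sysfs_mounted_on_sys is made up for illustration:

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Scans a mount table in fstab/mtab format (for example /proc/mounts).
 * Returns 1 if an entry of type sysfs is mounted on /sys, 0 if none is
 * found, and -1 if the table cannot be opened. */
static int sysfs_mounted_on_sys(const char *mtab_path)
{
        struct mntent *ent;
        FILE *fp;
        int found = 0;

        assert(mtab_path != NULL);

        fp = setmntent(mtab_path, "r");
        if (!fp)
                return -1;

        while ((ent = getmntent(fp)) != NULL) {
                if (strcmp(ent->mnt_type, "sysfs") == 0 &&
                    strcmp(ent->mnt_dir, "/sys") == 0) {
                        found = 1;
                        break;
                }
        }

        endmntent(fp);
        return found;
}
```

Calling sysfs_mounted_on_sys("/proc/mounts") before and after the mount step shows whether the mount took effect.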

Maybe most of this is unnecessary, but it seems to help me, so please don't hit me... 
;)

Another thing is that you might still have fragments of your (too new) udev config 
left over from before the downgrade.

I'm working with Gentoo, which compiles things the way I want, so I don't have a 
full view of what your distribution and package management handle well, and what 
they don't, with your (udev) configs...


Maybe this helps,


Niels.

_______________________________________________ Xen-users mailing list Xen-users@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-users