
Re: [Xen-users] Dom0 crashed when rebooting whilst DomU are running



On Sep 10, 2012, at 10:39 AM, Ian Campbell wrote:

> On Sat, 2012-09-08 at 15:50 +0100, Maik Brauer wrote:
>> On Sep 4, 2012, at 10:11 AM, Ian Campbell wrote:
>> 
>>> Could you not top post please, it makes it rather hard to follow the
>>> flow of the conversation.
>>> On Mon, 2012-09-03 at 18:10 +0100, Casey DeLorme wrote:
>>>> As stated, you can alias shutdown to do exactly what you need, it can
>>>> be as simple as a series of hard-coded operations to a complex custom
>>>> shell script that parses your domains and closes each with feedback.
>>> 
>>> Xen ships the "xendomains" initscript which can halt guest on shutdown
>>> as well as automatically start specific guests on boot. It can also be
>>> configured to suspend/resume them or (I think) migrate them away.
>>> 
>>> For diagnosing the crash itself more details will be required than were
>>> provided in the original post. Please see
>>> http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen for some guidance.
>>> At a minimum we would need a capture (serial console or photo) of the
>>> crash backtrace.
>>> 
>>> Ian.
>>> 
>>> 
>>  I found out that it hangs during re-boot of dom0 when having more
>> Network interfaces involved, like:
>>      vif = [ 'mac=06:46:AB:CC:11:01, ip=<myIPadress>', '', '',
>> 'mac=06:04:AB:BB:11:03, bridge=VLAN20, script=vif-bridge', '',
>> 'mac=06:04:AB:BB:11:05, bridge=VLAN40, script=vif-bridge' ]
> 
> 6 interfaces total, 3 of which have a random mac on each reboot and all
> get put on the default bridge?

No, not really. The bridge is different for each interface. We have VLAN20, 
VLAN40, etc. as bridges.
These are created at the beginning, when the system starts up 
(create_bridges.sh):
/usr/sbin/brctl addbr VLAN11
/usr/sbin/brctl addbr VLAN12
/usr/sbin/brctl addbr VLAN20
/usr/sbin/brctl addbr VLAN30
/usr/sbin/brctl addbr VLAN40

/sbin/ifconfig VLAN11 down -arp up
/sbin/ifconfig VLAN12 down -arp up
/sbin/ifconfig VLAN20 down -arp up
/sbin/ifconfig VLAN30 down -arp up
/sbin/ifconfig VLAN40 down -arp up
> 
> Is your default script vif-bridge or something else? Have you modified
> any of these scripts?

No, I didn't modify anything. It is still the original script.
> 
>>  in case you use just one or having the basic line in place, it is
>> working:  
>>      vif = [ '' ]
>> 
>>  The system stops after initiating the reboot at the following line in the 
>> console: System will restart...........
> 
> So this is a hang, not a crash as suggested originally?

Yes, you are right. It is just a hang.
> 
> If it is a hang then you might have some luck using the magic sysrq keys
> to print lists of blocked tasks. I'm not sure in Squeeze but you might
> need to enable this as described in Documentation/sysrq.txt in the Linux
> source.
> 
> Blocked tasks are listed with SysRQ-'w'. If you have serial console then
> 't' will list all tasks, but that list can be quite long so it is useless
> without a serial console.

The list is empty. SysRQ-w and SysRQ-t show nothing at all, as if nothing 
is running anymore.
The console periodically shows:  INFO: task xenwatch:12 blocked for more than 120 seconds
It seems that xenwatch is blocking the reboot here. Is that assumption 
correct? Strangely, though, I can't see any process anymore with SysRQ-t 
or SysRQ-w.
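
For reference, the same dumps can be requested from a shell through /proc, 
which may help while a login is still possible shortly before the hang. This 
is only a sketch: the function name sysrq_trigger and its optional second 
argument (an alternative root directory, there purely so the function can be 
tried against a scratch directory) are made up for illustration.

```shell
#!/bin/sh
# Sketch: request a SysRq function from a shell instead of the keyboard.
# The second argument is a hypothetical override for the /proc mount point;
# on a real system leave it out and run as root.
sysrq_trigger() {
    key="$1"
    proc_root="${2:-/proc}"
    # 1 enables all SysRq functions for the keyboard combination as well
    # (see Documentation/sysrq.txt in the Linux source).
    echo 1 > "$proc_root/sys/kernel/sysrq" || return 1
    # Writing a key letter here acts like pressing SysRq-<key>;
    # 'w' lists blocked tasks, 't' lists all tasks.
    echo "$key" > "$proc_root/sysrq-trigger"
}
```

Calling "sysrq_trigger w" writes the list of blocked tasks to the kernel log. 
If nothing appears on the console, the console log level may be too low to 
show it; SysRQ-9 raises it to the most verbose setting.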

>>  In the logfile /var/log/messages you can find these as the last lines: 
>>        Sep  8 15:44:28 rootsrv01 shutdown[2445]: shutting down for system 
>> reboot
>>      Sep  8 15:44:31 rootsrv01 kernel: [   73.716246] VLAN20: port 1(vif2.3) 
>> entering forwarding state
>>      Sep  8 15:44:31 rootsrv01 kernel: [   74.500111] VLAN40: port 1(vif2.5) 
>> entering forwarding state
>>      Sep  8 15:44:34 rootsrv01 kernel: [   77.317431] VLAN20: port 1(vif2.3) 
>> entering disabled state
>>      Sep  8 15:44:34 rootsrv01 kernel: [   77.317490] VLAN20: port 1(vif2.3) 
>> entering disabled state
>>      Sep  8 15:44:36 rootsrv01 kernel: [   79.368685] VLAN40: port 1(vif2.5) 
>> entering disabled state
>>      Sep  8 15:44:36 rootsrv01 kernel: [   79.369156] VLAN40: port 1(vif2.5) 
>> entering disabled state
>>      Sep  8 15:44:37 rootsrv01 kernel: Kernel logging (proc) stopped.
>>      Sep  8 15:44:37 rootsrv01 rsyslogd: [origin software="rsyslogd" 
>> swVersion="4.6.4" x-pid="890" x-info="http://www.rsyslog.com";] exiting on 
>> signal 15.
>> 
>> In /var/log/daemon.log you can find this message:
>>         Sep  8 15:44:37 rootsrv01 acpid: exiting
>>         Sep  8 15:44:37 rootsrv01 rpc.statd[750]: Caught signal 15, 
>> un-registering and exiting
> 
> All the above (both message and daemon.log) look like normal parts of
> shutting down to me.
> 
>>         Sep  8 15:44:37 rootsrv01 udevd-work[2276]: 
>> '/etc/xen/scripts/vif-setup offline type_if=vif' unexpected exit with status 
>> 0x000f
> 
> This might be worth following up on.

When I put a "sleep 5" in the stop section of /etc/init.d/xendomains:
case "$1" in
    start)
        start
        rc_status
        if test -f $LOCKFILE; then rc_status -v; fi
        ;;

    stop)
        stop
        rc_status -v
        sleep 5
        ;;

then the system shuts down as expected and reboots properly.
In the daemon.log file I could no longer find the error:
  Sep  8 15:44:37 rootsrv01 udevd-work[2276]: '/etc/xen/scripts/vif-setup offline type_if=vif' unexpected exit with status 0x000f
It seems to have disappeared after adding the delay. Could this be a race 
condition with the udev daemon during shutdown?
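
If this really is a race between the xendomains stop action returning and 
udev still tearing down the vif devices (an assumption, not something 
confirmed here), a fixed delay could be replaced by polling until the vif 
interfaces are gone. A minimal sketch follows; the function name 
wait_for_vifs_gone, the default timeout of 30 seconds and the optional 
directory argument are all hypothetical.

```shell
#!/bin/sh
# Sketch: wait until no vif* network interfaces remain, or give up after a
# timeout. Arguments (both optional, both assumptions for illustration):
#   $1  directory listing network devices (default /sys/class/net)
#   $2  timeout in seconds (default 30)
wait_for_vifs_gone() {
    net_dir="${1:-/sys/class/net}"
    timeout="${2:-30}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        found=0
        for dev in "$net_dir"/vif*; do
            # An unmatched glob stays literal and fails the -e test.
            if [ -e "$dev" ]; then
                found=1
                break
            fi
        done
        # No vif left: backend teardown has finished.
        [ "$found" -eq 0 ] && return 0
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # Timed out with at least one vif still present.
    return 1
}
```

In the stop section, "wait_for_vifs_gone" would take the place of "sleep 5". 
It returns 0 as soon as the interfaces have disappeared and 1 on timeout, so 
the shutdown is not delayed longer than necessary. Running 
"udevadm settle --timeout=30" at the same spot might be worth trying as well.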

> 
> I would do this by adding near the top of vif-setup and/or vif-bridge
> (or whichever script you use):
>       exec 1>>/var/log/vif-setup.log
>       exec 2>&1
> 
> I would then also annotate all through vif-bridge in the offline path
> with echo statements showing how far it got and what command was to be
> run next.
> 
> Ian.
> 
> 
> _______________________________________________
> Xen-users mailing list
> Xen-users@xxxxxxxxxxxxx
> http://lists.xen.org/xen-users





 

