Re: [Xen-users] domU network has sleeping sickness
I've seen the same problem with my Xen 3.1.0 setup. What the Xen gurus are telling us is that this is a symptom of dom0 being busy and not servicing the network interrupts of the domUs promptly. Their advice was to shift an application that had been running on dom0 to another Xen instance to see if that would help; we are in the process of implementing that now. By the way, my system (a Dell PowerEdge 2950) has built-in Broadcom network cards, not the Intel e1000, so it is unlikely that this is a network-driver-specific issue.

During these episodes of lost network connectivity, by the way, it was not unusual to see the following kernel dump in dom0:

2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: Call Trace:
2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel:  <IRQ> [<ffffffff80258269>] softlockup_tick+0xcc/0xde
2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel:  [<ffffffff8020e84d>] timer_interrupt+0x3a3/0x401
2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel:  [<ffffffff80258898>] handle_IRQ_event+0x4b/0x93
2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel:  [<ffffffff8025897e>] __do_IRQ+0x9e/0x100
2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel:  [<ffffffff8020cc97>] do_IRQ+0x63/0x71
2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel:  [<ffffffff8034b347>] evtchn_do_upcall+0xee/0x165
2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel:  [<ffffffff8020abca>] do_hypervisor_callback+0x1e/0x2c
2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel:  <EOI>

or

Feb 25 10:32:39 fermigrid6 kernel: BUG: soft lockup detected on CPU#0!
Feb 25 10:32:39 fermigrid6 kernel:
Feb 25 10:32:39 fermigrid6 kernel: Call Trace:
Feb 25 10:32:39 fermigrid6 kernel:  <IRQ> [<ffffffff80258269>] softlockup_tick+0xcc/0xde
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020e84d>] timer_interrupt+0x3a3/0x401
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff80258898>] handle_IRQ_event+0x4b/0x93
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8025897e>] __do_IRQ+0x9e/0x100
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020cc97>] do_IRQ+0x63/0x71
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8034b347>] evtchn_do_upcall+0xee/0x165
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020abca>] do_hypervisor_callback+0x1e/0x2c
Feb 25 10:32:39 fermigrid6 kernel:  <EOI> [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8034b258>] force_evtchn_callback+0xa/0xb
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff803f2272>] thread_return+0xdf/0x119
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff80228a25>] __cond_resched+0x1c/0x44
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff803f25df>] cond_resched+0x37/0x42
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff802343c4>] ksoftirqd+0x0/0xbf
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff80234432>] ksoftirqd+0x6e/0xbf
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff802422d7>] kthread+0xc8/0xf1
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020ae1c>] child_rip+0xa/0x12
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8024220f>] kthread+0x0/0xf1
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020ae12>] child_rip+0x0/0x12

----------------

One of our dom0s was running an LVS server; the other, on identical hardware, was not. We moved the LVS server from one to the other, and the network problems and kernel panics followed it.
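To correlate these lockups with the domU outage windows, the soft-lockup timestamps can be pulled out of a saved dom0 log. A minimal sketch, assuming the traditional "Mon DD HH:MM:SS host kernel:" syslog prefix shown above (the log path you feed it is whatever file your syslog actually writes):

```shell
# Print the timestamp of every soft-lockup report in a saved kernel log.
# Assumes syslog lines start with "Mon DD HH:MM:SS host kernel: ...".
lockup_times() {
    grep 'soft lockup detected' "$1" | awk '{print $1, $2, $3}'
}
```

For example, `lockup_times /var/log/messages` would list one timestamp per lockup report.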
Steve Timm

On Mon, 3 Mar 2008, Marc Teichgraeber wrote:

Hi all,

I have a strange network problem with some domUs on three Xen hosts: they are losing their network connectivity. I use bridged networking.

* It happens randomly and can occur right after boot-up of the domU or any time later.
* The domU is not reachable from another host on the LAN.
* The domU is always reachable from dom0 (ssh, ping).
* I can 'repair' the connection by attaching to the console and pinging out from the domU. First nothing happens, then the machine gets its network back. (That is also my workaround for now: pinging all the time from the console.)
* Pinging from another host at the same time helps too.
* It can happen that I ping continuously from one host while another host gets only every tenth packet or so back.
* The interfaces can come back from their sleep by themselves.
* When the network has fallen asleep, ssh to the domU from another host hangs; it does not come back with "no route to host" or similar.

I'm suspicious of the network controllers, which are the same on all hosts: "Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper)" (lspci), a kind of "Intel® PRO/1000 EB Network Connection with I/O Acceleration" (Intel website). I've tried the latest e1000 driver from Intel, but it didn't help. I've checked all MAC addresses and IP addresses; they are unique.
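The console-ping workaround described above can be scripted rather than run by hand. A rough sketch, not a fix — the target address is an assumption to adapt (the gateway on the domU's bridge is the natural choice), and you would add a short sleep between rounds for real use:

```shell
# Bounded keep-alive pinger: one echo request per round, so the vif
# keeps generating outbound traffic.  Target and round count are
# parameters; the gateway IP used in practice is site-specific.
keep_alive() {
    target=$1
    rounds=$2
    i=0
    while [ "$i" -lt "$rounds" ]; do
        if ping -c 1 -W 2 "$target" >/dev/null 2>&1; then
            echo "round $i: reply"
        else
            echo "round $i: no reply"
        fi
        i=$((i + 1))
    done
}
```

Run inside the domU console, e.g. `keep_alive 192.168.0.1 100000` (gateway address hypothetical).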
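The MAC-uniqueness check mentioned above can also be mechanised. A small sketch, assuming the domU configs live in files you can glob (the path in the usage example is hypothetical) and contain the address in the usual aa:bb:cc:dd:ee:ff form:

```shell
# Extract every MAC-address-shaped token from the given config files,
# normalise to lower case, and print any address appearing more than once.
find_dup_macs() {
    grep -ho '[0-9a-fA-F][0-9a-fA-F]\(:[0-9a-fA-F][0-9a-fA-F]\)\{5\}' "$@" \
        | tr 'A-F' 'a-f' | sort | uniq -d
}
```

For example, `find_dup_macs /etc/xen/vm/*` prints nothing when all vif MACs are unique.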
Any ideas are welcome :)

-------------------------------------------------------------------------
"xm info" from host1, openSUSE 10.2 (x86-64):

release                : 2.6.18.8-0.9-xen
version                : #1 SMP Sun Feb 10 22:48:05 UTC 2008
machine                : x86_64
nr_cpus                : 4
nr_nodes               : 1
sockets_per_node       : 2
cores_per_socket       : 2
threads_per_core       : 1
cpu_mhz                : 2327
hw_caps                : bfebfbff:20100800:00000000:00000140:0004e3bd:00000000:00000001
total_memory           : 32766
free_memory            : 21607
max_free_memory        : 21607
max_para_memory        : 21603
max_hvm_memory         : 21544
xen_major              : 3
xen_minor              : 0
xen_extra              : .3_11774-23
xen_caps               : xen-3.0-x86_64
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : 11774
cc_compiler            : gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)
cc_compile_by          : abuild
cc_compile_domain      : suse.de
cc_compile_date        : Thu Jan 10 21:22:54 UTC 2008
xend_config_format     : 2

-------------------------------------------------------------------------
"xm info" output on host2, openSUSE 10.3 (x86-64):

release                : 2.6.22.13-0.3-xen
version                : #1 SMP 2007/11/19 15:02:58 UTC
machine                : x86_64
nr_cpus                : 8
nr_nodes               : 1
sockets_per_node       : 2
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 3000
hw_caps                : bfebfbff:20100800:00000000:00000140:0004e3bd:00000000:00000001
total_memory           : 16382
free_memory            : 591
max_free_memory        : 591
max_para_memory        : 587
max_hvm_memory         : 577
xen_major              : 3
xen_minor              : 1
xen_extra              : .0_15042-51
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : 15042
cc_compiler            : gcc version 4.2.1 (SUSE Linux)
cc_compile_by          : abuild
cc_compile_domain      : suse.de
cc_compile_date        : Tue Sep 25 21:16:06 UTC 2007
xend_config_format     : 4

--
Steven C. Timm, Ph.D  (630) 840-8525  timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users