Re: [Xen-users] domU network has sleeping sickness
I've seen the same problem with my Xen 3.1.0 setup. What the Xen gurus are telling us is that this is a symptom of dom0 being busy and not servicing the network interrupts of the domUs promptly. Their advice was to shift an application that had been running on dom0 to another Xen instance to see if that would help; we are in the process of implementing that now. By the way, my system (a Dell PowerEdge 2950) has built-in Broadcom network cards, not the Intel e1000, so it is unlikely that this is a network-driver-specific issue.

During these episodes of lost network connectivity, by the way, it was not unusual to see the following kernel dump in dom0:

2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel: Call Trace:
2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel:  <IRQ> [<ffffffff80258269>] softlockup_tick+0xcc/0xde
2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel:  [<ffffffff8020e84d>] timer_interrupt+0x3a3/0x401
2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel:  [<ffffffff80258898>] handle_IRQ_event+0x4b/0x93
2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel:  [<ffffffff8025897e>] __do_IRQ+0x9e/0x100
2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel:  [<ffffffff8020cc97>] do_IRQ+0x63/0x71
2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel:  [<ffffffff8034b347>] evtchn_do_upcall+0xee/0x165
2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel:  [<ffffffff8020abca>] do_hypervisor_callback+0x1e/0x2c
2008-02-05T18:35:16-06:00 s_sys@xxxxxxxxxxxxxxxxxxx kernel:  <EOI>

or

Feb 25 10:32:39 fermigrid6 kernel: BUG: soft lockup detected on CPU#0!
Feb 25 10:32:39 fermigrid6 kernel:
Feb 25 10:32:39 fermigrid6 kernel: Call Trace:
Feb 25 10:32:39 fermigrid6 kernel:  <IRQ> [<ffffffff80258269>] softlockup_tick+0xcc/0xde
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020e84d>] timer_interrupt+0x3a3/0x401
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff80258898>] handle_IRQ_event+0x4b/0x93
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8025897e>] __do_IRQ+0x9e/0x100
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020cc97>] do_IRQ+0x63/0x71
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8034b347>] evtchn_do_upcall+0xee/0x165
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020abca>] do_hypervisor_callback+0x1e/0x2c
Feb 25 10:32:39 fermigrid6 kernel:  <EOI> [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8034b258>] force_evtchn_callback+0xa/0xb
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff803f2272>] thread_return+0xdf/0x119
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020622a>] hypercall_page+0x22a/0x1000
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff80228a25>] __cond_resched+0x1c/0x44
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff803f25df>] cond_resched+0x37/0x42
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff802343c4>] ksoftirqd+0x0/0xbf
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff80234432>] ksoftirqd+0x6e/0xbf
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff802422d7>] kthread+0xc8/0xf1
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020ae1c>] child_rip+0xa/0x12
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8024220f>] kthread+0x0/0xf1
Feb 25 10:32:39 fermigrid6 kernel:  [<ffffffff8020ae12>] child_rip+0x0/0x12

----------------

One of our dom0s was running an LVS server; the other, on identical hardware, was not. We moved the LVS server from one to the other, and the network problems and kernel panics followed it.
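To correlate these lockups with the domU outage windows, the soft-lockup timestamps can be pulled out of a saved dom0 log. A minimal sketch, assuming the traditional "Mon DD HH:MM:SS host kernel:" syslog prefix shown above (the log path you feed it is whatever file your syslog actually writes):

```shell
# Print the timestamp of every soft-lockup report in a saved kernel log.
# Assumes syslog lines start with "Mon DD HH:MM:SS host kernel: ...".
lockup_times() {
    grep 'soft lockup detected' "$1" | awk '{print $1, $2, $3}'
}
```

For example, `lockup_times /var/log/messages` would list one timestamp per lockup report.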
Steve Timm

On Mon, 3 Mar 2008, Marc Teichgraeber wrote:

Hi all,

I have a strange network problem with some domUs on three Xen hosts: they are losing their network connectivity. I use bridged networking.

* It happens randomly and can occur right after boot-up of the domU or any time later.
* The domU is not reachable from another host on the LAN.
* The domU is always reachable from dom0 (ssh, ping).
* I can 'repair' the connection by attaching to the console and pinging out from the domU. First nothing happens, then the machine gets its network back. (That is also my workaround for now: pinging all the time from the console.)
* Pinging from another host at the same time helps too.
* It can happen that I ping continuously from one host while another host gets only every tenth packet or so back.
* The interfaces can come back from their sleep by themselves.
* When the network has fallen asleep, ssh to the domU from another host hangs; it does not come back with "no route to host" or similar.

I'm suspicious of the network controllers, which are the same on all hosts: "Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper)" (lspci), a kind of "Intel® PRO/1000 EB Network Connection with I/O Acceleration" (Intel website). I've tried the latest e1000 driver from Intel, but it didn't help. I've checked all MAC addresses and IP addresses; they are unique.
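The console-ping workaround described above can be scripted rather than run by hand. A rough sketch, not a fix — the target address is an assumption to adapt (the gateway on the domU's bridge is the natural choice), and you would add a short sleep between rounds for real use:

```shell
# Bounded keep-alive pinger: one echo request per round, so the vif
# keeps generating outbound traffic.  Target and round count are
# parameters; the gateway IP used in practice is site-specific.
keep_alive() {
    target=$1
    rounds=$2
    i=0
    while [ "$i" -lt "$rounds" ]; do
        if ping -c 1 -W 2 "$target" >/dev/null 2>&1; then
            echo "round $i: reply"
        else
            echo "round $i: no reply"
        fi
        i=$((i + 1))
    done
}
```

Run inside the domU console, e.g. `keep_alive 192.168.0.1 100000` (gateway address hypothetical).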
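The MAC-uniqueness check mentioned above can also be mechanised. A small sketch, assuming the domU configs live in files you can glob (the path in the usage example is hypothetical) and contain the address in the usual aa:bb:cc:dd:ee:ff form:

```shell
# Extract every MAC-address-shaped token from the given config files,
# normalise to lower case, and print any address appearing more than once.
find_dup_macs() {
    grep -ho '[0-9a-fA-F][0-9a-fA-F]\(:[0-9a-fA-F][0-9a-fA-F]\)\{5\}' "$@" \
        | tr 'A-F' 'a-f' | sort | uniq -d
}
```

For example, `find_dup_macs /etc/xen/vm/*` prints nothing when all vif MACs are unique.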
Any ideas are welcome :)

-------------------------------------------------------------------------
"xm info" from host1, openSUSE 10.2 (x86-64):

release                : 2.6.18.8-0.9-xen
version                : #1 SMP Sun Feb 10 22:48:05 UTC 2008
machine                : x86_64
nr_cpus                : 4
nr_nodes               : 1
sockets_per_node       : 2
cores_per_socket       : 2
threads_per_core       : 1
cpu_mhz                : 2327
hw_caps                : bfebfbff:20100800:00000000:00000140:0004e3bd:00000000:00000001
total_memory           : 32766
free_memory            : 21607
max_free_memory        : 21607
max_para_memory        : 21603
max_hvm_memory         : 21544
xen_major              : 3
xen_minor              : 0
xen_extra              : .3_11774-23
xen_caps               : xen-3.0-x86_64
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : 11774
cc_compiler            : gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)
cc_compile_by          : abuild
cc_compile_domain      : suse.de
cc_compile_date        : Thu Jan 10 21:22:54 UTC 2008
xend_config_format     : 2

-------------------------------------------------------------------------
"xm info" output on host2, openSUSE 10.3 (x86-64):

release                : 2.6.22.13-0.3-xen
version                : #1 SMP 2007/11/19 15:02:58 UTC
machine                : x86_64
nr_cpus                : 8
nr_nodes               : 1
sockets_per_node       : 2
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 3000
hw_caps                : bfebfbff:20100800:00000000:00000140:0004e3bd:00000000:00000001
total_memory           : 16382
free_memory            : 591
max_free_memory        : 591
max_para_memory        : 587
max_hvm_memory         : 577
xen_major              : 3
xen_minor              : 1
xen_extra              : .0_15042-51
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : 15042
cc_compiler            : gcc version 4.2.1 (SUSE Linux)
cc_compile_by          : abuild
cc_compile_domain      : suse.de
cc_compile_date        : Tue Sep 25 21:16:06 UTC 2007
xend_config_format     : 4

--
Steven C. Timm, Ph.D  (630) 840-8525  timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users