Re: [Xen-devel] tg3 network stall in xen-3.4.x but not in xen-3.3.x
Hi Ian,

On Mon, Jul 6, 2009 at 5:36 AM, Ian Pratt<Ian.Pratt@xxxxxxxxxxxxx> wrote:
>> >> Power management is another difference between 3.3 and 3.4. You can disable
>> >> 3.4 power management by adding Xen boot parameters: cpuidle=0 cpufreq=none
>> >
>> > I will disable and run the test tomorrow to see whether network stall
>> > issue is there or not.
>>
>> Using cpuidle=0 cpufreq=none seems to solve the network stall problem.
>
> Hmm, that's rather disturbing. Its presumably the cpuidle parameter which is
> having the effect. Quite why deeper sleep states can result in one particular
> device interrupt getting stuck (as opposed to all of them) is a mystery. It
> might be interesting to see the boot messages, and also to find out which of
> the C states is causing the problem (presumably C2 or C3).

If I do not add cpuidle and cpufreq to the Xen boot parameters, I get the following:

# xenpm get-cpuidle-states
Max C-state: C1

cpu id : 0
total C-states : 2
idle time(ms) : 131588676
C0 : transition [00000000000019346170]
     residency [00000000000003897999 ms]
C1 : transition [00000000000019346170]
     residency [00000000000131507268 ms]

cpu id : 1
total C-states : 2
idle time(ms) : 131696919
C0 : transition [00000000000012247741]
     residency [00000000000003766854 ms]
C1 : transition [00000000000012247741]
     residency [00000000000131638414 ms]

cpu id : 2
total C-states : 2
idle time(ms) : 131540647
C0 : transition [00000000000013405442]
     residency [00000000000003922680 ms]
C1 : transition [00000000000013405442]
     residency [00000000000131482588 ms]

cpu id : 3
total C-states : 2
idle time(ms) : 131527968
C0 : transition [00000000000031194790]
     residency [00000000000004030618 ms]
C1 : transition [00000000000031194790]
     residency [00000000000131374650 ms]

Sorry, I am unable to give you more details, as all the servers are currently booted with cpuidle and cpufreq set in the Xen boot parameters. I will try to migrate the VMs from one of the servers to another, use that server to test without cpuidle and cpufreq in the Xen boot parameters, and then report back my findings. In fact, all of them now boot with:

kernel /xen.gz dom0_mem=256M loglvl=all guest_loglvl=all cpuidle=0 cpufreq=none

If you have any suggestions for other Xen boot parameters to add, or anything else, feel free to let me know ;)

> In your tests, rather than rebooting the machine you may possibly be able to
> recover the machine by unloading and reloading the NIC module. (you may need
> to remove it from the bridge and ifconfig it down first).

Yes, shutting down all xendomains, network-bridge and xend and then restarting them, without needing to restart network, brings the network back most of the time. But it is disruptive, as all VMs need to be shut down cleanly to avoid leaving their ext3 file systems dirty.

I noticed that on other servers running xen-3.4.x without cpuidle=0 cpufreq=none, xenpm get-cpuidle-states shows:

# xenpm get-cpuidle-states
Max C-state: C7

Is this due to the processor type (they are not dual-core, quad-core or multi-processor systems), or to whether the system is VT-d enabled?
# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.00GHz
stepping : 3
cpu MHz : 3000.112
cache size : 2048 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu de tsc msr pae mce cx8 apic mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht nx constant_tsc pni cid
bogomips : 6004.86

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.00GHz
stepping : 3
cpu MHz : 3000.112
cache size : 2048 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu de tsc msr pae mce cx8 apic mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht nx constant_tsc pni cid
bogomips : 6004.86

The above server is not a Dell but a Tyan server:

# lspci -vvv|grep -i ethernet
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 11)
        Subsystem: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 11)
        Subsystem: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express

Testing on this server is fine, with no network stall. However, this server crashes within a month, and when I plug in a monitor and keyboard I see no output and get no response to Ctrl+Alt+Delete. The only thing I can do is reboot the server, and then the cycle repeats: a sudden crash within a month, sometimes two or more times in a month. So this server only runs a backup domU and a mirror domU, which are not so critical. Because of the sudden crashes on this type of server (I have two such servers with the same issue), I can't really run them in real production :(

Thanks.
Kindest regards,
Giam Teck Choon
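
For anyone wanting to compare C-state residency across hosts, the xenpm get-cpuidle-states output quoted in this message could be summarized with a small script along these lines. This is only a sketch: the function names parse_cpuidle and residency_share are hypothetical, the sample is copied from cpu 0 in the output above, and the parser assumes the "cpu id" / "Cn : transition [...] residency [... ms]" layout shown there, which other xenpm versions may not follow.

```python
import re

# Sample copied from cpu 0 of the xenpm output in this message.
SAMPLE = """\
Max C-state: C1
cpu id : 0
total C-states : 2
idle time(ms) : 131588676
C0 : transition [00000000000019346170]
     residency [00000000000003897999 ms]
C1 : transition [00000000000019346170]
     residency [00000000000131507268 ms]
"""


def parse_cpuidle(text):
    """Return {cpu_id: {state_name: residency_ms}} from xenpm-style text.

    Assumes the layout shown in the sample; this is not an official format.
    """
    cpus = {}
    cpu = None
    state = None
    for line in text.splitlines():
        m = re.match(r"\s*cpu id\s*:\s*(\d+)", line)
        if m:
            cpu = int(m.group(1))
            cpus[cpu] = {}
            state = None
            continue
        # A line starting with "C0 :", "C1 :", ... opens a new C-state.
        m = re.match(r"\s*(C\d+)\s*:", line)
        if m and cpu is not None:
            state = m.group(1)
        # The residency may be on the same line or on the following one.
        m = re.search(r"residency\s*\[\s*(\d+)\s*ms\]", line)
        if m and cpu is not None and state is not None:
            cpus[cpu][state] = int(m.group(1))
    return cpus


def residency_share(states):
    """Convert {state: residency_ms} into fractions of the total residency."""
    total = sum(states.values())
    if total == 0:
        return {}
    return {name: ms / total for name, ms in states.items()}


if __name__ == "__main__":
    for cpu_id, states in sorted(parse_cpuidle(SAMPLE).items()):
        for name, share in sorted(residency_share(states).items()):
            print(f"cpu {cpu_id} {name}: {share:.1%}")
```

Feeding it the full output of xenpm get-cpuidle-states from a host booted without cpuidle=0 would show how much time each CPU spends in each C-state, which may help narrow down which state coincides with the stall.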