Re: [Xen-users] outgoing domU network dies after 135-194 minutes
For the list/archives: I think I found the issue, but I can't understand
why it suddenly became an issue.

I have 5 network ports on the host. When I start up I get 4 properly
configured xenbrX but the 5th, xenbr4, is missing the peth0 and gets the
real MAC assigned.

    xenbr4          8000.009027bea5ee       no              eth4
                                                            vif1.4

xenbr4 is only used by the firewall and since it worked I didn't change
anything. When this problem started I did look for different things and
fixed up minor issues like this one by one. After doing

    echo 'options netloop nloopbacks=16' >/etc/modprobe.d/netloop.local

to get enough loopback devices to properly configure peth4, it stopped
dying. I removed it, rebooted and it died again, so now I have it in.

What I wonder most is why it suddenly decided to start breaking the
system "out of the blue" and why the second server (which took over
firewall duty when this one died) never got the same issue.

/ps

On Thu, 2008-05-22 at 21:29 -0400, Peter Sjoberg wrote:
> Done some more testing and found that
> * it seems like it always dies after 2h15m-2h45min
> * when it dies it dies for all domU and on all ports at the same time
>   (or at least within 1 minute)
> * external traffic to & from dom0 works fine all the time
> * traffic peth0->vif1.0->domU eth0 works fine (tcpdump in domU shows
>   packets)
> * traffic domU ->eth0 ->vif1.0 dies directly, eth0 TX counter doesn't
>   change and tcpdump on vif1.0 shows no outgoing traffic (only incoming)
> * I restarted one domU after an hour but it died at the same time as the
>   others so it seems tied to uptime of dom0
>
> * besides that firewall rules don't change after a while I have
>   everything open
> * all domU are paravirtualized
>
> I'm at a loss as to where to look. I have started to move over some things
> to a second system but it can't handle a full failover (not enough disk
> and no backup tape) so I need to figure out what's going on here.
>
> What is the common sw that can affect all 4 domU on all 5 network ports
> (vif[1-4].*) but not dom0?
>
> /ps
>
> On Wed, 2008-05-21 at 07:56 -0400, Peter Sjoberg wrote:
> > I have an OpenSuse 10.3 with xen 3.1.0 running and it's been running fine
> > for a few months.
> > This past weekend it suddenly started to act up and after some
> > troubleshooting I can now say that it seems like the guests (domU) lose
> > the outgoing network pipe. From the console I can see that the TX
> > counter is stuck at the same value but there are no errors. It behaves as if
> > whatever I try to connect to isn't there.
> > I can reboot the guest but the problem stays, TX stays at 0 while RX
> > counts up.
> > Rebooting the host (dom0) solves the problem for a few hours (seems to be
> > 2-6h).
> >
> > I tried to look for what the problem can be but don't know where to
> > look. The closest I got was when I narrowed it down to that it doesn't
> > send any network traffic out from any domU and once it happens the domU
> > MAC is nowhere to be found outside the domU (checked brctl showmacs &
> > on the switch).
> > What bothers me most is that it worked fine up until Sunday. I was even
> > out of town for a few days before so I didn't change anything.
> > Also, why does it work for a while after reboot?
> >
> > My setup is not that strange.
> > I have one domu as firewall and another in
> > two DMZs so I have my own network-bridge script that calls the stock
> > opensuse script
> >
> > for i in $(seq 0 4); do
> >     $dir/network-bridge "$@" vifnum=$i netdev=eth$i bridge=xenbr$i
> >     /usr/sbin/ethtool -K eth$i tx off
> > done
> >
> > and this gives
> > # brctl show
> > bridge name     bridge id               STP enabled     interfaces
> > xenbr0          8000.fefffffff000       no              vif0.0
> >                                                         peth0
> >                                                         vif2.0
> >                                                         vif4.0
> > xenbr1          8000.fefffffff001       no              vif0.1
> >                                                         peth1
> >                                                         vif2.1
> >                                                         vif3.0
> > xenbr2          8000.fefffffff002       no              vif0.2
> >                                                         peth2
> >                                                         vif1.0
> >                                                         vif2.2
> > xenbr3          8000.fefffffff003       no              vif0.3
> >                                                         peth3
> >                                                         vif2.3
> > xenbr4          8000.00508bcfd44d       no              eth4
> >                                                         vif2.4
> >
> > The kernel and xen running is stock opensuse
> >
> > # xm info
> > host                   : enterprise
> > release                : 2.6.22.17-0.1-xen
> > version                : #1 SMP 2008/02/10 20:01:04 UTC
> > machine                : x86_64
> > nr_cpus                : 2
> > nr_nodes               : 1
> > sockets_per_node       : 1
> > cores_per_socket       : 2
> > threads_per_core       : 1
> > cpu_mhz                : 2611
> > hw_caps                : 178bfbff:ebd3fbff:00000000:00000010:00002001:00000000:0000001f
> > total_memory           : 4031
> > free_memory            : 0
> > max_free_memory        : 1106
> > max_para_memory        : 1102
> > max_hvm_memory         : 1091
> > xen_major              : 3
> > xen_minor              : 1
> > xen_extra              : .0_15042-51.3
> > xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32
> >                          hvm-3.0-x86_32p hvm-3.0-x86_64
> > xen_scheduler          : credit
> > xen_pagesize           : 4096
> > platform_params        : virt_start=0xffff800000000000
> > xen_changeset          : 15042
> > cc_compiler            : gcc version 4.2.1 (SUSE Linux)
> > cc_compile_by          : abuild
> > cc_compile_domain      : suse.de
> > cc_compile_date        : Thu Dec 20 19:57:34 UTC 2007
> > xend_config_format     : 4
> >
> >
> > So, where should I look for problems?
> >
> > /ps

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
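
For anyone finding this thread in the archives with the same symptom: the tell-tale sign in the listings above is a bridge whose physical member is still ethN rather than pethN. A minimal sketch of a check for that, assuming the usual `brctl show` column layout; the function name `bridges_missing_peth` is made up for illustration and is not part of any Xen or bridge-utils tooling:

```shell
# Hypothetical helper: reads `brctl show` output on stdin and prints the
# name of every bridge that has no pethN interface attached.
bridges_missing_peth() {
    awk '
        { iface = "" }
        $1 == "bridge" && $2 == "name" { next }   # skip the header line
        NF >= 3 {                                 # first line of a bridge entry
            if (bridge != "" && !has_peth) print bridge
            bridge = $1
            has_peth = 0
            if (NF >= 4) iface = $4               # first attached interface
        }
        NF == 1 { iface = $1 }                    # continuation line: one interface
        iface ~ /^peth[0-9]+$/ { has_peth = 1 }
        END { if (bridge != "" && !has_peth) print bridge }
    '
}
```

Run as `brctl show | bridges_missing_peth` in dom0. Against the listing quoted above it would print xenbr4; empty output means every bridge has a pethN member. It only inspects interface names, so it says nothing about why the loopback device was not available.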