Re: [Xen-users] Massive numbers of dropped packets / ssh problems



Tom,

ethtool -S gives me this, so now I'm thinking the high number of dropped packets is simply a quirk with the network card or driver, so I'm going to let it slide for the moment.

# ethtool -S peth1
NIC statistics:
     tx_packets: 174781
     rx_packets: 296975
     tx_errors: 0
     rx_errors: 0
     rx_missed: 0
     align_errors: 0
     tx_single_collisions: 11974
     tx_multi_collisions: 10024
     unicast: 181041
     broadcast: 30937
     multicast: 115934
     tx_aborted: 0
     tx_underrun: 0

The larger concern is the fact that when I SSH into a DomU or go to a website provided by that DomU I will often time out. But a subsequent attempt will work. I have disabled DNS lookups for sshd ("UseDNS no" in /etc/ssh/sshd_config) and Apache (of course) is not configured to do that either.

It also seems that the more people who are using that server during the day, the more likely it is that we will NOT see this happen. The more idle the computer has been (such as first thing in the morning), the more likely it is that we WILL see it happen.

Right now, I'm leaning towards an upstream router, which I know does ARP caching, but I can't for the life of me figure out how that would be a problem. It's not like the MAC addresses are changing during the day.

Seems to me that ARP might be the trouble, but I haven't yet worked out the best way to test this reliably.
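
One rough way to test the idle-time pattern is to probe the DomU after idle gaps of increasing length and record which stage stalls: the TCP connect, or the wait for the SSH banner (the point where the `ssh -v` output in this thread stops). The sketch below is only an illustration under stated assumptions: the name `probe_banner`, the 10-second timeout and the idle intervals are invented for the example, and it does not inspect ARP directly. It only shows whether failures track idle time.

```python
import socket
import sys
import time


def probe_banner(host, port=22, timeout=10.0):
    """Connect, then wait for the server's first bytes (the SSH banner).

    Returns (stage, seconds) where stage is "ok", "connect-failed"
    or "no-banner". The stage names are made up for this sketch.
    """
    start = time.monotonic()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return "connect-failed", time.monotonic() - start
    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(256)
        except OSError:
            return "no-banner", time.monotonic() - start
    stage = "ok" if data.startswith(b"SSH-") else "no-banner"
    return stage, time.monotonic() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python probe.py <host>
    target = sys.argv[1]
    for idle in (0, 60, 300, 900):  # idle gaps in seconds, chosen arbitrarily
        time.sleep(idle)
        stage, secs = probe_banner(target)
        print(f"idle={idle:>4}s stage={stage} elapsed={secs:.2f}s")
```

If "no-banner" results cluster after the longer idle gaps while the connect itself succeeds, that would point away from the initial TCP handshake and towards whatever handles traffic once the connection is open.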

Would it be better to switch from bridged networking to routed networking? It sounds to me like it may make sense to at least try it.

Thanks,
--Joel



On Mar 8, 2009, at 2:27 PM, Tom Brown wrote:

On Sun, 8 Mar 2009, Joel Richard wrote:

That was my gut instinct, as well, Tom, but if I let the connection sit, it will eventually time out after a few minutes.

So what? That's supposed to be a diagnosis that rules out DNS issues?

ssh closes connections that don't authenticate within a reasonable time period. It's actually a significant problem when logging in from a slow box like my cell phone, which takes forever to exchange encryption keys (and then a while to type in a password with some non-standard characters).

And it still doesn't explain the barrage of dropped packets on peth1.

No one suggested it did. But that's a more complex problem to diagnose. It could be the PROMISC flag on eth1, it could be that dom0 doesn't have enough RAM to buffer the network traffic, it could be that you've overloaded the CPU and that's causing network buffers to overflow, it could be some strange hardware interrupt taking too long, it could be just about anything.

It could be perfectly normal, a result of a network card being in promiscuous mode to handle bridging traffic, and getting lots of packets that "aren't for it" because you're NOT on a switched network. You're not complaining about slow network traffic, and stalled connections, which is what one would expect if you were losing more than a few percent of legit traffic.

Try running ethtool -S and seeing if you can get any more detail on why the packets are dropped.

-Tom


I have to admit that this situation is beyond my understanding and skills, which is why I came to the list for help.

Could it simply be that whatever is dealing with the now 89 billion dropped packets is overloaded and can't provide a timely response to a normal SSH request? I mean, 89 billion in a week is roughly 147,000 per second. What the heck could be going on to produce that many dropped packets? I just don't get it.
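
The per-second figure is easy to check. A minimal sketch in Python, assuming the counter started at zero one week earlier (the thread suggests it may not have); the function name `drops_per_second` is made up for the example:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def drops_per_second(dropped, days):
    """Average drop rate, assuming the counter started at zero `days` ago."""
    return dropped / (days * SECONDS_PER_DAY)


# Figures quoted in this thread: 89 billion drops over one week.
print(f"{drops_per_second(89_000_000_000, 7):,.0f} drops/s")  # prints 147,156 drops/s

# peth1 figures from the ifconfig output in the original message.
rx_ok, rx_dropped = 3_231_189, 79_653_025_568
print(f"{rx_dropped / rx_ok:,.0f} drops reported per packet received")
```

The second figure compares the reported drops with the packets actually received on peth1, which gives a sense of how far out of proportion the counter is.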

I'm now leaning towards some sort of driver problem with the NIC since, upon reboot, it shows 2 billion packets dropped. How can this be? It must be a problem with the network card.

As for SSH problems could it be an ARP problem? I know that my upstream router does some sort of ARP caching. I have not yet established any patterns here, but there are a few tests I want to run.

Thanks,
--Joel

On Mar 7, 2009, at 9:39 AM, tom.ashley@xxxxxxxxx wrote:

That actually sounds like a reverse DNS lookup issue from your server.
Make sure the ssh server can resolve the ip you are coming from.
Tom
Sent using BlackBerry® from Orange
-----Original Message-----
From: Joel Richard <xen@xxxxxxxxxxxxxxx>
Date: Sat, 7 Mar 2009 08:08:49
To: <xen-users@xxxxxxxxxxxxxxxxxxx>
Subject: [Xen-users] Massive numbers of dropped packets / ssh problems
Good morning,
I'm hoping that someone can help me out with this. I recently moved a
running server from an internal network to a public-facing network
(moved from our office to a data center)
Prior to this, we had no trouble with the server, but since we've put
it on the net and gave it some public IPs (versus the 192.168.x.x
addresses it had before) we've had trouble with SSH and Web
connections onto one of the DomU servers (possibly all of them). That
is, when we connect over SSH or HTTP, it will sometimes "time out"
somehow (not your usual connection timed out) and we need to try
again a couple of times to get the server to respond. It seems to
happen most when the server has been idle for some time, like first
thing in the morning.
On the Dom0 if I do an ifconfig, I get the info that's at the bottom
of this message (IPs have been changed to protect the innocent). The
server has 4 DomU's running on it.
Note that the peth1 interface has 79 billion dropped packets. The
server has been running for a week. My first thought is "what the
hell?" :) I can't find those dropped packets with tcpdump and yet I am
convinced that this is a problem.
When we SSH into the server (the DomU), sometimes it will hang with
this information:
(my laptop) $ ssh -v XXX.XXX.XXX.XXX
OpenSSH_5.0p1, OpenSSL 0.9.7l 28 Sep 2006
debug1: Reading configuration data /Users/joel/.ssh/config
debug1: Applying options for dev
debug1: Reading configuration data /etc/ssh_config
debug1: Connecting to dev.richard-group.com [XXX.XXX.XXX.XXX] port 22.
debug1: Connection established.
debug1: identity file /Users/joel/.ssh/identity type -1
debug1: identity file /Users/joel/.ssh/id_rsa type 1
debug1: identity file /Users/joel/.ssh/id_dsa type 2
Yes, that's where it stops. It's already established the connection,
but it doesn't continue. It initially sounds like an SSH problem, but
to me it seems more like something between the Dom0 and the DomU.
I do not have this problem with ssh on a real server on the same
network that is not using Xen.
To clarify, I am using Debian Etch, Xen 3.0.3, AMD Phenom CPUs,
2.6.18-6-xen-amd64 #1 SMP kernel. We're using the onboard NIC.
Can anyone help?
Thanks a lot.
--Joel
(dom0) # ifconfig
eth1      Link encap:Ethernet  HWaddr 00:1F:D0:99:C8:1D
        inet addr:XXX.XXX.XXX.XXX  Bcast:XXX.XXX.XXX.XXX  Mask:255.255.255.192
        inet6 addr: XXX.XXX.XXX.XXX/64 Scope:Link
        UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
        RX packets:790038 errors:0 dropped:0 overruns:0 frame:0
        TX packets:30111 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:259565273 (247.5 MiB)  TX bytes:8294720 (7.9 MiB)
lo        Link encap:Local Loopback
        inet addr:127.0.0.1  Mask:255.0.0.0
        inet6 addr: ::1/128 Scope:Host
        UP LOOPBACK RUNNING  MTU:16436  Metric:1
        RX packets:0 errors:0 dropped:0 overruns:0 frame:0
        TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
peth1     Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
        inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
        UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
        RX packets:3231189 errors:0 dropped:79653025568 overruns:0 frame:0
        TX packets:1839869 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:1834207650 (1.7 GiB)  TX bytes:617376732 (588.7 MiB)
        Interrupt:16 Base address:0xa000
vif0.1    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
        inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
        UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
        RX packets:30111 errors:0 dropped:0 overruns:0 frame:0
        TX packets:790038 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:8294720 (7.9 MiB)  TX bytes:259565273 (247.5 MiB)
vif1.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
        inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
        UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
        RX packets:1621897 errors:0 dropped:0 overruns:0 frame:0
        TX packets:2841674 errors:0 dropped:1371 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:437418179 (417.1 MiB)  TX bytes:2168128688 (2.0 GiB)
vif2.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
        inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
        UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
        RX packets:10114591 errors:0 dropped:0 overruns:0 frame:0
        TX packets:11032176 errors:0 dropped:244457 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:8708636564 (8.1 GiB)  TX bytes:10165626759 (9.4 GiB)
vif4.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
        inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
        UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
        RX packets:10714696 errors:0 dropped:0 overruns:0 frame:0
        TX packets:10554522 errors:0 dropped:220230 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:10211262961 (9.5 GiB)  TX bytes:8503978649 (7.9 GiB)
vif8.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
        inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
        UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
        RX packets:6432 errors:0 dropped:0 overruns:0 frame:0
        TX packets:39471 errors:0 dropped:50 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:1485151 (1.4 MiB)  TX bytes:12072170 (11.5 MiB)
xenbr1    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
        inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
        UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
        RX packets:579865 errors:0 dropped:0 overruns:0 frame:0
        TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:113821662 (108.5 MiB)  TX bytes:0 (0.0 b)
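
To watch how the drop counters move over time rather than reading them by eye, output like the above can be parsed and compared between runs. The following is a minimal sketch in Python, assuming the classic `ifconfig` text layout shown in this message; the function name `rx_tx_dropped` is hypothetical.

```python
import re


def rx_tx_dropped(ifconfig_text):
    """Map interface name -> (rx_dropped, tx_dropped) from classic ifconfig output."""
    counters = {}
    name = None
    for line in ifconfig_text.splitlines():
        # Each interface block starts with a line containing "Link encap:".
        if "Link encap:" in line:
            name = line.split()[0]
            counters[name] = [0, 0]
        match = re.search(r"\b(RX|TX) packets:\d+ .*?dropped:(\d+)", line)
        if match and name:
            index = 0 if match.group(1) == "RX" else 1
            counters[name][index] = int(match.group(2))
    return {iface: tuple(pair) for iface, pair in counters.items()}


if __name__ == "__main__":
    import subprocess

    # Assumes an `ifconfig` binary that prints the layout shown above.
    text = subprocess.run(["ifconfig"], capture_output=True, text=True).stdout
    for iface, (rx, tx) in rx_tx_dropped(text).items():
        if rx or tx:
            print(f"{iface}: rx_dropped={rx} tx_dropped={tx}")
```

Running it twice a known interval apart and subtracting the values gives an actual drop rate for that interval, which is more telling than the cumulative total.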
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users


----------------------------------------------------------------------
tbrown@xxxxxxxxxxxxx   | Courage is doing what you're afraid to do.
http://BareMetal.com/  | There can be no courage unless you're scared.
                      | - Eddie Rickenbacker


 

