
[Xen-devel] (repeatable) cross-domain networking failure




Summary:

I'm running into a situation where, after sending some UDP traffic between two Xen domains (Domain 0 and Domain 1), the networking between the domains fails. This failure is 100% repeatable.

In more detail:

I have two Xen domains. They run the kernels from the 2.0.3 release. (I've run into the same problem with 2.0.1 as well.) Domain 0 has 5 physical ethernet interfaces, and a virtual interface to Domain 1. Domain 1 has just the virtual interface to Domain 0.

D0 is configured with IP address 192.168.0.1, and D1 with 192.168.1.1. The netmask is set to 255.255.0.0.
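As a quick sanity check on the addressing above, the following sketch confirms that the two addresses fall in the same network under the stated netmask, so traffic between them should not need a gateway. The helper name same_subnet is made up for illustration; it only uses Python's standard ipaddress module.

```python
import ipaddress

def same_subnet(addr_a, addr_b, netmask):
    """Return True if both addresses belong to the same network under netmask."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{netmask}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{netmask}").network
    return net_a == net_b

# D0 and D1 as configured above, with the 255.255.0.0 netmask.
print(same_subnet("192.168.0.1", "192.168.1.1", "255.255.0.0"))
```

With a 255.255.255.0 netmask the same two addresses would land in different networks, which is why the /16 mask matters for this setup.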

When I bring up D1, I can ping D1 from D0, ssh into D1, etc.

I then start a UDP server in D0, and a traffic generator in D1. After the traffic generator sends its 128th packet, networking between the domains fails. The 128th packet is received successfully by the UDP server, but no later traffic arrives in D0. This includes UDP, TCP, ICMP, and ARP.
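For anyone who wants to try reproducing this, here is a minimal sketch of the kind of test described above. It is not the actual server or traffic generator used; the function names, the port, and the payload layout (a sequence number padded with dots) are all assumptions chosen for illustration. Numbering the packets makes it easy to see which one was the last to arrive.

```python
import socket

def make_payload(seq, size=64):
    """Build a small datagram: the sequence number, padded with dots to `size` bytes."""
    return str(seq).encode().ljust(size, b".")

def sequence_of(payload):
    """Recover the sequence number from a payload built by make_payload."""
    return int(payload.rstrip(b"."))

def highest_sequence(payloads):
    """Highest sequence number seen so far, or 0 if nothing arrived."""
    return max((sequence_of(p) for p in payloads), default=0)

def run_udp_server(bind_addr, port, idle_timeout=5.0):
    """User-level UDP receiver (the D0 side). Returns all payloads received
    before the socket has been idle for `idle_timeout` seconds."""
    received = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_addr, port))
        sock.settimeout(idle_timeout)
        while True:
            try:
                data, _sender = sock.recvfrom(2048)
            except socket.timeout:
                break
            received.append(data)
    return received

def send_numbered_packets(dest_addr, port, count, size=64):
    """Traffic generator (the D1 side): send `count` small numbered datagrams."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(1, count + 1):
            sock.sendto(make_payload(seq, size), (dest_addr, port))
```

In this sketch, D0 would call run_udp_server("192.168.0.1", 9000) and then print highest_sequence() of the result, while D1 would call send_numbered_packets("192.168.0.1", 9000, 1000). The port 9000 and the count of 1000 are arbitrary. If the failure described above occurs, the receiver would be expected to stop seeing packets partway through the run.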

Looking at the interrupt counts in /proc/interrupts, I see that D0 no longer receives packets sent by D1. D1, however, does receive packets sent by D0. (To be clear, D0->D1 traffic is ICMP ping requests, unrelated to the UDP traffic. There is no UDP traffic sent from D0 to D1.)
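The interrupt-count comparison can be automated roughly as follows. This is a sketch, assuming the usual /proc/interrupts layout (a label, a colon, one count per CPU, then a description); the function names are hypothetical. Taking one snapshot before and one after a burst of traffic shows which interrupt lines are still advancing.

```python
def parse_interrupts(text):
    """Map each interrupt label to its total count, summed across CPUs."""
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the header row listing CPU names has no colon
        total = 0
        for token in rest.split():
            if not token.isdigit():
                break  # counts end where the description begins
            total += int(token)
        counts[label.strip()] = total
    return counts

def interrupt_deltas(before_text, after_text):
    """Return how much each interrupt count grew between two snapshots."""
    before = parse_interrupts(before_text)
    after = parse_interrupts(after_text)
    return {label: n - before.get(label, 0) for label, n in after.items()}

def read_interrupts(path="/proc/interrupts"):
    """Read the current interrupt table as text."""
    with open(path) as f:
        return f.read()
```

A delta of zero on the line belonging to the inter-domain virtual interface, while traffic is being sent toward that domain, would match the symptom described above. Which label corresponds to that interface depends on the system and is not assumed here.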

(I suspect the details in this paragraph don't matter, but I include them for completeness.) Eventually, D0's ARP cache entry for D1 expires. D0 ARPs for D1, and D1 replies. But D0 never receives these replies. And eventually, D1 stops replying to the ARPs entirely. (D1's sending behavior is observed via tcpdump running in the console connection to D1.)

Note that the networking failure only occurs if the UDP packets are delivered to a user-level process in D0. In particular, UDP traffic to D0's kernel NFS server does not induce the failure. Nor does traffic sent to D0 for which there is no user process to accept the packets. And neither does traffic which is forwarded on to other hosts via NAT. (I haven't tested the regular forwarding case.)

Also, for what it's worth, Domain 0's network connectivity on its other interfaces (which are connected to the world at large) is unaffected.

Looking through the mailing list archive, I saw a prior bug that seemed similar, but involved IP fragmentation. That is not the case here, as the UDP packets sent by D1 are small (<100 bytes).

Any suggestions for debugging this?

Thanks,
mukesh

