[Xen-users] Network Issues on Migration
I've read and experimented extensively, and being in desperate need of "finishing" this setup and getting it deployed live, I'd like to see if anyone has suggestions on the last hangup we seem to have.

The setup: two SuperMicro 1U servers with dual quad-core CPUs and 16GB RAM each, running CentOS 5.2 x86_64 and its Xen implementation. The only non-stock CentOS component at this point is the Intel igb driver: the RHEL/CentOS igb driver appears to have a bug with DHCP over a bridged interface, which the latest driver downloaded straight from Intel cured for us. Both servers are attached to shared FC storage and run RHCS with both IP- and disk-based quorum, plus clvmd with a shared VG for creating LVs as containers for VMs. That part is all working very well.

Each DOM0 has two physical NICs, both bridged. Additionally, we added virbr0 as a bridged, per-DOM0 local network. When any VM boots up it can ping and traceroute on each of its respective networks perfectly, and inbound/outbound data flow of any kind appears perfect as well. Once a VM is migrated or live-migrated to the other DOM0, though, the ability to ping or traceroute ceases. Sessions via ssh or httpd, either inbound or outbound, continue to work fine.

When a VM boots I see this in dmesg:

netfront: Initialising virtual ethernet driver.
netfront: device eth0 has flipping receive path.

I read something about a CRC problem and had each DOM0 do "ethtool -K eth{n} tx off", but I don't think that was necessary in this instance; I've never seen any error messages about CRC errors. The described problem and solution were not heavily detailed, and it was just an attempt to see if it helped.
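For the record, here is how we apply the offload change to a whole list of interfaces in one pass. This is only a sketch: the gen_offload_cmds helper is something I wrote for review purposes (it just prints the ethtool invocations rather than running them), and the interface names passed to it are placeholders for whatever NICs/vifs the box actually has.

```shell
#!/bin/sh
# Sketch: build (but do not run) the ethtool commands that disable
# TX checksum offload on each named interface.  gen_offload_cmds is
# a hypothetical helper; pipe its output to sh to actually apply it.
gen_offload_cmds() {
    for dev in "$@"; do
        echo "ethtool -K ${dev} tx off"
    done
}

# On a live DOM0 you might feed it the vif list, e.g.:
#   gen_offload_cmds $(ls /sys/class/net | grep '^vif') | sh
gen_offload_cmds eth0 eth1
```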
The following was added to the end of /etc/sysctl.conf on both DOM0's only (per the excellent wiki article):

net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0

The other oddity: for a VM started on server1 and live-migrated to server2, a running ping only pauses a short while, then picks right back up and continues to succeed. Migrating it back to server1, or initially starting a VM on server2 and migrating it to server1, is where the "stuck" ping issue comes into play. We were very careful and documented well as we installed both boxes, in an attempt to keep them as identical as possible. I fear this behavior proves that's not the case, though, ugh...

After migrating from 2 to 1 and then trying a ping (and waiting a good long while before ctrl-c'ing it):

PING 192.168.77.1 (192.168.77.1) 56(84) bytes of data.
64 bytes from 192.168.77.1: icmp_seq=1 ttl=64 time=0.000 ms

--- 192.168.77.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.000/0.000/0.000/0.000 ms

Very strange... Additionally, a "service network restart" at this point results in all interfaces going down, loopback being reinitialized, and then it hangs trying to bring up eth0. I can ctrl-c it three times as it pauses on each interface, then "ifconfig" and see all the IPs are still there. Still can't ping, but can "telnet google.com 80", for instance. Odd...

So anyway, any pointers or suggestions you might have would be greatly appreciated... Thanks.

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
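P.S. One diagnostic I plan to try next, in case it helps anyone reproduce this: check whether the bridge's learned-MAC table still points at the old host after a migration, and force a gratuitous ARP from inside the guest so upstream caches learn the new location. The helper below only prints the commands for review; the bridge name (xenbr0) and guest IP (192.168.77.10) are placeholders for the real setup.

```shell
#!/bin/sh
# Sketch: emit post-migration diagnostics.  gen_diag_cmds is a
# hypothetical helper; bridge/IP arguments are placeholders.
gen_diag_cmds() {
    bridge="$1"
    guest_ip="$2"
    # Where does the bridge think the guest's MAC currently lives?
    echo "brctl showmacs ${bridge}"
    # Send unsolicited (gratuitous) ARP replies from inside the guest
    # so switches and neighbors relearn its new location:
    echo "arping -U -c 3 -I eth0 ${guest_ip}"
}

gen_diag_cmds xenbr0 192.168.77.10
```

The first command is run on the destination DOM0; the second inside the migrated guest.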