Re: [Xen-users] Live migration: 2500ms downtime
- To: "mail4dla@xxxxxxxxxxxxxx" <mail4dla@xxxxxxxxxxxxxx>
- From: "Marconi Rivello" <marconirivello@xxxxxxxxx>
- Date: Fri, 10 Aug 2007 11:21:06 -0300
- Cc: xen-users@xxxxxxxxxxxxxxxxxxx
- Delivery-date: Fri, 10 Aug 2007 07:21:36 -0700
- List-id: Xen user discussion <xen-users.lists.xensource.com>
Thanks for your reply.
I did the ping from a third physical machine; the results don't vary much.
I followed your advice on analyzing the traffic, but I don't see why I should look for ICMP specifically, since the DomU does answer the ping; it just has a 2.5 s gap between stopping on one machine and starting on the other.
BUT, I did go looking for the "unsolicited ARP response", and there isn't one, which means a third machine cannot communicate with the DomU after the migration, unless I have an active ssh session RECEIVING data from the DomU.
Here are two scenarios:
1. I have an ssh connection to the DomU running top, so the DomU is constantly sending packets out onto the network. After I migrate the DomU, there is the 2.5-second gap, but after that top keeps running and I can use the ssh connection as usual.
2. I have an ssh connection to the DomU but no foreground process running (or no connection at all). After I migrate the DomU, the ssh connection doesn't respond, and the DomU doesn't respond to ping or anything else.
That happens when the physical machines are connected to the switch. I ran tcpdump on both Dom0s to see whether the DomU would send the unsolicited ARP reply that should update the switch's tables, and there is none. So, unless there is already traffic going out from the DomU, nothing tells the switch that the machine has moved from one port to another.
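For reference, this is roughly how I was watching for the ARP traffic on each Dom0 (the interface name here is just an example):

    tcpdump -n -i eth0 arp

As a possible workaround (only a sketch, and it assumes the iputils arping tool is available inside the DomU), I suppose I could force the switch and the neighbours to learn the new location by sending a gratuitous ARP from the DomU right after the migration, e.g.:

    arping -U -I eth0 -c 3 10.10.241.44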
Just to emphasize: I'm running CentOS 5 with Xen 3.0.3 (the version it ships with), and I have applied the official Xen-related CentOS updates (the same as Red Hat's).
Again, any thoughts or suggestions would be really appreciated.
Thanks, Marconi.
On 8/10/07, mail4dla@xxxxxxxxxxxxxx <mail4dla@xxxxxxxxxxxxxx> wrote:
Hi,
From my own experience, I can confirm that the actual downtime is very low and that the limiting factor is the propagation of the new location through the network.
Since the Dom0 itself also caches MAC addresses, you should try to do the ping from a third machine, to rule out the possibility that the Dom0 is not sending the packets out onto the network at all.
If this is not an option, you can use something like 'tcpdump -i eth0 icmp' to see what's actually going on on your network and correlate it with the output of your ping command.
Cheers
dla
Hi there,
I've read the paper on Xen live migration, and it shows some very impressive figures, like a 165 ms downtime for a running web server and 50 ms for a Quake 3 server.
I installed CentOS 5 on two servers, each with 2x quad-core Xeon E5335 CPUs and 2x Intel 80003ES2LAN Gb NICs. Then I installed two DomUs, also with CentOS 5.
One NIC is connected to the LAN (on the same switch and VLAN), the other interconnects the 2 servers with a cross cable.
Then I start pinging the DomU that is going to be migrated, at a 100 ms interval, from within the Dom0 that is currently hosting it, and then I migrate the VM. The pinging is done on the LAN interface, while the migration occurs on the cross-cabled one.
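Roughly, the test looks like this (the domain name and destination host below are placeholders, not the exact names I used). In one terminal on the hosting Dom0:

    ping -i 0.1 10.10.241.44

and in another:

    xm migrate --live mydomu otherhost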
64 bytes from 10.10.241.44: icmp_seq=97 ttl=64 time=0.044 ms
64 bytes from 10.10.241.44: icmp_seq=98 ttl=64 time=0.039 ms
64 bytes from 10.10.241.44: icmp_seq=99 ttl=64 time=0.039 ms
64 bytes from 10.10.241.44: icmp_seq=125 ttl=64 time=0.195 ms
64 bytes from 10.10.241.44: icmp_seq=126 ttl=64 time=0.263 ms
64 bytes from 10.10.241.44: icmp_seq=127 ttl=64 time=0.210 ms
As you can see, the response time before the migration is around 40 us, and afterwards it's around 200 us, which is understandable since the VM is now on another physical host.
The problem is the 25 lost packets during the last phase of the migration. Don't get me wrong: 2.5 s is a very good time in itself, but it is 50 times higher than the reported figures.
I tried the same test connecting both machines on a hub, and got the same results.
Has anybody tried to measure the downtime during a live migration? What were your results?
Any thoughts and suggestions would be very much appreciated.
Thanks, Marconi.
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users