Xen project Mailing List

[Xen-users] Are Debian squeeze dom0 kernels subject to this same IPv6 GSO problem?

To: xen-users <xen-users@xxxxxxxxxxxxxxxxxxx>

From: Andy Smith <andy@xxxxxxxxxxxxxx>

Date: Tue, 13 Dec 2011 05:10:25 +0000

Delivery-date: Tue, 13 Dec 2011 05:12:43 +0000

List-id: Xen user discussion <xen-users.lists.xensource.com>

Openpgp: id=BF15490B; url=http://strugglers.net/~andy/pubkey.asc

Hi,

I have three Debian squeeze servers running:

ii  linux-image-2.6.32-5-xen-amd64  2.6.32-38  Linux 2.6.32 for 64-bit PCs, Xen dom0 support
ii  xen-hypervisor-4.0-amd64        4.0.1-4    The Xen Hypervisor on AMD64

All three servers have Intel gigabit NICs, but one server uses the e1000e driver and the other two use the igb driver.

They've been in production for around 6 months now and, somewhat embarrassingly, we've only just discovered a problem with IPv6 performance on the two servers with the igb driver.

The problem manifests itself as awful TCP performance to a Xen domU, on the order of 15-30KB/sec data transfer. Doing the same data transfer from the server dom0 itself does not show the same issue, and the expected tens of MB/sec data transfer is achieved.

Here's an example tcpdump of when the problem is occurring:

# tcpdump -vpni bond0 'host 2a00:801:0:11::2'
[...]
23:59:00.672905 IP6 (hlim 55, next-header TCP (6) payload length: 4316) 2a00:801:0:11::2.80 > 2001:db8:1f1:f240::2.35241: Flags [P.], cksum 0x62d3 (incorrect -> 0x1c84), seq 15709:19993, ack 127, win 9, options [nop,nop,TS val 1771553020 ecr 1086205224], length 4284
23:59:00.672987 IP6 (hlim 64, next-header ICMPv6 (58) payload length: 1240) 2001:db8:0:1f1::8 > 2a00:801:0:11::2: [icmp6 sum ok] ICMP6, packet too big, length 1240, mtu 1500
23:59:00.673161 IP6 (hlim 63, next-header TCP (6) payload length: 32) 2001:db8:1f1:f240::2.35241 > 2a00:801:0:11::2.80: Flags [.], cksum 0x24e4 (correct), ack 17137, win 716, options [nop,nop,TS val 1086205237 ecr 1771553020], length 0
23:59:00.725659 IP6 (hlim 55, next-header TCP (6) payload length: 1460) 2a00:801:0:11::2.80 > 2001:db8:1f1:f240::2.35241: Flags [.], cksum 0x16de (correct), seq 19993:21421, ack 127, win 9, options [nop,nop,TS val 1771553033 ecr 1086205237], length 1428
23:59:00.725940 IP6 (hlim 63, next-header TCP (6) payload length: 44) 2001:db8:1f1:f240::2.35241 > 2a00:801:0:11::2.80: Flags [.], cksum 0x25f5 (correct), ack 17137, win 716, options [nop,nop,TS val 1086205250 ecr 1771553020,nop,nop,sack 1 {19993:21421}], length 0
[...]
23:59:01.188463 IP6 (hlim 63, next-header TCP (6) payload length: 32) 2001:db8:1f1:f240::2.35241 > 2a00:801:0:11::2.80: Flags [.], cksum 0x0105 (correct), ack 25705, win 1073, options [nop,nop,TS val 1086205366 ecr 1771553149], length 0
23:59:01.240946 IP6 (hlim 55, next-header TCP (6) payload length: 2888) 2a00:801:0:11::2.80 > 2001:db8:1f1:f240::2.35241: Flags [P.], cksum 0x5d3f (incorrect -> 0xf9ef), seq 25705:28561, ack 127, win 9, options [nop,nop,TS val 1771553162 ecr 1086205366], length 2856
23:59:01.241040 IP6 (hlim 64, next-header ICMPv6 (58) payload length: 1240) 2001:db8:0:1f1::8 > 2a00:801:0:11::2: [icmp6 sum ok] ICMP6, packet too big, length 1240, mtu 1500

2a00:801:0:11::2 is speedtest.tele2.net which helpfully hosts files like http://speedtest.tele2.net/100MB.zip for testing purposes. The above is the result of me using wget to download that file from a domU on this server. The domU is at 2001:db8:1f1:f240::2 and the dom0 is at 2001:db8:0:1f1::8.

What I'm noticing is the occasional incorrect checksum and "ICMPv6 packet too big" messages, as seen above around 23:59:00.672905 and 23:59:01.240946, after packets of length 4284 and 2856 respectively. These do not occur on the server with the e1000e driver, where all the packets top out at 1428. They show up sporadically, but in every test, on the two servers with the igb driver where the poor throughput is observed.

I'm wondering if I am hitting something like this:

http://amailbox.org/mailarchive/linux-kvm/2010/2/2/6257539/thread

I have played with disabling and enabling GSO and checksums on every interface I can, both in dom0 and domUs, and that makes no difference.

Can anyone confirm that that is the issue here? I don't at present have another machine with igb NICs around to test this.
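
As a sanity check on the capture above, the segment sizes can be compared against the MTU reported in the ICMPv6 messages. The following is a minimal Python sketch, not part of the original post. It assumes the fixed 40-byte IPv6 header and takes the TCP header size as the difference between the "payload length" and "length" fields that tcpdump printed (32 bytes for the data segments here, i.e. 20 bytes plus 12 bytes of timestamp options). The helper name classify is hypothetical.

```python
IPV6_HEADER = 40   # fixed IPv6 header size in bytes
MTU = 1500         # from "packet too big ... mtu 1500" in the capture

# (IPv6 payload length, TCP data length) pairs copied from the capture above
segments = [(4316, 4284), (1460, 1428), (2888, 2856)]

def classify(payload_len, data_len, mtu=MTU):
    """Return (on-wire size, fits in MTU?, data length / MSS) for one segment."""
    tcp_header = payload_len - data_len      # 32 here: 20 + 12 bytes of options
    mss = mtu - IPV6_HEADER - tcp_header     # largest data length that still fits
    wire_size = IPV6_HEADER + payload_len    # full IPv6 packet size
    return wire_size, wire_size <= mtu, data_len / mss

results = [classify(p, d) for p, d in segments]
for (p, d), (wire, fits, ratio) in zip(segments, results):
    print(f"data={d} wire={wire} fits_mtu={fits} data/MSS={ratio}")
```

Under these assumptions, the two segments that trigger "packet too big" carry exactly three and two times the 1428 bytes that fit within the MTU. That would be consistent with several segments having been merged by an offload and then forwarded without being split again, but it is only arithmetic on the capture, not a confirmation of the cause.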

Looking at linux-source-2.6.32 on squeeze, it does not have this patch:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8e1e8a4779cb23c1d9f51e9223795e07ec54d77a

although I notice that this commit also touches e1000e where I am not currently having any problems.

Any ideas?

Cheers,
Andy

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
