Re: [Xen-devel] [PATCH] Network Checksum Removal
Tests for domU->dom0, domU->host, and domU->domU are completed:

3.2 GHz Xeon with Hyperthreading, 2GB (correction) memory
Benchmark: netperf2 -T TCP_STREAM

dom0, dom1, and dom2 on cpu0 (first SMT thread on first core)

domU to host
hw tx csum
msg-size: 00064 Mbps: 0186 d0-cpu: 49.38 d1-cpu: 44.35
msg-size: 01500 Mbps: 0917 d0-cpu: 62.13 d1-cpu: 37.87
msg-size: 16384 Mbps: 0933 d0-cpu: 66.63 d1-cpu: 33.37
msg-size: 32768 Mbps: 0928 d0-cpu: 66.96 d1-cpu: 32.66
sw tx csum
msg-size: 00064 Mbps: 0187 d0-cpu: 49.50 d1-cpu: 44.52
msg-size: 01500 Mbps: 0904 d0-cpu: 60.63 d1-cpu: 39.36
msg-size: 16384 Mbps: 0924 d0-cpu: 63.98 d1-cpu: 35.98
msg-size: 32768 Mbps: 0926 d0-cpu: 64.18 d1-cpu: 35.68
^^about 2% reduction in cpu util on dom1^^

domU to dom0
hw tx csum
msg-size: 00064 Mbps: 0014 d0-cpu: 64.02 d1-cpu: 31.71
msg-size: 01500 Mbps: 1087 d0-cpu: 63.34 d1-cpu: 36.67
msg-size: 16384 Mbps: 1204 d0-cpu: 67.30 d1-cpu: 32.71
msg-size: 32768 Mbps: 1148 d0-cpu: 68.08 d1-cpu: 31.93
sw tx csum
msg-size: 00064 Mbps: 0014 d0-cpu: 64.88 d1-cpu: 32.39
msg-size: 01500 Mbps: 0948 d0-cpu: 62.20 d1-cpu: 37.80
msg-size: 16384 Mbps: 1063 d0-cpu: 64.73 d1-cpu: 35.27
msg-size: 32768 Mbps: 1012 d0-cpu: 65.71 d1-cpu: 34.30
^^up to 13% throughput increase with cpu util down ~2% on dom1^^
Note the dismal performance for very small msg sizes

domU to domU
hw tx csum
msg-size: 00064 Mbps: 0359 d0-cpu: 27.85 d1-cpu: 53.68 d2-cpu: 18.48
msg-size: 01500 Mbps: 0594 d0-cpu: 47.42 d1-cpu: 21.77 d2-cpu: 30.78
msg-size: 16384 Mbps: 0619 d0-cpu: 49.66 d1-cpu: 18.81 d2-cpu: 31.53
msg-size: 32768 Mbps: 0616 d0-cpu: 49.58 d1-cpu: 18.68 d2-cpu: 31.74
sw tx csum
msg-size: 00064 Mbps: 0361 d0-cpu: 27.81 d1-cpu: 53.58 d2-cpu: 18.62
msg-size: 01500 Mbps: 0584 d0-cpu: 46.22 d1-cpu: 23.18 d2-cpu: 30.60
msg-size: 16384 Mbps: 0602 d0-cpu: 47.99 d1-cpu: 20.33 d2-cpu: 31.69
msg-size: 32768 Mbps: 0603 d0-cpu: 47.67 d1-cpu: 20.59 d2-cpu: 31.74
^^About a 2% throughput increase, and cpu down on d1

The cpu wasted on dom1 should be enough
justification for domU<->domU communication with point to point front end driver communication.

dom0 on cpu0, dom1 on cpu2, and dom2 on cpu3 (dom1 and dom2 on same core)

domU to host
hw tx csum
msg-size: 00064 Mbps: 0540 d0-cpu: 92.98 d1-cpu: 100.00
msg-size: 01500 Mbps: 0941 d0-cpu: 99.74 d1-cpu: 48.62
msg-size: 16384 Mbps: 0941 d0-cpu: 99.71 d1-cpu: 43.32
msg-size: 32768 Mbps: 0941 d0-cpu: 99.72 d1-cpu: 43.21
sw tx csum
msg-size: 00064 Mbps: 0545 d0-cpu: 93.47 d1-cpu: 100.00
msg-size: 01500 Mbps: 0941 d0-cpu: 99.76 d1-cpu: 51.43
msg-size: 16384 Mbps: 0941 d0-cpu: 99.69 d1-cpu: 46.58
msg-size: 32768 Mbps: 0941 d0-cpu: 99.72 d1-cpu: 45.39
^^Finally at wire speed, but at a cost of 100% cpu on dom0
This cpu util seems excessive, maybe oprofile will show some problems.
Notice dom1 has ~2% lower cpu.

domU to dom0
hw tx csum
msg-size: 00064 Mbps: 0390 d0-cpu: 97.92 d1-cpu: 100.00
msg-size: 01500 Mbps: 1571 d0-cpu: 97.36 d1-cpu: 54.83
msg-size: 16384 Mbps: 1582 d0-cpu: 96.20 d1-cpu: 49.93
msg-size: 32768 Mbps: 1596 d0-cpu: 96.32 d1-cpu: 49.63
sw tx csum
msg-size: 00064 Mbps: 0375 d0-cpu: 97.65 d1-cpu: 100.00
msg-size: 01500 Mbps: 1546 d0-cpu: 96.36 d1-cpu: 52.99
msg-size: 16384 Mbps: 1598 d0-cpu: 95.88 d1-cpu: 47.48
msg-size: 32768 Mbps: 1641 d0-cpu: 95.89 d1-cpu: 46.37
^^very slightly better avg throughput, and lower cpu on dom1

domU to domU
hw tx csum
msg-size: 00064 Mbps: 0287 d0-cpu: 84.97 d1-cpu: 100.0 d2-cpu: 75.46
msg-size: 01500 Mbps: 1004 d0-cpu: 90.98 d1-cpu: 68.29 d2-cpu: 76.94
msg-size: 16384 Mbps: 1018 d0-cpu: 89.78 d1-cpu: 60.82 d2-cpu: 78.12
msg-size: 32768 Mbps: 1010 d0-cpu: 89.30 d1-cpu: 59.83 d2-cpu: 77.99
sw tx csum
msg-size: 00064 Mbps: 0286 d0-cpu: 84.81 d1-cpu: 99.93 d2-cpu: 76.28
msg-size: 01500 Mbps: 1018 d0-cpu: 91.30 d1-cpu: 67.27 d2-cpu: 75.08
msg-size: 16384 Mbps: 1012 d0-cpu: 88.46 d1-cpu: 55.56 d2-cpu: 71.37
msg-size: 32768 Mbps: 1017 d0-cpu: 88.33 d1-cpu: 54.96 d2-cpu: 70.96
^^about same throughput, but ~4% lower cpu on d1

Again, point to point front end
comms would be great here.

IMO, the patch is a good thing. There are other very major issues with networking, like the massive cpu overhead for dom0. I wonder if we could have a layer 2 networking model like:

-Xen has front end ethernet drivers only
-dom0 has a Xen bridge front end driver, just to put eth0 (or whatever phys dev) on it.
-no domain hosted bridge device or backend ethernet drivers

With this, Xen acts as an ethernet "switch", switching ethernet traffic in Xen itself, without the help of a domain hosted bridge. Packets are forwarded to either a domain's front end driver, or the front end bridge interface in dom0 (or any other driver domain). With this we may have better control of emulating offload functions, and we should avoid some hops (in many cases involving dom0) for the network traffic.

Comments?

-Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
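
To make the "Xen as an ethernet switch" idea above a bit more concrete, here is a rough sketch of the forwarding decision as a plain MAC-learning switch. This is only an illustration, not Xen code: the class names (Port, HypervisorSwitch), the port names, and the learn-then-flood policy are all assumptions made up for the example, and nothing here models grant tables, event channels, or checksum offload.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class Port:
    """Hypothetical switch port: a domain's front end driver,
    or the bridge front end in dom0 that leads to the physical NIC."""
    name: str
    received: list = field(default_factory=list)

    def deliver(self, frame):
        # Stand-in for handing the frame to the front end driver.
        self.received.append(frame)


class HypervisorSwitch:
    """Hypothetical layer 2 switch living in the hypervisor."""

    def __init__(self):
        self.ports = []
        self.mac_table = {}  # source MAC -> port it was last seen on

    def attach(self, port):
        self.ports.append(port)

    def forward(self, ingress, src_mac, dst_mac, payload):
        """Forward one frame; return the names of the ports it went to."""
        # Learn which port the sender sits behind.
        self.mac_table[src_mac] = ingress
        frame = (src_mac, dst_mac, payload)

        egress = self.mac_table.get(dst_mac)
        if dst_mac != BROADCAST and egress is not None:
            if egress is ingress:
                return []  # destination is behind the sending port
            egress.deliver(frame)  # known unicast: one hop, one port
            return [egress.name]

        # Unknown destination or broadcast: flood to every other port.
        targets = [p for p in self.ports if p is not ingress]
        for p in targets:
            p.deliver(frame)
        return [p.name for p in targets]
```

In this sketch, once both guests have sent a frame, domU to domU traffic goes straight from one front end port to the other and the dom0 bridge front end is no longer touched, which is the hop saving described above. Whether that translates into the cpu savings hoped for would of course need to be measured.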