
Re: [Xen-devel] [PATCH] Network Checksum Removal



On Wednesday 25 May 2005 11:48 am, Ian Pratt wrote:
> What does the tx hw csum control actually turn on and off?

The tx hw csum control tells the TCP/IP stack whether to checksum outgoing 
packets in software.  If tx checksum offload is enabled, the stack skips the 
software checksum and leaves it to the device.
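
To make the on/off behaviour concrete, here is a minimal sketch in Python, 
not the kernel code.  internet_checksum is the standard RFC 1071 one's 
complement sum; tx_checksum and its hw_tx_csum flag are hypothetical names 
used only for illustration, and the TCP pseudo-header is left out for brevity.

```python
from typing import Optional


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's complement of the one's complement sum
    of the data taken as 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries above bit 15 back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tx_checksum(payload: bytes, hw_tx_csum: bool) -> Optional[int]:
    """Hypothetical stand-in for the stack's decision on transmit.

    With offload enabled the stack does no checksum work (None here);
    with offload disabled it walks the whole payload in software.
    """
    if hw_tx_csum:
        return None  # left for the device to fill in
    return internet_checksum(payload)
```

The software path touches every byte of the payload, which is the per-packet 
cost the control avoids when offload is on.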

> I'm surprised there's much benefit to csum offload on the tx side at all
> as it's almost always done as part of a copy.

Why?  The tx checksumming is just as expensive as the rx checksumming.

> I'd have thought the main benefit of csum offload was on the rx side, so
> that packets received by the NIC are hardware csum'ed, passed through
> the bridge, and then into the domU where the csum re-calculation is
> avoided [it would normally need to be done before the TCP ack is sent,
> and can't be done as part of a copy as the data won't be moved out of
> the skb until the user app does a read].  The same rx csum check will be
> avoided and hence provide benefit to domU <-> domU transfers.

I can add an ethtool feature to disable rx checksum offload (so that domU will 
verify the checksum in software).

> In the figures below, which direction is the data stream heading? (I
> presume it's a one way test, like ttcp?)
>
> It's somewhat surprising that the dom0 bridge code is burning so much
> CPU. xenoprofile results will be quite interesting to see what functions
> are eating the CPU.

There is a patch on netdev which can decrease the CPU load of bridging.  
Specifically, it allows the bridge device to take advantage of the network 
device features (like hardware checksum offload).  Stephen Hemminger says it 
should go in the 2.6.13 kernel.  

> Ultimately, the best way of doing domU <-> domU networking will be to
> allow point-to-point connections where netfronts are connected direct to
> other netfronts if the hosts are on the same machine. However, the
> priority for 3.0 is to optimise the normal front-back-bridge-back-front
> path.
>
> Thanks,
> Ian
>
> > -----Original Message-----
> > From: Andrew Theurer [mailto:habanero@xxxxxxxxxx]
> > Sent: 25 May 2005 15:39
> > To: Jon Mason; xen-devel@xxxxxxxxxxxxxxxxxxx
> > Cc: Ian Pratt; bin.ren@xxxxxxxxxxxx
> > Subject: Re: [Xen-devel] [PATCH] Network Checksum Removal
> >
> > Tests for domU->dom0, domU->host, and domU->domU are completed:
> >
> > 3.2 GHz Xeon with Hyperthreading, 2GB (correction) memory
> >
> >
> > Benchmark: netperf2 -T TCP_STREAM
> >
> >
> > dom0, dom1, and dom2 on cpu0 (first SMT thread on first core)
> >  domU to host
> >   hw tx csum
> >    msg-size: 00064  Mbps: 0186  d0-cpu: 49.38  d1-cpu: 44.35
> >    msg-size: 01500  Mbps: 0917  d0-cpu: 62.13  d1-cpu: 37.87
> >    msg-size: 16384  Mbps: 0933  d0-cpu: 66.63  d1-cpu: 33.37
> >    msg-size: 32768  Mbps: 0928  d0-cpu: 66.96  d1-cpu: 32.66
> >   sw tx csum
> >    msg-size: 00064  Mbps: 0187  d0-cpu: 49.50  d1-cpu: 44.52
> >    msg-size: 01500  Mbps: 0904  d0-cpu: 60.63  d1-cpu: 39.36
> >    msg-size: 16384  Mbps: 0924  d0-cpu: 63.98  d1-cpu: 35.98
> >    msg-size: 32768  Mbps: 0926  d0-cpu: 64.18  d1-cpu: 35.68
> >     ^^about 2% reduction in cpu util on dom1^^
> >  domU to dom0
> >   hw tx csum
> >    msg-size: 00064  Mbps: 0014  d0-cpu: 64.02  d1-cpu: 31.71
> >    msg-size: 01500  Mbps: 1087  d0-cpu: 63.34  d1-cpu: 36.67
> >    msg-size: 16384  Mbps: 1204  d0-cpu: 67.30  d1-cpu: 32.71
> >    msg-size: 32768  Mbps: 1148  d0-cpu: 68.08  d1-cpu: 31.93
> >   sw tx csum
> >    msg-size: 00064  Mbps: 0014  d0-cpu: 64.88  d1-cpu: 32.39
> >    msg-size: 01500  Mbps: 0948  d0-cpu: 62.20  d1-cpu: 37.80
> >    msg-size: 16384  Mbps: 1063  d0-cpu: 64.73  d1-cpu: 35.27
> >    msg-size: 32768  Mbps: 1012  d0-cpu: 65.71  d1-cpu: 34.30
> >     ^^up to 13% throughput increase with cpu util down ~2% on dom1^^
> >       Note the dismal performance for very small msg sizes
> >  domU to domU
> >   hw tx csum
> >    msg-size: 00064  Mbps: 0359  d0-cpu: 27.85  d1-cpu: 53.68  d2-cpu: 18.48
> >    msg-size: 01500  Mbps: 0594  d0-cpu: 47.42  d1-cpu: 21.77  d2-cpu: 30.78
> >    msg-size: 16384  Mbps: 0619  d0-cpu: 49.66  d1-cpu: 18.81  d2-cpu: 31.53
> >    msg-size: 32768  Mbps: 0616  d0-cpu: 49.58  d1-cpu: 18.68  d2-cpu: 31.74
> >   sw tx csum
> >    msg-size: 00064  Mbps: 0361  d0-cpu: 27.81  d1-cpu: 53.58  d2-cpu: 18.62
> >    msg-size: 01500  Mbps: 0584  d0-cpu: 46.22  d1-cpu: 23.18  d2-cpu: 30.60
> >    msg-size: 16384  Mbps: 0602  d0-cpu: 47.99  d1-cpu: 20.33  d2-cpu: 31.69
> >    msg-size: 32768  Mbps: 0603  d0-cpu: 47.67  d1-cpu: 20.59  d2-cpu: 31.74
> >     ^^About a 2% throughput increase, and cpu down on d1
> >       The cpu wasted on dom1 should be enough justification for
> >       domU<->domU communication with point to point front end driver
> >       communication.
> > dom0 on cpu0, dom1 on cpu2, and dom2 on cpu3 (dom1 and dom2 on same
> > core)
> >  domU to host
> >   hw tx csum
> >    msg-size: 00064  Mbps: 0540  d0-cpu: 92.98  d1-cpu: 100.00
> >    msg-size: 01500  Mbps: 0941  d0-cpu: 99.74  d1-cpu: 48.62
> >    msg-size: 16384  Mbps: 0941  d0-cpu: 99.71  d1-cpu: 43.32
> >    msg-size: 32768  Mbps: 0941  d0-cpu: 99.72  d1-cpu: 43.21
> >   sw tx csum
> >    msg-size: 00064  Mbps: 0545  d0-cpu: 93.47  d1-cpu: 100.00
> >    msg-size: 01500  Mbps: 0941  d0-cpu: 99.76  d1-cpu: 51.43
> >    msg-size: 16384  Mbps: 0941  d0-cpu: 99.69  d1-cpu: 46.58
> >    msg-size: 32768  Mbps: 0941  d0-cpu: 99.72  d1-cpu: 45.39
> >     ^^Finally at wire speed, but at a cost of 100% cpu on dom0
> >       This cpu util seems excessive, maybe oprofile will show
> >       some problems.  Notice dom1 has ~2% lower cpu.
> >  domU to dom0
> >   hw tx csum
> >    msg-size: 00064  Mbps: 0390  d0-cpu: 97.92  d1-cpu: 100.00
> >    msg-size: 01500  Mbps: 1571  d0-cpu: 97.36  d1-cpu: 54.83
> >    msg-size: 16384  Mbps: 1582  d0-cpu: 96.20  d1-cpu: 49.93
> >    msg-size: 32768  Mbps: 1596  d0-cpu: 96.32  d1-cpu: 49.63
> >   sw tx csum
> >    msg-size: 00064  Mbps: 0375  d0-cpu: 97.65  d1-cpu: 100.00
> >    msg-size: 01500  Mbps: 1546  d0-cpu: 96.36  d1-cpu: 52.99
> >    msg-size: 16384  Mbps: 1598  d0-cpu: 95.88  d1-cpu: 47.48
> >    msg-size: 32768  Mbps: 1641  d0-cpu: 95.89  d1-cpu: 46.37
> >     ^^very slightly better avg throughput, and lower cpu on dom1
> >  domU to domU
> >   hw tx csum
> >    msg-size: 00064  Mbps: 0287  d0-cpu: 84.97  d1-cpu: 100.0  d2-cpu: 75.46
> >    msg-size: 01500  Mbps: 1004  d0-cpu: 90.98  d1-cpu: 68.29  d2-cpu: 76.94
> >    msg-size: 16384  Mbps: 1018  d0-cpu: 89.78  d1-cpu: 60.82  d2-cpu: 78.12
> >    msg-size: 32768  Mbps: 1010  d0-cpu: 89.30  d1-cpu: 59.83  d2-cpu: 77.99
> >   sw tx csum
> >    msg-size: 00064  Mbps: 0286  d0-cpu: 84.81  d1-cpu: 99.93  d2-cpu: 76.28
> >    msg-size: 01500  Mbps: 1018  d0-cpu: 91.30  d1-cpu: 67.27  d2-cpu: 75.08
> >    msg-size: 16384  Mbps: 1012  d0-cpu: 88.46  d1-cpu: 55.56  d2-cpu: 71.37
> >    msg-size: 32768  Mbps: 1017  d0-cpu: 88.33  d1-cpu: 54.96  d2-cpu: 70.96
> >     ^^about same throughput, but ~4% lower cpu on d1
> >       Again, point to point front end comms would be great here.
> >
> >
> > IMO, I think the patch is a good thing.  There are other very major
> > issues with networking, like the massive cpu overhead for dom0.  I
> > wonder if we could have a layer 2 networking model like:
> >
> > -Xen has front end ethernet drivers only
> > -dom0 has a Xen bridge front end driver, just to put eth0 (or whatever
> > phys dev) on it.
> > -no domain hosted bridge device or backend ethernet drivers
> >
> > With this, Xen acts as an ethernet "switch", switching ethernet traffic
> > in xen itself, without the help of a domain hosted bridge.  Packets are
> > forwarded to either a domain's front end driver, or the front end
> > bridge interface in dom0 (or any other driver domain).  With this we
> > may have better control of emulating offload functions, and we should
> > avoid some hops (and in many cases involving dom0) for the network
> > traffic.  Comments?
> >
> > -Andrew
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
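
For reference, the hw versus sw tx csum throughput deltas in the quoted 
"domU to dom0" results (all domains on cpu0) can be recomputed directly.  
This is a minimal sketch: the names pct_change and throughput_gain and the 
dictionary layout are illustrative, and only the Mbps values come from the 
quoted figures.

```python
# Mbps values copied from the quoted "domU to dom0" results with all
# domains on cpu0; the keys are message sizes in bytes.
hw_tx_csum = {64: 14, 1500: 1087, 16384: 1204, 32768: 1148}
sw_tx_csum = {64: 14, 1500: 948, 16384: 1063, 32768: 1012}


def pct_change(baseline: float, value: float) -> float:
    """Relative change of value over baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def throughput_gain(hw: dict, sw: dict) -> dict:
    """Per-message-size gain of hw tx csum over sw tx csum, in percent."""
    return {size: round(pct_change(sw[size], hw[size]), 1) for size in sw}


if __name__ == "__main__":
    for size, gain in throughput_gain(hw_tx_csum, sw_tx_csum).items():
        print(f"msg-size {size:5d}: {gain:+.1f}% throughput with hw tx csum")
```

On these figures the gain is zero at 64 bytes and roughly 13 to 15 percent 
at the larger message sizes.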



 

