Re: [Xen-devel] checksum `offload'
Ian Jackson wrote:
Hardcoded in the Xen 3.0.1 network backend driver (in the supplied
patch to Linux 2.6.12) is the notion that packets `outbound' through
the network backend (destined for a frontend in another guest) do not
ever need to be checksummed.
Excellent and timely summary. I just started looking into
the offload problem for VLANs. Jon Mason and Jim Dykman
generated a patch for the IPSec environment issue, but
due to concerns about whether it would be acceptable
upstream, this hasn't yet been blessed. I'd really like
to look at that bug in a wider context, with many of the
issues you just specified addressed, but this was going
to wait until after 3.0.2 and the distro releases.
I can't find any design documentation which explains this decision, but
I presume that this is the result of the following chain of reasoning
about virtual network interfaces:
1. The backend is in dom0 and the frontend is in some domU.
2. The domU does not have or use any physical network hardware.
3. The domU does not act as a router-encapsulator. (eg,
run a VPN client, tunnel endpoint, etc. etc.)
4. The domU will always know correctly whether the packet
originated from dom0 (checksum not needed, not calculated) or from
some other machine and just came via domU (checksum calculated and
5. Therefore all packets leaving dom0 for domU will terminate
on that domU and do not need to be checksummed.
(It is possible that there's something fancy happening in the
frontend; I briefly looked at that code but didn't take the time to
understand it fully.)
At the point this was done, there was no support for
any model other than backend in dom0 and frontend in domU,
so that was assumed to be the traffic model.
All of the assumptions 1-4 can be false. 1-3 can be false in many
network topologies and the system should not assume that the network
topology is as set up by the provided default configuration scripts.
4 is apparently false in my case and caused the symptoms I saw.
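
To make the failure mode in point 4 concrete, here is a minimal Python
sketch of the ones' complement Internet checksum (RFC 1071), the
algorithm behind the IP header checksum and, with a pseudo-header
added, the TCP and UDP checksums. The sample header is a generic
20-byte IPv4 example and does not come from this thread. Modelling an
unfilled checksum as a zeroed field is a simplifying assumption; the
real stack may leave other values there.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum of data (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def checksum_ok(header: bytes) -> bool:
    """A receiver's check: the sum over a correctly filled header is zero."""
    return internet_checksum(header) == 0


# Generic IPv4 header with the checksum field (bytes 10-11) left at zero,
# standing in for a packet whose checksum was never calculated.
unfilled = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")

# The same header after the sender (or a later hop) fills the field in.
csum = internet_checksum(unfilled)
filled = unfilled[:10] + csum.to_bytes(2, "big") + unfilled[12:]

print(hex(csum))              # 0xb861
print(checksum_ok(unfilled))  # False: a verifying receiver drops this
print(checksum_ok(filled))    # True
```

A receiver that wrongly believes such a packet came from "some other
machine" will verify it, see the mismatch, and discard it.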
While Xen allows the frontend interface's `transmit checksum offload'
(ie, for packets leaving that guest) to be enabled and disabled from
userland, so that checksum calculation can be suppressed, it does not
allow the `receive checksum offload' (for packets entering the guest)
to be controlled, and it does not allow the backend's checksum
processing to be enabled and disabled (in 3.0.1, at least).
Since I believe we only initiate the offload on outgoing
packets, suppressing transmit offload on the domU should be
enough to bypass this behaviour(?).
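
As a rough sketch of that workaround, the following Python helper
builds the ethtool -K command line that turns transmit checksum
offload off on an interface. The helper names and the interface name
eth0 are assumptions for illustration only. As noted above, in 3.0.1
only the frontend's transmit setting is reported to be controllable,
so the rx option may well be rejected there.

```python
import subprocess
from typing import List, Optional


def ethtool_offload_cmd(iface: str,
                        tx: Optional[bool] = None,
                        rx: Optional[bool] = None) -> List[str]:
    """Build an `ethtool -K` argv for the requested checksum settings."""
    cmd = ["ethtool", "-K", iface]
    if tx is not None:
        cmd += ["tx", "on" if tx else "off"]
    if rx is not None:
        cmd += ["rx", "on" if rx else "off"]
    if len(cmd) == 3:
        raise ValueError("specify at least one of tx or rx")
    return cmd


def apply_offload(iface: str, **settings: Optional[bool]) -> None:
    """Run the command; needs root and a driver that supports the setting."""
    subprocess.run(ethtool_offload_cmd(iface, **settings), check=True)


# Inside the domU: stop deferring transmit checksums on eth0 (assumed name).
print(" ".join(ethtool_offload_cmd("eth0", tx=False)))
```

Calling apply_offload("eth0", tx=False) inside the guest would then
run the command, assuming ethtool is installed there.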
Therefore, it is not possible to encode rules for correct behaviour in
the code for Xen's virtual network devices. The correct behaviour can
only be determined by the network configuration scripts which are also
responsible for establishing the desired network topology.
Ie, the behaviour must be configurable from userland.
I agree this should be configurable.
In many (most?) scenarios, checksums cannot safely be suppressed for
any significant proportion of the traffic. If the guests are strongly
The majority of workloads probably expect guest <-> remote
communication. I'd be interested in which workloads (if any)
expect heavy dom0 <-> guest or guest <-> guest communication.
isolated with their own filesystems and the purpose is providing
multiple largely-independent hardware platforms, guest-guest
communication will be relatively rare, and of course communications
from one guest to the internet at large must be checksummed. The
Deferring the checksum to dom0 [Assumption = dom0 is where
it reaches the physical hw], where it can be offloaded
to the real hardware, is not a bad idea; it is expected to
give a non-trivial performance boost.
suppression is only useful when a large amount of network traffic has
the different guests as endpoints; the most likely scenario is one
where the guests share `network' filesystems from dom0 - but this is
not the default configuration with the supplied scripts, and doing it
safely involves significant effort to ensure that the fs traffic is
protected from interference.
Ie, the checksum offload should be disabled by default.
* Checksum suppression for virtual network backends should not be done
with NETIF_F_NO_CSUM but with NETIF_F_IP_CSUM or the like, as for
Exactly what I was going to look into (changing the
way we do the implementation right now) for post-3.0.2.
* Any code in the frontend that attempts to decide whether the
peer for a packet is the backend guest itself or some other machine
further away should be removed.
* Checksum suppression control with ethtool -K should be supported
both for outbound and inbound packets on both frontend and backend
* The default should have checksum suppression disabled.
* Ideally, there would be example scripts which provide guest domains
with a set of eth1's on a private entirely-virtual network, all of
whose interfaces have checksums suppressed, and which does not
exchange packets with the wider Internet. This could be used for
intra-system NFS, etc.
Exactly. Yes. :)
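
The policy these points add up to can be sketched as a small Python
model: a packet carries a "checksum still pending" marker, and the
checksum is completed before delivery unless the configuration
explicitly allows suppression for traffic that stays on the host. All
names here (Packet, csum_pending, deliver, suppress_allowed) are
hypothetical and only illustrate the idea; this is not the netback or
netfront code.

```python
from dataclasses import dataclass


def ones_complement_sum16(data: bytes) -> int:
    """Folded 16-bit ones' complement sum over big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


@dataclass
class Packet:
    payload: bytes
    checksum: int = 0
    csum_pending: bool = True  # hypothetical marker: field not yet filled in


def complete_checksum(pkt: Packet) -> None:
    """Fill in the deferred checksum in software."""
    pkt.checksum = ~ones_complement_sum16(pkt.payload) & 0xFFFF
    pkt.csum_pending = False


def is_valid(pkt: Packet) -> bool:
    """What a verifying receiver computes."""
    return (ones_complement_sum16(pkt.payload) + pkt.checksum) & 0xFFFF == 0xFFFF


def deliver(pkt: Packet, stays_on_host: bool,
            suppress_allowed: bool = False) -> Packet:
    """Complete the checksum unless userland opted in to suppression.

    suppress_allowed defaults to False, i.e. suppression is off by default.
    """
    if pkt.csum_pending and not (stays_on_host and suppress_allowed):
        complete_checksum(pkt)
    return pkt


# Default policy: the checksum is always completed.
safe = deliver(Packet(b"\x12\x34\x56\x78"), stays_on_host=True)
# Opt-in policy on a private, entirely-virtual network.
fast = deliver(Packet(b"\x12\x34\x56\x78"), stays_on_host=True,
               suppress_allowed=True)
print(is_valid(safe), is_valid(fast))  # True False
```

The second packet is only acceptable if every receiver on that private
network is configured not to verify it, which is why the decision has
to come from the configuration scripts rather than from the driver.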