Xen project Mailing List

Re: [Xen-devel] xen-netfront possibly rides the rocket too often

From: Ian Campbell <Ian.Campbell@xxxxxxxxxx>

Date: Thu, 15 May 2014 12:47:31 +0100

Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Stefan Bader <stefan.bader@xxxxxxxxxxxxx>, Zoltan Kiss <zoltan.kiss@xxxxxxxxxx>, netdev <netdev@xxxxxxxxxxxxxxx>

Delivery-date: Thu, 15 May 2014 11:47:41 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Thu, 2014-05-15 at 12:04 +0100, Wei Liu wrote:
> On Thu, May 15, 2014 at 09:46:45AM +0100, Ian Campbell wrote:
> > On Wed, 2014-05-14 at 20:49 +0100, Zoltan Kiss wrote:
> > > On 13/05/14 19:21, Stefan Bader wrote:
> > > > We had reports about this message being seen on EC2 for a while
> > > > but finally a reporter did notice some details about the guests
> > > > and was able to provide a simple way to reproduce[1].
> > > >
> > > > For my local experiments I use a Xen-4.2.2 based host (though I
> > > > would say the host versions are not important). The host has one
> > > > NIC which is used as the outgoing port of a Linux based (not
> > > > openvswitch) bridge. And the PV guests use that bridge. I set the
> > > > mtu to 9001 (which was seen on affected instance types) and also
> > > > inside the guests. As described in the report one guest runs
> > > > redis-server and the other nodejs through two scripts (for me I
> > > > had to do the two sub.js calls in separate shells). After a bit
> > > > the error messages appear on the guest running the redis-server.
> > > >
> > > > I added some debug printk's to show a bit more detail about the
> > > > skb and got the following
> > > > (<length>@<offset (after masking off complete pages)>):
> > > >
> > > > [ 698.108119] xen_netfront: xennet: skb rides the rocket: 19 slots
> > > > [ 698.108134] header 1490@238 -> 1 slots
> > > > [ 698.108139] frag #0 1614@2164 -> + 1 pages
> > > > [ 698.108143] frag #1 3038@1296 -> + 2 pages
> > > > [ 698.108147] frag #2 6076@1852 -> + 2 pages
> > > > [ 698.108151] frag #3 6076@292 -> + 2 pages
> > > > [ 698.108156] frag #4 6076@2828 -> + 3 pages
> > > > [ 698.108160] frag #5 3038@1268 -> + 2 pages
> > > > [ 698.108164] frag #6 2272@1824 -> + 1 pages
> > > > [ 698.108168] frag #7 3804@0 -> + 1 pages
> > > > [ 698.108172] frag #8 6076@264 -> + 2 pages
> > > > [ 698.108177] frag #9 3946@2800 -> + 2 pages
> > > > [ 698.108180] frags adding 18 slots
> > > >
> > > > Since I am not deeply familiar with the networking code, I wonder
> > > > about two things:
> > > > - is there something that should limit the skb data length from
> > > >   all frags to stay below the 64K which the definition of
> > > >   MAX_SKB_FRAGS hints?
> > > I think netfront should be able to handle 64K packets at most.
> >
> > Ah, maybe this relates to this fix from Wei?
> >
>
> Yes, below patch limits SKB size to 64KB. However the problem here is
> not SKB exceeding 64KB. The said SKB is actually 43KB in size. The
> problem is that guest kernel is using compound page so a frag which can
> be fit into one 4K page spans two 4K pages. The fix seems to be
> coalescing SKB in frontend, but it will degrade performance.

So long as it only happens when this scenario occurs a performance
degradation would seem preferable to dropping the skb altogether.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
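
The slot arithmetic behind the debug output quoted above can be re-checked with a short sketch. It is not the netfront code. It assumes 4 KiB pages, assumes MAX_SKB_FRAGS was defined as 65536/PAGE_SIZE + 1 in kernels of that period, and assumes netfront rejects an skb needing more than MAX_SKB_FRAGS + 1 slots. The helper name slots_needed is made up for illustration, and the coalesced figure assumes the data is copied into a page-aligned contiguous buffer.

```python
# Sketch only: recompute the per-fragment slot counts from the quoted log.
PAGE_SIZE = 4096                          # assumption: 4 KiB pages
MAX_SKB_FRAGS = 65536 // PAGE_SIZE + 1    # assumption: definition of that era (17)
SLOT_LIMIT = MAX_SKB_FRAGS + 1            # assumption: netfront's per-skb limit (18)


def slots_needed(length, offset):
    """Number of pages touched by `length` bytes starting at in-page `offset`.

    Hypothetical helper; equivalent to DIV_ROUND_UP(offset + length, PAGE_SIZE).
    """
    return (offset + length + PAGE_SIZE - 1) // PAGE_SIZE


# (length, offset) pairs copied from the log lines in the quoted message.
header = (1490, 238)
frags = [
    (1614, 2164), (3038, 1296), (6076, 1852), (6076, 292), (6076, 2828),
    (3038, 1268), (2272, 1824), (3804, 0), (6076, 264), (3946, 2800),
]

header_slots = slots_needed(*header)
frag_slots = [slots_needed(length, offset) for length, offset in frags]
total_slots = header_slots + sum(frag_slots)
total_bytes = header[0] + sum(length for length, _ in frags)

# Slots needed if the same bytes sat in one page-aligned linear buffer.
coalesced_slots = slots_needed(total_bytes, 0)

print(f"frag slots: {frag_slots}")
print(f"total: {total_slots} slots for {total_bytes} bytes (limit {SLOT_LIMIT})")
print(f"coalesced: {coalesced_slots} slots")
```

Under those assumptions the fragments account for 18 slots and the header for one, so the 43506-byte skb needs 19 slots against a limit of 18, while the same data laid out contiguously would fit in 11 pages. That matches the point made in the thread: the payload is well under 64K, and it is the fragment offsets that push the slot count over the limit.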
