
Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles



On Wed, Feb 26, 2014 at 04:11:23PM +0100, Sander Eikelenboom wrote:
> 
> Wednesday, February 26, 2014, 10:14:42 AM, you wrote:
> 
> 
> > Friday, February 21, 2014, 7:32:08 AM, you wrote:
> 
> 
> >> On 2014/2/20 19:18, Sander Eikelenboom wrote:
> >>> Thursday, February 20, 2014, 10:49:58 AM, you wrote:
> >>>
> >>>
> >>>> On 2014/2/19 5:25, Sander Eikelenboom wrote:
> >>>>> Hi All,
> >>>>>
> >>>>> I'm currently having some network troubles with Xen and recent linux 
> >>>>> kernels.
> >>>>>
> >>>>> - When running with a 3.14-rc3 kernel in dom0 and a 3.13 kernel in domU
> >>>>>     I get what seems to be described in this thread: 
> >>>>> http://www.spinics.net/lists/netdev/msg242953.html
> >>>>>
> >>>>>     In the guest:
> >>>>>     [57539.859584] net eth0: rx->offset: 0, size: 4294967295
> >>>>>     [57539.859599] net eth0: rx->offset: 0, size: 4294967295
> >>>>>     [57539.859605] net eth0: rx->offset: 0, size: 4294967295
> >>>>>     [57539.859610] net eth0: Need more slots
> >>>>>     [58157.675939] net eth0: Need more slots
> >>>>>     [58725.344712] net eth0: Need more slots
> >>>>>     [61815.849180] net eth0: rx->offset: 0, size: 4294967295
> >>>>>     [61815.849205] net eth0: rx->offset: 0, size: 4294967295
> >>>>>     [61815.849216] net eth0: rx->offset: 0, size: 4294967295
> >>>>>     [61815.849225] net eth0: Need more slots
> >>>> This issue is familiar... and I thought it got fixed.
> >>>>   From my original analysis of a similar issue I hit before, the root
> >>>> cause was netback still creating a response when the ring is full. I
> >>>> remember a larger MTU could trigger this before; what is the MTU size?
> >>> In dom0 both the physical NICs and the guest vifs have MTU=1500.
> >>> In domU the eth0 also has MTU=1500.
> >>>
> >>> So it's not jumbo frames .. just the same plain defaults everywhere ..
> >>>
> >>> With the patch from Wei that solves the other issue, I'm still seeing
> >>> the "Need more slots" issue on 3.14-rc3 + Wei's patch.
> >>> I have extended the "need more slots" warning to also print cons, slots,
> >>> max, rx->offset and size; hope that gives some more insight.
> >>> But it is indeed the VM where I had similar issues before; the primary
> >>> thing this VM does is 2 simultaneous rsyncs (one push, one pull) with
> >>> some gigabytes of data.
> >>>
> >>> This time it was also accompanied by a "grant_table.c:1857:d0 Bad grant
> >>> reference" as seen below; I don't know if it's a cause or an effect though.
> 
> >> The log "grant_table.c:1857:d0 Bad grant reference" was also seen before.
> >> Probably the response overlaps the request and the grant copy returns an
> >> error because of the wrong grant reference; netback then sets resp->status
> >> to XEN_NETIF_RSP_ERROR (-1), which is the 4294967295 printed above by the
> >> frontend.
> >> Would it be possible to print a log in xenvif_rx_action of netback to see
> >> whether something is wrong with the max slots and used slots?
> 
> >> Thanks
> >> Annie
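
To make the connection between the two numbers explicit: the rx response
status on the netif ring is a 16-bit signed field, and XEN_NETIF_RSP_ERROR
is -1, so printing it through an unsigned conversion yields exactly the
4294967295 above. A minimal standalone sketch (plain C, not code from the
drivers):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        int16_t status = -1;                   /* XEN_NETIF_RSP_ERROR */
        printf("%u\n", (unsigned int)status);  /* prints 4294967295 */
        return 0;
    }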
> 
> > Looking more closely, these are perhaps 2 different issues ... the bad
> > grant references do not happen at the same time as the netfront messages
> > in the guest.
> 
> > I added some debug patches to the kernel netback, netfront and Xen grant
> > table code (see below).
> > One of the things was to simplify the code for the debug key that prints
> > the grant tables; the present code takes too long to execute and brings
> > down the box due to stalls and NMIs. So it now only prints the number of
> > entries per domain.
> 
> 
> > Issue 1: grant_table.c:1858:d0 Bad grant reference
> 
> > After running the box for just one night (with 15 VMs) I get these
> > mentions of "Bad grant reference".
> > The maptrack also seems to increase quite fast, and the number of entries
> > seems to have gone up quite fast as well.
> 
> > Most domains have just one disk (blkfront/blkback) and one NIC; a few
> > have a second disk.
> > The blk drivers use persistent grants, so I would assume they would reuse
> > those and not increase the count (by much).
> 

As far as I can tell netfront has a pool of grant references and it
will BUG_ON() if there are no grefs in the pool when you request one.
Since your DomU didn't crash, I suspect the book-keeping is still
intact.
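
Roughly the pattern I mean, written from memory rather than quoted from
the source (the pool size and the helper names are illustrative, but the
gnttab_* calls are the real API):

    #include <linux/errno.h>
    #include <linux/kernel.h>
    #include <xen/grant_table.h>

    #define RX_GREFS 256    /* illustrative: pool sized to the rx ring */

    static grant_ref_t gref_rx_head;

    /* Reserve a pool of references up front; fail cleanly if Xen can't. */
    static int pool_setup(void)
    {
            if (gnttab_alloc_grant_references(RX_GREFS, &gref_rx_head) < 0)
                    return -ENOSPC;
            return 0;
    }

    /* Hand out one reference.  An empty pool means the driver's own
     * book-keeping is broken, so crash loudly instead of returning
     * garbage; this is the BUG_ON() netfront would hit. */
    static grant_ref_t pool_get(void)
    {
            grant_ref_t ref = gnttab_claim_grant_reference(&gref_rx_head);

            BUG_ON((signed short)ref < 0);
            return ref;
    }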

> > Domain 1 seems to have increased its nr_grant_entries from 2048 to 3072
> > sometime during the night.
> > Domain 7 is the domain that happens to give the netfront messages.
> 
> > I also don't get why it is reporting the "Bad grant reference" for
> > domain 0, which seems to have 0 active entries ..
> > Also, is this amount of grant entries "normal", or could it be a leak
> > somewhere?
> 

I suppose Dom0 expanding its maptrack is normal. I see this as well when
I increase the number of domains. But if it keeps increasing while the
number of DomUs stays the same, then it is not normal.
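
For scale, a back-of-the-envelope reading of the "N/256 frames" numbers in
your log, assuming 4 KiB pages and the 8-byte struct grant_mapping of this
era (32-bit gref, 16-bit flags, 16-bit domid):

    #include <stdio.h>

    int main(void)
    {
        unsigned page_size = 4096;
        unsigned mapping_size = 8;  /* assumed sizeof(struct grant_mapping) */
        unsigned per_frame = page_size / mapping_size;

        printf("handles per frame: %u\n", per_frame);        /* 512 */
        printf("at 31 frames:      %u\n", 31 * per_frame);   /* 15872 */
        printf("at the 256 cap:    %u\n", 256 * per_frame);  /* 131072 */
        return 0;
    }

So 31 frames is still well below the cap; the growth itself is only
worrying if it never levels off.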

Presumably you only have netfront and blkfront using the grant table,
and your workload as described below involves both, so it would be hard
to tell which one is faulty.
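
One small observation on the values themselves: decoded in hex, the
failing references look like small grefs with stray high bits set, which
would at least be consistent with garbage being interpreted as a
reference. Plain arithmetic, nothing more:

    #include <stdio.h>

    int main(void)
    {
        long bad[] = { 4325377, 107085839, 268435460 };

        for (int i = 0; i < 3; i++)
            printf("%ld = 0x%lx\n", bad[i], bad[i]);
        /* 4325377   = 0x420001
         * 107085839 = 0x662000f
         * 268435460 = 0x10000004 */
        return 0;
    }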

There are no immediate functional changes regarding slot counting in
this dev cycle for the network driver. But there are some changes to
blkfront/back which seem interesting (memory related).

My suggestion is: if you have a working baseline, you can try to set up
different frontend / backend combinations to help narrow down the
problem, along the lines of the matrix below.
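
Something like the following, where the versions are only examples and
the known-good combination is whatever your baseline was:

                        backend 3.13             backend 3.14-rc
    frontend 3.13       known-good baseline      exercises the backends
    frontend 3.14-rc    exercises the frontends  current (failing) setup

If only one row or one column misbehaves, that points at the
corresponding side.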

Wei.

> > (XEN) [2014-02-26 00:00:38] grant_table.c:1250:d1 Expanding dom (1) grant table from (4) to (5) frames.
> > (XEN) [2014-02-26 00:00:38] grant_table.c:1250:d1 Expanding dom (1) grant table from (5) to (6) frames.
> > (XEN) [2014-02-26 00:00:38] grant_table.c:290:d0 Increased maptrack size to 13/256 frames
> > (XEN) [2014-02-26 00:01:13] grant_table.c:290:d0 Increased maptrack size to 14/256 frames
> > (XEN) [2014-02-26 04:02:55] grant_table.c:1858:d0 Bad grant reference 4325377 | 2048 | 1 | 0
> > (XEN) [2014-02-26 04:15:33] grant_table.c:290:d0 Increased maptrack size to 15/256 frames
> > (XEN) [2014-02-26 04:15:53] grant_table.c:290:d0 Increased maptrack size to 16/256 frames
> > (XEN) [2014-02-26 04:15:56] grant_table.c:290:d0 Increased maptrack size to 17/256 frames
> > (XEN) [2014-02-26 04:15:56] grant_table.c:290:d0 Increased maptrack size to 18/256 frames
> > (XEN) [2014-02-26 04:15:57] grant_table.c:290:d0 Increased maptrack size to 19/256 frames
> > (XEN) [2014-02-26 04:15:57] grant_table.c:290:d0 Increased maptrack size to 20/256 frames
> > (XEN) [2014-02-26 04:15:59] grant_table.c:290:d0 Increased maptrack size to 21/256 frames
> > (XEN) [2014-02-26 04:16:00] grant_table.c:290:d0 Increased maptrack size to 22/256 frames
> > (XEN) [2014-02-26 04:16:00] grant_table.c:290:d0 Increased maptrack size to 23/256 frames
> > (XEN) [2014-02-26 04:16:00] grant_table.c:290:d0 Increased maptrack size to 24/256 frames
> > (XEN) [2014-02-26 04:16:10] grant_table.c:290:d0 Increased maptrack size to 25/256 frames
> > (XEN) [2014-02-26 04:16:10] grant_table.c:290:d0 Increased maptrack size to 26/256 frames
> > (XEN) [2014-02-26 04:16:17] grant_table.c:290:d0 Increased maptrack size to 27/256 frames
> > (XEN) [2014-02-26 04:16:20] grant_table.c:290:d0 Increased maptrack size to 28/256 frames
> > (XEN) [2014-02-26 04:16:56] grant_table.c:290:d0 Increased maptrack size to 29/256 frames
> > (XEN) [2014-02-26 05:15:04] grant_table.c:290:d0 Increased maptrack size to 30/256 frames
> > (XEN) [2014-02-26 05:15:05] grant_table.c:290:d0 Increased maptrack size to 31/256 frames
> > (XEN) [2014-02-26 05:21:15] grant_table.c:1858:d0 Bad grant reference 107085839 | 2048 | 1 | 0
> > (XEN) [2014-02-26 05:29:47] grant_table.c:1858:d0 Bad grant reference 268435460 | 2048 | 1 | 0
> > (XEN) [2014-02-26 07:53:20] gnttab_usage_print_all [ key 'g' pressed
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    0 (v1) nr_grant_entries: 2048
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    0 active entries: 0
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    1 (v1) nr_grant_entries: 3072
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    1 (v1)
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    1 active entries: 2117
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    2 (v1) nr_grant_entries: 2048
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    2 (v1)
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    2 active entries: 1060
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    3 (v1) nr_grant_entries: 2048
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    3 (v1)
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    3 active entries: 1060
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    4 (v1) nr_grant_entries: 2048
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    4 (v1)
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    4 active entries: 1060
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    5 (v1) nr_grant_entries: 2048
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    5 (v1)
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    5 active entries: 1060
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    6 (v1) nr_grant_entries: 2048
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    6 (v1)
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    6 active entries: 1060
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    7 (v1) nr_grant_entries: 2048
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    7 (v1)
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    7 active entries: 1060
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    8 (v1) nr_grant_entries: 2048
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    8 (v1)
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    8 active entries: 1060
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    9 (v1) nr_grant_entries: 2048
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    9 (v1)
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    9 active entries: 1060
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   10 (v1) nr_grant_entries: 2048
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   10 (v1)
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   10 active entries: 1060
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   11 (v1) nr_grant_entries: 2048
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   11 (v1)
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   11 active entries: 1061
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   12 (v1) nr_grant_entries: 2048
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   12 (v1)
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   12 active entries: 1045
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   13 (v1) nr_grant_entries: 2048
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   13 (v1)
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   13 active entries: 1060
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   14 (v1) nr_grant_entries: 2048
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   14 (v1)
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   14 active entries: 709
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   15 (v1) nr_grant_entries: 2048
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   15 (v1)
> > (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   15 active entries: 163
> > (XEN) [2014-02-26 07:53:20] gnttab_usage_print_all ] done
> > (XEN) [2014-02-26 07:55:09] grant_table.c:1858:d0 Bad grant reference 4325377 | 2048 | 1 | 0
> > (XEN) [2014-02-26 08:37:16] grant_table.c:1858:d0 Bad grant reference 268435460 | 2048 | 1 | 0
> 
> 
> 
> > Issue 2: net eth0: rx->offset: 0, size: xxxxxxxxxx
> 
> > In the guest (domain 7):
> 
> > Feb 26 08:55:09 backup kernel: [39258.090375] net eth0: rx->offset: 0, size: 4294967295
> > Feb 26 08:55:09 backup kernel: [39258.090392] net eth0: me here .. cons:15177803 slots:1 rp:15177807 max:18 err:0 rx->id:74 rx->offset:0 size:4294967295 ref:533
> > Feb 26 08:55:09 backup kernel: [39258.090401] net eth0: rx->offset: 0, size: 4294967295
> > Feb 26 08:55:09 backup kernel: [39258.090406] net eth0: me here .. cons:15177803 slots:2 rp:15177807 max:18 err:-22 rx->id:76 rx->offset:0 size:4294967295 ref:686
> > Feb 26 08:55:09 backup kernel: [39258.090415] net eth0: rx->offset: 0, size: 4294967295
> > Feb 26 08:55:09 backup kernel: [39258.090420] net eth0: me here .. cons:15177803 slots:3 rp:15177807 max:18 err:-22 rx->id:77 rx->offset:0 size:4294967295 ref:571
> 
> > In dom0 I don't see any specific netback warnings related to this domain
> > at these specific times. The printk's I added do trigger quite a few
> > times, but these are probably not erroneous; they seem to only occur on
> > the vif of domain 7 (probably the only domain that is swamping the
> > network by doing rsync and webdavs and causes some fragmented packets).
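
For reference, the worst-case slot estimate that this printk reports on is
computed roughly like this (a paraphrase in plain C of the xenvif_rx_action
hunk further down, with made-up example values; DIV_ROUND_UP is the
kernel's round-up-division macro):

    #include <stdio.h>

    #define PAGE_SIZE 4096
    #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

    int main(void)
    {
        unsigned offset_in_page = 64;              /* example values */
        unsigned headlen = 1400;
        unsigned frag_sizes[] = { 4096, 2000 };    /* nr_frags == 2 */
        unsigned slots = DIV_ROUND_UP(offset_in_page + headlen, PAGE_SIZE);

        for (int i = 0; i < 2; i++)
            slots += DIV_ROUND_UP(frag_sizes[i], PAGE_SIZE);
        slots += 1;  /* one extra slot for GSO metadata, as in the driver */
        printf("max_slots_needed = %u\n", slots);  /* 1 + 1 + 1 + 1 = 4 */
        return 0;
    }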
> 
> Another addition ... the guest doesn't shut down anymore on "xl shutdown" ..
> it just does .. erhmm nothing .. (tried multiple times)
> After that I ssh'ed into the guest and did a "halt -p" ... the guest shut
> down .. but the guest remained in xl list in blocked state ..
> Doing a "xl console" shows:
> 
> [30024.559656] net eth0: me here .. cons:8713451 slots:1 rp:8713462 max:18 err:0 rx->id:234 rx->offset:0 size:4294967295 ref:-131941395332550
> [30024.559666] net eth0: rx->offset: 0, size: 4294967295
> [30024.559671] net eth0: me here .. cons:8713451 slots:2 rp:8713462 max:18 err:-22 rx->id:236 rx->offset:0 size:4294967295 ref:-131941395332504
> [30024.559680] net eth0: rx->offset: 0, size: 4294967295
> [30024.559686] net eth0: me here .. cons:8713451 slots:3 rp:8713462 max:18 err:-22 rx->id:1 rx->offset:0 size:4294967295 ref:-131941395332390
> [30536.665135] net eth0: Need more slots cons:9088533 slots:6 rp:9088539 max:17 err:0 rx-id:26 rx->offset:0 size:0 ref:687
> [39258.090375] net eth0: rx->offset: 0, size: 4294967295
> [39258.090392] net eth0: me here .. cons:15177803 slots:1 rp:15177807 max:18 err:0 rx->id:74 rx->offset:0 size:4294967295 ref:533
> [39258.090401] net eth0: rx->offset: 0, size: 4294967295
> [39258.090406] net eth0: me here .. cons:15177803 slots:2 rp:15177807 max:18 err:-22 rx->id:76 rx->offset:0 size:4294967295 ref:686
> [39258.090415] net eth0: rx->offset: 0, size: 4294967295
> [39258.090420] net eth0: me here .. cons:15177803 slots:3 rp:15177807 max:18 err:-22 rx->id:77 rx->offset:0 size:4294967295 ref:571
> INIT: Switching to runlevel: 0
> INIT: Sending processes the TERM signal
> [info] Using makefile-style concurrent boot in runlevel 0.
> Stopping openntpd: ntpd.
> [ ok ] Stopping mail-transfer-agent: nullmailer.
> [ ok ] Stopping web server: apache2 ... waiting .
> [ ok ] Asking all remaining processes to terminate...done.
> [ ok ] All processes ended within 2 seconds...done.
> [ ok ] Stopping enhanced syslogd: rsyslogd.
> [ ok ] Deconfiguring network interfaces...done.
> [ ok ] Deactivating swap...done.
> [65015.958259] EXT4-fs (xvda1): re-mounted. Opts: (null)
> [info] Will now halt.
> [65018.166546] vif vif-0: 5 starting transaction
> [65160.490419] INFO: task halt:4846 blocked for more than 120 seconds.
> [65160.490464]       Not tainted 3.14.0-rc4-20140225-vanilla-nfnbdebug2+ #1
> [65160.490485] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [65160.490510] halt            D ffff88001d6cfc38     0  4846   4838 0x00000000
> [65280.490470] INFO: task halt:4846 blocked for more than 120 seconds.
> [65280.490517]       Not tainted 3.14.0-rc4-20140225-vanilla-nfnbdebug2+ #1
> [65280.490540] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [65280.490564] halt            D ffff88001d6cfc38     0  4846   4838 0x00000000
> 
> 
> Especially the "[65018.166546] vif vif-0: 5 starting transaction" after
> the halt surprises me ..
> 
> --
> Sander
> 
> > Feb 26 08:53:20 serveerstertje kernel: [39324.917255] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:2 prod:15101115 cons:15101112 j:8
> > Feb 26 08:53:56 serveerstertje kernel: [39361.001436] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15127649 cons:15127648 j:13
> > Feb 26 08:54:00 serveerstertje kernel: [39364.725613] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15130263 cons:15130261 j:2
> > Feb 26 08:54:04 serveerstertje kernel: [39368.739504] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:2 prod:15133143 cons:15133141 j:0
> > Feb 26 08:54:20 serveerstertje kernel: [39384.665044] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15144113 cons:15144112 j:0
> > Feb 26 08:54:29 serveerstertje kernel: [39393.569871] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15150203 cons:15150200 j:0
> > Feb 26 08:54:40 serveerstertje kernel: [39404.586566] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:3 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15157706 cons:15157704 j:12
> > Feb 26 08:54:56 serveerstertje kernel: [39420.759769] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:6 GSO:1 vif->rx_last_skb_slots:0 nr_frags:2 prod:15168839 cons:15168835 j:0
> > Feb 26 08:54:56 serveerstertje kernel: [39421.001372] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15169002 cons:15168999 j:8
> > Feb 26 08:55:00 serveerstertje kernel: [39424.515073] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15171450 cons:15171447 j:0
> > Feb 26 08:55:10 serveerstertje kernel: [39435.154510] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15178773 cons:15178770 j:1
> > Feb 26 08:56:19 serveerstertje kernel: [39504.195908] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15227444 cons:15227444 j:0
> > Feb 26 08:57:39 serveerstertje kernel: [39583.799392] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15283346 cons:15283344 j:8
> > Feb 26 08:57:55 serveerstertje kernel: [39599.517673] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:2 prod:15293937 cons:15293935 j:0
> > Feb 26 08:58:07 serveerstertje kernel: [39612.156622] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:3 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15302891 cons:15302889 j:19
> > Feb 26 08:58:07 serveerstertje kernel: [39612.400907] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:3 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15303034 cons:15303033 j:0
> > Feb 26 08:58:18 serveerstertje kernel: [39623.439383] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:6 GSO:1 vif->rx_last_skb_slots:0 nr_frags:2 prod:15310915 cons:15310911 j:0
> > Feb 26 08:58:39 serveerstertje kernel: [39643.521808] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:6 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15324769 cons:15324766 j:1
> 
> > Feb 26 09:27:07 serveerstertje kernel: [41351.622501] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:5 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:16502932 cons:16502932 j:8
> > Feb 26 09:27:19 serveerstertje kernel: [41363.541003] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:5 GSO:1 vif->rx_last_skb_slots:0 nr_frags:2 prod:16510837 cons:16510834 j:7
> > Feb 26 09:27:23 serveerstertje kernel: [41368.133306] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:5 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:16513940 cons:16513937 j:0
> > Feb 26 09:27:43 serveerstertje kernel: [41388.025147] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:3 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:16527870 cons:16527868 j:0
> > Feb 26 09:27:47 serveerstertje kernel: [41391.530802] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:5 GSO:1 vif->rx_last_skb_slots:0 nr_frags:2 prod:16530437 cons:16530437 j:7
> > Feb 26 09:27:51 serveerstertje kernel: [41395.521166] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:5 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:16533320 cons:16533317 j:6
> > Feb 26 09:27:51 serveerstertje kernel: [41395.767066] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:3 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:16533469 cons:16533469 j:0
> > Feb 26 09:27:51 serveerstertje kernel: [41395.802319] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:1 GSO:0 vif->rx_last_skb_slots:0 nr_frags:0 prod:16533533 cons:16533533 j:24
> > Feb 26 09:27:51 serveerstertje kernel: [41395.837456] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:1 GSO:0 vif->rx_last_skb_slots:0 nr_frags:0 prod:16533534 cons:16533534 j:1
> > Feb 26 09:27:51 serveerstertje kernel: [41395.872587] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:3 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:16533597 cons:16533596 j:25
> > Feb 26 09:27:51 serveerstertje kernel: [41396.192784] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:3 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:16533833 cons:16533832 j:3
> > Feb 26 09:27:51 serveerstertje kernel: [41396.235611] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:3 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:16533890 cons:16533890 j:30
> > Feb 26 09:27:51 serveerstertje kernel: [41396.271047] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:3 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:16533898 cons:16533896 j:3
> 
> 
> > --
> > Sander
> 
> 
> 
> 
> 
> >>>
> >>> Will keep you posted when it triggers again with the extra info in the 
> >>> warn.
> >>>
> >>> --
> >>> Sander
> >>>
> >>>
> >>>
> >>>> Thanks
> >>>> Annie
> >>>>>     Xen reports:
> >>>>>     (XEN) [2014-02-18 03:22:47] grant_table.c:1857:d0 Bad grant reference 19791875
> >>>>>     (XEN) [2014-02-18 03:42:33] grant_table.c:1857:d0 Bad grant reference 268435460
> >>>>>     (XEN) [2014-02-18 04:15:23] grant_table.c:289:d0 Increased maptrack size to 14 frames
> >>>>>     (XEN) [2014-02-18 04:15:27] grant_table.c:289:d0 Increased maptrack size to 15 frames
> >>>>>     (XEN) [2014-02-18 04:15:48] grant_table.c:289:d0 Increased maptrack size to 16 frames
> >>>>>     (XEN) [2014-02-18 04:15:50] grant_table.c:289:d0 Increased maptrack size to 17 frames
> >>>>>     (XEN) [2014-02-18 04:15:55] grant_table.c:289:d0 Increased maptrack size to 18 frames
> >>>>>     (XEN) [2014-02-18 04:15:55] grant_table.c:289:d0 Increased maptrack size to 19 frames
> >>>>>     (XEN) [2014-02-18 04:15:56] grant_table.c:289:d0 Increased maptrack size to 20 frames
> >>>>>     (XEN) [2014-02-18 04:15:56] grant_table.c:289:d0 Increased maptrack size to 21 frames
> >>>>>     (XEN) [2014-02-18 04:15:59] grant_table.c:289:d0 Increased maptrack size to 22 frames
> >>>>>     (XEN) [2014-02-18 04:15:59] grant_table.c:289:d0 Increased maptrack size to 23 frames
> >>>>>     (XEN) [2014-02-18 04:16:00] grant_table.c:289:d0 Increased maptrack size to 24 frames
> >>>>>     (XEN) [2014-02-18 04:16:05] grant_table.c:289:d0 Increased maptrack size to 25 frames
> >>>>>     (XEN) [2014-02-18 04:16:05] grant_table.c:289:d0 Increased maptrack size to 26 frames
> >>>>>     (XEN) [2014-02-18 04:16:06] grant_table.c:289:d0 Increased maptrack size to 27 frames
> >>>>>     (XEN) [2014-02-18 04:16:12] grant_table.c:289:d0 Increased maptrack size to 28 frames
> >>>>>     (XEN) [2014-02-18 04:16:18] grant_table.c:289:d0 Increased maptrack size to 29 frames
> >>>>>     (XEN) [2014-02-18 04:17:00] grant_table.c:1857:d0 Bad grant reference 268435460
> >>>>>     (XEN) [2014-02-18 04:17:00] grant_table.c:1857:d0 Bad grant reference 268435460
> >>>>>     (XEN) [2014-02-18 04:34:03] grant_table.c:1857:d0 Bad grant reference 4325377
> >>>>>
> >>>>>
> >>>>>
> >>>>> Another issue with networking is when running both dom0 and domU's
> >>>>> with a 3.14-rc3 kernel:
> >>>>>     - I can ping the guests from dom0
> >>>>>     - I can ping dom0 from the guests
> >>>>>     - But I can't ssh or access things by http
> >>>>>     - I don't see any relevant error messages ...
> >>>>>     - This is with the same system and kernel config as with the 3.14
> >>>>>       and 3.13 combination above (that previously worked fine)
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Sander
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> Xen-devel mailing list
> >>>>> Xen-devel@xxxxxxxxxxxxx
> >>>>> http://lists.xen.org/xen-devel
> >>>
> >>>
> 
> 
> 
> > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > index 4fc46eb..4d720b4 100644
> > --- a/tools/libxl/xl_cmdimpl.c
> > +++ b/tools/libxl/xl_cmdimpl.c
> > @@ -1667,6 +1667,8 @@ skip_vfb:
> >                  b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_STD;
> >              } else if (!strcmp(buf, "cirrus")) {
> >                  b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_CIRRUS;
> > +            } else if (!strcmp(buf, "none")) {
> > +                b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_NONE;
> >              } else {
> >                  fprintf(stderr, "Unknown vga \"%s\" specified\n", buf);
> >                  exit(1);
> > diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
> > index 107b000..ab56927 100644
> > --- a/xen/common/grant_table.c
> > +++ b/xen/common/grant_table.c
> > @@ -265,9 +265,10 @@ get_maptrack_handle(
> >      while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) )
> >      {
> >          nr_frames = nr_maptrack_frames(lgt);
> > -        if ( nr_frames >= max_nr_maptrack_frames() )
> > +        if ( nr_frames >= max_nr_maptrack_frames() ) {
> > +            gdprintk(XENLOG_INFO, "Already at max maptrack size: %u/%u frames\n", nr_frames, max_nr_maptrack_frames());
> >              break;
> > -
> > +        }
> >          new_mt = alloc_xenheap_page();
> >          if ( !new_mt )
> >              break;
> > @@ -285,8 +286,8 @@ get_maptrack_handle(
> >          smp_wmb();
> >          lgt->maptrack_limit      = new_mt_limit;
> 
> > -        gdprintk(XENLOG_INFO, "Increased maptrack size to %u frames\n",
> > -                 nr_frames + 1);
> > +        gdprintk(XENLOG_INFO, "Increased maptrack size to %u/%u frames\n",
> > +                 nr_frames + 1, max_nr_maptrack_frames());
> >      }
> 
> >      spin_unlock(&lgt->lock);
> > @@ -1854,7 +1855,7 @@ __acquire_grant_for_copy(
> 
> >      if ( unlikely(gref >= nr_grant_entries(rgt)) )
> >          PIN_FAIL(unlock_out, GNTST_bad_gntref,
> > -                 "Bad grant reference %ld\n", gref);
> > +                 "Bad grant reference %ld | %d | %d | %d\n", gref, nr_grant_entries(rgt), rgt->gt_version, ldom);
> 
> >      act = &active_entry(rgt, gref);
> >      shah = shared_entry_header(rgt, gref);
> > @@ -2830,15 +2831,19 @@ static void gnttab_usage_print(struct domain *rd)
> >      int first = 1;
> >      grant_ref_t ref;
> >      struct grant_table *gt = rd->grant_table;
> > -
> > +    unsigned int active=0;
> > +/*
> >      printk("      -------- active --------       -------- shared --------\n");
> >      printk("[ref] localdom mfn      pin          localdom gmfn     flags\n");
> > -
> > +*/
> >      spin_lock(&gt->lock);
> 
> >      if ( gt->gt_version == 0 )
> >          goto out;
> 
> > +    printk("grant-table for remote domain:%5d (v%d) nr_grant_entries: %d\n",
> > +                   rd->domain_id, gt->gt_version, nr_grant_entries(gt));
> > +
> >      for ( ref = 0; ref != nr_grant_entries(gt); ref++ )
> >      {
> >          struct active_grant_entry *act;
> > @@ -2875,19 +2880,22 @@ static void gnttab_usage_print(struct domain *rd)
> >                     rd->domain_id, gt->gt_version);
> >              first = 0;
> >          }
> > -
> > +        active++;
> >          /*      [ddd]    ddddd 0xXXXXXX 0xXXXXXXXX      ddddd 0xXXXXXX 0xXX */
> > -        printk("[%3d]    %5d 0x%06lx 0x%08x      %5d 0x%06"PRIx64" 0x%02x\n",
> > -               ref, act->domid, act->frame, act->pin,
> > -               sha->domid, frame, status);
> > +        /* printk("[%3d]    %5d 0x%06lx 0x%08x      %5d 0x%06"PRIx64" 0x%02x\n", ref, act->domid, act->frame, act->pin, sha->domid, frame, status); */
> >      }
> 
> >   out:
> >      spin_unlock(&gt->lock);
> 
> > +    printk("grant-table for remote domain:%5d active entries: %d\n",
> > +                   rd->domain_id, active);
> > +/*
> >      if ( first )
> >          printk("grant-table for remote domain:%5d ... "
> >                 "no active grant table entries\n", rd->domain_id);
> > +*/
> > +
> >  }
> 
> >  static void gnttab_usage_print_all(unsigned char key)
> 
> 
> 
> 
> 
> 
> > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> > index e5284bc..6d93358 100644
> > --- a/drivers/net/xen-netback/netback.c
> > +++ b/drivers/net/xen-netback/netback.c
> > @@ -482,20 +482,23 @@ static void xenvif_rx_action(struct xenvif *vif)
> >                 .meta  = vif->meta,
> >         };
> 
> > +       int j=0;
> > +
> >         skb_queue_head_init(&rxq);
> 
> >         while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
> >                 RING_IDX max_slots_needed;
> >                 int i;
> > +               int nr_frags;
> 
> >                 /* We need a cheap worse case estimate for the number of
> >                  * slots we'll use.
> >                  */
> 
> >                 max_slots_needed = DIV_ROUND_UP(offset_in_page(skb->data) +
> > -                                               skb_headlen(skb),
> > -                                               PAGE_SIZE);
> > -               for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> > +                                               skb_headlen(skb), PAGE_SIZE);
> > +               nr_frags = skb_shinfo(skb)->nr_frags;
> > +               for (i = 0; i < nr_frags; i++) {
> >                         unsigned int size;
> >                         size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
> >                         max_slots_needed += DIV_ROUND_UP(size, PAGE_SIZE);
> > @@ -508,6 +511,9 @@ static void xenvif_rx_action(struct xenvif *vif)
> >                 if (!xenvif_rx_ring_slots_available(vif, max_slots_needed)) {
> >                         skb_queue_head(&vif->rx_queue, skb);
> >                         need_to_notify = true;
> > +                       if (net_ratelimit())
> > +                               netdev_err(vif->dev, "!?!?!?! skb may not fit .. bail out now max_slots_needed:%d GSO:%d vif->rx_last_skb_slots:%d nr_frags:%d prod:%d cons:%d j:%d\n",
> > +                                       max_slots_needed, (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4 || skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) ? 1 : 0, vif->rx_last_skb_slots, nr_frags, vif->rx.sring->req_prod, vif->rx.req_cons, j);
> >                         vif->rx_last_skb_slots = max_slots_needed;
> >                         break;
> >                 } else
> > @@ -518,6 +524,7 @@ static void xenvif_rx_action(struct xenvif *vif)
> >                 BUG_ON(sco->meta_slots_used > max_slots_needed);
> 
> >                 __skb_queue_tail(&rxq, skb);
> > +               j++;
> >         }
> 
> >         BUG_ON(npo.meta_prod > ARRAY_SIZE(vif->meta));
> > @@ -541,7 +548,7 @@ static void xenvif_rx_action(struct xenvif *vif)
> >                         resp->offset = vif->meta[npo.meta_cons].gso_size;
> >                         resp->id = vif->meta[npo.meta_cons].id;
> >                         resp->status = sco->meta_slots_used;
> > -
> > +
> >                         npo.meta_cons++;
> >                         sco->meta_slots_used--;
> >                 }
> > @@ -705,7 +712,7 @@ static int xenvif_count_requests(struct xenvif *vif,
> >                  */
> >                 if (!drop_err && slots >= XEN_NETBK_LEGACY_SLOTS_MAX) {
> >                         if (net_ratelimit())
> > -                               netdev_dbg(vif->dev,
> > +                               netdev_err(vif->dev,
> >                                            "Too many slots (%d) exceeding limit (%d), dropping packet\n",
> >                                            slots, XEN_NETBK_LEGACY_SLOTS_MAX);
> >                         drop_err = -E2BIG;
> > @@ -728,7 +735,7 @@ static int xenvif_count_requests(struct xenvif *vif,
> >                  */
> >                 if (!drop_err && txp->size > first->size) {
> >                         if (net_ratelimit())
> > -                               netdev_dbg(vif->dev,
> > +                               netdev_err(vif->dev,
> >                                            "Invalid tx request, slot size %u > remaining size %u\n",
> >                                            txp->size, first->size);
> >                         drop_err = -EIO;
> 
> 
> 
> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > index f9daa9e..67d5221 100644
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -753,6 +753,7 @@ static int xennet_get_responses(struct netfront_info *np,
> > -                       if (net_ratelimit())
> > +                       if (net_ratelimit()) {
> >                                 dev_warn(dev, "rx->offset: %x, size: %u\n",
> >                                          rx->offset, rx->status);
> > +                               dev_warn(dev, "me here .. cons:%d slots:%d rp:%d max:%d err:%d rx->id:%d rx->offset:%x size:%u ref:%ld\n", cons, slots, rp, max, err, rx->id, rx->offset, rx->status, ref);
> > +                       }
> >                         xennet_move_rx_slot(np, skb, ref);
> >                         err = -EINVAL;
> >                         goto next;
> > @@ -784,7 +785,7 @@ next:
> 
> >                 if (cons + slots == rp) {
> >                         if (net_ratelimit())
> > -                               dev_warn(dev, "Need more slots\n");
> > +                               dev_warn(dev, "Need more slots cons:%d slots:%d rp:%d max:%d err:%d rx-id:%d rx->offset:%x size:%u ref:%ld\n", cons, slots, rp, max, err, rx->id, rx->offset, rx->status, ref);
> >                         err = -ENOENT;
> >                         break;
> >                 }
> > @@ -803,7 +804,6 @@ next:
> 
> >         if (unlikely(err))
> >                 np->rx.rsp_cons = cons + slots;
> > -
> >         return err;
> >  }
> 
> > @@ -907,6 +907,7 @@ static int handle_incoming_queue(struct net_device *dev,
> 
> >                 /* Ethernet work: Delayed to here as it peeks the header. */
> >                 skb->protocol = eth_type_trans(skb, dev);
> > +               skb_reset_network_header(skb);
> 
> >                 if (checksum_setup(dev, skb)) {
> >                         kfree_skb(skb);
> 
> 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

