
Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles



Wednesday, February 26, 2014, 10:14:42 AM, you wrote:


> Friday, February 21, 2014, 7:32:08 AM, you wrote:


>> On 2014/2/20 19:18, Sander Eikelenboom wrote:
>>> Thursday, February 20, 2014, 10:49:58 AM, you wrote:
>>>
>>>
>>>> On 2014/2/19 5:25, Sander Eikelenboom wrote:
>>>>> Hi All,
>>>>>
>>>>> I'm currently having some network troubles with Xen and recent Linux kernels.
>>>>>
>>>>> - When running with a 3.14-rc3 kernel in dom0 and a 3.13 kernel in domU
>>>>>     I get what seems to be described in this thread:
>>>>>     http://www.spinics.net/lists/netdev/msg242953.html
>>>>>
>>>>>     In the guest:
>>>>>     [57539.859584] net eth0: rx->offset: 0, size: 4294967295
>>>>>     [57539.859599] net eth0: rx->offset: 0, size: 4294967295
>>>>>     [57539.859605] net eth0: rx->offset: 0, size: 4294967295
>>>>>     [57539.859610] net eth0: Need more slots
>>>>>     [58157.675939] net eth0: Need more slots
>>>>>     [58725.344712] net eth0: Need more slots
>>>>>     [61815.849180] net eth0: rx->offset: 0, size: 4294967295
>>>>>     [61815.849205] net eth0: rx->offset: 0, size: 4294967295
>>>>>     [61815.849216] net eth0: rx->offset: 0, size: 4294967295
>>>>>     [61815.849225] net eth0: Need more slots
>>>> This issue is familiar... and I thought it had been fixed.
>>>> From the original analysis of a similar issue I hit before, the root cause
>>>> is that netback still creates a response when the ring is full. I remember
>>>> a larger MTU could trigger this issue before; what is the MTU size?
>>> In dom0, both the physical NICs and the guest vifs have MTU=1500.
>>> In the domU, eth0 also has MTU=1500.
>>>
>>> So it's not jumbo frames .. just the same plain defaults everywhere ..
>>>
>>> With the patch from Wei that solves the other issue, I'm still seeing the
>>> "Need more slots" issue on 3.14-rc3 + Wei's patch.
>>> I have extended the "need more slots" warning to also print cons, slots,
>>> max, rx->offset and size; hopefully that gives some more insight.
>>> It is indeed the VM where I had similar issues before; the primary thing
>>> this VM does is 2 simultaneous rsyncs (one push, one pull) of some
>>> gigabytes of data.
>>>
>>> This time it was also accompanied by a "grant_table.c:1857:d0 Bad grant
>>> reference" as seen below; I don't know whether that's a cause or an effect though.

>> The log "grant_table.c:1857:d0 Bad grant reference" was also seen before.
>> Probably the response overlaps the request, and the grant copy returns an
>> error when it uses the wrong grant reference. Netback then returns
>> resp->status set to XEN_NETIF_RSP_ERROR (-1), which is the 4294967295
>> printed above by the frontend.
>> Would it be possible to print a log in xenvif_rx_action of netback to see
>> whether something is wrong with the max and used slots?

>> Thanks
>> Annie

> Looking more closely, these are perhaps 2 different issues ... the bad grant
> references do not happen at the same time as the netfront messages in the guest.

> I added some debug patches to the kernel netback, netfront and Xen grant-table
> code (see below).
> One of the things was to simplify the code for the debug key that prints the
> grant tables; the present code takes too long to execute and brings down the
> box due to stalls and NMIs. So it now only prints the number of entries per
> domain.


> Issue 1: grant_table.c:1858:d0 Bad grant reference

> After running the box for just one night (with 15 VMs) I get these mentions
> of "Bad grant reference".
> The maptrack size also seems to increase quite fast, and the number of grant
> entries seems to have gone up quite fast as well.

> Most domains have just one disk (blkfront/blkback) and one NIC; a few have a
> second disk.
> The blk drivers use persistent grants, so I would assume they would reuse
> those and not increase the count (by much).

> Domain 1 seems to have increased its nr_grant_entries from 2048 to 3072
> somewhere during the night.
> Domain 7 is the domain that happens to give the netfront messages.

> I also don't get why it is reporting the "Bad grant reference" for domain 0,
> which seems to have 0 active entries ..
> Also, is this number of grant entries "normal"? Or could it be a leak
> somewhere?

> (XEN) [2014-02-26 00:00:38] grant_table.c:1250:d1 Expanding dom (1) grant table from (4) to (5) frames.
> (XEN) [2014-02-26 00:00:38] grant_table.c:1250:d1 Expanding dom (1) grant table from (5) to (6) frames.
> (XEN) [2014-02-26 00:00:38] grant_table.c:290:d0 Increased maptrack size to 13/256 frames
> (XEN) [2014-02-26 00:01:13] grant_table.c:290:d0 Increased maptrack size to 14/256 frames
> (XEN) [2014-02-26 04:02:55] grant_table.c:1858:d0 Bad grant reference 4325377 | 2048 | 1 | 0
> (XEN) [2014-02-26 04:15:33] grant_table.c:290:d0 Increased maptrack size to 15/256 frames
> (XEN) [2014-02-26 04:15:53] grant_table.c:290:d0 Increased maptrack size to 16/256 frames
> (XEN) [2014-02-26 04:15:56] grant_table.c:290:d0 Increased maptrack size to 17/256 frames
> (XEN) [2014-02-26 04:15:56] grant_table.c:290:d0 Increased maptrack size to 18/256 frames
> (XEN) [2014-02-26 04:15:57] grant_table.c:290:d0 Increased maptrack size to 19/256 frames
> (XEN) [2014-02-26 04:15:57] grant_table.c:290:d0 Increased maptrack size to 20/256 frames
> (XEN) [2014-02-26 04:15:59] grant_table.c:290:d0 Increased maptrack size to 21/256 frames
> (XEN) [2014-02-26 04:16:00] grant_table.c:290:d0 Increased maptrack size to 22/256 frames
> (XEN) [2014-02-26 04:16:00] grant_table.c:290:d0 Increased maptrack size to 23/256 frames
> (XEN) [2014-02-26 04:16:00] grant_table.c:290:d0 Increased maptrack size to 24/256 frames
> (XEN) [2014-02-26 04:16:10] grant_table.c:290:d0 Increased maptrack size to 25/256 frames
> (XEN) [2014-02-26 04:16:10] grant_table.c:290:d0 Increased maptrack size to 26/256 frames
> (XEN) [2014-02-26 04:16:17] grant_table.c:290:d0 Increased maptrack size to 27/256 frames
> (XEN) [2014-02-26 04:16:20] grant_table.c:290:d0 Increased maptrack size to 28/256 frames
> (XEN) [2014-02-26 04:16:56] grant_table.c:290:d0 Increased maptrack size to 29/256 frames
> (XEN) [2014-02-26 05:15:04] grant_table.c:290:d0 Increased maptrack size to 30/256 frames
> (XEN) [2014-02-26 05:15:05] grant_table.c:290:d0 Increased maptrack size to 31/256 frames
> (XEN) [2014-02-26 05:21:15] grant_table.c:1858:d0 Bad grant reference 107085839 | 2048 | 1 | 0
> (XEN) [2014-02-26 05:29:47] grant_table.c:1858:d0 Bad grant reference 268435460 | 2048 | 1 | 0
> (XEN) [2014-02-26 07:53:20] gnttab_usage_print_all [ key 'g' pressed
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    0 (v1) nr_grant_entries: 2048
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    0 active entries: 0
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    1 (v1) nr_grant_entries: 3072
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    1 (v1)
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    1 active entries: 2117
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    2 (v1) nr_grant_entries: 2048
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    2 (v1)
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    2 active entries: 1060
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    3 (v1) nr_grant_entries: 2048
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    3 (v1)
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    3 active entries: 1060
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    4 (v1) nr_grant_entries: 2048
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    4 (v1)
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    4 active entries: 1060
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    5 (v1) nr_grant_entries: 2048
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    5 (v1)
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    5 active entries: 1060
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    6 (v1) nr_grant_entries: 2048
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    6 (v1)
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    6 active entries: 1060
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    7 (v1) nr_grant_entries: 2048
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    7 (v1)
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    7 active entries: 1060
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    8 (v1) nr_grant_entries: 2048
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    8 (v1)
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    8 active entries: 1060
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    9 (v1) nr_grant_entries: 2048
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    9 (v1)
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:    9 active entries: 1060
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   10 (v1) nr_grant_entries: 2048
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   10 (v1)
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   10 active entries: 1060
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   11 (v1) nr_grant_entries: 2048
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   11 (v1)
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   11 active entries: 1061
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   12 (v1) nr_grant_entries: 2048
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   12 (v1)
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   12 active entries: 1045
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   13 (v1) nr_grant_entries: 2048
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   13 (v1)
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   13 active entries: 1060
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   14 (v1) nr_grant_entries: 2048
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   14 (v1)
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   14 active entries: 709
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   15 (v1) nr_grant_entries: 2048
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   15 (v1)
> (XEN) [2014-02-26 07:53:20] grant-table for remote domain:   15 active entries: 163
> (XEN) [2014-02-26 07:53:20] gnttab_usage_print_all ] done
> (XEN) [2014-02-26 07:55:09] grant_table.c:1858:d0 Bad grant reference 4325377 | 2048 | 1 | 0
> (XEN) [2014-02-26 08:37:16] grant_table.c:1858:d0 Bad grant reference 268435460 | 2048 | 1 | 0



> Issue 2: net eth0: rx->offset: 0, size: xxxxxxxxxx

> In the guest (domain 7):

> Feb 26 08:55:09 backup kernel: [39258.090375] net eth0: rx->offset: 0, size: 4294967295
> Feb 26 08:55:09 backup kernel: [39258.090392] net eth0: me here .. cons:15177803 slots:1 rp:15177807 max:18 err:0 rx->id:74 rx->offset:0 size:4294967295 ref:533
> Feb 26 08:55:09 backup kernel: [39258.090401] net eth0: rx->offset: 0, size: 4294967295
> Feb 26 08:55:09 backup kernel: [39258.090406] net eth0: me here .. cons:15177803 slots:2 rp:15177807 max:18 err:-22 rx->id:76 rx->offset:0 size:4294967295 ref:686
> Feb 26 08:55:09 backup kernel: [39258.090415] net eth0: rx->offset: 0, size: 4294967295
> Feb 26 08:55:09 backup kernel: [39258.090420] net eth0: me here .. cons:15177803 slots:3 rp:15177807 max:18 err:-22 rx->id:77 rx->offset:0 size:4294967295 ref:571

> In dom0 I don't see any specific netback warnings related to this domain at
> these specific times. The printk's I added do trigger quite a few times, but
> those are probably not erroneous; they seem to occur only on the vif of
> domain 7 (probably the only domain that is swamping the network by doing
> rsync and webdav, causing some fragmented packets).

Another addition ... the guest doesn't shut down anymore on "xl shutdown" .. it
just does .. erhmm .. nothing (tried multiple times).
After that I ssh'ed into the guest and did a "halt -p" ... the guest shut
down .. but it remained in "xl list" in blocked state ..
Doing an "xl console" shows:

[30024.559656] net eth0: me here .. cons:8713451 slots:1 rp:8713462 max:18 err:0 rx->id:234 rx->offset:0 size:4294967295 ref:-131941395332550
[30024.559666] net eth0: rx->offset: 0, size: 4294967295
[30024.559671] net eth0: me here .. cons:8713451 slots:2 rp:8713462 max:18 err:-22 rx->id:236 rx->offset:0 size:4294967295 ref:-131941395332504
[30024.559680] net eth0: rx->offset: 0, size: 4294967295
[30024.559686] net eth0: me here .. cons:8713451 slots:3 rp:8713462 max:18 err:-22 rx->id:1 rx->offset:0 size:4294967295 ref:-131941395332390
[30536.665135] net eth0: Need more slots cons:9088533 slots:6 rp:9088539 max:17 err:0 rx-id:26 rx->offset:0 size:0 ref:687
[39258.090375] net eth0: rx->offset: 0, size: 4294967295
[39258.090392] net eth0: me here .. cons:15177803 slots:1 rp:15177807 max:18 err:0 rx->id:74 rx->offset:0 size:4294967295 ref:533
[39258.090401] net eth0: rx->offset: 0, size: 4294967295
[39258.090406] net eth0: me here .. cons:15177803 slots:2 rp:15177807 max:18 err:-22 rx->id:76 rx->offset:0 size:4294967295 ref:686
[39258.090415] net eth0: rx->offset: 0, size: 4294967295
[39258.090420] net eth0: me here .. cons:15177803 slots:3 rp:15177807 max:18 err:-22 rx->id:77 rx->offset:0 size:4294967295 ref:571
INIT: Switching to runlevel: 0
INIT: Sending processes the TERM signal
[info] Using makefile-style concurrent boot in runlevel 0.
Stopping openntpd: ntpd.
[ ok ] Stopping mail-transfer-agent: nullmailer.
[ ok ] Stopping web server: apache2 ... waiting .
[ ok ] Asking all remaining processes to terminate...done.
[ ok ] All processes ended within 2 seconds...done.
[ ok ] Stopping enhanced syslogd: rsyslogd.
[ ok ] Deconfiguring network interfaces...done.
[ ok ] Deactivating swap...done.
[65015.958259] EXT4-fs (xvda1): re-mounted. Opts: (null)
[info] Will now halt.
[65018.166546] vif vif-0: 5 starting transaction
[65160.490419] INFO: task halt:4846 blocked for more than 120 seconds.
[65160.490464]       Not tainted 3.14.0-rc4-20140225-vanilla-nfnbdebug2+ #1
[65160.490485] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[65160.490510] halt            D ffff88001d6cfc38     0  4846   4838 0x00000000
[65280.490470] INFO: task halt:4846 blocked for more than 120 seconds.
[65280.490517]       Not tainted 3.14.0-rc4-20140225-vanilla-nfnbdebug2+ #1
[65280.490540] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[65280.490564] halt            D ffff88001d6cfc38     0  4846   4838 0x00000000


Especially the "[65018.166546] vif vif-0: 5 starting transaction" after the
halt surprises me ..

--
Sander

> Feb 26 08:53:20 serveerstertje kernel: [39324.917255] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:2 prod:15101115 cons:15101112 j:8
> Feb 26 08:53:56 serveerstertje kernel: [39361.001436] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15127649 cons:15127648 j:13
> Feb 26 08:54:00 serveerstertje kernel: [39364.725613] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15130263 cons:15130261 j:2
> Feb 26 08:54:04 serveerstertje kernel: [39368.739504] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:2 prod:15133143 cons:15133141 j:0
> Feb 26 08:54:20 serveerstertje kernel: [39384.665044] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15144113 cons:15144112 j:0
> Feb 26 08:54:29 serveerstertje kernel: [39393.569871] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15150203 cons:15150200 j:0
> Feb 26 08:54:40 serveerstertje kernel: [39404.586566] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:3 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15157706 cons:15157704 j:12
> Feb 26 08:54:56 serveerstertje kernel: [39420.759769] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:6 GSO:1 vif->rx_last_skb_slots:0 nr_frags:2 prod:15168839 cons:15168835 j:0
> Feb 26 08:54:56 serveerstertje kernel: [39421.001372] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15169002 cons:15168999 j:8
> Feb 26 08:55:00 serveerstertje kernel: [39424.515073] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15171450 cons:15171447 j:0
> Feb 26 08:55:10 serveerstertje kernel: [39435.154510] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15178773 cons:15178770 j:1
> Feb 26 08:56:19 serveerstertje kernel: [39504.195908] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15227444 cons:15227444 j:0
> Feb 26 08:57:39 serveerstertje kernel: [39583.799392] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15283346 cons:15283344 j:8
> Feb 26 08:57:55 serveerstertje kernel: [39599.517673] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:4 GSO:1 vif->rx_last_skb_slots:0 nr_frags:2 prod:15293937 cons:15293935 j:0
> Feb 26 08:58:07 serveerstertje kernel: [39612.156622] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:3 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15302891 cons:15302889 j:19
> Feb 26 08:58:07 serveerstertje kernel: [39612.400907] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:3 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15303034 cons:15303033 j:0
> Feb 26 08:58:18 serveerstertje kernel: [39623.439383] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:6 GSO:1 vif->rx_last_skb_slots:0 nr_frags:2 prod:15310915 cons:15310911 j:0
> Feb 26 08:58:39 serveerstertje kernel: [39643.521808] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:6 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:15324769 cons:15324766 j:1

> Feb 26 09:27:07 serveerstertje kernel: [41351.622501] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:5 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:16502932 cons:16502932 j:8
> Feb 26 09:27:19 serveerstertje kernel: [41363.541003] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:5 GSO:1 vif->rx_last_skb_slots:0 nr_frags:2 prod:16510837 cons:16510834 j:7
> Feb 26 09:27:23 serveerstertje kernel: [41368.133306] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:5 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:16513940 cons:16513937 j:0
> Feb 26 09:27:43 serveerstertje kernel: [41388.025147] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:3 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:16527870 cons:16527868 j:0
> Feb 26 09:27:47 serveerstertje kernel: [41391.530802] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:5 GSO:1 vif->rx_last_skb_slots:0 nr_frags:2 prod:16530437 cons:16530437 j:7
> Feb 26 09:27:51 serveerstertje kernel: [41395.521166] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:5 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:16533320 cons:16533317 j:6
> Feb 26 09:27:51 serveerstertje kernel: [41395.767066] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:3 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:16533469 cons:16533469 j:0
> Feb 26 09:27:51 serveerstertje kernel: [41395.802319] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:1 GSO:0 vif->rx_last_skb_slots:0 nr_frags:0 prod:16533533 cons:16533533 j:24
> Feb 26 09:27:51 serveerstertje kernel: [41395.837456] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:1 GSO:0 vif->rx_last_skb_slots:0 nr_frags:0 prod:16533534 cons:16533534 j:1
> Feb 26 09:27:51 serveerstertje kernel: [41395.872587] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:3 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:16533597 cons:16533596 j:25
> Feb 26 09:27:51 serveerstertje kernel: [41396.192784] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:3 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:16533833 cons:16533832 j:3
> Feb 26 09:27:51 serveerstertje kernel: [41396.235611] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:3 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:16533890 cons:16533890 j:30
> Feb 26 09:27:51 serveerstertje kernel: [41396.271047] vif vif-7-0 vif7.0: !?!?!?! skb may not fit .. bail out now max_slots_needed:3 GSO:1 vif->rx_last_skb_slots:0 nr_frags:1 prod:16533898 cons:16533896 j:3


> --
> Sander





>>>
>>> Will keep you posted when it triggers again with the extra info in the warn.
>>>
>>> --
>>> Sander
>>>
>>>
>>>
>>>> Thanks
>>>> Annie
>>>>>     Xen reports:
>>>>>     (XEN) [2014-02-18 03:22:47] grant_table.c:1857:d0 Bad grant reference 19791875
>>>>>     (XEN) [2014-02-18 03:42:33] grant_table.c:1857:d0 Bad grant reference 268435460
>>>>>     (XEN) [2014-02-18 04:15:23] grant_table.c:289:d0 Increased maptrack size to 14 frames
>>>>>     (XEN) [2014-02-18 04:15:27] grant_table.c:289:d0 Increased maptrack size to 15 frames
>>>>>     (XEN) [2014-02-18 04:15:48] grant_table.c:289:d0 Increased maptrack size to 16 frames
>>>>>     (XEN) [2014-02-18 04:15:50] grant_table.c:289:d0 Increased maptrack size to 17 frames
>>>>>     (XEN) [2014-02-18 04:15:55] grant_table.c:289:d0 Increased maptrack size to 18 frames
>>>>>     (XEN) [2014-02-18 04:15:55] grant_table.c:289:d0 Increased maptrack size to 19 frames
>>>>>     (XEN) [2014-02-18 04:15:56] grant_table.c:289:d0 Increased maptrack size to 20 frames
>>>>>     (XEN) [2014-02-18 04:15:56] grant_table.c:289:d0 Increased maptrack size to 21 frames
>>>>>     (XEN) [2014-02-18 04:15:59] grant_table.c:289:d0 Increased maptrack size to 22 frames
>>>>>     (XEN) [2014-02-18 04:15:59] grant_table.c:289:d0 Increased maptrack size to 23 frames
>>>>>     (XEN) [2014-02-18 04:16:00] grant_table.c:289:d0 Increased maptrack size to 24 frames
>>>>>     (XEN) [2014-02-18 04:16:05] grant_table.c:289:d0 Increased maptrack size to 25 frames
>>>>>     (XEN) [2014-02-18 04:16:05] grant_table.c:289:d0 Increased maptrack size to 26 frames
>>>>>     (XEN) [2014-02-18 04:16:06] grant_table.c:289:d0 Increased maptrack size to 27 frames
>>>>>     (XEN) [2014-02-18 04:16:12] grant_table.c:289:d0 Increased maptrack size to 28 frames
>>>>>     (XEN) [2014-02-18 04:16:18] grant_table.c:289:d0 Increased maptrack size to 29 frames
>>>>>     (XEN) [2014-02-18 04:17:00] grant_table.c:1857:d0 Bad grant reference 268435460
>>>>>     (XEN) [2014-02-18 04:17:00] grant_table.c:1857:d0 Bad grant reference 268435460
>>>>>     (XEN) [2014-02-18 04:34:03] grant_table.c:1857:d0 Bad grant reference 4325377
>>>>>
>>>>>
>>>>>
>>>>> Another issue with networking is when running both dom0 and domUs with a
>>>>> 3.14-rc3 kernel:
>>>>>     - I can ping the guests from dom0
>>>>>     - I can ping dom0 from the guests
>>>>>     - But I can't ssh in or access things over http
>>>>>     - I don't see any relevant error messages ...
>>>>>     - This is with the same system and kernel config as the 3.14 + 3.13
>>>>>       combination above (which previously worked fine)
>>>>>
>>>>> --
>>>>>
>>>>> Sander
>>>>>
>>>>>
>>>
>>>



> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 4fc46eb..4d720b4 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -1667,6 +1667,8 @@ skip_vfb:
>                  b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_STD;
>              } else if (!strcmp(buf, "cirrus")) {
>                  b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_CIRRUS;
> +            } else if (!strcmp(buf, "none")) {
> +                b_info->u.hvm.vga.kind = LIBXL_VGA_INTERFACE_TYPE_NONE;
>              } else {
>                  fprintf(stderr, "Unknown vga \"%s\" specified\n", buf);
>                  exit(1);
> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
> index 107b000..ab56927 100644
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -265,9 +265,10 @@ get_maptrack_handle(
>      while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) )
>      {
>          nr_frames = nr_maptrack_frames(lgt);
> -        if ( nr_frames >= max_nr_maptrack_frames() )
> +        if ( nr_frames >= max_nr_maptrack_frames() ) {
> +            gdprintk(XENLOG_INFO, "Already at max maptrack size: %u/%u frames\n", nr_frames, max_nr_maptrack_frames());
>              break;
> -
> +        }
>          new_mt = alloc_xenheap_page();
>          if ( !new_mt )
>              break;
> @@ -285,8 +286,8 @@ get_maptrack_handle(
>          smp_wmb();
>          lgt->maptrack_limit      = new_mt_limit;

> -        gdprintk(XENLOG_INFO, "Increased maptrack size to %u frames\n",
> -                 nr_frames + 1);
> +        gdprintk(XENLOG_INFO, "Increased maptrack size to %u/%u frames\n",
> +                 nr_frames + 1, max_nr_maptrack_frames());
>      }

>      spin_unlock(&lgt->lock);
> @@ -1854,7 +1855,7 @@ __acquire_grant_for_copy(

>      if ( unlikely(gref >= nr_grant_entries(rgt)) )
>          PIN_FAIL(unlock_out, GNTST_bad_gntref,
> -                 "Bad grant reference %ld\n", gref);
> +                 "Bad grant reference %ld | %d | %d | %d\n", gref,
> +                 nr_grant_entries(rgt), rgt->gt_version, ldom);

>      act = &active_entry(rgt, gref);
>      shah = shared_entry_header(rgt, gref);
> @@ -2830,15 +2831,19 @@ static void gnttab_usage_print(struct domain *rd)
>      int first = 1;
>      grant_ref_t ref;
>      struct grant_table *gt = rd->grant_table;
> -
> +    unsigned int active = 0;
> +/*
>      printk("      -------- active --------       -------- shared --------\n");
>      printk("[ref] localdom mfn      pin          localdom gmfn     flags\n");
> -
> +*/
>      spin_lock(&gt->lock);

>      if ( gt->gt_version == 0 )
>          goto out;

> +    printk("grant-table for remote domain:%5d (v%d) nr_grant_entries: %d\n",
> +                   rd->domain_id, gt->gt_version, nr_grant_entries(gt));
> +
>      for ( ref = 0; ref != nr_grant_entries(gt); ref++ )
>      {
>          struct active_grant_entry *act;
> @@ -2875,19 +2880,22 @@ static void gnttab_usage_print(struct domain *rd)
>                     rd->domain_id, gt->gt_version);
>              first = 0;
>          }
> -
> +        active++;
>          /*      [ddd]    ddddd 0xXXXXXX 0xXXXXXXXX      ddddd 0xXXXXXX 0xXX */
> -        printk("[%3d]    %5d 0x%06lx 0x%08x      %5d 0x%06"PRIx64" 0x%02x\n",
> -               ref, act->domid, act->frame, act->pin,
> -               sha->domid, frame, status);
> +        /* printk("[%3d]    %5d 0x%06lx 0x%08x      %5d 0x%06"PRIx64" 0x%02x\n",
> +                  ref, act->domid, act->frame, act->pin, sha->domid, frame, status); */
>      }

>   out:
>      spin_unlock(&gt->lock);

> +    printk("grant-table for remote domain:%5d active entries: %d\n",
> +                   rd->domain_id, active);
> +/*
>      if ( first )
>          printk("grant-table for remote domain:%5d ... "
>                 "no active grant table entries\n", rd->domain_id);
> +*/
> +
>  }

>  static void gnttab_usage_print_all(unsigned char key)






> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index e5284bc..6d93358 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -482,20 +482,23 @@ static void xenvif_rx_action(struct xenvif *vif)
>                 .meta  = vif->meta,
>         };

> +       int j=0;
> +
>         skb_queue_head_init(&rxq);

>         while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
>                 RING_IDX max_slots_needed;
>                 int i;
> +               int nr_frags;

>                 /* We need a cheap worse case estimate for the number of
>                  * slots we'll use.
>                  */

>                 max_slots_needed = DIV_ROUND_UP(offset_in_page(skb->data) +
> -                                               skb_headlen(skb),
> -                                               PAGE_SIZE);
> -               for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> +                                               skb_headlen(skb), PAGE_SIZE);
> +               nr_frags = skb_shinfo(skb)->nr_frags;
> +               for (i = 0; i < nr_frags; i++) {
>                         unsigned int size;
>                         size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
>                         max_slots_needed += DIV_ROUND_UP(size, PAGE_SIZE);
> @@ -508,6 +511,9 @@ static void xenvif_rx_action(struct xenvif *vif)
>                 if (!xenvif_rx_ring_slots_available(vif, max_slots_needed)) {
>                         skb_queue_head(&vif->rx_queue, skb);
>                         need_to_notify = true;
> +                       if (net_ratelimit())
> +                               netdev_err(vif->dev, "!?!?!?! skb may not fit .. bail out now max_slots_needed:%d GSO:%d vif->rx_last_skb_slots:%d nr_frags:%d prod:%d cons:%d j:%d\n",
> +                                       max_slots_needed, (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4 || skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) ? 1 : 0, vif->rx_last_skb_slots, nr_frags, vif->rx.sring->req_prod, vif->rx.req_cons, j);
>                         vif->rx_last_skb_slots = max_slots_needed;
>                         break;
>                 } else
> @@ -518,6 +524,7 @@ static void xenvif_rx_action(struct xenvif *vif)
>                 BUG_ON(sco->meta_slots_used > max_slots_needed);

>                 __skb_queue_tail(&rxq, skb);
> +               j++;
>         }

>         BUG_ON(npo.meta_prod > ARRAY_SIZE(vif->meta));
> @@ -541,7 +548,7 @@ static void xenvif_rx_action(struct xenvif *vif)
>                         resp->offset = vif->meta[npo.meta_cons].gso_size;
>                         resp->id = vif->meta[npo.meta_cons].id;
>                         resp->status = sco->meta_slots_used;
> -
> +
>                         npo.meta_cons++;
>                         sco->meta_slots_used--;
>                 }
> @@ -705,7 +712,7 @@ static int xenvif_count_requests(struct xenvif *vif,
>                  */
>                 if (!drop_err && slots >= XEN_NETBK_LEGACY_SLOTS_MAX) {
>                         if (net_ratelimit())
> -                               netdev_dbg(vif->dev,
> +                               netdev_err(vif->dev,
>                                            "Too many slots (%d) exceeding limit (%d), dropping packet\n",
>                                            slots, XEN_NETBK_LEGACY_SLOTS_MAX);
>                         drop_err = -E2BIG;
> @@ -728,7 +735,7 @@ static int xenvif_count_requests(struct xenvif *vif,
>                  */
>                 if (!drop_err && txp->size > first->size) {
>                         if (net_ratelimit())
> -                               netdev_dbg(vif->dev,
> +                               netdev_err(vif->dev,
>                                            "Invalid tx request, slot size %u > remaining size %u\n",
>                                            txp->size, first->size);
>                         drop_err = -EIO;



> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index f9daa9e..67d5221 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -753,6 +753,7 @@ static int xennet_get_responses(struct netfront_info *np,
>                         if (net_ratelimit())
>                                 dev_warn(dev, "rx->offset: %x, size: %u\n",
>                                          rx->offset, rx->status);
> +                               dev_warn(dev, "me here .. cons:%d slots:%d rp:%d max:%d err:%d rx->id:%d rx->offset:%x size:%u ref:%ld\n", cons, slots, rp, max, err, rx->id, rx->offset, rx->status, ref);
>                         xennet_move_rx_slot(np, skb, ref);
>                         err = -EINVAL;
>                         goto next;
> @@ -784,7 +785,7 @@ next:

>                 if (cons + slots == rp) {
>                         if (net_ratelimit())
> -                               dev_warn(dev, "Need more slots\n");
> +                               dev_warn(dev, "Need more slots cons:%d slots:%d rp:%d max:%d err:%d rx-id:%d rx->offset:%x size:%u ref:%ld\n", cons, slots, rp, max, err, rx->id, rx->offset, rx->status, ref);
>                         err = -ENOENT;
>                         break;
>                 }
> @@ -803,7 +804,6 @@ next:

>         if (unlikely(err))
>                 np->rx.rsp_cons = cons + slots;
> -
>         return err;
>  }

> @@ -907,6 +907,7 @@ static int handle_incoming_queue(struct net_device *dev,

>                 /* Ethernet work: Delayed to here as it peeks the header. */
>                 skb->protocol = eth_type_trans(skb, dev);
> +               skb_reset_network_header(skb);

>                 if (checksum_setup(dev, skb)) {
>                         kfree_skb(skb);





_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

