
Re: [Xen-devel] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x217/0x220



On 05/31/2017 10:25 PM, Steven Haigh wrote:
> On 2017-05-31 00:37, Steven Haigh wrote:
>> On 31/05/17 00:18, Boris Ostrovsky wrote:
>>> On 05/30/2017 06:27 AM, Steven Haigh wrote:
>>>> Just wanted to give this a nudge to try and get some suggestions on
>>>> where to go / what to do about this.
>>>>
>>>> On 28/05/17 09:44, Steven Haigh wrote:
>>>>> The last couple of days running on kernel 4.9.29 and 4.9.30 with Xen
>>>>> 4.9.0-rc6 I've had a number of ethernet lock ups that have taken my
>>>>> system off the network.
>>>>>
>>>>> This is a new development - but I'm not sure if it's kernel or xen
>>>>> related.
>>>
>>> Since no one seems to have seen this it would be useful to narrow it
>>> down a bit.
>>>
>>> Do you observe this on rc5? Or with 4.9.28 kernel? Any particular load
>>> that you are using? Do you see this on a specific NIC?
>>
>> This install is currently using xen 4.9-rc7 and kernel 4.9.30. I would
>> say that there may be a connection between disk activity and the
>> ethernet adapter locking up - but I haven't been able to prove this in
>> any valid way yet.
>>
>> I am currently running this script on the server in question to try and
>> get a log of how often the adapter locks up. I only added the logger
>> line tonight - so I don't have a great deal of historical data to add as
>> yet.
>>
>> #!/bin/bash
>> # Ping 10.1.1.2 every 5 seconds; on failure, log it and reset the NIC.
>> while true; do
>>         if ! ping -c1 10.1.1.2 > /dev/null 2>&1; then
>>                 logger 'No response. Resetting enp5s0'
>>                 mii-tool -R enp5s0
>>         fi
>>         sleep 5
>> done
>
> Just to keep kicking this along a little bit, my logs so far have shown:
> messages:May 31 00:20:10 No response. Resetting enp5s0
> messages:May 31 04:20:08 No response. Resetting enp5s0
> messages:May 31 12:21:37 No response. Resetting enp5s0
>
> It's almost spooky that it's nearly 20 minutes past the hour on each reset.
>
> I've checked against the cron logs, but I can't find anything that
> would be scheduled on the Dom0 at that time.
>
> The logs also show that after running mii-tool to reset the ethernet
> adapter, connectivity has returned straight away.
>
> The network adapter uses the r8169 kernel module, and shows as:
> 05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
>
> I have a DomU backup script that runs *in* a DomU at 01:00 each night
> - that causes a lot of disk activity - but alas, that time hasn't
> lined up with anything as yet...
>
> Still seem to be fidgeting in the dark :(
>
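
The "nearly 20 minutes past the hour" pattern could be checked mechanically as more resets accumulate. The following is only a rough sketch: it assumes the grep-style line layout quoted above (timestamp in the third whitespace-separated field), and reset_minutes is a made-up helper name, not anything from the original script.

```shell
#!/bin/bash
# Print the minute-of-hour for each logged reset.
# Assumes lines shaped like the quoted output:
#   "messages:Mon DD HH:MM:SS No response. Resetting enp5s0"
# reset_minutes is a hypothetical helper name.
reset_minutes() {
    # Field 3 is the HH:MM:SS timestamp; keep only the MM part.
    awk '/Resetting enp5s0/ { split($3, t, ":"); print t[2] }'
}

# Example input: the three entries from the quoted log.
reset_minutes <<'EOF'
messages:May 31 00:20:10 No response. Resetting enp5s0
messages:May 31 04:20:08 No response. Resetting enp5s0
messages:May 31 12:21:37 No response. Resetting enp5s0
EOF
```

Piping the result through `sort | uniq -c` would show whether the minutes keep clustering or whether three samples were just a coincidence.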

Since you've already observed this problem with rc6 and 4.9.29, wouldn't
it be more useful to go backwards to narrow down where the problem first
occurred? I am not sure how moving to rc7 and 4.9.30 is going to help
unless you think this is a temporary regression.
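
If you do step back through older versions, it may help to tag each logged reset with the kernel that was running, so entries from different builds can be told apart afterwards. A minimal sketch, assuming the logger-based script quoted above; reset_message is a hypothetical helper, and the Xen version is not captured here and would have to be noted separately.

```shell
#!/bin/bash
# Build the syslog message for a reset, tagged with the kernel release.
# reset_message is a hypothetical helper, not part of the original script.
reset_message() {
    local iface="$1"
    # Default to the running kernel; a second argument overrides it.
    local kernel="${2:-$(uname -r)}"
    printf 'No response. Resetting %s (kernel %s)\n' "$iface" "$kernel"
}

# In the watchdog loop this would replace the fixed string, e.g.:
#   logger "$(reset_message enp5s0)"
reset_message enp5s0 4.9.29
```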

-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

