
Re: [Xen-devel] CONFIG_NO_HZ + CONFIG_CPU_IDLE freeze the system (Was Re: [PATCH] acpi : remove power from acpi_processor_cx structure)



On 09/07/2012 07:22 PM, John Stultz wrote:
> On 09/07/2012 07:20 AM, Daniel Lezcano wrote:
>> On 09/06/2012 11:18 PM, Rafael J. Wysocki wrote:
>>> On Thursday, September 06, 2012, Daniel Lezcano wrote:
>>>> On 09/06/2012 10:04 PM, Rafael J. Wysocki wrote:
>>>>> On Thursday, September 06, 2012, Daniel Lezcano wrote:
>>>>>> On 09/06/2012 09:54 AM, Daniel Lezcano wrote:
>>>>>> I ran into this issue because NETCONSOLE is set; disabling it
>>>>>> allowed me to go further.
>>>>>>
>>>>>> Unfortunately I am seeing random freezes on the system, which
>>>>>> seem to be related to CONFIG_NO_HZ=y and CONFIG_CPU_IDLE=y.
>>>>>>
>>>>>> Disabling either one makes the freezes disappear.
>>>>>>
>>>>>> Is it a known issue?
>>>>> Well, there are systems having problems with this configuration,
>>>>> but they should be exceptional.  What system is that?
>>>> It is a T61p laptop with a Core 2 Duo T9500. Nothing exceptional, I
>>>> believe. Has anyone else hit the same issue?
>>> Is it a regression for you?
>> Yes, I think so. The issue appears between v3.5 and v3.6-rc1.
>>
>> It is not easy to reproduce but after taking some time to dig, it seems
>> to appear with this commit:
>>
>> 1e75fa8be9fb61e1af46b5b3b176347a4c958ca1 is the first bad commit
>> commit 1e75fa8be9fb61e1af46b5b3b176347a4c958ca1
>> Author: John Stultz <john.stultz@xxxxxxxxxx>
>> Date:   Fri Jul 13 01:21:53 2012 -0400
>>
>>      time: Condense timekeeper.xtime into xtime_sec
>>
>>      The timekeeper struct has a xtime_nsec, which keeps the
>>      sub-nanosecond remainder.  This ends up being somewhat
>>      duplicative of the timekeeper.xtime.tv_nsec value, and we
>>      have to do extra work to keep them apart, copying the full
>>      nsec portion out and back in over and over.
>>
>>      This patch simplifies some of the logic by taking the timekeeper
>>      xtime value and splitting it into timekeeper.xtime_sec and
>>      reuses the timekeeper.xtime_nsec for the sub-second portion
>>      (stored in higher res shifted nanoseconds).
>>
>>      This simplifies some of the accumulation logic. And will
>>      allow for more accurate timekeeping once the vsyscall code
>>      is updated to use the shifted nanosecond remainder.
>>
>>      Signed-off-by: John Stultz <john.stultz@xxxxxxxxxx>
>>      Reviewed-by: Ingo Molnar <mingo@xxxxxxxxxx>
>>      Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
>>      Cc: Richard Cochran <richardcochran@xxxxxxxxx>
>>      Cc: Prarit Bhargava <prarit@xxxxxxxxxx>
>>      Link:
>> http://lkml.kernel.org/r/1342156917-25092-5-git-send-email-john.stultz@xxxxxxxxxx
>>
>>      Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>
>> :040000 040000 4d6541ac1f6075d7adee1eef494b31a0cbda0934
>> dc5708bc738af695f092bf822809b13a1da104b6 M    kernel
>>
>> How to reproduce: on a T61p laptop with a Core 2 Duo, I boot the
>> kernel into busybox and wait a few minutes before typing something on
>> the console. At that point nothing appears on the console, and the
>> characters are echoed several seconds later (could be 1, 5, or 10 secs
>> or more).
>>
>> That happens when CONFIG_CPU_IDLE and CONFIG_NO_HZ are both set. With
>> either one disabled, the issue does not appear.
> 
> Thanks for bisecting this down and the heads up!
> 
> Right off I can't see what might be causing this.  Bunch of questions:
> 
> Is this a 32 or 64 bit kernel?

It is a 32 bit kernel.

> By your description above, it sounds like the system is still
> functioning, but there's just a high latency for key-input. Is that right?

Yes, that's correct, but it is not only key input. During the freeze I
can't ping the host. Once the output is echoed, ping works again.

But if I ping the host continuously, it does not freeze and the console
echoes without problems.

> Are other things on the system happening slowly?

I have a very minimal system, but at first glance no, not when it is not
frozen.

> Does generating interrupts by hitting/holding down the ctrl key make the
> system respond faster?

no.

> Is there any dmesg output near when it occurs?

no.

> If you don't wait that minute after boot before typing anything, does it
> still trigger later? (or is it tied to early boot?)

That depends; it can happen immediately or later. It is more or less
random.

> On a whim, does the patch below avoid the problem?

Nope, same issue :/

Thanks
  -- Daniel

> 
> thanks
> -john
> 
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 34e5eac..2fa0e52 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -1179,6 +1179,7 @@ static void update_wall_time(void)
>      timekeeping_adjust(tk, offset);
>  
>  
> +#if 0
>      /*
>      * Store only full nanoseconds into xtime_nsec after rounding
>      * it up and add the remainder to the error difference.
> @@ -1192,6 +1193,7 @@ static void update_wall_time(void)
>      tk->xtime_nsec -= remainder;
>      tk->xtime_nsec += 1ULL << tk->shift;
>      tk->ntp_error += remainder << tk->ntp_error_shift;
> +#endif
>  
>      /*
>       * Finally, make sure that after the rounding
> 
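For anyone following along, here is a rough userspace sketch of what the
bisected commit message describes (whole seconds in xtime_sec, the
sub-second part in xtime_nsec as shifted nanoseconds) and of the rounding
step that the test patch above wraps in #if 0. This is not the kernel
code: the struct and the helper names (tk_sketch, tk_accumulate,
tk_round_up, tk_full_nsec) are hypothetical, and the way the remainder is
computed is an assumption, since the quoted hunks do not show that line.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical stand-in for the few timekeeper fields discussed here. */
struct tk_sketch {
    uint64_t xtime_sec;        /* whole seconds */
    uint64_t xtime_nsec;       /* sub-second part, stored as ns << shift */
    uint32_t shift;            /* clocksource shift */
    int64_t  ntp_error;        /* accumulated error, in shifted units */
    uint32_t ntp_error_shift;
};

/* Add shifted nanoseconds and carry whole seconds into xtime_sec. */
static void tk_accumulate(struct tk_sketch *tk, uint64_t shifted_ns)
{
    assert(tk->shift < 32);    /* keeps NSEC_PER_SEC << shift in 64 bits */
    uint64_t sec_shifted = NSEC_PER_SEC << tk->shift;

    tk->xtime_nsec += shifted_ns;
    while (tk->xtime_nsec >= sec_shifted) {
        tk->xtime_nsec -= sec_shifted;
        tk->xtime_sec++;
    }
}

/*
 * Rounding step mirrored from the lines inside the #if 0 above:
 * drop the sub-nanosecond remainder, round up to the next full
 * nanosecond, and add the remainder to the error difference.
 * The remainder computation itself is assumed.
 */
static uint64_t tk_round_up(struct tk_sketch *tk)
{
    uint64_t remainder = tk->xtime_nsec & ((1ULL << tk->shift) - 1);

    tk->xtime_nsec -= remainder;
    tk->xtime_nsec += 1ULL << tk->shift;
    tk->ntp_error += (int64_t)(remainder << tk->ntp_error_shift);
    return remainder;
}

/* Full nanoseconds currently held in the sub-second part. */
static uint64_t tk_full_nsec(const struct tk_sketch *tk)
{
    return tk->xtime_nsec >> tk->shift;
}
```

With shift = 8, accumulating one second plus 5 ns plus 3 sub-ns units
leaves xtime_sec at 1 and 5 full nanoseconds; tk_round_up() then moves
xtime_nsec to exactly 6 << 8 and adds the 3 leftover units (scaled by
ntp_error_shift) to ntp_error. The test patch skips that step, so the
sub-nanosecond remainder would simply stay in xtime_nsec.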

