[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: xen 4.14.3 incorrect (~3x) cpu frequency reported


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Mon, 10 Jan 2022 13:37:11 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=LTQ0SJcaijGjmXCKZShy1mI/QvewbPnvLZHrHZ+oB3Q=; b=BLQ37xm3McVAJP/Jb+8TRd8EVRQetu20BgeiaVXOwaX9MPPWzZ0/ZHhS8wU4gHbvuh+b6f4EGWDkPPasKSvmWMkesXpGRwaCo45cCGx2S3WruPeBZ7t+8WCL3JKS68AkyAslQjIg+UT9BVJUrFTRvX9GTb91iToXuyTLPedrPKseMVOOhA9gy1Oj5bmjw0LHWV4z8XMQNlVN+BUAGDLRgJ/LH304Nq9J3ratLtA2CXvAc5F+HEJkfE6Ch+5krMdkTtxSxfouMgix3rtFQv9DUia5Ry8juCbcPwsgwmt8vtpY77t6tAFs7dM0GbVRfYIuMee5G0Sx73xKvwfg2Oex7A==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=b5e4xOl45cCTqThyWWbaSQu0kYfFQFn3MSHn4uLoVRqq81M4bAZUT5Y9S5aWtav8cXXburc+DGawONaZkWSa5RSMOLXI/bnjnw/usnRgofGFTydX55j1EKx5IyJlcrASiiH605CHgDjt3INHJ2raXKvBuOeNixCBYrS5sY9gbf2D9xh6ivJ7eA5PnlDdZTmlJy+xwxW/uVSX4fXX2ZFv9MlxC60Gm3sJq5vbr8hbkd/yiC/jHc1qNGbomhVStC4jJFqJTNP6BOTxzml84BwerZiR4YzDHJPbgX0aDrHzZ+nXIGvrTXPTNkv7FnAnpXJsPqS7EtOOu7wzM0PRHuqqkA==
  • Authentication-results: esa4.hc3370-68.iphmx.com; dkim=pass (signature verified) header.i=@citrix.onmicrosoft.com
  • Cc: James Dingwall <james-xen@xxxxxxxxxxxxxx>, <alexander.rossa@xxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 10 Jan 2022 12:37:29 +0000
  • Ironport-data: A9a23:7x+0BKnycPpLyLiA0njjAqLo5gwOIURdPkR7XQ2eYbSJt1+Wr1Gzt xIfDWiEOqrYYWH1ftsiYImw/EsCusTWy4A1SVdv+3g1FiMWpZLJC+rCIxarNUt+DCFioGGLT Sk6QoOdRCzhZiaE/n9BClVlxJVF/fngqoDUUYYoAQgsA180IMsdoUg7wbRh29cw2YLR7z6l4 rseneWOYDdJ5BYsWo4kw/rrRMRH5amaVJsw5zTSVNgT1LPsvyB94KE3fMldG0DQUIhMdtNWc s6YpF2PEsE1yD92Yj+tuu6TnkTn2dc+NyDW4pZdc/DKbhSvOkXee0v0XRYRQR4/ttmHozx+4 IRXi5eIcToYB7LnuPs4f0dWHgYiJ6ITrdcrIVDn2SCS50jPcn+qyPRyFkAme4Yf/46bA0kXq 6ZecmpUKEne2aTmm9pXScE17ignBNPsM44F/Glp0BnSDOo8QICFSKLPjTNd9Glr2Z4VR62BD yYfQRxGaRGQWV4QAXQSEYAijtWstECnIhQN/Tp5ooJoujOOnWSdyoPFOd7YatWMSchP2Fyfv XnP12LyGFcRM9n34SqI9Degi/HCmQv/WZkOD/uo+/hymlqRy2cPThoMWjOTo/O0l0q/UNJ3M FEP92wlqq1a3E6iS9TmGg21plaIvxgTRNNUF6s/5UeQycLpDx2xXzZeCGQbMZp/6ZFwFWdCO kK1c83BVCZRu4WtdEOk6Zy0shaeZyYYCjMYanpRJeca2OXLrIY2hxPJa99sFq+pk9H4cQ3NL yC2QDsW3OtK05NSv0mv1RWe2m/3+MCVJuIgzliPBgqYAhVFiJlJjmBCwXzS9r5+IYmQVTFtV 1BUypHFvIji4Xxg/RFhodnh/pn0vZ5p0xWG2DaD+qXNERz2ohZPmqgKsFlDyL9BaJpsRNMQS Ba7VfltzJFSJmC2SqR8fpi8Dc8npYC5S4i8Ca2PM4ETMskrHONiwM2ITRXJt4wKuBJ9+ZzTx L/BKZr8ZZrkIfkPIMWKqxc1juZwm3FWKZL7TpHn1RW3uYdyl1bOIYrpxGCmN7hjhIvd+V292 48Ga6OilksDOMWjPHi/2dNDfDgicClgbbir+pM/SwJ2Clc8cI3XI6WPkepJlk0Mt/k9q9okC VnmCxAIkwSu1CSXQehIA1g6AI7SsV9EhStTFQQnPEqy2mhlZoCq7awFcIAwc6Vh/+tmpcOYh dFcEylZKvgQGDnB5RoHapzx8N5reBix3FrcNCu5ejkvOZVnQlWRqNPjewLu8ggIDza26pRi8 +HxiFuDTMpRXRlmAebXdOmrkwG7s08Clb8gREDPONRSJhnhqdA4Nyzrg/YrCMgQMhGflCCC3 gObDE5A9+nAqoM46vfTgqWAo9v7GudyBBMCTWLa8ay3JW/R+W/6md1MV+OBfDb8UmLo+fr9O bUJnq+kaPBexQREqYtxFbpv3JkS3dq3qu8I1BlgEVXKc0+vVuFqLE6Z0JQdraZK3LJY51e7A xrd5tlANLyVE8r5C1pNdhE9Z+GO2PxIyDnf6fM5fBfz6CNtpefVVExTO1+HiTBHLaszO4Qgm L9ztMkT4g25qxwrLtfZ0XwEqzXSdiQNA/c9q5UXII73kQ56mFhNbKvVBjLy/JzSOc5HNVMnI 2PMiafP71iGKpEur5bn+aDx4Ndg
  • Ironport-hdrordr: A9a23:2t8/5qoQYgQP9Sb5b4Ok8NsaV5o9eYIsimQD101hICG8cqSj+f xG/c5rsiMc5wxwZJhNo7y90cq7MBfhHPxOkOos1N6ZNWGM0gaVxelZnO7fKlbbehEWmNQz6U 4ZSdkdNOHN
  • Ironport-sdr: 4a0kuLaA8bK1vC0VvN6JMCop2KY3wfZzlxsPtIYYIi8bwo6qkXa/G1TuEQvG90ycRSX05a48tt bLCADmUgOBA1QDgaUEGde3IZL82HdQlxKsb1PnwA93EkS4awYlaUx/cSg6IBG8lkKSIrzNsDy5 xZIbyXiltBd90xgX05FS4zj1g9hdMTs8lMJs7DQYapociFDt5Be5rqyieO/ZZ3LuhgvUFu9vUO XeFZD/Q2YkBFF9w7nevlkQ/g+W0IEuIg2KG24OQWLHiQTB8YgU7rog5li4Ne+oYSXPrgYnDCTI dnbwlHMsB4TdWgrJ98Vunt1Z
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Jan 07, 2022 at 12:39:04PM +0100, Jan Beulich wrote:
> On 06.01.2022 16:08, James Dingwall wrote:
> >>> On Wed, Jul 21, 2021 at 12:59:11PM +0200, Jan Beulich wrote:              
> >>>                                                               
> >>>> On 21.07.2021 11:29, James Dingwall wrote:                               
> >>>>                                                               
> >>>>> We have a system which intermittently starts up and reports an 
> >>>>> incorrect cpu frequency:                                               
> > ...
> >>> I'm sorry to ask, but have you got around to actually doing that? Or
> >>> else is resolving this no longer of interest?
> > 
> > We have experienced an occurence of this issue on 4.14.3 with 'loglvl=all'
> > present on the xen command line.  I have attached the 'xl dmesg' output for
> > the fast MHz boot, the diff from the normal case is small so I've not added
> > that log separately:
> > 
> > --- normal-mhz/xl-dmesg.txt     2022-01-06 14:13:47.231465234 +0000
> > +++ funny-mhz/xl-dmesg.txt      2022-01-06 13:45:43.825148510 +0000
> > @@ -211,7 +211,7 @@
> >  (XEN)  cap enforcement granularity: 10ms
> >  (XEN) load tracking window length 1073741824 ns
> >  (XEN) Platform timer is 24.000MHz HPET
> > -(XEN) Detected 2294.639 MHz processor.
> > +(XEN) Detected 7623.412 MHz processor.
> >  (XEN) EFI memory map:
> >  (XEN)  0000000000000-0000000007fff type=3 attr=000000000000000f
> >  (XEN)  0000000008000-000000003cfff type=7 attr=000000000000000f
> 
> Below is a patch (suitably adjusted for 4.14.3) which I would hope can
> take care of the issue (assuming my vague guess on the reasons wasn't
> entirely off). It has some debugging code intentionally left in, and
> it's also not complete yet (other timer code needing similar
> adjustment). Given the improvements I've observed independent of your
> issue, I may not wait with submission until getting feedback from you,
> since - aiui - it may take some time for you to actually run into a
> case where the change would actually make an observable difference.
> 
> Jan
> 
> x86: improve TSC / CPU freq calibration accuracy
> 
> While the problem report was for extreme errors, even smaller ones would
> better be avoided: The calculated period to run calibration loops over
> can (and usually will) be shorter than the actual time elapsed between
> first and last platform timer and TSC reads. Adjust values returned from
> the init functions accordingly.
> 
> On a Skylake system I've tested this on accuracy (using HPET) went from
> detecting in some cases more than 220kHz too high a value to about
> ±1kHz. On other systems the original error range was much smaller, with
> less (in some cases only very little) improvement.
> 
> Reported-by: James Dingwall <james-xen@xxxxxxxxxxxxxx>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> TBD: Do we think we need to guard against the bizarre case of
>      "target + count" overflowing (i.e. wrapping)?

I also wonder whether a value of target close enough to the wrapping
point could cause the loop to stuck indefinitely, as count would
overflow and thus the condition won't be meet.

> TBD: Accuracy could be slightly further improved by using a (to be
>      introduced) rounding variant of muldiv64().
> TBD: I'm not entirely sure how useful the conditionals are - there
>      shouldn't be any inaccuracies from the division when count equals
>      target (upon entry to the conditionals), as then the divisor is
>      what the original value was just multiplied by.
> 
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -378,8 +378,9 @@ static u64 read_hpet_count(void)
>  
>  static int64_t __init init_hpet(struct platform_timesource *pts)
>  {
> -    uint64_t hpet_rate, start;
> +    uint64_t hpet_rate, start, expired;

Might be better named elapsed rather than expired?

It doesn't store the end of loop TSC value, but the delta between
start and end.

>      uint32_t count, target;
> +unsigned int i;//temp
>  
>      if ( hpet_address && strcmp(opt_clocksource, pts->id) &&
>           cpuidle_using_deep_cstate() )
> @@ -415,16 +416,35 @@ static int64_t __init init_hpet(struct p
>  
>      pts->frequency = hpet_rate;
>  
> +for(i = 0; i < 16; ++i) {//temp
>      count = hpet_read32(HPET_COUNTER);
>      start = rdtsc_ordered();
>      target = count + CALIBRATE_VALUE(hpet_rate);
>      if ( target < count )
>          while ( hpet_read32(HPET_COUNTER) >= count )
>              continue;
> -    while ( hpet_read32(HPET_COUNTER) < target )
> +    while ( (count = hpet_read32(HPET_COUNTER)) < target )
>          continue;
>  
> -    return (rdtsc_ordered() - start) * CALIBRATE_FRAC;
> +    expired = rdtsc_ordered() - start;

There's also a window between the HPET read and the TSC read where an
SMI/NMI could cause a wrong frequency detection.

Thanks, Roger.



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.