
Re: [PATCH] x86/S3: restore MCE (APs) and add MTRR (BSP) init



On Wed, Mar 04, 2026 at 03:47:14PM +0100, Jan Beulich wrote:
> On 04.03.2026 15:36, Marek Marczykowski wrote:
> > On Wed, Mar 04, 2026 at 02:39:01PM +0100, Jan Beulich wrote:
> >> MCE init for APs was broken when CPU feature re-checking was added. MTRR
> >> (re)init for the BSP looks to never have been there on the resume path.
> >>
> >> Fixes: bb502a8ca592 ("x86: check feature flags after resume")
> >> Reported-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> >> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> >> ---
> >> Sadly we need to go by CPU number (zero vs non-zero) here. See the call
> >> site of recheck_cpu_features() in enter_state().
> > 
> > With this patch, I now see the "Thermal monitoring enabled" on resume
> > also for AP.
> > And then, the "Temperature above threshold" + "Running in modulated
> > clock mode" for AP too. But, I don't see matching "Temperature/speed
> > normal" for any of them...
> 
> Which would imply that for each CPU you see at most one such message after
> resume. Can you confirm this? 

For the current test, yes. I got the messages for CPUs 16, 6, 18, 4, 2 -
in this order. Not for 0, 8-15 or 20-21. Not sure about CPU0, but for
others it kinda looks like I got it for P cores, but not E cores? But
I'm not sure how to reliably distinguish them - I base it on the holes
in numbering due to smt=off. Specifically I have online CPUs:
0,2,4,6,8-16,18,20-21 (yeah, weird ordering...).
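
To make the "holes in numbering" guess concrete, here is a rough sketch that expands the online CPU list and splits it by a heuristic. The heuristic itself is an assumption, not something confirmed for this machine: with smt=off, an even-numbered CPU whose odd neighbour is offline is treated as a P core with a parked sibling, and everything else as an E core. The function names are made up for illustration.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '0,2,4,6,8-16,18,20-21' into sorted ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def guess_core_types(online):
    """Heuristic guess only (assumption): with smt=off, an even CPU whose
    odd neighbour (n + 1) is offline is taken to be a P core with a parked
    sibling thread; every other online CPU is taken to be an E core."""
    online_set = set(online)
    p_cores = [c for c in online if c % 2 == 0 and (c + 1) not in online_set]
    e_cores = [c for c in online if c not in p_cores]
    return p_cores, e_cores


online = parse_cpu_list("0,2,4,6,8-16,18,20-21")
p_cores, e_cores = guess_core_types(online)
reported = [16, 6, 18, 4, 2]  # CPUs that logged thermal messages in this test

print("likely P:", p_cores)
print("likely E:", e_cores)
print("reported but not guessed P:", [c for c in reported if c not in p_cores])
print("guessed P but silent:", [c for c in p_cores if c not in reported])
```

Under that assumption the likely P set is 0, 2, 4, 6, 16, 18, so every CPU that reported falls in it and CPU0 is the only silent one. A proper answer would have to come from the CPU topology rather than from numbering gaps.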

> (Generally for every CPU they should be
> alternating, but appear no more frequently than every 5 seconds. Albeit I
> can't help the impression that it is possible for the current state to not
> be reflected by the most recently seen message, for a potentially
> indefinite period of time.)
> 
> > My simple performance test says it's okay for now, though. I'll see how
> > it looks in a few hours...
> 
> I actually don't expect the change here to make a difference in that
> regard. intel_thermal_interrupt() exists only for reporting purposes.
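
The stale-message concern in the parenthetical above (alternating messages, at most one per 5 seconds) can be shown with a toy model. This is only a sketch of the behaviour as described in this thread, not the Xen code; the class and its names are hypothetical.

```python
RATE_LIMIT_S = 5.0  # minimum spacing between messages, as described above


class ThermalReporter:
    """Toy per-CPU reporter; hypothetical names, not the Xen implementation."""

    def __init__(self):
        self.next_allowed = 0.0
        self.last_logged = None  # state named by the most recent message
        self.log = []

    def interrupt(self, now, hot):
        # Events arriving inside the rate-limit window are dropped silently.
        if now < self.next_allowed:
            return
        # Only log on a change, so messages alternate.
        if hot == self.last_logged:
            return
        self.next_allowed = now + RATE_LIMIT_S
        self.last_logged = hot
        self.log.append((now, "above threshold" if hot else "normal"))


r = ThermalReporter()
r.interrupt(0.0, hot=True)   # logged
r.interrupt(2.0, hot=False)  # dropped: within 5 s of the previous message
# No further interrupt arrives, so the log never catches up.
actual_hot = False

print(r.log)
print("last message says hot:", r.last_logged, "| actual:", actual_hot)
```

In this model the only message ever printed is "above threshold", even though the CPU went back to normal two seconds later. That would be consistent with seeing a single message per CPU and no matching "normal" line.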

Yeah, it's too soon to say definitively, but just after resume the test
reported a stable 6ms, and now (~30min later) it's at 12-14ms.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab



 

