
HWP and ACPI workarounds


  • To: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Jason Andryuk <jandryuk@xxxxxxxxx>
  • Date: Tue, 14 Feb 2023 14:04:28 -0500
  • Delivery-date: Tue, 14 Feb 2023 19:04:58 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Hi,

Qubes recently incorporated my HWP patches, but there was a report of
a laptop, Thinkpad X1 Carbon Gen 4 with a Skylake processor, locking
up during boot when HWP is enabled.  A user found a kernel bug that
seems to be the same issue:
https://bugzilla.kernel.org/show_bug.cgi?id=110941.

That bug was fixed by Linux commit a21211672c9a ("ACPI / processor:
Request native thermal interrupt handling via _OSC").  The commit
message has a good summary of the issue and is included at the end of
this message.  The tl;dr is SMM crashes when it receives thermal
interrupts, so Linux calls the ACPI _OSC method to take over interrupt
handling.

Today, Linux calls the _OSC method when boot_cpu_has(X86_FEATURE_HWP),
but that is not exposed to the PV Dom0.  As a test, the Qubes user was
able to boot with the check expanded to `boot_cpu_has(X86_FEATURE_HWP)
|| xen_initial_domain()`.
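
A minimal sketch of that expanded check, with the two kernel predicates
replaced by plain boolean inputs so it compiles on its own.  The function
and parameter names here are hypothetical and are not kernel API; only the
shape of the condition comes from the test described above.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the kernel condition
 *   boot_cpu_has(X86_FEATURE_HWP) || xen_initial_domain()
 * cpu_has_hwp:     what the kernel sees for the HWP feature bit
 *                  (false in a PV Dom0, since Xen does not expose it).
 * is_xen_dom0:     whether the kernel is running as Xen's initial domain.
 * Returns true when the early _OSC request should be made.
 */
static bool should_request_native_thermal_lvt(bool cpu_has_hwp,
                                              bool is_xen_dom0)
{
    return cpu_has_hwp || is_xen_dom0;
}
```

With this condition a PV Dom0 makes the _OSC request even though the HWP
bit is hidden, at the cost of also doing so on hosts where Xen is not
using HWP at all.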

We need some way for Xen to indicate the presence and/or use of HWP to
Dom0, and Dom0 needs to use that to call _OSC.

My first idea is that Dom0 could query Xen's cpufreq driver.  However,
Xen exposes the cpufreq driver through the unstable sysctl ops, and
using an unstable hypercall seems wrong for the kernel.

Can we add something to an existing hypercall - maybe platform_op?  Or
do we need a new stable hypercall?

Linux performs the _OSC calls unilaterally upon seeing FEATURE_HWP,
independently of whether it actually uses HWP via the intel_pstate
driver.  However, not using HWP may be an untested configuration in
practice.
The intel_pstate.c driver will not use HWP when FEATURE_HWP_EPP is not
found.  So we could potentially cheat and expose only HWP to Dom0.
That should trigger the _OSC calls without letting Dom0 think it can
use HWP.  This is rather fragile though, so a more explicit method
seems better.
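
To illustrate why exposing only HWP would have that effect, here is a
simplified model of the two decisions.  It is a sketch under the
assumptions stated in this message (the _OSC request keys off HWP alone,
and intel_pstate needs HWP_EPP before it uses HWP); the function names
are hypothetical and the real driver checks more than this.

```c
#include <stdbool.h>

/* Modelled decision: the early _OSC request depends only on HWP. */
static bool dom0_requests_osc(bool sees_hwp, bool sees_hwp_epp)
{
    (void)sees_hwp_epp;  /* not consulted for the _OSC request */
    return sees_hwp;
}

/* Modelled decision: HWP is only used when HWP_EPP is also visible. */
static bool dom0_uses_hwp(bool sees_hwp, bool sees_hwp_epp)
{
    return sees_hwp && sees_hwp_epp;
}
```

With HWP visible and HWP_EPP hidden, the first function returns true and
the second returns false, which is the combination wanted here.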

Roger's ACPI Processor patches that add xen_sanitize_pdc calls could
be leveraged.  On the Xen side, arch_acpi_set_pdc_bits() could be
extended to set bit 12, which would then be passed on when _PDC is
evaluated.  _PDC is the older interface superseded by _OSC, but they
can be wrappers around the same implementation.  Still, if Linux is
just using _OSC, it seems more compatible to follow that implementation.
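
For reference, setting bit 12 in a capabilities dword could look like the
sketch below.  The macro and function names are hypothetical, and where
the dword sits inside the _PDC or _OSC buffer is deliberately left out;
only the bit number comes from the commit message quoted at the end of
this message.

```c
#include <stdint.h>

/* Bit 12: OS handles CPPC notification interrupts natively. */
#define CAP_NATIVE_CPPC_INTERRUPTS (UINT32_C(1) << 12)

/*
 * Hypothetical helper: return the capabilities dword with bit 12 set,
 * leaving every other capability bit untouched.
 */
static uint32_t caps_with_native_cppc(uint32_t caps)
{
    return caps | CAP_NATIVE_CPPC_INTERRUPTS;
}
```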

Thoughts?

Thanks,
Jason

commit a21211672c9a1d730a39aa65d4a5b3414700adfb
Author: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx>
Date:   Wed Mar 23 21:07:39 2016 -0700

    ACPI / processor: Request native thermal interrupt handling via _OSC

    There are several reports of freeze on enabling HWP (Hardware PStates)
    feature on Skylake-based systems by the Intel P-states driver. The root
    cause is identified as the HWP interrupts causing BIOS code to freeze.

    HWP interrupts use the thermal LVT which can be handled by Linux
    natively, but on the affected Skylake-based systems SMM will respond
    to it by default.  This is a problem for several reasons:
     - On the affected systems the SMM thermal LVT handler is broken (it
       will crash when invoked) and a BIOS update is necessary to fix it.
     - With thermal interrupt handled in SMM we lose all of the reporting
       features of the arch/x86/kernel/cpu/mcheck/therm_throt driver.
     - Some thermal drivers like x86-package-temp depend on the thermal
       threshold interrupts signaled via the thermal LVT.
     - The HWP interrupts are useful for debugging and tuning
       performance (if the kernel can handle them).
    The native handling of thermal interrupts needs to be enabled
    because of that.

    This requires some way to tell SMM that the OS can handle thermal
    interrupts.  That can be done by using _OSC/_PDC in processor
    scope very early during ACPI initialization.

    The meaning of _OSC/_PDC bit 12 in processor scope is whether or
    not the OS supports native handling of interrupts for Collaborative
    Processor Performance Control (CPPC) notifications.  Since on
    HWP-capable systems CPPC is a firmware interface to HWP, setting
    this bit effectively tells the firmware that the OS will handle
    thermal interrupts natively going forward.

    For details on _OSC/_PDC refer to:
    
http://www.intel.com/content/www/us/en/standards/processor-vendor-specific-acpi-specification.html

    To implement the _OSC/_PDC handshake as described, introduce a new
    function, acpi_early_processor_osc(), that walks the ACPI
    namespace looking for ACPI processor objects and invokes _OSC for
    them with bit 12 in the capabilities buffer set and terminates the
    namespace walk on the first success.

    Also modify intel_thermal_interrupt() to clear HWP status bits in
    the HWP_STATUS MSR to acknowledge HWP interrupts (which prevents
    them from firing continuously).
