
Re: HWP and ACPI workarounds


  • To: Jason Andryuk <jandryuk@xxxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Wed, 15 Feb 2023 10:50:32 +0100
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Wed, 15 Feb 2023 09:50:50 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 14.02.2023 20:04, Jason Andryuk wrote:
> Qubes recently incorporated my HWP patches, but there was a report of
> a laptop, Thinkpad X1 Carbon Gen 4 with a Skylake processor, locking
> up during boot when HWP is enabled.  A user found a kernel bug that
> seems to be the same issue:
> https://bugzilla.kernel.org/show_bug.cgi?id=110941.
> 
> That bug was fixed by Linux commit a21211672c9a ("ACPI / processor:
> Request native thermal interrupt handling via _OSC").  The commit
> message has a good summary of the issue and is included at the end of
> this message.  The tl;dr is SMM crashes when it receives thermal
> interrupts, so Linux calls the ACPI _OSC method to take over interrupt
> handling.
> 
> Today, Linux calls the _OSC method when boot_cpu_has(X86_FEATURE_HWP),
> but that is not exposed to the PV Dom0.  As a test, the Qubes user was
> able to boot with the check expanded to `boot_cpu_has(X86_FEATURE_HWP)
> || xen_initial_domain()`.
> 
> We need some way for Xen to indicate the presence and/or use of HWP to
> Dom0, and Dom0 needs to use that to call _OSC.
> 
> My first idea is that Dom0 could query Xen's cpufreq driver.  However,
> Xen exposes the cpufreq driver through the unstable sysctl ops, and
> using an unstable hypercall seems wrong for the kernel.
> 
> Can we add something to an existing hypercall - maybe platform_op?  Or
> do we need a new stable hypercall?
> 
> Linux will perform the _OSC calls unilaterally upon seeing FEATURE_HWP
> and independent of actually using HWP via the intel_pstate driver.
> However, not using HWP may be an untested configuration in practice.
> The intel_pstate.c driver will not use HWP when FEATURE_HWP_EPP is not
> found.  So we could potentially cheat and expose only HWP to Dom0.
> That should trigger the _OSC calls without letting Dom0 think it can
> use HWP.  This is rather fragile though, so a more explicit method
> seems better.

I agree with the "fragile" aspect, but I'd also like to point out that
no matter what features we expose in CPUID the driver should never try
to take control when running under Xen (or perhaps more generally when
running virtualized).
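
A minimal sketch of the two decisions being separated here: whether to
request native thermal interrupt handling via _OSC, and whether the
driver may take control of HWP. The struct and function names are
hypothetical and exist in neither Linux nor Xen; only the
`cpu_has_hwp || xen_initial_domain` condition comes from the quoted test.

```c
#include <stdbool.h>

/* Hypothetical inputs; none of these names exist in Linux or Xen. */
struct platform_state {
    bool cpu_has_hwp;        /* HWP is visible in CPUID to this kernel */
    bool xen_initial_domain; /* kernel is running as Xen Dom0 */
    bool virtualized;        /* kernel is running under any hypervisor */
};

/*
 * Mirrors the expanded check from the quoted test: call _OSC when HWP is
 * visible, or when running as Dom0 (where Xen may be using HWP itself).
 */
static bool should_call_osc(const struct platform_state *s)
{
    return s->cpu_has_hwp || s->xen_initial_domain;
}

/*
 * Assumed policy: the driver keeps its hands off HWP whenever the kernel
 * is virtualized, regardless of which CPUID features are exposed.
 */
static bool driver_may_use_hwp(const struct platform_state *s)
{
    return s->cpu_has_hwp && !s->virtualized;
}
```

With this split, a Dom0 that does not see HWP in CPUID would still issue
the _OSC call, while a Dom0 that does see it would still not drive HWP.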

> Roger's ACPI Processor patches that add xen_sanitize_pdc calls could
> be leveraged.  On the Xen side, arch_acpi_set_pdc_bits() could be
> extended to set bit 12, which would then be passed to the evaluate
> _PDC call. _PDC is the older interface superseded by _OSC, but they
> can be wrappers around the same implementation.  But if Linux is just
> using _OSC, it seems more compatible to follow that implementation.

Using the _PDC bit would look quite reasonable to me. Yet what's
unclear to me is whether by the last sentence you actually mean to
indicate that you're not in favor of doing so (in which case more work
in Xen would likely be needed to actually support enough of _OSC).
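
For illustration, setting the _PDC bit could look roughly like the
sketch below. It assumes the usual three-dword _PDC buffer (revision,
count, capabilities); the macro and function names are hypothetical, and
this is not the actual body of arch_acpi_set_pdc_bits().

```c
#include <stdint.h>

/* Bit 12 of the capabilities dword; the macro name is hypothetical. */
#define PDC_CAP_BIT12 (UINT32_C(1) << 12)

/*
 * Assumed _PDC buffer layout:
 *   pdc[0] = revision, pdc[1] = number of capability dwords,
 *   pdc[2] = capabilities.
 * Sets bit 12 in the capabilities dword and leaves everything else
 * untouched, so bits already sanitized by the caller are preserved.
 */
static void pdc_set_bit12(uint32_t *pdc)
{
    pdc[2] |= PDC_CAP_BIT12;
}
```

Dom0 would then pass the resulting buffer on to the _PDC evaluation
unchanged.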

What you don't touch at all is how you mean to surface the LVT based
interrupt to Dom0; the cited commit message looks to describe uses
beyond the HWP driver, and it uses that as part of the justification
to override the firmware choice. The LAPIC is hidden (PV) or properly
disconnected from the physical one (PVH), plus Xen's MCE code (however
broken it may be) makes use of it. Or is the plan to ignore all of
that (at least for now) and limit things to the HWP driver's needs?

Jan



 

