
[PATCH 3/3] xen/acpi: upload power and performance related data from a PVH dom0


  • To: linux-kernel@xxxxxxxxxxxxxxx
  • From: Roger Pau Monne <roger.pau@xxxxxxxxxx>
  • Date: Mon, 21 Nov 2022 11:21:12 +0100
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, jgross@xxxxxxxx, Roger Pau Monne <roger.pau@xxxxxxxxxx>, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Borislav Petkov <bp@xxxxxxxxx>, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>, x86@xxxxxxxxxx, "H. Peter Anvin" <hpa@xxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Oleksandr Tyshchenko <oleksandr_tyshchenko@xxxxxxxx>
  • Delivery-date: Mon, 21 Nov 2022 10:33:48 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

When running as a PVH dom0 the ACPI MADT is crafted by Xen in order to
report the correct number of vCPUs that dom0 has, so the host MADT is
not provided to dom0.  This creates issues when parsing the power and
performance related data from ACPI dynamic tables, as the ACPI
Processor UIDs found in the dynamic tables are likely not to match the
ones crafted by Xen in the dom0 MADT.
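To make the mismatch concrete, here is a minimal userspace sketch, assuming a host whose DSDT/SSDTs declare four Processor objects with UIDs 1-4 while the MADT Xen crafts for a 2-vCPU dom0 only lists UIDs 0 and 1. All arrays and helper names are hypothetical; `in_host()` merely stands in for asking Xen about physical CPUs, which is what the patch does via `xen_processor_present()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical: ACPI Processor UIDs declared in the host DSDT/SSDTs. */
static const unsigned int dsdt_uids[] = { 1, 2, 3, 4 };
/* Hypothetical: UIDs in the MADT that Xen crafts for a 2-vCPU dom0. */
static const unsigned int dom0_madt_uids[] = { 0, 1 };
/* Hypothetical: UIDs of the physical CPUs, as known to the hypervisor. */
static const unsigned int host_cpu_uids[] = { 1, 2, 3, 4 };

static bool uid_in(const unsigned int *uids, size_t n, unsigned int uid)
{
	for (size_t i = 0; i < n; i++)
		if (uids[i] == uid)
			return true;
	return false;
}

/* Check against the dom0 MADT: only dom0's vCPUs are visible. */
static bool in_dom0_madt(unsigned int uid)
{
	return uid_in(dom0_madt_uids, ARRAY_LEN(dom0_madt_uids), uid);
}

/* Stand-in for asking the hypervisor whether a physical CPU exists. */
static bool in_host(unsigned int uid)
{
	return uid_in(host_cpu_uids, ARRAY_LEN(host_cpu_uids), uid);
}

/* Count the DSDT Processor objects that a presence check accepts. */
static size_t count_accepted(bool (*present)(unsigned int))
{
	size_t n = 0;

	assert(present != NULL);
	for (size_t i = 0; i < ARRAY_LEN(dsdt_uids); i++)
		if (present(dsdt_uids[i]))
			n++;
	return n;
}
```

With these made-up tables the MADT-based check accepts only one of the four Processor objects, while the hypervisor-based check accepts all of them.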

Xen relies on Linux having filled in at least the power and
performance related data of the vCPUs on the system, and clones
that information in order to set up the remaining pCPUs on the system
if dom0 vCPUs < pCPUs.  However, when running as a PVH dom0 it's likely
that none of the dom0 CPUs will have the power and performance data
filled in, and hence the Xen ACPI Processor driver needs to fetch that
information by itself.

In order to do so correctly, introduce a new helper to fetch the _CST
data without taking into account the system capabilities from the
CPUID output, as the capabilities reported to dom0 in CPUID might be
different from the ones on the host.
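As a rough illustration of what such a helper checks, and what it deliberately leaves out, here is a userspace sketch of the _CST package walk, assuming the ACPI objects have already been decoded into a flat array. The types and names are hypothetical stand-ins rather than ACPICA or kernel structures; the control flow follows xen_acpi_processor_evaluate_cst() in this patch: validate the declared count, skip malformed entries, reserve slot 1 when the first state is not C1, and apply no CPUID-based filtering.

```c
#include <assert.h>

/* Hypothetical stand-ins; these are not the kernel's ACPI types. */
#define MAX_POWER 8            /* plays the role of ACPI_PROCESSOR_MAX_POWER */
#define STATE_C1 1

enum space { SPACE_FFH, SPACE_SYSTEM_IO, SPACE_OTHER };

/* One _CST sub-package, already decoded. */
struct cst_entry {
	int is_package;        /* element type is Package */
	int count;             /* number of sub-elements, must be 4 */
	int type;              /* C-state type: 1 = C1, 2 = C2, ... */
	enum space space_id;   /* address space of the register */
	int latency;           /* worst-case latency */
};

struct cx_state {
	int type;
	int index;
	int latency;
};

/*
 * 'declared' is the first _CST element, 'e' the n entries after it.
 * Fills states[1..] (states[0] is unused, as in the kernel) and returns
 * the last used index, or -1 if the package is inconsistent.
 */
static int parse_cst(long declared, const struct cst_entry *e, int n,
		     struct cx_state states[MAX_POWER])
{
	int last_index = 0;

	assert(e != 0 || n == 0);
	if (declared < 1 || declared != n)
		return -1;

	for (int i = 1; i <= declared; i++) {
		const struct cst_entry *el = &e[i - 1];
		struct cx_state cx = { 0 };

		if (last_index >= MAX_POWER - 1)
			break;                /* no room for more states */
		if (!el->is_package || el->count != 4)
			continue;             /* malformed entry: skip it */

		cx.type = el->type;
		/* Leave slot 1 free for C1 if the first entry is not C1. */
		if (i == 1 && cx.type != STATE_C1)
			last_index = 1;
		cx.index = last_index + 1;

		if (el->space_id != SPACE_FFH && el->space_id != SPACE_SYSTEM_IO)
			continue;             /* unsupported entry method */

		cx.latency = el->latency;
		states[++last_index] = cx;
	}
	return last_index;
}
```

For an input whose first entry is C2, slot 1 stays empty and C2 lands in slot 2, mirroring the "leave an empty slot for C1" comment in the patch.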

Note that the newly introduced code will only fetch the _CST, _PSS,
_PPC and _PCT from a single CPU, and clone that information for all the
other Processors.  This won't work on a heterogeneous system with
Processors having different power and performance related data between
them.
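The fetch-once behaviour can be sketched as follows, assuming a namespace walk that invokes a callback per Processor object, similar to read_acpi_id() with its pr_initialized flag: only the first object is evaluated and every other processor inherits that data, which is why heterogeneous systems are not handled. All names here are hypothetical, and fetch_pm_data() only stands in for evaluating _CST, _PSS, _PPC and _PCT.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified data fetched from one Processor object. */
struct pm_template {
	unsigned int source_id;   /* ACPI ID the data was read from */
	int num_cstates;
};

static int evaluations;           /* how many times tables were evaluated */

/* Stand-in for evaluating _CST/_PSS/_PPC/_PCT on one Processor object. */
static void fetch_pm_data(unsigned int acpi_id, struct pm_template *t)
{
	evaluations++;
	t->source_id = acpi_id;
	t->num_cstates = 3;       /* made-up value */
}

struct walk_ctx {
	bool initialized;         /* plays the role of pr_initialized */
	struct pm_template tmpl;
};

/* Called once per Processor object found in the namespace walk. */
static void on_processor(unsigned int acpi_id, struct walk_ctx *ctx)
{
	assert(ctx != 0);
	if (!ctx->initialized) {
		/* First processor seen: its data is reused for all others. */
		fetch_pm_data(acpi_id, &ctx->tmpl);
		ctx->initialized = true;
	}
}
```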

Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>
---
 arch/x86/include/asm/xen/hypervisor.h |   2 +-
 arch/x86/xen/enlighten.c              |   2 +-
 drivers/xen/xen-acpi-processor.c      | 225 ++++++++++++++++++++++++--
 3 files changed, 211 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/xen/hypervisor.h b/arch/x86/include/asm/xen/hypervisor.h
index b4ed90ef5e68..1ead5253bc6c 100644
--- a/arch/x86/include/asm/xen/hypervisor.h
+++ b/arch/x86/include/asm/xen/hypervisor.h
@@ -62,7 +62,7 @@ void __init mem_map_via_hcall(struct boot_params *boot_params_p);
 #endif
 
 #ifdef CONFIG_XEN_DOM0
-bool __init xen_processor_present(uint32_t acpi_id);
+bool xen_processor_present(uint32_t acpi_id);
 void xen_sanitize_pdc(uint32_t *buf);
 #else
 static inline bool xen_processor_present(uint32_t acpi_id)
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 394dd6675113..a7b41103d3e5 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -348,7 +348,7 @@ EXPORT_SYMBOL(xen_arch_unregister_cpu);
 #endif
 
 #ifdef CONFIG_XEN_DOM0
-bool __init xen_processor_present(uint32_t acpi_id)
+bool xen_processor_present(uint32_t acpi_id)
 {
        unsigned int i, maxid;
        struct xen_platform_op op = {
diff --git a/drivers/xen/xen-acpi-processor.c b/drivers/xen/xen-acpi-processor.c
index 9cb61db67efd..b189ea69d557 100644
--- a/drivers/xen/xen-acpi-processor.c
+++ b/drivers/xen/xen-acpi-processor.c
@@ -48,6 +48,8 @@ static unsigned long *acpi_id_cst_present;
 /* Which ACPI P-State dependencies for a enumerated processor */
 static struct acpi_psd_package *acpi_psd;
 
+static bool pr_initialized;
+
 static int push_cxx_to_hypervisor(struct acpi_processor *_pr)
 {
        struct xen_platform_op op = {
@@ -172,8 +174,13 @@ static int xen_copy_psd_data(struct acpi_processor *_pr,
 
        /* 'acpi_processor_preregister_performance' does not parse if the
         * num_processors <= 1, but Xen still requires it. Do it manually here.
+        *
+        * Also init the field if not set, as that's possible if the physical
+        * CPUs on the system don't match the data provided in the MADT when
+        * running as a PVH dom0.
         */
-       if (pdomain->num_processors <= 1) {
+       if (pdomain->num_processors <= 1 ||
+           dst->shared_type == CPUFREQ_SHARED_TYPE_NONE) {
                if (pdomain->coord_type == DOMAIN_COORD_TYPE_SW_ALL)
                        dst->shared_type = CPUFREQ_SHARED_TYPE_ALL;
                else if (pdomain->coord_type == DOMAIN_COORD_TYPE_HW_ALL)
@@ -313,6 +320,155 @@ static unsigned int __init get_max_acpi_id(void)
        pr_debug("Max ACPI ID: %u\n", max_acpi_id);
        return max_acpi_id;
 }
+
+/*
+ * Custom version of the native acpi_processor_evaluate_cst() function, to
+ * avoid some sanity checks done based on the CPUID data.  When running as a
+ * Xen domain the CPUID data provided to dom0 is not the native one, so the
+ * C-states cannot be sanity checked here.  Leave that to the hypervisor,
+ * which is also the entity running the driver.
+ */
+static int xen_acpi_processor_evaluate_cst(acpi_handle handle,
+                                          struct acpi_processor_power *info)
+{
+       struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
+       union acpi_object *cst;
+       acpi_status status;
+       u64 count;
+       int last_index = 0;
+       int i, ret = 0;
+
+       status = acpi_evaluate_object(handle, "_CST", NULL, &buffer);
+       if (ACPI_FAILURE(status)) {
+               acpi_handle_debug(handle, "No _CST\n");
+               return -ENODEV;
+       }
+
+       cst = buffer.pointer;
+
+       /* There must be at least 2 elements. */
+       if (!cst || cst->type != ACPI_TYPE_PACKAGE || cst->package.count < 2) {
+               acpi_handle_warn(handle, "Invalid _CST output\n");
+               ret = -EFAULT;
+               goto end;
+       }
+
+       count = cst->package.elements[0].integer.value;
+
+       /* Validate the number of C-states. */
+       if (count < 1 || count != cst->package.count - 1) {
+               acpi_handle_warn(handle, "Inconsistent _CST data\n");
+               ret = -EFAULT;
+               goto end;
+       }
+
+       for (i = 1; i <= count; i++) {
+               union acpi_object *element;
+               union acpi_object *obj;
+               struct acpi_power_register *reg;
+               struct acpi_processor_cx cx;
+
+               /*
+                * If there is not enough space for all C-states, skip the
+                * excess ones and log a warning.
+                */
+               if (last_index >= ACPI_PROCESSOR_MAX_POWER - 1) {
+                       acpi_handle_warn(handle, "No room for more idle states (limit: %d)\n",
+                                        ACPI_PROCESSOR_MAX_POWER - 1);
+                       break;
+               }
+
+               memset(&cx, 0, sizeof(cx));
+
+               element = &cst->package.elements[i];
+               if (element->type != ACPI_TYPE_PACKAGE) {
+                       acpi_handle_info(handle, "_CST C%d type(%x) is not package, skip...\n",
+                                        i, element->type);
+                       continue;
+               }
+
+               if (element->package.count != 4) {
+                       acpi_handle_info(handle, "_CST C%d package count(%d) is not 4, skip...\n",
+                               i, element->package.count);
+                       continue;
+               }
+
+               obj = &element->package.elements[0];
+
+               if (obj->type != ACPI_TYPE_BUFFER) {
+                       acpi_handle_info(handle, "_CST C%d package element[0] type(%x) is not buffer, skip...\n",
+                                        i, obj->type);
+                       continue;
+               }
+
+               reg = (struct acpi_power_register *)obj->buffer.pointer;
+
+               obj = &element->package.elements[1];
+               if (obj->type != ACPI_TYPE_INTEGER) {
+                       acpi_handle_info(handle, "_CST C[%d] package element[1] type(%x) is not integer, skip...\n",
+                                        i, obj->type);
+                       continue;
+               }
+
+               cx.type = obj->integer.value;
+               /*
+                * There are known cases in which the _CST output does not
+                * contain C1, so if the type of the first state found is not
+                * C1, leave an empty slot for C1 to be filled in later.
+                */
+               if (i == 1 && cx.type != ACPI_STATE_C1)
+                       last_index = 1;
+
+               cx.address = reg->address;
+               cx.index = last_index + 1;
+
+               switch (reg->space_id) {
+               case ACPI_ADR_SPACE_FIXED_HARDWARE:
+                       cx.entry_method = ACPI_CSTATE_FFH;
+                       break;
+
+               case ACPI_ADR_SPACE_SYSTEM_IO:
+                       cx.entry_method = ACPI_CSTATE_SYSTEMIO;
+                       break;
+
+               default:
+                       acpi_handle_info(handle, "_CST C%d space_id(%x) neither FIXED_HARDWARE nor SYSTEM_IO, skip...\n",
+                                        i, reg->space_id);
+                       continue;
+               }
+
+               if (cx.type == ACPI_STATE_C1)
+                       cx.valid = 1;
+
+               obj = &element->package.elements[2];
+               if (obj->type != ACPI_TYPE_INTEGER) {
+                       acpi_handle_info(handle, "_CST C%d package element[2] type(%x) not integer, skip...\n",
+                                        i, obj->type);
+                       continue;
+               }
+
+               cx.latency = obj->integer.value;
+
+               obj = &element->package.elements[3];
+               if (obj->type != ACPI_TYPE_INTEGER) {
+                       acpi_handle_info(handle, "_CST C%d package element[3] type(%x) not integer, skip...\n",
+                                        i, obj->type);
+                       continue;
+               }
+
+               memcpy(&info->states[++last_index], &cx, sizeof(cx));
+       }
+
+       acpi_handle_info(handle, "Found %d idle states\n", last_index);
+
+       info->count = last_index;
+
+end:
+       kfree(buffer.pointer);
+
+       return ret;
+}
+
 /*
  * The read_acpi_id and check_acpi_ids are there to support the Xen
  * oddity of virtual CPUs != physical CPUs in the initial domain.
@@ -354,24 +510,44 @@ read_acpi_id(acpi_handle handle, u32 lvl, void *context, void **rv)
        default:
                return AE_OK;
        }
-       if (invalid_phys_cpuid(acpi_get_phys_id(handle,
-                                               acpi_type == ACPI_TYPE_DEVICE,
-                                               acpi_id))) {
+
+       if (!xen_processor_present(acpi_id)) {
                pr_debug("CPU with ACPI ID %u is unavailable\n", acpi_id);
                return AE_OK;
        }
-       /* There are more ACPI Processor objects than in x2APIC or MADT.
-        * This can happen with incorrect ACPI SSDT declerations. */
-       if (acpi_id >= nr_acpi_bits) {
-               pr_debug("max acpi id %u, trying to set %u\n",
-                        nr_acpi_bits - 1, acpi_id);
-               return AE_OK;
-       }
+
        /* OK, There is a ACPI Processor object */
        __set_bit(acpi_id, acpi_id_present);
 
        pr_debug("ACPI CPU%u w/ PBLK:0x%lx\n", acpi_id, (unsigned long)pblk);
 
+       if (!pr_initialized) {
+               struct acpi_processor *pr = context;
+               int rc;
+
+               /*
+                * No CPU on the system has any performance or power related
+                * data, so initialize all the required fields by fetching
+                * that info here.
+                *
+                * Note such information is only fetched once, and then reused
+                * for all pCPUs.  This won't work on heterogeneous systems
+                * with different Cx and/or Px states between CPUs.
+                */
+
+               pr->handle = handle;
+
+               rc = acpi_processor_get_performance_info(pr);
+               if (rc)
+                       pr_debug("ACPI CPU%u failed to get performance data\n",
+                                acpi_id);
+               rc = xen_acpi_processor_evaluate_cst(handle, &pr->power);
+               if (rc)
+                       pr_debug("ACPI CPU%u failed to get _CST data\n", acpi_id);
+
+               pr_initialized = true;
+       }
+
        /* It has P-state dependencies */
        if (!acpi_processor_get_psd(handle, &acpi_psd[acpi_id])) {
                pr_debug("ACPI CPU%u w/ PST:coord_type = %llu domain = %llu\n",
@@ -392,8 +568,7 @@ read_acpi_id(acpi_handle handle, u32 lvl, void *context, void **rv)
 static int check_acpi_ids(struct acpi_processor *pr_backup)
 {
 
-       if (!pr_backup)
-               return -ENODEV;
+       BUG_ON(!pr_backup);
 
        if (acpi_id_present && acpi_id_cst_present)
                /* OK, done this once .. skip to uploading */
@@ -422,8 +597,8 @@ static int check_acpi_ids(struct acpi_processor *pr_backup)
 
        acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
                            ACPI_UINT32_MAX,
-                           read_acpi_id, NULL, NULL, NULL);
-       acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID, read_acpi_id, NULL, NULL);
+                           read_acpi_id, NULL, pr_backup, NULL);
+       acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID, read_acpi_id, pr_backup, NULL);
 
 upload:
        if (!bitmap_equal(acpi_id_present, acpi_ids_done, nr_acpi_bits)) {
@@ -464,6 +639,7 @@ static int xen_upload_processor_pm_data(void)
        struct acpi_processor *pr_backup = NULL;
        int i;
        int rc = 0;
+       bool free_perf = false;
 
        pr_info("Uploading Xen processor PM info\n");
 
@@ -475,13 +651,30 @@ static int xen_upload_processor_pm_data(void)
 
                if (!pr_backup) {
                        pr_backup = kzalloc(sizeof(struct acpi_processor), GFP_KERNEL);
-                       if (pr_backup)
+                       if (pr_backup) {
                                memcpy(pr_backup, _pr, sizeof(struct acpi_processor));
+                               pr_initialized = true;
+                       }
                }
                (void)upload_pm_data(_pr);
        }
 
+       if (!pr_backup) {
+               pr_backup = kzalloc(sizeof(struct acpi_processor), GFP_KERNEL);
+               if (!pr_backup)
+                       return -ENOMEM;
+               pr_backup->performance = kzalloc(sizeof(struct acpi_processor_performance),
+                                                GFP_KERNEL);
+               if (!pr_backup->performance) {
+                       kfree(pr_backup);
+                       return -ENOMEM;
+               }
+               free_perf = true;
+       }
+
        rc = check_acpi_ids(pr_backup);
+       if (free_perf)
+               kfree(pr_backup->performance);
        kfree(pr_backup);
 
        return rc;
-- 
2.37.3
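The shared_type change in xen_copy_psd_data() boils down to a small decision, sketched here in userspace C: the _PSD coordination type is mapped to a cpufreq sharing type when the domain has at most one processor or when the field was never filled in, and an already set value is kept otherwise. The enums, `struct perf_domain_info` and `fixup_shared_type()` are hypothetical stand-ins that model the meaning of the DOMAIN_COORD_TYPE_* and CPUFREQ_SHARED_TYPE_* names, not their numeric values; the SW_ANY to ANY branch is assumed from the surrounding function, which the hunk above does not show in full.

```c
#include <assert.h>

/* Hypothetical mirrors of the coordination and sharing types. */
enum coord_type { COORD_SW_ALL, COORD_SW_ANY, COORD_HW_ALL, COORD_OTHER };
enum shared_type { SHARED_NONE, SHARED_HW, SHARED_ALL, SHARED_ANY };

/* Hypothetical stand-in for the destination structure. */
struct perf_domain_info {
	enum shared_type shared_type;
};

static void fixup_shared_type(struct perf_domain_info *dst,
			      unsigned int num_processors,
			      enum coord_type coord)
{
	assert(dst != 0);

	/* Keep a value that was already filled in for a multi-CPU domain. */
	if (num_processors > 1 && dst->shared_type != SHARED_NONE)
		return;

	if (coord == COORD_SW_ALL)
		dst->shared_type = SHARED_ALL;
	else if (coord == COORD_HW_ALL)
		dst->shared_type = SHARED_HW;
	else if (coord == COORD_SW_ANY)
		dst->shared_type = SHARED_ANY;
	/* Any other coordination type leaves the field untouched. */
}
```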

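The allocation at the end of xen_upload_processor_pm_data() follows a common ownership pattern: the backup either borrows a `performance` pointer copied from an existing processor, or owns a freshly allocated one, and a flag (free_perf in the patch) records which case applies so that only owned memory is freed. A userspace sketch of that pattern, with hypothetical types and calloc()/free() standing in for kzalloc()/kfree():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for struct acpi_processor and its perf data. */
struct perf_info { int state_count; };
struct proc_info {
	unsigned int acpi_id;
	struct perf_info *performance;  /* borrowed or owned, see below */
};

/* Stand-in for check_acpi_ids(): requires valid performance data. */
static int check_ids(struct proc_info *backup)
{
	assert(backup != NULL && backup->performance != NULL);
	return 0;
}

/*
 * 'tmpl' is the data copied from an online CPU, or NULL when no CPU had
 * any.  A copied 'performance' pointer still belongs to the original
 * processor, so only a buffer allocated here is freed here.
 */
static int upload_remaining(const struct proc_info *tmpl)
{
	struct proc_info *backup;
	bool free_perf = false;
	int rc;

	backup = calloc(1, sizeof(*backup));
	if (!backup)
		return -1;               /* stands in for -ENOMEM */

	if (tmpl) {
		memcpy(backup, tmpl, sizeof(*backup));
	} else {
		backup->performance = calloc(1, sizeof(*backup->performance));
		if (!backup->performance) {
			free(backup);
			return -1;
		}
		free_perf = true;
	}

	rc = check_ids(backup);
	if (free_perf)
		free(backup->performance);
	free(backup);
	return rc;
}
```

Freeing the borrowed pointer unconditionally would release memory owned by the per-CPU processor structure, which is why the flag is needed.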