
Re: [PATCH 1/3] acpi/processor: fix evaluating _PDC method when running as Xen dom0


  • To: Dave Hansen <dave.hansen@xxxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Fri, 2 Dec 2022 13:24:35 +0100
  • Cc: linux-kernel@xxxxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, jgross@xxxxxxxx, Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>, Thomas Gleixner <tglx@xxxxxxxxxxxxx>, Ingo Molnar <mingo@xxxxxxxxxx>, Borislav Petkov <bp@xxxxxxxxx>, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>, x86@xxxxxxxxxx, "H. Peter Anvin" <hpa@xxxxxxxxx>, "Rafael J. Wysocki" <rafael@xxxxxxxxxx>, Len Brown <lenb@xxxxxxxxxx>, Alex Chiang <achiang@xxxxxx>, Venkatesh Pallipadi <venkatesh.pallipadi@xxxxxxxxx>, linux-acpi@xxxxxxxxxxxxxxx
  • Delivery-date: Fri, 02 Dec 2022 12:25:08 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Nov 30, 2022 at 08:48:14AM -0800, Dave Hansen wrote:
> On 11/30/22 07:53, Roger Pau Monné wrote:
> > On Tue, Nov 29, 2022 at 09:43:53AM -0800, Dave Hansen wrote:
> >> On 11/21/22 02:21, Roger Pau Monne wrote:
> >>> When running as a Xen dom0 the number of CPUs available to Linux can
> >>> be different from the number of CPUs present on the system, but in
> >>> order to properly fetch processor performance related data _PDC must
> >>> be executed on all the physical CPUs online on the system.
> >>
> >> How is the number of CPUs available to Linux different?
> >>
> >> Is this a result of the ACPI tables that dom0 sees being "wrong"?
> > 
> > Depends on the mode.  This is all specific to Linux running as a Xen
> > dom0.
> > 
> > For PV dom0 the ACPI tables that dom0 sees are the native ones;
> > however, the available CPUs are not detected based on the MADT, but
> > using hypercalls.  See the xen_smp_ops struct and the
> > x86_init.mpparse.get_smp_config hook used in smp_pv.c
> > (_get_smp_config()).
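
(For reference, the PV enumeration path looks roughly like the sketch
below.  This is a paraphrase for illustration, not the verbatim
smp_pv.c code, so treat the details as approximate.)

    /*
     * Rough sketch: instead of parsing the MADT, the PV dom0 kernel
     * installs a Xen-specific x86_init.mpparse.get_smp_config hook
     * that asks the hypervisor which vCPUs were granted to it.
     */
    static void __init _get_smp_config(unsigned int early)
    {
            unsigned int cpu;

            if (early)
                    return;

            for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
                    /* VCPUOP_is_up only succeeds for vCPUs of this domain. */
                    if (HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL) >= 0)
                            set_cpu_possible(cpu, true);
                    else
                            set_cpu_possible(cpu, false);
            }
    }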
> > 
> > For a PVH dom0 Xen provides dom0 with a crafted MADT table that only
> > contains the CPUs available to dom0, and hence is likely different
> > from the native one present on the hardware.
> > 
> > In any case, the dynamic tables dom0 sees, where the Processor
> > objects/devices reside, are not modified by Xen in any way, so the
> > ACPI Processors are always exposed to dom0 exactly as they are
> > present in the native tables.
> > 
> > Xen cannot parse the dynamic ACPI tables (nor should it, since it
> > would then be acting as OSPM), so it relies on dom0 to provide the
> > same data present in those tables for Xen to properly manage the
> > frequency and idle states of the CPUs on the system.
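
(For reference, that upload happens via a platform hypercall from
drivers/xen/xen-acpi-processor.c.  The sketch below is paraphrased
from memory for illustration; the helper name and exact field layout
are approximations, not the literal driver code.)

    /*
     * Sketch: dom0, acting as OSPM, evaluates the Processor objects and
     * hands the resulting P-state data to Xen so the hypervisor can
     * drive CPU frequency itself.
     */
    static int upload_px_info(unsigned int acpi_id)
    {
            struct xen_platform_op op = {
                    .cmd = XENPF_set_processor_pminfo,
                    .interface_version = XENPF_INTERFACE_VERSION,
                    .u.set_pminfo.id = acpi_id,
                    .u.set_pminfo.type = XEN_PM_PX,
            };

            /* ... fill op.u.set_pminfo.perf from the _PSS/_PSD data ... */

            return HYPERVISOR_platform_op(&op);
    }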
> > 
> >>> The current checks in processor_physically_present() result in some
> >>> processor objects not getting their _PDC methods evaluated when Linux
> >>> is running as Xen dom0.  Fix this by introducing a custom function to
> >>> use when running as Xen dom0 in order to check whether a processor
> >>> object matches a CPU that's online.
> >>
> >> What is the end user visible effect of this problem and of the solution?
> > 
> > Without this fix _PDC is only evaluated for the CPUs that are online
> > from dom0's point of view, which means that if dom0 is limited to 8
> > CPUs but the system has 24 CPUs, _PDC will only get evaluated for 8
> > CPUs.  That can have the side effect of the data returned by the _PSD
> > method (or other methods) being different between CPUs where _PDC was
> > evaluated vs CPUs where the method wasn't evaluated.  Such mismatches
> > can ultimately lead to, for example, the CPU frequency driver in Xen
> > not initializing properly because the coordination methods for CPUs
> > in the same domain don't match.
> > 
> > Also, not evaluating _PDC prevents the OS (or Xen in this case)
> > from notifying ACPI of the features it supports.
> > 
> > IOW this fix attempts to make sure all physically online CPUs get _PDC
> > evaluated, and in order to do that we need to ask the hypervisor
> > whether a Processor ACPI ID matches an online CPU or not, because
> > Linux doesn't have that information when running as dom0.
> > 
> > Hope the above makes sense and allows us to make some progress on the
> > issue; sometimes it's hard to summarize without getting too
> > specific,
> 
> Yes, writing changelogs is hard. :)
> 
> Let's try though.  I was missing some key pieces of background here.
> Believe it or not, I had no idea off the top of my head what _PDC was or
> why it's important.
> 
> The information about _PDC being required on all processors was missing,
> as was the information about dom0's incomplete view of the
> available physical processors.
> 
> == Background ==
> 
> In ACPI systems, the OS can direct power management, as opposed to the
> firmware.  This OS-directed Power Management is called OSPM.  Part of
> telling the firmware that the OS is going to direct power management is
> making ACPI "_PDC" (Processor Driver Capabilities) calls.  These _PDC
> calls must be made on every processor.  If these _PDC calls are not
> completed on every processor it can lead to inconsistency and later
> failures in things like the CPU frequency driver.

I think the "on every processor" part is not fully accurate.  _PDC
methods need to be evaluated for every Processor object.  Whether that
evaluation is executed on the physical processor that matches the ACPI
UID of the object/device is not mandatory (iow: you can evaluate the
_PDC methods of all Processor objects from the BSP).  The usage of
'on' seems to me to imply that the methods are executed on the
matching physical processors.

I would instead use: "... must be made for every processor.  If these
_PDC calls are not completed for every processor..."

But I'm not a native English speaker, so this might all be irrelevant.
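
(To illustrate what I mean: the evaluation is a plain namespace walk
that can run entirely on the boot CPU.  The sketch below is only
illustrative, with made-up function names, not the actual
processor_pdc.c code.)

    /*
     * Evaluate _PDC once per Processor object found in the namespace.
     * The walk runs on whatever CPU executes the early ACPI init
     * (normally the BSP); nothing requires running the method on the
     * matching physical CPU.
     */
    static acpi_status __init set_pdc_for_processor(acpi_handle handle,
                                                    u32 lvl, void *ctx,
                                                    void **rv)
    {
            struct acpi_object_list *pdc_in = ctx;  /* capability buffer */

            /* Skip objects that don't map to a physically present CPU. */
            if (!processor_physically_present(handle))
                    return AE_OK;

            return acpi_evaluate_object(handle, "_PDC", pdc_in, NULL);
    }

    static void __init evaluate_pdc_for_all_processors(
                                    struct acpi_object_list *pdc_in)
    {
            acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
                                ACPI_UINT32_MAX, set_pdc_for_processor,
                                NULL, pdc_in, NULL);
    }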

> 
> In a Xen system, the dom0 kernel is responsible for system-wide power
> management.  The dom0 kernel is in charge of OSPM.  However, the Xen
> hypervisor hides some processor information from the dom0 kernel.  This
> is presumably done to ensure that the dom0 system has less interference
> with guests that want to use the other processors.

dom0 on a Xen system is just another guest, so the admin can limit the
number of CPUs available to dom0; that's why we get into this weird
situation.
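
(For example, booting Xen with something like "dom0_max_vcpus=8" on
the hypervisor command line caps dom0 at 8 vCPUs even if the host has
many more physical CPUs; that's the kind of configuration where the
mismatch described above shows up.)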

> == Problem ==
> 
> But, this leads to a problem: the dom0 kernel needs to run _PDC on all
> the processors, but it can't always see them.
> 
> == Solution ==
> 
> In dom0 kernels, ignore the existing ACPI method for determining if a
> processor is physically present because it might not be accurate.
> Instead, ask the hypervisor for this information.
> 
> This ensures that ...
> 
> ----
> 
> Is that about right?

Yes, I think it's accurate.  I will add to my commit log, thanks!

On the implementation side, is the proposed approach acceptable?
Mostly asking because it adds Xen conditionals to otherwise generic
ACPI code.
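
Roughly something along these lines (simplified sketch; the
xen_processor_present() name is just a placeholder for the
Xen-provided lookup the patch introduces, and the native path is
elided):

    static bool __init processor_physically_present(acpi_handle handle)
    {
            u32 acpi_id;
            int cpuid;

            /* ... existing code deriving acpi_id from the object ... */

            if (xen_initial_domain())
                    /*
                     * dom0's cpu maps only cover the vCPUs it was given;
                     * only the hypervisor knows about all host CPUs, so
                     * ask it whether this ACPI ID is online.
                     */
                    return xen_processor_present(acpi_id);

            /*
             * Native case: map the ACPI ID to a logical CPU as before
             * (the real code also distinguishes Processor objects from
             * Device objects here).
             */
            cpuid = acpi_get_cpuid(handle, 0, acpi_id);
            return !invalid_logical_cpuid(cpuid);
    }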

Thanks, Roger.
