Re: [PATCH 2/2] xen/x86: introduce AMD_MCE_NONFATAL
On Tue Jul 8, 2025 at 2:07 AM CEST, Stefano Stabellini wrote:
> Today, checking for non-fatal MCE errors on AMD is very invasive: it
> involves a periodic timer interrupting the physical CPU execution at
> regular intervals. Moreover, when the timer fires, the handler sends an
> IPI to all physical CPUs.
>
> Both these actions are disruptive in terms of latency and deterministic
> execution times for real-time workloads. They might miss a deadline due
> to one of these IPIs. Make it possible to disable non-fatal MCE errors
> checking with a new Kconfig option (AMD_MCE_NONFATAL).
>
> Signed-off-by: Stefano Stabellini <stefano.stabellini@xxxxxxx>
> ---
> RFC. I couldn't find a better way to do this.
> ---
> xen/arch/x86/Kconfig.cpu | 15 +++++++++++++++
> xen/arch/x86/cpu/mcheck/amd_nonfatal.c | 3 ++-
> 2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/Kconfig.cpu b/xen/arch/x86/Kconfig.cpu
> index 5fb18db1aa..14e20ad19d 100644
> --- a/xen/arch/x86/Kconfig.cpu
> +++ b/xen/arch/x86/Kconfig.cpu
> @@ -10,6 +10,21 @@ config AMD
> May be turned off in builds targetting other vendors. Otherwise,
> must be enabled for Xen to work suitably on AMD platforms.
>
> +config AMD_MCE_NONFATAL
> + bool "Check for non-fatal MCEs on AMD CPUs"
> + default y
> + depends on AMD
> + help
> + Check for non-fatal MCE errors.
> +
> + When this option is on (default), Xen regularly checks for
> + non-fatal MCEs potentially occurring on all physical CPUs. The
> + checking is done via timers and IPI interrupts, which is
> + acceptable in most configurations, but not for real-time.
> +
> + Turn this option off if you plan on deploying real-time workloads
> + on Xen.
> +
This being in the CPU vendor submenu seems off. I'd expect only a list of
silicon vendors here. I think it ought to be in the regular Kconfig file.
> config INTEL
> bool "Support Intel CPUs"
> default y
> diff --git a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
> index 7d48c9ab5f..812e18f612 100644
> --- a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
> +++ b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
> @@ -191,7 +191,8 @@ static void cf_check mce_amd_work_fn(void *data)
>
> void __init amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c)
> {
> - if (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)))
> + if ( !IS_ENABLED(CONFIG_AMD_MCE_NONFATAL) ||
> + (!(c->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON))) )
> return;
>
> /* Assume we are on K8 or newer AMD or Hygon CPU here */
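To spell out what the guard in the hunk above does, here is a minimal
standalone sketch, not the actual Xen code. The function names and the
option_enabled parameter are hypothetical (the parameter stands in for
IS_ENABLED(CONFIG_AMD_MCE_NONFATAL), which is a compile-time constant in
the real build), and the vendor bit values are illustrative assumptions:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative vendor bits; treat the exact values as assumptions. */
#define X86_VENDOR_INTEL (1u << 0)
#define X86_VENDOR_AMD   (1u << 1)
#define X86_VENDOR_HYGON (1u << 4)

/*
 * Hypothetical model of the early-return guard in
 * amd_nonfatal_mcheck_init(). option_enabled stands in for
 * IS_ENABLED(CONFIG_AMD_MCE_NONFATAL). Returns true when the function
 * would carry on and set up the polling, false when it bails out.
 */
bool amd_nonfatal_would_init(bool option_enabled, unsigned int vendor)
{
    if ( !option_enabled ||
         !(vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
        return false;

    return true;
}

/* Small self-check covering the interesting combinations. */
void amd_nonfatal_self_check(void)
{
    assert(amd_nonfatal_would_init(true, X86_VENDOR_AMD));
    assert(amd_nonfatal_would_init(true, X86_VENDOR_HYGON));
    assert(!amd_nonfatal_would_init(true, X86_VENDOR_INTEL));
    assert(!amd_nonfatal_would_init(false, X86_VENDOR_AMD));
    assert(!amd_nonfatal_would_init(false, X86_VENDOR_HYGON));
}
```

With the option off, the real guard is constant-true, so the compiler
should be able to discard the rest of the function body. The object file
and the initcall that reaches this function are still built, though.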
The patch could be made more general so that more code is compiled out.
What do you think of removing all the non-fatal checking and getting rid
of the initcall altogether?
diff --git a/xen/arch/x86/Kconfig.cpu b/xen/arch/x86/Kconfig.cpu
index 5fb18db1aa..a4b892a1aa 100644
--- a/xen/arch/x86/Kconfig.cpu
+++ b/xen/arch/x86/Kconfig.cpu
@@ -10,6 +10,20 @@ config AMD
 May be turned off in builds targetting other vendors. Otherwise,
 must be enabled for Xen to work suitably on AMD platforms.
+config MCE_NONFATAL
+ bool "Check for non-fatal MCEs"
+ default y
+ help
+ Check for non-fatal MCE errors.
+
+ When this option is on (default), Xen regularly checks for
+ non-fatal MCEs potentially occurring on all physical CPUs. The
+ checking is done via timers and IPI interrupts, which is
+ acceptable in most configurations, but not for real-time.
+
+ Turn this option off if you plan on deploying real-time workloads
+ on Xen.
+
config INTEL
bool "Support Intel CPUs"
default y
diff --git a/xen/arch/x86/cpu/mcheck/Makefile b/xen/arch/x86/cpu/mcheck/Makefile
index e6cb4dd503..c70b441888 100644
--- a/xen/arch/x86/cpu/mcheck/Makefile
+++ b/xen/arch/x86/cpu/mcheck/Makefile
@@ -1,12 +1,12 @@
-obj-$(CONFIG_AMD) += amd_nonfatal.o
+obj-$(filter $(CONFIG_AMD),$(CONFIG_MCE_NONFATAL)) += amd_nonfatal.o
obj-$(CONFIG_AMD) += mce_amd.o
obj-y += mcaction.o
obj-y += barrier.o
-obj-$(CONFIG_INTEL) += intel-nonfatal.o
+obj-$(filter $(CONFIG_INTEL),$(CONFIG_MCE_NONFATAL)) += intel-nonfatal.o
obj-y += mctelem.o
obj-y += mce.o
obj-y += mce-apei.o
obj-$(CONFIG_INTEL) += mce_intel.o
-obj-y += non-fatal.o
+obj-$(CONFIG_MCE_NONFATAL) += non-fatal.o
obj-y += util.o
obj-y += vmce.o
... with the Kconfig option probably in the regular x86 Kconfig rather than
Kconfig.cpu
Thoughts?
Cheers,
Alejandro