Xen project Mailing List

Re: [Xen-devel] [PATCH v5 7/8] x86/microcode: Synchronize late microcode loading

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Fri, 08 Feb 2019 09:29:32 -0700

Cc: Kevin Tian <kevin.tian@xxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Ashok Raj <ashok.raj@xxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, tglx@xxxxxxxxxxxxx, Borislav Petkov <bp@xxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>

Delivery-date: Fri, 08 Feb 2019 16:29:45 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

>>> On 28.01.19 at 08:06, <chao.gao@xxxxxxxxx> wrote: > @@ -30,18 +31,25 @@ > #include <xen/smp.h> > #include <xen/softirq.h> > #include <xen/spinlock.h> > +#include <xen/stop_machine.h> > #include <xen/tasklet.h> > #include <xen/guest_access.h> > #include <xen/earlycpio.h> > +#include <xen/watchdog.h> > > +#include <asm/delay.h> > #include <asm/msr.h> > #include <asm/processor.h> > #include <asm/setup.h> > #include <asm/microcode.h> > > +/* By default, wait for 30000us */ > +#define MICROCODE_DEFAULT_TIMEOUT_US 30000 This value deserves some explanation as to how it was chosen. > static module_t __initdata ucode_mod; > static signed int __initdata ucode_mod_idx; > static bool_t __initdata ucode_mod_forced; > +static unsigned int nr_cores; Could you see about avoiding such a static? You have ... > +static int do_microcode_update(void *unused) ... an "unused" here after all, and it's the (indirect) caller of the function which calculates the value. > +{ > + unsigned int cpu = smp_processor_id(); > + int ret; > > - error = microcode_update_cpu(); > - if ( error ) > - info->error = error; > + ret = wait_for_cpus(&cpu_in, MICROCODE_DEFAULT_TIMEOUT_US); > + if ( ret ) > + return ret; > > - info->cpu = cpumask_next(info->cpu, &cpu_online_map); > - if ( info->cpu < nr_cpu_ids ) > - return continue_hypercall_on_cpu(info->cpu, do_microcode_update, > info); > + /* > + * Initiate an update on all processors which don't have an online > sibling > + * thread with a lower thread id. Other sibling threads just await the > + * completion of microcode update. > + */ > + if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) ) > + ret = microcode_update_cpu(); > + /* > + * Increase the wait timeout to a safe value here since we're serializing > + * the microcode update and that could take a while on a large number of > + * CPUs. And that is fine as the *actual* timeout will be determined by > + * the last CPU finished updating and thus cut short > + */ > + if ( wait_for_cpus(&cpu_out, MICROCODE_DEFAULT_TIMEOUT_US * nr_cores) ) > + panic("Timeout when finishing updating microcode"); While I expect this to go away again in the next patch, I'd still like to see this improved, in particular in case the patch here goes in independently of the next one. After all on a system with 100 cores the timeout totals to a whopping 3 seconds. Generally the time needed to wait scales by the number of CPUs still in need of doing the update. And if a timeout is really to occur, it's perhaps because of one bad core or socket, not because nothing works at all. Hence it would seem both nice and possible to scale the "remaining time to wait" by the (known) number of remaining processors to respond. Additionally it would be confusing to see dozens of panics in case this actually triggers, because all CPUs would hit this at about the same time. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.