Xen project Mailing List

Re: [Xen-devel] [PATCH v7 08/10] x86/microcode: Synchronize late microcode loading

Date: Tue, 11 Jun 2019 20:36:17 +0800

Cc: Sergey Dyasli <sergey.dyasli@xxxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, Ashok Raj <ashok.raj@xxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, tglx@xxxxxxxxxxxxx, Borislav Petkov <bp@xxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>

Delivery-date: Tue, 11 Jun 2019 12:32:14 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Jun 05, 2019 at 08:09:43AM -0600, Jan Beulich wrote:
>>>> On 27.05.19 at 10:31, <chao.gao@xxxxxxxxx> wrote:
>> This patch ports microcode improvement patches from linux kernel.
>>
>> Before you read any further: the early loading method is still the
>> preferred one and you should always do that. The following patch is
>> improving the late loading mechanism for long running jobs and cloud use
>> cases.
>>
>> Gather all cores and serialize the microcode update on them by doing it
>> one-by-one to make the late update process as reliable as possible and
>> avoid potential issues caused by the microcode update.
>>
>> Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>
>> Tested-by: Chao Gao <chao.gao@xxxxxxxxx>
>> [linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff]
>> [linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7]
>> Cc: Kevin Tian <kevin.tian@xxxxxxxxx>
>> Cc: Jun Nakajima <jun.nakajima@xxxxxxxxx>
>> Cc: Ashok Raj <ashok.raj@xxxxxxxxx>
>> Cc: Borislav Petkov <bp@xxxxxxx>
>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
>> Cc: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> Changes in v7:
>> - Check whether 'timeout' is 0 rather than "<=0" since it is unsigned int.
>> - reword the comment above microcode_update_cpu() to clearly state that
>> one thread per core should do the update.
>>
>> Changes in v6:
>> - Use one timeout period for rendezvous stage and another for update stage.
>> - scale time to wait by the number of remaining cpus to respond.
>> It helps to find something wrong earlier and thus we can reboot the
>> system earlier.
>> ---
>>  xen/arch/x86/microcode.c | 171 ++++++++++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 155 insertions(+), 16 deletions(-)
>>
>> diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
>> index 23cf550..f4a417e 100644
>> --- a/xen/arch/x86/microcode.c
>> +++ b/xen/arch/x86/microcode.c
>> @@ -22,6 +22,7 @@
>>   */
>>
>>  #include <xen/cpu.h>
>> +#include <xen/cpumask.h>
>
>It seems vanishingly unlikely that you would need this explicit #include
>here, but it certainly isn't wrong.
>
>> @@ -270,31 +296,90 @@ bool microcode_update_cache(struct microcode_patch *patch)
>>      return true;
>>  }
>>
>> -static long do_microcode_update(void *patch)
>> +/* Wait for CPUs to rendezvous with a timeout (us) */
>> +static int wait_for_cpus(atomic_t *cnt, unsigned int expect,
>> +                         unsigned int timeout)
>>  {
>> -    int error, cpu;
>> -
>> -    error = microcode_update_cpu(patch);
>> -    if ( error )
>> +    while ( atomic_read(cnt) < expect )
>>      {
>> -        microcode_ops->free_patch(microcode_cache);
>> -        return error;
>> +        if ( !timeout )
>> +        {
>> +            printk("CPU%d: Timeout when waiting for CPUs calling in\n",
>> +                   smp_processor_id());
>> +            return -EBUSY;
>> +        }
>> +        udelay(1);
>> +        timeout--;
>>      }
>
>There's no comment here and nothing in the description: I don't
>recall clarification as to whether RDTSC is fine to be issued by a
>thread when ucode is being updated by another thread on the
>same core.

Yes. I think it is fine.

Ashok, could you share your opinion on this question?
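The call-in step that wait_for_cpus() implements is a counting rendezvous with a bounded spin. A minimal user-space sketch of the same pattern follows, assuming C11 atomics and POSIX threads in place of Xen's atomic_t, udelay() and stop_machine_run(); wait_for_threads(), run_rendezvous() and MAX_THREADS are hypothetical names, not Xen code, and the sketch says nothing about the RDTSC question.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define MAX_THREADS 64   /* hypothetical cap for the demo */

/* Stand-in for Xen's udelay(1). nanosleep() usually oversleeps, so the real
 * wall-clock limit is longer than timeout_us suggests. */
static void delay_1us(void)
{
    struct timespec ts = { 0, 1000 };

    nanosleep(&ts, NULL);
}

/* Same shape as wait_for_cpus(): poll until *cnt reaches expect, giving up
 * with -EBUSY after timeout_us polls. */
static int wait_for_threads(atomic_uint *cnt, unsigned int expect,
                            unsigned int timeout_us)
{
    while ( atomic_load(cnt) < expect )
    {
        if ( !timeout_us )
            return -EBUSY;
        delay_1us();
        timeout_us--;
    }
    return 0;
}

struct rendezvous {
    atomic_uint arrived;          /* plays the role of cpu_in */
    unsigned int expect;
    unsigned int timeout_us;
    int result[MAX_THREADS];
};

struct worker_arg {
    struct rendezvous *rv;
    unsigned int id;
};

static void *worker(void *p)
{
    struct worker_arg *a = p;

    atomic_fetch_add(&a->rv->arrived, 1);            /* "call in" */
    a->rv->result[a->id] = wait_for_threads(&a->rv->arrived, a->rv->expect,
                                            a->rv->timeout_us);
    return NULL;
}

/* Run nthreads workers that all wait for nthreads arrivals; return how many
 * saw the rendezvous complete (0 on bad input or pthread failure). */
static unsigned int run_rendezvous(unsigned int nthreads,
                                   unsigned int timeout_us)
{
    pthread_t tid[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    struct rendezvous rv;
    unsigned int i, started = 0, ok = 0;

    if ( nthreads == 0 || nthreads > MAX_THREADS )
        return 0;

    atomic_init(&rv.arrived, 0);
    rv.expect = nthreads;
    rv.timeout_us = timeout_us;

    for ( i = 0; i < nthreads; i++ )
    {
        args[i].rv = &rv;
        args[i].id = i;
        if ( pthread_create(&tid[i], NULL, worker, &args[i]) != 0 )
            break;
        started++;
    }
    for ( i = 0; i < started; i++ )
        pthread_join(tid[i], NULL);

    for ( i = 0; i < started; i++ )
        if ( rv.result[i] == 0 )
            ok++;

    return started == nthreads ? ok : 0;
}
```

As in the patch, each participant's timeout starts when that participant calls in, so it bounds how long one thread waits for stragglers rather than the total length of the rendezvous. A counter that nobody increments makes wait_for_threads() return -EBUSY once its polls run out.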
>
>> +static int do_microcode_update(void *patch)
>> +{
>> +    unsigned int cpu = smp_processor_id();
>> +    unsigned int cpu_nr = num_online_cpus();
>> +    unsigned int finished;
>> +    int ret;
>> +    static bool error;
>>
>> -    microcode_update_cache(patch);
>> +    atomic_inc(&cpu_in);
>> +    ret = wait_for_cpus(&cpu_in, cpu_nr, MICROCODE_CALLIN_TIMEOUT_US);
>> +    if ( ret )
>> +        return ret;
>>
>> -    return error;
>> +    ret = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
>> +    /*
>> +     * Load microcode update on only one logical processor per core.
>> +     * Here, among logical processors of a core, the one with the
>> +     * lowest thread id is chosen to perform the loading.
>> +     */
>> +    if ( !ret && (cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu))) )
>
>At the very least it's not obvious whether this hyper-threading-centric
>view ("logical processor") also applies to AMD's compute unit model
>(which reuses cpu_sibling_mask). It does, as the respective MSRs are
>per-compute-unit rather than per-core, but I'd appreciate if the
>wording could be adjusted to explicitly name both cases (multiple
>threads per core and multiple cores per CU).

OK. Will do

>
>> +    {
>> +        ret = microcode_ops->apply_microcode(patch);
>> +        if ( !ret )
>> +            atomic_inc(&cpu_updated);
>> +    }
>> +    /*
>> +     * Increase the wait timeout to a safe value here since we're serializing
>
>I'm struggling with the "increase": I don't see anything being increased
>here. You simply use a larger timeout than above.
>
>> +     * the microcode update and that could take a while on a large number of
>> +     * CPUs. And that is fine as the *actual* timeout will be determined by
>> +     * the last CPU finished updating and thus cut short
>> +     */
>> +    atomic_inc(&cpu_out);
>> +    finished = atomic_read(&cpu_out);
>> +    while ( !error && finished != cpu_nr )
>> +    {
>> +        /*
>> +         * During each timeout interval, at least a CPU is expected to
>> +         * finish its update. Otherwise, something goes wrong.
>> +         */
>> +        if ( wait_for_cpus(&cpu_out, finished + 1,
>> +                           MICROCODE_UPDATE_TIMEOUT_US) && !error )
>> +        {
>> +            error = true;
>> +            panic("Timeout when finishing updating microcode (finished %d/%d)",
>> +                  finished, cpu_nr);
>
>Why the setting of "error" when you panic anyway?
>
>And please use format specifiers matching the types of the
>further arguments (i.e. twice %u here, but please check other
>code as well).
>
>Furthermore (and I'm sure I've given this comment before) if
>you really hit the limit, how many panic() invocations are there
>going to be? You run this function on all CPUs after all.

"error" is there to avoid calling panic() on multiple CPUs simultaneously.
Roger is right: atomic primitives should be used here.

>
>On the whole, taking a 256-thread system as example, you
>allow the whole process to take over 4 min without calling
>panic().
>Leaving aside guests, I don't think Xen itself would
>survive this in all cases. We've found the need to process
>softirqs with far smaller delays, in particular from key handlers
>producing lots of output. At the very least there should be a
>bold warning logged if the system had been in stop-machine
>state for, say, longer than 100ms (value subject to discussion).
>

In theory, if you mean 256 cores, yes. Do you think a configurable and
run-time changeable upper bound for the whole process can address your
concern? The default value for this upper bound can be set to a large
value (for example, 1s * the number of online cores) and the admin can
adjust/lower the upper bound according to the way (serial or parallel)
the update is performed and other requirements. Once the upper bound is
reached, we would call panic().

>> +        }
>> +
>> +        finished = atomic_read(&cpu_out);
>> +    }
>> +
>> +    /*
>> +     * Refresh CPU signature (revision) on threads which didn't call
>> +     * apply_microcode().
>> +     */
>> +    if ( cpu != cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
>> +        ret = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
>
>Another option would be for the CPU doing the update to simply
>propagate the new value to all its siblings' cpu_sig values.

Will do.

>
>> @@ -337,12 +429,59 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>>          if ( patch )
>>              microcode_ops->free_patch(patch);
>>          ret = -EINVAL;
>> -        goto free;
>> +        goto put;
>>      }
>>
>> -    ret = continue_hypercall_on_cpu(cpumask_first(&cpu_online_map),
>> -                                    do_microcode_update, patch);
>> +    atomic_set(&cpu_in, 0);
>> +    atomic_set(&cpu_out, 0);
>> +    atomic_set(&cpu_updated, 0);
>> +
>> +    /* Calculate the number of online CPU core */
>> +    nr_cores = 0;
>> +    for_each_online_cpu(cpu)
>> +        if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
>> +            nr_cores++;
>> +
>> +    printk(XENLOG_INFO "%d cores are to update their microcode\n", nr_cores);
>> +
>> +    /*
>> +     * We intend to disable interrupt for long time, which may lead to
>> +     * watchdog timeout.
>> +     */
>> +    watchdog_disable();
>> +    /*
>> +     * Late loading dance. Why the heavy-handed stop_machine effort?
>> +     *
>> +     * - HT siblings must be idle and not execute other code while the other
>> +     *   sibling is loading microcode in order to avoid any negative
>> +     *   interactions cause by the loading.
>> +     *
>> +     * - In addition, microcode update on the cores must be serialized until
>> +     *   this requirement can be relaxed in the future. Right now, this is
>> +     *   conservative and good.
>> +     */
>> +    ret = stop_machine_run(do_microcode_update, patch, NR_CPUS);
>> +    watchdog_enable();
>> +
>> +    if ( atomic_read(&cpu_updated) == nr_cores )
>> +    {
>> +        spin_lock(&microcode_mutex);
>> +        microcode_update_cache(patch);
>> +        spin_unlock(&microcode_mutex);
>> +    }
>> +    else if ( atomic_read(&cpu_updated) == 0 )
>> +        microcode_ops->free_patch(patch);
>> +    else
>> +    {
>> +        printk("Updating microcode succeeded on part of CPUs and failed on\n"
>> +               "others due to an unknown reason. A system with different\n"
>> +               "microcode revisions is considered unstable. Please reboot and\n"
>> +               "do not load the microcode that triggers this warning\n");
>> +        microcode_ops->free_patch(patch);
>> +    }
>
>As said on an earlier patch, I think the cache can be updated if at
>least one CPU loaded the blob successfully. Additionally I'd like to
>ask that you log the number of successfully updated cores. And
>finally perhaps "differing" instead of "different" and omit "due to
>an unknown reason"?

Will do.

Thanks
Chao

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
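The leader selection in do_microcode_update() (cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu))), the nr_cores loop in microcode_update(), and Jan's suggestion of having the updating CPU propagate the new revision to its siblings can be sketched with plain bitmasks. This assumes at most 64 CPUs, online CPUs numbered contiguously from 0, and uint64_t masks standing in for cpumask_t; sibling_mask[], cpu_rev[] and the helper functions are hypothetical. Since AMD compute units reuse cpu_sibling_mask, the same logic covers both multiple threads per core and multiple cores per CU.

```c
#include <stdint.h>

#define DEMO_NR_CPUS 64   /* hypothetical: masks are plain uint64_t */

/* Hypothetical stand-ins for per_cpu(cpu_sibling_mask, cpu) and the revision
 * held in the per-CPU cpu_sig. Bit n set means CPU n shares microcode state
 * with this CPU (same core, or same AMD compute unit). */
static uint64_t sibling_mask[DEMO_NR_CPUS];
static unsigned int cpu_rev[DEMO_NR_CPUS];

/* Lowest CPU in the mask, like cpumask_first(); DEMO_NR_CPUS if empty. */
static unsigned int mask_first(uint64_t mask)
{
    unsigned int n;

    for ( n = 0; n < DEMO_NR_CPUS; n++ )
        if ( mask & (UINT64_C(1) << n) )
            return n;
    return DEMO_NR_CPUS;
}

/* Only the lowest-numbered sibling loads the microcode. */
static int is_update_leader(unsigned int cpu)
{
    return cpu == mask_first(sibling_mask[cpu]);
}

/* Mirrors the nr_cores loop, assuming CPUs 0 .. nr_online-1 are online. */
static unsigned int count_update_leaders(unsigned int nr_online)
{
    unsigned int cpu, n = 0;

    for ( cpu = 0; cpu < nr_online; cpu++ )
        if ( is_update_leader(cpu) )
            n++;
    return n;
}

/* The leader copies the new revision to every sibling, instead of each
 * sibling re-reading it via collect_cpu_info(). */
static void propagate_rev(unsigned int leader, unsigned int new_rev)
{
    unsigned int cpu;

    for ( cpu = 0; cpu < DEMO_NR_CPUS; cpu++ )
        if ( sibling_mask[leader] & (UINT64_C(1) << cpu) )
            cpu_rev[cpu] = new_rev;
}

/* Test helper: make CPUs a and b siblings (pass a == b for a lone CPU). */
static void set_siblings(unsigned int a, unsigned int b)
{
    uint64_t m = (UINT64_C(1) << a) | (UINT64_C(1) << b);

    sibling_mask[a] = m;
    sibling_mask[b] = m;
}
```

With a topology where CPUs 0-3 are paired with 4-7, only CPUs 0-3 load the blob, and propagating from CPU 1 updates the recorded revision of CPU 5 as well.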
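On the timeout path, Roger's point about atomics amounts to letting exactly one CPU win a test-and-set before calling panic(), and Chao's proposed upper bound together with Jan's warning threshold can be checked against elapsed time. For scale, Jan's "over 4 min" figure for a 256-thread system is consistent with about 1 s allowed per progress interval (256 intervals × 1 s = 256 s, roughly 4.3 minutes); MICROCODE_UPDATE_TIMEOUT_US is not defined in the quoted hunks, so the 1 s value is an assumption. The sketch below uses C11 atomic_exchange() in place of whichever Xen primitive a later revision adopts; the thresholds and names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical thresholds, not values from the patch. */
#define STOP_MACHINE_WARN_US  100000ULL   /* Jan's 100 ms, "subject to discussion" */
#define PER_CORE_BUDGET_US   1000000ULL   /* Chao's example default: 1 s per core */

static atomic_bool timeout_reported;      /* static, so starts out false */

/* Returns true for exactly one caller, however many CPUs time out at once;
 * only that CPU would go on to call panic(). */
static bool claim_timeout_report(void)
{
    return !atomic_exchange(&timeout_reported, true);
}

enum budget_state { BUDGET_OK, BUDGET_WARN, BUDGET_EXCEEDED };

/* Classify time spent in stop-machine context against both thresholds. */
static enum budget_state check_budget(unsigned long long elapsed_us,
                                      unsigned int nr_cores)
{
    if ( elapsed_us >= PER_CORE_BUDGET_US * nr_cores )
        return BUDGET_EXCEEDED;   /* proposal: call panic() here */
    if ( elapsed_us > STOP_MACHINE_WARN_US )
        return BUDGET_WARN;       /* proposal: log a bold warning */
    return BUDGET_OK;
}
```

Keeping the warning and the hard limit as separate thresholds lets an administrator lower the panic bound for a parallel update without losing the early warning.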
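The bookkeeping after stop_machine_run(), revised along the lines Jan asks for (keep the blob cached if at least one core loaded it, log how many cores updated, say "differing" and drop "due to an unknown reason"), might look like the sketch below. The helper names and the exact message text are hypothetical; the "%u/%u" specifiers follow Jan's request that format specifiers match unsigned arguments.

```c
#include <stdbool.h>
#include <stdio.h>

enum ucode_outcome { UCODE_ALL, UCODE_NONE, UCODE_PARTIAL };

/* Classify the result from the cpu_updated count and nr_cores. */
static enum ucode_outcome classify_update(unsigned int updated,
                                          unsigned int nr_cores)
{
    if ( updated == nr_cores )
        return UCODE_ALL;
    if ( updated == 0 )
        return UCODE_NONE;
    return UCODE_PARTIAL;
}

/* Per the review: cache the blob if at least one core loaded it. */
static bool should_cache_patch(unsigned int updated)
{
    return updated > 0;
}

/* Write the partial-failure warning, including the updated-core count,
 * into buf; returns snprintf()'s result. */
static int format_partial_warning(char *buf, size_t len,
                                  unsigned int updated, unsigned int nr_cores)
{
    return snprintf(buf, len,
                    "Updating microcode succeeded on %u/%u cores and failed on "
                    "the others. A system with differing microcode revisions "
                    "is considered unstable. Please reboot and do not load the "
                    "microcode that triggers this warning\n",
                    updated, nr_cores);
}
```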
