[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 4/6] microcode: don't call apply_microcode() in cpu_request_microcode()



On Thu, Nov 29, 2018 at 12:28:46PM +0800, Chao Gao wrote:
> On Wed, Nov 28, 2018 at 04:02:25PM +0100, Roger Pau Monné wrote:
> >On Wed, Nov 28, 2018 at 01:34:14PM +0800, Chao Gao wrote:
> >> diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
> >> index 8350d22..cca7b2c 100644
> >> --- a/xen/arch/x86/microcode.c
> >> +++ b/xen/arch/x86/microcode.c
> >> @@ -233,20 +233,12 @@ int microcode_resume_cpu(unsigned int cpu)
> >>      return err;
> >>  }
> >>  
> >> -static int microcode_update_cpu(const void *buf, size_t size)
> >> +static int microcode_update_cpu(void)
> >>  {
> >>      int err;
> >> -    unsigned int cpu = smp_processor_id();
> >> -    struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
> >>  
> >>      spin_lock(&microcode_mutex);
> >> -
> >> -    err = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
> >> -    if ( likely(!err) )
> >> -        err = microcode_ops->cpu_request_microcode(cpu, buf, size);
> >> -    else
> >> -        __microcode_fini_cpu(cpu);
> >> -
> >> +    err = microcode_ops->apply_microcode(smp_processor_id());
> >>      spin_unlock(&microcode_mutex);
> >>  
> >>      return err;
> >> @@ -259,7 +251,7 @@ static long do_microcode_update(void *_info)
> >>  
> >>      BUG_ON(info->cpu != smp_processor_id());
> >>  
> >> -    error = microcode_update_cpu(info->buffer, info->buffer_size);
> >> +    error = microcode_update_cpu();
> >
> >Why don't you just set info->error = microcode_update_cpu()?
> >
> >AFAICT this is done to attempt to update the remaining CPUs if one
> >update failed?
> 
> Yes. But this patch doesn't change the logic here. Actually, if HT is
> enabled and microcode is shared between the logical threads of the same
> core, so if one thread updates microcode successfully, its sibling would
> always fail in current logic. I am trying to explain why we cannot abort
> the update even though an error is met in current logic. It definitely
> can be solved by tweaking the logic slightly. 
> 
> >
> >Is there anyway to rollback to the previous state so all CPUs have the
> >same microcode?
> 
> Seems it is not allowed to load a microcode with numeratically smaller
> revision according to 9.11.7.2.
> 
> With patch 6, a panic() would be triggered if some cpus failed to do the
> update. I didn't try to change the logic here.
> 
> >I assume nothing good will come out of running a
> >system with CPUs using different microcode versions, but maybe that's
> >not so bad?
> 
> It is better that all CPUs have the same microcode revision. 
> 
> Linux kernel rejects late microcode update if finding some CPUs
> offlined. I may port this patch to Xen too in a separate patch.

What happens with hotplug CPUs?

Even if you disable late loading when there are offlined CPUs you
still need to handle hotplug CPUs, which IMO should share the same
path with offlined CPUs: the microcode update should be done ASAP
after bringing the CPU up.

> >
> >>      if ( error )
> >>          info->error = error;
> >>  
> >> @@ -276,6 +268,8 @@ int 
> >> microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
> >>  {
> >>      int ret;
> >>      struct microcode_info *info;
> >> +    unsigned int cpu = smp_processor_id();
> >> +    struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
> >>  
> >>      if ( len != (uint32_t)len )
> >>          return -E2BIG;
> >> @@ -294,10 +288,6 @@ int 
> >> microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
> >>          return ret;
> >>      }
> >>  
> >> -    info->buffer_size = len;
> >> -    info->error = 0;
> >> -    info->cpu = cpumask_first(&cpu_online_map);
> >> -
> >>      if ( microcode_ops->start_update )
> >>      {
> >>          ret = microcode_ops->start_update();
> >> @@ -308,6 +298,26 @@ int 
> >> microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
> >>          }
> >>      }
> >>  
> >> +    spin_lock(&microcode_mutex);
> >> +
> >> +    ret = microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
> >> +    if ( likely(!ret) )
> >> +        ret = microcode_ops->cpu_request_microcode(cpu, info->buffer, 
> >> len);
> >> +    else
> >> +        __microcode_fini_cpu(cpu);
> >> +
> >> +    spin_unlock(&microcode_mutex);
> >
> >Why do you need to hold the lock here?
> >
> >microcode_update is already serialized and should only be executed on
> >a CPU at a time due to the usage of chained
> >continue_hypercall_on_cpu.
> 
> microcode_resume_cpu() also uses the 'uci' and the global microcode cache.
> This lock is to prevent them happening simultaneously (someone is
> adding/replacing entries to a list and another is reading the list).
> All existing call sites of collec_cpu_info() and cpu_request_microcode()
> are protected with this lock.

I mean, there should be some kind of protection to prevent the list
from changing at all while there's an update in progress, or else you
risk using different versions for different CPUs if there's a list
addition while a microcode update is in progress?

The current lock will just prevent list corruption by serializing
users, but it's not going to prevent changes while there's a microcode
update in progress.

> >> @@ -609,17 +612,13 @@ static int cpu_request_microcode(unsigned int cpu, 
> >> const void *buf,
> >>      }
> >>  
> >>    out:
> >> -#if CONFIG_HVM
> >> -    svm_host_osvw_init();
> >> -#endif
> >> -
> >>      /*
> >>       * In some cases we may return an error even if processor's microcode 
> >> has
> >>       * been updated. For example, the first patch in a container file is 
> >> loaded
> >>       * successfully but subsequent container file processing encounters a
> >>       * failure.
> >>       */
> >> -    return error;
> >> +    return !error ? matched_cnt : error;
> >
> >You can use error ?: matched_cnt; which is shorter.
> 
> I was told that It would be good to put the likely case in the first place.

This is not a hot path anyway, but if you prefer the current code I'm
fine with it.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.