Re: [Xen-devel] [PATCH v2] x86/microcode: Synchronize late microcode loading
>>> On 25.04.18 at 13:46, <chao.gao@xxxxxxxxx> wrote:
> @@ -281,24 +288,56 @@ static int microcode_update_cpu(const void *buf, size_t size)
> return err;
> }
>
> -static long do_microcode_update(void *_info)
> +/* Wait for all CPUs to rendezvous with a timeout (us) */
> +static int wait_for_cpus(atomic_t *cnt, int timeout)
The timeout parameter wants to be unsigned int.
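For clarity, a minimal sketch of how I'd expect the helper to look with that change (the body below is an assumption based on the prototype and its callers, not code from the patch):

    static int wait_for_cpus(atomic_t *cnt, unsigned int timeout)
    {
        unsigned int cpus = num_online_cpus();

        atomic_inc(cnt);

        /* Spin until all online CPUs have announced themselves, or time out. */
        while ( atomic_read(cnt) != cpus )
        {
            if ( !timeout )
                return -EBUSY;
            udelay(1);
            timeout--;
        }

        return 0;
    }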
> +static int do_microcode_update(void *_info)
> +{
> + struct microcode_info *info = _info;
> + unsigned int cpu = smp_processor_id();
> + int ret;
> +
> + ret = wait_for_cpus(&info->cpu_in, MICROCODE_DEFAULT_TIMEOUT);
> + if ( ret )
> + return ret;
> + /*
> + * Logical threads which set the first bit in cpu_sibling_mask can do
> + * the update. Other sibling threads just await the completion of
> + * microcode update.
> + */
> + if ( cpumask_test_and_set_cpu(
> + cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
> + ret = microcode_update_cpu(info->buffer, info->buffer_size);
Isn't the condition inverted (i.e. missing a ! )?
Also I take it that you've confirmed that loading ucode in parallel on multiple
cores of the same socket is not a problem? The comment in the last hunk
suggests otherwise.
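To spell out what I mean, assuming the intended semantics really are "one updater per core", I'd expect something along these lines (sketch only):

    /*
     * Only the thread winning the test-and-set on its core's first sibling
     * does the actual load; the other threads of the core merely wait.
     */
    if ( !cpumask_test_and_set_cpu(
             cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
        ret = microcode_update_cpu(info->buffer, info->buffer_size);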
> + /*
> + * Increase the wait timeout to a safe value here since we're serializing
> + * the microcode update and that could take a while on a large number of
> + * CPUs. And that is fine as the *actual* timeout will be determined by
> + * the last CPU finished updating and thus cut short
> + */
> + if ( wait_for_cpus(&info->cpu_out, MICROCODE_DEFAULT_TIMEOUT *
> + num_online_cpus()) )
> + panic("Timeout when finishing updating microcode");
A 3s timeout (taking a system with 100 CPU threads as an example) still seems absurdly high to me, but considering you panic() anyway if you hit the timeout, the main question is whether there's a slim chance for this to complete a brief moment before the timeout expires. If all goes well you won't come close to even 1s, but as said before, there may be guests running, and they may become utterly confused if they don't get any CPU time for a second or more.

With you no longer doing things sequentially I don't, however, see why you need to scale the timeout by CPU count.
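In other words a single, fixed budget should do, along the lines of (a sketch of the suggestion only, not a tested value):

    /* All CPUs update in parallel now, so no need to scale by CPU count. */
    if ( wait_for_cpus(&info->cpu_out, MICROCODE_DEFAULT_TIMEOUT) )
        panic("Timeout when finishing updating microcode");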
> +
> + return ret;
> }
You're losing this return value (once for every CPU making it into this
function).
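One way to avoid losing it (merely a sketch, assuming an error field like the one being dropped from struct microcode_info further down is retained) would be to record the first failure in the shared structure before returning:

    /* Record the first failure so microcode_update() can surface it. */
    if ( ret )
        cmpxchg(&info->error, 0, ret);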
> @@ -318,26 +357,52 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>
> ret = copy_from_guest(info->buffer, buf, len);
> if ( ret != 0 )
> - {
> - xfree(info);
> - return ret;
> - }
> + goto free;
>
> info->buffer_size = len;
> - info->error = 0;
> - info->cpu = cpumask_first(&cpu_online_map);
> +
> + /* cpu_online_map must not change during update */
> + if ( !get_cpu_maps() )
> + {
> + ret = -EBUSY;
> + goto free;
> + }
>
> if ( microcode_ops->start_update )
> {
> ret = microcode_ops->start_update();
> if ( ret != 0 )
> - {
> - xfree(info);
> - return ret;
> - }
> + goto put;
> }
>
> - return continue_hypercall_on_cpu(info->cpu, do_microcode_update, info);
> + cpumask_empty(&info->cpus);
Do you mean cpumask_clear()?
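As written, cpumask_empty() merely tests the mask and discards the result; actually clearing it would be:

    cpumask_clear(&info->cpus);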
> + atomic_set(&info->cpu_in, 0);
> + atomic_set(&info->cpu_out, 0);
> +
> + /*
> + * We intend to disable interrupt for long time, which may lead to
> + * watchdog timeout.
> + */
> + watchdog_disable();
> + /*
> + * Late loading dance. Why the heavy-handed stop_machine effort?
> + *
> + * -HT siblings must be idle and not execute other code while the other
> + * sibling is loading microcode in order to avoid any negative
> + * interactions cause by the loading.
> + *
> + * -In addition, microcode update on the cores must be serialized until
> + * this requirement can be relaxed in the feature. Right now, this is
> + * conservative and good.
This is the comment I've referred to above.
Jan