
Re: [Xen-devel] [PATCH] misc/xenmicrocode: Upload /lib/firmware/<some blob> to the hypervisor

On Thu, Jan 29, 2015 at 04:21:05AM +0100, Luis R. Rodriguez wrote:
> How close?

As close as we can get but not closer - see the thing about updating
microcode on Intel hyperthreaded logical cores in the other mail.

We probably can do it in parallel if needed. But it hasn't been needed
until now.

> I've reviewed the implementation a bit more on the Xen side. For early boot
> things look similar to what is done upstream in the kernel. For the run time
> update here's what Xen does in detail, elaborating a bit more on Andrew's
> summary of how it works.
> The XENPF_microcode_update hypercall calls the general Xen microcode_update()
> which will do microcode_ops->start_update() (only AMD has this op and it does
> svm_host_osvw_reset()) and finally it continues the hypercall by calling
> do_microcode_update() on the cpumask_first(&cpu_online_map) *always*. The
> mechanism that Xen uses to continue the hypercall is
> continue_hypercall_on_cpu(); if this returns 0 then it is guaranteed to run
> *at some point in the future* on the given CPU. If preemption is enabled this
> could also mean the hypercall was preempted, and can be preempted later
> on the other CPU. This in turn will do the same call but on the
> next CPU using continuation until it reaches the end of the CPU mask.
> The do_microcode_update() call itself calls ops->cpu_request_microcode()
> on each iteration which in turn should also do the ops->apply_microcode()
> once a fitting microcode buffer is found in the file. The buffers are
> kept in case of suspend / resume.

Yah, this is mostly fine except the preemption thing. If the guests get
to see an inconsistent state with a subset of the cores updated and the
rest not, then that is bad.

Not to mention the case when we have to late-update problematic
microcode which has to happen in parallel on each core. I haven't seen
one so far but we should be prepared.
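
To make the concern concrete, here is a toy C model of the sequential walk
described in the quoted text. It is only a sketch under loud assumptions:
struct machine, update_one_cpu() and sequential_update() are made-up names,
not Xen functions, and a plain loop stands in for the
continue_hypercall_on_cpu() chain. It counts the points between two steps at
which anything that got scheduled would see mixed revisions.

```c
/* Toy model only, NOT Xen code. Every name below is hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

struct machine {
    unsigned int rev[NR_CPUS];  /* microcode revision per CPU */
};

/* True if every CPU runs the same microcode revision. */
static bool revisions_consistent(const struct machine *m)
{
    for (int cpu = 1; cpu < NR_CPUS; cpu++)
        if (m->rev[cpu] != m->rev[0])
            return false;
    return true;
}

/* One link of the chain: update `cpu`, return the next CPU or -1 at the end. */
static int update_one_cpu(struct machine *m, int cpu, unsigned int new_rev)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    m->rev[cpu] = new_rev;
    return (cpu + 1 < NR_CPUS) ? cpu + 1 : -1;
}

/*
 * Walk all CPUs one after the other, starting at the first one.
 * Returns how many times the machine was left in a mixed state
 * between two steps, i.e. where other work could observe it.
 */
static int sequential_update(struct machine *m, unsigned int new_rev)
{
    int mixed_windows = 0;
    int cpu = 0;

    while (cpu != -1) {
        cpu = update_one_cpu(m, cpu, new_rev);
        /* Nothing is held here, so other work may run at this point. */
        if (!revisions_consistent(m))
            mixed_windows++;
    }
    return mixed_windows;
}
```

With four CPUs going from revision 1 to revision 2, sequential_update()
reports three such windows: one after every CPU except the last.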

> There is no tight loop here or locking of what other CPUs do while one is
> running work to update microcode. Tons of things can happen in between so some
> refinements seem desirable and likely this implementation does differ quite
> significantly from the Linux kernel's legacy 'rescan' interface.

Well, we're not very strict there either but that works so far. We'll
change it if the need arises.

> Given this review, it seems folks should use xenmicrocode keeping in mind the
> above algorithm, and support wise folks should be ready to consider upgrades
> on microcode and possible issues / caveats from vendors on a case by case basis.


> From what I gather some folks have even considered tainting kernels when the
> sysfs rescan interface is used, I do wonder if this is worthy on Xen for this
> tool given the possible issues here... or am I just paranoid about this?
> It seems like this might be more severe of an issue for Xen as-is.

That would not be unnecessary.

So, I would try to do the application of the microcode in the hypervisor
as tight as possible. Maybe the hypercall could hand in the microcode
blob only and the hypervisor can "schedule" an update later, after
having frozen the guests.
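
A rough sketch of that split, again as a toy C model and not as anything
Xen actually provides: stage_update(), apply_staged_update() and the
guests_frozen flag are all hypothetical, and the "blob" is reduced to a
revision number. The point is only that the hand-in step changes nothing on
the CPUs, and the later apply step runs with guests unable to observe the
intermediate state.

```c
/* Toy model only, NOT Xen code. Every name below is hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

struct host {
    unsigned int rev[NR_CPUS];  /* microcode revision per CPU */
    unsigned int staged_rev;    /* 0 means nothing is staged */
    bool guests_frozen;
    int mixed_seen;             /* times a running guest saw mixed revisions */
};

/* Called wherever a guest could run; frozen guests observe nothing. */
static void guest_observes(struct host *h)
{
    if (h->guests_frozen)
        return;
    for (int cpu = 1; cpu < NR_CPUS; cpu++) {
        if (h->rev[cpu] != h->rev[0]) {
            h->mixed_seen++;
            return;
        }
    }
}

/* Step 1: the "hypercall" only hands in the blob; no CPU is touched. */
static void stage_update(struct host *h, unsigned int new_rev)
{
    h->staged_rev = new_rev;
}

/* Step 2: later, freeze guests, update every CPU, then thaw. */
static bool apply_staged_update(struct host *h)
{
    if (h->staged_rev == 0)
        return false;           /* nothing to do */

    h->guests_frozen = true;
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        h->rev[cpu] = h->staged_rev;
        guest_observes(h);      /* no-op while guests are frozen */
    }
    h->guests_frozen = false;
    h->staged_rev = 0;

    guest_observes(h);          /* guests resume on a consistent machine */
    return true;
}
```

In this model mixed_seen stays at zero across the whole update, which is
the property the sequential walk cannot give.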

In any case, we should strive for a parallel late update, simultaneously
on each core with no interruption. The kernel doesn't do that either but
it'll probably have to, one day.
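
What I mean by "in parallel" is a rendezvous: nobody updates until
everybody has checked in. Below is a single-threaded state-machine sketch
of that rule in C. It is an assumption-laden illustration, not kernel or
Xen code: struct rendezvous, cpu_arrive() and update_if_all_arrived() are
invented names, and the loop in the update step stands in for each CPU
doing its own write at the same time, which a real implementation would do
with every CPU spinning on a shared counter.

```c
/* Toy model only, NOT Xen or Linux code. Every name below is hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

struct rendezvous {
    bool arrived[NR_CPUS];      /* CPU has dropped other work and waits */
    int nr_arrived;
    unsigned int rev[NR_CPUS];  /* microcode revision per CPU */
};

/* A CPU checks in. Returns true once all CPUs are present. */
static bool cpu_arrive(struct rendezvous *r, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (!r->arrived[cpu]) {
        r->arrived[cpu] = true;
        r->nr_arrived++;
    }
    return r->nr_arrived == NR_CPUS;
}

/*
 * Update only if every CPU is waiting in the rendezvous; otherwise
 * refuse and leave all revisions untouched. On success, release all
 * CPUs so the rendezvous can be reused.
 */
static bool update_if_all_arrived(struct rendezvous *r, unsigned int new_rev)
{
    if (r->nr_arrived != NR_CPUS)
        return false;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        r->rev[cpu] = new_rev;  /* stands in for a simultaneous per-CPU write */
        r->arrived[cpu] = false;
    }
    r->nr_arrived = 0;
    return true;
}
```

As long as even one CPU is still missing, update_if_all_arrived() refuses
and no revision changes, so there is never a state where only a subset of
the cores has been updated.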

I don't know whether this is possible at all in Xen and whether doing
a simple sequential update method now and improving it later is easier
than doing it right and in parallel from the get-go. I'm talking
hypothetically here, I have no idea what actually is possible and doable
in Xen.



ECO tip #101: Trim your mails when you reply.
