Re: [Xen-devel] [PATCH] Xen vMCE bugfix: inject vMCE# to all vcpus
>>> On 13.06.12 at 10:05, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
> Xen vMCE bugfix: inject vMCE# to all vcpus
>
> In our testing of Win8 guest MCE handling, we found a bug: no matter
> what SRAO/SRAR error Xen injects to the Win8 guest, the guest always
> reboots.
>
> The root cause is that the current Xen vMCE logic injects vMCE# only
> to vcpu0. This is not correct for Intel MCE (under the Intel
> architecture, hardware generates MCE# to all CPUs).
>
> This patch fixes the vMCE injection bug by injecting vMCE# to all
> vcpus.
I see no correlation between the fix (and its description) and the
problem at hand: why would Win8 reboot if it doesn't receive a
particular MCE on all CPUs? Isn't that model-specific behavior?
Furthermore, I doubt that an MCE on one socket indeed causes MCEs
on all other sockets, let alone on distinct NUMA nodes (it would
already surprise me if MCEs got broadcast across cores within a
socket, unless they are caused by a resource shared across cores).
> --- a/xen/arch/x86/cpu/mcheck/mce_intel.c Tue Jun 05 03:18:00 2012 +0800
> +++ b/xen/arch/x86/cpu/mcheck/mce_intel.c Wed Jun 13 23:40:45 2012 +0800
> @@ -638,6 +638,32 @@
> return rec;
> }
>
> +static int inject_vmce(struct domain *d)
Is it really necessary to move this vendor-independent function
into a vendor-specific source file?
> +{
> + struct vcpu *v;
> +
> + /* inject vMCE to all vcpus */
> + for_each_vcpu(d, v)
> + {
> + if ( !test_and_set_bool(v->mce_pending) &&
> + ((d->is_hvm) ? 1 :
> + guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )
That's a rather convoluted way of writing

    (d->is_hvm ||
     guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check))
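I.e. (untested, keeping the names used in the patch) the whole check
could simply read

    /* HVM guests always have the vMCE path available; PV guests
     * must have registered a #MC trap callback. */
    if ( !test_and_set_bool(v->mce_pending) &&
         (d->is_hvm ||
          guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )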
> + {
> + mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d vcpu%d\n",
> + d->domain_id, v->vcpu_id);
> + vcpu_kick(v);
> + }
> + else
> + {
> + mce_printk(MCE_QUIET, "Fail to inject vMCE to dom%d vcpu%d\n",
> + d->domain_id, v->vcpu_id);
> + return -1;
Why do you bail out here? This is particularly bad if v->mce_pending
was already set on some vCPU, as that could simply mean the guest
just hasn't got around to handling the vMCE yet.
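Something along these lines would seem more robust (a sketch only,
untested, and leaving aside how a PV guest without a registered
handler ought to be treated):

    for_each_vcpu(d, v)
    {
        /* An already-set mce_pending just means this vCPU hasn't
         * handled the previous vMCE yet - skip it rather than
         * failing the injection for the whole domain. */
        if ( test_and_set_bool(v->mce_pending) )
            continue;
        if ( d->is_hvm ||
             guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check) )
        {
            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d vcpu%d\n",
                       d->domain_id, v->vcpu_id);
            vcpu_kick(v);
        }
    }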
> + }
> + }
> +
> + return 0;
> +}
> +
> static void intel_memerr_dhandler(
> struct mca_binfo *binfo,
> enum mce_result *result,
Also, how does this whole change interact with vmce_{rd,wr}msr()?
The struct bank_entry instances live on a per-domain list, so the
vMCE being delivered to all vCPUs means they will all race for the
single entry (and might erroneously access others, particularly in
vmce_wrmsr()'s MCG_STATUS handling).
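To illustrate the concern (with hypothetical list/field names, not
the actual Xen code): once every vCPU takes the vMCE, each of them
will run a sequence like

    /* Executed concurrently by each vCPU's #MC handler: */
    entry = list_first_entry(&d->impact_list, struct bank_entry, list);
    status = entry->mci_status;   /* all vCPUs read the same (head)
                                   * entry ... */
    list_del(&entry->list);       /* ... and whichever vCPU clears
                                   * MCG_STATUS first removes it,
                                   * leaving the others reading freed
                                   * or unrelated entries. */

So either the list handling needs locking plus per-vCPU consumption
state, or the entries themselves would need to become per-vCPU.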
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel