Re: [Xen-devel] [PATCH] Xen vMCE bugfix: inject vMCE# to all vcpus
>>> On 13.06.12 at 10:05, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
> Xen vMCE bugfix: inject vMCE# to all vcpus
>
> In our testing of Win8 guest MCE handling, we found a bug: no matter
> what SRAO/SRAR error Xen injects to the Win8 guest, the guest always
> reboots.
>
> The root cause is that the current Xen vMCE logic injects vMCE# only
> to vcpu0. This is not correct for Intel MCE (under the Intel
> architecture, hardware generates MCE# to all CPUs).
>
> This patch fixes the vMCE injection bug by injecting vMCE# to all
> vcpus.
I see no correlation between the fix (and its description) and the
problem at hand: why would Win8 reboot if it doesn't receive a
particular MCE on all CPUs? Isn't that model-specific behavior?
Furthermore, I doubt that an MCE on one socket indeed causes MCEs
on all other sockets, let alone on distinct NUMA nodes (it would
already surprise me if MCEs got broadcast across cores within a
socket, unless they are caused by a resource shared across cores).
> --- a/xen/arch/x86/cpu/mcheck/mce_intel.c Tue Jun 05 03:18:00 2012 +0800
> +++ b/xen/arch/x86/cpu/mcheck/mce_intel.c Wed Jun 13 23:40:45 2012 +0800
> @@ -638,6 +638,32 @@
> return rec;
> }
>
> +static int inject_vmce(struct domain *d)
Is it really necessary to move this vendor-independent function
into a vendor-specific source file?
> +{
> + struct vcpu *v;
> +
> + /* inject vMCE to all vcpus */
> + for_each_vcpu(d, v)
> + {
> + if ( !test_and_set_bool(v->mce_pending) &&
> + ((d->is_hvm) ? 1 :
> + guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )
That's a rather convoluted way of writing

    (d->is_hvm ||
     guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check))
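I.e. (untested, keeping the names used in the patch) the whole check
could simply read

    /* HVM guests always have the vMCE path available; PV guests
     * must have registered a #MC trap callback. */
    if ( !test_and_set_bool(v->mce_pending) &&
         (d->is_hvm ||
          guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )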
> + {
> + mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d vcpu%d\n",
> + d->domain_id, v->vcpu_id);
> + vcpu_kick(v);
> + }
> + else
> + {
> + mce_printk(MCE_QUIET, "Fail to inject vMCE to dom%d vcpu%d\n",
> + d->domain_id, v->vcpu_id);
> + return -1;
Why do you bail out here? This is particularly bad if v->mce_pending
was already set on some vCPU, as that could simply mean the guest
just hasn't got around to handling the vMCE yet.
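Something along these lines would seem more robust (a sketch only,
untested, and leaving aside how a PV guest without a registered
handler ought to be treated):

    for_each_vcpu(d, v)
    {
        /* An already-set mce_pending just means this vCPU hasn't
         * handled the previous vMCE yet - skip it rather than
         * failing the injection for the whole domain. */
        if ( test_and_set_bool(v->mce_pending) )
            continue;
        if ( d->is_hvm ||
             guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check) )
        {
            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d vcpu%d\n",
                       d->domain_id, v->vcpu_id);
            vcpu_kick(v);
        }
    }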
> + }
> + }
> +
> + return 0;
> +}
> +
> static void intel_memerr_dhandler(
> struct mca_binfo *binfo,
> enum mce_result *result,
Also, how does this whole change interact with vmce_{rd,wr}msr()?
The struct bank_entry instances live on a per-domain list, so the
vMCE being delivered to all vCPUs means they will all race for the
single entry (and might erroneously access others, particularly in
vmce_wrmsr()'s MCG_STATUS handling).
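To illustrate the concern (with hypothetical list/field names, not
the actual Xen code): once every vCPU takes the vMCE, each of them
will run a sequence like

    /* Executed concurrently by each vCPU's #MC handler: */
    entry = list_first_entry(&d->impact_list, struct bank_entry, list);
    status = entry->mci_status;   /* all vCPUs read the same (head)
                                   * entry ... */
    list_del(&entry->list);       /* ... and whichever vCPU clears
                                   * MCG_STATUS first removes it,
                                   * leaving the others reading freed
                                   * or unrelated entries. */

So either the list handling needs locking plus per-vCPU consumption
state, or the entries themselves would need to become per-vCPU.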
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel