
Re: [Xen-devel] [PATCH] Xen vMCE bugfix: inject vMCE# to all vcpus



Jan Beulich wrote:
>>>> On 13.06.12 at 10:05, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
>> Xen vMCE bugfix: inject vMCE# to all vcpus
>> 
>> In our testing of Win8 guest MCE we found a bug: no matter what
>> SRAO/SRAR error Xen injects to the Win8 guest, it always reboots.
>> 
>> The root cause is that the current Xen vMCE logic injects vMCE# only
>> to vcpu0, which is not correct for Intel MCE (under the Intel arch,
>> h/w generates MCE# to all CPUs).
>> 
>> This patch fixes the vMCE injection bug by injecting vMCE# to all vcpus.
> 
> I see no correlation between the fix (and its description) and the
> problem at hand: Why would Win8 reboot if it doesn't receive a
> particular MCE on all CPUs? Isn't that model specific behavior?

It's not model specific. On Intel processors, MCE# is broadcast to all logical 
processors on systems where UCR errors are supported (refer to Intel SDM 
15.9.3.1 & 15.9.3.2). So for vMCE injection under the Intel arch, it is a bug 
to inject vMCE# only to vcpu0.

As for why Win8 reboots: I guess it waits for all CPUs to enter its MCE handler 
before going ahead, and if that synchronization times out it reboots. I have no 
Win8 code, so I can only infer this, but the test results support it.
(It's reasonable for Win8 to do so. As the Intel SDM suggests, due to the 
potentially shared machine check MSR resources among the logical processors on 
the same package/core, the MCE handler may be required to synchronize with the 
other processors that received a machine check error and to serialize access to 
the machine check registers when analyzing, logging and clearing the 
information in them.)
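
To make the guess concrete, such a handler might look roughly like the sketch
below (plain C11, not Win8 or Xen code; do_reboot(), analyze_mce_banks() and
the timeout value are placeholders I made up):

#include <stdatomic.h>

#define MCE_SYNC_TIMEOUT 1000000L       /* arbitrary spin budget */

static atomic_int cpus_in_handler;      /* CPUs that have entered the handler */

static void do_reboot(void)         { /* placeholder: trigger a reboot */ }
static void analyze_mce_banks(void) { /* placeholder: read/log/clear MCi_* MSRs */ }

/* Invoked on every logical CPU, since MCE# is broadcast. */
void machine_check_handler(int nr_cpus)
{
    atomic_fetch_add(&cpus_in_handler, 1);

    /* Rendezvous: wait until every CPU has arrived. */
    long spins = 0;
    while (atomic_load(&cpus_in_handler) < nr_cpus) {
        if (++spins > MCE_SYNC_TIMEOUT) {
            do_reboot();                /* synchronization timed out */
            return;
        }
    }

    /* All CPUs present: analyze/log/clear the (potentially shared)
     * machine check registers in a serialized fashion. */
    analyze_mce_banks();
}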

> 
> Furthermore I doubt that an MCE on one socket indeed causes
> MCE-s on all other sockets, not to speak of distinct NUMA nodes
> (it would already surprise me if MCE-s got broadcast across cores
> within a socket, unless they are caused by a resource shared
> across cores).

I somewhat agree. But at least per the current Intel SDM, architecturally it 
is broadcast.

> 
>> --- a/xen/arch/x86/cpu/mcheck/mce_intel.c  Tue Jun 05 03:18:00 2012 +0800
>> +++ b/xen/arch/x86/cpu/mcheck/mce_intel.c  Wed Jun 13 23:40:45 2012 +0800
>> @@ -638,6 +638,32 @@
>>      return rec;
>>  }
>> 
>> +static int inject_vmce(struct domain *d)
> 
> Is it really necessary to move this vendor independent function
> into a vendor specific source file?

For AMD, I'm not quite sure whether injecting vMCE to all vcpus is proper for 
its arch. I remember AMD uses NMI / single MCE# / broadcast MCE# on different 
platforms.

BTW, currently in Xen's MCE logic inject_vmce() is only used by the Intel MCE 
code, so I put it into the Intel MCE logic to avoid potential issues. Intel MCE 
and AMD MCE differ in some points (including MCE# injection), so we'd better 
not lump them together. Only when we are totally sure some logic is safe to 
share between Intel and AMD should we put it into the common MCE code.

> 
>> +{
>> +    struct vcpu *v;
>> +
>> +    /* inject vMCE to all vcpus */
>> +    for_each_vcpu(d, v)
>> +    {
>> +        if ( !test_and_set_bool(v->mce_pending) &&
>> +            ((d->is_hvm) ? 1 :
>> +            guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )
> 
> Quite strange a way to say
> 
>             (d->is_hvm ||
>              guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check))

For HVM it's OK to just set v->mce_pending, while for PV we also need to check 
whether the guest has registered a machine check callback.
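
So yes, the condition is equivalent to Jan's shorter form; the check could be
written e.g. as the sketch below (same semantics, just the ternary dropped):

    /* HVM: setting v->mce_pending is enough.
     * PV:  the guest must have registered a machine check callback. */
    if ( !test_and_set_bool(v->mce_pending) &&
         (d->is_hvm ||
          guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )
        vcpu_kick(v);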

> 
>> +        {
>> +            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d vcpu%d\n",
>> +                       d->domain_id, v->vcpu_id);
>> +            vcpu_kick(v);
>> +        }
>> +        else
>> +        {
>> +            mce_printk(MCE_QUIET, "Fail to inject vMCE to dom%d vcpu%d\n",
>> +                       d->domain_id, v->vcpu_id);
>> +            return -1;
> 
> Why do you bail here? This is particularly bad if v->mce_pending
> was already set on some vCPU (as that could simply mean the guest
> just didn't get around to handle the vMCE yet).

If v->mce_pending is already set when a new vMCE arrives, the result is a 
domain crash. I don't think the guest can still handle that case; e.g. on bare 
metal, if a 2nd MCE occurs while the 1st MCE has not been handled, the OS 
reboots directly.
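
In other words, the caller is expected to treat the -1 as fatal for the guest,
roughly along these lines (a sketch of the intent, not the exact caller code in
the patch):

    /* If a vMCE is already pending (or a PV guest has no #MC callback),
     * give up and kill the guest, mirroring bare metal where a second MCE
     * arriving before the first is handled forces a reboot. */
    if ( inject_vmce(d) < 0 )
        domain_crash(d);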

> 
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>  static void intel_memerr_dhandler(
>>               struct mca_binfo *binfo,
>>               enum mce_result *result,
> 
> Also, how does this whole change interact with vmce_{rd,wr}msr()?
> The struct bank_entry instances live on a per-domain list, so the
> vMCE being delivered to all vCPU-s means they will all race for the
> single entry (and might erroneously access others, particularly in
> vmce_wrmsr()'s MCG_STATUS handling).

Yes, I agree.
However, we have recently done a vMCE re-design (currently under internal 
review) and will present it soon. In the new approach the MSRs are per-vcpu 
instead of per-domain, so the MSR race issue (and any effort to clean it up 
here) would become pointless. So in this bugfix patch I only touch vMCE# 
injection itself.
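
To give an idea of the direction, the per-vcpu state would be something like
the sketch below (the struct, field and constant names are made up for
illustration, not the actual re-design):

    #include <stdint.h>

    #define VMCE_MAX_BANKS 32               /* illustrative bank count */

    /* Per-vcpu vMCE MSR state, so broadcast injection no longer races
     * on a shared per-domain bank list. */
    struct vmce_vcpu_msrs {
        uint64_t mcg_status;                /* per-vcpu MCG_STATUS */
        uint64_t mci_status[VMCE_MAX_BANKS];
        uint64_t mci_addr[VMCE_MAX_BANKS];
        uint64_t mci_misc[VMCE_MAX_BANKS];
    };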

Thanks,
Jinsong
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

