Re: [Xen-devel] [PATCH] Xen vMCE bugfix: inject vMCE# to all vcpus
Jan Beulich wrote:
>>>> On 13.06.12 at 12:49, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
>> Jan Beulich wrote:
>>>>>> On 13.06.12 at 10:05, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
>>>> Xen vMCE bugfix: inject vMCE# to all vcpus
>>>>
>>>> In our testing of Win8 guest MCE we found a bug: no matter what
>>>> SRAO/SRAR error Xen injects into a Win8 guest, the guest always
>>>> reboots.
>>>>
>>>> The root cause is that the current Xen vMCE logic injects vMCE#
>>>> only to vcpu0, which is not correct for Intel MCE (under the Intel
>>>> architecture, hardware generates MCE# to all CPUs).
>>>>
>>>> This patch fixes the vMCE injection bug by injecting vMCE# to all
>>>> vcpus.
>>>
>>> I see no correlation between the fix (and its description) and the
>>> problem at hand: Why would Win8 reboot if it doesn't receive a
>>> particular MCE on all CPUs? Isn't that model specific behavior?
>>
>> It's not model specific. On Intel processors, MCE# is broadcast to
>> all logical processors on systems on which UCR errors are supported
>> (see Intel SDM 15.9.3.1 & 15.9.3.2). So for vMCE injection under the
>> Intel architecture, it's a bug to inject vMCE# only to vcpu0.
>
> This is for a certain group of errors, but I can't find any statement
> that this is general architectural behavior.

Xen MCE injects only SRAO/SRAR errors into the guest, and both belong
to the UCR class. For other errors such as CE and UCNA, the hypervisor
just raises a virq to the dom0 mcelog daemon for logging.

>>> Furthermore I doubt that an MCE on one socket indeed causes
>>> MCE-s on all other sockets, not to speak of distinct NUMA nodes
>>> (it would already surprise me if MCE-s got broadcast across cores
>>> within a socket, unless they are caused by a resource shared
>>> across cores).
>>
>> To some extent I agree. But at least per the current Intel SDM, it
>> is architecturally broadcast.
>
> I can't even see how this would work reliably across packages or
> nodes: Suppose two entities each encounter a UCR - they would
> each signal the other one, and the handling of this signal would
> need to be synchronized with the handling of the local UCR
> (irrespective of the order in which the events arrive).

I don't think the hardware works that way, though I don't have all the
details. Basically, a processor is just a 'report point' (via its
banks) indicating what error occurred. For example, an error is
(mostly) generated in memory and travels along the path
memory --> memory controller --> QPI --> cache controller --> ... -->
cpu core. It's not one processor signaling the other CPUs: the MCA
mechanism monitors the various units, decides when and which errors
are reported, and notifies all processors (that's what 'broadcast'
means).

>>>> +        if ( !test_and_set_bool(v->mce_pending) &&
>>>> +             ((d->is_hvm) ? 1 :
>>>> +              guest_has_trap_callback(d, v->vcpu_id,
>>>> +                                      TRAP_machine_check)) )
>>>
>>> Quite strange a way to say
>>>
>>> (d->is_hvm || guest_has_trap_callback(d, v->vcpu_id,
>>>                                       TRAP_machine_check))
>>
>> For HVM it's OK to just set v->mce_pending, while for PV we also
>> need to check whether the guest has registered a callback.
>
> Sure. I was just asking why you use a conditional operator when
> the simpler || would do.

Ah, I see. That is better.
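With that change, the injection path would look roughly as follows (a
simplified sketch only: the identifiers come from the quoted hunks plus
the simpler || form, but the surrounding loop is paraphrased for
illustration, not the literal patch):

    /* Sketch: broadcast vMCE# to every vcpu of the domain.
     * Paraphrased from the quoted hunks; error handling omitted. */
    for_each_vcpu ( d, v )
    {
        if ( !test_and_set_bool(v->mce_pending) &&
             (d->is_hvm ||
              guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )
        {
            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d vcpu%d\n",
                       d->domain_id, v->vcpu_id);
            vcpu_kick(v);
        }
    }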
>
>>>> +        {
>>>> +            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d vcpu%d\n",
>>>> +                       d->domain_id, v->vcpu_id);
>>>> +            vcpu_kick(v);
>>>> +        }
>>>> +        else
>>>> +        {
>>>> +            mce_printk(MCE_QUIET, "Fail to inject vMCE to dom%d vcpu%d\n",
>>>> +                       d->domain_id, v->vcpu_id);
>>>> +            return -1;
>>>
>>> Why do you bail here? This is particularly bad if v->mce_pending
>>> was already set on some vCPU (as that could simply mean the guest
>>> just didn't get around to handle the vMCE yet).
>>
>> The case where v->mce_pending is already set when a new vMCE arrives
>> results in a domain crash. I don't think the guest can still handle
>> that case; e.g. on bare metal, if a 2nd MCE occurs while the 1st MCE
>> has not yet been handled, the OS reboots directly.
>
> Sorry, no. This goes back to the questionable nature of the
> broadcasting above - if a CPU encounters a UCR itself and gets
> signaled of a remote CPU having encountered one, it should be
> able to cope. Otherwise the risk of (pointless!) system death
> would increase with the number of CPUs.
>
>>> Also, how does this whole change interact with vmce_{rd,wr}msr()?
>>> The struct bank_entry instances live on a per-domain list, so the
>>> vMCE being delivered to all vCPU-s means they will all race for the
>>> single entry (and might erroneously access others, particularly in
>>> vmce_wrmsr()'s MCG_STATUS handling).
>>
>> Yes, I agree.
>> However, we have recently done some vMCE re-design work (currently
>> under internal review) and will present it soon. In the new approach
>> the MSRs are per-vcpu instead of per-domain, so the MSR race (and
>> the effort to clean it up) would be moot. In this bugfix patch I
>> therefore only touch the vMCE# injection itself.
>
> I understand that you want the simpler version as bug fix, but
> you didn't answer whether it was at all verified that the single
> per-domain entry logic would actually cope with the broadcast
> delivery logic you're adding here. I'm afraid the per-vCPU entry
> logic is a prerequisite to this change (in whatever form it may end
> up going in).
>
> Jan

Agreed, let's hold it. After we present the new vMCE approach we can
do this on top of it, with no race issue then.
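For a rough idea of the direction (an illustrative sketch only -- the
type and field names below are placeholders, not the final design):

    /* Illustrative only: vMCE MSR state kept per vcpu, so concurrent
     * injections no longer race on a shared per-domain bank_entry
     * list in vmce_{rd,wr}msr().  Names are placeholders. */
    struct vmce_vcpu_state
    {
        uint64_t mcg_status;            /* per-vcpu MCG_STATUS */
        spinlock_t lock;                /* serializes MSR accesses */
        struct list_head pending_banks; /* per-vcpu, not per-domain */
    };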
Thanks,
Jinsong

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel