
Re: [Xen-devel] [PATCH] Xen vMCE bugfix: inject vMCE# to all vcpus



Jan Beulich wrote:
>>>> On 13.06.12 at 10:05, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
>> Xen vMCE bugfix: inject vMCE# to all vcpus
>> 
>> In our testing of Win8 guest MCE we found a bug: no matter what
>> SRAO/SRAR error Xen injects to the Win8 guest, it always reboots.
>> 
>> The root cause is that the current Xen vMCE logic injects vMCE# only
>> to vcpu0, which is not correct for Intel MCE (under the Intel arch,
>> h/w generates MCE# to all CPUs).
>> 
>> This patch fixes the vMCE injection bug by injecting vMCE# to all vcpus.
> 
> I see no correlation between the fix (and its description) and the
> problem at hand: Why would Win8 reboot if it doesn't receive a
> particular MCE on all CPUs? Isn't that model specific behavior?

It's not model specific. On Intel processors, MCE# is broadcast to all logical 
processors on systems where UCR errors are supported (refer to Intel SDM 
15.9.3.1 & 15.9.3.2). So for vMCE injection under the Intel arch, it is a bug 
to inject vMCE# only to vcpu0.

As for why Win8 reboots: I guess it waits for all CPUs to enter its MCE handler 
before going ahead, and if that synchronization times out it reboots. I have no 
Win8 code, so I can only infer this, but the test results support it.
(It's reasonable for Win8 to do so. As the Intel SDM suggests, due to the 
potentially shared machine check MSR resources among the logical processors on 
the same package/core, the MCE handler may be required to synchronize with the 
other processors that received a machine check error and to serialize access to 
the machine check registers when analyzing, logging and clearing the 
information in them.)
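
To make the guess concrete, such a handler might look roughly like the sketch
below (plain C11, not Win8 or Xen code; do_reboot(), analyze_mce_banks() and
the timeout value are placeholders I made up):

#include <stdatomic.h>

#define MCE_SYNC_TIMEOUT 1000000L       /* arbitrary spin budget */

static atomic_int cpus_in_handler;      /* CPUs that have entered the handler */

static void do_reboot(void)         { /* placeholder: trigger a reboot */ }
static void analyze_mce_banks(void) { /* placeholder: read/log/clear MCi_* MSRs */ }

/* Invoked on every logical CPU, since MCE# is broadcast. */
void machine_check_handler(int nr_cpus)
{
    atomic_fetch_add(&cpus_in_handler, 1);

    /* Rendezvous: wait until every CPU has arrived. */
    long spins = 0;
    while (atomic_load(&cpus_in_handler) < nr_cpus) {
        if (++spins > MCE_SYNC_TIMEOUT) {
            do_reboot();                /* synchronization timed out */
            return;
        }
    }

    /* All CPUs present: analyze/log/clear the (potentially shared)
     * machine check registers in a serialized fashion. */
    analyze_mce_banks();
}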

> 
> Furthermore I doubt that an MCE on one socket indeed causes
> MCE-s on all other sockets, not to speak of distinct NUMA nodes
> (it would already surprise me if MCE-s got broadcast across cores
> within a socket, unless they are caused by a resource shared
> across cores).

I somewhat agree. But at least per the current Intel SDM, architecturally it 
is broadcast.

> 
>> --- a/xen/arch/x86/cpu/mcheck/mce_intel.c  Tue Jun 05 03:18:00 2012 +0800
>> +++ b/xen/arch/x86/cpu/mcheck/mce_intel.c  Wed Jun 13 23:40:45 2012 +0800
>> @@ -638,6 +638,32 @@
>>      return rec;
>>  }
>> 
>> +static int inject_vmce(struct domain *d)
> 
> Is it really necessary to move this vendor independent function
> into a vendor specific source file?

For AMD, I'm not quite sure whether injecting vMCE to all vcpus is proper for 
its arch. I remember AMD uses NMI / single MCE# / broadcast MCE# on different 
platforms.

BTW, currently in Xen's MCE logic inject_vmce() is only used by the Intel MCE 
code, so I put it into the Intel MCE logic to avoid potential issues. Intel MCE 
and AMD MCE differ in some points (including MCE# injection), so we'd better 
not lump them together. Only when we are totally sure some logic is safe to 
share between Intel and AMD should we put it into the common MCE code.

> 
>> +{
>> +    struct vcpu *v;
>> +
>> +    /* inject vMCE to all vcpus */
>> +    for_each_vcpu(d, v)
>> +    {
>> +        if ( !test_and_set_bool(v->mce_pending) &&
>> +            ((d->is_hvm) ? 1 :
>> +            guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )
> 
> Quite strange a way to say
> 
>             (d->is_hvm ||
>              guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check))

For HVM it's OK to just set v->mce_pending, while for PV we also need to check 
whether the guest has registered a machine check callback.
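
So yes, the condition is equivalent to Jan's shorter form; the check could be
written e.g. as the sketch below (same semantics, just the ternary dropped):

    /* HVM: setting v->mce_pending is enough.
     * PV:  the guest must have registered a machine check callback. */
    if ( !test_and_set_bool(v->mce_pending) &&
         (d->is_hvm ||
          guest_has_trap_callback(d, v->vcpu_id, TRAP_machine_check)) )
        vcpu_kick(v);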

> 
>> +        {
>> +            mce_printk(MCE_VERBOSE, "MCE: inject vMCE to dom%d vcpu%d\n",
>> +                       d->domain_id, v->vcpu_id);
>> +            vcpu_kick(v);
>> +        }
>> +        else
>> +        {
>> +            mce_printk(MCE_QUIET, "Fail to inject vMCE to dom%d vcpu%d\n",
>> +                       d->domain_id, v->vcpu_id);
>> +            return -1;
> 
> Why do you bail here? This is particularly bad if v->mce_pending
> was already set on some vCPU (as that could simply mean the guest
> just didn't get around to handle the vMCE yet).

If v->mce_pending is already set when a new vMCE arrives, the result is a 
domain crash. I don't think the guest can still handle that case; e.g. on bare 
metal, if a 2nd MCE occurs while the 1st MCE has not been handled, the OS 
reboots directly.
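
In other words, the caller is expected to treat the -1 as fatal for the guest,
roughly along these lines (a sketch of the intent, not the exact caller code in
the patch):

    /* If a vMCE is already pending (or a PV guest has no #MC callback),
     * give up and kill the guest, mirroring bare metal where a second MCE
     * arriving before the first is handled forces a reboot. */
    if ( inject_vmce(d) < 0 )
        domain_crash(d);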

> 
>> +        }
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>  static void intel_memerr_dhandler(
>>               struct mca_binfo *binfo,
>>               enum mce_result *result,
> 
> Also, how does this whole change interact with vmce_{rd,wr}msr()?
> The struct bank_entry instances live on a per-domain list, so the
> vMCE being delivered to all vCPU-s means they will all race for the
> single entry (and might erroneously access others, particularly in
> vmce_wrmsr()'s MCG_STATUS handling).

Yes, I agree.
However, we have recently done a vMCE re-design (currently under internal 
review) and will present it soon. In the new approach the MSRs are per-vcpu 
instead of per-domain, so the MSR race issue (and any effort to clean it up 
here) would become pointless. So in this bugfix patch I only touch vMCE# 
injection itself.
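
To give an idea of the direction, the per-vcpu state would be something like
the sketch below (the struct, field and constant names are made up for
illustration, not the actual re-design):

    #include <stdint.h>

    #define VMCE_MAX_BANKS 32               /* illustrative bank count */

    /* Per-vcpu vMCE MSR state, so broadcast injection no longer races
     * on a shared per-domain bank list. */
    struct vmce_vcpu_msrs {
        uint64_t mcg_status;                /* per-vcpu MCG_STATUS */
        uint64_t mci_status[VMCE_MAX_BANKS];
        uint64_t mci_addr[VMCE_MAX_BANKS];
        uint64_t mci_misc[VMCE_MAX_BANKS];
    };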

Thanks,
Jinsong
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

