[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] Xen/MCE: adjust for future new vMCE model



Jan Beulich wrote:
>>>> On 05.07.12 at 11:20, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
>> Jan Beulich wrote:
>>>>>> On 05.07.12 at 05:30, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>
>>>>>> wrote: @@ -68,7 +74,7 @@
>>>> 
>>>>  int vmce_restore_vcpu(struct vcpu *v, uint64_t caps)  {
>>>> -    if ( caps & ~g_mcg_cap & ~MCG_CAP_COUNT & ~MCG_CTL_P )
>>>> +    if ( caps & ~g_mcg_cap & ~MCG_CAP_COUNT )
>>>>      {
>>>>          dprintk(XENLOG_G_ERR, "%s restore: unsupported MCA
>>>>                  capabilities" " %#" PRIx64 " for d%d:v%u
>>>>          (supported: %#Lx)\n", @@ -77,6 +83,12 @@ return -EPERM;  
>>>> } 
>>>> 
>>>> +    if ( (caps & MCG_CAP_COUNT) != GUEST_BANK_NUM ) +    {
>>>> +        dprintk(XENLOG_G_ERR, "Bank number inconsistency\n"); +
>>>> return -EPERM; +    } +
>>>>      v->arch.mcg_cap = caps;
>>>>      return 0;
>>>>  }
>>> 
>>> These two changes, as pointed out before, need some further
>>> thought and tweaking: As I said earlier, we are already shipping
>>> the code in its prior form, so outright rejecting MCG_CTL_P set
>>> and non-default bank counts is not a proper solution. Warning
>>> about them being in an unsupported state is certainly acceptable.
>>> 
>>> And I think the guest visible MCG_CAP value also shouldn't
>>> change across migration, so tolerating (but ignoring) a higher
>>> bank count seems necessary. Not sure what to do when the
>>> bank count is lower (0 or 1) - for 1, all communication to the
>>> guest should probably go through bank 0, while 0 should
>>> probably disable vMCE  for that vCPU.
>>> 
>>> Further I just noticed that you don't touch fill_vmsr_data() at
>>> all (sending patches created with diff's -p option or equivalent
>>> helps spotting where individual changes belong), yet that
>>> function uses the hardware bank number to fill the struct
>>> bank_entry. With the intended concept, the "bank" member
>>> of that structure can probably go away altogether.
>>> 
>>> Jan
>> 
>> 
>> Seems things become a little bit chaos, let's jump out from details,
>> make agreement first about what's the general purpose of this
>> middle-work patch. 
>> 
>> ============
>> 1. current xen vmce status
>>   1.1) current xen vmce has 2 kinds of bugs:
>>     * functional bug: it actually doesn't work correctly for guest;
>>     * migration bug: partly solved by c/s 24887;
>>   1.2) c/s 24887 not in formal xen release version, it's a temporary
>> fix (but unfortunately has been backported to SELS11 sp2)
>> ============ 
>> 2. target of this middle-work patch
>>   2.1) it's not used to address functional bug
>>   2.3) it does minimal work, just in order not to bring
>>     trouble/dirty to future new vMCE, so it would * remove MCG_CTL
>> --> otherwise new vMCE have to add useless MCG_CTL_P and MCG_CTL, w/
>> potential model specific issue; 
>>     * stick all 1's to MCi_CTL --> to avoid semantic issue;
>>     * set MCG_CTL_P=0 and 2 banks at MCG_CAP --> otherwise dirty
>> code at new vMCE, like bank number issue; ============
>> 
>> I think in this middle-work patch, we don't need constrained by c/s
>> 24887: 
>> 1. c/s 24887 not in formal xen release
>> 2. the benefit of keep compatible w/ 24887 is minor:
>>   * currently all xen version have migration bug, 3.x -> 4.0 -> 4.1
>> -> 4.2 
>>   * keep compatible w/ 24887 only benefit one case --> migration
>> from SLES11 sp2 to xen 4.2 (for both SLES11 sp2 and xen 4.2, vMCE
>> actually doesn't functionally work fine to guest)
>> 3. the price/hurt to future vMCE is high (to keep compatible w/
>> 24887) 
> 
> Why do you consider the price high? I think it would require pretty
> minor changes.

No, what I meant is not coding price. I mean the price that have to add dirty 
and useless change to the new vMCE is high. Just curious why we cannot simply 
get rid of c/s 24887? after all it only benefit SLES11 sp2.

> 
>> Considering above, I prefer an outright cleanup in xen4.2, not
>> constrained by c/s 24887 and not bring trouble/dirty to future vMCE.
> 
> Whether or not is was appropriate for us to backport this change
> early is unclear, but given that back at that time I had already
> pointed out the problems and asked for work to be done to clean
> it up, and given that it has taken over four months to get going
> with this, I don't think you would suggest the alternative of us
> having left the bug unfixed for this entire time period.

Jan, I'm not challenge why you backport to SLES11 sp2. If there is anything 
wrong, I agree it's my fault. Currently what I concern is if we can do an 
outright cleanup at xen4.2 so that future vMCE could be simple and clean.

> 
> So unless the price to pay is unreasonably high, I'd favor getting
> this taken care of rather than ignored.

If so, why we need this middle-work patch? It could just keep current status at 
xen 4.2, then start 'dirty' new vMCE implement at xen4.3 --> and the 'dirty' 
inherit from c/s 24887 which from code point of view would be totally removed 
at new vMCE.

Thanks,
Jinsong

> 
> If I can find time to work on your last version of the patch towards
> the direction I have in mind tomorrow, I'll do so; I'll be gone for
> two weeks afterwards (and be mostly invisible for another one),
> so wouldn't be able to look at this again presumably until the
> beginning of August.
> 
> Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.