[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] Xen/MCE: adjust for future new vMCE model



Jan Beulich wrote:
>>>> On 05.07.12 at 14:46, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
>> Jan Beulich wrote:
>>>>>> On 05.07.12 at 11:20, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>
>>>>>> wrote: 
>>>> Jan Beulich wrote:
>>>>>>>> On 05.07.12 at 05:30, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>
>>>>>>>> wrote: @@ -68,7 +74,7 @@
>>>>>> 
>>>>>>  int vmce_restore_vcpu(struct vcpu *v, uint64_t caps)  {
>>>>>> -    if ( caps & ~g_mcg_cap & ~MCG_CAP_COUNT & ~MCG_CTL_P )
>>>>>> +    if ( caps & ~g_mcg_cap & ~MCG_CAP_COUNT )
>>>>>>      {
>>>>>>          dprintk(XENLOG_G_ERR, "%s restore: unsupported MCA
>>>>>>                  capabilities" " %#" PRIx64 " for d%d:v%u
>>>>>>          (supported: %#Lx)\n", @@ -77,6 +83,12 @@ return -EPERM;
>>>>>> } 
>>>>>> 
>>>>>> +    if ( (caps & MCG_CAP_COUNT) != GUEST_BANK_NUM ) +    {
>>>>>> +        dprintk(XENLOG_G_ERR, "Bank number inconsistency\n"); +
>>>>>>      return -EPERM; +    } + v->arch.mcg_cap = caps;
>>>>>>      return 0;
>>>>>>  }
>>>>> 
>>>>> These two changes, as pointed out before, need some further
>>>>> thought and tweaking: As I said earlier, we are already shipping
>>>>> the code in its prior form, so outright rejecting MCG_CTL_P set
>>>>> and non-default bank counts is not a proper solution. Warning
>>>>> about them being in an unsupported state is certainly acceptable.
>>>>> 
>>>>> And I think the guest visible MCG_CAP value also shouldn't
>>>>> change across migration, so tolerating (but ignoring) a higher
>>>>> bank count seems necessary. Not sure what to do when the
>>>>> bank count is lower (0 or 1) - for 1, all communication to the
>>>>> guest should probably go through bank 0, while 0 should
>>>>> probably disable vMCE  for that vCPU.
>>>>> 
>>>>> Further I just noticed that you don't touch fill_vmsr_data() at
>>>>> all (sending patches created with diff's -p option or equivalent
>>>>> helps spotting where individual changes belong), yet that
>>>>> function uses the hardware bank number to fill the struct
>>>>> bank_entry. With the intended concept, the "bank" member
>>>>> of that structure can probably go away altogether.
>>>>> 
>>>>> Jan
>>>> 
>>>> 
>>>> Seems things become a little bit chaos, let's jump out from
>>>> details, make agreement first about what's the general purpose of
>>>> this middle-work patch. 
>>>> 
>>>> ============
>>>> 1. current xen vmce status
>>>>   1.1) current xen vmce has 2 kinds of bugs:
>>>>     * functional bug: it actually doesn't work correctly for guest;
>>>>     * migration bug: partly solved by c/s 24887;
>>>>   1.2) c/s 24887 not in formal xen release version, it's a
>>>> temporary fix (but unfortunately has been backported to SELS11
>>>> sp2) ============ 
>>>> 2. target of this middle-work patch
>>>>   2.1) it's not used to address functional bug
>>>>   2.3) it does minimal work, just in order not to bring
>>>>     trouble/dirty to future new vMCE, so it would * remove MCG_CTL
>>>> --> otherwise new vMCE have to add useless MCG_CTL_P and MCG_CTL,
>>>> w/ potential model specific issue;
>>>>     * stick all 1's to MCi_CTL --> to avoid semantic issue;
>>>>     * set MCG_CTL_P=0 and 2 banks at MCG_CAP --> otherwise dirty
>>>> code at new vMCE, like bank number issue; ============
>>>> 
>>>> I think in this middle-work patch, we don't need constrained by
>>>> c/s 24887: 
>>>> 1. c/s 24887 not in formal xen release
>>>> 2. the benefit of keep compatible w/ 24887 is minor:
>>>>   * currently all xen version have migration bug, 3.x -> 4.0 ->
>>>> 4.1 -> 4.2 
>>>>   * keep compatible w/ 24887 only benefit one case --> migration
>>>> from SLES11 sp2 to xen 4.2 (for both SLES11 sp2 and xen 4.2, vMCE
>>>> actually doesn't functionally work fine to guest)
>>>> 3. the price/hurt to future vMCE is high (to keep compatible w/
>>>> 24887)
>>> 
>>> Why do you consider the price high? I think it would require pretty
>>> minor changes.
>> 
>> No, what I meant is not coding price. I mean the price that have to
>> add 
>> dirty and useless change to the new vMCE is high. Just curious why
>> we cannot simply get rid of c/s 24887? after all it only benefit
>> SLES11 sp2. 
> 
> This is what save MCG_CAP, and that this is necessary we agreed
> on iirc. So why would you want to drop that code all of the sudden?

The original idea we buy in save/restore MCG_CAP is for the sake of future 
capability bits exentsion of MCG_CAP. I didn't image keep a useless bit in new 
vMCE.

> 
>>>> Considering above, I prefer an outright cleanup in xen4.2, not
>>>> constrained by c/s 24887 and not bring trouble/dirty to future
>>>> vMCE. 
>>> 
>>> Whether or not is was appropriate for us to backport this change
>>> early is unclear, but given that back at that time I had already
>>> pointed out the problems and asked for work to be done to clean
>>> it up, and given that it has taken over four months to get going
>>> with this, I don't think you would suggest the alternative of us
>>> having left the bug unfixed for this entire time period.
>> 
>> Jan, I'm not challenge why you backport to SLES11 sp2. If there is
>> anything wrong, I agree it's my fault. Currently what I concern is
>> if we can do an outright cleanup at xen4.2 so that future vMCE could
>> be simple and clean. 
> 
> But we can't tell our customers that migration from, say, SP2 to
> SP3 won't be possible.

Somehow I agree, though not 100%, say, how can you tell customers that migrate 
from sp1 to sp2 (and former) would block? AFAICT it only benefit a little 
earlier to sp2 (in our solution it delay to sp3).

> 
>>> So unless the price to pay is unreasonably high, I'd favor getting
>>> this taken care of rather than ignored.
>> 
>> If so, why we need this middle-work patch? It could just keep
>> current status at xen 4.2, then start 'dirty' new vMCE implement at
>> xen4.3 --> and the 'dirty' inherit from c/s 24887 which from code
>> point of view would be totally removed at new vMCE.
> 
> No, what we now want is to complete said c/s. While at that time
> I thought we would need to save MCi_CTL, with the new concept
> we won't need to. Instead, we need to enforce it to be all ones
> universally.
> 
> The only compatibility thing is the need to deal with higher bank
> counts perceived to be available by guests, and the possibly
> previously seen set MCG_CTL_P bit.
> 
> And yes, if this created a lot of ugly and otherwise unnecessary
> code, I'd agree not to care for this compatibility. But as I don't
> expect this to be the case, I thought I'd still ask for it (since if
> we don't do it in -unstable, we'll have to carry a private patch
> in SLE to do so, and presumably for much longer than the 4.3
> development period).
> 
> Jan

If so, the only thing we need do in this middle-work patch is to remove mci_ctl 
bank array and stick all 1's to MCi_CTL;


Thanks,
Jinsong

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.