[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] Xen/MCE: adjust for future new vMCE model
Jan Beulich wrote: >>>> On 05.07.12 at 14:46, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote: >> Jan Beulich wrote: >>>>>> On 05.07.12 at 11:20, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> >>>>>> wrote: >>>> Jan Beulich wrote: >>>>>>>> On 05.07.12 at 05:30, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> >>>>>>>> wrote: @@ -68,7 +74,7 @@ >>>>>> >>>>>> int vmce_restore_vcpu(struct vcpu *v, uint64_t caps) { >>>>>> - if ( caps & ~g_mcg_cap & ~MCG_CAP_COUNT & ~MCG_CTL_P ) >>>>>> + if ( caps & ~g_mcg_cap & ~MCG_CAP_COUNT ) >>>>>> { >>>>>> dprintk(XENLOG_G_ERR, "%s restore: unsupported MCA >>>>>> capabilities" " %#" PRIx64 " for d%d:v%u >>>>>> (supported: %#Lx)\n", @@ -77,6 +83,12 @@ return -EPERM; >>>>>> } >>>>>> >>>>>> + if ( (caps & MCG_CAP_COUNT) != GUEST_BANK_NUM ) + { >>>>>> + dprintk(XENLOG_G_ERR, "Bank number inconsistency\n"); + >>>>>> return -EPERM; + } + v->arch.mcg_cap = caps; >>>>>> return 0; >>>>>> } >>>>> >>>>> These two changes, as pointed out before, need some further >>>>> thought and tweaking: As I said earlier, we are already shipping >>>>> the code in its prior form, so outright rejecting MCG_CTL_P set >>>>> and non-default bank counts is not a proper solution. Warning >>>>> about them being in an unsupported state is certainly acceptable. >>>>> >>>>> And I think the guest visible MCG_CAP value also shouldn't >>>>> change across migration, so tolerating (but ignoring) a higher >>>>> bank count seems necessary. Not sure what to do when the >>>>> bank count is lower (0 or 1) - for 1, all communication to the >>>>> guest should probably go through bank 0, while 0 should >>>>> probably disable vMCE for that vCPU. >>>>> >>>>> Further I just noticed that you don't touch fill_vmsr_data() at >>>>> all (sending patches created with diff's -p option or equivalent >>>>> helps spotting where individual changes belong), yet that >>>>> function uses the hardware bank number to fill the struct >>>>> bank_entry. With the intended concept, the "bank" member >>>>> of that structure can probably go away altogether. >>>>> >>>>> Jan >>>> >>>> >>>> Seems things become a little bit chaos, let's jump out from >>>> details, make agreement first about what's the general purpose of >>>> this middle-work patch. >>>> >>>> ============ >>>> 1. current xen vmce status >>>> 1.1) current xen vmce has 2 kinds of bugs: >>>> * functional bug: it actually doesn't work correctly for guest; >>>> * migration bug: partly solved by c/s 24887; >>>> 1.2) c/s 24887 not in formal xen release version, it's a >>>> temporary fix (but unfortunately has been backported to SELS11 >>>> sp2) ============ >>>> 2. target of this middle-work patch >>>> 2.1) it's not used to address functional bug >>>> 2.3) it does minimal work, just in order not to bring >>>> trouble/dirty to future new vMCE, so it would * remove MCG_CTL >>>> --> otherwise new vMCE have to add useless MCG_CTL_P and MCG_CTL, >>>> w/ potential model specific issue; >>>> * stick all 1's to MCi_CTL --> to avoid semantic issue; >>>> * set MCG_CTL_P=0 and 2 banks at MCG_CAP --> otherwise dirty >>>> code at new vMCE, like bank number issue; ============ >>>> >>>> I think in this middle-work patch, we don't need constrained by >>>> c/s 24887: >>>> 1. c/s 24887 not in formal xen release >>>> 2. the benefit of keep compatible w/ 24887 is minor: >>>> * currently all xen version have migration bug, 3.x -> 4.0 -> >>>> 4.1 -> 4.2 >>>> * keep compatible w/ 24887 only benefit one case --> migration >>>> from SLES11 sp2 to xen 4.2 (for both SLES11 sp2 and xen 4.2, vMCE >>>> actually doesn't functionally work fine to guest) >>>> 3. the price/hurt to future vMCE is high (to keep compatible w/ >>>> 24887) >>> >>> Why do you consider the price high? I think it would require pretty >>> minor changes. >> >> No, what I meant is not coding price. I mean the price that have to >> add >> dirty and useless change to the new vMCE is high. Just curious why >> we cannot simply get rid of c/s 24887? after all it only benefit >> SLES11 sp2. > > This is what save MCG_CAP, and that this is necessary we agreed > on iirc. So why would you want to drop that code all of the sudden? The original idea we buy in save/restore MCG_CAP is for the sake of future capability bits exentsion of MCG_CAP. I didn't image keep a useless bit in new vMCE. > >>>> Considering above, I prefer an outright cleanup in xen4.2, not >>>> constrained by c/s 24887 and not bring trouble/dirty to future >>>> vMCE. >>> >>> Whether or not is was appropriate for us to backport this change >>> early is unclear, but given that back at that time I had already >>> pointed out the problems and asked for work to be done to clean >>> it up, and given that it has taken over four months to get going >>> with this, I don't think you would suggest the alternative of us >>> having left the bug unfixed for this entire time period. >> >> Jan, I'm not challenge why you backport to SLES11 sp2. If there is >> anything wrong, I agree it's my fault. Currently what I concern is >> if we can do an outright cleanup at xen4.2 so that future vMCE could >> be simple and clean. > > But we can't tell our customers that migration from, say, SP2 to > SP3 won't be possible. Somehow I agree, though not 100%, say, how can you tell customers that migrate from sp1 to sp2 (and former) would block? AFAICT it only benefit a little earlier to sp2 (in our solution it delay to sp3). > >>> So unless the price to pay is unreasonably high, I'd favor getting >>> this taken care of rather than ignored. >> >> If so, why we need this middle-work patch? It could just keep >> current status at xen 4.2, then start 'dirty' new vMCE implement at >> xen4.3 --> and the 'dirty' inherit from c/s 24887 which from code >> point of view would be totally removed at new vMCE. > > No, what we now want is to complete said c/s. While at that time > I thought we would need to save MCi_CTL, with the new concept > we won't need to. Instead, we need to enforce it to be all ones > universally. > > The only compatibility thing is the need to deal with higher bank > counts perceived to be available by guests, and the possibly > previously seen set MCG_CTL_P bit. > > And yes, if this created a lot of ugly and otherwise unnecessary > code, I'd agree not to care for this compatibility. But as I don't > expect this to be the case, I thought I'd still ask for it (since if > we don't do it in -unstable, we'll have to carry a private patch > in SLE to do so, and presumably for much longer than the 4.3 > development period). > > Jan If so, the only thing we need do in this middle-work patch is to remove mci_ctl bank array and stick all 1's to MCi_CTL; Thanks, Jinsong _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |