[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 1/2] Xen/MCA: bugfix for mca bank clear


  • To: "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>
  • From: "Ren, Yongjie" <yongjie.ren@xxxxxxxxx>
  • Date: Tue, 12 Mar 2013 08:05:09 +0000
  • Accept-language: zh-CN, en-US
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxx>
  • Delivery-date: Tue, 12 Mar 2013 08:07:10 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>
  • Thread-index: AQHOHkWbo3AgTDHj00CkpRCMA8skS5ihTt1QgABiyOA=
  • Thread-topic: [Xen-devel] [PATCH 1/2] Xen/MCA: bugfix for mca bank clear

> -----Original Message-----
> From: Liu, Jinsong
> Sent: Tuesday, March 12, 2013 10:07 AM
> To: Jan Beulich; Ren, Yongjie
> Cc: xen-devel
> Subject: RE: [Xen-devel] [PATCH 1/2] Xen/MCA: bugfix for mca bank clear
> 
> Jan Beulich wrote:
> >>>> On 11.03.13 at 11:26, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx> wrote:
> >> Jan Beulich wrote:
> >>>>>> On 28.02.13 at 14:21, "Liu, Jinsong" <jinsong.liu@xxxxxxxxx>
> >>>>>> wrote:
> >>>> Jan Beulich wrote:
> >>>>>>>> On 28.02.13 at 09:05, "Jan Beulich" <JBeulich@xxxxxxxx>
> wrote:
> >>>>>> And then I don't see why you would do physical MSR accesses in
> >>>>>> the injection case at all. It should really be mca_wrmsr() to
> >>>>>> take care of this, it just needs to have a way to know it got
> >>>>>> called in the context of an injection. One fundamental question
> >>>>>> here is
> >>>>>> why the invalidation of the interposed data is being done
> >>>>>> before the MSR write rather than after. One thing we surely
> >>>>>> don't care much about in the injection logic is interference with
> >>>>>> a real #MC.
> >>>>>
> >>>>> Like this (RFC only, untested so far), based on having gone
> >>>>> through all call sites of mca_wrmsr():
> >>>>>
> >>>>> --- a/xen/arch/x86/cpu/mcheck/mce.c
> >>>>> +++ b/xen/arch/x86/cpu/mcheck/mce.c
> >>>>> @@ -1065,13 +1065,15 @@ static void intpose_add(unsigned int
> cpu
> >>>>>      printk("intpose_add: interpose array full - request
> >>>>> dropped\n");  }
> >>>>>
> >>>>> -void intpose_inval(unsigned int cpu_nr, uint64_t msr)
> >>>>> +bool_t intpose_inval(unsigned int cpu_nr, uint64_t msr)  { -
> >>>>> struct intpose_ent *ent; +    struct intpose_ent *ent =
> >>>>> intpose_lookup(cpu_nr, msr, NULL);
> >>>>>
> >>>>> -    if ((ent = intpose_lookup(cpu_nr, msr, NULL)) != NULL) {
> >>>>> -        ent->cpu_nr = -1;
> >>>>> -    }
> >>>>> +    if ( !ent )
> >>>>> +        return 0;
> >>>>> +
> >>>>> +    ent->cpu_nr = -1;
> >>>>> +    return 1;
> >>>>>  }
> >>>>>
> >>>>>  #define IS_MCA_BANKREG(r) \
> >>>>> --- a/xen/arch/x86/cpu/mcheck/mce.h
> >>>>> +++ b/xen/arch/x86/cpu/mcheck/mce.h
> >>>>> @@ -77,7 +77,7 @@ extern void mce_recoverable_register(mce
> >>>>>  /* Read an MSR, checking for an interposed value first */
> >>>>>  extern struct intpose_ent *intpose_lookup(unsigned int, uint64_t,
> >>>>> uint64_t *); -extern void intpose_inval(unsigned int, uint64_t);
> >>>>> +extern bool_t intpose_inval(unsigned int, uint64_t);
> >>>>>
> >>>>>  static inline uint64_t mca_rdmsr(unsigned int msr)  {
> >>>>> @@ -89,9 +89,9 @@ static inline uint64_t mca_rdmsr(unsigne
> >>>>>
> >>>>>  /* Write an MSR, invalidating any interposed value */
> >>>>>  #define mca_wrmsr(msr, val) do { \
> >>>>> -       intpose_inval(smp_processor_id(), msr); \
> >>>>> -       wrmsrl(msr, val); \
> >>>>> -} while (0)
> >>>>> +    if ( !intpose_inval(smp_processor_id(), msr) ) \ +
> >>>>> wrmsrl(msr, val); \ +} while ( 0 )
> >>>>>
> >>>>>
> >>>>>  /* Utility function to "logout" all architectural MCA telemetry
> >>>>> from the MCA
> >>>>
> >>>> No, it doesn't work, considering mce broadcast to all cpus, while
> >>>> s/w only simulate 1 cpu.
> >>>
> >>> I don't see how that case would be handled any better with the
> >>> invalidation happening before the MSR write (as is the case now).
> >>>
> >>> Please explain, Jan
> >>
> >> Sorry for late reply.
> >>
> >> What I meant is, mca_wrmsr updated in this way is not quite clean,
> >> since per syntax it still has the risk 'sometimes access simulated
> >> value, sometimes access physical bank'.
> >>
> >> But of course your patch works for current xen-mceinj tools, so if
> >> you think it's OK we will have a test at the specific machine and
> >> ACK later.
> >
> > As said before, I don't think we need to consider interference
> > between injection and real events (or if we do, then I think there
> > are other aspects that would need fixing too).
> >
> > Hence, as long as what I proposed works for the injection case,
> > that ought to be better than what we have currently. Perhaps
> > explicitly filtering out the not supported case of broadcast being
> > requested with the rejection (iirc there's a way to request this)
> > would need to be added to the change suggested earlier.
> >
> > Jan
> 
> Rongjie,
> 
> Would you please test Jan's patch at *the* WSM-EX machine? It should be
> fine but test is necessary and better.
> 
Jan,
Your patch can work fine during my testing.
Tested-by: Yongjie Ren <yongjie.ren@xxxxxxxxx>


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.