|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v1] psr: fix bug which may cause crash
On 19-11-27 11:06:49, Jan Beulich wrote:
> On 27.11.2019 07:24, Yi Sun wrote:
> > During test, we found a crash on Xen with below trace.
> > (XEN) Xen call trace:
> > (XEN) [<ffff82d0802a065a>] R psr.c#l3_cdp_write_msr+0x1e/0x22
> > (XEN) [<ffff82d0802a0858>] F psr.c#do_write_psr_msrs+0x6d/0x109
> > (XEN) [<ffff82d08023e000>] F smp_call_function_interrupt+0x5a/0xac
> > (XEN) [<ffff82d0802a2b89>] F call_function_interrupt+0x20/0x34
> > (XEN) [<ffff82d080282c64>] F do_IRQ+0x175/0x6ae
> > (XEN) [<ffff82d08038b8ba>] F common_interrupt+0x10a/0x120
> > (XEN) [<ffff82d0802ec616>] F cpu_idle.c#acpi_idle_do_entry+0x9d/0xb1
> > (XEN) [<ffff82d0802ecc01>] F cpu_idle.c#acpi_processor_idle+0x41d/0x626
> > (XEN) [<ffff82d08027353b>] F domain.c#idle_loop+0xa5/0xa7
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 20:
> > (XEN) GENERAL PROTECTION FAULT
> > (XEN) [error_code=0000]
> > (XEN) ****************************************
> >
> > Root cause is that the cache of COS registers are not initialized
> > for CAT/CDP which have non-zero default value. That causes invalid
> > write to MSR when COS id has exceeded the max number.. So fix it by
> > initializing the cache.
>
> I'm struggling with this description, first and foremost because
> there's no (recognizable to me) connection between the supposed
> root cause and the crash. Exceeding the maximum number is a bug in
> some loop's bounds I would say, not an omission of cached value
> initialization. In particular I see in do_write_psr_msrs()
>
> for ( j = 0; j < cos_num; j++, index++ )
> {
> if ( feat->cos_reg_val[cos * cos_num + j] != info->val[index] )
> {
> feat->cos_reg_val[cos * cos_num + j] = info->val[index];
> props->write_msr(cos, info->val[index], props->type[j]);
> }
> }
>
Sorry, my description is not clear. The bug happens when CDP and MBA
co-exist and MBA COS_MAX is bigger than CDP COS_MAX. E.g. MBA has 8
COS registers but CDP only have 6. When setting MBA throttling value
for the 7th guest, the value array would be:
+------------------+------------------+--------------+
| Data default val | Code default val | MBA throttle |
+------------------+------------------+--------------+
We should avoid writting CDP data/code valules to MSR here. This should
be prevented by:
if ( feat->cos_reg_val[cos * cos_num + j] != info->val[index] )
But the whole cos_reg_val[] is not initialized to default value so that
the check cannot prevent default value setting.
> Afaict the makes clear that values found in ->cos_reg_val[] would
> never get written out (which fits it being just a cache). If
> anything, a "random" match of the cache value and the two be
> written value would _prevent_ an MSR write despite potentially
> the MSR in fact currently holding a different value.
>
> Nevertheless a few remarks on the patch itself, just in case
> it's just the description that has misguided me.
>
> > --- a/xen/arch/x86/psr.c
> > +++ b/xen/arch/x86/psr.c
> > @@ -316,6 +316,7 @@ static bool cat_init_feature(const struct cpuid_leaf
> > *regs,
> > [FEAT_TYPE_L3_CDP] = "L3 CDP",
> > [FEAT_TYPE_L2_CAT] = "L2 CAT",
> > };
> > + unsigned int i = 0;
>
> Unnecessary initializer and too wide a scope.
>
Ok, u8 is enough.
> > @@ -332,7 +333,8 @@ static bool cat_init_feature(const struct cpuid_leaf
> > *regs,
> > return false;
> >
> > /* We reserve cos=0 as default cbm (all bits within cbm_len are
> > 1). */
> > - feat->cos_reg_val[0] = cat_default_val(feat->cat.cbm_len);
> > + for(i = 0; i < MAX_COS_REG_CNT; i++)
>
> There are number of blanks missing here (and even more ones in
> the other instance below). It also seems to me that the comment
> ends up misplaced now. If ...
>
Sorry, the comment should be modified.
> > + feat->cos_reg_val[i] = cat_default_val(feat->cat.cbm_len);
> >
> > wrmsrl((type == FEAT_TYPE_L3_CAT ?
> > MSR_IA32_PSR_L3_MASK(0) :
>
> ... this indeed is to remain a single write, it may want to move
> here. But as per above keeping cached and actual values in sync
> may make it necessary to move this write into the loop as well.
>
You are right, I missed to loop this sentence.
Another idea:
I remembered that the original purpose to only write COS[0] here is to
improve performance by not writing too many MSRs. So I am thinking to
change the fix to below line in do_write_psr_msrs().
if ( feat->cos_reg_val[cos * cos_num + j] != info->val[index] &&
cos <= feat->cos_max )
What is your opinion? Thanks!
> Jan
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxxx
> https://lists.xenproject.org/mailman/listinfo/xen-devel
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |