[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v1] psr: fix bug which may cause crash
On 27.11.2019 07:24, Yi Sun wrote: > During test, we found a crash on Xen with below trace. > (XEN) Xen call trace: > (XEN) [<ffff82d0802a065a>] R psr.c#l3_cdp_write_msr+0x1e/0x22 > (XEN) [<ffff82d0802a0858>] F psr.c#do_write_psr_msrs+0x6d/0x109 > (XEN) [<ffff82d08023e000>] F smp_call_function_interrupt+0x5a/0xac > (XEN) [<ffff82d0802a2b89>] F call_function_interrupt+0x20/0x34 > (XEN) [<ffff82d080282c64>] F do_IRQ+0x175/0x6ae > (XEN) [<ffff82d08038b8ba>] F common_interrupt+0x10a/0x120 > (XEN) [<ffff82d0802ec616>] F cpu_idle.c#acpi_idle_do_entry+0x9d/0xb1 > (XEN) [<ffff82d0802ecc01>] F cpu_idle.c#acpi_processor_idle+0x41d/0x626 > (XEN) [<ffff82d08027353b>] F domain.c#idle_loop+0xa5/0xa7 > (XEN) > (XEN) > (XEN) **************************************** > (XEN) Panic on CPU 20: > (XEN) GENERAL PROTECTION FAULT > (XEN) [error_code=0000] > (XEN) **************************************** > > Root cause is that the cache of COS registers are not initialized > for CAT/CDP which have non-zero default value. That causes invalid > write to MSR when COS id has exceeded the max number.. So fix it by > initializing the cache. I'm struggling with this description, first and foremost because there's no (recognizable to me) connection between the supposed root cause and the crash. Exceeding the maximum number is a bug in some loop's bounds I would say, not an omission of cached value initialization. In particular I see in do_write_psr_msrs() for ( j = 0; j < cos_num; j++, index++ ) { if ( feat->cos_reg_val[cos * cos_num + j] != info->val[index] ) { feat->cos_reg_val[cos * cos_num + j] = info->val[index]; props->write_msr(cos, info->val[index], props->type[j]); } } Afaict the makes clear that values found in ->cos_reg_val[] would never get written out (which fits it being just a cache). If anything, a "random" match of the cache value and the two be written value would _prevent_ an MSR write despite potentially the MSR in fact currently holding a different value. Nevertheless a few remarks on the patch itself, just in case it's just the description that has misguided me. > --- a/xen/arch/x86/psr.c > +++ b/xen/arch/x86/psr.c > @@ -316,6 +316,7 @@ static bool cat_init_feature(const struct cpuid_leaf > *regs, > [FEAT_TYPE_L3_CDP] = "L3 CDP", > [FEAT_TYPE_L2_CAT] = "L2 CAT", > }; > + unsigned int i = 0; Unnecessary initializer and too wide a scope. > @@ -332,7 +333,8 @@ static bool cat_init_feature(const struct cpuid_leaf > *regs, > return false; > > /* We reserve cos=0 as default cbm (all bits within cbm_len are 1). > */ > - feat->cos_reg_val[0] = cat_default_val(feat->cat.cbm_len); > + for(i = 0; i < MAX_COS_REG_CNT; i++) There are number of blanks missing here (and even more ones in the other instance below). It also seems to me that the comment ends up misplaced now. If ... > + feat->cos_reg_val[i] = cat_default_val(feat->cat.cbm_len); > > wrmsrl((type == FEAT_TYPE_L3_CAT ? > MSR_IA32_PSR_L3_MASK(0) : ... this indeed is to remain a single write, it may want to move here. But as per above keeping cached and actual values in sync may make it necessary to move this write into the loop as well. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |