RE: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel] Console problem on domU on tip?
Dan,

We spent a long time tracking down Cset#8383 yesterday, and the issue identified so far is that the I/D-cache patch was not turned on in the default build! I hope other community members won't hit this problem again.

From the discussion, it is definitely an issue with accessing the PAL call on the specific HP box. The correct approach is to track it down and find the underlying implementation or platform issue. I hope you can track this down ASAP to remove this hurdle.

-Fred

Magenheimer, Dan (HP Labs Fort Collins) wrote:
> With CONFIG_IA64_SPLIT_CACHE on, a new user may encounter
> the problem on a shipping machine and the symptom is that
> the machine immediately crashes when a domU is launched.
>
> With CONFIG_IA64_SPLIT_CACHE off, a developer may encounter
> a different problem on an unreleased machine.
>
> I know that you are focused primarily on the unreleased machine,
> but in this case, I think we should be cautious for the new user
> as the developer knows to change the option when running
> on the unreleased machine.
>
> I will spend some more time on this when I have a chance.
> I think it is a real bug (probably PAL accessing some address
> which isn't pinned) that occurs only on some boxes due
> to some factor like memory configuration.
>
> Thanks,
> Dan
>
> P.S. The debug output just before the crash was:
> ia64_fault: General Exception: IA-64 Reserved Register/Field fault
> (data access): reflecting
>
>> -----Original Message-----
>> From: Yang, Fred [mailto:fred.yang@xxxxxxxxx]
>> Sent: Wednesday, December 21, 2005 10:34 PM
>> To: Magenheimer, Dan (HP Labs Fort Collins); Xu, Anthony;
>> Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> Subject: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel]
>> Console problem on domU on tip?
>>
>> Dan,
>>
>> Can we suggest to always turn on #CONFIG_IA64_SPLIT_CACHE as
>> the default build configuration? People may not be aware of
>> this build flag and miss it on each new build.
>>
>> All the newer generation ia64 processors will come with
>> split I/D cache as discussed in the previous mail thread,
>> and the Itanium architecture documents a possible
>> split cache for future implementations. With the default
>> turned off, it is a potential bug for all Tiger4 systems
>> used for daily development and for future platforms to come.
>>
>> Your mail also indicates that only the HP rx2620
>> system has the issue and not the other HP boxes. Can you track
>> down this issue, rather than put in a kludge for the rx2620 box?
>>
>> Thanks,
>>
>> -Fred
>>
>>
>> Magenheimer, Dan (HP Labs Fort Collins) wrote:
>>> Committed (but without removal of ifdefs until we
>>> track down this problem).
>>>
>>>> -----Original Message-----
>>>> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
>>>> Sent: Monday, December 19, 2005 7:15 PM
>>>> To: Magenheimer, Dan (HP Labs Fort Collins); Tian, Kevin;
>>>> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>>>>
>>>> I guess maybe the firmware on your machine doesn't implement
>>>> this PAL call because there was no split I/D cache at that
>>>> time, so when you call this PAL call, it will return
>>>> PAL_STATUS_UNIMPLEMENTED. Could you please turn on
>>>> CONFIG_IA64_SPLIT_CACHE and try this new patch to see
>>>> whether your machine can boot domain0?
>>>> If this patch works, could you please remove all
>>>> CONFIG_IA64_SPLIT_CACHE macros?
>>>>
>>>> Thanks
>>>> -Anthony
>>>>
>>>>> -----Original Message-----
>>>>> From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>>>>> Sent: December 19, 2005 23:48
>>>>> To: Xu, Anthony; Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>>>>>
>>>>> I have been distracted tracking another bug...
>>>>>
>>>>> Here's where I got:
>>>>>
>>>>> The machine is a new (April 2005) HP rx2620 so it is
>>>>> not old firmware.
>>>>> I can't reproduce it on a machine
>>>>> with an ITP (which does have older firmware).
>>>>>
>>>>> This PAL call is never used in Linux, though there is a
>>>>> routine coded for it. It is the only
>>>>> PAL call coded in Linux that occurs with psr.ic off.
>>>>>
>>>>> The crash I am seeing occurs either during the PAL call or
>>>>> immediately upon return.
>>>>>
>>>>> Is it OK to
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
>>>>>> Sent: Monday, December 19, 2005 2:02 AM
>>>>>> To: Tian, Kevin; Magenheimer, Dan (HP Labs Fort Collins);
>>>>>> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>>>>>>
>>>>>> Dan,
>>>>>> Have you had time to verify the discussion below?
>>>>>>
>>>>>> Thanks
>>>>>> -Anthony
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Tian, Kevin
>>>>>>> Sent: December 16, 2005 10:16
>>>>>>> To: Xu, Anthony; 'Magenheimer, Dan (HP Labs Fort Collins)';
>>>>>>> 'xen-ia64-devel@xxxxxxxxxxxxxxxxxxx'
>>>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>>>>>>>
>>>>>>>> From: Xu, Anthony
>>>>>>>> Sent: December 16, 2005 9:54
>>>>>>>>
>>>>>>>>> Also, why panic if it fails?
>>>>>>>>>
>>>>>>>
>>>>>>> Panic is not required here, and we could just print out a
>>>>>>> warning message. Previously the panic was kept there to help our
>>>>>>> debugging in the early stage.
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Does the problem happen only on VTI? Or both VTI and non-VTI
>>>>>>>>> on split-cache machines?
>>>>>>>>
>>>>>>>> Sometimes, it makes domain0 crash at the very beginning of the
>>>>>>>> domain0 boot process, especially on an MP machine.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> -Anthony
>>>>>>>
>>>>>>> One addition: that problem definitely exists on new
>>>>>>> split-cache processors, for dom0/domU. For a VTI domain, we have
>>>>>>> logic within the device model to ensure consistency.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kevin
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>>>>>>>>> Sent: December 16, 2005 1:39
>>>>>>>>> To: Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>>>>>>>>> Cc: Xu, Anthony
>>>>>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>>>>>>>>>
>>>>>>>>>>> Is this code fragment necessary for VTI to boot domU
>>>>>>>>>>> or is it OK to remove?
>>>>>>>>>>
>>>>>>>>>> The comment is inaccurate and it should be domU. That I/D
>>>>>>>>>> cache sync step is mandatory to boot domU on a new IA64
>>>>>>>>>> processor which has a split L2 I/D cache. Without such an I/D
>>>>>>>>>> cache sync, the control panel loads domU's kernel image, which
>>>>>>>>>> only affects the D-side cache. If there are stale entries in the
>>>>>>>>>> I-side cache within the same range as the dom0 image, people will
>>>>>>>>>> see the machine going weird.
>>>>>>>>>
>>>>>>>>> I don't understand... how can there be stale entries in the
>>>>>>>>> I-cache? The instructions have just been written to memory
>>>>>>>>> (through D-cache) and no instructions in this domain have yet
>>>>>>>>> been executed. I do see that the D-cache needs to be flushed
>>>>>>>>> so that memory is coherent but are there better ways to do
>>>>>>>>> that without a pal call?
>>>>>>>>>
>>>>>>>>>> Normally I/D cache sync shouldn't cause any problem.
>>>>>>>>>> Possibly there's some problem with the pal calling code,
>>>>>>>>>> like incorrect ITLB mapping for pal or similar issue...
>>>>>>>>>
>>>>>>>>> Although the ia64_pal_cache_flush routine is defined in
>>>>>>>>> linux's pal.h, it doesn't appear to be used anywhere in Linux
>>>>>>>>> so there is no use model to copy. I suspect there is some
>>>>>>>>> use model for the call that we don't understand, for example
>>>>>>>>> maybe it should only be called with physical &progress?
>>>>>>>>> It definitely fails every time on one of my (newer) machines and
>>>>>>>>> disabling the pal call makes the problem go away.
>>>>>>>>>
>>>>>>>>>> Though it's intermittent, please
>>>>>>>>>> keep this code
>>>>>>>>>> there for correctness.
>>>>>>>>>
>>>>>>>>> Since the call is definitely failing under some circumstances
>>>>>>>>> that we don't understand, I'm inclined to at least put the
>>>>>>>>> code in an #ifdef CONFIG_SPLIT_CACHE
>>>>>>>>>
>>>>>>>>> Does the problem happen only on VTI? Or both VTI and non-VTI
>>>>>>>>> on split-cache machines?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Dan
>>>>>>>>>
>>>>>>>>> P.S. I tried Anthony's patch (which moves the PAL call after
>>>>>>>>> new_thread()) but it still crashes.

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel
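
Two suggestions recur in the thread above: keep the I/D cache sync behind the CONFIG_IA64_SPLIT_CACHE build flag, and print a warning instead of panicking when the sync fails. A minimal sketch of that shape follows. It is not the actual Xen code: stub_cache_flush, stub_flush_status, sync_split_cache, and the FLUSH_* status values are hypothetical names invented for illustration, and the stub only stands in for the firmware call so the fragment compiles on its own.

```c
#include <stdio.h>

/* Assumption: the build flag named in the thread. It is defined here only
 * so that the guarded path is compiled in this standalone sketch. */
#ifndef CONFIG_IA64_SPLIT_CACHE
#define CONFIG_IA64_SPLIT_CACHE
#endif

/* Hypothetical status codes for illustration; not copied from a real header. */
enum { FLUSH_OK = 0, FLUSH_UNIMPLEMENTED = -1 };

/* Hypothetical stand-in for the firmware cache-flush call. It simply
 * returns whatever status the caller stored in stub_flush_status. */
static long stub_flush_status = FLUSH_OK;

static long stub_cache_flush(void)
{
    return stub_flush_status;
}

/* Sync the I and D caches after a domain image has been loaded.
 * Returns 0 when the sync succeeded or was compiled out, 1 when the
 * flush reported a failure. A failure is logged as a warning rather
 * than treated as fatal, as proposed in the thread. */
static int sync_split_cache(void)
{
#ifdef CONFIG_IA64_SPLIT_CACHE
    long status = stub_cache_flush();

    if (status != FLUSH_OK) {
        fprintf(stderr, "warning: I/D cache sync failed, status %ld\n", status);
        return 1;
    }
#endif
    return 0;
}
```

With the flag undefined, sync_split_cache compiles down to a plain return 0, which matches the behaviour of the default build that the top of the thread complains about.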