RE: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel] Console problem on domU on tip?
Dan,
Could you double-check that the itr mapping the PAL code is still in place
just before invoking ia64_pal_call_static?

Thanks
-Anthony

>-----Original Message-----
>From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>Sent: December 24, 2005 2:22
>To: Xu, Anthony; Yang, Fred; Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>Subject: RE: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel] Console problem on
>domU on tip?
>
>I got up early and spent several hours trying to debug
>this further. By adding timing loops and other debug code
>and moving all the relevant PAL macros around, I proved
>conclusively that the ia64_pal_call_static assembly routine
>is not returning. Next I added an infinite loop to the ivt
>nested TLB handler (which isn't used by Xen except by some
>fast paths that are currently off). With this loop, the
>error message goes away and Xen "freezes". I think this
>proves that the PAL call is inappropriately accessing some
>(unpinned) data location with psr.ic off.
>
>You should note that this is the only PAL call that requires
>psr.ic to be off. I suspect that OS's need to be prepared
>for the possibility that a fault occurs. Linux is not,
>so it never calls the routine. Xen is not prepared either.
>
>Happy holidays!
>
>Dan
>
>> -----Original Message-----
>> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
>> Sent: Thursday, December 22, 2005 7:29 PM
>> To: Magenheimer, Dan (HP Labs Fort Collins); Yang, Fred;
>> Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> Subject: RE: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel]
>> Console problem on domU on tip?
>>
>> >With CONFIG_IA64_SPLIT_CACHE on, a new user may encounter
>> >the problem on a shipping machine and the symptom is that
>> >the machine immediately crashes when a domU is launched.
>>
>> Dan,
>> That means dom0 can boot with CONFIG_IA64_SPLIT_CACHE on, and
>> PAL_CACHE_FLUSH has been invoked successfully in the process
>> of dom0 boot.
>> So this is not a PAL_CACHE_FLUSH issue; there
>> must be some other issue. Could you provide more information
>> about the crash, since we can't reproduce this issue?
>>
>> Thanks.
>>
>> -Anthony
>>
>>
>> >-----Original Message-----
>> >From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>> >Sent: December 22, 2005 21:26
>> >To: Yang, Fred; Xu, Anthony; Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> >Subject: RE: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel] Console problem on
>> >domU on tip?
>> >
>> >With CONFIG_IA64_SPLIT_CACHE on, a new user may encounter
>> >the problem on a shipping machine and the symptom is that
>> >the machine immediately crashes when a domU is launched.
>> >
>> >With CONFIG_IA64_SPLIT_CACHE off, a developer may encounter
>> >a different problem on an unreleased machine.
>> >
>> >I know that you are focused primarily on the unreleased machine,
>> >but in this case, I think we should be cautious for the new user
>> >as the developer knows to change the option when running
>> >on the unreleased machine.
>> >
>> >I will spend some more time on this when I have a chance.
>> >I think it is a real bug (probably PAL accessing some address
>> >which isn't pinned) that occurs only on some boxes due
>> >to some factor like memory configuration.
>> >
>> >Thanks,
>> >Dan
>> >
>> >P.S. The debug output just before the crash was:
>> >ia64_fault: General Exception: IA-64 Reserved Register/Field fault (data access): reflecting
>> >
>> >> -----Original Message-----
>> >> From: Yang, Fred [mailto:fred.yang@xxxxxxxxx]
>> >> Sent: Wednesday, December 21, 2005 10:34 PM
>> >> To: Magenheimer, Dan (HP Labs Fort Collins); Xu, Anthony;
>> >> Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> >> Subject: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel]
>> >> Console problem on domU on tip?
>> >>
>> >> Dan,
>> >>
>> >> Can we suggest to always turn on #CONFIG_IA64_SPLIT_CACHE as
>> >> the default build configuration.
>> >> People may not be aware of
>> >> this build flag and miss it on each new build.
>> >>
>> >> All the newer generation ia64 processors will come with
>> >> split I/D cache as discussed in the previous mail thread,
>> >> and the Itanium architecture documents a possible
>> >> split cache for future implementations. With it turned off
>> >> by default, it is a potential bug for all Tiger4 systems
>> >> used for daily development and for future platforms to come.
>> >>
>> >> Your mail also indicates that only the HP rx2620
>> >> system has the issue and not the other HP boxes. Can you track
>> >> down this issue, rather than put in a kludge for the rx2620 box?
>> >>
>> >> Thanks,
>> >>
>> >> -Fred
>> >>
>> >>
>> >> Magenheimer, Dan (HP Labs Fort Collins) wrote:
>> >> > Committed (but without removal of ifdefs until we
>> >> > track down this problem).
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
>> >> >> Sent: Monday, December 19, 2005 7:15 PM
>> >> >> To: Magenheimer, Dan (HP Labs Fort Collins); Tian, Kevin;
>> >> >> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> >> >> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>> >> >>
>> >> >> I guess maybe the firmware on your machine doesn't implement
>> >> >> this pal call because there was no split I/D cache at that
>> >> >> time, so when you call this pal call, it will return
>> >> >> PAL_STATUS_UNIMPLEMENTED. Could you please turn on
>> >> >> CONFIG_IA64_SPLIT_CACHE and try this new patch to see
>> >> >> whether your machine can boot domain0?
>> >> >> If this patch works, could you please remove all
>> >> >> CONFIG_IA64_SPLIT_CACHE macro?
>> >> >>
>> >> >> Thanks
>> >> >> -Anthony
>> >> >>
>> >> >>> -----Original Message-----
>> >> >>> From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>> >> >>> Sent: December 19, 2005 23:48
>> >> >>> To: Xu, Anthony; Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> >> >>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>> >> >>>
>> >> >>> I have been distracted tracking another bug...
>> >> >>>
>> >> >>> Here's where I got:
>> >> >>>
>> >> >>> The machine is a new (April 2005) HP rx2620 so it is
>> >> >>> not old firmware. I can't reproduce it on a machine
>> >> >>> with an ITP (which does have older firmware).
>> >> >>>
>> >> >>> This PAL call is never used in Linux, though there is a
>> >> >>> routine coded for it. It is the only
>> >> >>> PAL call coded in Linux that occurs with psr.ic off.
>> >> >>>
>> >> >>> The crash I am seeing occurs either during the PAL call or
>> >> >>> immediately upon return.
>> >> >>>
>> >> >>> Is it OK to
>> >> >>>
>> >> >>>
>> >> >>>> -----Original Message-----
>> >> >>>> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
>> >> >>>> Sent: Monday, December 19, 2005 2:02 AM
>> >> >>>> To: Tian, Kevin; Magenheimer, Dan (HP Labs Fort Collins);
>> >> >>>> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> >> >>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>> >> >>>>
>> >> >>>> Dan,
>> >> >>>> Have you got time to verify below discussion?
>> >> >>>>
>> >> >>>> Thanks
>> >> >>>> -Anthony
>> >> >>>>
>> >> >>>>> -----Original Message-----
>> >> >>>>> From: Tian, Kevin
>> >> >>>>> Sent: December 16, 2005 10:16
>> >> >>>>> To: Xu, Anthony; 'Magenheimer, Dan (HP Labs Fort Collins)';
>> >> >>>>> 'xen-ia64-devel@xxxxxxxxxxxxxxxxxxx'
>> >> >>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>> >> >>>>>
>> >> >>>>>> From: Xu, Anthony
>> >> >>>>>> Sent: December 16, 2005 9:54
>> >> >>>>>>
>> >> >>>>>>> Also, why panic if it fails?
>> >> >>>>>>>
>> >> >>>>>
>> >> >>>>> Panic is not required here, and we could just print out a warning
>> >> >>>>> message. Previously panic is kept there to help our debug in
>> >> >>>>> early stage.
>> >> >>>>>
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>>> Does the problem happen only on VTI? Or both VTI and non-VTI on
>> >> >>>>>>> split-cache machines?
>> >> >>>>>>
>> >> >>>>>> Sometimes, it makes domain0 crash at the very beginning of the
>> >> >>>>>> domain0 boot process, especially on MP machine.
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> Thanks
>> >> >>>>>> -Anthony
>> >> >>>>>
>> >> >>>>> One complement is, that problem definitely exists on new
>> >> >>>>> split-cache processors, for dom0/domU. For VTI domain, we have
>> >> >>>>> logic within device model to ensure consistence.
>> >> >>>>>
>> >> >>>>> Thanks,
>> >> >>>>> Kevin
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>>> -----Original Message-----
>> >> >>>>>>> From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
>> >> >>>>>>> Sent: December 16, 2005 1:39
>> >> >>>>>>> To: Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> >> >>>>>>> Cc: Xu, Anthony
>> >> >>>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>> >> >>>>>>>
>> >> >>>>>>>>> Is this code fragment necessary for VTI to boot domU
>> >> >>>>>>>>> or is it OK to remove?
>> >> >>>>>>>>
>> >> >>>>>>>> The comment is inaccurate and it should be domU. That I/D
>> >> >>>>>>>> cache sync step is mandatory to boot domU on new IA64
>> >> >>>>>>>> processor which has split L2 I/D cache. If without such I/D
>> >> >>>>>>>> cache sync, control panel loads domU's kernel image which only
>> >> >>>>>>>> affects D side cache. If there're some stale entry on I-side
>> >> >>>>>>>> cache within same range of dom0 image, people will see machine
>> >> >>>>>>>> going weird.
>> >> >>>>>>>
>> >> >>>>>>> I don't understand... how can there be stale entries in the
>> >> >>>>>>> I-cache?
>> >> >>>>>>> The instructions have just been written to memory
>> >> >>>>>>> (through D-cache) and no instructions in this domain have yet
>> >> >>>>>>> been executed.
>> >> >>>>>>> I do see that the D-cache needs to be flushed so that memory is
>> >> >>>>>>> coherent but are there better ways to do that without a pal
>> >> >>>>>>> call?
>> >> >>>>>>>
>> >> >>>>>>>> Normally I/D cache sync shouldn't force any problem. Possibly
>> >> >>>>>>>> there's some problem with the pal calling code, like incorrect
>> >> >>>>>>>> ITLB mapping for pal or similar issue...
>> >> >>>>>>>
>> >> >>>>>>> Although the ia64_pal_cache_flush routine is defined in linux's
>> >> >>>>>>> pal.h, it doesn't appear to be used anywhere in Linux so there
>> >> >>>>>>> is no use model to copy. I suspect there is some use model for
>> >> >>>>>>> the call that we don't understand, for example maybe it should
>> >> >>>>>>> only be called with physical &progress? It definitely fails
>> >> >>>>>>> every time on one of my (newer) machines and disabling the pal
>> >> >>>>>>> call makes the problem go away.
>> >> >>>>>>>
>> >> >>>>>>>> Though it's intermittent, please
>> >> >>>>>>>> keep this code
>> >> >>>>>>>> there for correctness.
>> >> >>>>>>>
>> >> >>>>>>> Since the call is definitely failing under some circumstances
>> >> >>>>>>> that we don't understand, I'm inclined to at least put the code
>> >> >>>>>>> in an #ifdef CONFIG_SPLIT_CACHE
>> >> >>>>>>>
>> >> >>>>>>> Does the problem happen only on VTI? Or both VTI and non-VTI
>> >> >>>>>>> on split-cache machines?
>> >> >>>>>>>
>> >> >>>>>>> Thanks,
>> >> >>>>>>> Dan
>> >> >>>>>>>
>> >> >>>>>>> P.S. I tried Anthony's patch (which moves the PAL call after
>> >> >>>>>>> new_thread()) but it still crashes.
>> >> >>
>> >>

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel
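
Two suggestions from the thread, printing a warning instead of panicking when the cache flush fails and tolerating firmware that returns PAL_STATUS_UNIMPLEMENTED, could be sketched roughly as below. This is only an illustration under stated assumptions, not the Xen code: the SKETCH_* status values, the flush_fn callback type, sync_icache_or_warn() and the stub_* functions are all hypothetical names introduced here, and the real firmware call is replaced by a function pointer so the logic can be exercised on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical status codes for this sketch only; the real values
 * come from the platform's pal.h. */
#define SKETCH_PAL_STATUS_SUCCESS         0L
#define SKETCH_PAL_STATUS_UNIMPLEMENTED (-1L)

/* Hypothetical stand-in for the firmware cache-flush call. */
typedef long (*flush_fn)(void);

enum sync_result { SYNC_OK, SYNC_SKIPPED, SYNC_FAILED };

/* Run the I/D cache sync step and report the outcome.
 * A failure is logged as a warning and returned to the caller;
 * nothing here panics. */
static enum sync_result sync_icache_or_warn(flush_fn flush)
{
    long status;

    assert(flush != NULL);
    status = flush();

    if (status == SKETCH_PAL_STATUS_SUCCESS)
        return SYNC_OK;

    if (status == SKETCH_PAL_STATUS_UNIMPLEMENTED) {
        /* Firmware without the call: assume no split cache to sync. */
        fprintf(stderr, "warning: cache flush call not implemented, skipping\n");
        return SYNC_SKIPPED;
    }

    fprintf(stderr, "warning: cache flush failed, status %ld\n", status);
    return SYNC_FAILED;
}

/* Stub firmware behaviours used to exercise the three paths. */
static long stub_flush_ok(void)            { return SKETCH_PAL_STATUS_SUCCESS; }
static long stub_flush_unimplemented(void) { return SKETCH_PAL_STATUS_UNIMPLEMENTED; }
static long stub_flush_error(void)         { return -3L; }
```

Calling sync_icache_or_warn(stub_flush_unimplemented) takes the "skipped" path, which is the case Anthony guessed at for older firmware. Note that this sketch does not address the hang Dan describes, where the call never returns at all; a status check can only help when the firmware call actually comes back.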