RE: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel] Console problem on domU on tip?


  • To: "Magenheimer, Dan \(HP Labs Fort Collins\)" <dan.magenheimer@xxxxxx>, "Xu, Anthony" <anthony.xu@xxxxxxxxx>, "Tian, Kevin" <kevin.tian@xxxxxxxxx>, <xen-ia64-devel@xxxxxxxxxxxxxxxxxxx>
  • From: "Yang, Fred" <fred.yang@xxxxxxxxx>
  • Date: Thu, 22 Dec 2005 08:15:29 -0800
  • Delivery-date: Thu, 22 Dec 2005 16:18:48 +0000
  • List-id: Discussion of the ia64 port of Xen <xen-ia64-devel.lists.xensource.com>
  • Thread-index: AcYBELBGMu7ZHSeYRUaEav49mCDvgwAAVFMwAADLrDAAIbQEoAARHDxgAAFwgsAApRrdUAAOEX2AABW4J/AAUJyDYAAareHQABB8dxAABifEIA==
  • Thread-topic: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel] Console problem on domU on tip?

Dan,

We spent a long time tracking down Cset#8383 yesterday, and the issue we have 
identified is that the I/D-cache patch was not turned on in the default build!  
I hope other community members won't hit this problem again.

From the discussion, this is definitely an issue specific to that HP box when 
making the PAL call.  The correct approach is to track it down and find the 
underlying implementation or platform issue.

Hope you can track this down ASAP to remove this hurdle.

-Fred
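
The two suggestions made further down this thread (build the I/D cache
sync in by default, and warn rather than panic when the PAL call fails)
can be sketched roughly as below. This is only an illustration, not the
Xen code: SKETCH_SPLIT_CACHE, sketch_pal_cache_sync() and
sketch_sync_domain_icache() are hypothetical names, the firmware call is
replaced by a stand-in that returns whatever status it is handed, and the
numeric status values are assumed for the example.

```c
#include <stdio.h>

/* Hypothetical switch standing in for the build option discussed in the
 * thread. It defaults to "on"; a developer could pass
 * -DSKETCH_SPLIT_CACHE=0 to compile the sync out. */
#ifndef SKETCH_SPLIT_CACHE
#define SKETCH_SPLIT_CACHE 1
#endif

/* Status values assumed for this sketch only. */
#define SKETCH_PAL_STATUS_SUCCESS         0
#define SKETCH_PAL_STATUS_UNIMPLEMENTED (-1)

#if SKETCH_SPLIT_CACHE
/* Stand-in for the firmware call: simply returns the status it is given
 * so that both outcomes can be exercised without real hardware. */
static long sketch_pal_cache_sync(long simulated_status)
{
    return simulated_status;
}
#endif

/* Returns 1 if the sync ran and succeeded, -1 if it ran and failed
 * (a warning is printed, no panic), 0 if it was compiled out. */
int sketch_sync_domain_icache(long simulated_status)
{
#if SKETCH_SPLIT_CACHE
    long status = sketch_pal_cache_sync(simulated_status);

    if (status != SKETCH_PAL_STATUS_SUCCESS) {
        /* Warn and carry on instead of stopping the machine. */
        fprintf(stderr, "warning: I/D cache sync failed, status %ld\n",
                status);
        return -1;
    }
    return 1;
#else
    (void)simulated_status;   /* nothing to do when compiled out */
    return 0;
#endif
}
```

With the default setting, a successful status yields 1 and an
unimplemented status yields -1 plus a warning on stderr, so a firmware
that lacks the call would not bring the whole machine down.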

Magenheimer, Dan (HP Labs Fort Collins) wrote:
> With CONFIG_IA64_SPLIT_CACHE on, a new user may encounter
> the problem on a shipping machine and the symptom is that
> the machine immediately crashes when a domU is launched.
> 
> With CONFIG_IA64_SPLIT_CACHE off, a developer may encounter
> a different problem on an unreleased machine.
> 
> I know that you are focused primarily on the unreleased machine,
> but in this case, I think we should be cautious for the new user
> as the developer knows to change the option when running
> on the unreleased machine.
> 
> I will spend some more time on this when I have a chance.
> I think it is a real bug (probably PAL accessing some address
> which isn't pinned) that occurs only on some boxes due
> to some factor like memory configuration.
> 
> Thanks,
> Dan
> 
> P.S. The debug output just before the crash was:
> ia64_fault: General Exception: IA-64 Reserved Register/Field fault
> (data access): reflecting 
> 
>> -----Original Message-----
>> From: Yang, Fred [mailto:fred.yang@xxxxxxxxx]
>> Sent: Wednesday, December 21, 2005 10:34 PM
>> To: Magenheimer, Dan (HP Labs Fort Collins); Xu, Anthony;
>> Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> Subject: CONFIG_IA64_SPLIT_CACHE was: [Xen-ia64-devel]
>> Console problem on domU on tip?
>> 
>> Dan,
>> 
>> Can we always turn on #CONFIG_IA64_SPLIT_CACHE in the
>> default build configuration?  People may not be aware of
>> this build flag and may miss it on each new build.
>> 
>> All the newer generation ia64 processors will come with a
>> split I/D cache, as discussed in the previous mail thread,
>> and the Itanium architecture documents that future
>> implementations may have a split cache.  With the option off
>> by default, it is a potential bug for all the Tiger4 systems
>> used for daily development and for future platforms to come.
>> 
>> Your mail also indicates that only the HP rx2620 system has
>> the issue and not the other HP boxes.  Can you track down
>> this issue rather than put in a kludge for the rx2620 box?
>> 
>> Thanks,
>> 
>> -Fred
>> 
>> 
>> Magenheimer, Dan (HP Labs Fort Collins) wrote:
>>> Committed (but without removal of ifdefs until we
>>> track down this problem).
>>> 
>>>> -----Original Message-----
>>>> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
>>>> Sent: Monday, December 19, 2005 7:15 PM
>>>> To: Magenheimer, Dan (HP Labs Fort Collins); Tian, Kevin;
>>>> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>>>> 
>>>> I guess the firmware on your machine may not implement this
>>>> PAL call because there was no split I/D cache at that time,
>>>> so when you make this PAL call it returns
>>>> PAL_STATUS_UNIMPLEMENTED. Could you please turn on
>>>> CONFIG_IA64_SPLIT_CACHE and try this new patch to see
>>>> whether your machine can boot domain0?
>>>> If this patch works, could you please remove all the
>>>> CONFIG_IA64_SPLIT_CACHE macros?
>>>> 
>>>> Thanks
>>>> -Anthony
>>>> 
>>>>> -----Original Message-----
>>>>> From: Magenheimer, Dan (HP Labs Fort Collins)
>>>> [mailto:dan.magenheimer@xxxxxx]
>>>>> Sent: December 19, 2005 23:48
>>>>> To: Xu, Anthony; Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>>>>> 
>>>>> I have been distracted tracking another bug...
>>>>> 
>>>>> Here's where I got:
>>>>> 
>>>>> The machine is a new (April 2005) HP rx2620 so it is
>>>>> not old firmware.   I can't reproduce it on a machine
>>>>> with an ITP (which does have older firmware).
>>>>> 
>>>>> This PAL call is never used in Linux, though there is a
>>>>> routine coded for it.  It is the only
>>>>> PAL call coded in Linux that occurs with psr.ic off.
>>>>> 
>>>>> The crash I am seeing occurs either during the PAL call or
>>>>> immediately upon return. 
>>>>> 
>>>>> Is it OK to
>>>>> 
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
>>>>>> Sent: Monday, December 19, 2005 2:02 AM
>>>>>> To: Tian, Kevin; Magenheimer, Dan (HP Labs Fort Collins);
>>>>>> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>>>>>> 
>>>>>> Dan,
>>>>>> Have you had time to verify the discussion below?
>>>>>> 
>>>>>> Thanks
>>>>>> -Anthony
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Tian, Kevin
>>>>>>> Sent: December 16, 2005 10:16
>>>>>>> To: Xu, Anthony; 'Magenheimer, Dan (HP Labs Fort Collins)';
>>>>>>> 'xen-ia64-devel@xxxxxxxxxxxxxxxxxxx'
>>>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>>>>>>> 
>>>>>>>> From: Xu, Anthony
>>>>>>>> Sent: December 16, 2005 9:54
>>>>>>>> 
>>>>>>>>> Also, why panic if it fails?
>>>>>>>>> 
>>>>>>> 
>>>>>>> Panic is not required here, and we could just print out a
>>>>>>> warning message. Previously the panic was kept there to help
>>>>>>> our debugging at an early stage. 
>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> Does the problem happen only on VTI?  Or both VTI and non-VTI
>>>>>>>>> on split-cache machines?
>>>>>>>> 
>>>>>>>> Sometimes, it makes domain0 crash at the very beginning of the
>>>>>>>> domain0 boot process, especially on an MP machine.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> -Anthony
>>>>>>> 
>>>>>>> One addition: that problem definitely exists on new
>>>>>>> split-cache processors, for dom0/domU. For a VTI domain, we have
>>>>>>> logic within the device model to ensure consistency.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Kevin
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Magenheimer, Dan (HP Labs Fort Collins)
>>>>>>>> [mailto:dan.magenheimer@xxxxxx]
>>>>>>>>> Sent: December 16, 2005 1:39
>>>>>>>>> To: Tian, Kevin; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>>>>>>>>> Cc: Xu, Anthony
>>>>>>>>> Subject: RE: [Xen-ia64-devel] Console problem on domU on tip?
>>>>>>>>> 
>>>>>>>>>>> Is this code fragment necessary for VTI to boot domU
>>>>>>>>>>> or is it OK to remove?
>>>>>>>>>> 
>>>>>>>>>>      The comment is inaccurate and it should say domU. That I/D
>>>>>>>>>> cache sync step is mandatory to boot domU on new IA64
>>>>>>>>>> processors which have a split L2 I/D cache. Without such an
>>>>>>>>>> I/D cache sync, the control panel loads domU's kernel image,
>>>>>>>>>> which only affects the D-side cache. If there are stale entries
>>>>>>>>>> in the I-side cache within the same range of the dom0 image,
>>>>>>>>>> people will see the machine going weird.
>>>>>>>>> 
>>>>>>>>> I don't understand... how can there be stale entries in the
>>>>>>>>> I-cache? The instructions have just been written to memory
>>>>>>>>> (through D-cache) and no instructions in this domain have yet
>>>>>>>>> been executed. I do see that the D-cache needs to be flushed
>>>>>>>>> so that memory is coherent but are there better ways to do
>>>>>>>>> that without a pal call? 
>>>>>>>>> 
>>>>>>>>>>      Normally an I/D cache sync shouldn't cause any problem.
>>>>>>>>>> Possibly there's some problem with the PAL calling code,
>>>>>>>>>> like an incorrect ITLB mapping for PAL or a similar issue...
>>>>>>>>> 
>>>>>>>>> Although the ia64_pal_cache_flush routine is defined in
>>>>>>>>> linux's pal.h, it doesn't appear to be used anywhere in Linux
>>>>>>>>> so there is no use model to copy.  I suspect there is some
>>>>>>>>> use model for the call that we don't understand, for example
>>>>>>>>> maybe it should only be called with physical &progress?  It
>>>>>>>>> definitely fails every time on one of my (newer) machines and
>>>>>>>>> disabling the pal call makes the problem go away.
>>>>>>>>> 
>>>>>>>>>> Though it's intermittent, please
>>>>>>>>>> keep this code
>>>>>>>>>> there for correctness.
>>>>>>>>> 
>>>>>>>>> Since the call is definitely failing under some circumstances
>>>>>>>>> that we don't understand, I'm inclined to at least put the
>>>>>>>>> code in an #ifdef CONFIG_SPLIT_CACHE
>>>>>>>>> 
>>>>>>>>> Does the problem happen only on VTI?  Or both VTI and non-VTI
>>>>>>>>> on split-cache machines? 
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Dan
>>>>>>>>> 
>>>>>>>>> P.S. I tried Anthony's patch (which moves the PAL call after
>>>>>>>>> new_thread()) but it still crashes.


_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel


 

