
RE: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash



Isaku Yamahata wrote:
> Hi. Good catch. Some comments.
> I attached two patches to fix it; could you try them?
> 
> - .bss.page_aligned.
>   Where is this section used?
>   grep didn't tell me. Certainly x86 uses .bss.page_aligned in
>   linux/arch/[i386, x86_64]/kernel/head[-xen].S,
>   but no files under linux/arch/ia64/ use it.

You may need to check drivers/xen/core/evtchn.c; the code is as follows :-)
Xiantao

static int pirq_eoi_does_unmask;
static DECLARE_BITMAP(pirq_needs_eoi, ALIGN(NR_PIRQS, PAGE_SIZE * 8))
        __attribute__ ((__section__(".bss.page_aligned"), __aligned__(PAGE_SIZE)));
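The following is a user-space sketch, not the kernel code, showing why the bitmap is declared this way. ALIGN(NR_PIRQS, PAGE_SIZE * 8) rounds the bit count up to a whole page's worth of bits. The section and alignment attributes make the bitmap occupy exactly one page-aligned, zero-filled page, so that a single frame can describe it. The SKETCH_* names, the 16 KB page size and the pirq count are assumptions chosen for the example, not the real ia64 configuration. The sketch also assumes GCC on an ELF target, where a ".bss.*" section is emitted as zero-filled bss.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only; the real ones come from the kernel config. */
#define SKETCH_PAGE_SIZE 16384UL /* assumed 16 KB page */
#define SKETCH_NR_PIRQS  256UL   /* assumed number of pirqs */

/* Round x up to a multiple of a (a must be a power of two), like the kernel's ALIGN(). */
#define SKETCH_ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* A page holds PAGE_SIZE * 8 bits, so this rounds the bit count up to whole pages. */
#define SKETCH_BITMAP_BITS  SKETCH_ALIGN(SKETCH_NR_PIRQS, SKETCH_PAGE_SIZE * 8)
#define SKETCH_BITMAP_LONGS (SKETCH_BITMAP_BITS / SKETCH_BITS_PER_LONG)

/* Zero-filled, page-aligned bitmap placed in a ".bss.*" section (GCC/ELF assumed). */
static unsigned long sketch_pirq_needs_eoi[SKETCH_BITMAP_LONGS]
    __attribute__((__section__(".bss.page_aligned"), __aligned__(SKETCH_PAGE_SIZE)));

/* Size of the bitmap in bytes; with the values above it is exactly one page. */
size_t sketch_bitmap_bytes(void)
{
    return sizeof(sketch_pirq_needs_eoi);
}

/* Nonzero if the bitmap starts on a page boundary, so one frame number covers it. */
int sketch_bitmap_is_page_aligned(void)
{
    return ((uintptr_t)sketch_pirq_needs_eoi % SKETCH_PAGE_SIZE) == 0;
}
```

In the real kernel, this only works if the architecture's linker script places .bss.page_aligned inside bss. According to Xiantao's report, that placement was what went wrong here.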


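Point 1 of Xiantao's report concerns who fills in pirq_needs_eoi. Below is a rough user-space model of that agreement, loosely based on how evtchn.c appears to use the bitmap. If the hypervisor has accepted the shared page (PHYSDEVOP_pirq_eoi_gmfn), the hypervisor maintains the bits and the guest only reads them. Otherwise, the guest records each pirq's queried status itself. Everything here (struct sk_eoi_state, the sk_* functions, the 256-pirq size) is hypothetical and is not the actual kernel or hypervisor interface.

```c
#include <limits.h>
#include <stdbool.h>

#define SK_NR_PIRQS      256u /* hypothetical pirq count */
#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SK_LONGS         ((SK_NR_PIRQS + SK_BITS_PER_LONG - 1) / SK_BITS_PER_LONG)

/* Hypothetical model: the shared bitmap plus each side's view of the registration. */
struct sk_eoi_state {
    unsigned long needs_eoi[SK_LONGS];
    bool hv_registered;    /* hypervisor believes it owns the shared page */
    bool guest_registered; /* guest believes the hypervisor owns the shared page */
};

/* Set or clear one pirq's bit. */
static void sk_assign(struct sk_eoi_state *s, unsigned int pirq, bool value)
{
    unsigned long mask = 1UL << (pirq % SK_BITS_PER_LONG);

    if (value)
        s->needs_eoi[pirq / SK_BITS_PER_LONG] |= mask;
    else
        s->needs_eoi[pirq / SK_BITS_PER_LONG] &= ~mask;
}

/* Hypervisor side: writes the bit only if it thinks the page is registered. */
void sk_hv_update(struct sk_eoi_state *s, unsigned int pirq, bool needs_eoi)
{
    if (s->hv_registered)
        sk_assign(s, pirq, needs_eoi);
}

/* Guest side: records a queried status only if it thinks the page is NOT registered. */
void sk_guest_record_status(struct sk_eoi_state *s, unsigned int pirq, bool needs_eoi)
{
    if (!s->guest_registered)
        sk_assign(s, pirq, needs_eoi);
}

/* Guest side: decide whether to issue the EOI hypercall for this pirq. */
bool sk_guest_needs_eoi(const struct sk_eoi_state *s, unsigned int pirq)
{
    return (s->needs_eoi[pirq / SK_BITS_PER_LONG] >> (pirq % SK_BITS_PER_LONG)) & 1UL;
}
```

Consider the case where the guest thinks the page is registered but the hypervisor does not. Each side then expects the other to set the bit, so a pirq that needs EOI never receives one. This matches the "don't agree on which pirqs need EOI" symptom in the model, though the sketch alone does not show this is exactly what happened on ia64.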

> - ia64_fast_eoi.
>   I suppose ia64_fast_eoi is used as an optimization in place of
>   PHYSDEVOP_eoi. I'm not sure how much improvement it provides,
>   though. In any case, the ia64_fast_eoi hypercall implementation
>   should also be updated; I overlooked that when I added
>   PHYSDEVOP_pirq_eoi_gmfn support.
> 
> thanks,
> 
> On Sun, Jan 04, 2009 at 06:05:07PM +0800, Zhang, Xiantao wrote:
>> Hi, Isaku & All
>>     The attached patch should fix the weird issue. Upstream, we
>> also found some other weird issues; for example, dom0 can't boot
>> on some platforms, and dom0 may behave differently with
>> different initrds. After debugging, I found they are caused by an
>> incorrect setup of the pirq_needs_eoi page. Two main issues were
>> found during debugging:
>> 1.  The two related hypercalls are not enabled in the correct way,
>> so dom0 and the hypervisor don't agree on which pirqs need EOI.
>> 2.  The page is not actually linked into the bss section, even though
>> it must be, so the kernel treats it as ordinary memory and reuses it
>> in many ways, which finally leads to various issues.
>> Thanks
>> Xiantao
>> 
>> 
>> 
>> You, Yongkang wrote:
>>>> I tried 2048M (and other values), but I couldn't reproduce it.
>>>> Hmm, does it reproduce with "dom0_mem=2048M" on all the boxes you
>>>> tested?
>>> 
>>> Isaku/All,
>>> 
>>> This issue is really very hard to locate. Now I somewhat suspect
>>> it is related to the build process, because when I change the
>>> build method, the issue goes away too.
>>> 
>>> 1. It doesn't happen on all machines, but it can be reproduced
>>>    reliably on our nightly test machine with the same binary.
>>> 2. When the system crashes, dom0_mem is set to 2048M. With other
>>>    memory sizes, the issue disappears.
>>> 3. It seems to have been introduced between dom0 changesets 743 and
>>>    753, as it works if we use an older Dom0 kernel build with the new
>>>    Xen. The older nightly tests didn't have the issue.
>>> 4. When I tried regression testing between 743 and 753, I found that
>>>    different build methods could lead to either crashing or not
>>>    crashing.
>>> 
>>> In our default build process, since stubdomain is not supported on
>>> IA64, we removed install-stubdom and dist-stubdom from the "install:"
>>> and "dist:" lines in the main Makefile. That change was made more
>>> than 2 months ago. The actual compile command is "make -j3 >xyz_file",
>>> and the crash is related to this build process.
>>> 
>>> During regression testing, I sometimes left the Makefile unchanged
>>> but still used "make -j3", and then the crash was gone.
>>> 
>>> I am not sure my suspicion is right, as it still needs more
>>> testing. Compiling Dom0 is not as easy as compiling Xen; it is
>>> costly. I will try to do more, but maybe not very quickly, as many
>>> other things need to be done at the same time. If the default
>>> compilation is okay, do you think it is worth investigating further?
>>> 
>>> Any suggestion will be much appreciated.
>>> 
>>> Best Regards,
>>> Yongkang You
>>> 
>>> On Tuesday, December 16, 2008 10:22 AM, "Isaku Yamahata" wrote:
>>> 
>>>> On Tue, Dec 09, 2008 at 05:56:25PM +0800, You, Yongkang wrote:
>>>>> On Monday, December 08, 2008 2:10 PM, "Isaku Yamahata" wrote:
>>>>> 
>>>>>> On Mon, Dec 08, 2008 at 01:52:38PM +0800, Zhang, Jingke wrote:
>>>>>>> Isaku Yamahata wrote:
>>>>>>>> On Mon, Dec 08, 2008 at 11:31:15AM +0800, Zhang, Jingke wrote:
>>>>>>>>> Hi Isaku,
>>>>>>>>>     We re-captured the detailed information from the serial port;
>>>>>>>>> please see below. Two comments:
>>>>>>>> 
>>>>>>>> Thank you.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>>     1. We are sure that Cset#18832 works well on the same
>>>>>>>>> Tiger4 machine, but we did not run regression tests between
>>>>>>>>> 18832 and 18860.
>>>>>>>>>     2. Strangely, on another Tiger4 box dom0 does NOT
>>>>>>>>> crash. Do you have any idea from the serial log? Thanks!
>>>>>>>> 
>>>>>>>> I haven't hit this crash, and Kuwamura-san's tests suggest that
>>>>>>>> he hasn't hit it either. Kuwamura-san, is that correct?
>>>>>>>> Hmm... it seems to depend on the hardware configuration?
>>>>>>>> I'm inclined to suspect a race in interrupt masking/unmasking,
>>>>>>>> or event channel issues? But that's only a very vague guess.
>>>>>>>> 
>>>>>>>> The difference between 18832 and 18860 is the merge of
>>>>>>>> xen-unstable into xen-ia64-unstable. Looking at the log, I suspect
>>>>>>>> linux-2.6.18-xen rather than xen.
>>>>>>>> Could you provide the linux c/s that correspond to 18832 and
>>>>>>>> 18860?
>>>>>>> 
>>>>>>> 
>>>>>>> Hi Isaku,
>>>>>>>     Yes, some of our machines do not crash. I am afraid there
>>>>>>> may be some potential issue. When testing 18832, we used
>>>>>>> linux#742, while 18860 uses linux#753. Thanks!
>>>>>> 
>>>>>> Thank you. Taking a rough look at them, those changesets don't
>>>>>> seem to be the culprit. I agree with you that this may indicate
>>>>>> some potential bugs...
>>>>> 
>>>>> Hi All,
>>>>> 
>>>>> This bug is reliably reproduced when "dom0_mem=2048M" is given in
>>>>> the append option. If dom0_mem is set to 1024M or 4096M, the
>>>>> crash doesn't happen.
>>>>> 
>>>>> We tried #18869 Xen + #742 Dom0, and the system is okay. So the
>>>>> problem might be in the Linux tree between #742 and #753.
>>>> 
>>>> I tried 2048M (and other values), but I couldn't reproduce it.
>>>> Hmm, does it reproduce with "dom0_mem=2048M" on all the boxes you
>>>> tested?
>>>> 
>>>> thanks,
>>> 
>>> _______________________________________________
>>> Xen-ia64-devel mailing list
>>> Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>>> http://lists.xensource.com/xen-ia64-devel
>> 
> 
> 

