
Re: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash



Hi. Good catch. Some comments.
I attached two patches to fix it; could you try them?

- bss.page_aligned.
  Where is the section used?
  grep didn't tell me. x86 certainly uses .bss.page_aligned in
  linux/arch/[i386, x86_64]/kernel/head[-xen].S,
  but no files under linux/arch/ia64/ use it.

- ia64_fast_eoi.
  I suppose ia64_fast_eoi is used for optimization instead of
  PHYSDEVOP_eoi. I'm not sure how much improvement it provides, though.
  Anyway, the ia64_fast_eoi hypercall implementation should also have been
  updated; I overlooked that when I added PHYSDEVOP_pirq_eoi_gmfn support.
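
To make the pirq_needs_eoi discussion concrete, here is a minimal
user-space sketch of the idea, not the code from the attached patches.
It assumes the page is a bitmap with one bit per pirq that occupies a
whole page-aligned page, so nothing else can share or reuse that memory.
All names, the 16KB page size and the pirq count are hypothetical and
chosen only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumptions for this sketch only, not values taken from the patches. */
#define SKETCH_PAGE_SIZE 16384               /* assumed page size in bytes */
#define SKETCH_NR_PIRQS  256                 /* assumed number of pirqs */
#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * One full, page-aligned page holding the bitmap. Because the object is
 * exactly one page and aligned to a page boundary, no other data can end
 * up in the same page.
 */
static unsigned long
sketch_pirq_needs_eoi[SKETCH_PAGE_SIZE / sizeof(unsigned long)]
    __attribute__((aligned(SKETCH_PAGE_SIZE)));

/* Stand-in for the hypervisor side: mark a pirq as needing EOI. */
static void sketch_mark_needs_eoi(unsigned int pirq)
{
    assert(pirq < SKETCH_NR_PIRQS);
    sketch_pirq_needs_eoi[pirq / SKETCH_WORD_BITS] |=
        1UL << (pirq % SKETCH_WORD_BITS);
}

/* Guest side: consult the shared bitmap before issuing an EOI hypercall. */
static int sketch_needs_eoi(unsigned int pirq)
{
    assert(pirq < SKETCH_NR_PIRQS);
    return (int)((sketch_pirq_needs_eoi[pirq / SKETCH_WORD_BITS] >>
                  (pirq % SKETCH_WORD_BITS)) & 1UL);
}
```

If both sides agree on the page, the guest only needs sketch_needs_eoi()
to decide whether an EOI is required. If the page is not reserved and the
kernel reuses it, the bits read back are garbage, which would fit the
platform- and initrd-dependent behavior reported.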

thanks,

On Sun, Jan 04, 2009 at 06:05:07PM +0800, Zhang, Xiantao wrote:
> Hi, Isaku & All
>     The attached patch should fix the weird issue.  In upstream, we also find 
> some other weird issues, for example, we can't boot dom0 on some platforms, 
> and dom0 may behave differently with different initrds.  After debugging, I
> found it is most likely caused by an incorrect setup of the pirq_needs_eoi page.
> Two main issues were found during debugging:
> 1.  The two related hypercalls are not enabled in the correct way, so dom0
> and the hypervisor do not agree on which pirqs need EOI.
> 2.  The page is not actually linked into the bss section, although it must be,
> so the kernel treats it as memory cache and uses it in many ways, which finally
> leads to various issues.
> Thanks 
> Xiantao
> 
> 
> 
> You, Yongkang wrote:
> >> I tried 2048M (and other values), but I wasn't able to reproduce it.
> >> Hmm, does it reproduce with "dom0_mem=2048M" on all boxes which you
> >> tested?
> > 
> > Isaku/All,
> > 
> > This issue is really very hard to locate. I now suspect that it is
> > related to the build process, because the issue also disappears when
> > the build method is changed.
> > 
> > 1. It doesn't happen on all machines, but it can be reproduced
> >    reliably on our nightly test machine with the same binary.
> > 2. When the system crashes, dom0_mem is set to 2048M. With other
> >    memory sizes, the issue disappears too.
> > 3. It seems to have appeared between dom0 changesets 743~753, as it
> >    works if we use an old Dom0 kernel build + new Xen, and the old
> >    nightly testing doesn't have the issue.
> > 4. When I tried regression testing between 743~753, I found that
> >    different build methods can lead to crashing or non-crashing.
> > 
> > Stubdomain is not supported on IA64, so in our default build
> > process we removed install-stubdom and dist-stubdom from the
> > "install:" and "dist:" lines in the main Makefile. That change was
> > made more than 2 months ago. The actual compile command is
> > "make -j3 >xyz_file". The crash appears to be related to this
> > build process.
> > 
> > When I do regression testing, I sometimes leave the Makefile
> > unchanged but still use "make -j3". Then the crash is gone.
> > 
> > I am not sure whether my suspicion is plausible, as it still needs
> > more testing. Compiling Dom0 is not as easy as compiling Xen; it is
> > costly. I will try to do more, but maybe not so quickly, as many
> > other things need to be done at the same time. If the default
> > compilation is okay, do you think it is worth investigating further?
> > 
> > Any suggestion will be much appreciated.
> > 
> > Best Regards,
> > Yongkang You
> > 
> > On Tuesday, December 16, 2008 10:22 AM, "Isaku Yamahata" wrote:
> > 
> >> On Tue, Dec 09, 2008 at 05:56:25PM +0800, You, Yongkang wrote:
> >>> On Monday, December 08, 2008 2:10 PM, "Isaku Yamahata" wrote:
> >>> 
> >>>> On Mon, Dec 08, 2008 at 01:52:38PM +0800, Zhang, Jingke wrote:
> >>>>> Isaku Yamahata wrote:
> >>>>>> On Mon, Dec 08, 2008 at 11:31:15AM +0800, Zhang, Jingke wrote:
> >>>>>>> Hi Isaku,
> >>>>>>>     We captured the detailed information from the serial port
> >>>>>>> again, please see below. Two additional comments:
> >>>>>> 
> >>>>>> Thank you.
> >>>>>> 
> >>>>>> 
> >>>>>>>     1. We can be sure the Cset#18832 works well on the same
> >>>>>>> tiger4 machine. But we did not do regression test between 18832
> >>>>>>> and this 18860. 
> >>>>>>>     2. It is strange that on another Tiger4 box, dom0 will NOT
> >>>>>>> crash. Do you have any idea from the serial log? Thanks!
> >>>>>> 
> >>>>>> I haven't hit this crash, and it seems from Kuwamura-san's test
> >>>>>> that he hasn't hit it either. Kuwamura-san, is that correct?
> >>>>>> Hmm... it seems to depend on hw configuration?
> >>>>>> I'm inclined to suspect a race in interrupt masking/unmasking.
> >>>>>> Event channel issues? But that's only my very vague guess.
> >>>>>> 
> >>>>>> The difference between 18832 and 18860 is the merge of
> >>>>>> xen-unstable into xen-ia64-unstable. Looking at the log, I suspect
> >>>>>> linux-2.6.18-xen instead of xen.
> >>>>>> Could you provide the linux c/s which corresponds to 18832 and
> >>>>>> 18860?
> >>>>> 
> >>>>> 
> >>>>> Hi Isaku,
> >>>>>     Yes, some of our machines do not crash. I am afraid there may
> >>>>> be some latent issue. For testing 18832 we used linux#742,
> >>>>> while 18860 uses linux#753. Thanks!
> >>>> 
> >>>> Thank you. From a rough look, those changesets don't seem to be
> >>>> the culprit. I agree with you that this may indicate some potential
> >>>> bugs...
> >>> 
> >>> Hi All,
> >>> 
> >>> This bug is reliably reproduced when "dom0_mem=2048M" is given in
> >>> the append option. If dom0_mem is set to 1024M or 4096M, the
> >>> crash doesn't happen.
> >>> 
> >>> We tried #18869 Xen + #742 Dom0, system is okay. So the problem
> >>> might be in Linux tree between #742~#753
> >> 
> > >> I tried 2048M (and other values), but I wasn't able to reproduce it.
> >> Hmm, does it reproduce with "dom0_mem=2048M" on all boxes which you
> >> tested? 
> >> 
> >> thanks,
> > 
> > _______________________________________________
> > Xen-ia64-devel mailing list
> > Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> > http://lists.xensource.com/xen-ia64-devel
> 



-- 
yamahata

Attachment: ia64-fast-eoi.patch
Description: Text Data

Attachment: fix_pirq_eoi_page.patch
Description: Text Data


 

