
RE: [Xen-ia64-devel] FYI: gcc segfault also meet with as



Tristan -- I have definitely seen it in dom0.  Also, the frequency
seems to be very unpredictable... I have often thought that it
was fixed only to see it appear again later.  For me, it seems
to usually occur in the first few repeated builds of Linux, but
then may not occur again (until a reboot).

Hopefully, it is the same problem you see in SMP-g so it will be easier
to track down!

Anthony -- The problem pre-dates the code I added (in December?)
to handle the 16M memory block/granule just below max_page, though
it is certainly possible that there is more than one bug
and they all have the same symptoms.  It is also possible that
your change doesn't really fix the bug but changes memory mappings
enough to change its appearance/frequency.  Perhaps you should post
your "drop 16M from dom0" patch so that others can try it out?

Thanks,
Dan

> -----Original Message-----
> From: Tristan Gingold [mailto:Tristan.Gingold@xxxxxxxx] 
> Sent: Wednesday, April 05, 2006 5:01 AM
> To: Xu, Anthony; Magenheimer, Dan (HP Labs Fort Collins); xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> Subject: Re: [Xen-ia64-devel] FYI: gcc segfault also meet with as
> 
> On Wednesday 05 April 2006 05:24, Xu, Anthony wrote:
> > From: Magenheimer, Dan (HP Labs Fort Collins)
> >
> > >Yes this problem has been with us for several months and
> > >anybody who exercises Xen/ia64 heavily will probably see it
> > >occur.  It is definitely not specific to gcc... it may even
> > >be occurring in ltp, but since it is not repeatable (the
> > >failure appears almost randomly), it is hard to link a
> > >single ltp test failure to the "gcc segfault" problem.
> > >
> > >Based on what I have seen, I suspect it has something
> > >to do with a stale translation... perhaps some flush/purge
> > >is not working properly or maybe a region id is being
> > >incorrectly re-used.
> > >
> > >Isolating this problem will take a lot of effort and
> > >some sophisticated debug tools/hardware.  However, I
> > >would not recommend Xen/ia64 be "released" to customers
> > >until it is found/fixed.
> Yes.
> 
> > I also saw a gcc segmentation fault on dom0 recently and got a chance to
> > debug it. I caught the issue once and found that the segfault occurs
> > because the process accesses an address that does not belong to it.
> > I also found that the machine address of the faulting address belongs to
> > the high memory block (16M), which is used to avoid the VIRTUAL_MEMMAP
> > issue.
> >
> > I looked into the code and found that Xen uses this 16M just below
> > max_page, and there is no guarantee that this 16M is not used by the box
> > firmware. However, the memmap dumped in the EFI shell did not show the
> > area being used by firmware.
> >
> > Still, I dropped this 16M from dom0 and ran a kernel build;
> > I haven't seen a gcc segmentation fault since then.
> >
> > I'm not sure if it is the root cause, just FYI.
> 
> From my experience, this bug occurs only in domU (i.e., I was not able to
> reproduce it in dom0).
> 
> Last night, I was able to compile the Linux kernel 55 times without any
> segfault.  I don't know the frequency of this bug, but I thought it was
> more frequent.
> 
> When SMP-g is activated, such a bug appears very easily.  But SMP-g is not
> stable!
> 
> Tristan.
> 
> 
> 
