
RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make xen0 more stable


  • To: "Xu, Anthony" <anthony.xu@xxxxxxxxx>
  • From: "Magenheimer, Dan (HP Labs Fort Collins)" <dan.magenheimer@xxxxxx>
  • Date: Tue, 18 Oct 2005 07:10:23 -0700
  • Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 18 Oct 2005 14:07:41 +0000
  • List-id: Discussion of the ia64 port of Xen <xen-ia64-devel.lists.xensource.com>
  • Thread-index: AcXP2HRI83rf2/ODRBuc5moDrgGuwgAGusrwAAqDThAABKMkoAAL0j3AAAZQlCAAHDZ80AADyA3wAHMgqiAANiI6oAAT/FBg
  • Thread-topic: [Xen-ia64-devel] [PATCH] fixed some bugs to make xen0 more stable

I agree.  Since it happens so rarely and the failure is
very visible, we can worry about tracking it down later.
From the symptoms, I suspect it is another case where
a rid is not getting mangled or unmangled or something
like that.

> -----Original Message-----
> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx] 
> Sent: Monday, October 17, 2005 10:39 PM
> To: Magenheimer, Dan (HP Labs Fort Collins)
> Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make 
> xen0 more stable
> 
> Yes, I need to wait a very long time to trigger this; the build
> process is very slow on my machine. Can we leave it alone and
> revisit it later?
> 
> >-----Original Message-----
> >From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
> >Sent: October 17, 2005 10:49
> >To: Xu, Anthony
> >Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make xen0 more stable
> >
> >I ran tests all weekend long.  59 out of 60 builds were
> >successful.  One failed, with the same message as below.
> >At least it is reproducible... if you wait long enough :-(
> >
> >> -----Original Message-----
> >> From: Magenheimer, Dan (HP Labs Fort Collins)
> >> Sent: Friday, October 14, 2005 1:57 PM
> >> To: 'Xu, Anthony'
> >> Cc: 'xen-ia64-devel@xxxxxxxxxxxxxxxxxxx'
> >> Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make
> >> xen0 more stable
> >>
> >> After 12 successful builds, I got two in a row that failed
> >> with a segmentation fault. :-(  Since the heartbeat is now
> >> turned off, I can see that Xen is giving a clue as to what
> >> the problem is.
> >> When both faults happened, even though the failure showed up at
> >> a different place in the build, I got an identical non-fatal
> >> message:
> >>
> >> vcpu_translate: bad address: 0000000005a65a69, viip=2000000000163750,
> >>  vipsr=00001213081a6018,  iip=20000000001d6180, ipsr=0000101308126018
> >>
> >> I wonder what that address is... I have seen it before.
> >> Perhaps it is predicates?
> >>
> >> I won't have much of an opportunity to look further for this
> >> for a while, so I wanted to post what I've seen to date.
> >>
> >> Dan
> >>
> >> > -----Original Message-----
> >> > From: Magenheimer, Dan (HP Labs Fort Collins)
> >> > Sent: Friday, October 14, 2005 12:05 PM
> >> > To: Xu, Anthony
> >> > Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >> > Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make
> >> > xen0 more stable
> >> >
> >> > There were definitely some bugs involving the itir in
> >> > vcpu_translate.  In the process of fixing them,
> >> > I was over-aggressive in cleaning up some code.
> >> > When I backed out some of that cleanup, everything
> >> > seems to be fine.  (I still get a couple of NaT fault
> >> > messages every compile, but they seem to be harmless.)
> >> >
> >> > The segfault problem occurs rarely enough that I don't
> >> > know whether I fixed it, but I have now run 9 builds without
> >> > a problem and I definitely fixed some itir problems,
> >> > so I have committed the changeset to xen-ia64-unstable.
> >> >
> >> > > -----Original Message-----
> >> > > From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
> >> > > [mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf
> >> > > Of Magenheimer, Dan (HP Labs Fort Collins)
> >> > > Sent: Thursday, October 13, 2005 10:37 PM
> >> > > To: Xu, Anthony
> >> > > Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >> > > Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make
> >> > > xen0 more stable
> >> > >
> >> > > In my testing, I now saw what appeared to be an infinite loop
> >> > > of NaT faults.  The "ps" command showed a "sh" with several
> >> > > minutes of CPU time while the console window scrolled
> >> > > continually with "NaT fault... attempting to handle as privop".  This may
> >> > > or may not be a side effect of the patch I am testing.  I'll
> >> > > see if it shows up again (but am logging off now until the
> >> > > morning).
> >> > >
> >> > > > -----Original Message-----
> >> > > > From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
> >> > > > Sent: Thursday, October 13, 2005 8:41 PM
> >> > > > To: Magenheimer, Dan (HP Labs Fort Collins)
> >> > > > Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >> > > > Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make
> >> > > > xen0 more stable
> >> > > >
> >> > > > We shouldn't see any NaT faults, and I didn't see any NaT
> >> > > > faults in my testing.
> >> > > >
> >> > > >
> >> > > > >-----Original Message-----
> >> > > > >From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
> >> > > > >Sent: October 14, 2005 3:59
> >> > > > >To: Xu, Anthony
> >> > > > >Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >> > > > >Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make xen0 more stable
> >> > > > >
> >> > > > >> However, my testing is not going well so far.  I had just
> >> > > > >> completed compiling Linux 15 times on tip (with Tristan's
> >> > > > >> SMP patch) without any problems, but 2 of 5 runs so far with
> >> > > > >> this new patch failed with segmentation faults.
> >> > > > >
> >> > > > >Followed by six successful builds :-%
> >> > > > >
> >> > > > >I'm going to assume this is a random occurrence of a bug
> >> > > > >unrelated to your patch that happens to occur only every
> >> > > > >few hours or so and will commit your patch.
> >> > > > >
> >> > > > >By the way, I am now seeing two NaT faults per Linux build
> >> > > > >that are printing "attempting to handle as privop."
> >> > > > >I assume your fix exposed these but the messages are
> >> > > > >harmless?
> >> > > > >
> >> > > > >Dan
> >> > > >
> >> > >
> >> >
> >>
> 
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel

 

