RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make xen0 more stable
I agree. Since it happens so rarely and the failure is very visible,
we should worry about tracking it later. From the symptoms, I suspect
it is another case where a rid is not getting mangled or unmangled
or something like that.

> -----Original Message-----
> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
> Sent: Monday, October 17, 2005 10:39 PM
> To: Magenheimer, Dan (HP Labs Fort Collins)
> Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make
> xen0 more stable
>
> Yes, I need wait very long to trigger this, the build process
> is very slow on my machine. Can we leave it alone, and
> revisit it later?
>
> >-----Original Message-----
> >From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
> >Sent: October 17, 2005 10:49
> >To: Xu, Anthony
> >Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make xen0 more stable
> >
> >I ran tests all weekend long. 59 out of 60 builds were
> >successful. One failed, with the same message as below.
> >At least it is reproducible... if you wait long enough :-(
> >
> >> -----Original Message-----
> >> From: Magenheimer, Dan (HP Labs Fort Collins)
> >> Sent: Friday, October 14, 2005 1:57 PM
> >> To: 'Xu, Anthony'
> >> Cc: 'xen-ia64-devel@xxxxxxxxxxxxxxxxxxx'
> >> Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make
> >> xen0 more stable
> >>
> >> After 12 successful builds, I got two in a row that failed
> >> with a segmentation fault. :-( Since the heartbeat is now turned off,
> >> I can see that Xen is giving a clue as to what the problem is.
> >> When both faults happened, even though the failure shows up at
> >> a different place in the build I got an identical non-fatal message:
> >>
> >> vcpu_translate: bad address: 0000000005a65a69, viip=2000000000163750,
> >> vipsr=00001213081a6018, iip=20000000001d6180, ipsr=0000101308126018
> >>
> >> I wonder what that address is... I have seen it before.
> >> Perhaps it is predicates?
> >>
> >> I won't have much of an opportunity to look further for this
> >> for awhile so wanted to post what I've seen to date.
> >>
> >> Dan
> >>
> >> > -----Original Message-----
> >> > From: Magenheimer, Dan (HP Labs Fort Collins)
> >> > Sent: Friday, October 14, 2005 12:05 PM
> >> > To: Xu, Anthony
> >> > Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >> > Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make
> >> > xen0 more stable
> >> >
> >> > There were definitely some bugs involving the itir in
> >> > vcpu_translate. In the process of fixing them,
> >> > I was over-aggressive in cleaning up some code.
> >> > When I backed out some of that cleanup, everything
> >> > seems to be fine. (I still get a couple of NaT fault
> >> > messages every compile, but they seem to be harmless.)
> >> >
> >> > The segfault problem occurs rarely enough that I don't
> >> > know if I fixed it but have run 9 builds without
> >> > a problem now and I definitely fixed some itir
> >> > problems, so I have committed the changeset to
> >> > xen-ia64-unstable.
> >> >
> >> > > -----Original Message-----
> >> > > From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
> >> > > [mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf
> >> > > Of Magenheimer, Dan (HP Labs Fort Collins)
> >> > > Sent: Thursday, October 13, 2005 10:37 PM
> >> > > To: Xu, Anthony
> >> > > Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >> > > Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make
> >> > > xen0 more stable
> >> > >
> >> > > In my testing, I now saw what appeared to be an infinite loop
> >> > > of NaT faults. The "ps" command showed a "sh" with several
> >> > > minutes of CPU time while the console window scrolled continually
> >> > > with "NaT fault... attempting to handle as privop". This may
> >> > > or may not be a side effect of the patch I am testing. I'll
> >> > > see if it shows up again (but am logging off now until the
> >> > > morning).
> >> > >
> >> > > > -----Original Message-----
> >> > > > From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
> >> > > > Sent: Thursday, October 13, 2005 8:41 PM
> >> > > > To: Magenheimer, Dan (HP Labs Fort Collins)
> >> > > > Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >> > > > Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make
> >> > > > xen0 more stable
> >> > > >
> >> > > > We shouldn't see any Nat faults. And I didn't see Nat faults
> >> > > > on my test.
> >> > > >
> >> > > >
> >> > > > >-----Original Message-----
> >> > > > >From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx]
> >> > > > >Sent: October 14, 2005 3:59
> >> > > > >To: Xu, Anthony
> >> > > > >Cc: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >> > > > >Subject: RE: [Xen-ia64-devel] [PATCH] fixed some bugs to make xen0 more stable
> >> > > > >
> >> > > > >> However, my testing is not going well so far. I had just
> >> > > > >> completed compiling Linux 15 times on tip (with Tristan's
> >> > > > >> SMP patch) without any problems, but 2 of 5 runs so far with
> >> > > > >> this new patch failed with segment faults.
> >> > > > >
> >> > > > >Followed by six successful builds :-%
> >> > > > >
> >> > > > >I'm going to assume this is a random occurrence of a bug
> >> > > > >unrelated to your patch that happens to occur only every
> >> > > > >few hours or so and will commit your patch.
> >> > > > >
> >> > > > >By the way, I am now seeing two NaT faults per Linux build
> >> > > > >that are printing "attempting to handle as privop."
> >> > > > >I assume your fix exposed these but the messages are
> >> > > > >harmless?
> >> > > > >
> >> > > > >Dan
> >> > > >
> >> > >
> >> >
> >> >

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel