[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] RE: [Xen-ia64-devel] PATCH: slightly improve stability
With this new patch (not including Tristan's stability patch by far), we can Successfully finish 50 linux compiles. We'll continue the test. Thanks, -Anthony >-----Original Message----- >From: Magenheimer, Dan (HP Labs Fort Collins) [mailto:dan.magenheimer@xxxxxx] >Sent: 2006å4æ30æ 0:13 >To: Magenheimer, Dan (HP Labs Fort Collins); Xu, Anthony; Tristan Gingold; >xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Williamson, Alex (Linux Kernel Dev) >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability > >Argh! After 103 successful linux compiles, two of the >next 10 had a segfault. Restarting again with Anthony's >updated patch (plus Tristan's stability patch)... > >> -----Original Message----- >> From: Magenheimer, Dan (HP Labs Fort Collins) >> Sent: Saturday, April 29, 2006 7:58 AM >> To: 'Xu, Anthony'; Tristan Gingold; >> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Williamson, Alex (Linux >> Kernel Dev) >> Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability >> >> Hi Anthony -- >> >> With both Tristan's stability patch and your earlier patch, >> I have completed 103 linux compiles now with no segfaults >> yet. Did you see your segfault with Tristan's patch >> included? >> >> I'll continue running over the weekend with the bits I >> have but if I see a segfault I will add in the additional >> store in Xen entry (minstate.h) from your newer patch. >> >> Dan >> >> > -----Original Message----- >> > From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx] >> > Sent: Saturday, April 29, 2006 12:03 AM >> > To: Magenheimer, Dan (HP Labs Fort Collins); Tristan Gingold; >> > xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Williamson, Alex (Linux >> > Kernel Dev) >> > Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability >> > >> > Hi Dan, >> > >> > Yes, we also got a segmentation fault in 1 run out of 30. >> > >> > Could you please try this new patch? >> > >> > Thanks, >> > -Anthony >> > >> > >-----Original Message----- >> > >From: Magenheimer, Dan (HP Labs Fort Collins) >> > [mailto:dan.magenheimer@xxxxxx] >> > >Sent: 2006å4æ28æ 22:49 >> > >To: Xu, Anthony; Tristan Gingold; >> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; >> > >Williamson, Alex (Linux Kernel Dev) >> > >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability >> > > >> > >Hi Anthony -- >> > > >> > >I tried your patch overnight and still got a segmentation >> > >fault in 1 run out of 50. I didn't try Tristan's patch yet, >> > >so will try both at the same time next... perhaps there >> > >are two different problems that show up as the segmentation >> > >fault. >> > > >> > >Dan >> > > >> > >> -----Original Message----- >> > >> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx] >> > >> Sent: Thursday, April 27, 2006 9:19 PM >> > >> To: Xu, Anthony; Tristan Gingold; >> > >> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan (HP Labs >> > >> Fort Collins); Williamson, Alex (Linux Kernel Dev) >> > >> Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability >> > >> >> > >> Hi Tristan, >> > >> Could you please check whether this patch address RSE issue? >> > >> >> > >> Yes, Intel QA team is doing the test in the meantime. >> > >> >> > >> >> > >> Thanks, >> > >> -Anthony >> > >> >> > >> >-----Original Message----- >> > >> >From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx >> > >> >[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On >> > >> Behalf Of Xu, Anthony >> > >> >Sent: 2006?4?28? 9:48 >> > >> >To: Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; >> > >> Magenheimer, Dan (HP >> > >> >Labs Fort Collins); Alex Williamson >> > >> >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability >> > >> > >> > >> >>From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx >> > >> >>[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On >> > >> Behalf Of Tristan >> > >> >>Gingold >> > >> >>Sent: 2006?4?27? 23:14 >> > >> >>To: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan >> > >> (HP Labs Fort >> > >> >>Collins); Alex Williamson >> > >> >>Subject: [Xen-ia64-devel] PATCH: slightly improve stability >> > >> >> >> > >> >>Hi, >> > >> >> >> > >> >>as reported earlier, this patch seems to improve stability: >> > >> crashes are at >> > >> >>least more coherent and maybe less frequent. >> > >> >> >> > >> >>RSE handling seems to have a bug: crahes are now due to >> > >> either a bad value in >> > >> >>a stacked register or a use of an invalid stacked register >> > >> (although cfm >> > >> >>seems correct in gdb!) >> > >> > >> > >> >I'm looking at this too, >> > >> >Yes there is a bug about handle_lazy_cover. >> > >> > >> > >> >void ia64_do_page_fault (unsigned long address, unsigned >> > >> long isr, struct >> > >> >pt_regs *regs, unsigned long itir) >> > >> >{ >> > >> > unsigned long iip = regs->cr_iip, iha; >> > >> > // FIXME should validate address here >> > >> > unsigned long pteval; >> > >> > unsigned long is_data = !((isr >> >> IA64_ISR_X_BIT) & 1UL); >> > >> > IA64FAULT fault; >> > >> > >> > >> > if ((isr & IA64_ISR_IR) && handle_lazy_cover(current, >> > >> isr, regs)) return; >> > >> > >> > >> >This code sequence is intended to handle following scenario. >> > >> > >> > >> >1. Guest executes br.ret, this may cause mandatory RSE load, >> > >> and this load may >> > >> >cause TLB miss. >> > >> >2. VMM gets control, but VMM can't handle this TLB miss >> > >> itself, then VMM injects >> > >> >TLB miss to Guest TLB miss handler, when VMM executing "rfi" >> > >> to jump to Guest >> > >> >TLB miss handler, this TLB miss happens again. >> > >> >3. At this time, interrupt_collection_enabled is 0, so >> > >> handle_lazy_cover >> > >> >executes "cover" on behalf of Guest, and return to Guest TLB >> > >> miss handler again, >> > >> >this time there is no TLB miss. >> > >> > >> > >> > >> > >> >Following code sequence is in ia64_leave_kernel path with >> > >> psr.ic and psr.i off. >> > >> >When br.ret.dptk.many b0 is executed, there may be a >> > >> mandatory load, thus >> > >> >There may be a tlb miss, according to above description >> > >> handle_lazy_cover >> > >> >executes "cover" on behalf of Guest and return to Guest, >> > >> this is no correct >> > >> >in this scenario. >> > >> > >> > >> >I didn't find an easy way to fix this bug. >> > >> > >> > >> > >> > >> > mov loc6=0 >> > >> > mov loc7=0 >> > >> >(pRecurse) br.call.dptk.few b0=rse_clear_invalid >> > >> > ;; >> > >> > mov loc8=0 >> > >> > mov loc9=0 >> > >> > cmp.ne pReturn,p0=r0,in1 // if recursion count >> > >> != 0, we need to do a >> > >> >br.ret >> > >> > mov loc10=0 >> > >> > mov loc11=0 >> > >> >(pReturn) br.ret.dptk.many b0 >> > >> >#endif /* !CONFIG_ITANIUM */ >> > >> ># undef pRecurse >> > >> ># undef pReturn >> > >> > ;; >> > >> > alloc r17=ar.pfs,0,0,0,0 // drop current >> register frame >> > >> > ;; >> > >> > loadrs >> > >> > >> > >> >Thanks, >> > >> >Anthony >> > >> > >> > >> > >> > >> >> >> > >> >>Tested by doing many linux kernel compilation in SMP >> > domU (> 100). >> > >> >> >> > >> >>Tristan. >> > >> > >> > >> >_______________________________________________ >> > >> >Xen-ia64-devel mailing list >> > >> >Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx >> > >> >http://lists.xensource.com/xen-ia64-devel >> > >> >> > >> _______________________________________________ Xen-ia64-devel mailing list Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-ia64-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |