[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] RE: [Xen-ia64-devel] PATCH: slightly improve stability
Hi Anthony -- With both Tristan's stability patch and your earlier patch, I have completed 103 linux compiles now with no segfaults yet. Did you see your segfault with Tristan's patch included? I'll continue running over the weekend with the bits I have but if I see a segfault I will add in the additional store in Xen entry (minstate.h) from your newer patch. Dan > -----Original Message----- > From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx] > Sent: Saturday, April 29, 2006 12:03 AM > To: Magenheimer, Dan (HP Labs Fort Collins); Tristan Gingold; > xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Williamson, Alex (Linux > Kernel Dev) > Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability > > Hi Dan, > > Yes, we also got a segmentation fault in 1 run out of 30. > > Could you please try this new patch? > > Thanks, > -Anthony > > >-----Original Message----- > >From: Magenheimer, Dan (HP Labs Fort Collins) > [mailto:dan.magenheimer@xxxxxx] > >Sent: 2006å4æ28æ 22:49 > >To: Xu, Anthony; Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; > >Williamson, Alex (Linux Kernel Dev) > >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability > > > >Hi Anthony -- > > > >I tried your patch overnight and still got a segmentation > >fault in 1 run out of 50. I didn't try Tristan's patch yet, > >so will try both at the same time next... perhaps there > >are two different problems that show up as the segmentation > >fault. > > > >Dan > > > >> -----Original Message----- > >> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx] > >> Sent: Thursday, April 27, 2006 9:19 PM > >> To: Xu, Anthony; Tristan Gingold; > >> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan (HP Labs > >> Fort Collins); Williamson, Alex (Linux Kernel Dev) > >> Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability > >> > >> Hi Tristan, > >> Could you please check whether this patch address RSE issue? > >> > >> Yes, Intel QA team is doing the test in the meantime. > >> > >> > >> Thanks, > >> -Anthony > >> > >> >-----Original Message----- > >> >From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx > >> >[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On > >> Behalf Of Xu, Anthony > >> >Sent: 2006?4?28? 9:48 > >> >To: Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; > >> Magenheimer, Dan (HP > >> >Labs Fort Collins); Alex Williamson > >> >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability > >> > > >> >>From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx > >> >>[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On > >> Behalf Of Tristan > >> >>Gingold > >> >>Sent: 2006?4?27? 23:14 > >> >>To: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan > >> (HP Labs Fort > >> >>Collins); Alex Williamson > >> >>Subject: [Xen-ia64-devel] PATCH: slightly improve stability > >> >> > >> >>Hi, > >> >> > >> >>as reported earlier, this patch seems to improve stability: > >> crashes are at > >> >>least more coherent and maybe less frequent. > >> >> > >> >>RSE handling seems to have a bug: crahes are now due to > >> either a bad value in > >> >>a stacked register or a use of an invalid stacked register > >> (although cfm > >> >>seems correct in gdb!) > >> > > >> >I'm looking at this too, > >> >Yes there is a bug about handle_lazy_cover. > >> > > >> >void ia64_do_page_fault (unsigned long address, unsigned > >> long isr, struct > >> >pt_regs *regs, unsigned long itir) > >> >{ > >> > unsigned long iip = regs->cr_iip, iha; > >> > // FIXME should validate address here > >> > unsigned long pteval; > >> > unsigned long is_data = !((isr >> IA64_ISR_X_BIT) & 1UL); > >> > IA64FAULT fault; > >> > > >> > if ((isr & IA64_ISR_IR) && handle_lazy_cover(current, > >> isr, regs)) return; > >> > > >> >This code sequence is intended to handle following scenario. > >> > > >> >1. Guest executes br.ret, this may cause mandatory RSE load, > >> and this load may > >> >cause TLB miss. > >> >2. VMM gets control, but VMM can't handle this TLB miss > >> itself, then VMM injects > >> >TLB miss to Guest TLB miss handler, when VMM executing "rfi" > >> to jump to Guest > >> >TLB miss handler, this TLB miss happens again. > >> >3. At this time, interrupt_collection_enabled is 0, so > >> handle_lazy_cover > >> >executes "cover" on behalf of Guest, and return to Guest TLB > >> miss handler again, > >> >this time there is no TLB miss. > >> > > >> > > >> >Following code sequence is in ia64_leave_kernel path with > >> psr.ic and psr.i off. > >> >When br.ret.dptk.many b0 is executed, there may be a > >> mandatory load, thus > >> >There may be a tlb miss, according to above description > >> handle_lazy_cover > >> >executes "cover" on behalf of Guest and return to Guest, > >> this is no correct > >> >in this scenario. > >> > > >> >I didn't find an easy way to fix this bug. > >> > > >> > > >> > mov loc6=0 > >> > mov loc7=0 > >> >(pRecurse) br.call.dptk.few b0=rse_clear_invalid > >> > ;; > >> > mov loc8=0 > >> > mov loc9=0 > >> > cmp.ne pReturn,p0=r0,in1 // if recursion count > >> != 0, we need to do a > >> >br.ret > >> > mov loc10=0 > >> > mov loc11=0 > >> >(pReturn) br.ret.dptk.many b0 > >> >#endif /* !CONFIG_ITANIUM */ > >> ># undef pRecurse > >> ># undef pReturn > >> > ;; > >> > alloc r17=ar.pfs,0,0,0,0 // drop current register frame > >> > ;; > >> > loadrs > >> > > >> >Thanks, > >> >Anthony > >> > > >> > > >> >> > >> >>Tested by doing many linux kernel compilation in SMP > domU (> 100). > >> >> > >> >>Tristan. > >> > > >> >_______________________________________________ > >> >Xen-ia64-devel mailing list > >> >Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx > >> >http://lists.xensource.com/xen-ia64-devel > >> > _______________________________________________ Xen-ia64-devel mailing list Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-ia64-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |