RE: [Xen-ia64-devel] PATCH: slightly improve stability



Argh!  After 103 successful linux compiles, two of the
next 10 had a segfault.  Restarting again with Anthony's
updated patch (plus Tristan's stability patch)... 
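
For context on why a long clean streak followed by failures is not surprising, here is a rough sketch of the arithmetic. It assumes each compile fails independently with a fixed probability, which may not hold if two different bugs are involved; the function name is made up for this illustration.

```c
#include <assert.h>

/* Probability that `runs` consecutive runs all pass when each run fails
 * with probability `fail_rate`. Assumes runs are independent and the
 * failure rate is constant, which is a simplification. */
static double clean_streak_probability(double fail_rate, unsigned runs)
{
    assert(fail_rate >= 0.0 && fail_rate <= 1.0);
    double p = 1.0;
    for (unsigned i = 0; i < runs; i++)
        p *= 1.0 - fail_rate;   /* one more run that did not fail */
    return p;
}
```

Under those assumptions, clean_streak_probability(1.0 / 50.0, 103) comes out at roughly 12%, so with the 1-in-50 rate seen in the earlier overnight run, 103 clean compiles alone would not show that the problem is gone.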

> -----Original Message-----
> From: Magenheimer, Dan (HP Labs Fort Collins) 
> Sent: Saturday, April 29, 2006 7:58 AM
> To: 'Xu, Anthony'; Tristan Gingold; 
> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Williamson, Alex (Linux 
> Kernel Dev)
> Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> 
> Hi Anthony --
> 
> With both Tristan's stability patch and your earlier patch,
> I have completed 103 linux compiles now with no segfaults
> yet.   Did you see your segfault with Tristan's patch
> included?
> 
> I'll continue running over the weekend with the bits I
> have but if I see a segfault I will add in the additional
> store in Xen entry (minstate.h) from your newer patch.
> 
> Dan
> 
> > -----Original Message-----
> > From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx] 
> > Sent: Saturday, April 29, 2006 12:03 AM
> > To: Magenheimer, Dan (HP Labs Fort Collins); Tristan Gingold; 
> > xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Williamson, Alex (Linux 
> > Kernel Dev)
> > Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> > 
> > Hi Dan,
> > 
> > Yes, we also got a segmentation fault in 1 run out of 30.
> > 
> > Could you please try this new patch?
> > 
> > Thanks,
> > -Anthony 
> > 
> > >-----Original Message-----
> > >From: Magenheimer, Dan (HP Labs Fort Collins) 
> > [mailto:dan.magenheimer@xxxxxx]
> > >Sent: April 28, 2006 22:49
> > >To: Xu, Anthony; Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx;
> > >Williamson, Alex (Linux Kernel Dev)
> > >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> > >
> > >Hi Anthony --
> > >
> > >I tried your patch overnight and still got a segmentation
> > >fault in 1 run out of 50.  I didn't try Tristan's patch yet,
> > >so will try both at the same time next... perhaps there
> > >are two different problems that show up as the segmentation
> > >fault.
> > >
> > >Dan
> > >
> > >> -----Original Message-----
> > >> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
> > >> Sent: Thursday, April 27, 2006 9:19 PM
> > >> To: Xu, Anthony; Tristan Gingold;
> > >> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan (HP Labs
> > >> Fort Collins); Williamson, Alex (Linux Kernel Dev)
> > >> Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> > >>
> > >> Hi Tristan,
> > >> Could you please check whether this patch addresses the RSE issue?
> > >>
> > >> Yes, the Intel QA team is running the test in the meantime.
> > >>
> > >>
> > >> Thanks,
> > >> -Anthony
> > >>
> > >> >-----Original Message-----
> > >> >From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
> > >> >[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On
> > >> Behalf Of Xu, Anthony
> > >> >Sent: April 28, 2006 9:48
> > >> >To: Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx;
> > >> Magenheimer, Dan (HP
> > >> >Labs Fort Collins); Alex Williamson
> > >> >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> > >> >
> > >> >>From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
> > >> >>[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On
> > >> Behalf Of Tristan
> > >> >>Gingold
> > >> >>Sent: April 27, 2006 23:14
> > >> >>To: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan
> > >> (HP Labs Fort
> > >> >>Collins); Alex Williamson
> > >> >>Subject: [Xen-ia64-devel] PATCH: slightly improve stability
> > >> >>
> > >> >>Hi,
> > >> >>
> > >> >>as reported earlier, this patch seems to improve stability: crashes
> > >> >>are at least more coherent and maybe less frequent.
> > >> >>
> > >> >>RSE handling seems to have a bug: crashes are now due to either a bad
> > >> >>value in a stacked register or the use of an invalid stacked register
> > >> >>(although cfm seems correct in gdb!)
> > >> >
> > >> >I'm looking at this too.
> > >> >Yes, there is a bug in handle_lazy_cover.
> > >> >
> > >> >void ia64_do_page_fault (unsigned long address, unsigned long isr,
> > >> >                         struct pt_regs *regs, unsigned long itir)
> > >> >{
> > >> >        unsigned long iip = regs->cr_iip, iha;
> > >> >        // FIXME should validate address here
> > >> >        unsigned long pteval;
> > >> >        unsigned long is_data = !((isr >> IA64_ISR_X_BIT) & 1UL);
> > >> >        IA64FAULT fault;
> > >> >
> > >> >        if ((isr & IA64_ISR_IR) && handle_lazy_cover(current, isr, regs))
> > >> >                return;
> > >> >
> > >> >This code sequence is intended to handle the following scenario.
> > >> >
> > >> >1. The Guest executes br.ret. This may cause a mandatory RSE load,
> > >> >and this load may cause a TLB miss.
> > >> >2. The VMM gets control, but it can't handle this TLB miss itself, so
> > >> >it injects the TLB miss into the Guest TLB miss handler. When the VMM
> > >> >executes "rfi" to jump to the Guest TLB miss handler, the TLB miss
> > >> >happens again.
> > >> >3. At this time, interrupt_collection_enabled is 0, so
> > >> >handle_lazy_cover executes "cover" on behalf of the Guest and returns
> > >> >to the Guest TLB miss handler again. This time there is no TLB miss.
> > >> >
> > >> >
> > >> >The following code sequence is in the ia64_leave_kernel path, with
> > >> >psr.ic and psr.i off. When br.ret.dptk.many b0 is executed, there may
> > >> >be a mandatory load, and thus there may be a TLB miss. According to
> > >> >the description above, handle_lazy_cover executes "cover" on behalf of
> > >> >the Guest and returns to the Guest. This is not correct in this
> > >> >scenario.
> > >> >
> > >> >I didn't find an easy way to fix this bug.
> > >> >
> > >> >
> > >> >        mov loc6=0
> > >> >        mov loc7=0
> > >> >(pRecurse) br.call.dptk.few b0=rse_clear_invalid
> > >> >        ;;
> > >> >        mov loc8=0
> > >> >        mov loc9=0
> > >> >        cmp.ne pReturn,p0=r0,in1        // if recursion count != 0, we need to do a br.ret
> > >> >        mov loc10=0
> > >> >        mov loc11=0
> > >> >(pReturn) br.ret.dptk.many b0
> > >> >#endif /* !CONFIG_ITANIUM */
> > >> >#       undef pRecurse
> > >> >#       undef pReturn
> > >> >        ;;
> > >> >        alloc r17=ar.pfs,0,0,0,0        // drop current register frame
> > >> >        ;;
> > >> >        loadrs
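
To make the two cases above concrete, here is a minimal C model of the decision as the mail describes it. This is a sketch, not the Xen source: struct model_vcpu, model_handle_lazy_cover and MODEL_ISR_IR are hypothetical names, the bit position is an assumption, and the ISR test that the real caller performs is folded into the function for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for IA64_ISR_IR (fault raised by an RSE access).
 * The bit position is an assumption of this sketch. */
#define MODEL_ISR_IR (UINT64_C(1) << 38)

/* Hypothetical, heavily reduced vcpu state. */
struct model_vcpu {
    bool interrupt_collection_enabled; /* guest's virtual psr.ic */
    bool cover_done_for_guest;         /* records that "cover" was emulated */
};

/* Mirrors the check as described in the mail: an RSE fault taken while the
 * guest's interrupt collection is off is answered by doing "cover" on the
 * guest's behalf. Returns true when the fault was absorbed that way. */
static bool model_handle_lazy_cover(struct model_vcpu *v, uint64_t isr)
{
    assert(v != NULL);
    if (!(isr & MODEL_ISR_IR))
        return false;                  /* not an RSE fault: normal handling */
    if (v->interrupt_collection_enabled)
        return false;                  /* guest can take the miss itself */
    v->cover_done_for_guest = true;    /* emulate "cover" for the guest */
    return true;
}
```

In this model the decision depends only on the ISR bit and the guest's virtual collection state. If that reading of the description is right, an RSE fault raised by the br.ret in ia64_leave_kernel while the guest's collection bit happens to be off would present the same inputs as step 3 of the intended scenario, so a check of this shape cannot tell the two apart.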
> > >> >
> > >> >Thanks,
> > >> >Anthony
> > >> >
> > >> >
> > >> >>
> > >> >>Tested by doing many (> 100) Linux kernel compilations in an SMP domU.
> > >> >>
> > >> >>Tristan.
> > >> >
> > >> >_______________________________________________
> > >> >Xen-ia64-devel mailing list
> > >> >Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> > >> >http://lists.xensource.com/xen-ia64-devel
> > >>
> > 
> 
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel