RE: [Xen-ia64-devel] PATCH: slightly improve stability



Hi Anthony --

With both Tristan's stability patch and your earlier patch applied,
I have now completed 103 Linux compiles with no segfaults
so far. Did you see your segfault with Tristan's patch
included?

I'll continue running over the weekend with the bits I
have but if I see a segfault I will add in the additional
store in Xen entry (minstate.h) from your newer patch.

Dan

> -----Original Message-----
> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx] 
> Sent: Saturday, April 29, 2006 12:03 AM
> To: Magenheimer, Dan (HP Labs Fort Collins); Tristan Gingold; 
> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Williamson, Alex (Linux 
> Kernel Dev)
> Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> 
> Hi Dan,
> 
> Yes, we also got a segmentation fault in 1 run out of 30.
> 
> Could you please try this new patch?
> 
> Thanks,
> -Anthony 
> 
> >-----Original Message-----
> >From: Magenheimer, Dan (HP Labs Fort Collins) 
> [mailto:dan.magenheimer@xxxxxx]
> >Sent: April 28, 2006 22:49
> >To: Xu, Anthony; Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx;
> >Williamson, Alex (Linux Kernel Dev)
> >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> >
> >Hi Anthony --
> >
> >I tried your patch overnight and still got a segmentation
> >fault in 1 run out of 50.  I didn't try Tristan's patch yet,
> >so will try both at the same time next... perhaps there
> >are two different problems that show up as the segmentation
> >fault.
> >
> >Dan
> >
> >> -----Original Message-----
> >> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
> >> Sent: Thursday, April 27, 2006 9:19 PM
> >> To: Xu, Anthony; Tristan Gingold;
> >> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan (HP Labs
> >> Fort Collins); Williamson, Alex (Linux Kernel Dev)
> >> Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> >>
> >> Hi Tristan,
> >> Could you please check whether this patch addresses the RSE issue?
> >>
> >> Yes, the Intel QA team is running the tests in the meantime.
> >>
> >>
> >> Thanks,
> >> -Anthony
> >>
> >> >-----Original Message-----
> >> >From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
> >> >[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Xu, Anthony
> >> >Sent: April 28, 2006 9:48
> >> >To: Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan (HP
> >> >Labs Fort Collins); Alex Williamson
> >> >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> >> >
> >> >>From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
> >> >>[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Tristan
> >> >>Gingold
> >> >>Sent: April 27, 2006 23:14
> >> >>To: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan (HP Labs Fort
> >> >>Collins); Alex Williamson
> >> >>Subject: [Xen-ia64-devel] PATCH: slightly improve stability
> >> >>
> >> >>Hi,
> >> >>
> >> >>as reported earlier, this patch seems to improve stability: crashes are at
> >> >>least more coherent and maybe less frequent.
> >> >>
> >> >>RSE handling seems to have a bug: crashes are now due to either a bad value in
> >> >>a stacked register or a use of an invalid stacked register (although cfm
> >> >>seems correct in gdb!)
> >> >
> >> >I'm looking at this too.
> >> >Yes, there is a bug involving handle_lazy_cover.
> >> >
> >> >void ia64_do_page_fault (unsigned long address, unsigned long isr, struct
> >> >pt_regs *regs, unsigned long itir)
> >> >{
> >> >  unsigned long iip = regs->cr_iip, iha;
> >> >  // FIXME should validate address here
> >> >  unsigned long pteval;
> >> >  unsigned long is_data = !((isr >> IA64_ISR_X_BIT) & 1UL);
> >> >  IA64FAULT fault;
> >> >
> >> >  if ((isr & IA64_ISR_IR) && handle_lazy_cover(current, isr, regs)) return;
> >> >
> >> >This code sequence is intended to handle the following scenario.
> >> >
> >> >1. The guest executes br.ret; this may cause a mandatory RSE load, and this
> >> >load may cause a TLB miss.
> >> >2. The VMM gets control, but it can't handle this TLB miss itself, so it
> >> >injects the TLB miss into the guest TLB miss handler. When the VMM executes
> >> >"rfi" to jump to the guest TLB miss handler, the TLB miss happens again.
> >> >3. At this time, interrupt_collection_enabled is 0, so handle_lazy_cover
> >> >executes "cover" on behalf of the guest and returns to the guest TLB miss
> >> >handler again; this time there is no TLB miss.
> >> >
> >> >
> >> >The following code sequence is in the ia64_leave_kernel path, with psr.ic
> >> >and psr.i off. When br.ret.dptk.many b0 is executed, there may be a
> >> >mandatory RSE load, and thus a TLB miss. According to the description above,
> >> >handle_lazy_cover executes "cover" on behalf of the guest and returns to the
> >> >guest; this is not correct in this scenario.
> >> >
> >> >I didn't find an easy way to fix this bug.
> >> >
> >> >
> >> >  mov loc6=0
> >> >  mov loc7=0
> >> >(pRecurse) br.call.dptk.few b0=rse_clear_invalid
> >> >  ;;
> >> >  mov loc8=0
> >> >  mov loc9=0
> >> >  cmp.ne pReturn,p0=r0,in1        // if recursion count != 0, we need to do a br.ret
> >> >  mov loc10=0
> >> >  mov loc11=0
> >> >(pReturn) br.ret.dptk.many b0
> >> >#endif /* !CONFIG_ITANIUM */
> >> ># undef pRecurse
> >> ># undef pReturn
> >> >  ;;
> >> >  alloc r17=ar.pfs,0,0,0,0        // drop current register frame
> >> >  ;;
> >> >  loadrs
> >> >
> >> >Thanks,
> >> >Anthony
> >> >
> >> >
> >> >>
> >> >>Tested by doing many Linux kernel compilations in an SMP domU (> 100).
> >> >>
> >> >>Tristan.
> >> >
> >> >_______________________________________________
> >> >Xen-ia64-devel mailing list
> >> >Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >> >http://lists.xensource.com/xen-ia64-devel
> >>
> 
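The ambiguity in the lazy-cover scenario quoted above can be illustrated with a
small C model. This is only a sketch under assumptions, not the Xen source:
the struct, the function names and the MODEL_ISR_IR bit position are all
hypothetical, and the model captures only the two inputs the quoted check
looks at (the isr.ir bit and interrupt_collection_enabled). It suggests why
the fix may not be easy: the fault taken during "rfi" to the guest TLB miss
handler and the fault taken on br.ret in ia64_leave_kernel would present the
same inputs to the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not the Xen source. The bit position is arbitrary and
 * only stands in for IA64_ISR_IR from the quoted code. */
#define MODEL_ISR_IR (UINT64_C(1) << 0)

/* What the fault handler can observe when the fault arrives (assumed). */
struct model_fault {
    uint64_t isr;                      /* interruption status bits */
    bool interrupt_collection_enabled; /* guest's virtual psr.ic */
};

/* Same shape as the quoted check: "cover" on behalf of the guest when the
 * fault came from an RSE load while interrupt collection is off. */
static bool model_would_lazy_cover(const struct model_fault *f)
{
    return (f->isr & MODEL_ISR_IR) != 0 && !f->interrupt_collection_enabled;
}

/* Case A (steps 1-3 of the quoted list): the TLB miss recurs while the VMM
 * executes rfi to enter the guest TLB miss handler. */
static struct model_fault fault_on_rfi_to_miss_handler(void)
{
    struct model_fault f = { MODEL_ISR_IR, false };
    return f;
}

/* Case B: br.ret in the ia64_leave_kernel path with psr.ic off triggers a
 * mandatory RSE load that misses in the TLB. */
static struct model_fault fault_on_leave_kernel_br_ret(void)
{
    struct model_fault f = { MODEL_ISR_IR, false };
    return f;
}

/* Both cases look identical to the check, so both take the "cover" path. */
static void model_self_check(void)
{
    struct model_fault a = fault_on_rfi_to_miss_handler();
    struct model_fault b = fault_on_leave_kernel_br_ret();

    assert(model_would_lazy_cover(&a));
    assert(model_would_lazy_cover(&b));
    assert(a.isr == b.isr);
    assert(a.interrupt_collection_enabled == b.interrupt_collection_enabled);
}
```

In this model, telling the two cases apart would need some extra input that
the quoted check does not look at; which input that should be is left open
here, as it is in the thread.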
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel

 

