
RE: [Xen-ia64-devel] PATCH: slightly improve stability



Hi Dan,

Yes, we also got a segmentation fault in 1 run out of 30.

Could you please try this new patch?

Thanks,
-Anthony 

>-----Original Message-----
>From: Magenheimer, Dan (HP Labs Fort Collins)
[mailto:dan.magenheimer@xxxxxx]
>Sent: April 28, 2006 22:49
>To: Xu, Anthony; Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx;
>Williamson, Alex (Linux Kernel Dev)
>Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
>
>Hi Anthony --
>
>I tried your patch overnight and still got a segmentation
>fault in 1 run out of 50.  I didn't try Tristan's patch yet,
>so will try both at the same time next... perhaps there
>are two different problems that show up as the segmentation
>fault.
>
>Dan
>
>> -----Original Message-----
>> From: Xu, Anthony [mailto:anthony.xu@xxxxxxxxx]
>> Sent: Thursday, April 27, 2006 9:19 PM
>> To: Xu, Anthony; Tristan Gingold;
>> xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan (HP Labs
>> Fort Collins); Williamson, Alex (Linux Kernel Dev)
>> Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
>>
>> Hi Tristan,
>> Could you please check whether this patch addresses the RSE issue?
>>
>> Yes, the Intel QA team is running the test in the meantime.
>>
>>
>> Thanks,
>> -Anthony
>>
>> >-----Original Message-----
>> >From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
>> >[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Xu, Anthony
>> >Sent: April 28, 2006 9:48
>> >To: Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan
>> >(HP Labs Fort Collins); Alex Williamson
>> >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
>> >
>> >>From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
>> >>[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Tristan Gingold
>> >>Sent: April 27, 2006 23:14
>> >>To: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan (HP Labs Fort Collins);
>> >>Alex Williamson
>> >>Subject: [Xen-ia64-devel] PATCH: slightly improve stability
>> >>
>> >>Hi,
>> >>
>> >>as reported earlier, this patch seems to improve stability: crashes are
>> >>at least more coherent and maybe less frequent.
>> >>
>> >>RSE handling seems to have a bug: crashes are now due to either a bad
>> >>value in a stacked register or a use of an invalid stacked register
>> >>(although cfm seems correct in gdb!)
>> >
>> >I'm looking at this too.
>> >Yes, there is a bug in handle_lazy_cover.
>> >
>> >void ia64_do_page_fault (unsigned long address, unsigned long isr,
>> >                         struct pt_regs *regs, unsigned long itir)
>> >{
>> >    unsigned long iip = regs->cr_iip, iha;
>> >    // FIXME should validate address here
>> >    unsigned long pteval;
>> >    unsigned long is_data = !((isr >> IA64_ISR_X_BIT) & 1UL);
>> >    IA64FAULT fault;
>> >
>> >    if ((isr & IA64_ISR_IR) && handle_lazy_cover(current, isr, regs))
>> >        return;
>> >
>> >This code sequence is intended to handle the following scenario.
>> >
>> >1. The guest executes br.ret. This may cause a mandatory RSE load, and
>> >that load may cause a TLB miss.
>> >2. The VMM gets control but can't handle this TLB miss itself, so it
>> >injects the TLB miss into the guest's TLB miss handler. When the VMM
>> >executes "rfi" to jump to the guest's TLB miss handler, the TLB miss
>> >happens again.
>> >3. At this time, interrupt_collection_enabled is 0, so
>> >handle_lazy_cover executes "cover" on behalf of the guest and returns
>> >to the guest's TLB miss handler again; this time there is no TLB miss.
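
[Editor's sketch] The three steps above can be read as a small decision: the early return fires only when the fault carries isr.ir and the guest's interruption collection is off. The following is a minimal toy model of that decision, not the Xen source. Every name prefixed with toy_ is hypothetical, the bit position used for TOY_ISR_IR is an assumption made only for this illustration, and "executing cover" is reduced to bumping a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins, NOT the Xen definitions. The bit position is an
 * assumption made only for this illustration. */
#define TOY_ISR_IR (UINT64_C(1) << 38)

struct toy_vcpu {
    bool interrupt_collection_enabled; /* guest's virtual psr.ic */
    unsigned covers_emulated;          /* times "cover" was done for the guest */
};

/* Step 3: with interruption collection off, emulate "cover" for the
 * guest and report that the fault has been dealt with. */
static bool toy_handle_lazy_cover(struct toy_vcpu *v)
{
    if (!v->interrupt_collection_enabled) {
        v->covers_emulated++;
        return true;   /* caller returns early and the guest retries */
    }
    return false;      /* fall through to normal TLB-miss handling */
}

/* Mirrors the shape of the early-return test in ia64_do_page_fault. */
static bool toy_fault_is_lazy_cover(struct toy_vcpu *v, uint64_t isr)
{
    return (isr & TOY_ISR_IR) && toy_handle_lazy_cover(v);
}

/* Walks through the three cases the check distinguishes. */
void toy_self_check(void)
{
    struct toy_vcpu v = { .interrupt_collection_enabled = false,
                          .covers_emulated = 0 };

    /* Miss taken during rfi into the guest handler: ir set, ic off. */
    assert(toy_fault_is_lazy_cover(&v, TOY_ISR_IR));
    assert(v.covers_emulated == 1);

    /* Same miss with collection on: not a lazy-cover case. */
    v.interrupt_collection_enabled = true;
    assert(!toy_fault_is_lazy_cover(&v, TOY_ISR_IR));

    /* ir clear: the handler is never consulted, counter is unchanged. */
    v.interrupt_collection_enabled = false;
    assert(!toy_fault_is_lazy_cover(&v, 0));
    assert(v.covers_emulated == 1);
}
```

Calling toy_self_check() from a main() exercises all three cases. The point of the model is only that the check looks at two inputs, isr.ir and the collection flag, and nothing else.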
>> >
>> >
>> >The following code sequence is in the ia64_leave_kernel path, with
>> >psr.ic and psr.i off. When br.ret.dptk.many b0 is executed, there may
>> >be a mandatory load and thus a TLB miss. According to the description
>> >above, handle_lazy_cover then executes "cover" on behalf of the guest
>> >and returns to the guest, which is not correct in this scenario.
>> >
>> >I didn't find an easy way to fix this bug.
>> >
>> >
>> >    mov loc6=0
>> >    mov loc7=0
>> >(pRecurse) br.call.dptk.few b0=rse_clear_invalid
>> >    ;;
>> >    mov loc8=0
>> >    mov loc9=0
>> >    cmp.ne pReturn,p0=r0,in1        // if recursion count != 0, we need to do a br.ret
>> >    mov loc10=0
>> >    mov loc11=0
>> >(pReturn) br.ret.dptk.many b0
>> >#endif /* !CONFIG_ITANIUM */
>> >#   undef pRecurse
>> >#   undef pReturn
>> >    ;;
>> >    alloc r17=ar.pfs,0,0,0,0        // drop current register frame
>> >    ;;
>> >    loadrs
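
[Editor's sketch] The problem described for this ia64_leave_kernel sequence is that a TLB miss raised by its br.ret looks, to the existing check, the same as the miss the check was written for. The toy model below illustrates that ambiguity under stated assumptions. All toy_ names are hypothetical, and the origin field is ground truth that the real handler does not have; it is there only to label the two cases. This is not a proposed fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy ground truth for where the TLB miss was raised. The real handler
 * has no such field; it exists here only to label the two cases. */
enum toy_origin {
    TOY_RFI_INTO_GUEST_MISS_HANDLER, /* scenario the check was written for */
    TOY_BR_RET_IN_LEAVE_KERNEL       /* br.ret.dptk.many b0 with psr.ic off */
};

struct toy_fault {
    enum toy_origin origin;
    bool isr_ir;      /* isr.ir as seen by the handler */
    bool ic_enabled;  /* interrupt_collection_enabled */
};

/* What the existing check can see: only isr.ir and the ic flag. */
static bool toy_check_fires(const struct toy_fault *f)
{
    return f->isr_ir && !f->ic_enabled;
}

/* Per the description, emulating "cover" is only right for the first origin. */
static bool toy_cover_is_wanted(const struct toy_fault *f)
{
    return f->origin == TOY_RFI_INTO_GUEST_MISS_HANDLER;
}

/* True when the check fires although "cover" should not be emulated. */
bool toy_misfires(const struct toy_fault *f)
{
    return toy_check_fires(f) && !toy_cover_is_wanted(f);
}

/* Compares the intended case with the ia64_leave_kernel case. */
void toy_demo(void)
{
    const struct toy_fault intended =
        { TOY_RFI_INTO_GUEST_MISS_HANDLER, true, false };
    const struct toy_fault leave_kernel =
        { TOY_BR_RET_IN_LEAVE_KERNEL, true, false };

    /* Both faults present identical inputs to the check. */
    assert(toy_check_fires(&intended) == toy_check_fires(&leave_kernel));
    assert(!toy_misfires(&intended));
    assert(toy_misfires(&leave_kernel));
}
```

In this model the two faults differ only in the origin label, which is why a check based on isr.ir and the collection flag alone cannot tell them apart.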
>> >
>> >Thanks,
>> >Anthony
>> >
>> >
>> >>
>> >>Tested by running many (> 100) Linux kernel compilations in an SMP domU.
>> >>
>> >>Tristan.
>> >
>> >_______________________________________________
>> >Xen-ia64-devel mailing list
>> >Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
>> >http://lists.xensource.com/xen-ia64-devel
>>

Attachment: RSE_incomplete_cfm.patch
Description: RSE_incomplete_cfm.patch


 

