Re: [Xen-ia64-devel] PATCH: slightly improve stability
Is there any reason why Anthony's patch was dropped?
I think this patch is also needed.

I got the following message. I guess the cause is as follows,
but this happens very rarely...

linux-2.6-xen-sparse/arch/ia64/xen/xenentry.S
Here psr.i and psr.ic are off.

rse_clear_invalid:
        ...
(pRecurse) br.call.dptk.few b0=rse_clear_invalid
        ;;
        mov loc8=0      <<<<<<<<<<<<<<< 0xa0000001000687c0
                        please notice ifs = 8000000000000000
        mov loc9=0

1. Right before mov loc8=0, the vcpu is switched to another cpu.
2. While the vcpu is waiting for a cpu, the tlb entry which backs the rse
   stack is purged.
3. The vcpu gets a cpu again, and a tlb miss fault occurs with isr.ir = 1.
4. xen ia64_do_page_fault() calls handle_lazy_cover(), which sets cr.ifs = 0.
5. xen returns cpu execution to the guest.
6. mov loc8 = 0 is executed with cfm = 0.
   An Illegal Operation fault is raised.
7. priv_handle_op() is called, but it fails to emulate because
   mov loc8 = 0 isn't a privileged op.
8. ia64_handle_privop() calls panic_domain().

Thanks.

(XEN) priv_emulate: priv_handle_op fails, isr=0x0
(XEN) $$$$$ PANIC in domain 0 (k6=0xf0000000041c8000): psr.ic off, delivering fault=5400,ipsr=101208026030,iip=a0000001000687c0,ifa=2000000000144f60,isr=0,PSCB.iip=2000000000144f60
(XEN)
(XEN) Call Trace:
(XEN)  [<f00000000409e030>] show_stack+0x80/0xa0
(XEN)                                 sp=f0000000041cfb80 bsp=f0000000041c8e48
(XEN)  [<f00000000407d780>] panic_domain+0xf0/0x1d0
(XEN)                                 sp=f0000000041cfd50 bsp=f0000000041c8de0
(XEN)  [<f0000000040707b0>] check_bad_nested_interruption+0x110/0x120
(XEN)                                 sp=f0000000041cfe00 bsp=f0000000041c8db0
(XEN)  [<f000000004070a20>] reflect_interruption+0x260/0x460
(XEN)                                 sp=f0000000041cfe00 bsp=f0000000041c8d60
(XEN)  [<f00000000409cba0>] ia64_leave_kernel+0x0/0x310
(XEN)                                 sp=f0000000041cfe00 bsp=f0000000041c8d60
(XEN)  [<a0000001000687c0>] ???
(XEN)                                 sp=f0000000041d0000 bsp=f0000000041c8d60
(XEN) d 0xf000000007ffb208 domid 0
(XEN) vcpu 0xf0000000041c8000 vcpu 3
(XEN)
(XEN) CPU 3
(XEN) psr : 0000101208026030 ifs : 8000000000000000 ip  : [<a0000001000687c0>]
(XEN) ip is at ???
(XEN) unat: 0000000000000000 pfs : 8000000000000710 rsc : 0000000000580008
(XEN) rnat: 0000000000000000 bsps: e00000000b328fe8 pr  : 000000000559a7a9
(XEN) ldrs: 0000000000600000 ccv : 0000000000000000 fpsr: 0009804c0270033f
(XEN) csd : 0000000000000000 ssd : 0000000000000000
(XEN) b0  : a0000001000687c0 b6  : 2000000000144f60 b7  : a000000000010640
(XEN) f6  : 1003e0000000000000000 f7  : 000000000000000000000
(XEN) f8  : 100198ff97fe000000000 f9  : 1003effffffffffffff05
(XEN) f10 : 1003e00000000000000b0 f11 : 1001192d7b6702eedd629
(XEN) r1  : 200000000021c278 r2  : c000000000000309 r3  : 60000fffffc5e7e0
(XEN) r8  : 200000000003eff0 r9  : 0000000000000001 r10 : 0000000000000000
(XEN) r11 : c000000000000593 r12 : 60000fffffc5e7e0 r13 : 200000000048cac0
(XEN) r14 : 2000000000144f60 r15 : 2000000000217320 r16 : e00000000b328fc8
(XEN) r17 : 00000000000002b0 r18 : 0000000000000058 r19 : 0000000000580000
(XEN) r20 : 0009804c8a70033f r21 : 2000000000109c70 r22 : 0000000000000000
(XEN) r23 : 60000fff7fffc128 r24 : 0000000000000000 r25 : 0000000000000000
(XEN) r26 : c00000000000048b r27 : 000000000000000f r28 : 2000000000144f60
(XEN) r29 : 0000001308126030 r30 : 8000000000000002 r31 : 000000000559a361
(XEN) domain_crash_sync called from xenmisc.c:194
(XEN) Domain 0 (vcpu#3) crashed on cpu#3:
(XEN) d 0xf000000007ffb208 domid 0
(XEN) vcpu 0xf0000000041c8000 vcpu 3

On Fri, Apr 28, 2006 at 11:18:45AM +0800, Xu, Anthony wrote:
> Hi Tristan,
> Could you please check whether this patch addresses the RSE issue?
>
> Yes, the Intel QA team is doing the test in the meantime.
>
>
> Thanks,
> -Anthony
>
> >-----Original Message-----
> >From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
> >[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Xu, Anthony
> >Sent: April 28, 2006 9:48
> >To: Tristan Gingold; xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan
> >(HP Labs Fort Collins); Alex Williamson
> >Subject: RE: [Xen-ia64-devel] PATCH: slightly improve stability
> >
> >>From: xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx
> >>[mailto:xen-ia64-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Tristan
> >>Gingold
> >>Sent: April 27, 2006 23:14
> >>To: xen-ia64-devel@xxxxxxxxxxxxxxxxxxx; Magenheimer, Dan (HP Labs Fort
> >>Collins); Alex Williamson
> >>Subject: [Xen-ia64-devel] PATCH: slightly improve stability
> >>
> >>Hi,
> >>
> >>as reported earlier, this patch seems to improve stability: crashes are at
> >>least more coherent and maybe less frequent.
> >>
> >>RSE handling seems to have a bug: crashes are now due to either a bad value
> >>in a stacked register or a use of an invalid stacked register (although cfm
> >>seems correct in gdb!)
> >
> >I'm looking at this too.
> >Yes, there is a bug in handle_lazy_cover.
> >
> >void ia64_do_page_fault (unsigned long address, unsigned long isr,
> >                         struct pt_regs *regs, unsigned long itir)
> >{
> >        unsigned long iip = regs->cr_iip, iha;
> >        // FIXME should validate address here
> >        unsigned long pteval;
> >        unsigned long is_data = !((isr >> IA64_ISR_X_BIT) & 1UL);
> >        IA64FAULT fault;
> >
> >        if ((isr & IA64_ISR_IR) && handle_lazy_cover(current, isr, regs))
> >                return;
> >
> >This code sequence is intended to handle the following scenario.
> >
> >1. The Guest executes br.ret; this may cause a mandatory RSE load, and this
> >load may cause a TLB miss.
> >2. The VMM gets control, but the VMM can't handle this TLB miss itself, so
> >the VMM injects the TLB miss into the Guest TLB miss handler. When the VMM
> >executes "rfi" to jump to the Guest TLB miss handler, this TLB miss happens
> >again.
> >3. At this time, interrupt_collection_enabled is 0, so handle_lazy_cover
> >executes "cover" on behalf of the Guest and returns to the Guest TLB miss
> >handler again; this time there is no TLB miss.
> >
> >
> >The following code sequence is in the ia64_leave_kernel path with psr.ic
> >and psr.i off.
> >When br.ret.dptk.many b0 is executed, there may be a mandatory load, and
> >thus there may be a tlb miss. According to the above description,
> >handle_lazy_cover executes "cover" on behalf of the Guest and returns to
> >the Guest; this is not correct in this scenario.
> >
> >I didn't find an easy way to fix this bug.
> >
> >
> >        mov loc6=0
> >        mov loc7=0
> >(pRecurse) br.call.dptk.few b0=rse_clear_invalid
> >        ;;
> >        mov loc8=0
> >        mov loc9=0
> >        cmp.ne pReturn,p0=r0,in1 // if recursion count != 0, we need to do a br.ret
> >        mov loc10=0
> >        mov loc11=0
> >(pReturn) br.ret.dptk.many b0
> >#endif /* !CONFIG_ITANIUM */
> >#       undef pRecurse
> >#       undef pReturn
> >        ;;
> >        alloc r17=ar.pfs,0,0,0,0 // drop current register frame
> >        ;;
> >        loadrs
> >
> >Thanks,
> >Anthony
> >
> >
> >>
> >>Tested by doing many linux kernel compilations in SMP domU (> 100).
> >>
> >>Tristan.
> >
> >_______________________________________________
> >Xen-ia64-devel mailing list
> >Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> >http://lists.xensource.com/xen-ia64-devel
> _______________________________________________
> Xen-ia64-devel mailing list
> Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-ia64-devel

--
yamahata

Attachment: 10484:076ded8c04ea.patch

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel
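
For readers decoding the register dump in this message, here is a minimal C sketch of why an ifs value of 8000000000000000 is consistent with steps 4 to 6 of the scenario. It is not Xen code. The helper names are hypothetical, and the bit positions (valid bit at bit 63, frame marker in bits 37:0, sof in bits 6:0, sol in bits 13:7) and the rfi behaviour are my reading of the IA-64 architecture, so please check them against the manual. The value 0x710 is taken from the pfs field of the dump only as an example of a non-empty frame; it is not claimed to be the cfm that was live at the fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed IA-64 layout: cr.ifs bit 63 is the valid bit, bits 37:0 hold
 * the saved frame marker (ifm). */
#define IFS_VALID_SHIFT   63
#define FRAME_MARKER_MASK ((UINT64_C(1) << 38) - 1)

/* Hypothetical helpers, named for this sketch only. */
static bool ifs_is_valid(uint64_t ifs)
{
    return ((ifs >> IFS_VALID_SHIFT) & 1) != 0;
}

static uint64_t ifs_frame_marker(uint64_t ifs)
{
    return ifs & FRAME_MARKER_MASK;
}

/* sof: total size of the stacked register frame (bits 6:0). */
static unsigned frame_size(uint64_t frame_marker)
{
    return (unsigned)(frame_marker & 0x7f);
}

/* sol: size of the input + local part of the frame (bits 13:7). */
static unsigned frame_locals(uint64_t frame_marker)
{
    return (unsigned)((frame_marker >> 7) & 0x7f);
}

/* Model of rfi as assumed here: cfm is reloaded from ifs.ifm only when
 * the valid bit is set, otherwise it is left unchanged. */
static uint64_t cfm_after_rfi(uint64_t cfm_before, uint64_t ifs)
{
    return ifs_is_valid(ifs) ? ifs_frame_marker(ifs) : cfm_before;
}

/* r0-r31 are static; r32 and up are usable only inside the current
 * frame, i.e. for r32 .. r32+sof-1. */
static bool stacked_reg_in_frame(uint64_t cfm, unsigned regnum)
{
    assert(regnum < 128);
    if (regnum < 32)
        return true;
    return (regnum - 32) < frame_size(cfm);
}
```

With these definitions, cfm_after_rfi(0x710, 0x8000000000000000) gives a frame of size 0, so no stacked register (loc8 included, whatever its exact register number) is inside the frame. That matches the Illegal Operation fault on mov loc8=0 in step 6.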