
Re: [Xen-devel] Xen 4.5 random freeze question



On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
<stefano.stabellini@xxxxxxxxxxxxx> wrote:
> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
>> <andrii.tseglytskyi@xxxxxxxxxxxxxxx> wrote:
>> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
>> > <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >>> Hi Stefano,
>> >>>
>> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>> >>> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
>> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >>> >> Hi Stefano,
>> >>> >>
>> >>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>> >>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
>> >>> >> > >      else
>> >>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>> >>> >> > >
>> >>> >> > >  }
>> >>> >> >
>> >>> >> > Yes, exactly
>> >>> >>
>> >>> >> I tried, hang still occurs with this change
>> >>> >
>> >>> > We need to figure out why during the hang you still have all the LRs
>> >>> > busy even if you are getting maintenance interrupts that should cause
>> >>> > them to be cleared.
>> >>> >
>> >>>
>> >>> I see that I have free LRs during maintenance interrupt
>> >>>
>> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
>> >>> (XEN) GICH_LRs (vcpu 0) mask=0
>> >>> (XEN)    HW_LR[0]=9a015856
>> >>> (XEN)    HW_LR[1]=0
>> >>> (XEN)    HW_LR[2]=0
>> >>> (XEN)    HW_LR[3]=0
>> >>> (XEN) Inflight irq=86 lr=0
>> >>> (XEN) Inflight irq=2 lr=255
>> >>> (XEN) Pending irq=2
>> >>>
>> >>> But I see that once the hang occurs, maintenance interrupts are
>> >>> generated continuously. The platform keeps printing the same log
>> >>> until reboot.
>> >>
>> >> Exactly the same log? As in the one above you just pasted?
>> >> That is very very suspicious.
>> >
>> > Yes, exactly the same log. It looks like this means the LRs are
>> > flushed correctly.
>> >
>> >>
>> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
>> >> something we do in Xen, maybe writing to an LR register, might trigger a
>> >> new maintenance interrupt immediately causing an infinite loop.
>> >>
>> >
>> > Yes, this is what I'm thinking too. Taking into account all the
>> > collected debug info, it looks like a maintenance interrupt occurs
>> > once the LRs are overloaded with SGIs. It is then not handled
>> > properly and occurs again and again, so the platform hangs inside
>> > its handler.
>> >
>> >> Could you please try this patch? It disables GICH_HCR_UIE immediately
>> >> on hypervisor entry.
>> >>
>> >
>> > Now trying.
>> >
>> >>
>> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> >> index 4d2a92d..6ae8dc4 100644
>> >> --- a/xen/arch/arm/gic.c
>> >> +++ b/xen/arch/arm/gic.c
>> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>> >>      if ( is_idle_vcpu(v) )
>> >>          return;
>> >>
>> >> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> +
>> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>> >>
>> >>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>> >> @@ -821,12 +823,8 @@ void gic_inject(void)
>> >>
>> >>      gic_restore_pending_irqs(current);
>> >>
>> >> -
>> >>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>> >>          GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >> -    else
>> >> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> -
>> >>  }
>> >>
>> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
>> >
>>
>> Heh - I don't see hangs with this patch :) But I also see that the
>> maintenance interrupt no longer occurs at all (and there is no hang
>> as a result).
>> Stefano - is this expected?
>
> No maintenance interrupts at all? That's strange. You should be
> receiving them when LRs are full and you still have interrupts pending
> to be added to them.
>
> You could add another printk here to see if you should be receiving
> them:
>
>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> +    {
> +        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
>          GICH[GICH_HCR] |= GICH_HCR_UIE;
> -    else
> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> -
> +    }
>  }
>

The maintenance interrupt is requested properly:

(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt

But the interrupt itself never occurs.
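
For anyone following along, here is a minimal standalone sketch (not Xen
code) of the underflow condition as I read it in the GICv2 architecture
spec: with GICH_HCR.UIE set, the maintenance interrupt is signalled while
zero or one list registers hold a valid entry. The names HCR_UIE, LR_STATE,
count_valid_lrs and underflow_asserted are made up for illustration, and the
four-LR count is taken from the dump earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: GICv2 layout. UIE is bit 1 of GICH_HCR, and the state
 * field of a list register is bits [29:28], where 0 means invalid. */
#define NR_LRS        4u
#define HCR_UIE       (1u << 1)
#define LR_STATE(lr)  (((lr) >> 28) & 0x3u)

/* Count the list registers that currently hold a valid entry
 * (pending, active, or pending+active). */
static unsigned int count_valid_lrs(const uint32_t *lrs, unsigned int n)
{
    unsigned int i, valid = 0;

    assert(lrs != NULL);
    for ( i = 0; i < n; i++ )
        if ( LR_STATE(lrs[i]) != 0 )
            valid++;

    return valid;
}

/* Model of the underflow maintenance condition: signalled while UIE is
 * set and at most one LR is valid. Being a level condition, it stays
 * true until UIE is cleared or more LRs become valid. */
static bool underflow_asserted(uint32_t hcr, const uint32_t *lrs,
                               unsigned int n)
{
    return (hcr & HCR_UIE) && count_valid_lrs(lrs, n) <= 1;
}
```

Fed the LR values from the earlier dump (HW_LR[0]=9a015856, the rest 0),
count_valid_lrs() returns 1, so underflow_asserted() stays true for as long
as UIE is set. That would be consistent with the repeating log seen before
the patch, though it does not explain why nothing fires now.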



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel