Re: [Xen-devel] Bug report and patch about IRQ freezing after gic_restore_state
On Mon, 20 May 2013, Julien Grall wrote:
> On 05/20/2013 01:41 AM, Jaeyong Yoo wrote:
>
> Hello,
>
> > I'm running Xen on an Arndale board, and if I run both iperf and du in
> > Dom0, one of the IRQs (either SATA or network) suddenly stops occurring.
> > After some investigation, I found that during a context switch in Xen,
> > IRQs in the LRs (about to be delivered to the Doms) can be lost and never
> > occur again.
> > The problem occurs in the following function call sequence
> > (during a context switch):
> > - schedule_tail
> > - ctxt_switch_from
> > - local_irq_enable
> > - // after this point, an IRQ can occur and be written directly to an LR
> > - ctxt_switch_to
> > - ... (some more functions)
> > - // before the above IRQ is delivered to the Dom (and the maintenance
> >   // IRQ has not been called), gic_restore_state can be called
> > - gic_restore_state /* when restoring the GIC state, the above IRQ
> >                      * (written to an LR) is overwritten with the
> >                      * previous values and, somehow, the
> >                      * corresponding IRQ never occurs again */
> >
> > I made the following patch (i.e., enable local IRQs after gic_restore_state)
> > to prevent the above problem.
>
> Thanks for the patch, I have been looking into a similar error on the
> Arndale board for a couple of days.
Indeed, thanks for the analysis of the bug and the patch!
It is a particularly difficult bug to track down because it can only
happen if an irq arrives after ctxt_switch_from and before
ctxt_switch_to, and the irq is for the next vcpu to be scheduled on the
pcpu (otherwise the v == current check at the beginning of
gic_set_guest_irq would catch that).
Rather than extending the check in gic_set_guest_irq, I think it is wise
to run ctxt_switch_to with interrupts disabled.
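
To make the window concrete, here is a toy model in plain C. It is not Xen
code: every toy_* name, the 4-entry LR array and the use of 0 as "empty slot"
are assumptions made up for illustration. It only shows that an IRQ written
to an LR before the restore is clobbered, while one written after the restore
survives.

```c
#include <stdbool.h>
#include <string.h>

#define NR_LRS   4    /* number of list registers in this toy model (assumed) */
#define LR_EMPTY 0u   /* 0 stands for "no IRQ in this slot" (assumed encoding) */

/* Toy stand-ins for the hardware LRs and a vcpu's saved copy of them. */
struct toy_vcpu { unsigned int saved_lr[NR_LRS]; };
static unsigned int hw_lr[NR_LRS];

/* Models an IRQ for the incoming vcpu being written straight into a free LR. */
static void toy_inject_irq(unsigned int irq)
{
    for (int i = 0; i < NR_LRS; i++) {
        if (hw_lr[i] == LR_EMPTY) {
            hw_lr[i] = irq;
            return;
        }
    }
}

/* Models gic_restore_state: every LR is overwritten with the saved value. */
static void toy_restore_state(const struct toy_vcpu *v)
{
    memcpy(hw_lr, v->saved_lr, sizeof(hw_lr));
}

/* Returns true if the given IRQ is still present in some LR. */
static bool toy_irq_in_lr(unsigned int irq)
{
    for (int i = 0; i < NR_LRS; i++)
        if (hw_lr[i] == irq)
            return true;
    return false;
}

/* Old ordering: IRQs are enabled before the restore, so an IRQ can slip in. */
static bool toy_switch_irq_before_restore(const struct toy_vcpu *next,
                                          unsigned int irq)
{
    memset(hw_lr, 0, sizeof(hw_lr));
    toy_inject_irq(irq);       /* arrives in the window after local_irq_enable */
    toy_restore_state(next);   /* overwrites the freshly written LR */
    return toy_irq_in_lr(irq);
}

/* Patched ordering: the restore runs with IRQs off, the IRQ arrives later. */
static bool toy_switch_irq_after_restore(const struct toy_vcpu *next,
                                         unsigned int irq)
{
    memset(hw_lr, 0, sizeof(hw_lr));
    toy_restore_state(next);   /* nothing can be injected during the restore */
    toy_inject_irq(irq);       /* delivered once IRQs are enabled again */
    return toy_irq_in_lr(irq);
}
```

With a vcpu whose saved LRs are all empty, the first function reports the IRQ
as lost and the second reports it as still pending, which matches the
behaviour described in the bug report.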
> > Signed-off-by: Jaeyong Yoo <jaeyong.yoo@xxxxxxxxxxx>
> > ---
> > xen/arch/arm/domain.c | 4 ++--
> > xen/arch/arm/gic.c | 4 ++--
> > 2 files changed, 4 insertions(+), 4 deletions(-)
> > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> > index f71b582..2c3b132 100644
> > --- a/xen/arch/arm/domain.c
> > +++ b/xen/arch/arm/domain.c
> > @@ -141,6 +141,8 @@ static void ctxt_switch_to(struct vcpu *n)
> > /* VGIC */
> > gic_restore_state(n);
> > + local_irq_enable();
> > +
>
> Could you move the local_irq_enable right after ctxt_switch_to?
Right, good idea.
> > /* XXX VFP */
> > /* XXX MPU */
> > @@ -215,8 +217,6 @@ static void schedule_tail(struct vcpu *prev)
> > {
> > ctxt_switch_from(prev);
> > - local_irq_enable();
> > -
> > /* TODO
> > update_runstate_area(current);
> > */
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index d4f0a43..8186ad8 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -81,11 +81,11 @@ void gic_restore_state(struct vcpu *v)
> > if ( is_idle_vcpu(v) )
> > return;
> > - spin_lock_irq(&gic.lock);
> > + spin_lock(&gic.lock);
> > this_cpu(lr_mask) = v->arch.lr_mask;
> > for ( i=0; i<nr_lrs; i++)
> > GICH[GICH_LR + i] = v->arch.gic_lr[i];
> > - spin_unlock_irq(&gic.lock);
> > + spin_unlock(&gic.lock);
>
> As IRQs are disabled and the GICH registers can only be modified by
> the current physical CPU, I think you can remove the spin_{,un}lock and
> replace it with a dsb.
Yes, we can remove the spin_lock but I don't think we need a dsb
there. See the presence of an isb() two lines below.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel