[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function



On Tue, 2014-01-28 at 16:02 +0000, Stefano Stabellini wrote:
> On Tue, 28 Jan 2014, Ian Campbell wrote:
> > On Tue, 2014-01-28 at 14:00 +0000, Stefano Stabellini wrote:
> > > On Tue, 28 Jan 2014, Ian Campbell wrote:
> > > > On Mon, 2014-01-27 at 19:00 +0000, Stefano Stabellini wrote:
> > > > > Alternatively, as Ian suggested, we could increase the priotiry of 
> > > > > SGIs
> > > > > but I am a bit wary of making that change at RC2.
> > > > 
> > > > I'm leaning the other way -- I'm wary of open coding magic locking
> > > > primitives to work around this issue on a case by case basis. It's just
> > > > too subtle IMHO.
> > > > 
> > > > The IPI and cross CPU calling primitives are basically predicated on
> > > > those IPIs interrupting normal interrupt handlers.
> > > 
> > > The problem is that we don't know if we can context switch properly
> > > nested interrupts.
> > 
> > What do you mean? We don't have to context switch an IPI.
> 
> Sorry, I meant save/restore registers, stack pointer, processor mode,
> etc, for nested interrupts.

Right, we do handle that actually since it is the same code as handles
an IRQ during a hypercall (since a hypercall is just another type of
trap).

> > > Also I would need to think harder whether everything
> > > would work correctly without hitches with multiple SGIs happening
> > > simultaneously (with more than 2 cpus involved).
> > 
> > Since all IPIs would be at the same higher priority only one will be
> > active on each CPU at a time. If you are worried about multiple CPUs
> > then that is already an issue today, just at a lower priority.
> 
> That is correct, but if we moved the on_selected_cpus call out of the
> interrupt handler I don't think the problem could occur.

Sure, I'm not opposed to fixing this issue by making an architectural
improvement to the code which happens to not use on_selected_cpus.

> > AIUI this issue only occurs with "proto device assignment" patches added
> > to 4.4, n which case I think the solution can wait until 4.5 and can be
> > done properly via the IPI priority fix.
>  
> I think this is a pretty significant problem, even if we don't commit a
> fix, we should post a proper patch that we deem acceptable and link it to
> the wiki so that anybody that needs it can find it and be sure that it
> works correctly.

FWIW I have an almost fix to the IPI priority thing which I would intend
to fix in 4.5 regardless of whether this issue needs it or not.

> In my opinion if we go to this length then we might as well commit it
> (if it doesn't touch common code of course), but I'll leave the decision
> up to you and George.
> 
> Given the constraints, the solution I would feel more comfortable with at
> this time is something like the following patch (lightly tested):

I don't think this buys you anything over just fixing on_selected_cpus
to work as it should, unless there is some reason why deferring this
work to a tasklet is the logically correct thing to do and/or a better
designh?

(IOW if you are only doing this to avoid calling on_selected_cpus in
interrupt context then I think this is the wrong fix).

Ian.

> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index e6257a7..b00ca73 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -58,6 +58,25 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
>  
>  static unsigned nr_lrs;
>  
> +static void gic_irq_eoi(void *info)
> +{
> +    int virq = (uintptr_t) info;
> +    GICC[GICC_DIR] = virq;
> +}
> +
> +static DEFINE_PER_CPU(struct pending_irq*, eoi_irq);
> +static void eoi_action(unsigned long unused)
> +{
> +    struct pending_irq *p = this_cpu(eoi_irq);
> +    ASSERT(p != NULL);
> +
> +    on_selected_cpus(cpumask_of(p->desc->arch.eoi_cpu),
> +            gic_irq_eoi, (void*)(uintptr_t)p->desc->irq, 0);
> +
> +    this_cpu(eoi_irq) = NULL;
> +}
> +static DECLARE_TASKLET(eoi_tasklet, eoi_action, 0);
> +
>  /* The GIC mapping of CPU interfaces does not necessarily match the
>   * logical CPU numbering. Let's use mapping as returned by the GIC
>   * itself
> @@ -897,12 +916,6 @@ int gicv_setup(struct domain *d)
>  
>  }
>  
> -static void gic_irq_eoi(void *info)
> -{
> -    int virq = (uintptr_t) info;
> -    GICC[GICC_DIR] = virq;
> -}
> -
>  static void maintenance_interrupt(int irq, void *dev_id, struct 
> cpu_user_regs *regs)
>  {
>      int i = 0, virq, pirq = -1;
> @@ -962,8 +975,11 @@ static void maintenance_interrupt(int irq, void *dev_id, 
> struct cpu_user_regs *r
>              if ( cpu == smp_processor_id() )
>                  gic_irq_eoi((void*)(uintptr_t)pirq);
>              else
> -                on_selected_cpus(cpumask_of(cpu),
> -                                 gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
> +            {
> +                ASSERT(this_cpu(eoi_irq) == NULL);
> +                this_cpu(eoi_irq) = p;
> +                tasklet_schedule(&eoi_tasklet);
> +            }
>          }
>  
>          i++;



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.