[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v1 2/2] xen/arm: Fix deadlock in on_selected_cpus function
On Tue, 2014-01-28 at 16:02 +0000, Stefano Stabellini wrote: > On Tue, 28 Jan 2014, Ian Campbell wrote: > > On Tue, 2014-01-28 at 14:00 +0000, Stefano Stabellini wrote: > > > On Tue, 28 Jan 2014, Ian Campbell wrote: > > > > On Mon, 2014-01-27 at 19:00 +0000, Stefano Stabellini wrote: > > > > > Alternatively, as Ian suggested, we could increase the priotiry of > > > > > SGIs > > > > > but I am a bit wary of making that change at RC2. > > > > > > > > I'm leaning the other way -- I'm wary of open coding magic locking > > > > primitives to work around this issue on a case by case basis. It's just > > > > too subtle IMHO. > > > > > > > > The IPI and cross CPU calling primitives are basically predicated on > > > > those IPIs interrupting normal interrupt handlers. > > > > > > The problem is that we don't know if we can context switch properly > > > nested interrupts. > > > > What do you mean? We don't have to context switch an IPI. > > Sorry, I meant save/restore registers, stack pointer, processor mode, > etc, for nested interrupts. Right, we do handle that actually since it is the same code as handles an IRQ during a hypercall (since a hypercall is just another type of trap). > > > Also I would need to think harder whether everything > > > would work correctly without hitches with multiple SGIs happening > > > simultaneously (with more than 2 cpus involved). > > > > Since all IPIs would be at the same higher priority only one will be > > active on each CPU at a time. If you are worried about multiple CPUs > > then that is already an issue today, just at a lower priority. > > That is correct, but if we moved the on_selected_cpus call out of the > interrupt handler I don't think the problem could occur. Sure, I'm not opposed to fixing this issue by making an architectural improvement to the code which happens to not use on_selected_cpus. > > AIUI this issue only occurs with "proto device assignment" patches added > > to 4.4, n which case I think the solution can wait until 4.5 and can be > > done properly via the IPI priority fix. > > I think this is a pretty significant problem, even if we don't commit a > fix, we should post a proper patch that we deem acceptable and link it to > the wiki so that anybody that needs it can find it and be sure that it > works correctly. FWIW I have an almost fix to the IPI priority thing which I would intend to fix in 4.5 regardless of whether this issue needs it or not. > In my opinion if we go to this length then we might as well commit it > (if it doesn't touch common code of course), but I'll leave the decision > up to you and George. > > Given the constraints, the solution I would feel more comfortable with at > this time is something like the following patch (lightly tested): I don't think this buys you anything over just fixing on_selected_cpus to work as it should, unless there is some reason why deferring this work to a tasklet is the logically correct thing to do and/or a better designh? (IOW if you are only doing this to avoid calling on_selected_cpus in interrupt context then I think this is the wrong fix). Ian. > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > index e6257a7..b00ca73 100644 > --- a/xen/arch/arm/gic.c > +++ b/xen/arch/arm/gic.c > @@ -58,6 +58,25 @@ static DEFINE_PER_CPU(uint64_t, lr_mask); > > static unsigned nr_lrs; > > +static void gic_irq_eoi(void *info) > +{ > + int virq = (uintptr_t) info; > + GICC[GICC_DIR] = virq; > +} > + > +static DEFINE_PER_CPU(struct pending_irq*, eoi_irq); > +static void eoi_action(unsigned long unused) > +{ > + struct pending_irq *p = this_cpu(eoi_irq); > + ASSERT(p != NULL); > + > + on_selected_cpus(cpumask_of(p->desc->arch.eoi_cpu), > + gic_irq_eoi, (void*)(uintptr_t)p->desc->irq, 0); > + > + this_cpu(eoi_irq) = NULL; > +} > +static DECLARE_TASKLET(eoi_tasklet, eoi_action, 0); > + > /* The GIC mapping of CPU interfaces does not necessarily match the > * logical CPU numbering. Let's use mapping as returned by the GIC > * itself > @@ -897,12 +916,6 @@ int gicv_setup(struct domain *d) > > } > > -static void gic_irq_eoi(void *info) > -{ > - int virq = (uintptr_t) info; > - GICC[GICC_DIR] = virq; > -} > - > static void maintenance_interrupt(int irq, void *dev_id, struct > cpu_user_regs *regs) > { > int i = 0, virq, pirq = -1; > @@ -962,8 +975,11 @@ static void maintenance_interrupt(int irq, void *dev_id, > struct cpu_user_regs *r > if ( cpu == smp_processor_id() ) > gic_irq_eoi((void*)(uintptr_t)pirq); > else > - on_selected_cpus(cpumask_of(cpu), > - gic_irq_eoi, (void*)(uintptr_t)pirq, 0); > + { > + ASSERT(this_cpu(eoi_irq) == NULL); > + this_cpu(eoi_irq) = p; > + tasklet_schedule(&eoi_tasklet); > + } > } > > i++; _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |