|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] xen/arm: Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
On Tue, May 3, 2016 at 2:03 PM, Julien Grall <julien.grall@xxxxxxx> wrote:
> Hi Dario and George,
>
> What is the status of this patch? It would be nice to have it for Xen 4.7 to
> avoid unwanted crash when secondary CPUs fails to come online.
Wei, can you put this on your release blockers list? (Julien, I take
it this should be a blocker, right?)
-George
>
> Regards,
>
> On 26/04/16 18:49, Dario Faggioli wrote:
>>
>> On Tue, 2016-04-26 at 15:25 +0100, Julien Grall wrote:
>>>
>>> Hi Dario,
>>>
>> Hi,
>>
>>> A couple of people have been reported Xen crash on the ARM64
>>> Foundation Model [1] with recent unstable.
>>>
>> Ok, thanks for reporting.
>>
>>> The crash seems to happen when Xen fails to bring up secondary CPUs
>>> (see stack trace below).
>>>
>> Ah... I see.
>>
>>> From my understanding, csched_free_pdata is trying to kill the
>>> timer spc->ticker. However the status of this timer is
>>> TIMER_STATUS_invalid.
>>>
>>> This is because csched_init_pdata has set a deadline for the
>>> timer (set_timer) and the softirq to schedule the timer has
>>> not yet happen (indeed Xen is still in early boot).
>>>
>> Yes, this is sort of what happens (only slightly different, see the
>> changelog of the atached patch patch).
>>
>>
>>> I am not sure how to fix this issue. How will you recommend
>>> to fix it?
>>>
>> Yeah, well, doing it cleanly includes a slight change in the scheduler
>> hooks API, IMO... and we indeed should do it cleanly :-))
>>
>> George, what do you think?
>>
>> Honestly, this is similar to what I was thinking to do already (I mean,
>> having an deinit_pdata hook, "symmetric" with the init_pdata one), when
>> working on that series, because I do think it's cleaner... then, I
>> abandoned the idea, as it looked to not be necessary... But apparently
>> it may actually be! :-)
>>
>> Let me know, and I'll resubmit the patch properly (together with
>> another bugfix I have in my queue).
>>
>> Dario
>> ---
>> commit eca4d65fb67a71c0f6563aafbfdd68e566c53c32
>> Author: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
>> Date: Tue Apr 26 17:42:36 2016 +0200
>>
>> xen: sched: fix killing an uninitialized timer in free_pdata.
>>
>> commit 64269d9365 "sched: implement .init_pdata in Credit,
>> Credit2 and RTDS" helped fixing Credit2 runqueues, and
>> the races we had in switching scheduler for pCPUs, but
>> introduced another issue. In fact, if CPU bringup fails
>> during __cpu_up() (and, more precisely, after CPU_UP_PREPARE,
>> but before CPU_STARTING) the CPU_UP_CANCELED notifier
>> would be executed, which calls the free_pdata hook.
>>
>> Such hook does two things: (1) undo the initialization
>> done inside the init_pdata hook; (2) free the memory
>> allocated by the alloc_pdata hook.
>>
>> However, in the failure path just described, it is possible
>> that only alloc_pdata has really been called, which is
>> potentially and issue (depending on what actually happens
>> inside the implementation of free_pdata).
>>
>> In fact, for Credit1 (the only scheduler that actually
>> allocates per-pCPU data), this result in calling kill_timer()
>> on a timer that had not yet been initialized, which causes
>> the following:
>>
>> (XEN) Xen call trace:
>> (XEN) [<000000000022e304>] timer.c#active_timer+0x8/0x24 (PC)
>> (XEN) [<000000000022f624>] kill_timer+0x108/0x2e0 (LR)
>> (XEN) [<00000000002208c0>]
>> sched_credit.c#csched_free_pdata+0xd8/0x114
>> (XEN) [<0000000000227a18>]
>> schedule.c#cpu_schedule_callback+0xc0/0x12c
>> (XEN) [<0000000000219944>] notifier_call_chain+0x78/0x9c
>> (XEN) [<00000000002015fc>] cpu_up+0x104/0x130
>> (XEN) [<000000000028f7c0>] start_xen+0xaf8/0xce0
>> (XEN) [<00000000810021d8>] 00000000810021d8
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) Assertion 'timer->status >= TIMER_STATUS_inactive' failed at
>> timer.c:279
>> (XEN) ****************************************
>>
>> Solve this by making the scheduler hooks API symmetric again,
>> i.e., by adding an deinit_pdata hook and making it responsible
>> of undoing what init_pdata did, rather than asking to free_pdata
>> to do everything.
>>
>> This is cleaner and, in the case at hand, makes it possible to
>> only call free_pdata, which is the right thing to do, as only
>> allocation and no initialization was performed.
>>
>> Reported-by: Julien Grall <julien.grall@xxxxxxx>
>> Signed-off-by: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
>> ---
>> Cc: George Dunlap <george.dunlap@xxxxxxxxxx>
>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
>> Cc: Varun.Swara@xxxxxxx
>> Cc: Steve Capper <Steve.Capper@xxxxxxx>
>>
>> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
>> index bc36837..0a6a1b4 100644
>> --- a/xen/common/sched_credit.c
>> +++ b/xen/common/sched_credit.c
>> @@ -482,15 +482,25 @@ static inline void __runq_tickle(struct csched_vcpu
>> *new)
>> }
>>
>> static void
>> -csched_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>> +csched_free_pdata(const struct scheduler *ops, void *pcpu)
>> {
>> - struct csched_private *prv = CSCHED_PRIV(ops);
>> struct csched_pcpu *spc = pcpu;
>> - unsigned long flags;
>>
>> if ( spc == NULL )
>> return;
>>
>> + xfree(spc);
>> +}
>> +
>> +static void
>> +csched_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>> +{
>> + struct csched_private *prv = CSCHED_PRIV(ops);
>> + struct csched_pcpu *spc = pcpu;
>> + unsigned long flags;
>> +
>> + ASSERT(spc != NULL);
>> +
>> spin_lock_irqsave(&prv->lock, flags);
>>
>> prv->credit -= prv->credits_per_tslice;
>> @@ -507,8 +517,6 @@ csched_free_pdata(const struct scheduler *ops, void
>> *pcpu, int cpu)
>> kill_timer(&prv->master_ticker);
>>
>> spin_unlock_irqrestore(&prv->lock, flags);
>> -
>> - xfree(spc);
>> }
>>
>> static void *
>> @@ -2091,6 +2099,7 @@ static const struct scheduler sched_credit_def = {
>> .free_vdata = csched_free_vdata,
>> .alloc_pdata = csched_alloc_pdata,
>> .init_pdata = csched_init_pdata,
>> + .deinit_pdata = csched_deinit_pdata,
>> .free_pdata = csched_free_pdata,
>> .switch_sched = csched_switch_sched,
>> .alloc_domdata = csched_alloc_domdata,
>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
>> index 46b9279..f4c37b4 100644
>> --- a/xen/common/sched_credit2.c
>> +++ b/xen/common/sched_credit2.c
>> @@ -2261,13 +2261,15 @@ csched2_switch_sched(struct scheduler *new_ops,
>> unsigned int cpu,
>> }
>>
>> static void
>> -csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>> +csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>> {
>> unsigned long flags;
>> struct csched2_private *prv = CSCHED2_PRIV(ops);
>> struct csched2_runqueue_data *rqd;
>> int rqi;
>>
>> + ASSERT(pcpu == NULL);
>> +
>> spin_lock_irqsave(&prv->lock, flags);
>>
>> ASSERT(cpumask_test_cpu(cpu, &prv->initialized));
>> @@ -2387,7 +2389,7 @@ static const struct scheduler sched_credit2_def = {
>> .alloc_vdata = csched2_alloc_vdata,
>> .free_vdata = csched2_free_vdata,
>> .init_pdata = csched2_init_pdata,
>> - .free_pdata = csched2_free_pdata,
>> + .deinit_pdata = csched2_deinit_pdata,
>> .switch_sched = csched2_switch_sched,
>> .alloc_domdata = csched2_alloc_domdata,
>> .free_domdata = csched2_free_domdata,
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index 5546999..1a64521 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -1529,7 +1529,7 @@ static void cpu_schedule_down(unsigned int cpu)
>> struct schedule_data *sd = &per_cpu(schedule_data, cpu);
>> struct scheduler *sched = per_cpu(scheduler, cpu);
>>
>> - SCHED_OP(sched, free_pdata, sd->sched_priv, cpu);
>> + SCHED_OP(sched, free_pdata, sd->sched_priv);
>> SCHED_OP(sched, free_vdata, idle_vcpu[cpu]->sched_priv);
>>
>> idle_vcpu[cpu]->sched_priv = NULL;
>> @@ -1554,8 +1554,10 @@ static int cpu_schedule_callback(
>> case CPU_UP_PREPARE:
>> rc = cpu_schedule_up(cpu);
>> break;
>> - case CPU_UP_CANCELED:
>> case CPU_DEAD:
>> + SCHED_OP(sched, deinit_pdata, sd->sched_priv, cpu);
>> + /* Fallthrough */
>> + case CPU_UP_CANCELED:
>> cpu_schedule_down(cpu);
>> break;
>> default:
>> @@ -1684,7 +1686,7 @@ int schedule_cpu_switch(unsigned int cpu, struct
>> cpupool *c)
>> vpriv = SCHED_OP(new_ops, alloc_vdata, idle,
>> idle->domain->sched_priv);
>> if ( vpriv == NULL )
>> {
>> - SCHED_OP(new_ops, free_pdata, ppriv, cpu);
>> + SCHED_OP(new_ops, free_pdata, ppriv);
>> return -ENOMEM;
>> }
>>
>> @@ -1714,7 +1716,8 @@ int schedule_cpu_switch(unsigned int cpu, struct
>> cpupool *c)
>> SCHED_OP(new_ops, tick_resume, cpu);
>>
>> SCHED_OP(old_ops, free_vdata, vpriv_old);
>> - SCHED_OP(old_ops, free_pdata, ppriv_old, cpu);
>> + SCHED_OP(old_ops, deinit_pdata, ppriv_old, cpu);
>> + SCHED_OP(old_ops, free_pdata, ppriv_old);
>>
>> out:
>> per_cpu(cpupool, cpu) = c;
>> diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
>> index 1db7c8d..240f66c 100644
>> --- a/xen/include/xen/sched-if.h
>> +++ b/xen/include/xen/sched-if.h
>> @@ -135,9 +135,10 @@ struct scheduler {
>> void (*free_vdata) (const struct scheduler *, void *);
>> void * (*alloc_vdata) (const struct scheduler *, struct
>> vcpu *,
>> void *);
>> - void (*free_pdata) (const struct scheduler *, void *,
>> int);
>> + void (*free_pdata) (const struct scheduler *, void *);
>> void * (*alloc_pdata) (const struct scheduler *, int);
>> void (*init_pdata) (const struct scheduler *, void *,
>> int);
>> + void (*deinit_pdata) (const struct scheduler *, void *,
>> int);
>> void (*free_domdata) (const struct scheduler *, void *);
>> void * (*alloc_domdata) (const struct scheduler *, struct
>> domain *);
>>
>>
>
> --
> Julien Grall
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |