|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] crash in csched_load_balance after xl vcpu-pin
> On Apr 12, 2018, at 6:25 PM, Dario Faggioli <dfaggioli@xxxxxxxx> wrote:
>
> On Thu, 2018-04-12 at 17:38 +0200, Dario Faggioli wrote:
>> On Thu, 2018-04-12 at 15:15 +0200, Dario Faggioli wrote:
>>> On Thu, 2018-04-12 at 14:45 +0200, Olaf Hering wrote:
>>>>
>>>> dies after the first iteration.
>>>>
>>>> BUG_ON(!test_bit(_VPF_migrating, &prev->pause_flags));
>>>>
>>
>> Update. I replaced this:
>>
> Olaf, new patch! :-)
>
> FTR, a previous version of this (where I was not printing
> smp_processor_id() and prev->is_running), produced the output that I am
> attaching below.
>
> Looks to me like, while on the crashing CPU, we are here [*]:
>
> void context_saved(struct vcpu *prev)
> {
> ...
> if ( unlikely(prev->pause_flags & VPF_migrating) )
> {
> unsigned long flags;
> spinlock_t *lock = vcpu_schedule_lock_irqsave(prev, &flags);
>
> if (vcpu_runnable(prev) || !test_bit(_VPF_migrating,
> &prev->pause_flags))
> printk("CPU %u: d%uv%d isr=%u runnbl=%d proc=%d pf=%lu orq=%d
> csf=%u\n",
> smp_processor_id(), prev->domain->domain_id, prev->vcpu_id,
> prev->is_running, vcpu_runnable(prev),
> prev->processor, prev->pause_flags,
> SCHED_OP(vcpu_scheduler(prev), onrunq, prev),
> SCHED_OP(vcpu_scheduler(prev), csflags, prev));
>
> [*]
>
> if ( prev->runstate.state == RUNSTATE_runnable )
> vcpu_runstate_change(prev, RUNSTATE_offline, NOW());
> BUG_ON(curr_on_cpu(prev->processor) == prev);
> SCHED_OP(vcpu_scheduler(prev), sleep, prev);
>
> vcpu_schedule_unlock_irqrestore(lock, flags, prev);
>
> vcpu_migrate(prev);
> }
> }
>
> On the "other CPU", we might be around here [**]:
>
> static void vcpu_migrate(struct vcpu *v)
> {
> ...
> if ( v->is_running ||
> !test_and_clear_bit(_VPF_migrating, &v->pause_flags) )n
I think the bottom line is, for this test to be valid, then at this point
test_bit(VPF_migrating) *must* imply !vcpu_on_runqueue(v), but at this point it
doesn’t: If someone else has come by and cleared the bit, done migration, and
woken it up, and then someone *else* set the bit again without taking it off
the runqueue, it may still be on the runqueue.
My series which calls vcpu_sleep_nosync_locked() after setting VPF_migrating
should help with this.
Or, alternately, instead of baking all this implicit knowledge about credit
into the scheduler, we should just implement credit_vcpu_migrate(), and have it
remove it from one runqueue and put it on another.
-George
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |