Re: [Xen-devel] crash in csched_load_balance after xl vcpu-pin



On Fri, 2018-04-13 at 09:03 +0000, George Dunlap wrote:
> > On Apr 12, 2018, at 6:25 PM, Dario Faggioli <dfaggioli@xxxxxxxx>
> > wrote:
> > 
> > On the "other CPU", we might be around here [**]:
> > 
> > static void vcpu_migrate(struct vcpu *v)
> > {
> >    ...
> >    if ( v->is_running ||
> >         !test_and_clear_bit(_VPF_migrating, &v->pause_flags) )
> 
> I think the bottom line is that, for this test to be valid,
> test_bit(VPF_migrating) *must* imply !vcpu_on_runqueue(v) at this
> point, but it doesn't: if someone else has come by and cleared the
> bit, done the migration, and woken the vcpu up, and then someone
> *else* has set the bit again without taking it off the runqueue, it
> may still be on the runqueue.
> 
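
To put George's point in code: the guard in vcpu_migrate() silently
relies on an invariant that a hypothetical assertion would make
explicit. This is illustrative only (__vcpu_on_runq() and
CSCHED_VCPU() are credit1-internal names, so it could not literally
sit in schedule.c), but it is what the test above assumes:

    /* Hypothetical debugging aid, NOT in the tree: if the migrating
     * flag is set, v must already be off any runqueue. */
    ASSERT(!test_bit(_VPF_migrating, &v->pause_flags) ||
           !__vcpu_on_runq(CSCHED_VCPU(v)));
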
BTW, I suddenly realized that Olaf, in his reproducer, is changing
both the hard and the soft affinity.
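
Both flavours funnel into the same helper, and each call sets
_VPF_migrating and then tries the sleep+migrate dance. A rough
sketch, abridged from memory of xen/common/schedule.c at the time and
from the trace below; take the names and details as approximate, not
as the exact code:

    static int vcpu_set_affinity(
        struct vcpu *v, const cpumask_t *affinity, cpumask_t *which)
    {
        spinlock_t *lock = vcpu_schedule_lock_irq(v);

        cpumask_copy(which, affinity);
        set_bit(_VPF_migrating, &v->pause_flags);

        vcpu_schedule_unlock_irq(lock, v);

        domain_update_node_affinity(v->domain);

        if ( test_bit(_VPF_migrating, &v->pause_flags) )
        {
            vcpu_sleep_nosync(v);
            vcpu_migrate(v);
        }

        return 0;
    }

    int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity)
    {
        return vcpu_set_affinity(v, affinity, v->cpu_hard_affinity);
    }

    int vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *affinity)
    {
        return vcpu_set_affinity(v, affinity, v->cpu_soft_affinity);
    }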

That means two calls to vcpu_set_affinity(). And here's the race
(beware, it's a bit of a long chain of events! :-P):

 CPU A                                  CPU B
 .                                      .
 schedule(current == v)                 vcpu_set_affinity(v) <-- for hard affinity
  prev = current     // == v             .
  schedule_lock(CPU A)                   .
   csched_schedule()                     schedule_lock(CPU A)
   if (runnable(v))  //YES               x
    runq_insert(v)                       x
   return next != v                      x
  schedule_unlock(CPU A)                 x // takes the lock
  context_switch(prev,next)              set_bit(v, VPF_migrating)
   context_saved(prev) // still == v     .
    v->is_running = 0                    schedule_unlock(CPU A)
    SMP_MB                               domain_update_node_affinity(v->d)
    .                                    if (test_bit(v, VPF_migrating) // YES
    .                                     vcpu_sleep_nosync(v)
    .                                      schedule_lock(CPU A)
    .                                      if (!vcpu_runnable(v)) // YES
    .                                       SCHED_OP(v, sleep)
    .                                        if (curr_on_cpu(v, CPU A)) // NO
    .                                         ---
    .                                        else if (__vcpu_on_runq(v)) // YES
    .                                         runq_remove(v)
    .                                       schedule_unlock(CPU A)
    .                                     vcpu_migrate(v)
    .                                      for {
    .                                       schedule_lock(CPU A)
    .                                       SCHED_OP(v, pick_cpu)
    .                                        set_bit(v, CSCHED_MIGRATING)
    .                                        return CPU D
    .                                       pick_called = 1
    .                                       schedule_unlock(CPU A)
    if (test_bit(v, VPF_migrating)) // YES  schedule_lock(CPU A + CPU D)
     vcpu_sleep_nosync(v)                   if (pick_called && ...) // YES
      schedule_lock(CPU A)                   break
      x                                    }
      x // CPU B clears VPF_migrating!     if (v->is_running || !test_and_clear(v, VPF_migrating)) // NO
      x                                     ---
      x                                    vcpu_move_locked(v)
      x                                     v->processor = CPU D
      x                                    schedule_unlock(CPU A + CPU D)
      x // takes *CPU D* lock              .
      if (!vcpu_runnable(v)) // FALSE, as VPF_migrating is now clear
       ---                                 vcpu_wake(v)
      schedule_unlock(CPU D)                .
      vcpu_migrate(v)                       schedule_lock(CPU D)
        for {                               if (vcpu_runnable(v)) // YES
         schedule_lock(CPU D)                SCHED_OP(v, wake)
         x                                    runq_insert(v) // v is now in CPU D's runqueue
         x                                    runq_tickle(v)
         x                                  schedule_unlock(CPU D)
         x // takes the lock                .
         SCHED_OP(v, pick_cpu)              .
          set_bit(v, CSCHED_MIGRATING)      .
          return CPU C                      .
         pick_called = 1                    .
         schedule_unlock(CPU D)             .
         .                                  vcpu_set_affinity(v) <-- for soft affinity
         .                                  schedule_lock(CPU D)
         schedule_lock(CPU D + CPU C)       set_bit(v, VPF_migrating)
         x                                  schedule_unlock(CPU D)
         x // takes the lock                .
         if (pick_called && ...) // YES     .
          break                             .
        }                                   .
        if ( v->is_running || !test_and_clear(v, VPF_migrating)) // FALSE !!
         vcpu_move_locked(v, CPU C)         .
         BUG_ON(__vcpu_on_runq(v))          .
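
Since the chain is long, here is a minimal, stand-alone C model that
replays the same interleaving deterministically. It is illustrative
only: it tracks just the three pieces of state the race hinges on
(the flag, runqueue membership, is_running), not real Xen state:

    /* race-replay.c -- single-threaded replay of the trace above.
     * Build with: gcc -o race-replay race-replay.c && ./race-replay */
    #include <stdbool.h>
    #include <stdio.h>

    static bool vpf_migrating;   /* the _VPF_migrating pause flag   */
    static bool on_runq;         /* is v on some pCPU's runqueue?   */
    static bool is_running;      /* v->is_running                   */

    static bool test_and_clear(bool *bit)
    {
        bool old = *bit;
        *bit = false;
        return old;
    }

    int main(void)
    {
        /* CPU A: schedule() deschedules v; v is still runnable, so
         * csched_schedule() puts it back on CPU A's runqueue.       */
        is_running = true;
        on_runq = true;

        /* CPU B: vcpu_set_affinity() (hard) sets the flag.          */
        vpf_migrating = true;

        /* CPU A: context_saved() clears is_running, sees the flag
         * set, and heads for vcpu_sleep_nosync()/vcpu_migrate()...  */
        is_running = false;
        bool cpu_a_saw_flag = vpf_migrating;       /* YES            */

        /* ...but CPU B holds the lock: it dequeues v, clears the
         * flag in vcpu_migrate()'s guard, and moves v to CPU D.     */
        on_runq = false;
        (void)test_and_clear(&vpf_migrating);      /* succeeds       */

        /* CPU B: vcpu_wake() -- flag clear, so v is runnable and is
         * inserted into CPU D's runqueue.                           */
        on_runq = true;

        /* CPU A (resuming): !vcpu_runnable(v) is FALSE (flag clear),
         * so its vcpu_sleep_nosync() does NOT dequeue v; it carries
         * on into vcpu_migrate() and drops the lock for pick_cpu.   */

        /* CPU B: second vcpu_set_affinity() (soft) re-sets the flag
         * WITHOUT taking v off CPU D's runqueue.                    */
        vpf_migrating = true;

        /* CPU A: vcpu_migrate()'s guard now passes...               */
        if ( cpu_a_saw_flag && !is_running &&
             test_and_clear(&vpf_migrating) )
        {
            /* ...so it reaches vcpu_move_locked(), where credit1
             * would hit BUG_ON(__vcpu_on_runq(v)):                  */
            printf("v still on a runqueue: %s\n",
                   on_runq ? "true -> BUG_ON fires" : "false");
        }

        return 0;
    }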

It appears that changing only the hard affinity does not trigger the
bug, which fits the analysis above: the race needs a second
vcpu_set_affinity() call to set VPF_migrating again after the wake-up.

Also, as Olaf just reported, running with your series (and changing
both hard and soft affinity) works.

Now we have to decide whether to take your series and backport it
(which is what I'm leaning toward), or to do something else.

But if you don't mind, we'd have to do it on Monday, as I have to run
right now. :-P

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
