
[Xen-devel] multiple runqueues in credit2



Hi George,

Both Justin and I were able to reproduce a situation where, on a 2-socket
system (see below), credit2 was activating only one runqueue.

That seemed in line with some comments in the sched_credit2.c source
file, such as this one:

 /*
  * Design:
  *
  * VMs "burn" credits based on their weight; higher weight means
  * credits burn more slowly.  The highest weight vcpu burns credits at
  * a rate of 1 credit per nanosecond.  Others burn proportionally
  * more.
  *
  * vcpus are inserted into the runqueue by credit order.
  *
  * Credits are "reset" when the next vcpu in the runqueue is less than
  * or equal to zero.  At that point, everyone's credits are "clipped"
  * to a small value, and a fixed credit is added to everyone.
  *
  * The plan is for all cores that share an L2 will share the same
  * runqueue.  At the moment, there is one global runqueue for all
  * cores.
  */
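
For what it's worth, here is a minimal sketch of how I read that
weight/burn relationship (made-up names and types, not the actual
sched_credit2.c code):

  #include <stdint.h>

  /*
   * Toy illustration of weight-proportional burning: the
   * highest-weight vcpu burns 1 credit per nanosecond, and
   * lower-weight vcpus burn proportionally more.
   */
  static int64_t burn_credits(int64_t credit, unsigned int weight,
                              unsigned int max_weight, int64_t ns_ran)
  {
      /* A vcpu with half the max weight burns twice as fast. */
      return credit - (ns_ran * max_weight) / weight;
  }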

However, I remembered it differently, and looking at init_pcpu() I
spotted this:

    /* Figure out which runqueue to put it in */
    /* NB: cpu 0 doesn't get a STARTING callback, so we hard-code it
     * to runqueue 0. */
    if ( cpu == 0 )
        rqi = 0;
    else
        rqi = cpu_to_socket(cpu);

which looks to me like the code for having one runqueue per socket _is_
already there! That means two things: (1) the comment above is
outdated :-) but, at the same time, (2) this code right here is not
working!

Justin also noticed that init_pcpu() was actually being called twice
for every pcpu except #0, triggering the following warning:

    printk("%s: Strange, cpu %d already initialized!\n", __func__, cpu);

I did some investigation on the following system:

cpu_topology           :
cpu:    core    socket     node
  0:       0        0        0
  1:       1        0        0
  2:       2        0        0
  3:       3        0        0
  4:       0        1        1
  5:       1        1        1
  6:       2        1        1
  7:       3        1        1

So, what I expect is, for instance, cpu 1 to be on runqueue 0, and cpu 5
on runqueue 1.
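
Or, putting the expected mapping in code (a toy example that
hard-codes the topology above, instead of calling the real
cpu_to_socket()):

  #include <stdio.h>

  /* Socket of each pcpu, from the topology dump above. */
  static const int socket_of[8] = { 0, 0, 0, 0, 1, 1, 1, 1 };

  int main(void)
  {
      /* With one runqueue per socket, rqi should equal the socket id. */
      for ( int cpu = 0; cpu < 8; cpu++ )
          printf("cpu %d -> runqueue %d\n", cpu, socket_of[cpu]);
      return 0;
  }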

The problem is here:

  static void *
  csched_alloc_pdata(const struct scheduler *ops, int cpu)
  {
      /* Check to see if the cpu is online yet */
      /* Note: cpu 0 doesn't get a STARTING callback */
      if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
          init_pcpu(ops, cpu);
      else
          printk("%s: cpu %d not online yet, deferring initializatgion\n",
                 __func__, cpu);

      return (void *)1;
  }

In fact, this is meant to call init_pcpu() *only* on pcpu 0 (which
doesn't get the STARTING notification) and on those pcpus that have
already been onlined. Unfortunately, "cpu_to_socket(cpu) >= 0" is not
(any longer?) a valid way to check the latter, and in fact init_pcpu()
is always called, even for pcpus that have not been identified and
initialized yet. That, with cpu_to_socket() constantly returning 0,
means all the pcpus end up in the one and only runqueue 0.
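
If I am reading the boot path right (this is an assumption on my
part, which I still need to verify in the code), that is because the
per-cpu topology data is zero-initialized, and only gets filled in
when the pcpu is actually identified. Schematically:

  /*
   * Rough sketch of the failure mode, not the actual Xen code: the
   * socket id lives in a zero-initialized per-cpu array, and is
   * only filled in during cpu identification at boot.
   */
  struct toy_cpuinfo { int phys_proc_id; /* socket id */ };
  static struct toy_cpuinfo cpu_data[8];   /* static => all zeroes */

  static int toy_cpu_to_socket(int cpu)
  {
      return cpu_data[cpu].phys_proc_id;   /* 0 until identified */
  }

  /*
   * Hence "cpu_to_socket(cpu) >= 0" holds even for pcpus that have
   * not been onlined yet, and everybody lands in runqueue 0.
   */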

I verified that removing the right side of the || makes things work (I
enabled some debug output and added some more myself):

(XEN) csched_alloc_pdata for cpu 0 on socket 0
(XEN) Adding cpu 0 to runqueue 0
(XEN)  First cpu on runqueue, activating
...
(XEN) CPU 1 APIC 1 -> Node 0
(XEN) csched_vcpu_insert: Inserting d32767v1
(XEN) csched_alloc_pdata for cpu 1 on socket 0
(XEN) csched_alloc_pdata: cpu 1 not online yet, deferring initialization
(XEN) Booting processor 1/1 eip 8e000
(XEN) Initializing CPU#1
(XEN) CPU: L1 I cache 64K (64 bytes/line), D cache 64K (64 bytes/line)
(XEN) CPU: L2 Cache: 512K (64 bytes/line)
(XEN) CPU 1(4) -> Processor 0, Core 1
(XEN) CPU1: AMD Quad-Core AMD Opteron(tm) Processor 2376 stepping 02
(XEN) csched_cpu_starting on cpu 1
(XEN) Adding cpu 1 to runqueue 0
...
(XEN) CPU 5 APIC 5 -> Node 1
(XEN) microcode: CPU4 collect_cpu_info: patch_id=0x1000086
(XEN) csched_vcpu_insert: Inserting d32767v5
(XEN) csched_alloc_pdata for cpu 5 on socket 0
(XEN) csched_alloc_pdata: cpu 5 not online yet, deferring initialization
(XEN) Booting processor 5/5 eip 8e000
(XEN) Initializing CPU#5
(XEN) CPU: L1 I cache 64K (64 bytes/line), D cache 64K (64 bytes/line)
(XEN) CPU: L2 Cache: 512K (64 bytes/line)
(XEN) CPU 5(4) -> Processor 1, Core 1
(XEN) CPU5: AMD Quad-Core AMD Opteron(tm) Processor 2376 stepping 02
(XEN) csched_cpu_starting on cpu 5
(XEN) Adding cpu 5 to runqueue 1
...

Now the question is: for fixing this, would it be preferable to do
something along these lines (i.e., removing the right side of the ||
and, in general, making csched_alloc_pdata() a pcpu-0-only thing)? Or,
perhaps, should I look into a way to properly initialize the cpu_data
array, so that cpu_to_socket() actually returns something '< 0' for
pcpus not yet onlined and identified (sketched below)?

The former is surely quicker, but I think I like the latter better
(provided it's doable). What do you think?
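
Just to show what I mean with the latter, continuing the toy sketch
from above (where exactly the invalidation should happen, and whether
anything else relies on that field being 0, is precisely what I'd
have to check):

  #define TOY_INVALID_SOCKET  (-1)

  /*
   * Sketch of the second option: mark the socket id invalid for
   * every pcpu at early boot, before any of them is identified.
   * With this, csched_alloc_pdata()'s "cpu_to_socket(cpu) >= 0"
   * check really would mean "already onlined and identified".
   */
  static void toy_preinit_topology(void)
  {
      for ( int cpu = 0; cpu < 8; cpu++ )
          cpu_data[cpu].phys_proc_id = TOY_INVALID_SOCKET;
  }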

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
