
[Xen-devel] Design and Question: Eliminate Xen (RTDS) scheduler overhead on dedicated CPU



Hi Dario and George,

I'm exploring the design choice of eliminating the Xen scheduler overhead on a dedicated CPU. A dedicated CPU is a PCPU that has a full-capacity VCPU pinned onto it and on which no other VCPUs will ever run; in other words, once a full-capacity VCPU is dedicated to a dedicated CPU, other VCPUs will never be scheduled onto it. Because the dedicated CPU only ever runs the full-capacity VCPU pinned to it, the scheduler never needs to be invoked on that CPU. With the current RTDS scheduler implementation, eliminating the scheduler on a dedicated CPU saves the scheduler overhead (i.e., 1000 - 2000 cycles) roughly once per 1ms on that CPU.
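
(For scale: assuming a 2GHz core, which is an assumption made only for this back-of-the-envelope estimate, 1ms is 2,000,000 cycles, so 1000 - 2000 cycles per 1ms is only about 0.05% - 0.1% of the CPU; the saving matters for latency, as described next, rather than for throughput.)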

This dedicated CPU feature could be useful for extremely low-latency applications in domU, where a few microseconds matter. Because the dedicated VCPU is "always available" on its dedicated CPU (the scheduler overhead is eliminated), processes inside domU that run on the dedicated VCPU avoid the scheduling latency and become more responsive.

The dedicated CPU feature is called the Exclusive Affinity feature in VMware's vSphere. I watched a presentation from VMworld 2013, "Extreme Performance Series: Network Speed Ahead" (https://www.youtube.com/watch?v=I-D1a0QaZaU), which discusses this Exclusive Affinity feature:
  At 34', they list the I/O latency introduced by the hypervisor scheduler.
  At 35', they introduce the Exclusive Affinity feature, which dedicates a full-capacity VCPU to a PCPU so that the scheduler overhead and the context-switch overhead are eliminated.
  At 39'56'', they discuss the side effects of the Exclusive Affinity feature.

What I want to do is implement the Exclusive Affinity feature in Xen (which I call the dedicated CPU feature) and measure how much scheduler overhead we can save by using it.

[Design]
I added a per_cpu field, cpu_d_status, that has four statuses:
SCHED_CPU_D_STATUS_DISABLED: the cpu is a non-dedicated CPU; the scheduler should be invoked on this cpu;
SCHED_CPU_D_STATUS_INIT: the cpu has been set as a dedicated CPU by the user, but we haven't migrated the dedicated VCPU to this cpu yet;
SCHED_CPU_D_STATUS_ENABLED: the cpu has been set as a dedicated CPU and now has the dedicated VCPU running on it; the scheduler should never be invoked on this cpu once it is in this status;
SCHED_CPU_D_STATUS_RESTORE: the cpu has been changed from a dedicated CPU back to a non-dedicated CPU by the user; we need to do some housekeeping work (e.g., update the parameters of the dedicated VCPU and re-arm the timers) before marking it as a non-dedicated CPU.
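
For concreteness, the per-CPU state looks roughly like this (a sketch; the status names are exactly the ones above, but where the declaration lives and its surroundings may differ in the actual patch):

  /* Sketch of the per-cpu dedicated-CPU state. */
  enum sched_cpu_d_status {
      SCHED_CPU_D_STATUS_DISABLED, /* normal cpu; scheduler runs here */
      SCHED_CPU_D_STATUS_INIT,     /* marked dedicated; vcpu not migrated yet */
      SCHED_CPU_D_STATUS_ENABLED,  /* dedicated vcpu running; scheduler off */
      SCHED_CPU_D_STATUS_RESTORE,  /* being restored to a normal cpu */
  };
  static DEFINE_PER_CPU(enum sched_cpu_d_status, cpu_d_status);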

I added two hypercalls to add/remove a dedicated CPU:
Hypercall XEN_DOMCTL_SCHEDOP_add_dedvcpu pins a dedicated VCPU to a dedicated PCPU;
Hypercall XEN_DOMCTL_SCHEDOP_remove_dedvcpu restores the dedicated PCPU back to a non-dedicated PCPU.
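
At the interface level the addition is small; something like the following in xen/include/public/domctl.h (a sketch: the op values and the struct layout are illustrative, not the actual numbering in my patch):

  /* Sketch; op values illustrative (putinfo/getinfo already use 0 and 1). */
  #define XEN_DOMCTL_SCHEDOP_add_dedvcpu    2
  #define XEN_DOMCTL_SCHEDOP_remove_dedvcpu 3

  struct xen_domctl_sched_dedvcpu {
      uint32_t vcpuid; /* which vcpu of the domain to dedicate */
      uint32_t pcpu;   /* the pcpu it is pinned to */
  };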

When the hypercall XEN_DOMCTL_SCHEDOP_add_dedvcpu is called, it will:
Step 1) Mark the cpu_d_status on the dedicated CPU as SCHED_CPU_D_STATUS_INIT; and, if the dedicated VCPU is not on the dedicated CPU right now, migrate it to the corresponding dedicated CPU.
Step 2) Exclude the dedicated CPU from the scheduling decisions made on other cpus. In other words, the RTDS scheduler in sched_rt.c will never raise SCHEDULE_SOFTIRQ to the dedicated CPU.
Step 3) After the dedicated VCPU is running on the dedicated CPU, mark the dedicated CPU's cpu_d_status as SCHED_CPU_D_STATUS_ENABLED, and kill the following timers (sd is the schedule_data on the pcpu and v is the dedicated vcpu on that pcpu) so that the scheduler won't be triggered by timers (the whole path is sketched after the timer list below):
  kill_timer(&sd->s_timer);
  kill_timer(&v->periodic_timer);
  kill_timer(&v->singleshot_timer);
  kill_timer(&v->poll_timer);
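
Putting steps 1-3 together, the enable path is roughly the following (a sketch: rt_dedcpu_enable() is just my name for it and the step-1 migration is elided, but schedule_data, kill_timer() and the timer fields are the real ones used above):

  static void rt_dedcpu_enable(struct vcpu *v, unsigned int cpu)
  {
      struct schedule_data *sd = &per_cpu(schedule_data, cpu);

      /* Step 1: mark INIT and make sure v is running on cpu. */
      per_cpu(cpu_d_status, cpu) = SCHED_CPU_D_STATUS_INIT;
      /* ... migrate v to cpu if it is not already there (elided) ... */

      /* Step 2 lives in sched_rt.c: before tickling a cpu, the scheduler
       * checks cpu_d_status and never raises SCHEDULE_SOFTIRQ for a cpu
       * in the ENABLED state. */

      /* Step 3: v now runs on cpu; stop every timer that could re-enter
       * the scheduler on this cpu. */
      per_cpu(cpu_d_status, cpu) = SCHED_CPU_D_STATUS_ENABLED;
      kill_timer(&sd->s_timer);
      kill_timer(&v->periodic_timer);
      kill_timer(&v->singleshot_timer);
      kill_timer(&v->poll_timer);
  }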

When the hypercall XEN_DOMCTL_SCHEDOP_remove_dedvcpu is called, I just do the reverse operation: re-initialize the timers on the pcpu and vcpu and raise SCHEDULE_SOFTIRQ to invoke the scheduler on that pcpu, roughly as sketched below.
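
(A sketch of this reverse path; the helper name is again mine, init_timer() and cpu_raise_softirq() are existing primitives, and the callbacks are the ones schedule.c normally registers:)

  static void rt_dedcpu_disable(struct vcpu *v, unsigned int cpu)
  {
      struct schedule_data *sd = &per_cpu(schedule_data, cpu);

      per_cpu(cpu_d_status, cpu) = SCHED_CPU_D_STATUS_RESTORE;

      /* Re-arm the timers killed on the enable path. */
      init_timer(&sd->s_timer, s_timer_fn, NULL, cpu);
      init_timer(&v->periodic_timer, vcpu_periodic_timer_fn, v, v->processor);
      init_timer(&v->singleshot_timer, vcpu_singleshot_timer_fn, v,
                 v->processor);
      init_timer(&v->poll_timer, poll_timer_fn, v, v->processor);

      /* Back to a normal cpu: let the scheduler run again. */
      per_cpu(cpu_d_status, cpu) = SCHED_CPU_D_STATUS_DISABLED;
      cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
  }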

[Problems]
The issue I'm encountering is as follows:
After I implemented the dedicated CPU feature, I compared the latency of a cpu-intensive task in domU on a dedicated CPU (denoted as R_dedcpu) with the latency on a non-dedicated CPU (denoted as R_nodedcpu). The expected result is R_dedcpu < R_nodedcpu, since we avoid the scheduler overhead. However, the actual result is R_dedcpu > R_nodedcpu, with R_dedcpu - R_nodedcpu ~= 1000 cycles.
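
(One way to take such cycle-level measurements, shown only to make "latency" concrete: a busy loop that records the largest rdtsc gap between iterations, i.e., time stolen from the task. The actual benchmark may differ, and rdtsc access from the guest is assumed:)

  /* Illustration only: record the worst rdtsc gap seen by a busy loop. */
  #include <stdint.h>
  #include <stdio.h>

  static inline uint64_t rdtsc(void)
  {
      uint32_t lo, hi;
      __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
      return ((uint64_t)hi << 32) | lo;
  }

  int main(void)
  {
      uint64_t prev = rdtsc(), worst = 0;

      for ( ; ; )
      {
          uint64_t now = rdtsc();
          if ( now - prev > worst )
          {
              worst = now - prev; /* gap = time the task was not running */
              printf("worst gap: %llu cycles\n", (unsigned long long)worst);
          }
          prev = now;
      }
  }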

After adding some tracing to every function that may raise SCHEDULE_SOFTIRQ, I found:
When the cpu is not marked as a dedicated cpu and the scheduler on it is not disabled, vcpu_block() is triggered 2896 times during 58,280,322,928ns (i.e., once every 20,124,421ns on average) on that cpu. However,
when I disable the scheduler on a dedicated cpu, vcpu_block(void) @schedule.c is triggered very frequently: 644,824 times during 8,918,636,761ns (i.e., once every 13,831ns on average) on the dedicated cpu.

To sum up the problem I'm facing: vcpu_block(void) is triggered much sooner and much more frequently when the scheduler is disabled on a cpu than when the scheduler is enabled.

[My question]
I'm very confused about why vcpu_block(void) is triggered so frequently when the scheduler is disabled. vcpu_block(void) is called by the SCHEDOP_block hypercall, but why would this hypercall be triggered so frequently?
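
For reference, here is vcpu_block() as I read it (slightly abridged from xen/common/schedule.c; please correct me if this doesn't match the current tree):

  void vcpu_block(void)
  {
      struct vcpu *v = current;

      set_bit(_VPF_blocked, &v->pause_flags);

      /* Check for events /after/ blocking: avoids wakeup waiting race. */
      if ( local_events_need_delivery() )
          clear_bit(_VPF_blocked, &v->pause_flags);
      else
          raise_softirq(SCHEDULE_SOFTIRQ);
  }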

It would be great if you know the answer directly. (This is just pure hope; I cannot really expect it. :-) )
But I would really appreciate it if you could give me some directions on how to figure it out. I grepped for vcpu_block(void) and SCHEDOP_block in the Xen code base, but didn't find many calls to them.

What confuses me most is that the dedicated VCPU should be blocked less frequently, not more frequently, when the scheduler is disabled on the dedicated CPU, because the dedicated VCPU is now always running on that CPU without interference from the hypervisor scheduler.

(I'm not sure if it's a good idea to attach the patch at the end of this email. It may just make this email too long and hard to read. Please let me know if you need it and I will send it in a separate email.)

Thank you very much for your advice, time and help!

Best regards,

Meng
-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania