
Re: [Xen-devel] Performance evaluation and Questions: Eliminating Xen (RTDS) scheduler overhead on dedicated CPU



Hi Dario,

2015-04-23 8:48 GMT-04:00 Dario Faggioli <dario.faggioli@xxxxxxxxxx>:
> Hey, guys,
>
> I know, I know, I'm soooo much late to the party! Sorry, I got trapped
> into a thing that I really needed to finish... :-/
>
> I've got no intention to resurrect this old thread, just wanted to
> point out a few things.

What you pointed out is very interesting! I will have a look at those
points and then post the numbers from the measurements.
(I will also have to redo some experiments to see how the TSC may
affect the results, which I hadn't realized before. :-( )
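
(To be concrete about the TSC concern: a raw timestamp read on x86
looks like the minimal sketch below; the helper name is mine, just for
illustration. Under Xen's TSC emulation (vtsc), rdtsc traps into the
hypervisor, so timestamps taken this way can be skewed and much more
expensive than on bare metal.)

/* Illustrative sketch: reading the raw TSC on x86. Under vtsc
 * this instruction is intercepted and emulated by Xen. */
#include <stdint.h>

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}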

>
> On Wed, 2015-04-08 at 16:52 -0400, Meng Xu wrote:
>> 2015-04-08 5:13 GMT-04:00 George Dunlap <george.dunlap@xxxxxxxxxxxxx>:
>> > On 04/07/2015 09:25 PM, Meng Xu wrote:
>> >> Hi George, Dario and Konrad,
>> >>
>> >> I finished a prototype of the RTDS scheduler with the dedicated CPU
>> >> feature and did some quick evaluation on this feature. Right now, I
>> >> need to refactor the code (because it is kind of messy when I was
>> >> exploring different approaches :() and will send out the clean patch
>> >> later (this week or next week). But the design follows our discussion
>> >> at 
>> >> http://lists.xenproject.org/archives/html/xen-devel/2015-03/msg02854.html.
>> >>
> The idea of 'dedicated CPU' makes sense. It's also always been quite
> common, in the Linux community, to see it as a real-time oriented
> feature. I personally don't agree much, as real-time is about
> determinism and dedicating a CPU to a task (in our case, that would mean
> dedicating a pCPU to a vCPU and then, in the guest, that vCPU to a task)
> does not automatically give you determinism.
>
> Sure, it cuts off some overhead and some sources of unpredictable
> behavior (e.g., scheduler code), but not all of them (what about, for
> instance, caches shared with non-isolated pCPUs?).

I'm working on eliminating shared-cache interference among guest
domains on non-isolated pCPUs. I have a prototype that can statically
partition the LLC into equal-size partitions and assign them to guest
domains based on each domain's configuration file. I'm running some
evaluations to show the effectiveness of the cache-partitioning
approach (via page coloring) and will send you guys a slide deck about
this soon. :-)
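
To make the page-coloring idea concrete, here is a minimal sketch of
the arithmetic (the cache geometry below is an assumed example, not my
actual platform): the number of colors is sets * line_size /
page_size, and machine frames with the same color compete for the same
LLC sets, so giving each domain a disjoint range of colors partitions
the cache.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT  12                 /* 4 KiB pages */
#define LLC_SIZE    (8UL << 20)        /* assumed: 8 MiB LLC */
#define LLC_WAYS    16                 /* assumed: 16-way set assoc. */
#define LINE_SIZE   64                 /* assumed: 64 B cache lines */

#define LLC_SETS    (LLC_SIZE / (LLC_WAYS * LINE_SIZE))
#define NUM_COLORS  ((LLC_SETS * LINE_SIZE) >> PAGE_SHIFT)  /* = 128 here */

/* Frames with the same color index the same LLC sets, so assigning
 * each domain a disjoint color range isolates its cache footprint. */
static unsigned int page_color(uint64_t mfn)
{
    return (unsigned int)(mfn % NUM_COLORS);
}

int main(void)
{
    printf("%lu colors; mfn 0x1234 -> color %u\n",
           (unsigned long)NUM_COLORS, page_color(0x1234));
    return 0;
}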

>> > And in any case, as you say, it looks like the source of the overhead is
>> > the very frequent invocation of the RTDS scheduler.
>>
> Exactly! I'd put it this way: there are more urgent and more useful
> optimizations, in general, but especially in RTDS, to be done before
> thinking about something like this.

Right.

>> > What I was expecting you to test, for the RTDS scheduler, was the
>> > wake-up latency.  Have you looked at that at all?
>>
> Indeed, that would be really interesting.
>
>> Ah, I didn't realize this... Do you have any concrete evaluation plan for 
>> this?
>> In my mind, I could issue hypercalls in domU to wake up and sleep a vcpu
>> and measure how long it takes to wake up a vcpu. Maybe you have some
>> better idea in mind?
>> (The wake-up latency of a vcpu will depend on the priority of the
>> vcpu and on how heavily loaded the system is, I would speculate.)
>>
> Yes, that is something that could (should?) be done, as the wakeup
> latency of a vcpu is a lower bound for the wakeup latency of in-guest
> workloads, so we really want to know where we stand wrt that, whether
> we need to improve things, and if yes, how.
>
> It's priority- and load-dependent... yes, of course, but that's what
> we have real-time schedulers for, isn't it? :-P Jokes apart, for the
> actual 'lower bound', we're clearly interested in measuring a vcpu
> running alone on a pCPU, or with top priority.
>
> On the other hand, to look at wakeup latency from within the guest,
> cyclictest is the way to go:
> https://rt.wiki.kernel.org/index.php/Cyclictest
>
> What we want is to run it inside a guest, under different host and guest
> load conditions (and using different schedulers, varying the scheduling
> parameters, etc), and see what happens... Ever looked at that? I think
> it would be interesting.

I will have a look at cyclictest and set up some evaluations.
I totally agree that wakeup latency (of the highest-priority vcpu) is
very important for real-time applications. I will start a new thread
with the cyclictest results once I have them.
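
For reference, my understanding is that cyclictest's core loop boils
down to the sketch below (a simplified illustration, not the actual
source): arm a periodic absolute-time sleep and record how late each
wakeup really is. A typical in-guest invocation would be something
like 'cyclictest -m -p 80 -i 1000 -l 100000' (mlockall, SCHED_FIFO
priority 80, 1 ms interval, 100000 loops).

#include <stdio.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L
#define INTERVAL_NS  1000000L          /* 1 ms period */

int main(void)
{
    struct timespec next, now;
    long lat, max_lat = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < 100000; i++) {
        /* advance the absolute deadline by one period */
        next.tv_nsec += INTERVAL_NS;
        if (next.tv_nsec >= NSEC_PER_SEC) {
            next.tv_nsec -= NSEC_PER_SEC;
            next.tv_sec++;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);
        /* how late did we actually wake up? */
        lat = (now.tv_sec - next.tv_sec) * NSEC_PER_SEC
            + (now.tv_nsec - next.tv_nsec);
        if (lat > max_lat)
            max_lat = lat;
    }
    printf("max wakeup latency: %ld ns\n", max_lat);
    return 0;
}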

>
> I've done something similar while preparing this talk:
> https://archive.fosdem.org/2014/schedule/event/virtiaas16/
>
> But I never got the chance to repeat the experiments (nor did I do
> any further reasoning or investigation into how the timestamps are
> obtained, TSC emulation, etc., as George pointed out).
>
> That's all... Sorry again for chiming in only now. :-(

No problem at all! :-) Your advice and information are very useful!

Thanks and best regards,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
