
Re: [Xen-devel] Performance evaluation and Questions: Eliminating Xen (RTDS) scheduler overhead on dedicated CPU



Hey, guys,

I know, I know, I'm soooo late to the party! Sorry, I got trapped
into a thing that I really needed to finish... :-/

I've got no intention to resurrect this old thread, I just wanted to
point out a few things.

On Wed, 2015-04-08 at 16:52 -0400, Meng Xu wrote:
> 2015-04-08 5:13 GMT-04:00 George Dunlap <george.dunlap@xxxxxxxxxxxxx>:
> > On 04/07/2015 09:25 PM, Meng Xu wrote:
> >> Hi George, Dario and Konrad,
> >>
> >> I finished a prototype of the RTDS scheduler with the dedicated CPU
> >> feature and did some quick evaluation on this feature. Right now, I
> >> need to refactor the code (because it is kind of messy when I was
> >> exploring different approaches :() and will send out the clean patch
> >> later (this week or next week). But the design follows our discussion
> >> at 
> >> http://lists.xenproject.org/archives/html/xen-devel/2015-03/msg02854.html.
> >>
The idea of a 'dedicated CPU' makes sense. It's also always been quite
common, in the Linux community, to see it as a real-time oriented
feature. I personally don't agree much, as real-time is about
determinism, and dedicating a CPU to a task (in our case, that would mean
dedicating a pCPU to a vCPU and then, in the guest, that vCPU to a task)
does not automatically give you determinism.

Sure, it cuts off some overhead and some sources of unpredictable
behavior (e.g., scheduler code), but not all of them (what about, for
instance, caches shared with non-isolated pCPUs?). No, IMO, if you want
determinism, you should make the code deterministic, not get rid of
it! :-D

In fact, Linux has a feature similar to the one Meng investigated, and it
has traditionally been used (at least until I was involved with Linux
scheduling) by HPC people, database engines and high-frequency trading
use cases (which are also often categorized as 'real-time workloads' but
just aren't, IMO).

It's called isolcpus. For sure there was a boot-time parameter for it,
and it looks like it is still there:
http://wiki.linuxcnc.org/cgi-bin/wiki.pl?The_Isolcpus_Boot_Parameter_And_GRUB2
http://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re46.html
http://lxr.linux.no/linux+v3.19.1/Documentation/kernel-parameters.txt#L1530

I'm not sure whether they grew interfaces to set this up at runtime, but
I doubt it.
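Just as an illustration, this is roughly how one uses it at boot time; a
minimal sketch, assuming a GRUB2 system (the CPU numbers and the task
binary name are made-up examples):

```shell
# In /etc/default/grub, add isolcpus to the kernel command line so that
# CPUs 2 and 3 are excluded from the general scheduler's load balancing:
GRUB_CMDLINE_LINUX_DEFAULT="quiet isolcpus=2,3"

# Then regenerate the GRUB configuration and reboot:
#   sudo update-grub && sudo reboot

# After boot, nothing runs on the isolated CPUs unless explicitly
# pinned there, e.g. with taskset (my_realtime_task is hypothetical):
taskset -c 2 ./my_realtime_task
```

The point being: isolation is opt-in per task, which is why something
comparable on the Xen side would have to operate at vCPU granularity.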

For us, I'm not sure whether something like that would be useful. To be
fruitfully used together with something similar to Linux's isolcpus, it
would need to work the way Meng is doing it, i.e., it ought to be possible
to handle single vCPUs, not full domains. However...

> > Secondly, adding an entirely new interface, as implementing the
> > "dedicated cpu" would require, on the other hand, is a fairly
> > significant cost.  It's costly for users to learn and configure the new
> > interface, it's costly to document, and once it's there we have to
> > continue to support it perhaps for a long time to come; and the feature
> > itself is also fairly complicated and increases the code maintenance.
> >
> > So the performance improvement you've shown so far I think is nowhere
> > near high enough a benefit to outweigh this cost.
> 
... I agree with George on this...

> OK. I see and agree.
> 
... and I'm happy you also do! :-D

> > And in any case, as you say, it looks like the source of the overhead is
> > the very frequent invocation of the RTDS scheduler.
>
Exactly! I'd put it this way: there are more urgent and more useful
optimizations, in general, but especially in RTDS, to be done before
thinking about something like this.

>   You could probably
> > get the same kinds of benefits without adding any new interfaces by
> > reducing the amount of time the scheduler gets invoked when there are no
> > other tasks to run on that cpu.
> 
Exactly. And again, that is particularly relevant to RTDS, as the numbers
show. Looking again at the Linux world, this (i.e., avoiding invoking the
scheduler when there is only one task on a CPU) is also something
they've introduced rather recently.

It's called full dynticks:
https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt
http://ertl.jp/~shinpei/conf/ospert13/slides/FredericWeisbecker.pdf
https://lwn.net/Articles/549580/
http://thread.gmane.org/gmane.linux.kernel/1485210 [*]

[*] check out Linus' replies... "awesome" as usual, he even managed to
rant about virtualization, all by himself!! :-P
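For completeness, full dynticks is also enabled via boot parameters (it
requires a kernel built with CONFIG_NO_HZ_FULL; the CPU ranges below are
just example values):

```shell
# Kernel command line fragment enabling full dynticks on CPUs 1-3: the
# periodic tick is stopped on those CPUs whenever they run one task.
# rcu_nocbs offloads RCU callbacks away from the same CPUs, and is
# usually combined with nohz_full.
GRUB_CMDLINE_LINUX_DEFAULT="quiet nohz_full=1-3 rcu_nocbs=1-3"

# After boot, the CPUs running in adaptive-tick mode can be checked via:
cat /sys/devices/system/cpu/nohz_full
```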

That is IMO a line of action that may deserve some investigation.
RTDS-wise, for sure... the Credit schedulers are not at all bad from that
perspective (as your numbers also show), but it might be possible to do
better.

> Yes. This is what Dagaen (cc.ed) is doing right now. He had an RFC
> patch and sent it to me last week. We are working on refining the patch
> before sending it out to the mailing list.
> 
I'll be super glad to see this! :-D

> >
> > What I was expecting you to test, for the RTDS scheduler, was the
> > wake-up latency.  Have you looked at that at all?
> 
Indeed, that would be really interesting.

> Ah, I didn't realize this... Do you have any concrete evaluation plan for 
> this?
> In my mind, I can issue hypercalls in domU to wake-up and sleep a vcpu
> and measure how long it takes to wake up a vcpu. Maybe you have some
> better idea in mind?
> (The wake-up latency of a vcpu will depend on the priority of the
> vcpu and how heavily loaded the system is, in my speculation.)
> 
Yes, that is something that could (should?) be done, as the wakeup
latency of a vcpu is a lower bound for the wakeup latency of in-guest
workloads, so we really want to know where we stand with respect to that,
whether we need to improve things, and if yes, how.

It's priority- and load-dependent... yes, of course, but that's what we
have real-time schedulers for, isn't it? :-P Jokes apart, for the actual
'lower bound', we're clearly interested in measuring a vcpu when running
alone on a pCPU, or with top priority.

On the other hand, to look at wakeup latency from within the guest,
cyclictest is the way to go:
https://rt.wiki.kernel.org/index.php/Cyclictest

What we want is to run it inside a guest, under different host and guest
load conditions (and using different schedulers, varying the scheduling
parameters, etc), and see what happens... Ever looked at that? I think
it would be interesting.
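A minimal sketch of such a run inside the guest (the thread count,
priority, interval, and loop count below are just example values, and
stress-ng is only one possible load generator):

```shell
# Run cyclictest in the guest: one measurement thread per CPU (-t),
# SCHED_FIFO priority 80 (-p), 1 ms wakeup interval (-i, in usec),
# and a fixed number of loops (-l) so the run terminates. --quiet
# prints only the final min/avg/max latency summary per thread.
sudo cyclictest -t $(nproc) -p 80 -i 1000 -l 100000 --quiet

# Optionally generate CPU load in the guest at the same time, e.g.:
#   stress-ng --cpu $(nproc) --timeout 120s
# and repeat under different Xen schedulers and scheduling parameters
# to compare the reported wakeup latencies.
```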

I've done something similar while preparing this talk:
https://archive.fosdem.org/2014/schedule/event/virtiaas16/

But I never got the chance to repeat the experiments (nor did I do any
further reasoning or investigation about how the timestamps are
obtained, TSC emulation, etc., as George pointed out).

That's all... Sorry again for chiming in only now. :-(

Regards,
Dario


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

