
Re: [Xen-devel] Priority for SMP VMs


  • To: "George Dunlap" <George.Dunlap@xxxxxxxxxxxxx>
  • From: "Gabriel Southern" <gsouther@xxxxxxx>
  • Date: Wed, 30 Jul 2008 22:40:40 -0400
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Mark Williamson <mark.williamson@xxxxxxxxxxxx>
  • Delivery-date: Wed, 30 Jul 2008 19:41:02 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

George,

I'll be interested to hear what your thoughts are when you get a
chance to look at this.  I'd also be interested in looking at the tool
you mentioned for doing some more in-depth analysis.

-Gabriel

On Thu, Jul 24, 2008 at 11:20 AM, George Dunlap
<George.Dunlap@xxxxxxxxxxxxx> wrote:
> Those are certainly unexpected results. :-)  Hmm... I'll take a quick look
> sometime this week or next.  I have some internal patches that add tracing
> to runstate changes, and an internal tool that's not really ready for
> release yet that can do all sorts of fun analysis... let me know if you want
> the patches.  (I'll probably try to get the patches in after the freeze is
> up.)
>
>  -George
>
> Gabriel Southern wrote:
>>
>> Hi George,
>>
>> Thanks for your comments.  I understand that the scheduler has to
>> balance many different kinds of VM activity and I am only testing one
>> very limited aspect of it.  I tried running the test you suggested
>> using just "while(1) ;" loops and making sure I had enough threads
>> running so that each VM could use all the CPU time it had available.
>> The CPU time allocation was basically the same as what I described
>> earlier:
>>
>> 1-VCPU VM: 12.28%
>> 2-VCPU VM: 9.26%
>> 3-VCPU VM: 11.55%
>> 4-VCPU VM: 12.79%
>> 5-VCPU VM: 13.32%
>> 6-VCPU VM: 13.50%
>> 7-VCPU VM: 13.60%
>> 8-VCPU VM: 13.71%
>>
>> I also tried running a test with 8 VMs where 7 VMs had 8 VCPUs and 1
>> VM had 1 VCPU.  Each VM was running 8 threads of the "while (1) ;" loops
>> to make sure it was trying to use all the CPU time it could get.  In
>> this case each of the 8-VCPU VMs received around 12.96% of CPU time
>> and the 1-VCPU VM received 9.27%.
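>>
>> For reference, the spinner each VM was running was roughly equivalent
>> to the following (a minimal pthreads sketch; the thread count is
>> passed on the command line and matched to the VM's VCPU count):
>>
>>     /* spin.c -- build with: gcc -pthread spin.c -o spin */
>>     #include <pthread.h>
>>     #include <stdlib.h>
>>
>>     static void *spin(void *arg)
>>     {
>>         (void)arg;
>>         for (;;)
>>             ;                   /* never blocks, never yields */
>>         return NULL;
>>     }
>>
>>     int main(int argc, char **argv)
>>     {
>>         int i, nthreads = (argc > 1) ? atoi(argv[1]) : 1;
>>         pthread_t *t = malloc(nthreads * sizeof(*t));
>>
>>         for (i = 0; i < nthreads; i++)
>>             pthread_create(&t[i], NULL, spin, NULL);
>>         for (i = 0; i < nthreads; i++)
>>             pthread_join(t[i], NULL);   /* never returns */
>>         return 0;
>>     }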
>>
>> I have a basic idea of how the credit scheduler works, but not a
>> good enough understanding of the details to explain this behavior.  I'm
>> guessing it has to do with the VMs that have more VCPUs getting extra
>> opportunities to run simply because they have more entries in the
>> runq.
>>
>> I'd be curious if anyone else is able to verify the behavior I've
>> described.  Also, if anyone with a better understanding of the credit
>> scheduler has an idea of why I'm observing this behavior, I'd be
>> interested to hear that as well.  Obviously I don't think this
>> is a high priority problem, but it might be something that is useful
>> to be aware of.  I also admit that I could be observing this behavior
>> due to some sort of user error on my part, rather than there being any
>> problem with the credit scheduler.
>>
>> Thanks,
>>
>> Gabriel
>>
>>
>> On Tue, Jul 22, 2008 at 7:07 AM, George Dunlap
>> <George.Dunlap@xxxxxxxxxxxxx> wrote:
>>>
>>> Hey Gabriel,
>>>
>>> Remember that the goal of the scheduler isn't to enforce strict
>>> equality of cpu time, but to divide cpu time according to the weight
>>> while maximizing physical cpu usage (and thus total system
>>> throughput).  After a VM has used its allocated cpu time, it can still
>>> get "spare" cpu cycles in a "best-effort" manner, if no VMs with
>>> allocated cpu time left are currently running.  This "best-effort" is
>>> divided equally among vcpus, so a domain with more vcpus will
>>> naturally get more of this "extra" time than a domain with fewer.
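>>>
>>> As a purely illustrative back-of-the-envelope model (not the actual
>>> credit scheduler code, and with a made-up slack figure): each domain
>>> gets a weight-proportional slice, and whatever is left over is split
>>> per runnable vcpu, so the best-effort portion grows with vcpu count.
>>>
>>>     /* toy_share.c -- hypothetical model, numbers for illustration */
>>>     #include <stdio.h>
>>>
>>>     int main(void)
>>>     {
>>>         int vcpus[8] = { 1, 2, 3, 4, 5, 6, 7, 8 }; /* one domain each */
>>>         double weight = 256.0, total_weight = 8 * 256.0;
>>>         double slack = 10.0;   /* assume 10% of host time is "extra" */
>>>         int i, total_vcpus = 0;
>>>
>>>         for (i = 0; i < 8; i++)
>>>             total_vcpus += vcpus[i];
>>>
>>>         for (i = 0; i < 8; i++) {
>>>             double entitled = (100.0 - slack) * weight / total_weight;
>>>             double extra    = slack * vcpus[i] / total_vcpus;
>>>             printf("%d vcpus: %.2f%% entitled + %.2f%% best-effort\n",
>>>                    vcpus[i], entitled, extra);
>>>         }
>>>         return 0;
>>>     }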
>>>
>>> If I recall correctly, the SPECCPU suite uses real workloads, such as
>>> bzip, gcc, and others.  A lot of these workloads also include disk
>>> I/O, which may cause vcpus to block.  Blocking and waking of different
>>> vcpus and VMs is bound to cause some interesting interactions between
>>> VMs; for example, if a 1-vcpu and an 8-vcpu VM are running, and the
>>> 1-vcpu VM blocks, the 8-vcpu VM can use the extra processor time the
>>> 1-vcpu VM isn't using; however, if some of the 8-vcpu VM's vcpus
>>> block, the 1-vcpu VM can't use the extra cpu time; the cpus just sit
>>> idle.
>>>
>>> If you want a "clean" scheduler test, you should instead run "while(1)
>>> ;" loops, which will never block, and will always consume all cpu time
>>> available.  My guess is if you do that, then the cpu time given to
>>> each domain will be exactly according to their weight.  On the other
>>> hand, if you do a "kernbench" test, which will include a lot of
>>> blocking, I suspect you may get even more disparity between the
>>> runtimes.
>>>
>>>  -George
>>>
>>> On Tue, Jul 22, 2008 at 4:43 AM, Gabriel Southern <gsouther@xxxxxxx>
>>> wrote:
>>>>
>>>> Hi Mark,
>>>>
>>>> Thanks for the reply, I'll be interested to see if you have any
>>>> additional thoughts after I describe one of the tests that I have run.
>>>>
>>>> The system that I have been working with is a dual quad-core system so
>>>> it has eight logical processors.  Most of the tests that I have run
>>>> have been with 8 VMs executing simultaneously, with varying numbers
>>>> of VCPUs for each VM.  Most of the tests have been run with
>>>> various benchmarks from the SPEC CPU2006 suite.
>>>>
>>>> One test that does not use the SPEC benchmarks and is probably the
>>>> easiest to replicate is as follows:
>>>>
>>>> Eight VMs configured with varying numbers of VCPUs ranging from 1 to
>>>> 8.  Each VM executing a program with the same number of threads as it
>>>> has VCPUs (1 VCPU VM has 1 thread, 8 VCPU VM has 8 threads) where each
>>>> thread is running an infinite loop designed to use CPU time.  No cap
>>>> was set and each VM had a weight of 256.
>>>>
>>>> From what I understand about how the credit scheduler works I would
>>>> think in this case each VM would receive 12.5% of the total system CPU
>>>> time.  However, after running this test for a couple of hours the host
>>>> CPU time had been allocated as follows:
>>>>
>>>> 1-VCPU VM: 12.14%
>>>> 2-VCPU VM: 9.26%
>>>> 3-VCPU VM: 11.58%
>>>> 4-VCPU VM: 12.81%
>>>> 5-VCPU VM: 13.35%
>>>> 6-VCPU VM: 13.53%
>>>> 7-VCPU VM: 13.62%
>>>> 8-VCPU VM: 13.72%
>>>>
>>>> As you can see, the number of VCPUs changes the allocation of CPU so
>>>> that VMs with fewer VCPUs receive less CPU time than they should based
>>>> on the configured weight value.  I'm not sure why the 1-VCPU VM is
>>>> getting more CPU time in this test than the 2 and 3 VCPU VMs.  Overall
>>>> the trend that I have seen is that assigning more VCPUs to a VM
>>>> slightly increases that VM's priority on an overcommitted host,
>>>> although this test ended up with the 1-VCPU VM not following that
>>>> trend exactly.
>>>>
>>>> I'd be interested to hear any thoughts you have on these results;
>>>> either comments about my experimental setup or thoughts about why
>>>> the scheduling algorithm is exhibiting this behavior.
>>>>
>>>> Thanks,
>>>>
>>>> -Gabriel
>>>>
>>>> On Mon, Jul 21, 2008 at 5:00 PM, Mark Williamson
>>>> <mark.williamson@xxxxxxxxxxxx> wrote:
>>>>>
>>>>> Hi Gabriel,
>>>>>
>>>>> I'm not particularly familiar with the credit scheduler but I'll do
>>>>> my best to help clarify things a bit (I hope!).
>>>>>
>>>>> On Thursday 03 July 2008, Gabriel Southern wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm working on a project with SMP VMs and I noticed something about the
>>>>>> behavior of the credit scheduler that does not match my understanding
>>>>>> of the documentation about the credit scheduler.  It seems like
>>>>>> assigning more VCPUs to a VM increases the proportion of total system
>>>>>> CPU resources the VM will receive, whereas the documentation indicates
>>>>>> that this should be controlled by the weight value.
>>>>>>
>>>>>> For example, when running a CPU-intensive benchmark with some VMs
>>>>>> configured with 1-VCPU and other VMs configured with 8-VCPUs, the
>>>>>> benchmark took 37% longer to complete on the VMs with 1-VCPU than the
>>>>>> ones with 8-VCPUs.  Unfortunately I did not record the exact values
>>>>>> for CPU time that each VM received; however, I think that the 8-VCPU
>>>>>> VMs did receive around 30% more CPU time than the 1-VCPU VMs.  These
>>>>>> tests were performed with the default weight of 256 for all VMs and no
>>>>>> cap configured.
>>>>>
>>>>> You need to tell us a bit more about how you did your benchmarking...
>>>>> Were the SMP and UP guests running concurrently and competing for
>>>>> CPU time?  Or were they run separately?  Was the benchmark able to
>>>>> take advantage of multiple CPUs itself?
>>>>>
>>>>>> I don't think that this is the behavior that the scheduler should
>>>>>> exhibit based on the documentation I read.  I admit the tests I was
>>>>>> doing were not really practical use cases for real applications.  But
>>>>>> I'd be curious if anyone knows if this is a limitation of the design
>>>>>> of the credit scheduler, or possibly due to a configuration problem
>>>>>> with my system.  I'm running Xen 3.2.0 compiled from the official source
>>>>>> distribution tarball, and the guest VMs are also using the 3.2.0
>>>>>> distribution with the 2.6.18 kernel.  Any ideas anyone has about why
>>>>>> my system is behaving this way are appreciated.
>>>>>
>>>>> Without knowing more about your setup there are lots of things that
>>>>> could be happening...
>>>>>
>>>>> If you're not using caps then there's no reason why the SMP guests
>>>>> shouldn't get more CPU time if they're somehow able to consume more
>>>>> slack time in the system.  SMP scheduling makes things pretty
>>>>> complicated!
>>>>>
>>>>> If you reply with more details, I can try and offer my best guess as
>>>>> to what might be happening.  If you don't get a response within a day
>>>>> or two, please feel free to poke me directly.
>>>>>
>>>>> Cheers,
>>>>> Mark
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Gabriel
>>>>>>
>>>>>> _______________________________________________
>>>>>> Xen-devel mailing list
>>>>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>>>>> http://lists.xensource.com/xen-devel
>>>>>
>>>>>
>>>>> --
>>>>> Push Me Pull You - Distributed SCM tool
>>>>> (http://www.cl.cam.ac.uk/~maw48/pmpu/)
>>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>>> http://lists.xensource.com/xen-devel
>>>>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

