
Re: [Xen-devel] Priority for SMP VMs


  • To: "George Dunlap" <George.Dunlap@xxxxxxxxxxxxx>
  • From: "Gabriel Southern" <gsouther@xxxxxxx>
  • Date: Wed, 30 Jul 2008 22:40:40 -0400
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxx, Mark Williamson <mark.williamson@xxxxxxxxxxxx>
  • Delivery-date: Wed, 30 Jul 2008 19:41:02 -0700
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

George,

I'll be interested to hear what your thoughts are when you get a
chance to look at this.  I'd also be interested in looking at the tool
you mentioned for doing some more in-depth analysis.

-Gabriel

On Thu, Jul 24, 2008 at 11:20 AM, George Dunlap
<George.Dunlap@xxxxxxxxxxxxx> wrote:
> Those are certainly unexpected results. :-)  Hmm... I'll take a quick look
> sometime this week or next.  I have some internal patches that add tracing
> to runstate changes, and an internal tool that's not really ready for
> release yet that can do all sorts of fun analysis... let me know if you want
> the patches.  (I'll probably try to get the patches in after the freeze is
> up.)
>
>  -George
>
> Gabriel Southern wrote:
>>
>> Hi George,
>>
>> Thanks for your comments.  I understand that the scheduler has to
>> balance many different kinds of VM activity and I am only testing one
>> very limited aspect of it.  I tried running the test you suggested
>> using just "while(1) ;" loops and making sure I had enough threads
>> running so that each VM could use all the CPU time it had available.
>> The CPU time allocation was basically the same as what I described
>> earlier:
>>
>> 1-VCPU VM: 12.28%
>> 2-VCPU VM: 9.26%
>> 3-VCPU VM: 11.55%
>> 4-VCPU VM: 12.79%
>> 5-VCPU VM: 13.32%
>> 6-VCPU VM: 13.50%
>> 7-VCPU VM: 13.60%
>> 8-VCPU VM: 13.71%
>>
>> I also tried running a test with 8 VMs where 7 VMs had 8 VCPUs and 1
>> VM had 1 VCPU.  Each VM was running 8 threads of the "while (1) ;" loops
>> to make sure it was trying to use all the CPU time it could get.  In
>> this case each of the 8-VCPU VMs received around 12.96% of CPU time
>> and the 1-VCPU VM received 9.27%.
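>>
>> For reference, the spinner each VM was running was roughly equivalent
>> to the following (a minimal pthreads sketch; the thread count is
>> passed on the command line and matched to the VM's VCPU count):
>>
>>     /* spin.c -- build with: gcc -pthread spin.c -o spin */
>>     #include <pthread.h>
>>     #include <stdlib.h>
>>
>>     static void *spin(void *arg)
>>     {
>>         (void)arg;
>>         for (;;)
>>             ;                   /* never blocks, never yields */
>>         return NULL;
>>     }
>>
>>     int main(int argc, char **argv)
>>     {
>>         int i, nthreads = (argc > 1) ? atoi(argv[1]) : 1;
>>         pthread_t *t = malloc(nthreads * sizeof(*t));
>>
>>         for (i = 0; i < nthreads; i++)
>>             pthread_create(&t[i], NULL, spin, NULL);
>>         for (i = 0; i < nthreads; i++)
>>             pthread_join(t[i], NULL);   /* never returns */
>>         return 0;
>>     }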
>>
>> I have a basic idea of how the credit scheduler works, but not a
>> good enough understanding of the details to explain this behavior.  I'm
>> guessing it has to do with the VMs that have more VCPUs getting extra
>> opportunities to run simply because they have more entries in the
>> runq.
>>
>> I'd be curious if anyone else is able to verify the behavior I've
>> described.  Also, if anyone with a better understanding of the credit
>> scheduler has an idea of why I'm observing this behavior, I'd be
>> interested to hear that as well.  Obviously I don't think this
>> is a high priority problem, but it might be something that is useful
>> to be aware of.  I also admit that I could be observing this behavior
>> due to some sort of user error on my part, rather than there being any
>> problem with the credit scheduler.
>>
>> Thanks,
>>
>> Gabriel
>>
>>
>> On Tue, Jul 22, 2008 at 7:07 AM, George Dunlap
>> <George.Dunlap@xxxxxxxxxxxxx> wrote:
>>>
>>> Hey Gabriel,
>>>
>>> Remember that the goal of the scheduler isn't to enforce strict
>>> equality of cpu time, but to divide cpu time according to the weight
>>> while maximizing physical cpu usage (and thus total system
>>> throughput).  After a VM has used its allocated cpu time, it can still
>>> get "spare" cpu cycles in a "best-effort" manner, if no VMs with
>>> allocated cpu time left are currently running.  This "best-effort" is
>>> divided equally among vcpus, so a domain with more vcpus will
>>> naturally get more of this "extra" time than a domain with fewer.
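>>>
>>> As a purely illustrative back-of-the-envelope model (not the actual
>>> credit scheduler code, and with a made-up slack figure): each domain
>>> gets a weight-proportional slice, and whatever is left over is split
>>> per runnable vcpu, so the best-effort portion grows with vcpu count.
>>>
>>>     /* toy_share.c -- hypothetical model, numbers for illustration */
>>>     #include <stdio.h>
>>>
>>>     int main(void)
>>>     {
>>>         int vcpus[8] = { 1, 2, 3, 4, 5, 6, 7, 8 }; /* one domain each */
>>>         double weight = 256.0, total_weight = 8 * 256.0;
>>>         double slack = 10.0;   /* assume 10% of host time is "extra" */
>>>         int i, total_vcpus = 0;
>>>
>>>         for (i = 0; i < 8; i++)
>>>             total_vcpus += vcpus[i];
>>>
>>>         for (i = 0; i < 8; i++) {
>>>             double entitled = (100.0 - slack) * weight / total_weight;
>>>             double extra    = slack * vcpus[i] / total_vcpus;
>>>             printf("%d vcpus: %.2f%% entitled + %.2f%% best-effort\n",
>>>                    vcpus[i], entitled, extra);
>>>         }
>>>         return 0;
>>>     }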
>>>
>>> If I recall correctly, the SPECCPU suite uses real workloads, such as
>>> bzip, gcc, and others.  A lot of these workloads also include disk
>>> I/O, which may cause vcpus to block.  Blocking and waking of different
>>> vcpus and VMs is bound to cause some interesting interactions between
>>> VMs; for example, if a 1-vcpu and an 8-vcpu VM are running, and the
>>> 1-vcpu VM blocks, the 8-vcpu VM can use the extra processor time the
>>> 1-vcpu VM isn't using; however, if some of the 8-vcpu VM's vcpus
>>> block, the 1-vcpu VM can't use the extra cpu time; the cpus just sit
>>> idle.
>>>
>>> If you want a "clean" scheduler test, you should instead run "while(1)
>>> ;" loops, which will never block, and will always consume all cpu time
>>> available.  My guess is if you do that, then the cpu time given to
>>> each domain will be exactly according to their weight.  On the other
>>> hand, if you do a "kernbench" test, which will include a lot of
>>> blocking, I suspect you may get even more disparity between the
>>> runtimes.
>>>
>>>  -George
>>>
>>> On Tue, Jul 22, 2008 at 4:43 AM, Gabriel Southern <gsouther@xxxxxxx>
>>> wrote:
>>>>
>>>> Hi Mark,
>>>>
>>>> Thanks for the reply, I'll be interested to see if you have any
>>>> additional thoughts after I describe one of the tests that I have run.
>>>>
>>>> The system that I have been working with is a dual quad-core system so
>>>> it has eight logical processors.  Most of the tests that I have run
>>>> have been with 8 VMs executing simultaneously, with varying numbers
>>>> of VCPUs for each VM.  Most of the tests have been run with
>>>> various benchmarks from the SPEC CPU2006 suite.
>>>>
>>>> One test that does not use the SPEC benchmarks and is probably the
>>>> easiest to replicate is as follows:
>>>>
>>>> Eight VMs configured with varying numbers of VCPUs ranging from 1 to
>>>> 8.  Each VM executing a program with the same number of threads as it
>>>> has VCPUs (1 VCPU VM has 1 thread, 8 VCPU VM has 8 threads) where each
>>>> thread is running an infinite loop designed to use CPU time.  No cap
>>>> was set and each VM had a weight of 256.
>>>>
>>>> From what I understand about how the credit scheduler works I would
>>>> think in this case each VM would receive 12.5% of the total system CPU
>>>> time.  However, after running this test for a couple of hours the host
>>>> CPU time had been allocated as follows:
>>>>
>>>> 1-VCPU VM: 12.14%
>>>> 2-VCPU VM: 9.26%
>>>> 3-VCPU VM: 11.58%
>>>> 4-VCPU VM: 12.81%
>>>> 5-VCPU VM: 13.35%
>>>> 6-VCPU VM: 13.53%
>>>> 7-VCPU VM: 13.62%
>>>> 8-VCPU VM: 13.72%
>>>>
>>>> As you can see, the number of VCPUs changes the allocation of CPU so
>>>> that VMs with fewer VCPUs receive less CPU time than they should based
>>>> on the configured weight value.  I'm not sure why the 1-VCPU VM is
>>>> getting more CPU time in this test than the 2 and 3 VCPU VMs.  Overall
>>>> the trend that I have seen is that assigning more VCPUs to a VM
>>>> slightly increases that VM's priority on an overcommitted host,
>>>> although this test ended up with the 1-VCPU VM not following that
>>>> trend exactly.
>>>>
>>>> I'd be interested to hear any thoughts you have on these results;
>>>> either comments about my experimental setup or thoughts about why
>>>> the scheduling algorithm is exhibiting this behavior.
>>>>
>>>> Thanks,
>>>>
>>>> -Gabriel
>>>>
>>>> On Mon, Jul 21, 2008 at 5:00 PM, Mark Williamson
>>>> <mark.williamson@xxxxxxxxxxxx> wrote:
>>>>>
>>>>> Hi Gabriel,
>>>>>
>>>>> I'm not particularly familiar with the credit scheduler but I'll do
>>>>> my best to help clarify things a bit (I hope!).
>>>>>
>>>>> On Thursday 03 July 2008, Gabriel Southern wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm working on a project with SMP VMs and I noticed something about the
>>>>>> behavior of the credit scheduler that does not match my understanding
>>>>>> of the documentation about the credit scheduler.  It seems like
>>>>>> assigning more VCPUs to a VM increases the proportion of total system
>>>>>> CPU resources the VM will receive, whereas the documentation indicates
>>>>>> that this should be controlled by the weight value.
>>>>>>
>>>>>> For example, when running a CPU-intensive benchmark with some VMs
>>>>>> configured with 1-VCPU and other VMs configured with 8-VCPUs, the
>>>>>> benchmark took 37% longer to complete on the VMs with 1-VCPU than the
>>>>>> ones with 8-VCPUs.  Unfortunately I did not record the exact values
>>>>>> for CPU time that each VM received; however, I think that the 8-VCPU
>>>>>> VMs did receive around 30% more CPU time than the 1-VCPU VMs.  These
>>>>>> tests were performed with the default weight of 256 for all VMs and no
>>>>>> cap configured.
>>>>>
>>>>> You need to tell us a bit more about how you did your benchmarking...
>>>>> Were the SMP and UP guests running concurrently and competing for
>>>>> CPU time?  Or were they run separately?  Was the benchmark able to
>>>>> take advantage of multiple CPUs itself?
>>>>>
>>>>>> I don't think that this is the behavior that the scheduler should
>>>>>> exhibit based on the documentation I read.  I admit the tests I was
>>>>>> doing were not really practical use cases for real applications.  But
>>>>>> I'd be curious if anyone knows if this is a limitation of the design
>>>>>> of the credit scheduler, or possibly due to a configuration problem
>>>>>> with my system.  I'm running Xen 3.2.0 compiled from the official source
>>>>>> distribution tarball, and the guest VMs are also using the 3.2.0
>>>>>> distribution with the 2.6.18 kernel.  Any ideas anyone has about why
>>>>>> my system is behaving this way are appreciated.
>>>>>
>>>>> Without knowing more about your setup there are lots of things that
>>>>> could be happening...
>>>>>
>>>>> If you're not using caps then there's no reason why the SMP guests
>>>>> shouldn't get more CPU time if they're somehow able to consume more
>>>>> slack time in the system.  SMP scheduling makes things pretty
>>>>> complicated!
>>>>>
>>>>> If you reply with more details, I can try and offer my best guess as
>>>>> to what might be happening.  If you don't get a response within a day
>>>>> or two, please feel free to poke me directly.
>>>>>
>>>>> Cheers,
>>>>> Mark
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Gabriel
>>>>>>
>>>>>> _______________________________________________
>>>>>> Xen-devel mailing list
>>>>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>>>>> http://lists.xensource.com/xen-devel
>>>>>
>>>>>
>>>>> --
>>>>> Push Me Pull You - Distributed SCM tool
>>>>> (http://www.cl.cam.ac.uk/~maw48/pmpu/)
>>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@xxxxxxxxxxxxxxxxxxx
>>>> http://lists.xensource.com/xen-devel
>>>>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

