Re: [Xen-devel] [PATCH] Accurate vcpu weighting for credit scheduler
Hi, Naoki

Ask Emmanuel and George first, since I am not maintaining the scheduler.

By the way, I would like to see the xentrace data for the original case.
(Adding vcpu priority and credit to the trace output would be helpful.)
Your problem looks like vcpu priority mis-handling somewhere.

Thanks
Atsushi SAKAI

NISHIGUCHI Naoki <nisiguti@xxxxxxxxxxxxxx> wrote:
> Hi, Atsushi
>
> After applying my patches, I ran a similar test.
> The CPU% shows the following:
> dom0  25
> dom1  25
> dom2  50
> dom3 100
>
> What do you think of my patches?
>
> Regards,
> Naoki Nishiguchi
>
> Atsushi SAKAI wrote:
> > Hi, George
> >
> > Sorry for the delay.
> >
> > With this type of change, the CPU% shows the following:
> > dom1 26
> > dom2 26
> > dom3 51
> > dom4 96
> >
> > Thanks
> > Atsushi SAKAI
> >
> > "George Dunlap" <George.Dunlap@xxxxxxxxxxxxx> wrote:
> >
> >> OK, I've grueled through an example by hand and think I see what's
> >> going on.
> >>
> >> So the idea of the credit scheduler is that we have a certain number
> >> of "credits" per accounting period, and each of these credits
> >> represents a certain amount of time. The scheduler gives out credits
> >> according to weight, so theoretically each accounting period, if all
> >> vcpus are active, each should consume all of its credits. Based on
> >> that assumption, if a vcpu has run and accumulated more than one full
> >> accounting period of credits, it's probably idle and we can leave it
> >> be.
> >>
> >> The problem in this situation isn't so much rounding errors as
> >> *scheduling granularity*. In the example given:
> >>
> >> d1: weight 128
> >> d2: weight 128
> >> d3: weight 256
> >> d4: weight 512
> >>
> >> If each domain has 2 vcpus, and there are 2 cores, then the credits
> >> will be divided thus:
> >>
> >> d1: 37 credits / vcpu
> >> d2: 37 credits / vcpu
> >> d3: 75 credits / vcpu
> >> d4: 150 credits / vcpu
> >>
> >> But scheduling and accounting only happen every "tick", and every
> >> "tick" is 100 credits. So each vcpu of d{1,2}, instead of consuming
> >> 37 credits, consumes 100; the same goes for each vcpu of d3. At the
> >> end of the first accounting period, d{1,2,3} have gotten to run for
> >> 100 credits' worth of time, but d4 hasn't gotten to run at all.
> >>
> >> In short, the fact that we have a 100-credit scheduling granularity
> >> breaks the assumption that every VM has had a chance to run each
> >> accounting period when there are really long runqueues.
> >>
> >> I can think of a couple of solutions: the simplest one might be to
> >> sort the runqueue by number of credits -- at least every accounting
> >> period. In that case, d4 would always get to run every accounting
> >> period; d{1,2} might not run for a given accounting period, but the
> >> next time they would have twice the number of credits, &c.
> >>
> >> Others might include extending accounting periods when we have long
> >> runqueues, or applying the credit limit during accounting only if the
> >> vcpu is not on the runqueue (Sakai-san's idea) *combined* with a check
> >> when the vcpu blocks. That would catch vcpus that are only moderately
> >> active, but just happen to be on the runqueue for several accounting
> >> periods in a row.
> >>
> >> Sakai-san, would you be willing to try to implement a simple "runqueue
> >> sort" patch, and see if it also solves your scheduling issue?
> >>
> >> -George
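The 37/75/150 per-vcpu split George quotes follows directly from dividing one accounting period's worth of credits by weight. The following minimal C program is a sketch of that arithmetic only; the constants (100 credits per tick, 3 ticks per accounting period, 2 pcpus) are assumptions chosen to match the example, not the actual code in xen/common/sched_credit.c.

/*
 * Sketch of the per-vcpu credit split described above.
 * Assumed constants to match the example; illustration only.
 */
#include <stdio.h>

#define CREDITS_PER_TICK  100
#define TICKS_PER_ACCT      3
#define NR_PCPUS            2

struct dom {
    const char *name;
    int weight;
    int nr_vcpus;
};

int main(void)
{
    struct dom doms[] = {
        { "d1", 128, 2 }, { "d2", 128, 2 },
        { "d3", 256, 2 }, { "d4", 512, 2 },
    };
    int total_weight = 0, credit_total, i;

    for ( i = 0; i < 4; i++ )
        total_weight += doms[i].weight;

    /* Credits handed out across all pcpus each accounting period: 600. */
    credit_total = CREDITS_PER_TICK * TICKS_PER_ACCT * NR_PCPUS;

    for ( i = 0; i < 4; i++ )
    {
        int credit_dom  = credit_total * doms[i].weight / total_weight;
        int credit_vcpu = credit_dom / doms[i].nr_vcpus;
        printf("%s: %d credits per vcpu\n", doms[i].name, credit_vcpu);
    }

    /*
     * Prints 37, 37, 75, 150.  A vcpu that runs for a whole tick is
     * debited 100 credits, so d1-d3 overrun their share while d4 may
     * not run at all in that period -- the granularity problem above.
     */
    return 0;
}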
> >> On Wed, Dec 10, 2008 at 2:45 AM, Atsushi SAKAI <sakaia@xxxxxxxxxxxxxx>
> >> wrote:
> >>> Hi, Emmanuel
> >>>
> >>> 1) Rounding error for credit
> >>>
> >>> The effect this patch addresses is larger than the rounding error,
> >>> so I think that effect does not need to be considered.
> >>> If you think it does, would you suggest your patch to me?
> >>> It seems changing CSCHED_TICKS_PER_ACCT is not enough.
> >>>
> >>> 2) Effect on I/O-intensive jobs
> >>>
> >>> I did not change the code for BOOST priority.
> >>> I only changed the "credit reset" condition.
> >>> It should have no effect on I/O-intensive workloads (but I have not
> >>> measured it). If needed, I will test it.
> >>> Which test is best for this change?
> >>> (A simple I/O test is not enough for this case; I think a complex
> >>> domain I/O configuration is needed to prove the effect of this patch.)
> >>>
> >>> 3) vcpu allocation measurement
> >>>
> >>> At first I used
> >>>   http://weather.ou.edu/~apw/projects/stress/
> >>>   stress --cpu xx --timeout xx --verbose
> >>> then I used a simple test (since each domain has 2 vcpus):
> >>>   yes > /dev/null &
> >>>   yes > /dev/null &
> >>> Now I have tested with the suggested method, and the result is:
> >>>
> >>>        original   w/ patch
> >>> dom1   27         25
> >>> dom2   27         25
> >>> dom3   53         50
> >>> dom4   91         98
> >>>
> >>> Thanks
> >>> Atsushi SAKAI
> >>>
> >>> Emmanuel Ackaouy <ackaouy@xxxxxxxxx> wrote:
> >>>
> >>>> On Dec 9, 2008, at 2:25, George Dunlap wrote:
> >>>>> On Tue, Dec 9, 2008 at 7:33 AM, Atsushi SAKAI
> >>>>> <sakaia@xxxxxxxxxxxxxx> wrote:
> >>>>>> You mean it should get rid of "credit reset"?
> >>>>> Yes, that's exactly what I was thinking. Removing the check for vcpus
> >>>>> on the runqueue may actually be functionally equivalent to removing
> >>>>> the check altogether.
> >>>> Essentially, this code is there as a safeguard against rounding errors
> >>>> and other oddball cases. In theory, a runnable VCPU should seldom
> >>>> accumulate more than one time slice's worth of credits.
> >>>>
> >>>> The problem with your change is that a VCPU that is not a spinner
> >>>> but instead runs and sleeps may not be removed from the accounting
> >>>> list when it should be, because it will not always be running when
> >>>> accounting happens and the check in question is performed. Potentially
> >>>> this will do very bad things for VCPUs that are I/O intensive or
> >>>> otherwise yield or sleep for a short time before consuming a full
> >>>> time slice.
> >>>>
> >>>> One thing that may help here is to make the credit calculations less
> >>>> prone to rounding errors. One thing I had wanted to do while at
> >>>> XenSource but never got around to was to change the arithmetic
> >>>> so that instead of 30 credits representing a time slice, we would
> >>>> make this a much bigger number.
> >>>>
> >>>> In this case, for example, you would get credit allocations with less
> >>>> significant rounding errors if you used 30000 instead of 30 credits
> >>>> per time slice:
> >>>>
> >>>> dom1 vcpu0,1 w128 credit  3750
> >>>> dom2 vcpu0,1 w128 credit  3750
> >>>> dom3 vcpu0,1 w256 credit  7500
> >>>> dom4 vcpu0,1 w512 credit 15000
> >>>>
> >>>> I suspect this would get rid of a large number of cases such as the
> >>>> one you are reporting, where a runnable VCPU's credit exceeds one
> >>>> entire time slice. This type of change would improve accuracy and not
> >>>> screw up credit computation for I/O intensive and other non-spinning
> >>>> domains.
> >>>>
> >>>> What do you think?
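To see how much the larger denomination helps, the same weight split can be computed with both 30 and 30000 credits per time slice. The sketch below assumes the two-pcpu, two-vcpu-per-domain setup and the weights from the example, with plain integer arithmetic; it illustrates the rounding behaviour Emmanuel describes, not the actual scheduler code.

/*
 * Rounding comparison: the same weight split with 30 versus 30000
 * credits per time slice.  Setup and weights taken from the example;
 * illustration only.
 */
#include <stdio.h>

static void split(int credits_per_tslice)
{
    int weights[4] = { 128, 128, 256, 512 };
    int total_weight = 1024, nr_pcpus = 2, nr_vcpus = 2, i;
    int total = credits_per_tslice * nr_pcpus;

    printf("-- %d credits per time slice --\n", credits_per_tslice);
    for ( i = 0; i < 4; i++ )
        printf("dom%d vcpu credit: %d\n", i + 1,
               total * weights[i] / total_weight / nr_vcpus);
}

int main(void)
{
    split(30);     /* 3, 3, 7, 15: four credits per period lost to truncation */
    split(30000);  /* 3750, 3750, 7500, 15000: matches the figures above */
    return 0;
}

Scaling the denomination does not change the weight ratios; it only makes the integer truncation happen at a much finer granularity, which is the accuracy improvement being suggested.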
> >>>> Also please confirm that your VCPUs are indeed doing simple
> >>>> "while(1);" loops.
> >>>>
> >>>> Cheers,
> >>>> Emmanuel.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel