RE: [Xen-devel] Scheduling anomaly with 4.0.0 (rc6)
One more followup on this: It appears that I AM seeing the "irreproducibility"
problem as well, even with vcpus=1 for all guests.  The range is a bit
smaller, more like 5%, though.

Sum of cpusec across all domains plus dom0 seems to be very reproducible
(within < 0.5%).  Total elapsed time for the whole workload is what is widely
varying.  So either I/O is taking much longer in some cases (despite an
identical workload on identical hardware); or the scheduler is selecting the
idle domain much more frequently; or... something else?

> -----Original Message-----
> From: Dan Magenheimer
> Sent: Tuesday, April 06, 2010 10:52 AM
> To: George Dunlap
> Cc: Xen-Devel (xen-devel@xxxxxxxxxxxxxxxxxxx)
> Subject: RE: [Xen-devel] Scheduling anomaly with 4.0.0 (rc6)
>
> Hi George --
>
> Thanks again for the reply.  Hope it's OK if I go back on-list... I'm
> hoping others may be able to reproduce this, as my ability to experiment
> is limited now (see below).
>
> > From: George Dunlap [mailto:George.Dunlap@xxxxxxxxxxxxx]
> >
> > 1) Make a large ramdisk in each VM, big enough for the whole kernel
> >    tree and binaries.  Do the build there, and see if you have the
> >    same discrepancy.
>
> My test domains have 384MB each.  Dom0 has 256MB.  (Total physical RAM
> is only 2GB.)  So this isn't really an option.
>
> > 2) Play with the dom0 io scheduler and see if it has an effect.  If
> >    your current one is "noop", that's suspicious; see if "cfq" works
> >    better.
>
> On dom0, /sys/block/sda/queue/scheduler shows [cfq].  Don't know if this
> matters, but /sys/block/tapdev*/queue/scheduler show [noop].
>
> > 3) Take a trace of just the scheduling events, using xentrace...
>
> I lost about a week of test runs that I'm working on for Xen Summit and
> have to re-do those before I do much experimenting, but I will try out
> some of your ideas when my (week of redo) test runs are done.  In the
> meantime, I'm still monitoring the test runs that I am running now.
> (I need a reliable set of non-tmem runs as a base to compare various
> tmem runs against.)
>
> I reported two problems, which we can call:
> 1) "racing ahead", where one of a pair of identical domains seems to
>    get a lot more cycles than the other
> 2) "irreproducibility", where two seemingly identical and heavily
>    overcommitted test runs have timing results that differ by an
>    unreasonable amount (6-7%)
>
> After reducing my test domains to a single vcpu, the "irreproducibility"
> problem seems to be greatly reduced.  I made three runs and they differ
> by <0.3%.  So as best I can tell, this problem requires multi-vcpu
> domains.  (Actually, I changed from "file" to "tap:aio" as well, so it
> could be that too.)
>
> However, with:
>
> a) vcpus=1 for the test domains (see previous post), and
> b) vcpus=1 for the test domains and dom0_max_vcpus=1
>
> I am still seeing the "racing ahead" problem.  On a current run of (b):
>
>  142s dom0
>  479s 64-bit #1
>  454s 64-bit #2 <--  6% less
>  536s 32-bit #1
>  447s 32-bit #2 <-- 16% less!
>
> Again, this is a transitory oddity that may shed some light... after
> completion of the workload, the runtimes are very similar, THOUGH #2
> seems to always be the slower of the two by a small amount (<0.5%).
>
> Thanks,
> Dan
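One way to watch this "racing ahead" effect while a run is underway is to
sample the cumulative CPU time that "xm list" reports for each domain and
look at the per-interval deltas.  A minimal sketch, run from dom0; it assumes
the classic "xm list" column layout (Name ID Mem VCPUs State Time(s)), and
the /tmp paths and 30-second interval are only placeholders:

    #!/bin/sh
    # Print how much cumulative CPU time each domain gained in the last
    # interval; a domain "racing ahead" shows a consistently larger delta
    # than its supposedly identical twin.
    PREV=/tmp/cpusec.prev
    CURR=/tmp/cpusec.curr
    xm list | awk 'NR > 1 { print $1, $NF }' | sort > "$PREV"
    while sleep 30; do
        xm list | awk 'NR > 1 { print $1, $NF }' | sort > "$CURR"
        echo "=== $(date) ==="
        # join on domain name; field 2 is the previous Time(s), field 3 the current
        join "$PREV" "$CURR" | awk '{ printf "%-20s +%.1fs\n", $1, $3 - $2 }'
        mv "$CURR" "$PREV"
    done

In a fair run, the two 64-bit domains (and the two 32-bit domains) should
advance at roughly the same rate throughout, not merely end up with similar
totals.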
> > -----Original Message-----
> > From: George Dunlap [mailto:George.Dunlap@xxxxxxxxxxxxx]
> > Sent: Tuesday, April 06, 2010 5:24 AM
> > To: Dan Magenheimer
> > Subject: Re: [Xen-devel] Scheduling anomaly with 4.0.0 (rc6)
> >
> > How much memory does each VM have?  Another possibility is that this
> > has to do with unfairness in the block driver servicing requests.
> > There are three ways you could test this hypothesis.
> >
> > 1) Make a large ramdisk in each VM, big enough for the whole kernel
> >    tree and binaries.  Do the build there, and see if you have the
> >    same discrepancy.
> >
> > 2) Play with the dom0 io scheduler and see if it has an effect.  If
> >    your current one is "noop", that's suspicious; see if "cfq" works
> >    better.
> >
> > 3) Take a trace of just the scheduling events, using xentrace, and use
> >    xenalyze to see how much time each vcpu is spending running,
> >    runnable, and blocked (waiting for the cpu).  If the scheduler is
> >    being unfair, then some vcpus will spend more time "runnable" than
> >    others.  If it's something else (the dom0 disk scheduler being
> >    unfair, or the vm just using different amounts of memory), then
> >    "runnable" will not be considerably higher.
> >
> > To do #3:
> >
> > # xentrace -D -e 0x28000 -S 32 /tmp/filename.trace
> >
> > Then download:
> > http://xenbits.xensource.com/ext/xenalyze.hg
> >
> > Make it, and run the following command:
> >
> > $ xenalyze -s --cpu-hz [speed-in-gigahertz]G filename.trace > filename.summary
> >
> > The summary file breaks information down by domain, then vcpu; look at
> > the "runstates" for each vcpu (running, runnable, blocked) and compare
> > them.
> >
> > -George
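For #3, the capture and summary steps can be wrapped in a small script run
from dom0 while the builds are in progress.  A minimal sketch using the
commands above; the 300-second window, the 2.4G clock speed, and the file
names are placeholders, and it assumes xenalyze has been built from the
repository above and sits in the current directory:

    #!/bin/sh
    # Trace only the scheduling events for a fixed window, then summarize
    # how long each vcpu spent running / runnable / blocked.
    TRACE=/tmp/sched.trace
    SUMMARY=/tmp/sched.summary
    xentrace -D -e 0x28000 -S 32 "$TRACE" &
    XTPID=$!
    sleep 300                  # capture window; adjust as needed
    kill "$XTPID"              # xentrace should flush its buffers and exit
    wait "$XTPID" 2>/dev/null
    ./xenalyze -s --cpu-hz 2.4G "$TRACE" > "$SUMMARY"
    # then compare the per-vcpu runstate breakdowns in the summary file

If one vcpu of a pair shows much more time "runnable" than the other, the
scheduler is the likely suspect; if not, the unfairness is probably coming
from somewhere else (e.g. the dom0 block layer), as described above.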
> > On Tue, Apr 6, 2010 at 12:17 AM, Dan Magenheimer
> > <dan.magenheimer@xxxxxxxxxx> wrote:
> > > For the record, I am seeing the same problem (the first one;
> > > haven't yet got multiple runs) with vcpus=1 for all domains.
> > > Only on 32-bit this time, and only 20%, but those may be random
> > > scheduling factors.  This is also with tap:aio instead of file,
> > > so as to eliminate dom0 page caching effects.
> > >
> > >  394s dom0
> > > 2265s 64-bit #1
> > > 2275s 64-bit #2
> > > 2912s 32-bit #1
> > > 2247s 32-bit #2 <-- 20% less!
> > >
> > > I'm going to try a dom0_max_vcpus=1 run next.
> > >
> > >> -----Original Message-----
> > >> From: Dan Magenheimer
> > >> Sent: Monday, April 05, 2010 2:18 PM
> > >> To: George Dunlap
> > >> Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> > >> Subject: RE: [Xen-devel] Scheduling anomaly with 4.0.0 (rc6)
> > >>
> > >> Thanks for the reply!
> > >>
> > >> Well, I'm now seeing something a little more alarming: running
> > >> an identical but CPU-overcommitted workload (just normal PV
> > >> domains, no tmem or ballooning or anything), what would you
> > >> expect the variance to be between successive identical measured
> > >> runs on identical hardware?
> > >>
> > >> I am seeing total runtimes, both measured by elapsed time and by
> > >> sum-of-CPUsec across all domains (incl. dom0), vary by 6-7% or
> > >> more.  This seems a bit unusual/excessive to me and makes it very
> > >> hard to measure improvements (e.g. by tmem, for an upcoming Xen
> > >> Summit presentation) or benchmark anything complex.
> > >>
> > >> > Is it possible that Linux is just favoring one vcpu over the
> > >> > other for some reason?  Did you try running the same test but
> > >> > with only one VM?
> > >>
> > >> Well, "make -j8" will likely be single-threaded part of the time,
> > >> but I wouldn't expect that to make that big a difference between
> > >> two identical workloads.
> > >>
> > >> I'm not sure I understand how I would run the same test with
> > >> only one VM, when the observation of the strangeness requires
> > >> two VMs (and even then must be observed at random points during
> > >> execution).
> > >>
> > >> > Another theory would be that most interrupts are delivered to
> > >> > vcpu 0, so it may end up in "boost" priority more often.
> > >>
> > >> Hmmm... I'm not sure I get that, but what about _physical_ cpu 0
> > >> for Xen?  If all physical cpus are not the same, and one VM has an
> > >> affinity for vcpu0-on-pcpu0 and the other has an affinity for
> > >> vcpu1-on-pcpu0, would that make a difference?
> > >>
> > >> But still, 40% seems very large and almost certainly a bug,
> > >> especially given the new observations above.
> > >>
> > >> > -----Original Message-----
> > >> > From: George Dunlap [mailto:George.Dunlap@xxxxxxxxxxxxx]
> > >> > Sent: Monday, April 05, 2010 8:44 AM
> > >> > To: Dan Magenheimer
> > >> > Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
> > >> > Subject: Re: [Xen-devel] Scheduling anomaly with 4.0.0 (rc6)
> > >> >
> > >> > Is it possible that Linux is just favoring one vcpu over the
> > >> > other for some reason?  Did you try running the same test but
> > >> > with only one VM?
> > >> >
> > >> > Another theory would be that most interrupts are delivered to
> > >> > vcpu 0, so it may end up in "boost" priority more often.
> > >> >
> > >> > I'll re-post the credit2 series shortly; Keir said he'd accept
> > >> > it post-4.0.  You could try it with that and see what the
> > >> > performance is like.
> > >> >
> > >> > -George
> > >> >
> > >> > On Fri, Apr 2, 2010 at 5:48 PM, Dan Magenheimer
> > >> > <dan.magenheimer@xxxxxxxxxx> wrote:
> > >> > > I've been running some heavy testing on a recent Xen 4.0
> > >> > > snapshot and seeing a strange scheduling anomaly that I
> > >> > > thought I should report.  I don't know if this is a
> > >> > > regression... I suspect not.
> > >> > >
> > >> > > The system is a Core 2 Duo (Conroe).  The load is four 2-VCPU
> > >> > > EL5u4 guests, two of which are 64-bit and two of which are
> > >> > > 32-bit.  Otherwise they are identical.  All four are running
> > >> > > a sequence of three Linux compiles with (make -j8 clean;
> > >> > > make -j8).  All are started approximately concurrently: I
> > >> > > synchronize the start of the test, after all domains are
> > >> > > launched, with an external NFS semaphore file that is checked
> > >> > > every 30 seconds.
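The start gate described here could look something like the sketch below,
run inside each guest.  The mount point, kernel source path, and log file
are placeholders, but the shape follows the description: poll an NFS
semaphore file every 30 seconds, then run the three compiles, recording a
timestamp at the start and after each one.

    #!/bin/sh
    # Block until the shared semaphore file appears, then run the workload.
    SEM=/mnt/nfs/start-run          # semaphore file on the shared NFS mount
    LOG=/root/timestamps
    while [ ! -e "$SEM" ]; do
        sleep 30                    # checked every 30 seconds
    done
    date >> "$LOG"                  # timestamp at the start
    for i in 1 2 3; do              # sequence of three kernel compiles
        ( cd /usr/src/linux && make -j8 clean && make -j8 )
        date >> "$LOG"              # timestamp after each compile
    done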
> > >> > >
> > >> > > What I am seeing is a rather large discrepancy in the amount
> > >> > > of time consumed "underway" by the four domains, as reported
> > >> > > by xentop and xm list.  I have seen this repeatedly, but the
> > >> > > numbers in front of me right now are:
> > >> > >
> > >> > > 1191s dom0
> > >> > > 3182s 64-bit #1
> > >> > > 2577s 64-bit #2 <-- 20% less!
> > >> > > 4316s 32-bit #1
> > >> > > 2667s 32-bit #2 <-- 40% less!
> > >> > >
> > >> > > Again, these are identical workloads, and the pairs are
> > >> > > identical released kernels running from identical "file"-based
> > >> > > virtual block devices containing released distros.  Much of my
> > >> > > testing had been with tmem and self-ballooning, so I had blamed
> > >> > > them for a while, but I have reproduced it multiple times with
> > >> > > both of those turned off.
> > >> > >
> > >> > > At the start and after each kernel compile, I record a
> > >> > > timestamp, so I know the same work is being done.  Eventually
> > >> > > the workload finishes on each domain and intentionally crashes
> > >> > > the kernel so measurement is stopped.  At the conclusion, the
> > >> > > 64-bit pair have very similar total CPU sec and the 32-bit pair
> > >> > > have very similar total CPU sec, so eventually (presumably when
> > >> > > the #1's are done hogging CPU) the "slower" domains do finish
> > >> > > the same amount of work.  As a result, it is hard to tell from
> > >> > > just the final results that the four domains are getting
> > >> > > scheduled at very different rates.
> > >> > >
> > >> > > Does this seem like a scheduler problem, or are there other
> > >> > > explanations?  Anybody care to try to reproduce it?
> > >> > > Unfortunately, I have to use the machine now for other work.
> > >> > >
> > >> > > P.S. According to xentop, there is almost no network activity,
> > >> > > so it is all CPU and VBD.  And the ratio of VBD activity looks
> > >> > > to be approximately the same as the ratio of CPU(sec).

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel