Re: [Xen-devel] Poor HVM performance with 8 vcpus
Juergen,

I think this problem is a good candidate for xentrace/xenalyze.  If you
take a 30-second trace (xentrace -D -e all -T 30 /tmp/[traceid].trace)
while the benchmark is at its heaviest, and then analyze it using
xenalyze (http://xenbits.xensource.com/ext/xenalyze.hg), it should show
whether the shadow performance is due to brute-force search or
something else.  If you're using 3.3, you'll have to apply the
back-patch to xenalyze to make it work properly.

If you post the summary output (xenalyze -s [traceid].trace >
[traceid].summary), I can help interpret it.

 -George

On Wed, Oct 7, 2009 at 10:40 AM, Juergen Gross
<juergen.gross@xxxxxxxxxxxxxx> wrote:
> Tim Deegan wrote:
>> At 09:08 +0100 on 07 Oct (1254906487), James Harper wrote:
>>>> At the very least it would be good to have a predictor which
>>>> figured out which of the several heuristics should actually be
>>>> used for a given VM.  A simple "try whichever one worked last
>>>> time first" should work fine.
>>>>
>>>> Even smarter would be to just have heuristics for the two general
>>>> classes of mapping (1:1 and recursive), and have the code
>>>> automatically figure out the starting virtual address being used
>>>> for a given guest.
>>>>
>>> Are there any other of these heuristics tucked away in xen?  Would
>>> there be any benefit to specifying the OS being virtualised in the
>>> config?  Eg "os=windows"?
>>
>> It would be better to allow the specific heuristic to be specified
>> in the Xen interface (e.g. that it's a recursive pagetable at a
>> particular address, or a one-to-one mapping).  Which isn't to say
>> the python layer couldn't put some syntactic sugar on it.
>>
>> But the bulk of the win will be had from adding BS2000 to the list
>> of heuristics.  There's probably some benefit in making the
>> heuristic list pull-to-front, too.
>>
>> Automatically detecting 1:1 mappings and linear pagetable schemes
>> would be fun and is probably the Right Thing[tm], but making sure
>> it works with all the OSes that currently work (e.g. all HALs of
>> all Windows versions) will be a significant investment in time. :)
>>
>> Also, before getting too stuck into this it'd be worth running once
>> more with performance counters enabled and checking that this is
>> actually your problem!  You should see a much higher number for
>> "shadow writeable brute-force" running BS2000 than running Windows.
>
> I still had the numbers for a test with 6 vcpus, which already showed
> severe performance degradation.  I edited the numbers a little bit to
> show only the counters for the cpus running BS2000 and no other
> domain.  The test ran for 60 seconds.
>
> calls to shadow_alloc                 438      427      424      480      436      422
> number of shadow pages in use        2765     2151     2386     2509     4885     1391
> calls to shadow_free                  168      132      185      144      181      105
> calls to shadow_fault               65271    69132    60495    53756    73363    52449
> shadow_fault fast path n/p           7347     8081     6713     6134     8521     6112
> shadow_fault fast path error           14       12       15        3       13       11
> shadow_fault really guest fault     24004    25723    22815    19709    27049    19190
> shadow_fault emulates a write        1045      949     1018      995     1015      901
> shadow_fault fast emulate             424      361      449      348      387      314
> shadow_fault fixed fault            32503    34264    29624    26689    36641    26096
> calls to shadow_validate_gl2e         875      748      917      731      795      667
> calls to shadow_validate_gl3e         481      456      443      491      489      446
> calls to shadow_validate_gl4e         104       97       95      112      105       95
> calls to shadow_hash_lookup       2109654  2203254  2228896  2245849  2164727  2309059
> shadow hash hit in bucket head    2012828  2111164  2161113  2177591  2104099  2242458
> shadow hash misses                    851      840      841      910      852      838
> calls to get_shadow_status        2110031  2202828  2228769  2246689  2164213  2309241
> calls to shadow_hash_insert           438      436      428      481      437      430
> calls to shadow_hash_delete           168      150      185      154      202      128
> shadow removes write access           335      324      329      385      330      336
> shadow writeable: linux high          130      139      152      155      138      149
> shadow writeable: sl1p              14508    15402    12961    11823    16474    11472
> shadow writeable brute-force          205      185      177      230      192      187
> shadow unshadows for fork/exit          9       12       12       12       18       12
> shadow unshadows a page                10       13       13       13       19       13
> shadow walks guest tables          647527   727336   649397   646601   659655   621289
> shadow checks gwalk                   526      544      535      550      614      554
> shadow flush tlb by rem wr perm       235      233      229      268      238      237
> shadow emulates invlpg              14688    15499    14604    12630    16627    11370
> shadow OOS fixup adds               14467    15335    13059    11840    16624    11339
> shadow OOS unsyncs                  14467    15335    13058    11840    16624    11339
> shadow OOS evictions                  566      449      565      369      589      336
> shadow OOS resyncs                  14510    15407    12964    11828    16478    11481
>
> I don't think the "shadow writeable brute-force" is the problem.
> get_shadow_status seems to be a more critical candidate.
>
>
> Juergen
>
> --
> Juergen Gross                  Principal Developer Operating Systems
> TSP ES&S SWE OS6               Telephone: +49 (0) 89 636 47950
> Fujitsu Technology Solutions   e-mail: juergen.gross@xxxxxxxxxxxxxx
> Otto-Hahn-Ring 6               Internet: ts.fujitsu.com
> D-81739 Muenchen               Company details: ts.fujitsu.com/imprint.html
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
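
As a rough, self-contained sketch of the per-VM predictor and the
pull-to-front heuristic list discussed above (the "try whichever one
worked last time first" idea), the following C program keeps a small
ordered list of guesses and moves whichever guess succeeds to the front.
It is only an illustration, not Xen's shadow code: every name in it
(struct vm_state, remove_write_access, the guess_* functions) is
invented for the example, and the heuristics are dummies standing in
for real guesses about where the guest maps its pagetables.

/*
 * Illustrative sketch only -- not Xen's actual shadow implementation.
 * Models a per-VM "pull-to-front" list of writeable-mapping heuristics.
 */
#include <stdbool.h>
#include <stdio.h>

struct vm_state;   /* stand-in for per-domain shadow state */

/* A heuristic guesses where the guest keeps a writeable mapping of a
 * pagetable page (linear pagetable, 1:1 map, ...).  These dummies just
 * "succeed" for different page numbers so the demo has something to show. */
static bool guess_linux_linear(struct vm_state *v, unsigned long gfn)
{ (void)v; return gfn % 3 == 0; }
static bool guess_windows_1to1(struct vm_state *v, unsigned long gfn)
{ (void)v; return gfn % 3 == 1; }
static bool guess_bs2000(struct vm_state *v, unsigned long gfn)
{ (void)v; return gfn % 3 == 2; }

struct heuristic {
    const char *name;
    bool (*guess)(struct vm_state *v, unsigned long gfn);
};

#define NR_HEURISTICS 3

struct vm_state {
    /* Per-VM try order; index 0 is whatever worked most recently. */
    struct heuristic order[NR_HEURISTICS];
};

/* Try the heuristics in the VM's current order; on a hit, pull the
 * winner to the front so the next fault tries it first. */
static bool remove_write_access(struct vm_state *v, unsigned long gfn)
{
    for (int i = 0; i < NR_HEURISTICS; i++) {
        if (v->order[i].guess(v, gfn)) {
            struct heuristic winner = v->order[i];
            for (int j = i; j > 0; j--)          /* pull-to-front */
                v->order[j] = v->order[j - 1];
            v->order[0] = winner;
            return true;
        }
    }
    return false;   /* caller would fall back to the brute-force search */
}

int main(void)
{
    struct vm_state v = { .order = {
        { "linux-linear", guess_linux_linear },
        { "windows-1:1",  guess_windows_1to1 },
        { "bs2000",       guess_bs2000 },
    } };
    unsigned long gfns[] = { 2, 5, 8, 1, 4 };    /* made-up faulting pages */

    for (unsigned i = 0; i < sizeof(gfns) / sizeof(gfns[0]); i++) {
        bool ok = remove_write_access(&v, gfns[i]);
        printf("gfn %lu: %s, front of list is now '%s'\n",
               gfns[i], ok ? "guessed" : "brute force", v.order[0].name);
    }
    return 0;
}

In a real implementation the list would live in per-domain state, so a
guest that always uses, say, a recursive pagetable at one address pays
for exactly one guess per fault once the first guess has hit.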
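
To relate the "calls to shadow_hash_lookup", "shadow hash hit in bucket
head" and "shadow hash misses" lines in the counter dump to a concrete
data structure, here is a minimal sketch of a chained hash with a
move-to-front bucket discipline and the same three counters.  This is
only an illustration under that assumption, not Xen's actual shadow
hash: the table size, hash function, workload and all names are made up.

/*
 * Illustrative chained hash with move-to-front buckets and counters
 * in the spirit of the shadow-hash perf counters quoted above.
 */
#include <stdio.h>
#include <stdlib.h>

#define NR_BUCKETS 251

struct entry {
    unsigned long key;              /* e.g. a guest frame number */
    unsigned long value;            /* e.g. which shadow page backs it */
    struct entry *next;
};

static struct entry *bucket[NR_BUCKETS];

/* lookups - head_hits - misses gives the "had to walk the chain" cases. */
static unsigned long lookups, head_hits, misses;

static unsigned hash(unsigned long key) { return key % NR_BUCKETS; }

static void insert(unsigned long key, unsigned long value)
{
    struct entry *e = malloc(sizeof(*e));
    unsigned b = hash(key);

    e->key = key;
    e->value = value;
    e->next = bucket[b];            /* new entries go at the bucket head */
    bucket[b] = e;
}

static struct entry *lookup(unsigned long key)
{
    unsigned b = hash(key);
    struct entry *e, *prev = NULL;

    lookups++;
    for (e = bucket[b]; e != NULL; prev = e, e = e->next) {
        if (e->key != key)
            continue;
        if (prev == NULL) {
            head_hits++;            /* "hit in bucket head" */
        } else {
            /* Move-to-front: the next lookup of this key hits the head. */
            prev->next = e->next;
            e->next = bucket[b];
            bucket[b] = e;
        }
        return e;
    }
    misses++;
    return NULL;
}

int main(void)
{
    for (unsigned long k = 0; k < 3000; k++)
        insert(k, k);

    /* A small hot set looked up over and over, as a shadow fault path
     * would do, plus the occasional miss. */
    for (unsigned long i = 0; i < 2000000; i++)
        lookup((i * 7) % 64 + (i % 10000 == 0 ? 100000 : 0));

    printf("lookups %lu  head hits %lu  misses %lu\n",
           lookups, head_hits, misses);
    return 0;
}

With a small hot set, almost every lookup lands on the bucket head after
the first pass, which is the pattern in the dump above (roughly 95% head
hits); the remaining cost is the sheer number of lookups, which is what
Juergen is pointing at with get_shadow_status.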