Re: [Xen-devel] Poor HVM performance with 8 vcpus
Gianluca Guida wrote:
> Ah, those good old OOS talks. I fear I am going to fail in my attempt
> to be laconic. :-)
>
> On Wed, Oct 14, 2009 at 10:35 AM, Keir Fraser <keir.fraser@xxxxxxxxxxxxx> wrote:
>> On 14/10/2009 09:16, "Juergen Gross" <juergen.gross@xxxxxxxxxxxxxx> wrote:
>>
>>> as the performance of BS2000 seems to be hit by OOS optimization, I'm
>>> thinking of making a patch to disable this feature by a domain parameter.
>>>
>>> Is there a way to do this without having to change all places where the
>>> #if statements are placed?
>>> I think there should be some central routines where adding an "if" could
>>> be enough (setting oos_active to 0 seems not to be enough, I fear).
>>>
>>> Do you have any hint?
>> How about disabling it for domains with more than four VCPUs? Have you
>> measured performance with OOS for 1-4 VCPU guests? This is perhaps not
>> something that needs to be baked into guest configs.
>
> In general, shadow code loses performance as the number of vcpus
> increases (>=4) because of the single shadow lock (and getting rid of
> the shadow lock, i.e. having per-vcpu shadows, wouldn't help, since it
> would make the most common operation, that is removing writable access
> to guest pages, much slower).
> But the two algorithms (always in-sync vs. OOS) will show their
> performance penalties in two different areas: in a scenario where
> guests do a lot of PTE writes (read Windows in most of its operations)
> the in-sync approach will be more penalizing, because emulation is
> slow and needs the shadow lock, while scenarios where guests tend to
> have many dirty CR3 switches (that is CR3 switches with freshly
> written PTEs, as in the case of Juergen's benchmark and the famous
> Windows parallel ddk build) will be penalized more by the OOS
> algorithm.
>
> Disabling OOS for domains with more than 4 vcpus might be a good idea,
> but not necessarily optimal. Furthermore, I always understood that a
> good practice for VM performance is to have many small VMs instead of a
> VM eating all of the host's CPUs, at least when shadow code is on. With
> big VMs, EPT/NPT has always been the best approach, since even with
> a lot of TLB misses, the system was definitely lock-free for most of
> the VM's life.
>
> Creating a per-domain switch should be a good idea, but a more generic
> (and correct) approach would be to have a dynamic policy for OOSing
> pages, in which we would stop putting pages OOS when we realize that
> we are resyncing too many pages in CR3 switches. This was taken into
> consideration during the development of the OOS, but it was finally
> discarded because performance was decent and big VMs were not in the
> interest range.
>
> Yes, definitely away from spartan wit. But I hope this clarifies the issue.

It really does.
I think I'll start with a per-domain switch and leave the generic approach to
the specialists. ;-)
If, however, Keir rejects such a switch, I could try the generic solution, but
I think this solution would need a great deal of work to find the correct
parameters.


Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 636 47950
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Otto-Hahn-Ring 6                        Internet: ts.fujitsu.com
D-81739 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
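
For illustration only, here is a rough sketch of how the per-domain switch and the dynamic policy Gianluca describes in this thread might fit together. It is not Xen code: struct oos_policy, the oos_policy_*() functions and the OOS_RESYNC_LIMIT value are all hypothetical names and numbers invented for this example. The idea is to keep a moving average of pages resynced per CR3 switch and to stop putting pages out of sync while that average is above a threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy state, one per domain. Not a Xen structure. */
struct oos_policy {
    bool         enabled;         /* per-domain switch (e.g. from guest config) */
    unsigned int avg_resync_x16;  /* moving average of pages resynced per
                                     CR3 switch, fixed point (value * 16) */
};

/* Assumed threshold (pages per CR3 switch); a real value would need tuning. */
#define OOS_RESYNC_LIMIT 8u

void oos_policy_init(struct oos_policy *p, bool enabled)
{
    assert(p != NULL);
    p->enabled = enabled;
    p->avg_resync_x16 = 0;
}

/* Call once per CR3 switch with the number of pages that had to be resynced. */
void oos_policy_note_cr3_switch(struct oos_policy *p, unsigned int resynced)
{
    assert(p != NULL);
    /* Exponential moving average with weight 1/4:
     * avg = avg - avg/4 + sample/4, with sample scaled by 16. */
    p->avg_resync_x16 = p->avg_resync_x16
                        - p->avg_resync_x16 / 4
                        + resynced * 4u;
}

/* May the shadow code let another page go out of sync right now? */
bool oos_policy_may_unsync(const struct oos_policy *p)
{
    assert(p != NULL);
    if (!p->enabled)
        return false;   /* static per-domain switch wins */
    return p->avg_resync_x16 <= OOS_RESYNC_LIMIT * 16u;
}
```

The static switch is checked first, so a domain configured without OOS never unsyncs pages regardless of the average. The hard part mentioned in the reply remains open here: choosing the threshold and the averaging weight would need measurements on real workloads.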