Re: [Xen-devel] Poor HVM performance with 8 vcpus
Ah, those good old OOS talks. I fear I am going to fail in my attempt to be laconic.

On Wed, Oct 14, 2009 at 10:35 AM, Keir Fraser <keir.fraser@xxxxxxxxxxxxx> wrote:
> On 14/10/2009 09:16, "Juergen Gross" <juergen.gross@xxxxxxxxxxxxxx> wrote:
>
>> as the performance of BS2000 seems to be hit by OOS optimization, I'm
>> thinking of making a patch to disable this feature by a domain parameter.
>>
>> Is there a way to do this without having to change all places where the
>> #if statements are placed?
>> I think there should be some central routines where adding an "if" could
>> be enough (setting oos_active to 0 seems not to be enough, I fear).
>>
>> Do you have any hint?
>
> How about disabling it for domains with more than four VCPUs? Have you
> measured performance with OOS for 1-4 VCPU guests? This is perhaps not
> something that needs to be baked into guest configs.

In general, shadow code loses performance as the number of vcpus increases
(>= 4) because of the single shadow lock. Getting rid of the shadow lock,
i.e. having per-vcpu shadows, wouldn't help, since it would make the most
common operation, removing writable access to guest pages, much slower.

But the two algorithms (always in-sync vs. OOS) show their performance
penalties in two different areas. In a scenario where guests do a lot of PTE
writes (read: Windows in most of its operations), the in-sync approach is
penalized more, because emulation is slow and needs the shadow lock.
Scenarios where guests tend to have many dirty CR3 switches (that is, CR3
switches with freshly written PTEs, as in Juergen's benchmark and the famous
Windows parallel DDK build) are penalized more by the OOS algorithm.

Disabling OOS for domains with more than 4 vcpus might be a good idea, but it
is not necessarily optimal. Furthermore, I always understood that a good
practice for VM performance is to have many small VMs instead of one VM
eating all of the host's CPUs, at least when shadow code is on.
With big VMs, EPT/NPT has always been the best approach, since even with a
lot of TLB misses, the system is lock-free for most of the VM's life.

Creating a per-domain switch should be a good idea, but a more generic (and
correct) approach would be to have a dynamic policy for OOSing pages, in
which we would stop putting pages out of sync when we realize that we are
resyncing too many pages on CR3 switches. This was taken into consideration
during the development of OOS, but it was eventually discarded because
performance was decent and big VMs were outside the range of interest.

Yes, definitely away from spartan wit. But I hope this clarifies the issue.

Thanks,
Gianluca

-- 
It was a type of people I did not know, I found them very strange and
they did not inspire confidence at all. Later I learned that I had been
introduced to electronic engineers.
                                                  E. W. Dijkstra

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
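
A minimal sketch of the dynamic policy described in the message above, assuming a per-domain moving average of the number of pages resynced per CR3 switch. All names here (`struct oos_policy`, `oos_policy_on_cr3_switch`, the two thresholds) are hypothetical and do not exist in the Xen shadow code. The hysteresis between the two thresholds is also an assumption, added only so that the policy does not flap on and off.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-domain state; none of these names exist in Xen. */
struct oos_policy {
    uint32_t avg_x16;        /* moving average of resynced pages, scaled by 16 */
    uint32_t disable_above;  /* stop OOSing when the average exceeds this */
    uint32_t enable_below;   /* resume OOSing when the average drops below this */
    bool     oos_allowed;    /* current decision */
};

static void oos_policy_init(struct oos_policy *p,
                            uint32_t disable_above, uint32_t enable_below)
{
    p->avg_x16 = 0;
    p->disable_above = disable_above;
    p->enable_below = enable_below;
    p->oos_allowed = true;
}

/* Assumed to be called on every guest CR3 switch with the number of
 * out-of-sync pages that had to be resynced for that switch. */
static void oos_policy_on_cr3_switch(struct oos_policy *p,
                                     uint32_t resynced_pages)
{
    uint32_t sample_x16 = resynced_pages * 16;

    /* Exponential moving average with weight 1/4 for the new sample. */
    p->avg_x16 = p->avg_x16 - p->avg_x16 / 4 + sample_x16 / 4;

    if (p->oos_allowed && p->avg_x16 > p->disable_above * 16)
        p->oos_allowed = false;   /* too many resyncs: keep pages in sync */
    else if (!p->oos_allowed && p->avg_x16 < p->enable_below * 16)
        p->oos_allowed = true;    /* resync pressure is gone: allow OOS again */
}

/* Assumed to be consulted before letting a shadowed page go out of sync. */
static bool oos_policy_should_unsync(const struct oos_policy *p)
{
    return p->oos_allowed;
}
```

In this sketch the shadow code would report the resync count at each CR3 switch and check `oos_policy_should_unsync()` before unsyncing a page. The threshold values and the averaging weight are placeholders that would have to be measured against real workloads.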