Re: [Xen-devel] Poor HVM performance with 8 vcpus
At 09:16 +0100 on 14 Oct (1255511785), Juergen Gross wrote:
> as the performance of BS2000 seems to be hit by OOS optimization, I'm
> thinking of making a patch to disable this feature by a domain parameter.
>
> Is there a way to do this without having to change all places where the
> #if statements are placed?
> I think there should be some central routines where adding an "if" could
> be enough (setting oos_active to 0 seems not to be enough, I fear).
>
> Do you have any hint?

The simplest way is to cause sh_unsync() to immediately return 0. That
won't be quite as fast as #defining it all away but will avoid the
expensive paths that cause lock contention. You can add your flag to
the big if statement that's already there to avoid unsafe cases.

Incidentally, although your benchmark does poorly on 8 VCPUs it might be
worth trying a less aggressively targeted benchmark -- we found on
Windows VMs that more realistic tests (e.g. Sysmark) still showed a
slight improvement from the OOS optimization at 8 vcpus.

Cheers,

Tim.

> Juergen Gross wrote:
> > Hi,
> >
> > Gianluca Guida wrote:
> >> Hi,
> >>
> >> On Wed, Oct 7, 2009 at 8:55 AM, Juergen Gross
> >> <juergen.gross@xxxxxxxxxxxxxx> wrote:
> >>> we've got massive performance problems running a 8 vcpu HVM-guest
> >>> (BS2000) under XEN (xen 3.3.1).
> >>>
> >>> With a specific benchmark producing a rather high load on memory
> >>> management operations (lots of process creation/deletion and memory
> >>> allocation) the 8 vcpu performance was worse than the 4 vcpu
> >>> performance. On other platforms (/390, MIPS, SPARC) this benchmark
> >>> scaled rather well with the number of cpus.
> >>>
> >>> The result of the usage of the software performance counters of XEN
> >>> seemed to point to the shadow lock being the reason. I modified the
> >>> Hypervisor to gather some lock statistics (patch will be sent soon)
> >>> and found that the shadow lock is really the bottleneck. On average
> >>> 4 vcpus are waiting to get the lock!
> >>>
> >>> Is this a known issue?
> >> Actually, I think so. The OOS optimization is widely known not to be
> >> too scalable at 8vcpus in the current state, since its weak point is
> >> the CR3 switching time increasing linearly with the number of cpus. If
> >> you have lot of processes switches together with lot of PTE writings
> >> (as it seems to be the case for your benchmark) then that's probably
> >> the cause.
> >>
> >> Could you try disabling the OOS optimization from the
> >> SHADOW_OPTIMIZATIONS definition?
> >
> > Great!
> > First performance data looks okay!
> > We will have to run different benchmarks in different configurations,
> > but I think you gave an excellent hint. :-)
> >
>
> --
> Juergen Gross                 Principal Developer Operating Systems
> TSP ES&S SWE OS6                       Telephone: +49 (0) 89 636 47950
> Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
> Otto-Hahn-Ring 6                        Internet: ts.fujitsu.com
> D-81739 Muenchen                 Company details: ts.fujitsu.com/imprint.html
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel

--
Tim Deegan <Tim.Deegan@xxxxxxxxxx>
Principal Software Engineer, Citrix Systems (R&D) Ltd.
[Company #02300071, SL9 0DZ, UK.]

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
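
Editorial sketch of the suggestion above (make sh_unsync() return 0 early by adding a per-domain flag to its existing guard condition). This is not the Xen source: the struct layouts, the field names other than oos_active, and the function name sh_unsync_sketch are all hypothetical stand-ins chosen so the example compiles on its own. The real guard in the shadow code checks more conditions than are shown here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-domain shadow state. */
struct domain_sketch {
    bool oos_active;    /* mentioned in the thread: OOS currently in use */
    bool oos_disabled;  /* hypothetical per-domain "disable OOS" parameter */
};

/* Hypothetical stand-in for the per-page state the guard inspects. */
struct page_sketch {
    bool out_of_sync;           /* page has already been unsynced */
    bool has_multiple_shadows;  /* example of an "unsafe" case */
};

/*
 * Returns 1 if the page was allowed to go out of sync, 0 if the caller
 * must stay on the ordinary synchronous path.
 */
int sh_unsync_sketch(const struct domain_sketch *d, struct page_sketch *pg)
{
    /* The new flag is simply one more reason to refuse. */
    if (d->oos_disabled
        || !d->oos_active
        || pg->out_of_sync
        || pg->has_multiple_shadows)
        return 0;

    pg->out_of_sync = true;
    return 1;
}
```

With oos_disabled set for a domain, every call returns 0 before any page state is touched, so the out-of-sync paths are never entered for that domain while the compiled-in optimization stays available to other domains.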