Re: [Xen-devel] [for-4.9] Re: HVM guest performance regression
On Thu, 8 Jun 2017, Juergen Gross wrote:
> On 07/06/17 20:19, Stefano Stabellini wrote:
> > On Wed, 7 Jun 2017, Juergen Gross wrote:
> >> On 06/06/17 21:08, Stefano Stabellini wrote:
> >>> On Tue, 6 Jun 2017, Juergen Gross wrote:
> >>>> On 06/06/17 18:39, Stefano Stabellini wrote:
> >>>>> On Tue, 6 Jun 2017, Juergen Gross wrote:
> >>>>>> On 26/05/17 21:01, Stefano Stabellini wrote:
> >>>>>>> On Fri, 26 May 2017, Juergen Gross wrote:
> >>>>>>>> On 26/05/17 18:19, Ian Jackson wrote:
> >>>>>>>>> Juergen Gross writes ("HVM guest performance regression"):
> >>>>>>>>>> Looking for the reason of a performance regression of HVM guests
> >>>>>>>>>> under Xen 4.7 against 4.5 I found the reason to be commit
> >>>>>>>>>> c26f92b8fce3c9df17f7ef035b54d97cbe931c7a ("libxl: remove
> >>>>>>>>>> freemem_slack") in Xen 4.6.
> >>>>>>>>>>
> >>>>>>>>>> The problem occurred when dom0 had to be ballooned down when
> >>>>>>>>>> starting the guest. The performance of some micro benchmarks
> >>>>>>>>>> dropped by about a factor of 2 with above commit.
> >>>>>>>>>>
> >>>>>>>>>> Interesting point is that the performance of the guest will
> >>>>>>>>>> depend on the amount of free memory being available at guest
> >>>>>>>>>> creation time. When there was barely enough memory available
> >>>>>>>>>> for starting the guest the performance will remain low even if
> >>>>>>>>>> memory is being freed later.
> >>>>>>>>>>
> >>>>>>>>>> I'd like to suggest we either revert the commit or have some
> >>>>>>>>>> other mechanism to try to have some reserve free memory when
> >>>>>>>>>> starting a domain.
> >>>>>>>>>
> >>>>>>>>> Oh, dear. The memory accounting swamp again. Clearly we are not
> >>>>>>>>> going to drain that swamp now, but I don't like regressions.
> >>>>>>>>>
> >>>>>>>>> I am not opposed to reverting that commit. I was a bit iffy about
> >>>>>>>>> it at the time; and according to the removal commit message, it
> >>>>>>>>> was basically removed because it was a piece of cargo cult for
> >>>>>>>>> which we had no justification in any of our records.
> >>>>>>>>>
> >>>>>>>>> Indeed I think fixing this is a candidate for 4.9.
> >>>>>>>>>
> >>>>>>>>> Do you know the mechanism by which the freemem slack helps ? I
> >>>>>>>>> think that would be a prerequisite for reverting this. That way
> >>>>>>>>> we can have an understanding of why we are doing things, rather
> >>>>>>>>> than just flailing at random...
> >>>>>>>>
> >>>>>>>> I wish I would understand it.
> >>>>>>>>
> >>>>>>>> One candidate would be 2M/1G pages being possible with enough free
> >>>>>>>> memory, but I haven't proofed this yet. I can have a try by
> >>>>>>>> disabling big pages in the hypervisor.
> >>>>>>>
> >>>>>>> Right, if I had to bet, I would put my money on superpages
> >>>>>>> shattering being the cause of the problem.
> >>>>>>
> >>>>>> Seems you would have lost your money...
> >>>>>>
> >>>>>> Meanwhile I've found a way to get the "good" performance in the
> >>>>>> micro benchmark. Unfortunately this requires to switch off the pv
> >>>>>> interfaces in the HVM guest via "xen_nopv" kernel boot parameter.
> >>>>>>
> >>>>>> I have verified that pv spinlocks are not to blame (via
> >>>>>> "xen_nopvspin" kernel boot parameter). Switching to clocksource TSC
> >>>>>> in the running system doesn't help either.
> >>>>>
> >>>>> What about xen_hvm_exit_mmap (an optimization for shadow pagetables)
> >>>>> and xen_hvm_smp_init (PV IPI)?
> >>>>
> >>>> xen_hvm_exit_mmap isn't active (kernel message telling me so was
> >>>> issued).
> >>>>
> >>>>>> Unfortunately the kernel seems no longer to be functional when I
> >>>>>> try to tweak it not to use the PVHVM enhancements.
> >>>>>
> >>>>> I guess you are not talking about regular PV drivers like netfront
> >>>>> and blkfront, right?
> >>>>
> >>>> The plan was to be able to use PV drivers without having to use PV
> >>>> callbacks and PV timers. This isn't possible right now.
> >>>
> >>> I think the code to handle that scenario was gradually removed over
> >>> time to simplify the code base.
> >>
> >> Hmm, too bad.
> >>
> >>>>>> I'm wondering now whether there have ever been any benchmarks to
> >>>>>> proof PVHVM really being faster than non-PVHVM? My findings seem to
> >>>>>> suggest there might be a huge performance gap with PVHVM. OTOH this
> >>>>>> might depend on hardware and other factors.
> >>>>>>
> >>>>>> Stefano, didn't you do the PVHVM stuff back in 2010? Do you have any
> >>>>>> data from then regarding performance figures?
> >>>>>
> >>>>> Yes, I still have these slides:
> >>>>>
> >>>>> https://www.slideshare.net/xen_com_mgr/linux-pv-on-hvm
> >>>>
> >>>> Thanks. So you measured the overall package, not the single items like
> >>>> callbacks, timers, time source? I'm asking because I start to believe
> >>>> there are some of those slower than their non-PV variants.
> >>>
> >>> There isn't much left in terms of individual optimizations: you already
> >>> tried switching clocksource and removing pv spinlocks. xen_hvm_exit_mmap
> >>> is not used. Only the following are left (you might want to double
> >>> check I haven't missed anything):
> >>>
> >>> 1) PV IPI
> >>
> >> Its a 1 vcpu guest.
> >>
> >>> 2) PV suspend/resume
> >>> 3) vector callback
> >>> 4) interrupt remapping
> >>>
> >>> 2) is not on the hot path.
> >>> I did individual measurements of 3) at some points and it was a clear
> >>> win.
> >>
> >> That might depend on the hardware. Could it be newer processors are
> >> faster here?
> >
> > I don't think so: the alternative it's an emulated interrupt. It's
> > slower under all points of view.
>
> What about APIC virtualization of modern processors? Are you sure e.g.
> timer interrupts aren't handled completely by the processor? I guess
> this might be faster than letting it be handled by the hypervisor and
> then use the callback into the guest.
>
> > I would try to run the test with xen_emul_unplug="never" which means
> > that you are going to end up using the emulated network card and
> > emulated IDE controller, but some of the other optimizations (like the
> > vector callback) will still be active.
>
> Now this is something I wouldn't like to do. My test isn't using any
> I/O at all and is showing bad performance with pv interfaces being used.
> The only remedy right now seems to be to switch off pv interfaces
> leading to a bad I/O performance, but a good non-I/O performance.
>
> You are suggesting a mode with bad I/O performance _and_ bad non-I/O
> performance.

I was only suggesting this for debugging, to better understand the
problem, not as a solution.

> > If the cause of the problem is ballooning for example, using emulated
> > interfaces for IO will reduce the amount of ballooned out pages
> > significantly.
>
> No I/O involved in my benchmark.

I admit that if your test doesn't do any I/O, it is not likely that
xen_emul_unplug="never" will help us understand the problem.
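(For reference, the guest-side switches discussed in this thread are all
plain kernel command line parameters of the HVM guest. Roughly, going by
the discussion above:

    xen_nopv                - turn off all Xen PVHVM enhancements
    xen_nopvspin            - keep PVHVM, but disable PV spinlocks
    xen_emul_unplug=never   - keep PVHVM, e.g. the vector callback, but
                              leave the emulated IDE disk and NIC in place

Which of them you want depends on what you are trying to isolate.)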
Nonetheless, I believe that a simple blkfront/blkback or netfront/netback
connection, even without any I/O being done, leads to a couple of calls
into the ballooning code (xenbus_map_ring_valloc_hvm ->
alloc_xenballooned_pages).
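To make that call chain concrete, here is a heavily condensed sketch of the
shape of that path for a single-page ring. It is modelled on
drivers/xen/xenbus/xenbus_client.c from kernels of that era; the function
name map_frontend_ring_hvm is made up for illustration, and the grant
mapping and error handling are only described in comments, so treat it as
a sketch rather than the actual kernel code:

#include <linux/mm.h>
#include <xen/balloon.h>
#include <xen/grant_table.h>

/*
 * Illustrative only: the rough shape of the HVM ring-mapping setup for a
 * single-page ring. The real code lives in xenbus_map_ring_valloc_hvm().
 */
static int map_frontend_ring_hvm(grant_ref_t gnt_ref, domid_t otherend_id,
				 void **vaddr)
{
	struct page *page;
	int err;

	/*
	 * This is the call into the ballooning code mentioned above: one
	 * page is taken out of the guest's populated memory to make a hole
	 * for the foreign mapping, even if no I/O is ever issued on the
	 * ring.
	 */
	err = alloc_xenballooned_pages(1, &page);
	if (err)
		return err;

	/*
	 * The real code then grant-maps the backend's ring page into the
	 * hole just created (a GNTTABOP_map_grant_ref hypercall using
	 * gnt_ref and otherend_id) and hands the resulting address back to
	 * the frontend; on failure it returns the page to the balloon via
	 * free_xenballooned_pages(). Those details are omitted here.
	 */
	*vaddr = page_address(page);
	return 0;
}

In other words, even an idle frontend/backend connection exercises the
guest's balloon driver.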