Xen project Mailing List

Re: [Xen-devel] [PATCH v2] x86/mm: also flush TLB when putting writable foreign page reference

From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Date: Wed, 3 May 2017 10:55:48 +0100

Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Julien Grall <julien.grall@xxxxxxx>, Tim Deegan <tim@xxxxxxx>, Jann Horn <jannh@xxxxxxxxxx>

Delivery-date: Wed, 03 May 2017 09:55:56 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 03/05/17 08:21, Jan Beulich wrote: >>>> On 02.05.17 at 19:37, <andrew.cooper3@xxxxxxxxxx> wrote: >> On 02/05/17 10:43, Tim Deegan wrote: >>> At 02:50 -0600 on 02 May (1493693403), Jan Beulich wrote: >>>>>>> On 02.05.17 at 10:32, <tim@xxxxxxx> wrote: >>>>> At 04:52 -0600 on 28 Apr (1493355160), Jan Beulich wrote: >>>>>>>>> On 27.04.17 at 11:51, <tim@xxxxxxx> wrote: >>>>>>> At 03:23 -0600 on 27 Apr (1493263380), Jan Beulich wrote: >>>>>>>> ... it wouldn't better be the other way around: We use the patch >>>>>>>> in its current (or even v1) form, and try to do something about >>>>>>>> performance only if we really find a case where it matters. To be >>>>>>>> honest, I'm not even sure how I could meaningfully measure the >>>>>>>> impact here: Simply counting how many extra flushes there would >>>>>>>> end up being wouldn't seem all that useful, and whether there >>>>>>>> would be any measurable difference in the overall execution time >>>>>>>> of e.g. domain creation I would highly doubt (but if it's that what >>>>>>>> you're after, I could certainly collect a few numbers). >>>>>>> I think that would be a good idea, just as a sanity-check. >>>>>> As it turns out there is a measurable effect: xc_dom_boot_image() >>>>>> for a 4Gb PV guest takes about 70% longer now. Otoh it is itself >>>>>> responsible for less than 10% of the overall time libxl__build_dom() >>>>>> takes, and that in turn is only a pretty small portion of the overall >>>>>> "xl create". >>>>> Do you think that slowdown is OK? I'm not sure -- I'd be inclined to >>>>> avoid it, but could be persuaded, and it's not me doing the work. :) >>>> Well, if there was a way to avoid it in a clean way without too much >>>> code churn, I'd be all for avoiding it. The avenues we've explored so >>>> far either didn't work (using pg_owner's dirty mask) or didn't promise >>>> to actually reduce the flush overhead in a meaningful way (adding a >>>> separate mask to be merged into the mask used for the flush in >>>> __get_page_type()), unless - as has been the case before - I didn't >>>> fully understand your thoughts there. >>> Quoting your earlier response: >>> >>>> Wouldn't it suffice to set bits in this mask in put_page_from_l1e() >>>> and consume/clear them in __get_page_type()? Right now I can't >>>> see it being necessary for correctness to fiddle with any of the >>>> other flushes using the domain dirty mask. >>>> >>>> But then again this may not be much of a win, unless the put >>>> operations come through in meaningful batches, not interleaved >>>> by any type changes (the latter ought to be guaranteed during >>>> domain construction and teardown at least, as the guest itself >>>> can't do anything at that time to effect type changes). >>> I'm not sure how much batching there needs to be. I agree that the >>> domain creation case should work well though. Let me think about the >>> scenarios when dom B is live: >>> >>> 1. Dom A drops its foreign map of page X; dom B immediately changes the >>> type of page X. This case isn't helped at all, but I don't see any >>> way to improve it -- dom A's TLBs need to be flushed right away. >>> >>> 2. Dom A drops its foreign map of page X; dom B immediately changes >>> the type of page Y. Now dom A's dirty CPUs are in the new map, but B >>> may not need to flush them right away. B can filter by page Y's >>> timestamp, and flush (and clear) only some of the cpus in the map. >>> >>> So that seems good, but then there's a risk that cpus never get >>> cleared from the map, and __get_page_type() ends up doing a lot of >>> unnecessary work filtering timestaps. When is it safe to remove a CPU >>> from that map? >>> - obvs safe if we IPI it to flush the TLB (though may need memory >>> barriers -- need to think about a race with CPU C putting A _into_ >>> the map at the same time...) >>> - we could track the timestamp of the most recent addition to the >>> map, and drop any CPU whose TLB has been flushed since that, >>> but that still lets unrelated unmaps keep CPUs alive in the map... >>> - we could double-buffer the map: always add CPUs to the active map; >>> from time to time, swap maps and flush everything in the non-active >>> map (filtered by the TLB timestamp when we last swapped over). >>> >>> Bah, this is turning into a tar pit. Let's stick to the v2 patch as >>> being (relatively) simple and correct, and revisit this if it causes >>> trouble. :) >> :( >> >> A 70% performance hit for guest creation is certainly going to cause >> problems, but we obviously need to prioritise correctness in this case. > Hmm, you did understand that the 70% hit is on a specific sub-part > of the overall process, not guest creation as a whole? Anyway, > your reply is neither an ack nor a nak nor an indication of what needs > to change ... Yes - I realise it isn't all of domain creation, but this performance hit will also hit migration, qemu DMA mappings, etc. XenServer has started a side-by-side performance work-up of this change, as presented at the root of this thread. We should hopefully have some number in the next day or two. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.