Xen project Mailing List

Re: [Xen-devel] [PATCH v5 7/7] x86/tlb: use Xen L0 assisted TLB flush when available

To: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Mon, 2 Mar 2020 11:52:33 +0100

Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Wei Liu <wl@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

Delivery-date: Mon, 02 Mar 2020 10:52:52 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 28.02.2020 18:23, Roger Pau Monné wrote: > On Fri, Feb 28, 2020 at 06:00:44PM +0100, Jan Beulich wrote: >> On 19.02.2020 18:43, Roger Pau Monne wrote: >>> Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes. >>> This greatly increases the performance of TLB flushes when running >>> with a high amount of vCPUs as a Xen guest, and is specially important >>> when running in shim mode. >>> >>> The following figures are from a PV guest running `make -j32 xen` in >>> shim mode with 32 vCPUs and HAP. >>> >>> Using x2APIC and ALLBUT shorthand: >>> real 4m35.973s >>> user 4m35.110s >>> sys 36m24.117s >>> >>> Using L0 assisted flush: >>> real 1m2.596s >>> user 4m34.818s >>> sys 5m16.374s >>> >>> The implementation adds a new hook to hypervisor_ops so other >>> enlightenments can also implement such assisted flush just by filling >>> the hook. Note that the Xen implementation completely ignores the >>> dirty CPU mask and the linear address passed in, and always performs a >>> global TLB flush on all vCPUs. >> >> This isn't because of an implementation choice of yours, but because >> of how HVMOP_flush_tlbs works. I think the statement should somehow >> express this. I also think it wants clarifying that using the >> hypercall is indeed faster even in the case of single-page, single- >> CPU flush > > Are you sure about this? I think taking a vmexit is going to be more > costly than executing a local invlpg? The answer to this was already ... >> (which I suspect may not be the case especially as vCPU >> count grows). ... here (or at least it was meant to address questions back like this one). >> The stats above prove a positive overall effect, but >> they don't say whether the effect could be even bigger by being at >> least a little selective. > > I assume that being able to provide a bitmap with the vCPUs on whether > the TLB flush should be performed would give us some more performance, > but I haven't looked into it yet. > >>> @@ -73,6 +74,15 @@ void __init hypervisor_e820_fixup(struct e820map *e820) >>> ops.e820_fixup(e820); >>> } >>> >>> +int hypervisor_flush_tlb(const cpumask_t *mask, const void *va, >>> + unsigned int order) >>> +{ >>> + if ( ops.flush_tlb ) >>> + return alternative_call(ops.flush_tlb, mask, va, order); >>> + >>> + return -ENOSYS; >>> +} >> >> Please no new -ENOSYS anywhere (except in new ports' top level >> hypercall handlers). > > Ack. Is EOPNOTSUPP OK? Sure. >>> @@ -256,6 +257,16 @@ void flush_area_mask(const cpumask_t *mask, const void >>> *va, unsigned int flags) >>> if ( (flags & ~FLUSH_ORDER_MASK) && >>> !cpumask_subset(mask, cpumask_of(cpu)) ) >>> { >>> + if ( cpu_has_hypervisor && >>> + !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID | >>> + FLUSH_ORDER_MASK)) && >>> + !hypervisor_flush_tlb(mask, va, (flags - 1) & >>> FLUSH_ORDER_MASK) ) >>> + { >>> + if ( tlb_clk_enabled ) >>> + tlb_clk_enabled = false; >> >> Why does this need doing here? Couldn't Xen guest setup code >> clear the flag? > > Because it's possible that the hypercalls fails, and hence the tlb > clock should be kept enabled. There's no reason to disable it until > Xen knows the assisted flush is indeed available. > > I don't mind moving it to Xen guest setup code, but I'm not sure I see > why it would be any better than doing it here. The only reason I guess > is to avoid checking tlb_clk_enabled on every successful assisted > flush? Well, iirc there had already been a question on why the if() here is needed. I second the reason, but the whole construct looks misplaced, i.e. is prone to cause further questions down the road. I think if it was to stay here, a comment would be needed to address any such possible questions. But I still think it should live in the init code. This isn't a feature that simply lacks a CPUID bit (or alike) and hence needs probing (unless of course we expect people to want to put a modern Xen [shim] on top of a pre-3.<whatever> Xen; and even then you could probe the underlying hypercall once at boot time). Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.