Re: [PATCH v2 01/12] x86: introduce ioremap_wc()
Hi Jan,

On 27/05/2021 13:30, Jan Beulich wrote:
> In order for a to-be-introduced ERMS form of memcpy() to not regress
> boot performance on certain systems when video output is active, we
> first need to arrange for avoiding further dependency on firmware
> setting up MTRRs in a way we can actually further modify. On many
> systems, due to the continuously growing amounts of installed memory,
> MTRRs get configured with at least one huge WB range, and with MMIO
> ranges below 4Gb then forced to UC via overlapping MTRRs. mtrr_add(), as
> it is today, can't deal with such a setup. Hence on such systems we
> presently leave the frame buffer mapped UC, leading to significantly
> reduced performance when using REP STOSB / REP MOVSB.
>
> On post-PentiumII hardware (i.e. any that's capable of running 64-bit
> code), an effective memory type of WC can be achieved without MTRRs, by
> simply referencing the respective PAT entry from the PTEs. While this
> will leave the switch to ERMS forms of memset() and memcpy() with
> largely unchanged performance, the change here on its own improves
> performance on affected systems quite significantly: Measuring just the
> individual affected memcpy() invocations yielded a speedup by a factor
> of over 250 on my initial (Skylake) test system. memset() isn't getting
> improved by as much there, but still by a factor of about 20.
>
> While adding {__,}PAGE_HYPERVISOR_WC, also add {__,}PAGE_HYPERVISOR_WT
> to, at the very least, make clear what PTE flags this memory type uses.
>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> v2: Mark ioremap_wc() __init.
> ---
> TBD: If the VGA range is WC in the fixed range MTRRs, reusing the low
>      1st Mb mapping (like ioremap() does) would be an option.
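The PAT mechanism described above can be illustrated with a small, standalone C sketch. It decodes the PWT/PCD/PAT bits of a 4KiB PTE into a PAT index (index = 4*PAT + 2*PCD + PWT, per the Intel SDM), looks the memory type up in a PAT MSR value, and combines it with an MTRR type for a subset of cases. This shows why a PTE selecting a WC PAT entry gets an effective type of WC even where an overlapping MTRR forces UC. The PAT MSR value `ASSUMED_PAT_MSR` (entries WB, WT, UC-, UC, WC, WP, UC, UC) is an assumption: I believe it mirrors Xen's layout, in which the PAT bit alone selects WC, but it is not taken from this patch. All function and constant names are hypothetical.

```c
#include <stdint.h>

/* Architectural PTE bits selecting a PAT entry for a 4KiB mapping
 * (PWT = bit 3, PCD = bit 4, PAT = bit 7; large pages use bit 12 for PAT). */
#define PTE_PWT (UINT64_C(1) << 3)
#define PTE_PCD (UINT64_C(1) << 4)
#define PTE_PAT (UINT64_C(1) << 7)

/* Architectural memory-type encodings shared by PAT entries and MTRRs. */
enum mem_type {
    MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6, MT_UC_MINUS = 7
};

/* ASSUMPTION: PAT MSR with entries 0..7 = WB, WT, UC-, UC, WC, WP, UC, UC
 * (one byte per entry, entry 0 in the lowest byte). */
#define ASSUMED_PAT_MSR UINT64_C(0x0000050100070406)

/* PAT index selected by a PTE: PAT*4 + PCD*2 + PWT. */
unsigned int pat_index(uint64_t pte)
{
    return ((pte & PTE_PAT) ? 4u : 0u) |
           ((pte & PTE_PCD) ? 2u : 0u) |
           ((pte & PTE_PWT) ? 1u : 0u);
}

/* Memory type the given PAT MSR assigns to this PTE. */
enum mem_type pat_type(uint64_t pat_msr, uint64_t pte)
{
    return (enum mem_type)((pat_msr >> (8 * pat_index(pte))) & 0xff);
}

/* Effective memory type for a subset of MTRR/PAT combinations (PAT types
 * UC, UC-, WC and WB only); returns -1 for combinations not modelled here. */
int effective_type(enum mem_type mtrr, enum mem_type pat)
{
    switch ( pat )
    {
    case MT_UC:       return MT_UC;
    case MT_UC_MINUS: return mtrr == MT_WC ? MT_WC : MT_UC;
    case MT_WC:       return MT_WC;   /* PAT WC wins over any MTRR type */
    case MT_WB:       return mtrr;    /* PAT WB defers to the MTRR type */
    default:          return -1;
    }
}
```

For example, `effective_type(MT_UC, pat_type(ASSUMED_PAT_MSR, PTE_PAT))` evaluates to `MT_WC`: the UC MTRR covering the frame buffer no longer matters once the PTE selects the WC PAT entry.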
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5881,6 +5881,20 @@ void __iomem *ioremap(paddr_t pa, size_t
>       return (void __force __iomem *)va;
>   }
> +void __iomem *__init ioremap_wc(paddr_t pa, size_t len)
> +{
> +    mfn_t mfn = _mfn(PFN_DOWN(pa));
> +    unsigned int offs = pa & (PAGE_SIZE - 1);
> +    unsigned int nr = PFN_UP(offs + len);
> +    void *va;
> +
> +    WARN_ON(page_is_ram_type(mfn_x(mfn), RAM_TYPE_CONVENTIONAL));
> +
> +    va = __vmap(&mfn, nr, 1, 1, PAGE_HYPERVISOR_WC, VMAP_DEFAULT);
> +
> +    return (void __force __iomem *)(va + offs);
> +}

Arm already provides ioremap_wc(), which is a wrapper around
ioremap_attr(). Can this be moved to common code to avoid duplication?

Cheers,

-- 
Julien Grall
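As a rough illustration of both the page arithmetic in ioremap_wc() and the suggestion to share the code, the following userspace C sketch splits a physical address into a frame number, an in-page offset and a frame count the same way the patch does, with a generic attribute-taking helper and a WC wrapper layered on top. It does not map anything. `plan_ioremap_attr()`, `plan_ioremap_wc()`, `struct map_plan` and the `ATTR_*` values are hypothetical names, and the `PAGE_SIZE`/`PFN_*` macros are local stand-ins assuming 4KiB pages, not Xen's or Arm's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for Xen's paddr_t and page helpers (4KiB pages). */
typedef uint64_t paddr_t;
#define PAGE_SHIFT  12
#define PAGE_SIZE   (UINT64_C(1) << PAGE_SHIFT)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/* Hypothetical attributes standing in for PAGE_HYPERVISOR_{UC,WC}. */
enum map_attr { ATTR_UC, ATTR_WC };

/* What a mapping request resolves to: first frame, offset into it,
 * number of frames to map, and the attribute to map them with. */
struct map_plan {
    uint64_t mfn;
    unsigned int offs;
    unsigned int nr;
    enum map_attr attr;
};

/* Common helper: page-align the request the way ioremap_wc() does. */
struct map_plan plan_ioremap_attr(paddr_t pa, size_t len, enum map_attr attr)
{
    struct map_plan p;

    p.mfn  = PFN_DOWN(pa);
    p.offs = (unsigned int)(pa & (PAGE_SIZE - 1));
    p.nr   = (unsigned int)PFN_UP(p.offs + len);  /* round up to whole frames */
    p.attr = attr;
    return p;
}

/* The WC variant then becomes a one-line wrapper around the common helper. */
struct map_plan plan_ioremap_wc(paddr_t pa, size_t len)
{
    return plan_ioremap_attr(pa, len, ATTR_WC);
}
```

For example, a 4KiB request at 0xfd000800 straddles a page boundary, so it yields mfn 0xfd000, offs 0x800 and nr 2: two frames have to be mapped even though len is only one page, and the caller gets back the mapping's base plus offs.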