Xen project Mailing List

Re: [PATCH for-4.14] x86/tlb: fix assisted flush usage

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Mon, 22 Jun 2020 11:31:23 +0200

Authentication-results: esa3.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none

Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Wei Liu <wl@xxxxxxx>, paul@xxxxxxx, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>

Delivery-date: Mon, 22 Jun 2020 09:31:42 +0000

Ironport-sdr: 0X73z1w2Bk0Skytn43gA6dE5BYrGftvErUanFqlLzE7YG/FZFVp0XNcBlxOSr/pr/6L1ylYy9O CBO0Idq3+FECjmKpMelTaJA9SBLgqMJ0o2eiHetZ4n0cTsxiEAVn4nSKs/TW5X4jh3A0jdNw8H HUz9J5vTi/PjK/05Y66OQIZFT5Zy+fHG4v0J/R6BtoP8EwhBsNszzxnORIdhynC1qJLRn/DktF c3l8HpZ7XWBGyQdMwRlHMmLgMWt1Os/qM4rxI+gIv5d6/fMOG1qlDh0zSMwauU9j7KXVHUxpNO 4sw=

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Fri, Jun 19, 2020 at 04:06:55PM +0200, Jan Beulich wrote: > On 18.06.2020 18:04, Roger Pau Monne wrote: > > Commit e9aca9470ed86 introduced a regression when avoiding sending > > IPIs for certain flush operations. Xen page fault handler > > (spurious_page_fault) relies on blocking interrupts in order to > > prevent handling TLB flush IPIs and thus preventing other CPUs from > > removing page tables pages. Switching to assisted flushing avoided such > > IPIs, and thus can result in pages belonging to the page tables being > > removed (and possibly re-used) while __page_fault_type is being > > executed. > > > > Force some of the TLB flushes to use IPIs, thus avoiding the assisted > > TLB flush. Those selected flushes are the page type change (when > > switching from a page table type to a different one, ie: a page that > > has been removed as a page table) and page allocation. This sadly has > > a negative performance impact on the pvshim, as less assisted flushes > > can be used. > > > > Introduce a new flag (FLUSH_FORCE_IPI) and helper to force a TLB flush > > using an IPI (flush_tlb_mask_sync). Note that the flag is only > > meaningfully defined when the hypervisor supports PV mode, as > > otherwise translated domains are in charge of their page tables and > > won't share page tables with Xen, thus not influencing the result of > > page walks performed by the spurious fault handler. > > Is this true for shadow mode? If a page shadowing a guest one was > given back quickly enough to the allocator and then re-used, I think > the same situation could in principle arise. Hm, I think it's not applicable to HVM shadow mode at least, because CR3 is switched as part of vmentry/vmexit, and the page tables are not shared between Xen and the guest, so there's no way for a HVM shadow guest to modify the page-tables while Xen is walking them in spurious_page_fault (note spurious_page_fault is only called when the fault happens in non-guest context). > > Note the flag is not defined on Arm, and the introduced helper is just > > a dummy alias to the existing flush_tlb_mask. > > > > Fixes: e9aca9470ed86 ('x86/tlb: use Xen L0 assisted TLB flush when > > available') > > Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> > > --- > > It's my understanding that not doing such IPI flushes could lead to > > the pages tables being read by __page_fault_type being modified by a > > third party, which could make them point to other mfns out of the > > scope of the guest allowed physical memory addresses. However those > > accesses would be limited to __page_fault_type, and hence the main > > worry would be that a guest could make it point to read from a > > physical memory region that has side effects? > > I don't think so, no - the memory could be changed such that the > PTEs are invalid altogether (like having reserved bits set). Consider > for example the case of reading an MFN out of such a PTE that's larger > than the physical address width supported by the CPU. Afaict > map_domain_page() will happily install a respective page table entry, > but we'd get a reserved-bit #PF upon reading from that mapping. So there are no hazards from executing __page_fault_type against a page-table that could be modified by a user? > > --- > > xen/arch/x86/mm.c | 12 +++++++++++- > > xen/common/memory.c | 2 +- > > xen/common/page_alloc.c | 2 +- > > xen/include/asm-arm/flushtlb.h | 1 + > > xen/include/asm-x86/flushtlb.h | 14 ++++++++++++++ > > xen/include/xen/mm.h | 8 ++++++-- > > 6 files changed, 34 insertions(+), 5 deletions(-) > > Not finding a consumer of the new flag, my first reaction was to > ask whether there's code missing somewhere. Having looked at > flush_area_mask() another time I now understand the itended > behavior results because of the extra flag now allowing > hypervisor_flush_tlb() to be entered. I think that's something > that's worth calling out in the description, or perhaps even in > the comment next to the #define. Oh right, the condition to use assisted flush is not actually changed in flush_area_mask since setting any bit in the flags would prevent using it. > > --- a/xen/arch/x86/mm.c > > +++ b/xen/arch/x86/mm.c > > @@ -2894,7 +2894,17 @@ static int _get_page_type(struct page_info *page, > > unsigned long type, > > ((nx & PGT_type_mask) == PGT_writable_page)) ) > > { > > perfc_incr(need_flush_tlb_flush); > > - flush_tlb_mask(mask); > > + if ( (x & PGT_type_mask) && > > + (x & PGT_type_mask) <= PGT_l4_page_table ) > > With there being 5-level page tables around the corner, I think > we ought to get used to use PGT_root_page_table (or alike) > whenever possible, to avoid having to touch such code when > adding support for the new paging mode. > > > --- a/xen/include/asm-x86/flushtlb.h > > +++ b/xen/include/asm-x86/flushtlb.h > > @@ -126,6 +126,12 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long > > cr4); > > #else > > #define FLUSH_HVM_ASID_CORE 0 > > #endif > > +#if CONFIG_PV > > #ifdef > > > +/* Force an IPI to be sent */ > > +# define FLUSH_FORCE_IPI 0x8000 > > +#else > > +# define FLUSH_FORCE_IPI 0 > > +#endif > > If my shadow mode concern above is unwarranted, this overhead could > also be avoided if there's no PV domain at all in the system. > Perhaps an improvement not for now, but for the future ... Hm, right, I guess it would be possible to turn FLUSH_FORCE_IPI into a dynamic flag. Thanks, Roger.

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.