Xen project Mailing List

Re: [PATCH for-4.14] x86/tlb: fix assisted flush usage

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Mon, 22 Jun 2020 15:24:10 +0200

Authentication-results: esa6.hc3370-68.iphmx.com; dkim=none (message not signed) header.i=none

Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Wei Liu <wl@xxxxxxx>, paul@xxxxxxx, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>

Delivery-date: Mon, 22 Jun 2020 13:24:24 +0000

Ironport-sdr: 72JLREy5tlnWQy2N//vIJ+8DWcwz/J3H3u0UlwN6DNyt08rAowWiXLXf8VohKcriNiYVrPQqt0 0qRpFHSLhgsOUL6YRvx9VQMKbvjskcn8iSlaYqzmw0YhL5qxa/+EDvHUOrsUNS3+WyZnAozU0d mw701/nAkFWDcTDw3WiV114Qnv9dj7fTOM2dCtdbRCS7ZJhcd1laVFrF0lvr1bUyX/Q3rW4Ipg t9gs7Qvf0O7oGnlFwPjX/d0TElTwOKqAl7djGt1kt1ZruKiukcrYuDWz/R8X9zZtFqJ0TNyGaB NL0=

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon, Jun 22, 2020 at 01:07:10PM +0200, Jan Beulich wrote: > On 22.06.2020 11:31, Roger Pau Monné wrote: > > On Fri, Jun 19, 2020 at 04:06:55PM +0200, Jan Beulich wrote: > >> On 18.06.2020 18:04, Roger Pau Monne wrote: > >>> Commit e9aca9470ed86 introduced a regression when avoiding sending > >>> IPIs for certain flush operations. Xen page fault handler > >>> (spurious_page_fault) relies on blocking interrupts in order to > >>> prevent handling TLB flush IPIs and thus preventing other CPUs from > >>> removing page tables pages. Switching to assisted flushing avoided such > >>> IPIs, and thus can result in pages belonging to the page tables being > >>> removed (and possibly re-used) while __page_fault_type is being > >>> executed. > >>> > >>> Force some of the TLB flushes to use IPIs, thus avoiding the assisted > >>> TLB flush. Those selected flushes are the page type change (when > >>> switching from a page table type to a different one, ie: a page that > >>> has been removed as a page table) and page allocation. This sadly has > >>> a negative performance impact on the pvshim, as less assisted flushes > >>> can be used. > >>> > >>> Introduce a new flag (FLUSH_FORCE_IPI) and helper to force a TLB flush > >>> using an IPI (flush_tlb_mask_sync). Note that the flag is only > >>> meaningfully defined when the hypervisor supports PV mode, as > >>> otherwise translated domains are in charge of their page tables and > >>> won't share page tables with Xen, thus not influencing the result of > >>> page walks performed by the spurious fault handler. > >> > >> Is this true for shadow mode? If a page shadowing a guest one was > >> given back quickly enough to the allocator and then re-used, I think > >> the same situation could in principle arise. > > > > Hm, I think it's not applicable to HVM shadow mode at least, because > > CR3 is switched as part of vmentry/vmexit, and the page tables are not > > shared between Xen and the guest, so there's no way for a HVM shadow > > guest to modify the page-tables while Xen is walking them in > > spurious_page_fault (note spurious_page_fault is only called when the > > fault happens in non-guest context). > > I'm afraid I disagree, because of shadow's use of "linear page tables". You will have to bear with me, but I don't follow. Could you provide some pointers at how/where the shadow (I assume guest controlled) "linear page tables" are used while in Xen context? do_page_fault will only call spurious_page_fault (and thus attempt a page walk) when the fault happens in non-guest context, yet on HVM all accesses to guest memory are performed using __hvm_copy which doesn't load any guest page tables into CR3, but instead performs a software walk of the paging structures in order to find and map the destination address into the hypervisor page tables and perform the read or copy. > >>> Note the flag is not defined on Arm, and the introduced helper is just > >>> a dummy alias to the existing flush_tlb_mask. > >>> > >>> Fixes: e9aca9470ed86 ('x86/tlb: use Xen L0 assisted TLB flush when > >>> available') > >>> Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > >>> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> > >>> --- > >>> It's my understanding that not doing such IPI flushes could lead to > >>> the pages tables being read by __page_fault_type being modified by a > >>> third party, which could make them point to other mfns out of the > >>> scope of the guest allowed physical memory addresses. However those > >>> accesses would be limited to __page_fault_type, and hence the main > >>> worry would be that a guest could make it point to read from a > >>> physical memory region that has side effects? > >> > >> I don't think so, no - the memory could be changed such that the > >> PTEs are invalid altogether (like having reserved bits set). Consider > >> for example the case of reading an MFN out of such a PTE that's larger > >> than the physical address width supported by the CPU. Afaict > >> map_domain_page() will happily install a respective page table entry, > >> but we'd get a reserved-bit #PF upon reading from that mapping. > > > > So there are no hazards from executing __page_fault_type against a > > page-table that could be modified by a user? > > There are - I realize only now that the way I started my earlier > reply was ambiguous. I meant to negate your "only with side effects" > way of thinking. Ack. Thanks, Roger.

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.