[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH for-4.14] x86/tlb: fix assisted flush usage
On Fri, Jun 19, 2020 at 04:06:55PM +0200, Jan Beulich wrote: > On 18.06.2020 18:04, Roger Pau Monne wrote: > > Commit e9aca9470ed86 introduced a regression when avoiding sending > > IPIs for certain flush operations. Xen page fault handler > > (spurious_page_fault) relies on blocking interrupts in order to > > prevent handling TLB flush IPIs and thus preventing other CPUs from > > removing page tables pages. Switching to assisted flushing avoided such > > IPIs, and thus can result in pages belonging to the page tables being > > removed (and possibly re-used) while __page_fault_type is being > > executed. > > > > Force some of the TLB flushes to use IPIs, thus avoiding the assisted > > TLB flush. Those selected flushes are the page type change (when > > switching from a page table type to a different one, ie: a page that > > has been removed as a page table) and page allocation. This sadly has > > a negative performance impact on the pvshim, as less assisted flushes > > can be used. > > > > Introduce a new flag (FLUSH_FORCE_IPI) and helper to force a TLB flush > > using an IPI (flush_tlb_mask_sync). Note that the flag is only > > meaningfully defined when the hypervisor supports PV mode, as > > otherwise translated domains are in charge of their page tables and > > won't share page tables with Xen, thus not influencing the result of > > page walks performed by the spurious fault handler. > > Is this true for shadow mode? If a page shadowing a guest one was > given back quickly enough to the allocator and then re-used, I think > the same situation could in principle arise. Hm, I think it's not applicable to HVM shadow mode at least, because CR3 is switched as part of vmentry/vmexit, and the page tables are not shared between Xen and the guest, so there's no way for a HVM shadow guest to modify the page-tables while Xen is walking them in spurious_page_fault (note spurious_page_fault is only called when the fault happens in non-guest context). > > Note the flag is not defined on Arm, and the introduced helper is just > > a dummy alias to the existing flush_tlb_mask. > > > > Fixes: e9aca9470ed86 ('x86/tlb: use Xen L0 assisted TLB flush when > > available') > > Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > > Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> > > --- > > It's my understanding that not doing such IPI flushes could lead to > > the pages tables being read by __page_fault_type being modified by a > > third party, which could make them point to other mfns out of the > > scope of the guest allowed physical memory addresses. However those > > accesses would be limited to __page_fault_type, and hence the main > > worry would be that a guest could make it point to read from a > > physical memory region that has side effects? > > I don't think so, no - the memory could be changed such that the > PTEs are invalid altogether (like having reserved bits set). Consider > for example the case of reading an MFN out of such a PTE that's larger > than the physical address width supported by the CPU. Afaict > map_domain_page() will happily install a respective page table entry, > but we'd get a reserved-bit #PF upon reading from that mapping. So there are no hazards from executing __page_fault_type against a page-table that could be modified by a user? > > --- > > xen/arch/x86/mm.c | 12 +++++++++++- > > xen/common/memory.c | 2 +- > > xen/common/page_alloc.c | 2 +- > > xen/include/asm-arm/flushtlb.h | 1 + > > xen/include/asm-x86/flushtlb.h | 14 ++++++++++++++ > > xen/include/xen/mm.h | 8 ++++++-- > > 6 files changed, 34 insertions(+), 5 deletions(-) > > Not finding a consumer of the new flag, my first reaction was to > ask whether there's code missing somewhere. Having looked at > flush_area_mask() another time I now understand the itended > behavior results because of the extra flag now allowing > hypervisor_flush_tlb() to be entered. I think that's something > that's worth calling out in the description, or perhaps even in > the comment next to the #define. Oh right, the condition to use assisted flush is not actually changed in flush_area_mask since setting any bit in the flags would prevent using it. > > --- a/xen/arch/x86/mm.c > > +++ b/xen/arch/x86/mm.c > > @@ -2894,7 +2894,17 @@ static int _get_page_type(struct page_info *page, > > unsigned long type, > > ((nx & PGT_type_mask) == PGT_writable_page)) ) > > { > > perfc_incr(need_flush_tlb_flush); > > - flush_tlb_mask(mask); > > + if ( (x & PGT_type_mask) && > > + (x & PGT_type_mask) <= PGT_l4_page_table ) > > With there being 5-level page tables around the corner, I think > we ought to get used to use PGT_root_page_table (or alike) > whenever possible, to avoid having to touch such code when > adding support for the new paging mode. > > > --- a/xen/include/asm-x86/flushtlb.h > > +++ b/xen/include/asm-x86/flushtlb.h > > @@ -126,6 +126,12 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long > > cr4); > > #else > > #define FLUSH_HVM_ASID_CORE 0 > > #endif > > +#if CONFIG_PV > > #ifdef > > > +/* Force an IPI to be sent */ > > +# define FLUSH_FORCE_IPI 0x8000 > > +#else > > +# define FLUSH_FORCE_IPI 0 > > +#endif > > If my shadow mode concern above is unwarranted, this overhead could > also be avoided if there's no PV domain at all in the system. > Perhaps an improvement not for now, but for the future ... Hm, right, I guess it would be possible to turn FLUSH_FORCE_IPI into a dynamic flag. Thanks, Roger.
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |