[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 2/9] x86: limit the amount of TLB flushing in switch_cr3_cr4()
On Thu, Sep 12, 2019 at 12:11:55PM +0200, Jan Beulich wrote: > On 12.09.2019 11:54, Roger Pau Monné wrote: > > On Wed, Sep 11, 2019 at 05:22:17PM +0200, Jan Beulich wrote: > >> We really need to flush the TLB just once, if we do so with or after the > >> CR3 write. The only case where two flushes are unavoidable is when we > >> mean to turn off CR4.PGE (perhaps just temporarily; see the code > >> comment). > >> > >> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> > > > > Thanks, this seems to make the logic of the function easier, but I'm > > slightly worried about the performance impact given that a full flush > > of all PCID contexts is done instead of the previous selective flush. > > I think you've misunderstood: > > >> --- a/xen/arch/x86/flushtlb.c > >> +++ b/xen/arch/x86/flushtlb.c > >> @@ -104,82 +104,65 @@ static void do_tlb_flush(void) > >> void switch_cr3_cr4(unsigned long cr3, unsigned long cr4) > >> { > >> unsigned long flags, old_cr4; > >> - unsigned int old_pcid; > >> u32 t; > >> > >> + /* Throughout this function we make this assumption: */ > >> + ASSERT(!(cr4 & X86_CR4_PCIDE) || !(cr4 & X86_CR4_PGE)); > >> + > >> /* This non-reentrant function is sometimes called in interrupt > >> context. */ > >> local_irq_save(flags); > >> > >> t = pre_flush(); > >> > >> old_cr4 = read_cr4(); > >> - if ( old_cr4 & X86_CR4_PGE ) > >> + ASSERT(!(old_cr4 & X86_CR4_PCIDE) || !(old_cr4 & X86_CR4_PGE)); > >> + > >> + /* > >> + * We need to write CR4 before CR3 if we're about to enable PCIDE, at > >> the > >> + * very least when the new PCID is non-zero. > >> + * > >> + * As we also need to do two CR4 writes in total when PGE is enabled > >> and > >> + * is to remain enabled, do the one temporarily turning off the bit > >> right > >> + * here as well. > >> + * > >> + * The only TLB flushing effect we depend on here is in case we move > >> from > >> + * PGE set to PCIDE set, where we want global page entries gone (and > >> none > >> + * to re-appear) after this write. > >> + */ > >> + if ( !(old_cr4 & X86_CR4_PCIDE) && > >> + ((cr4 & X86_CR4_PCIDE) || (cr4 & old_cr4 & X86_CR4_PGE)) ) > >> { > >> - /* > >> - * X86_CR4_PGE set means PCID is inactive. > >> - * We have to purge the TLB via flipping cr4.pge. > >> - */ > >> old_cr4 = cr4 & ~X86_CR4_PGE; > >> write_cr4(old_cr4); > >> } > >> - else if ( use_invpcid ) > >> - { > >> - /* > >> - * Flushing the TLB via INVPCID is necessary only in case PCIDs > >> are > >> - * in use, which is true only with INVPCID being available. > >> - * Without PCID usage the following write_cr3() will purge the TLB > >> - * (we are in the cr4.pge off path) of all entries. > >> - * Using invpcid_flush_all_nonglobals() seems to be faster than > >> - * invpcid_flush_all(), so use that. > >> - */ > >> - invpcid_flush_all_nonglobals(); > > This simply gets moved, while ... > > >> - /* > >> - * CR4.PCIDE needs to be set before the CR3 write below. Otherwise > >> - * - the CR3 write will fault when CR3.NOFLUSH is set (which is > >> the > >> - * case normally), > >> - * - the subsequent CR4 write will fault if CR3.PCID != 0. > >> - */ > >> - if ( (old_cr4 & X86_CR4_PCIDE) < (cr4 & X86_CR4_PCIDE) ) > >> - { > >> - write_cr4(cr4); > >> - old_cr4 = cr4; > >> - } > >> - } > >> > >> /* > >> - * If we don't change PCIDs, the CR3 write below needs to flush this > >> very > >> - * PCID, even when a full flush was performed above, as we are > >> currently > >> - * accumulating TLB entries again from the old address space. > >> - * NB: Clearing the bit when we don't use PCID is benign (as it is > >> clear > >> - * already in that case), but allows the if() to be more simple. > >> + * If the CR4 write is to turn off PCIDE, we don't need the CR3 write > >> to > >> + * flush anything, as that transition is a full flush itself. > >> */ > >> - old_pcid = cr3_pcid(read_cr3()); > >> - if ( old_pcid == cr3_pcid(cr3) ) > >> - cr3 &= ~X86_CR3_NOFLUSH; > >> - > >> + if ( (old_cr4 & X86_CR4_PCIDE) > (cr4 & X86_CR4_PCIDE) ) > >> + cr3 |= X86_CR3_NOFLUSH; > >> write_cr3(cr3); > >> > >> if ( old_cr4 != cr4 ) > >> write_cr4(cr4); > >> > >> /* > >> - * Make sure no TLB entries related to the old PCID created between > >> - * flushing the TLB and writing the new %cr3 value remain in the TLB. > >> - * > >> - * The write to CR4 just above has performed a wider flush in certain > >> - * cases, which therefore get excluded here. Since that write is > >> - * conditional, note in particular that it won't be skipped if PCIDE > >> - * transitions from 1 to 0. This is because the CR4 write further up > >> will > >> - * have been skipped in this case, as PCIDE and PGE won't both be set > >> at > >> - * the same time. > >> - * > >> - * Note also that PGE is always clear in old_cr4. > >> + * PGE | PCIDE | flush at > >> + * ------+-------+------------------------ > >> + * 0->0 | 0->0 | CR3 write > >> + * 0->0 | 0->1 | n/a (see 1st CR4 write) > >> + * 0->x | 1->0 | CR4 write > >> + * x->1 | x->1 | n/a > >> + * 0->0 | 1->1 | INVPCID > >> + * 0->1 | 0->0 | CR3 and CR4 writes > >> + * 1->0 | 0->0 | CR4 write > >> + * 1->0 | 0->1 | n/a (see 1st CR4 write) > >> + * 1->1 | 0->0 | n/a (see 1st CR4 write) > >> + * 1->x | 1->x | n/a > >> */ > >> - if ( old_pcid != cr3_pcid(cr3) && > > > > You seem to have dropped all the users of cr3_pcid, I guess the > > function is not removed because you plan to use it in other sites? > > > >> - !(cr4 & X86_CR4_PGE) && > >> - (old_cr4 & X86_CR4_PCIDE) <= (cr4 & X86_CR4_PCIDE) ) > >> - invpcid_flush_single_context(old_pcid); > >> + if ( cr4 & X86_CR4_PCIDE ) > >> + invpcid_flush_all_nonglobals(); > > > > Isn't this going to be quite expensive compared to the single PCID > > flushing done before? (ie: invpcid_flush_single_context vs > > invpcid_flush_all_nonglobals) > > ... the invpcid_flush_single_context() gets eliminated altogether > (by doing the main flush _after_ the control register writes). Oh, thanks, I've certainly missed this move, sorry. Reviewed-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |