|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v5 5/7] xen/x86: disable global pages for domains with XPTI active
On 06/04/18 17:17, Andrew Cooper wrote:
> On 06/04/18 08:52, Juergen Gross wrote:
>> Instead of flushing the TLB from global pages when switching address
>> spaces with XPTI being active just disable global pages via %cr4
>> completely when a domain subject to XPTI is active. This avoids the
>> need for extra TLB flushes as loading %cr3 will remove all TLB
>> entries.
>>
>> In order to avoid states with cr3/cr4 having inconsistent values
>> (e.g. global pages being activated while cr3 already specifies a XPTI
>> address space) move loading of the new cr4 value to write_ptbase()
>> (actually to write_cr3_cr4() called by write_ptbase()).
>>
>> Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
>> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
>
> This is contrary to the performance optimisations taken by Linux,
> Windows and Apple, which borrowing Xen's 64bit PV optimisation of having
> global user pages, because it really is a (small) performance improvement.
>
> Are there performance numbers for this change alone, and/or are later
> changes strictly dependent on this functionality?
Yes and yes.
Performance is much better for XPTI (again my standard compile test):
elapsed time dropped from 112 to 106 seconds, system time from 160 to
139 seconds, user time from 187 to 186 seconds.
Page invalidation with PCID enabled is _much_ easier.
> On the Xen side of things, an argument could probably be made that the
> extra cr4 rewrites due to the L4 shadowing might eat away the
> performance we would otherwise gain, but I'd be hesitant to blindly
> assume that this is the case.
Another problem I wanted to avoid was the global page handling of Xen
private pages: I would have needed to remove all the global bits, either
even for AMD cpus, or do that dynamically.
> A complicating factor is that Intel have said that the performance gains
> from user global pages would be more noticeable on older hardware, due
> to differences in the TLB architecture.
>
>> ---
>> V4:
>> - don't use mmu_cr4_features for setting new cr4 value (Jan Beulich)
>> - use simpler scheme for setting X86_CR4_PGE in
>> pv_guest_cr4_to_real_cr4() (Jan Beulich)
>>
>> V3:
>> - move cr4 loading for all domains from *_ctxt_switch_to() to
>> write_cr3_cr4() called by write_ptbase() (Jan Beulich)
>> - rebase
>> ---
>> xen/arch/x86/domain.c | 5 -----
>> xen/arch/x86/flushtlb.c | 13 ++++++++-----
>> xen/arch/x86/mm.c | 14 +++++++++++---
>> xen/arch/x86/x86_64/entry.S | 10 ----------
>> xen/common/efi/runtime.c | 4 ++--
>> xen/include/asm-x86/domain.h | 3 ++-
>> xen/include/asm-x86/flushtlb.h | 2 +-
>> 7 files changed, 24 insertions(+), 27 deletions(-)
>>
>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>> index 9c229594f4..c2bb70c483 100644
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -1522,17 +1522,12 @@ void paravirt_ctxt_switch_from(struct vcpu *v)
>> void paravirt_ctxt_switch_to(struct vcpu *v)
>> {
>> root_pgentry_t *root_pgt = this_cpu(root_pgt);
>> - unsigned long cr4;
>>
>> if ( root_pgt )
>> root_pgt[root_table_offset(PERDOMAIN_VIRT_START)] =
>> l4e_from_page(v->domain->arch.perdomain_l3_pg,
>> __PAGE_HYPERVISOR_RW);
>>
>> - cr4 = pv_guest_cr4_to_real_cr4(v);
>> - if ( unlikely(cr4 != read_cr4()) )
>> - write_cr4(cr4);
>> -
>> if ( unlikely(v->arch.debugreg[7] & DR7_ACTIVE_MASK) )
>> activate_debugregs(v);
>>
>> diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
>> index f793b70696..5dcd9a2bf6 100644
>> --- a/xen/arch/x86/flushtlb.c
>> +++ b/xen/arch/x86/flushtlb.c
>> @@ -89,20 +89,23 @@ static void do_tlb_flush(void)
>> post_flush(t);
>> }
>>
>> -void write_cr3(unsigned long cr3)
>> +void write_cr3_cr4(unsigned long cr3, unsigned long cr4)
>> {
>> - unsigned long flags, cr4;
>> + unsigned long flags;
>> u32 t;
>>
>> /* This non-reentrant function is sometimes called in interrupt
>> context. */
>> local_irq_save(flags);
>>
>> t = pre_flush();
>> - cr4 = read_cr4();
>>
>> - write_cr4(cr4 & ~X86_CR4_PGE);
>> + if ( read_cr4() & X86_CR4_PGE )
>> + write_cr4(cr4 & ~X86_CR4_PGE);
>> +
>> asm volatile ( "mov %0, %%cr3" : : "r" (cr3) : "memory" );
>> - write_cr4(cr4);
>> +
>> + if ( read_cr4() != cr4 )
>
> read_cr4(), despite being a cached read, isn't free because of the %rsp
> manipulation required to access the variable. I'd keep the locally
> cached cr4, and use "if ( cr4 & X86_CR4_PGE )" here.
I did this on purpose: it might be cr4 is being modified not only
regarding pge. I can nevertheless cache the read cr4 value, of course.
Juergen
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |