[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] x86/mm: Split read_cr3() into read_cr3_pa() and __read_cr3()



On Tue, Jun 13, 2017 at 2:26 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> On Mon, Jun 12, 2017 at 10:26:14AM -0700, Andy Lutomirski wrote:
>> The kernel has several code paths that read CR3.  Most of them assume that
>> CR3 contains the PGD's physical address, whereas some of them awkwardly
>> use PHYSICAL_PAGE_MASK to mask off low bits.
>>
>> Add explicit mask macros for CR3 and convert all of the CR3 readers.
>> This will keep them from breaking when PCID is enabled.
>
> ...
>
>> +/*
>> + * CR3's layout varies depending on several things.
>> + *
>> + * If CR4.PCIDE is set (64-bit only), then CR3[11:0] is the address space 
>> ID.
>> + * If PAE is enabled, then CR3[11:5] is part of the PDPT address
>> + * (i.e. it's 32-byte aligned, not page-aligned) and CR3[4:0] is ignored.
>> + * Otherwise (non-PAE, non-PCID), CR3[3] is PWT, CR3[4] is PCD, and
>> + * CR3[2:0] and CR3[11:5] are ignored.
>> + *
>> + * In all cases, Linux puts zeros in the low ignored bits and in PWT and 
>> PCD.
>> + *
>> + * CR3[63] is always read as zero.  If CR4.PCIDE is set, then CR3[63] may be
>> + * written as 1 to prevent the write to CR3 from flushing the TLB.
>> + *
>> + * On systems with SME, one bit (in a variable position!) is stolen to 
>> indicate
>> + * that the top-level paging structure is encrypted.
>> + *
>> + * All of the remaining bits indicate the physical address of the top-level
>> + * paging structure.
>> + *
>> + * CR3_ADDR_MASK is the mask used by read_cr3_pa().
>> + */
>> +#ifdef CONFIG_X86_64
>> +/* Mask off the address space ID bits. */
>> +#define CR3_ADDR_MASK 0x7FFFFFFFFFFFF000ull
>> +#define CR3_PCID_MASK 0xFFFull
>> +#else
>> +/*
>> + * CR3_ADDR_MASK needs at least bits 31:5 set on PAE systems, and we save
>> + * a tiny bit of code size by setting all the bits.
>> + */
>> +#define CR3_ADDR_MASK 0xFFFFFFFFull
>> +#define CR3_PCID_MASK 0ull
>
> All those can do GENMASK_ULL for better readability.
>

Hmm.  I'll look at that.

>> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
>> index 3586996fc50d..bc0a849589bb 100644
>> --- a/arch/x86/kernel/paravirt.c
>> +++ b/arch/x86/kernel/paravirt.c
>> @@ -391,7 +391,7 @@ struct pv_mmu_ops pv_mmu_ops __ro_after_init = {
>>
>>       .read_cr2 = native_read_cr2,
>>       .write_cr2 = native_write_cr2,
>> -     .read_cr3 = native_read_cr3,
>> +     .read_cr3 = __native_read_cr3,
>>       .write_cr3 = native_write_cr3,
>>
>>       .flush_tlb_user = native_flush_tlb,
>> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
>> index ffeae818aa7a..c6d6dc5f8bb2 100644
>> --- a/arch/x86/kernel/process_32.c
>> +++ b/arch/x86/kernel/process_32.c
>> @@ -92,7 +92,7 @@ void __show_regs(struct pt_regs *regs, int all)
>>
>>       cr0 = read_cr0();
>>       cr2 = read_cr2();
>> -     cr3 = read_cr3();
>> +     cr3 = __read_cr3();
>>       cr4 = __read_cr4();
>
> This is a good example for my confusion. So we have __read_cr4() - with
> the underscores - but not read_cr4().
>
> Now we get __read_cr3 *and* read_cr3_pa(). So __read_cr3() could just as
> well lose the "__", right?
>
> Or are the PCID series bringing a read_cr3() without "__" too?

The intent was twofold:

1. Make sure that every read_cr3() instance got converted.  I didn't
want a mid-air collision with someone else's patch in which it would
appear to apply and compile but the result would randomly fail on PCID
systems.

2. Make users realize that CR3 ain't what it used to be.  __read_cr3()
means "return this complicated register value -- I know what I'm
doing" and read_cr3_pa() means "give me the PA".

Maybe we could rename __read_cr3() to read_cr3_raw()?  If we really
wanted lots of clarity, __read_cr4() could become read_cr4_noshadow(),
I suppose.

What do you think?  My general preference is to clean this up after
the rest of the big patchsets (SME and PCID) land.

>
> Oh, and to confuse me even more, there's __native_read_cr3() which is
> *finally* accessing %cr3 :-) But there's native_write_cr3() without
> underscores.
>
> So can we make those names a bit more balanced please?

write_cr3() was less widespread, so I worried about it less.

--Andy

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.