[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH 03/22] x86/dom0: only disable SMAP for the PV dom0 build
On Tue, Jul 30, 2024 at 12:06:45PM +0100, Andrew Cooper wrote: > On 30/07/2024 11:55 am, Roger Pau Monné wrote: > > On Mon, Jul 29, 2024 at 06:51:31PM +0100, Andrew Cooper wrote: > >> On 29/07/2024 5:18 pm, Roger Pau Monné wrote: > >>> On Mon, Jul 29, 2024 at 04:52:22PM +0100, Andrew Cooper wrote: > >>>> On 29/07/2024 12:53 pm, Jan Beulich wrote: > >>>>> On 26.07.2024 17:21, Roger Pau Monne wrote: > >>>>>> The PVH dom0 builder doesn't switch page tables and has no need to run > >>>>>> with > >>>>>> SMAP disabled. > >>>>>> > >>>>>> Put the SMAP disabling close to the code region where it's necessary, > >>>>>> as it > >>>>>> then becomes obvious why switch_cr3_cr4() is required instead of > >>>>>> write_ptbase(). > >>>>>> > >>>>>> Note removing SMAP from cr4_pv32_mask is not required, as we never > >>>>>> jump into > >>>>>> guest context, and hence updating the value of cr4_pv32_mask is not > >>>>>> relevant. > >>>>> I'm okay-ish with that being dropped, but iirc the goal was to keep the > >>>>> variable in sync with CPU state. > >>>> Removing SMAP from cr4_pv32_mask is necessary. > >>>> > >>>> Otherwise IST vectors will reactive SMAP behind the back of the > >>>> dombuilder. > >>>> > >>>> This will probably only manifest in practice in a CONFIG_PV32=y build, > >>> Sorry, I'm possibly missing some context here. When running the dom0 > >>> builder we switch to the guest page-tables, but not to the guest vCPU, > >>> (iow: current == idle) and hence the context is always the Xen > >>> context. > >> Correct. > >> > >>> Why would the return path of the IST use cr4_pv32_mask when the > >>> context in which the IST happened was the Xen one, and the current > >>> vCPU is the idle one (a 64bit PV guest from Xen's PoV). > >>> > >>> My understanding is that cr4_pv32_mask should only be used when the > >>> current context is running a 32bit PV vCPU. > >> This logic is evil to follow, because you need to look at both > >> cr4_pv32_mask and XEN_CR4_PV32_BITS to see both halves of it. > >> > >> Notice how cr4_pv32_restore() only ever OR's cr4_pv32_mask into %cr4? > >> > >> CR4_PV32_RESTORE is called from every entry path which *might* have come > >> from a 32bit PV guest, and it always results in Xen having SMEP/SMAP > >> active (as applicable). This includes NMI. > >> > >> The change is only undone in compat_restore_all_guest(), where > >> XEN_CR4_PV32_BITS is cleared from %cr4 iff returning to Ring1/2. This > >> is logic cunningly disguised in the use of the Parity flag. > >> > >> > >> Because the NMI handler does reactive SMEP/SMAP (based on the value in > >> cr4_pv32_mask), and returning to Xen does not pass through > >> compat_restore_all_guest(), taking an NMI in the middle of of the > >> dombuilder will reactive SMAP behind your back. > > After further conversations with Andrew we believe the current > > disabling of X86_CR4_SMAP in %cr4 during dom0 build is not safe. > > > > Regardless of whether cr4_pv32_mask is properly adjusted return to Xen > > context from interrupt would be done with SMAP enabled if > > X86_FEATURE_XEN_SMAP is set. > > Sorry - that's not what I intended to convey. > > The logic prior to this patch is safe. SMAP is cleared from > cr4_pv32_mask before clearing CR4.SMAP, and reinstated in the opposite > order. Therefore, an NMI hitting the region won't reactivate SMAP > because it's not (instantaniously) set in cr4_pv32_mask. My bad, I was getting confused with the `clac` instructions in the event entry points. I think with my proposed change we would also hit the BUG in cr4_pv32_restore on debug builds if Xen got an interrupt in the middle of the SMAP disabled dom0 build region if SMEP is enabled, as cr4_pv32_mask would differ from the current %cr4 value (not all bits intended would be actually set). > Arguably it wants some barrier()'s for clarity, and an explanation of > why this works. > > The problem your patch has is that by not clearing SMAP from > cr4_pv32_mask, it becomes unsafe iff an NMI/#MC/#DB hits the region. Won't such issue also affect common_interrupt, and hence not be limited to NMI/#MC/#DB only? Regards, Roger.
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |