[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] x86/PV32: restore PAE-extended-CR3 logic
On 04.04.2023 13:41, Roger Pau Monné wrote: > On Tue, Apr 04, 2023 at 12:31:31PM +0200, Jan Beulich wrote: >> On 04.04.2023 12:12, Roger Pau Monné wrote: >>> On Wed, Feb 15, 2023 at 03:54:11PM +0100, Jan Beulich wrote: >>>> While the PAE-extended-CR3 VM assist is a 32-bit only concept, it still >>>> applies to guests also when run on a 64-bit hypervisor: The "extended >>>> CR3" format has to be used there as well, to fit the address in the only >>>> 32-bit wide register there. As a result it was a mistake that the check >>>> was never enabled for that case, and was then mistakenly deleted in the >>>> course of removal of 32-bit-Xen code (218adf199e68 ["x86: We can assume >>>> CONFIG_PAGING_LEVELS==4"]). >>>> >>>> Similarly during Dom0 construction kernel awareness needs to be taken >>>> into account, and respective code was again mistakenly never enabled for >>>> 32-bit Dom0 when running on 64-bit Xen (and thus wrongly deleted by >>>> 5d1181a5ea5e ["xen: Remove x86_32 build target"]). >>>> >>>> At the same time restrict enabling of the assist for Dom0 to just the >>>> 32-bit case. Furthermore there's no need for an atomic update there. >>>> >>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> >>>> --- >>>> I was uncertain whether to add a check to the CR3 guest read path, >>>> raising e.g. #GP(0) when the value read wouldn't fit but also may not >>>> be converted to "extended" format (overflow is possible there in >>>> principle because of the control tools "slack" in promote_l3_table()). >>>> >>>> In that context I was puzzled to find no check on the CR3 guest write >>>> path even in 4.2: A guest (bogusly) setting the PCD or PWT bits (or any >>>> of the low reserved ones) could observe anomalous behavior rather than >>>> plain failure. >>>> >>>> As to a Fixes: tag - it's pretty unclear which of the many original >>>> 32-on-64 changes to blame. I don't think the two cited commits should >>>> be referenced there, as they didn't break anything that wasn't already >>>> broken. >>>> >>>> --- a/xen/arch/x86/mm.c >>>> +++ b/xen/arch/x86/mm.c >>>> @@ -1520,6 +1520,23 @@ static int promote_l3_table(struct page_ >>>> unsigned int partial_flags = page->partial_flags; >>>> l3_pgentry_t l3e = l3e_empty(); >>>> >>>> + /* >>>> + * PAE pgdirs above 4GB are unacceptable if a 32-bit guest does not >>>> + * understand the weird 'extended cr3' format for dealing with >>>> high-order >>>> + * address bits. We cut some slack for control tools (before vcpu0 is >>>> + * initialised). >>> >>> Don't we then need some check in the vCPU init path to assure that the >>> cr3 is < 32bits if we allow those to initially be set? >>> >>> Or will the initialization unconditionally overwrite any previous cr3 >>> value? >> >> That's not the way I understand this "cut some slack". Instead I read it >> to be meant to cover for the VM-assist bit not being set, yet. Beyond >> that it is assumed to be tool stack's responsibility to constrain >> addresses suitably. If it doesn't, it'll simply break the guest. (There >> is some guessing on my part involved here, as the original introduction >> of that code didn't further explain things.) > > If it's just the guest that's broken I would think it's fine. As long > as such mismatch doesn't cause issues in the hypervisor internal state. > > Did you see a toolstack setting such entries before pae_extended_cr3 > is set? To be honest - I didn't look. As said in the longer reply to Andrew, I think it is more logical this way (the page table root already being validated as an L3 table when vCPU 0 is inititalized, which includes setting its CR3). Hence even if right now the order was the other way around (which I doubt it is), I wouldn't want to make impossible to restore the original ordering again. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |