
Re: [PATCH] x86/PV32: restore PAE-extended-CR3 logic


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 5 Apr 2023 10:59:30 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Wed, 05 Apr 2023 08:59:54 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, Apr 05, 2023 at 10:11:30AM +0200, Jan Beulich wrote:
> On 04.04.2023 18:38, Roger Pau Monné wrote:
> > On Tue, Apr 04, 2023 at 05:57:11PM +0200, Roger Pau Monné wrote:
> >> On Tue, Apr 04, 2023 at 04:24:16PM +0200, Jan Beulich wrote:
> >>> On 04.04.2023 13:41, Roger Pau Monné wrote:
> >>>> On Tue, Apr 04, 2023 at 12:31:31PM +0200, Jan Beulich wrote:
> >>>>> On 04.04.2023 12:12, Roger Pau Monné wrote:
> >>>>>> On Wed, Feb 15, 2023 at 03:54:11PM +0100, Jan Beulich wrote:
> >>>>>>> While the PAE-extended-CR3 VM assist is a 32-bit only concept, it still
> >>>>>>> applies to guests also when run on a 64-bit hypervisor: The "extended
> >>>>>>> CR3" format has to be used there as well, to fit the address in the only
> >>>>>>> 32-bit wide register there. As a result it was a mistake that the check
> >>>>>>> was never enabled for that case, and was then mistakenly deleted in the
> >>>>>>> course of removal of 32-bit-Xen code (218adf199e68 ["x86: We can assume
> >>>>>>> CONFIG_PAGING_LEVELS==4"]).
> >>>>>>>
> >>>>>>> Similarly during Dom0 construction kernel awareness needs to be taken
> >>>>>>> into account, and respective code was again mistakenly never enabled for
> >>>>>>> 32-bit Dom0 when running on 64-bit Xen (and thus wrongly deleted by
> >>>>>>> 5d1181a5ea5e ["xen: Remove x86_32 build target"]).
> >>>>>>>
> >>>>>>> At the same time restrict enabling of the assist for Dom0 to just the
> >>>>>>> 32-bit case. Furthermore there's no need for an atomic update there.
> >>>>>>>
> >>>>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> >>>>>>> ---
> >>>>>>> I was uncertain whether to add a check to the CR3 guest read path,
> >>>>>>> raising e.g. #GP(0) when the value read wouldn't fit but also may not
> >>>>>>> be converted to "extended" format (overflow is possible there in
> >>>>>>> principle because of the control tools "slack" in promote_l3_table()).
> >>>>>>>
> >>>>>>> In that context I was puzzled to find no check on the CR3 guest write
> >>>>>>> path even in 4.2: A guest (bogusly) setting the PCD or PWT bits (or any
> >>>>>>> of the low reserved ones) could observe anomalous behavior rather than
> >>>>>>> plain failure.
> >>>>>>>
> >>>>>>> As to a Fixes: tag - it's pretty unclear which of the many original
> >>>>>>> 32-on-64 changes to blame. I don't think the two cited commits should
> >>>>>>> be referenced there, as they didn't break anything that wasn't already
> >>>>>>> broken.
> >>>>>>>
> >>>>>>> --- a/xen/arch/x86/mm.c
> >>>>>>> +++ b/xen/arch/x86/mm.c
> >>>>>>> @@ -1520,6 +1520,23 @@ static int promote_l3_table(struct page_
> >>>>>>>      unsigned int   partial_flags = page->partial_flags;
> >>>>>>>      l3_pgentry_t   l3e = l3e_empty();
> >>>>>>>  
> >>>>>>> +    /*
> >>>>>>> +     * PAE pgdirs above 4GB are unacceptable if a 32-bit guest does not
> >>>>>>> +     * understand the weird 'extended cr3' format for dealing with
> >>>>>>> +     * high-order address bits. We cut some slack for control tools
> >>>>>>> +     * (before vcpu0 is initialised).
> >>>>>>
> >>>>>> Don't we then need some check in the vCPU init path to ensure that
> >>>>>> the cr3 fits in 32 bits if we allow those to initially be set?
> >>>>>>
> >>>>>> Or will the initialization unconditionally overwrite any previous cr3
> >>>>>> value?
> >>>>>
> >>>>> That's not the way I understand this "cut some slack". Instead I read it
> >>>>> to be meant to cover for the VM-assist bit not being set, yet. Beyond
> >>>>> that it is assumed to be tool stack's responsibility to constrain
> >>>>> addresses suitably. If it doesn't, it'll simply break the guest. (There
> >>>>> is some guessing on my part involved here, as the original introduction
> >>>>> of that code didn't further explain things.)
> >>>>
> >>>> If it's just the guest that's broken I would think it's fine, as long
> >>>> as such a mismatch doesn't cause issues in the hypervisor's internal
> >>>> state.
> >>>>
> >>>> Did you see a toolstack setting such entries before pae_extended_cr3
> >>>> is set?
> >>>
> >>> To be honest - I didn't look. As said in the longer reply to Andrew, I
> >>> think it is more logical this way (the page table root already being
> >>> validated as an L3 table when vCPU 0 is initialised, which includes
> >>> setting its CR3). Hence even if right now the order was the other way
> >>> around (which I doubt it is), I wouldn't want to make it impossible to
> >>> restore the original ordering again.
> >>
> >> IMO it would be better if we could already report an error at domain
> >> creation time if the toolstack is attempting to create a domain that
> >> the hypervisor knows is not going to work properly, rather than
> >> allowing it and the guest failing in maybe non-obvious ways.
> >>
> >> It seems to me however that we would need to fix xc_dom_boot_image()
> >> in order to setup the vCPU before creating the initial page-tables.
> >> (->setup_pgtables() hook being called before ->vcpu() hook)
> 
> This might be a possibility, yes, but it's (imo severe) scope creep
> in the context here. All I'm after is to restore code which was
> deleted in error (and which was, when it was still there, not
> properly put in use in all cases where it would have been needed).

I realize the check was wrongly removed in 218adf199e68, so I guess
it's fine to restore it, although it would be better if we could remove
the weirdness of setting up page tables before the hypervisor has
full information about the domain's capabilities.

> >> So I don't think this is strictly worse than what we have, but it
> >> would also be nice to get things sorted out so that the toolstack's
> >> ability to shoot itself in the foot is limited.
> > 
> > Maybe I'm confused after all day, but isn't the hypercall used by the
> > toolstack to set CR3 the same one used to set the vm_assist bits?
> > (XEN_DOMCTL_setvcpucontext)
> > 
> > At which point we just need to make sure d->vm_assist gets set before
> > attempting to load the new CR3 (seems that way from the quick look
> > I've taken at arch_set_info_guest()).
> 
> Right, it is this way already. But that's not the point.
> MMUEXT_PIN_L3_TABLE may (will?) come ahead of this.

Oh, I see.

We should likely move the setting of vm_assist to the domain create
hypercall, instead of doing it at vCPU initialization.

Thanks, Roger.
