Xen project Mailing List

Re: [PATCH v4 4/4] x86/PV: issue branch prediction barrier when switching 64-bit guest to kernel mode

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Tue, 19 Dec 2023 16:11:09 +0100

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>

Delivery-date: Tue, 19 Dec 2023 15:11:29 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Dec 19, 2023 at 03:06:50PM +0100, Jan Beulich wrote: > On 19.12.2023 12:48, Roger Pau Monné wrote: > > On Tue, Dec 19, 2023 at 10:56:16AM +0100, Jan Beulich wrote: > >> On 18.12.2023 18:24, Roger Pau Monné wrote: > >>> On Tue, Feb 14, 2023 at 05:12:08PM +0100, Jan Beulich wrote: > >>>> --- a/xen/arch/x86/pv/domain.c > >>>> +++ b/xen/arch/x86/pv/domain.c > >>>> @@ -455,6 +455,7 @@ static void _toggle_guest_pt(struct vcpu > >>>> void toggle_guest_mode(struct vcpu *v) > >>>> { > >>>> const struct domain *d = v->domain; > >>>> + struct cpu_info *cpu_info = get_cpu_info(); > >>>> unsigned long gs_base; > >>>> > >>>> ASSERT(!is_pv_32bit_vcpu(v)); > >>>> @@ -467,15 +468,21 @@ void toggle_guest_mode(struct vcpu *v) > >>>> if ( v->arch.flags & TF_kernel_mode ) > >>>> v->arch.pv.gs_base_kernel = gs_base; > >>>> else > >>>> + { > >>>> v->arch.pv.gs_base_user = gs_base; > >>>> + > >>>> + if ( opt_ibpb_mode_switch && > >>>> + !(d->arch.spec_ctrl_flags & SCF_entry_ibpb) && > >>>> + !VM_ASSIST(d, mode_switch_no_ibpb) ) > >>>> + cpu_info->spec_ctrl_flags |= SCF_new_pred_ctxt; > >>> > >>> Likewise similar to the remarks I've made before, if doing an IBPB on > >>> entry is enough to cover for the case here, it must also be fine to > >>> issue the IBPB right here, instead of deferring to return to guest > >>> context? > >>> > >>> The only concern would be (as you mentioned before) to avoid clearing > >>> valid Xen predictions, but I would rather see some figures about what > >>> effect the delaying to return to guest has vs issuing it right here. > >> > >> Part of the reason (aiui) to do things on the exit path was to > >> consolidate the context switch induced one and the user->kernel switch > >> one into the same place and mechanism. > > > > Isn't it kind of a very specific case that we end up doing a > > user->kernel switch as part of a context switch? IOW: would require > > the vCPU to be scheduled out at that very specific point. > > No, there's no user->kernel switch at the same time as context switch. > What I was trying to explain is that with the actual IBPB being issued > on exit to guest, both the context switch path and the user->kernel > mode switch path set the same indicator, for the exit path to consume. Deferring to exit to guest path could be OK, but unless strictly needed, which I don't think it's the case, I would request for IBPB to be executed in C context rather than assembly one. > >>>> + * > >>>> + * By default (on affected and capable hardware) as a safety measure > >>>> Xen, > >>>> + * to cover for the fact that guest-kernel and guest-user modes are both > >>>> + * running in ring 3 (and hence share prediction context), would issue a > >>>> + * barrier for user->kernel mode switches of PV guests. > >>>> + */ > >>>> +#define VMASST_TYPE_mode_switch_no_ibpb 33 > >>> > >>> Would it be possible to define the assist as > >>> VMASST_TYPE_mode_switch_ibpb and have it on when enabled? So that the > >>> guest would disable it if unneeded? IMO negated options are in > >>> general harder to understand. > >> > >> Negative options aren't nice, yes, but VM assists start out as all > >> clear. > > > > Are you sure? I see VMASST_TYPE_pae_extended_cr3 getting set in > > dom0_construct_pv() and that makes me wonder whether other bits > > couldn't start set also. > > > > Maybe there's some restriction I'm missing, but I don't see any > > wording in the description of the interface that states that all > > assists are supposed to start disabled. > > Well, that setting of pae_extended_cr3 is in response to the kernel's > notes section having a respective indicator. So we still only set the > bit in response to what the kernel's asking us to do, just that here > we carry out the request ahead of launching the kernel. > > Also consider what would happen during migration if there was a > default-on assist: At the destination we can't know whether the > source simply didn't know of the bit, or whether the guest elected to > clear it. Hm, I see, so I was indeed missing that aspect. VM assist is passed as a plain bitmap, and there's no signal on which assists the VM had available on the source side if not enabled. Thanks, Roger.

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.