[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH for-4.17 v4 2/2] amd: remove VIRT_SC_MSR_HVM synthetic feature
On 15.11.2022 17:21, Andrew Cooper wrote: > On 15/11/2022 13:26, Roger Pau Monne wrote: >> Since the VIRT_SPEC_CTRL.SSBD selection is no longer context switched >> on vm{entry,exit} there's no need to use a synthetic feature bit for >> it anymore. >> >> Remove the bit and instead use a global variable. >> >> No functional change intended. >> >> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx> >> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx> >> Release-acked-by: Henry Wang <Henry.Wang@xxxxxxx> > > This is definitely not appropriate for 4.17, but it's a performance > regression in general, hence my firm and repeated objection to this > style of patch. > > General synthetic bits have existed for several decades longer than > alternatives. It has never ever been a rule, or even a recommendation, > to aggressively purge the non-alternative bits, because it's a provably > bad thing to do. There we are again - you state something as bad without really saying why it is bad. My view is that synthetic bits were wrong to introduce when they don't stand a chance of being used in an alternative. I agree though that there's no strong need for this to make 4.17. It may end up making backports slightly easier, as no such bit existed in 4.16. > You are attempting a micro-optimisation, that won't produce any > improvement at all in centuries, while... > >> diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c >> index a332087604..9e3b9094d3 100644 >> --- a/xen/arch/x86/cpu/amd.c >> +++ b/xen/arch/x86/cpu/amd.c >> @@ -49,6 +49,7 @@ boolean_param("allow_unsafe", opt_allow_unsafe); >> /* Signal whether the ACPI C1E quirk is required. */ >> bool __read_mostly amd_acpi_c1e_quirk; >> bool __ro_after_init amd_legacy_ssbd; >> +bool __ro_after_init amd_virt_spec_ctrl; > > ... actually expending .rodata with something 8 times less efficiently > packed, and ... ... as long as you're talking of just a single CPU. The break-even is at 8 CPUs (8 bits used either way). >> --- a/xen/arch/x86/spec_ctrl.c >> +++ b/xen/arch/x86/spec_ctrl.c >> @@ -514,12 +514,12 @@ static void __init print_details(enum ind_thunk thunk, >> uint64_t caps) >> (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) || >> boot_cpu_has(X86_FEATURE_SC_RSB_HVM) || >> boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) || >> - boot_cpu_has(X86_FEATURE_VIRT_SC_MSR_HVM) || >> + amd_virt_spec_ctrl || > > ... breaking apart a single TEST instruction, which not only adds an > extra conditional merge, but now hits an cold-ish cache line everywhere > it's used. > > Count how many synthetic feature bits it will actually take to change > the per-cpu data size, and realise that, when it will take more than 200 > years at the current rate of accumulation, any believe that this is an > improvement to be had disappears. > > Yes, it is only a micro regression, but you need a far better > justification than "there is a gain of 64 bytes per CPU which will be > non-theoretical in more than 200 years" when traded off vs the actual > 512 bytes, plus extra code bloat bloat, plus reduced locality of data > that this "improvement" genuinely costs today. I don't see Roger stating anything like this. I think we need to settle on at least halfway firm rules on when to use synthetic feature bits and when to use plain global booleans. Without that the tastes of the three of us are going to collide again every once in a while. Jan
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |