Re: Intended behavior/usage of SSBD setting
On 20/10/2022 12:01, Roger Pau Monné wrote:
> Hello,
>
> As part of some follow-up improvements to my VIRT_SPEC_CTRL series we
> have been discussing what the usage of SSBD should be for the
> hypervisor itself.  There's currently a `spec-ctrl=ssbd` option [0]
> that has an out-of-date description, as SSBD is now always offered to
> guests on AMD hardware, either using SPEC_CTRL or VIRT_SPEC_CTRL.
>
> It has been pointed out by Andrew that toggling SSBD on AMD using
> VIRT_SPEC_CTRL or the non-architectural way (MSR_AMD64_LS_CFG) can
> have a high impact on performance, and hence switching it on every
> guest <-> hypervisor context switch likely carries a very high
> performance penalty.
>
> It has been suggested that it could be more appropriate to run Xen
> with the guest's SSBD selection on those systems; however, that
> clashes with the current intent of the `spec-ctrl=ssbd` option.
>
> I hope I have captured the expressed opinions correctly in the text
> above.
>
> I see two ways to solve this:
>
>  * Keep the current logic for switching SSBD on guest <-> hypervisor
>    context switch, but only use it if `spec-ctrl=ssbd` is set on the
>    command line.
>
>  * Remove the logic for switching SSBD on guest <-> hypervisor
>    context switch, ignore `spec-ctrl=ssbd` on those systems, and run
>    hypervisor code with the guest's selection of SSBD.
>
> This raises the question of whether there is a use case for always
> running hypervisor code with SSBD enabled, or whether that is no
> longer relevant now that we always offer guests a way to toggle the
> setting when required.
>
> I would like to settle on a way forward, so we can get this fixed
> before 4.17.
>
> Thanks, Roger.
>
> [0] https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html#spec-ctrl-x86

There are many issues at play here.  Not least that virt spec ctrl is
technically a leftover task that ought to force a re-issue of XSA-263.

Accessing MSRs (even reading them) is very expensive, typically >1k
cycles.  The core CFG registers are more expensive than most, because
they're intended to be configured once after reset and then left alone.
Throughout the speculation work, we've seen crippling performance hits
from accessing MSRs in fastpaths.  The fact that we're forced to use
MSRs in fastpaths even on new CPUs with built-in (rather than
retrofitted) speculation support is an area of concern still being
worked on with the CPU vendors.

Case in point: we found for XSA-398 that toggling AMD's
MSR_SPEC_CTRL.IBRS on the PV entry path was so bad that setting it
unilaterally behind the back of PV guests was the faster option.
(Another todo is to stop doing this on Intel eIBRS systems, which will
recover a decent chunk of performance.)

SSBD mitigations are (rightly or wrongly) off by default for
performance reasons.  AMD are less affected than Intel, for
microarchitectural reasons which are discussed in the relevant
whitepapers, and which are expected to remain true for future CPUs.

When Xen doesn't care about protecting itself against SSB by default, I
guarantee you that it will be faster to omit the MSR accesses and run
with the guest kernel's choice than to clear the SSBD protection.  We
simply don't spend long enough in the hypervisor for the hit to memory
accesses to dwarf the hit from MSR accesses taken on entry/exit.

The reason we put in spec-ctrl=ssbd was as a stopgap: at the time, we
didn't know how bad SSB really was, and it was decided that the admin
should have a big hammer to use if they really needed it.
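To make the cost concrete, here is a rough sketch of the "toggle on
every transition" model.  This is illustrative only, not Xen's actual
entry/exit code: the MSR index and SSBD bit are the architectural
values, but the helper names and structure are hypothetical.

    /*
     * Illustrative sketch only -- not Xen's real entry/exit code.  The
     * MSR index and SSBD bit are architectural; everything else
     * (names, structure) is hypothetical.
     */
    #include <stdint.h>

    #define MSR_SPEC_CTRL   0x00000048
    #define SPEC_CTRL_SSBD  (1u << 2)

    static inline void wrmsr(uint32_t msr, uint64_t val)
    {
        asm volatile ( "wrmsr" :: "c" (msr),
                       "a" ((uint32_t)val), "d" ((uint32_t)(val >> 32)) );
    }

    /* On every VMexit: force SSBD on while running hypervisor code. */
    static void vmexit_spec_ctrl(uint64_t xen_spec_ctrl)
    {
        wrmsr(MSR_SPEC_CTRL, xen_spec_ctrl | SPEC_CTRL_SSBD); /* >1k cycles */
    }

    /* On every VMentry: restore the guest's choice. */
    static void vmentry_spec_ctrl(uint64_t guest_spec_ctrl)
    {
        wrmsr(MSR_SPEC_CTRL, guest_spec_ctrl);                /* >1k cycles */
    }

That's two MSR writes per guest <-> hypervisor round trip, each north
of 1k cycles, which is exactly the kind of fastpath hit described
above.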
When Xen does care about protecting itself, the above reasoning bites
back hard.  Because we spend (or should be spending!) >99% of the time
in the guest, the hit to memory accesses is far more likely to dwarf
the hit from the MSR accesses, but now the dominating factor for
performance is the vmexit rate.  The problem is that a completely
compute-bound workload has very few exits, while an IO-bound workload
has plenty of them.

I honestly don't know whether it will be more efficient to leave SSBD
active unilaterally (whether or not we hide this, e.g. by synthesizing
SSB_NO), or to let the guest run with its kernel's choice.  I suspect
the answer is different for different workloads.

But one other factor helps us.  Given that the default is fast (rather
than secure), anyone opting in to spec-ctrl=ssbd is saying "I care more
about security than performance", at which point we can simplify what
we do, because we don't need to cater to everyone.

As a slight tangent, there is a cost to having too many options, which
must not be ignored.  Xen's speculation safety is far too complicated
already and needs to get simpler; this has a material impact on how
easy it is to follow, and how easy it is to make changes.  It is the
way it is because we've had 6 years of drip-feeding one problem after
another, and haven't had the time to take a step back and design
something more sensible with those 6 years of knowledge/learnings as a
basis.  There are definitely things I would have done differently if,
6 years ago, I'd known what I know now, and part of the reason the
recent speculation security work has taken so much effort is that it
has involved reworking the effort which came before, to deadlines
which never leave enough time to plan properly.

So, first question: do we care about having an "SSBD active while in
Xen" mode?  Probably yes, because a) we still don't have a working
solution for PV guests on AMD, and b) who knows whether there's
something far worse lurking in the future.  Sod's law says that if we
decide no here, it will turn out to be critical for some future issue.
But as it's off by default and no one has made any noise about having
it on, we ought to prioritise simplicity.

Given that off is the default, but we know that kernels do offer SSBD
to userspace and it does get used by certain processes, we need to
prioritise performance.  And here, this is net system performance, not
"ensure it's off whenever it can be".  Having Xen run with the guest
kernel's choice of value will result in much better overall
performance than trying to modify the setting in the VMentry/exit
path.
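For contrast with the sketch above, "run with the guest kernel's
choice" means the entry/exit paths touch nothing, and the MSR is only
written on a vCPU context switch, and only when the value actually
differs.  Again, the names here are hypothetical, not Xen's real
context switch code.

    /*
     * Illustrative sketch only -- hypothetical names, not Xen's real
     * context switch code.
     */
    #include <stdint.h>

    #define MSR_SPEC_CTRL   0x00000048

    static inline void wrmsr(uint32_t msr, uint64_t val)
    {
        asm volatile ( "wrmsr" :: "c" (msr),
                       "a" ((uint32_t)val), "d" ((uint32_t)(val >> 32)) );
    }

    /* Last value written on this CPU; Xen simply runs with it. */
    static uint64_t this_cpu_spec_ctrl;

    /* VMexit/VMentry paths: no MSR accesses at all. */

    /* Only on a vCPU context switch, and only if the value differs. */
    static void ctxt_switch_spec_ctrl(uint64_t next_spec_ctrl)
    {
        if ( next_spec_ctrl != this_cpu_spec_ctrl )
        {
            wrmsr(MSR_SPEC_CTRL, next_spec_ctrl);
            this_cpu_spec_ctrl = next_spec_ctrl;
        }
    }

Here the MSR writes scale with the context switch rate rather than the
vmexit rate, which is where the net system performance argument points.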
Sorry that this is a very long and somewhat open-ended answer, but it
is genuinely the level of complexity I grapple with on every security
issue in this area.

~Andrew