Re: [Xen-devel] [PATCH] x86/spec-ctrl: Scrub stale segment registers on leaky hardware

On 12/08/2019 09:00, Jan Beulich wrote:
> On 09.08.2019 19:16, Andrew Cooper wrote:
>> --- a/docs/misc/xen-command-line.pandoc
>> +++ b/docs/misc/xen-command-line.pandoc
>> @@ -1914,7 +1914,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
>>  ### spec-ctrl (x86)
>>  > `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb,md-clear}=<bool>,
>>  >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu,
>> ->              l1d-flush,l1tf-barrier}=<bool> ]`
>> +>              l1d-flush,stale-seg-clear,l1tf-barrier}=<bool> ]`
>>  Controls for speculative execution sidechannel mitigations.  By default, Xen
>>  will pick the most appropriate mitigations based on compiled in support,
>> @@ -1986,6 +1986,12 @@ Irrespective of Xen's setting, the feature is virtualised for HVM guests to
>>  use.  By default, Xen will enable this mitigation on hardware believed to be
>>  vulnerable to L1TF.
>> +On all hardware, the `stale-seg-clear=` option can be used to force or prevent
>> +Xen from clearing the microarchitectural segment register copies on context
>> +switch.  By default, Xen will choose to use stale segment clearing on affected
>> +hardware.  The clearing logic is tuned to microarchitectural details of the
>> +affected CPUs.
>> +
>>  On hardware vulnerable to L1TF, the `l1tf-barrier=` option can be used to force
>>  or prevent Xen from protecting evaluations inside the hypervisor with a barrier
>>  instruction to not load potentially secret information into L1 cache.  By
>
> Purely out of curiosity: Is there a reason both insertions happen between
> two pre-existing sub-options, not after all of them?

Easier backportability.  That, and l1tf-barrier is not going to stay in this
form by the time 4.13 gets released.

>
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -1328,6 +1328,29 @@ static void load_segments(struct vcpu *n)
>>      dirty_segment_mask = per_cpu(dirty_segment_mask, cpu);
>>      per_cpu(dirty_segment_mask, cpu) = 0;
>> +    /*
>> +     * CPUs which suffer from stale segment register leakage have two copies
>> +     * of each segment register [Correct at the time of writing - Aug 2019].
>> +     *
>> +     * We must write to both of them to scrub state from the previous vcpu.
>> +     * However, two writes in quick succession stall the pipeline, as only one
>> +     * write per segment can be speculatively outstanding.
>> +     *
>> +     * Split the two sets of writes to each register to maximise the chance
>> +     * that these writes have retired before the second set starts, thus
>> +     * reducing the chance of stalling.
>> +     */
>> +    if ( opt_stale_seg_clear )
>> +    {
>> +        asm volatile ( "mov %[sel], %%ds;"
>> +                       "mov %[sel], %%es;"
>> +                       "mov %[sel], %%fs;"
>> +                       "mov %[sel], %%gs;"
>> +                       :: [sel] "r" (0) );
>> +        /* Force a reload of all segments to be the second scrubbing write. */
>> +        dirty_segment_mask = ~0;
>> +    }
>
> Having the command line option, do we care about people actually using
> it on AMD hardware?

I care that it can be forced on to test.  Beyond that, I expect the overhead
will be large on Atom and AMD, which is why the command line doc was worded a
bit more forcefully than other options, and is hinting at "don't try using
this on non-affected hardware".
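
(For reference, the command line hook itself isn't quoted above.  Assuming it
is wired up in parse_spec_ctrl() following the existing l1d-flush pattern, it
would look something like:

    else if ( (val = parse_boolean("stale-seg-clear", s, ss)) >= 0 )
        opt_stale_seg_clear = val;

which is what makes "spec-ctrl=stale-seg-clear" (force on, e.g. for testing)
and "spec-ctrl=stale-seg-clear=0" (force off) available.)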

> For %ds and %es this would now lead to up to three
> selector register loads each, with the one above being completely
> pointless (due to not clearing the segment bases anyway).  Question of
> course is whether adding yet another conditional (to the added code
> above or to preload_segment()) would be any better than having this
> extra selector register write.

preload_segment() needs to change anyway given Rome's behaviour, but I need
to wait for feedback from AMD as to whether they are happy with my CPUID bit
and erratum proposal.

>
> Furthermore, if we cared, shouldn't SVM code also honor the flag and
> gain extra loads, albeit perhaps with unlikely() used in the if()?

If you observe, svm_ctxt_switch_to() already has this unconditionally.  This
was added in f9caf5ffaba but I'm not convinced it can be correct.  I don't
see equivalent logic in KVM, or any description to this effect in the APM.
It might be an erratum on an early SVM-capable processor.

>
> As to your choice of loading a nul selector: For the VERW change iirc
> we've been told that using a nul selector is a bad choice from
> performance pov.  Are nul selectors being treated better for selector
> register writes?

VERW and segment loads are very different.

VERW is microcoded, unconditionally references memory, and doesn't have a
plausible usecase in the 286+ where you would call it with a NUL selector
(until the MDS side effect came along).  As a result, handling NUL is the
slowpath in microcode.

Loading a segment selector with NUL has a specific meaning, and is the
common case in 64bit code.  It does not reference memory.

In practice, my microperf tests show no difference in net time between using
a real selector and NUL.  The majority of samples come in cycle-identical.
This might mean that NUL is slowpath'd (considering it doesn't have a memory
read to perform), but it probably means that other aspects of segment loading
dominate.
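
(To illustrate the kind of measurement involved - this is a self-contained
userspace sketch rather than the actual harness behind the numbers above, and
the function names are mine.  It reuses %ss as a convenient known-good data
selector, times batches of %ds loads with RDTSCP, and assumes a 64bit build,
where a NUL %ds is benign.)

    #include <stdint.h>
    #include <stdio.h>

    static inline uint64_t rdtscp_ticks(void)
    {
        uint32_t lo, hi, aux;

        asm volatile ( "rdtscp" : "=a" (lo), "=d" (hi), "=c" (aux) );
        return ((uint64_t)hi << 32) | lo;
    }

    static uint64_t time_ds_loads(uint16_t sel, unsigned int iters)
    {
        uint32_t eax = 0, ebx, ecx, edx;
        uint64_t start, end;

        /* CPUID serialises the pipeline before timing starts. */
        asm volatile ( "cpuid" : "+a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx) );

        start = rdtscp_ticks();
        for ( unsigned int i = 0; i < iters; i++ )
            asm volatile ( "mov %0, %%ds" :: "r" (sel) );
        end = rdtscp_ticks();

        return end - start;
    }

    int main(void)
    {
        uint16_t ss;

        /* %ss holds a valid data segment selector we can legitimately reload. */
        asm volatile ( "mov %%ss, %0" : "=r" (ss) );

        printf("NUL selector: %llu cycles\n",
               (unsigned long long)time_ds_loads(0, 1000000));
        printf("%%ss selector: %llu cycles\n",
               (unsigned long long)time_ds_loads(ss, 1000000));

        return 0;
    }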

>
>> @@ -872,11 +873,38 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
>>  static void vmx_ctxt_switch_to(struct vcpu *v)
>>  {
>> +    /*
>> +     * CPUs which suffer from stale segment register leakage have two copies
>> +     * of each segment register [Correct at the time of writing - Aug 2019].
>> +     *
>> +     * We must write to both of them to scrub state from the previous vcpu.
>> +     * However, two writes in quick succession stall the pipeline, as only one
>> +     * write per segment can be speculatively outstanding.
>> +     *
>> +     * Split the two sets of writes to each register to maximise the chance
>> +     * that these writes have retired before the second set starts, thus
>> +     * reducing the chance of stalling.
>> +     */
>> +    if ( opt_stale_seg_clear )
>> +        asm volatile ( "mov %[sel], %%ds;"
>> +                       "mov %[sel], %%es;"
>> +                       "mov %[sel], %%fs;"
>> +                       "mov %[sel], %%gs;"
>> +                       :: [sel] "r" (0) );
>> +
>>      vmx_restore_guest_msrs(v);
>>      vmx_restore_dr(v);
>>      if ( v->domain->arch.hvm.pi_ops.flags & PI_CSW_TO )
>>          vmx_pi_switch_to(v);
>> +
>> +    /* Should be last in the function.  See above. */
>> +    if ( opt_stale_seg_clear )
>> +        asm volatile ( "mov %[sel], %%ds;"
>> +                       "mov %[sel], %%es;"
>> +                       "mov %[sel], %%fs;"
>> +                       "mov %[sel], %%gs;"
>> +                       :: [sel] "r" (0) );
>>  }
>
> Why two instances?  Aren't the selector register loads during VM
> entry sufficient to clear both instances?  (The question is
> rhetorical in a way, because I think I know the answer, but
> neither the comment here nor the patch description provide it.)

This really depends on how much information Intel are willing to make public
on the subject.

I don't think it is reasonable to presume that VMEntry works as a sequence
of architectural instructions (it provably doesn't when it comes to control
registers), which is why I commented the logic in this way.

>
>> @@ -111,6 +114,7 @@ static int __init parse_spec_ctrl(const char *s)
>>          opt_ibpb = false;
>>          opt_ssbd = false;
>>          opt_l1d_flush = 0;
>> +        opt_stale_seg_clear = 0;
>>      }
>
> Is it really correct for this to be affected by both "spec-ctrl=no"
> and "spec-ctrl=no-xen"?  It would seem to me that it would belong
> above the "disable_common" label, as the control also is meant to
> guard against guest->guest leaks.

spec-ctrl=no-xen is ill-defined.  As stated, it says "don't do anything in
Xen except enough to virtualise capabilities for guests".  However, an
argument could be made to say that this is conceptually similar to eager-fpu.

Either way, I think a clarification of what no-xen means is needed first.

>
>> @@ -864,6 +871,83 @@ static __init void mds_calculations(uint64_t caps)
>>      }
>>  }
>> +/* Calculate whether this CPU leaks segment registers between contexts. */
>> +static void __init stale_segment_calculations(void)
>> +{
>> +    /*
>> +     * Assume all unrecognised processors are ok.  This is only known to
>> +     * affect Intel Family 6 processors.
>> +     */
>> +    if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
>> +         boot_cpu_data.x86 != 6 )
>> +        return;
>
> The comment above here and ...
>
>> +    switch ( boot_cpu_data.x86_model )
>> +    {
>> +        /*
>> +         * Core processors since at least Nehalem are vulnerable.
>> +         */
>> +    case 0x1e: /* Nehalem */
>> +    case 0x1f: /* Auburndale / Havendale */
>> +    case 0x1a: /* Nehalem EP */
>> +    case 0x2e: /* Nehalem EX */
>> +    case 0x25: /* Westmere */
>> +    case 0x2c: /* Westmere EP */
>> +    case 0x2f: /* Westmere EX */
>> +    case 0x2a: /* SandyBridge */
>> +    case 0x2d: /* SandyBridge EP/EX */
>> +    case 0x3a: /* IvyBridge */
>> +    case 0x3e: /* IvyBridge EP/EX */
>> +    case 0x3c: /* Haswell */
>> +    case 0x3f: /* Haswell EX/EP */
>> +    case 0x45: /* Haswell D */
>> +    case 0x46: /* Haswell H */
>> +    case 0x3d: /* Broadwell */
>> +    case 0x47: /* Broadwell H */
>> +    case 0x4f: /* Broadwell EP/EX */
>> +    case 0x56: /* Broadwell D */
>> +    case 0x4e: /* Skylake M */
>> +    case 0x55: /* Skylake X */
>> +    case 0x5e: /* Skylake D */
>> +    case 0x66: /* Cannonlake */
>> +    case 0x67: /* Cannonlake? */
>> +    case 0x8e: /* Kabylake M */
>> +    case 0x9e: /* Kabylake D */
>> +        cpu_has_bug_stale_seg = true;
>> +        break;
>> +
>> +        /*
>> +         * Atom processors are not vulnerable.
>> +         */
>> +    case 0x1c: /* Pineview */
>> +    case 0x26: /* Lincroft */
>> +    case 0x27: /* Penwell */
>> +    case 0x35: /* Cloverview */
>> +    case 0x36: /* Cedarview */
>> +    case 0x37: /* Baytrail / Valleyview (Silvermont) */
>> +    case 0x4d: /* Avaton / Rangely (Silvermont) */
>> +    case 0x4c: /* Cherrytrail / Brasswell */
>> +    case 0x4a: /* Merrifield */
>> +    case 0x5a: /* Moorefield */
>> +    case 0x5c: /* Goldmont */
>> +    case 0x5f: /* Denverton */
>> +    case 0x7a: /* Gemini Lake */
>> +        break;
>> +
>> +        /*
>> +         * Knights processors are not vulnerable.
>> +         */
>> +    case 0x57: /* Knights Landing */
>> +    case 0x85: /* Knights Mill */
>> +        break;
>> +
>> +    default:
>> +        printk("Unrecognised CPU model %#x - assuming vulnerable to StaleSeg\n",
>> +               boot_cpu_data.x86_model);
>> +        break;
>
> ... the plain "break" here are not in line with the log message text.
> Did you mean to add "not"?

No.  I intended this path to set cpu_has_bug_stale_seg.
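
In other words, the intention is for the default case to read along the lines
of (sketch of the fix, not what the posted patch does):

    default:
        printk("Unrecognised CPU model %#x - assuming vulnerable to StaleSeg\n",
               boot_cpu_data.x86_model);
        cpu_has_bug_stale_seg = true;
        break;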

~Andrew