Re: [PATCH v5 1/7] x86emul: support LKGS
On 04.09.2024 16:24, Andrew Cooper wrote:
> On 04/09/2024 1:28 pm, Jan Beulich wrote:
>> ---
>> Instead of ->read_segment() we could of course also use ->read_msr() to
>> fetch the original GS base. I don't think I can see a clear advantage of
>> either approach; the way it's done it matches how we handle SWAPGS.
>
> It turns out this is littered with broken corners. See below.
I'm afraid it hasn't become clear to me which of your further comments
are the "broken corners".
>> --- a/tools/tests/x86_emulator/test_x86_emulator.c
>> +++ b/tools/tests/x86_emulator/test_x86_emulator.c
>> @@ -693,6 +719,20 @@ static int read_msr(
>> *val = ctxt->addr_size > 32 ? 0x500 /* LME|LMA */ : 0;
>> return X86EMUL_OKAY;
>>
>> +#ifdef __x86_64__
>> + case 0xc0000101: /* GS_BASE */
>
> It's only just occurred to me, but given x86-defns.h, isn't msr-index.h
> suitably usable too ?
We are doing so already. Just not in this function. And since there
were hex numbers with comments here, I (blindly) added more. I'll submit
a cleanup patch to change the pre-existing ones, and I've already
switched over this and further patches to use the named constants
instead.
>> @@ -1335,6 +1400,41 @@ int main(int argc, char **argv)
>> printf("%u bytes read - ", bytes_read);
>> goto fail;
>> }
>> + printf("okay\n");
>> +
>> + emulops.write_segment = write_segment;
>> + emulops.write_msr = write_msr;
>> +
>> + printf("%-40s", "Testing swapgs...");
>> + instr[0] = 0x0f; instr[1] = 0x01; instr[2] = 0xf8;
>> + regs.eip = (unsigned long)&instr[0];
>> + gs_base = 0xffffeeeecccc8888UL;
>> + gs_base_shadow = 0x0000111122224444UL;
>> + rc = x86_emulate(&ctxt, &emulops);
>> + if ( (rc != X86EMUL_OKAY) ||
>> + (regs.eip != (unsigned long)&instr[3]) ||
>> + (gs_base != 0x0000111122224444UL) ||
>> + (gs_base_shadow != 0xffffeeeecccc8888UL) )
>> + goto fail;
>> + printf("okay\n");
>> +
>> + printf("%-40s", "Testing lkgs 2(%rdx)...");
>> + instr[0] = 0xf2; instr[1] = 0x0f; instr[2] = 0x00; instr[3] = 0x72; instr[4] = 0x02;
>> + regs.eip = (unsigned long)&instr[0];
>> + regs.edx = (unsigned long)res;
>> + res[0] = 0x00004444;
>> + res[1] = 0x8888cccc;
>> + i = cp.extd.nscb; cp.extd.nscb = true; /* for AMD */
>> + rc = x86_emulate(&ctxt, &emulops);
>> + if ( (rc != X86EMUL_OKAY) ||
>> + (regs.eip != (unsigned long)&instr[5]) ||
>> + (gs_base != 0x0000111122224444UL) ||
>> + gs_base_shadow )
>> + goto fail;
>> +
>> + cp.extd.nscb = i;
>
> I think I acked the patches to rename this?
>
> I'd suggest putting those in now, rather than creating more (re)work later.
That was sitting on top, and I was kind of hoping that I could avoid the
re-basing ahead. But I've meanwhile done so, including the committing of
the result, as you've probably seen.
>> --- a/xen/arch/x86/x86_emulate/decode.c
>> +++ b/xen/arch/x86/x86_emulate/decode.c
>> @@ -743,8 +743,12 @@ decode_twobyte(struct x86_emulate_state
>> case 0:
>> s->desc |= DstMem | SrcImplicit | Mov;
>> break;
>> + case 6:
>> + if ( !(s->modrm_reg & 1) && mode_64bit() )
>> + {
>> case 2: case 4:
>> - s->desc |= SrcMem16;
>> + s->desc |= SrcMem16;
>> + }
>
> Well - not something I was expecting, but I've just had to go and find
> the Itanium instruction manuals...
>
> Do we really need this complexity? JMPE is a decoding wrinkle of
> Itanium processors, which I think we can reasonably ignore.
>
> IMO we should treat Grp6 as uniformly Reg/Mem16, and rely on the
> !mode_64bit() to exclude the encodings commonly used as JMPE.
We already handle modrm_reg 0 and 1 differently. I'm not convinced of
making 7 match 6 without need. We can't predict what Intel will put
there - JMPE (which I'm not really concerned about here, and which
the logic being added also doesn't exclude) already didn't match the
reg/mem16 pattern.
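
For illustration only, the decode rule under discussion can be written out as a flat function. This is a sketch, not the emulator's code: the enum and function names are hypothetical, and it assumes the surrounding switch is keyed on (modrm_reg & 6), which the quoted hunk does not show. It avoids the jump into the "if" body that the patch uses, but is meant to classify the encodings the same way.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the emulator's operand descriptor flags. */
enum grp6_operand { GRP6_NONE, GRP6_DST_MEM, GRP6_SRC_MEM16 };

/*
 * Classify the operand of a Grp6 (0F 00 /reg) encoding.
 * Assumption: cases are selected by (reg & 6), so /0 and /1 share a case.
 */
static enum grp6_operand grp6_operand_kind(unsigned int reg, bool mode_64bit)
{
    switch ( reg & 6 )
    {
    case 0:                     /* /0 and /1: memory destination */
        return GRP6_DST_MEM;

    case 2: case 4:             /* /2 ... /5: 16-bit memory source */
        return GRP6_SRC_MEM16;

    case 6:                     /* /6 only, and only in 64-bit mode */
        if ( !(reg & 1) && mode_64bit )
            return GRP6_SRC_MEM16;
        break;
    }

    return GRP6_NONE;           /* /7, or /6 outside 64-bit mode */
}
```

With this shape /7 keeps no operand description at all, which is the difference from treating Grp6 as uniformly Reg/Mem16.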
>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> @@ -2870,8 +2870,35 @@ x86_emulate(
>> break;
>> }
>> break;
>> - default:
>> - generate_exception_if(true, X86_EXC_UD);
>> + case 6: /* lkgs */
>> + generate_exception_if((modrm_reg & 1) || vex.pfx != vex_f2,
>> + X86_EXC_UD);
>> + generate_exception_if(!mode_64bit() || !mode_ring0(), X86_EXC_UD);
>> + vcpu_must_have(lkgs);
>> + fail_if(!ops->read_segment || !ops->read_msr ||
>> + !ops->write_segment || !ops->write_msr);
>> + if ( (rc = ops->read_msr(MSR_SHADOW_GS_BASE, &msr_val,
>> + ctxt)) != X86EMUL_OKAY ||
>> + (rc = ops->read_segment(x86_seg_gs, &sreg,
>> + ctxt)) != X86EMUL_OKAY )
>> + goto done;
>> + dst.orig_val = sreg.base; /* Preserve full GS Base. */
>> + if ( (rc = protmode_load_seg(x86_seg_gs, src.val, false, &sreg,
>> + ctxt, ops)) != X86EMUL_OKAY ||
>> + /* Write (32-bit) base into SHADOW_GS. */
>> + (rc = ops->write_msr(MSR_SHADOW_GS_BASE, sreg.base,
>
> The comment says 32-bit, but that's the full base, isn't it?
The function writes the full base, but what we retrieved via
protmode_load_seg() is only 32 bits wide. Hence the parenthesization
in the comment. I can add e.g. "zero-extended" if you think that makes
things clearer?
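
To make the "(32-bit)" point concrete, here is a minimal standalone sketch of how a base is assembled from a legacy 8-byte descriptor and then widened. The helper names are hypothetical and this is not how protmode_load_seg() is written; it only shows that the descriptor has room for 32 base bits, so the 64-bit value written to the MSR has its upper half clear.

```c
#include <stdint.h>

/*
 * Assemble the base from the two dwords of a legacy segment descriptor:
 * base[15:0] sits in bits 16-31 of the low dword, base[23:16] in bits 0-7
 * of the high dword, and base[31:24] in bits 24-31 of the high dword.
 */
static uint32_t desc_base(uint32_t lo, uint32_t hi)
{
    return (hi & 0xff000000u) | ((hi & 0xffu) << 16) | (lo >> 16);
}

/* Value that would go into the shadow GS base: the base, zero-extended. */
static uint64_t shadow_gs_from_desc(uint32_t lo, uint32_t hi)
{
    return desc_base(lo, hi);   /* uint32_t -> uint64_t zero-extends */
}
```

Whatever the descriptor contents, shadow_gs_from_desc() can never return a value with any of bits 32-63 set.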
>> + ctxt)) != X86EMUL_OKAY )
>> + goto done;
>> + sreg.base = dst.orig_val; /* Reinstate full GS Base. */
>> + if ( (rc = ops->write_segment(x86_seg_gs, &sreg,
>> + ctxt)) != X86EMUL_OKAY )
>> + {
>> + /* Best effort unwind (i.e. no real error checking). */
>> + if ( ops->write_msr(MSR_SHADOW_GS_BASE, msr_val,
>> + ctxt) == X86EMUL_EXCEPTION )
>> + x86_emul_reset_event(ctxt);
>> + goto done;
>> + }
>
> Do we need all of this?
>
> Either protmode_load_seg() fails and there's nothing to unwind, or
> write_msr() fails and we only have to unwind GS.
>
> I think?
Since you say "all" I can only assume you mean both the write_segment()
and the write_msr(). We need the former, as we replaced the segment
base if protmode_load_seg() succeeded. It's only the write_msr() which
is debatable, yet as indicated that matches SWAPGS handling. I'd like
to keep the two as similar as possible.
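
The ordering can be modelled outside the emulator. The following is a simplified sketch with hypothetical names (struct model, lkgs_model(), an injected failure flag); it stands in for the read_msr / read_segment / write_msr / write_segment hooks and leaves out the descriptor checks entirely. It shows why the old GS base has to be carried across the load, and what the best-effort unwind of the shadow base restores.

```c
#include <stdbool.h>
#include <stdint.h>

enum { OKAY, FAIL };

/* Hypothetical model of the state LKGS touches. */
struct model {
    uint64_t gs_base, shadow_gs_base;
    uint16_t gs_sel;
    bool fail_write_segment;    /* inject a write_segment() failure */
};

static int write_segment(struct model *m, uint16_t sel, uint64_t base)
{
    if ( m->fail_write_segment )
        return FAIL;
    m->gs_sel = sel;
    m->gs_base = base;
    return OKAY;
}

static int lkgs_model(struct model *m, uint16_t sel, uint32_t new_base)
{
    uint64_t old_shadow = m->shadow_gs_base;  /* models read_msr() */
    uint64_t old_base = m->gs_base;           /* models read_segment() */

    /* The loaded (32-bit) base goes to the shadow, zero-extended. */
    m->shadow_gs_base = new_base;

    /* Selector is replaced, but the "real" GS base is reinstated. */
    if ( write_segment(m, sel, old_base) != OKAY )
    {
        m->shadow_gs_base = old_shadow;       /* best-effort unwind */
        return FAIL;
    }

    return OKAY;
}

/* Usage: run the model once and report the resulting shadow base. */
static uint64_t demo_shadow_after(bool fail)
{
    struct model m = {
        .gs_base = 0xffffeeeecccc8888ULL,
        .shadow_gs_base = 0x1111,
        .fail_write_segment = fail,
    };

    lkgs_model(&m, 0x10, 0x22224444u);
    return m.shadow_gs_base;
}
```

In the success case the shadow base takes the new value while gs_base is unchanged; in the injected-failure case the shadow base goes back to what it was before.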
> This is actually a good example of where pipeline microcode has a much
> easier time than we do. Inside the pipeline, there's no such thing as
> "can't store to gs & GS_KERN once the checks are done".
Indeed.
> Although it does make me wonder. Would LKGS trigger the MSR
> intercepts? Architecturally, it writes MSR_GS_KERN, so ought to trigger
> the Write intercept.
>
> However, version 7 of the FRED spec says:
>
> "Because the base address in the descriptor is only 32 bits, LKGS clears
> the upper 32 bits of the 64-bit IA32_KERNEL_GS_BASE MSR."
>
> so I suspect it does not architecturally read MSR_GS_KERN, so would not
> trigger the Read intercept (or introspection for that matter.)
Well, I'm looking at this differently anyway: The MSR is merely an alias
for the segment base. Just like LFS/LGS won't trigger respective MSR
intercepts, LKGS shouldn't either.
> AFAICT, we're only performing the read in order to do the best-effort
> unwind, so I think that path needs dropping.
No, as said - we need to put back the correct base of the "real" GS.
>> --- a/xen/include/public/arch-x86/cpufeatureset.h
>> +++ b/xen/include/public/arch-x86/cpufeatureset.h
>> @@ -296,6 +296,8 @@ XEN_CPUFEATURE(AVX512_BF16, 10*32+ 5) /
>> XEN_CPUFEATURE(FZRM, 10*32+10) /*A Fast Zero-length REP MOVSB */
>> XEN_CPUFEATURE(FSRS, 10*32+11) /*A Fast Short REP STOSB */
>> XEN_CPUFEATURE(FSRCS, 10*32+12) /*A Fast Short REP CMPSB/SCASB */
>> +XEN_CPUFEATURE(FRED, 10*32+17) /* Flexible Return and Event Delivery */
>> +XEN_CPUFEATURE(LKGS, 10*32+18) /*S Load Kernel GS Base */
>
> Can we please keep this 's' until we've had a play on real hardware?
Sure.
> Also, as we're going for CPUID bits more generally these days, bit 20 is
> NMI_SRC also from the FRED spec.
I can add that, sure. It just seemed unrelated to me. I wanted to have
FRED to put in place the dependency in gen-cpuid.py. What isn't quite
clear to me is whether there should then also be a dependency recorded
between FRED and NMI_SRC.
>> @@ -338,6 +338,9 @@ def crunch_numbers(state):
>>
>> # The behaviour described by RRSBA depend on eIBRS being active.
>> EIBRS: [RRSBA],
>> +
>> + # FRED builds on the LKGS instruction.
>> + LKGS: [FRED],
>
> I'd be tempted to justify this differently.
>
> It is intentional that LKGS is usable with CR4.FRED=0, for the benefit
> of FRED-aware-but-not-active OSes running on FRED-capable hardware.
>
> However, FRED=1 systems cannot operate without LKGS.
This is what I mean to say with the comment. Whereas ...
> So, perhaps "There is no hard dependency, but the spec requires that
> LKGS is available in FRED systems" ?
... this is weaker than what I think is wanted/needed.
Jan