
Re: [Xen-devel] [PATCH 7/7] x86emul: support SYSRET



On 25.03.2020 11:00, Andrew Cooper wrote:
> On 24/03/2020 16:29, Jan Beulich wrote:
>> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
>> @@ -5975,6 +5975,60 @@ x86_emulate(
>>              goto done;
>>          break;
>>  
>> +    case X86EMUL_OPC(0x0f, 0x07): /* sysret */
>> +        vcpu_must_have(syscall);
>> +        /* Inject #UD if syscall/sysret are disabled. */
>> +        fail_if(!ops->read_msr);
>> +        if ( (rc = ops->read_msr(MSR_EFER, &msr_val, ctxt)) != X86EMUL_OKAY )
>> +            goto done;
>> +        generate_exception_if((msr_val & EFER_SCE) == 0, EXC_UD);
> 
> (as with the SYSCALL side), no need for the vcpu_must_have(syscall) as
> well as this check.

Hmm, yes, we do so elsewhere too, so I'll adjust this there and here.

>> +        generate_exception_if(!amd_like(ctxt) && !mode_64bit(), EXC_UD);
>> +        generate_exception_if(!mode_ring0(), EXC_GP, 0);
>> +        generate_exception_if(!in_protmode(ctxt, ops), EXC_GP, 0);
>> +
> 
> The check for the Intel SYSRET vulnerability tests regs->rcx for
> canonicity and raises #GP at this point.
> 
> I see you've got it below, but this is where the Intel pseudocode puts
> it, before MSR_STAR gets read, and logically it should be grouped with
> the other exceptions.

I had it here first, then moved it down to avoid yet another mode_64bit()
instance. I didn't see why the ordering would matter for the overall
result, on the basis that the STAR read ought not to fail under normal
circumstances. I'll move it back where it was since you ask for it.
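
For readers following along, the canonicity test under discussion can be
sketched in isolation. This is only an illustration, assuming 48-bit
virtual addresses (no 5-level paging); is_canonical_48() is a hypothetical
name and not Xen's is_canonical_address().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, assuming a 48-bit virtual address width.
 * An address is canonical when bits 63..47 are all zeros or all ones,
 * i.e. bits 63..48 are a sign extension of bit 47.
 */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;   /* the upper 17 bits */

    return top == 0 || top == 0x1ffff;
}
```

On Intel-like CPUs a 64-bit SYSRET with a %rcx that fails such a test
faults with #GP while still in ring 0, which is why the check is
vendor-specific in the patch.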

>> +        if ( (rc = ops->read_msr(MSR_STAR, &msr_val, ctxt)) != X86EMUL_OKAY )
>> +            goto done;
>> +        sreg.sel = ((msr_val >> 48) + 8) | 3; /* SELECTOR_RPL_MASK */
> 
> This would be the logical behaviour...
> 
> AMD CPUs |3 into %cs.sel, but don't make an equivalent adjustment for
> %ss.sel, and simply take MSR_STAR.SYSRET_CS + 8.
> 
> If you aren't careful with MSR_STAR, SYSRET will return to userspace
> with mismatching RPL/DPL and userspace can really find itself with an
> %ss with an RPL of 0.  (Of course, when you take an interrupt and
> attempt to IRET back to this context, things fall apart).
> 
> I discovered this entirely by accident in XTF, but it is confirmed by
> careful reading of the AMD SYSRET pseudocode.

I did notice this in their pseudocode, but it looked too wrong to
be true. Will change.
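
The selector derivation described above could look roughly like the
following. It is a sketch based only on the behaviour stated in this
thread, not the code that will go into the patch; struct sysret_sels and
sysret_selectors() are made-up names for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two selectors SYSRET loads. */
struct sysret_sels {
    uint16_t cs;
    uint16_t ss;
};

/*
 * Derive the SYSRET selectors from MSR_STAR[63:48] (SYSRET_CS).
 * %cs always gets RPL 3 ORed in.  %ss gets RPL 3 ORed in only on
 * Intel-like CPUs; AMD-like CPUs take SYSRET_CS + 8 unmodified.
 */
static struct sysret_sels sysret_selectors(uint64_t star, bool to_64bit,
                                           bool amd_like)
{
    uint16_t base = (uint16_t)(star >> 48);
    struct sysret_sels s;

    s.cs = (uint16_t)((to_64bit ? base + 16 : base) | 3);
    s.ss = (uint16_t)(base + 8);
    if ( !amd_like )
        s.ss |= 3;

    return s;
}
```

With SYSRET_CS set to 0x20, a 64-bit return yields %cs 0x33 on either
vendor, but %ss 0x28 on AMD-like versus 0x2b on Intel-like CPUs, which is
the RPL 0 %ss described above.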

>> +        cs.sel = op_bytes == 8 ? sreg.sel + 8 : sreg.sel - 8;
>> +
>> +        cs.base = sreg.base = 0; /* flat segment */
>> +        cs.limit = sreg.limit = ~0u; /* 4GB limit */
>> +        cs.attr = 0xcfb; /* G+DB+P+DPL3+S+Code */
>> +        sreg.attr = 0xcf3; /* G+DB+P+DPL3+S+Data */
> 
> Again, that would be the logical behaviour...
> 
> AMD CPUs don't update anything but %ss.sel, and even comment the fact
> in pseudocode now.
> 
> This was discovered by Andy Lutomirski, who found that taking an
> interrupt (unconditionally sets %ss to NUL), and opportunistic sysret
> back to 32bit userspace lets userspace see a sane %ss value, but with
> the attrs still empty, and the stack unusable.
> 
>> +
>> +#ifdef __x86_64__
>> +        if ( mode_64bit() )
>> +        {
>> +            if ( op_bytes == 8 )
>> +            {
>> +                cs.attr = 0xafb; /* L+DB+P+DPL3+S+Code */
>> +                generate_exception_if(!is_canonical_address(_regs.rcx) &&
>> +                                      !amd_like(ctxt), EXC_GP, 0);
> 
> Wherever this ends up living, I think it needs calling out with a
> comment /* CVE-xxx, Intel privilege escalation hole */, as it is a very
> subtle piece of vendor specific behaviour.
> 
> Do we have a Centaur/other CPU to try with?  I'd err on the side of
> going with == Intel rather than !AMD to avoid introducing known
> vulnerabilities into models which stand half a chance of not being affected.

I'd rather not - this exception behavior is spelled out by the
SDM, and hence imo pretty likely to be followed by clones.
While I do have a VIA box somewhere, it's not stable enough to
run for more than a couple of minutes.

>> +                _regs.rip = _regs.rcx;
>> +            }
>> +            else
>> +                _regs.rip = _regs.ecx;
>> +
>> +            _regs.eflags = _regs.r11 & ~(X86_EFLAGS_RF | X86_EFLAGS_VM);
>> +        }
>> +        else
>> +#endif
>> +        {
>> +            _regs.r(ip) = _regs.ecx;
>> +            _regs.eflags |= X86_EFLAGS_IF;
>> +        }
>> +
>> +        fail_if(!ops->write_segment);
>> +        if ( (rc = ops->write_segment(x86_seg_cs, &cs, ctxt)) != X86EMUL_OKAY ||
>> +             (!amd_like(ctxt) &&
>> +              (rc = ops->write_segment(x86_seg_ss, &sreg,
>> +                                       ctxt)) != X86EMUL_OKAY) )
> 
> Oh - here is the AMD behaviour with %ss, but it's not quite correct.
> 
> AFAICT, the correct behaviour is to read the old %ss on AMD-like, set
> flat attributes on Intel, and write back normally, because %ss.sel does
> get updated.

Oh, of course - I meant to, got distracted, and then forgot. Will fix.
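
As a rough illustration of the %ss handling suggested above (start from
the old %ss, always update the selector, and only set flat attributes on
Intel-like CPUs), something along these lines would do. struct seg and
sysret_new_ss() are hypothetical stand-ins, not Xen's segment_register
type or the emulator hooks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified segment register state. */
struct seg {
    uint16_t sel;
    uint16_t attr;
    uint32_t limit;
    uint64_t base;
};

/*
 * Compute the %ss state after SYSRET.  The selector is updated on all
 * vendors.  Intel-like CPUs also load a flat data segment; AMD-like
 * CPUs leave base, limit and attributes as they were.
 */
static struct seg sysret_new_ss(struct seg old_ss, uint16_t new_sel,
                                bool amd_like)
{
    struct seg ss = old_ss;

    ss.sel = new_sel;
    if ( !amd_like )
    {
        ss.base = 0;        /* flat segment */
        ss.limit = ~0u;     /* 4GB limit */
        ss.attr = 0xcf3;    /* G+DB+P+DPL3+S+Data */
    }

    return ss;
}
```

Starting from a NULL %ss, the AMD-like path ends up with a new selector
but empty attributes, matching the unusable-stack case mentioned earlier
in the review.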

Jan



 

