
Re: [Xen-devel] [PATCH for 4.9 3/6] x86/hvm: Fix segmentation logic for system segments



>>> On 03.04.17 at 17:42, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 03/04/17 16:07, Jan Beulich wrote:
>>>>> On 03.04.17 at 16:27, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 03/04/17 10:13, Jan Beulich wrote:
>>>>>>> On 31.03.17 at 21:50, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>>> @@ -2374,13 +2374,27 @@ int hvm_set_cr4(unsigned long value, bool_t may_defer)
>>>>>      return X86EMUL_OKAY;
>>>>>  }
>>>>>  
>>>>> +enum hvm_segmentation_mode hvm_seg_mode(
>>>>> +    const struct vcpu *v, enum x86_segment seg,
>>>>> +    const struct segment_register *cs)
>>>> The inputs here are at least somewhat counterintuitive (for example,
>>>> from an abstract pov it is unexpected that the result depends on seg
>>>> and cs). At the very least I think the naming should make clear that
>>>> cs is not just some code segment, but the CS v currently has in use
>>>> (e.g. active_cs). Going further, the question is whether having this
>>>> helper is really useful (and not perhaps inviting misuse), and hence
>>>> whether the inputs passed here wouldn't better be passed directly
>>>> to hvm_virtual_to_linear_addr(), especially as the "seg" argument
>>>> is required to match up between the two calls.
>>> I purposefully wanted to avoid people opencoding the logic and getting
>>> it wrong (looks like even I got it wrong).
>>>
>>> I'm not convinced that passing the parameters individually is better.
>>>
>>>>> +{
>>>>> +    if ( !(v->arch.hvm_vcpu.guest_cr[0] & X86_CR0_PE) )
>>>>> +        return hvm_seg_mode_real;
>>>> What about VM86 mode?
>>> Very good point.
>>>
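(Purely illustrative, not part of the patch: a minimal sketch of how the
real-mode check might also cover VM86, assuming VM86, i.e. CR0.PE set
together with EFLAGS.VM, should get real-mode-style segmentation.  The
"eflags" value is a hypothetical extra input, not an argument of the
quoted helper.)

    /*
     * Sketch only: treat VM86 like real mode for segmentation purposes.
     * "eflags" is a hypothetical input holding the guest's EFLAGS.
     */
    if ( !(v->arch.hvm_vcpu.guest_cr[0] & X86_CR0_PE) ||
         (eflags & X86_EFLAGS_VM) )
        return hvm_seg_mode_real;
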
>>>>> +    if ( hvm_long_mode_active(v) &&
>>>>> +         (is_x86_system_segment(seg) || cs->attr.fields.l) )
>>>>> +        return hvm_seg_mode_long;
>>>>> +
>>>>> +    return hvm_seg_mode_prot;
>>>> Did you verify this actually matches real hardware behavior? There
>>>> being obvious anomalies when compat ring-0 code executes
>>>> LGDT/LIDT/SGDT/SIDT (in long mode these ought to have 10-byte
>>>> operands, yet 32-bit/compat code would expect 6-byte ones, so
>>>> one of the two views is necessarily wrong, and whichever it is, it
>>>> introduces an inconsistency), I wouldn't take it for granted that _all_
>>>> descriptor table accessing insns behave like they would from a
>>>> 64-bit code segment (I nevertheless assume they do, but as I
>>>> can't see it written down anywhere, we shouldn't assume so,
>>>> considering how many other oddities there are in x86).
>>>>
>>>> This concern is also supported by the SDM using the same boilerplate
>>>> "Same exceptions as in protected mode" wording in the respective
>>>> insns' "Compatibility Mode Exceptions" sections, yet
>>>> the behavior above implies that #GP(0) might also result for
>>>> compat mode descriptor table accesses if the descriptor address
>>>> ends up being non-canonical. Interestingly enough the PM
>>>> doesn't separate exception specifications for 32-bit protected,
>>>> compat, and 64-bit modes.
>>> You are mistaken.
>>>
>>> {L,S}{I,G}DT are {read,write}_segment_register() operations, using a
>>> plain memory operand.
>>>
>>> When we come to the segmentation check, it will be by default
>>> %ds-relative, with size as calculated by op_bytes in the instruction
>>> emulator.
>> I think I didn't make myself clear then: I'm not at all talking about how
>> the memory access of these insns get carried out, I solely talk about
>> the size of their operands:
> 
> I still fail to see what the size of the operands has to do with the
> choice of segmentation mode.
> 
>> In long mode to load IDTR or GDTR you'd expect a 64-bit base and a
>> 16-bit limit.
> 
> Why?  I'd expect nothing of the sort, because 32bit compat segments are
> purposefully designed to behave no differently from regular 32bit
> protected mode segments.  That includes not changing the behaviour of
> instructions like this.

Well, one can of course take the position that ring-0 compat code
is simply a useless thing.

>> Hence if _all_ system segment
>> register related insns worked consistently in long mode, the four
>> named insns would have to have 10-byte operands.
> 
> This isn't a valid expectation to draw.
> 
>>  I'm pretty sure
>> they don't though, so there is _one_ anomaly already.
> 
> Indeed they don't.  In a compatibility mode segment, they take a
> 6-byte operand, identically to 32bit mode.
> 
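(For reference, the pseudo-descriptor operand under discussion is 6 bytes
outside 64-bit mode and 10 bytes in 64-bit mode; a sketch of the two
layouts, using hypothetical struct names purely to illustrate the size
difference:)

    #include <stdint.h>

    /* Hypothetical names, only to illustrate the operand size difference. */
    struct desc_ptr_legacy {          /* 32-bit / compat mode: 6 bytes */
        uint16_t limit;
        uint32_t base;
    } __attribute__((packed));

    struct desc_ptr_long {            /* 64-bit mode: 10 bytes */
        uint16_t limit;
        uint64_t base;
    } __attribute__((packed));
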
>> With that I don't think we can rule out there being other anomalies,
>> with this not being talked about explicitly anywhere in the doc.
> 
> I don't think any of this is relevant to the correctness of this patch.

I don't question the correctness; all I question is how far it is built
upon assumptions vs actually observed (short of documented)
behavior.

> Without this fix, implicit accesses to system segments in a
> compatibility mode segment will truncate the resulting linear address as
> part of performing the segmentation calculations, which is obviously not
> how real hardware behaves.
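(To illustrate the truncation described above, with made-up numbers: in
long mode the IDT/GDT bases are 64-bit quantities even while a compat
code segment is active, so masking the computed address to 32 bits sends
the access to the wrong place.)

    /* Hypothetical values, for illustration only. */
    uint64_t idt_base  = 0xffff820000010000ULL;     /* 64-bit IDT base */
    unsigned int vector = 0x20;
    uint64_t linear    = idt_base + vector * 16;    /* correct linear address */
    uint32_t truncated = (uint32_t)linear;          /* what the unfixed logic would use */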

I understand this. But things are a little more complicated. Just
extend your line of thinking regarding IDTR/GDTR to LDTR and
TR: Above you suggest that the former two get loaded in a fully
32-bit mode compatible way. LTR and LLDT (as well as LAR and
LSL), however, access a descriptor table. 32-bit code would
expect an 8-byte descriptor to be read. Is compat mode code
then not allowed to make the same assumption? I think you've made it
pretty much explicit that you'd expect a 16-byte descriptor
to be loaded in this case. So ring-0 compat code operates
identically to 32-bit mode in some respects, and identically to
64-bit code in others. Fundamentally I'd expect consistency here,
but along the lines of the usefulness remark higher up I simply
think that no-one had spent any thought on this when designing
x86-64; otherwise it would have been easy to simply disallow
ring-0 from entering compat mode, thus avoiding any inconsistencies.
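
(For concreteness on the descriptor sizes mentioned here: in long mode
the LDT/TSS descriptors loaded by LLDT/LTR are 16 bytes, carrying an
extra 32 bits of base address plus reserved bits, whereas legacy 32-bit
code reads 8-byte descriptors.  A sketch of the two layouts, with
hypothetical struct names:)

    #include <stdint.h>

    /* Hypothetical names, illustrating the 8- vs 16-byte difference. */
    struct seg_desc_legacy {          /* legacy / compat view: 8 bytes */
        uint32_t lo, hi;              /* limit, base[31:0], type, DPL, ... */
    };

    struct sys_desc_long {            /* long mode LDT/TSS descriptor: 16 bytes */
        uint32_t lo, hi;              /* as in the legacy layout */
        uint32_t base_hi;             /* base address bits 63:32 */
        uint32_t reserved;            /* reserved / must-be-zero bits */
    };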

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

