Re: [Xen-devel] [PATCH for 4.9 3/6] x86/hvm: Fix segmentation logic for system segments
>>> On 03.04.17 at 17:42, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 03/04/17 16:07, Jan Beulich wrote:
>>>>> On 03.04.17 at 16:27, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> On 03/04/17 10:13, Jan Beulich wrote:
>>>>>>> On 31.03.17 at 21:50, <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>> --- a/xen/arch/x86/hvm/hvm.c
>>>>> +++ b/xen/arch/x86/hvm/hvm.c
>>>>> @@ -2374,13 +2374,27 @@ int hvm_set_cr4(unsigned long value, bool_t may_defer)
>>>>>      return X86EMUL_OKAY;
>>>>>  }
>>>>>
>>>>> +enum hvm_segmentation_mode hvm_seg_mode(
>>>>> +    const struct vcpu *v, enum x86_segment seg,
>>>>> +    const struct segment_register *cs)
>>>> The inputs here are at least somewhat counterintuitive (for example,
>>>> from an abstract pov it is unexpected that the result depends on seg
>>>> and cs). At the very least I think the naming should make clear that
>>>> cs is not just some code segment, but the CS v currently has in use
>>>> (e.g. active_cs). Going further, the question is whether having this
>>>> helper is really useful (and not perhaps inviting mis-use), and hence
>>>> whether the inputs passed here wouldn't better be passed directly
>>>> to hvm_virtual_to_linear_addr(), the more that the "seg" argument
>>>> is required to match up between the two calls.
>>> I purposefully wanted to avoid people opencoding the logic and getting
>>> it wrong (looks like even I got it wrong).
>>>
>>> I'm not convinced that passing the parameters individually is better.
>>>
>>>>> +{
>>>>> +    if ( !(v->arch.hvm_vcpu.guest_cr[0] & X86_CR0_PE) )
>>>>> +        return hvm_seg_mode_real;
>>>> What about VM86 mode?
>>> Very good point.
>>>
>>>>> +    if ( hvm_long_mode_active(v) &&
>>>>> +         (is_x86_system_segment(seg) || cs->attr.fields.l) )
>>>>> +        return hvm_seg_mode_long;
>>>>> +
>>>>> +    return hvm_seg_mode_prot;
>>>> Did you verify this actually matches real hardware behavior? There
>>>> being obvious anomalies when compat ring-0 code executes
>>>> LGDT/LIDT/SGDT/SIDT (in long mode these ought to have 10-byte
>>>> operands, yet 32-bit/compat code would expect 6-byte ones, so
>>>> one of the two views is necessarily wrong, and whichever it is, it
>>>> introduces an inconsistency), I wouldn't take it for granted that
>>>> _all_ descriptor table accessing insns behave like they would from a
>>>> 64-bit code segment (I nevertheless assume they do, but as I
>>>> can't see it written down anywhere, we shouldn't assume so,
>>>> considering how many other oddities there are in x86).
>>>>
>>>> This question is also supported by the SDM using the same standard
>>>> "Same exceptions as in protected mode" wording in the respective
>>>> insns' "Compatibility Mode Exceptions" sections, yet the behavior
>>>> above implies that #GP(0) might also result for compat mode
>>>> descriptor table accesses if the descriptor address ends up being
>>>> non-canonical. Interestingly enough, the PM doesn't separate
>>>> exception specifications for 32-bit protected, compat, and 64-bit
>>>> modes.
>>> You are mistaken.
>>>
>>> {L,S}{I,G}DT are {read,write}_segment_register() operations, using a
>>> plain memory operand.
>>>
>>> When we come to the segmentation check, it will by default be
>>> %ds-relative, with size as calculated by op_bytes in the instruction
>>> emulator.
>> I think I didn't make myself clear then: I'm not at all talking about
>> how the memory access of these insns gets carried out; I solely talk
>> about the size of their operands:
>
> I still fail to see what the size of the operands has to do with the
> choice of segmentation mode.
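For reference, a minimal sketch of what the helper under discussion might
look like with the VM86 case folded in and the active_cs naming suggestion
applied. This is illustrative only, not the posted patch; in particular,
reaching EFLAGS.VM via guest_cpu_user_regs() is an assumption about how the
check would be wired up:

    /*
     * Sketch only, based on the patch hunk quoted above.  How VM86
     * state is detected here (guest_cpu_user_regs()) is an assumed
     * detail, not part of the submitted patch.
     */
    enum hvm_segmentation_mode hvm_seg_mode(
        const struct vcpu *v, enum x86_segment seg,
        const struct segment_register *active_cs)
    {
        /* Real mode and VM86 mode both use real-mode segmentation rules. */
        if ( !(v->arch.hvm_vcpu.guest_cr[0] & X86_CR0_PE) ||
             (guest_cpu_user_regs()->eflags & X86_EFLAGS_VM) )
            return hvm_seg_mode_real;

        /*
         * In long mode, system segments always follow 64-bit rules, even
         * underneath a compatibility mode code segment; ordinary segments
         * follow the L bit of the CS currently in use.
         */
        if ( hvm_long_mode_active(v) &&
             (is_x86_system_segment(seg) || active_cs->attr.fields.l) )
            return hvm_seg_mode_long;

        return hvm_seg_mode_prot;
    }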
>> In long mode, to load IDTR or GDTR you'd expect a 64-bit base and a
>> 16-bit limit.
>
> Why? I'd expect nothing of the sort, because 32bit compat segments are
> purposefully designed to have no functional difference from regular
> 32bit protected mode segments. That includes not changing the behaviour
> of instructions like this.

Well, one can of course take the position that ring-0 compat code is
simply a useless thing.

>> Hence if _all_ system segment register related insns worked
>> consistently in long mode, the four named insns would have to have
>> 10-byte operands.
>
> This isn't a valid expectation to draw.
>
>> I'm pretty sure they don't though, so there is _one_ anomaly already.
>
> Indeed they don't. In a compatibility mode segment, they take a 6-byte
> operand, identically to 32bit mode.
>
>> With that, I don't think we can rule out there being other anomalies,
>> with this not being talked about explicitly anywhere in the doc.
>
> I don't think any of this is relevant to the correctness of this patch.

I don't question the correctness; all I question is how far it is built
upon assumptions vs actually observed (short of documented) behavior.

> Without this fix, implicit accesses to system segments in a
> compatibility mode segment will truncate the resulting linear address
> as part of performing the segmentation calculations, which is obviously
> not how real hardware behaves.

I understand this. But things are a little more complicated. Just extend
your line of thinking regarding IDTR/GDTR to LDTR and TR: above you
suggest that the former two get loaded in a fully 32-bit mode compatible
way. LTR and LLDT (as well as LAR and LSL), however, access a descriptor
table. 32-bit code would expect an 8-byte descriptor to be read. Is
compat mode code then not allowed to make the same assumption? I think
you've made it pretty much explicit that you'd expect a 16-byte
descriptor to be loaded in this case. So ring-0 compat code operates
identically to 32-bit mode in some respects, and identically to 64-bit
code in others. Fundamentally I'd expect consistency here, but along the
lines of the usefulness remark higher up, I simply think that no-one had
spent any thought on this when designing x86-64; otherwise it would have
been easy to simply disallow ring-0 from entering compat mode, thus
avoiding any inconsistencies.

Jan
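To make the operand-size anomaly discussed above concrete, here is a
sketch of the two pseudo-descriptor layouts that LGDT/LIDT/SGDT/SIDT
operate on. The sizes are per the Intel SDM; the struct names and the
GCC-specific packing attribute are illustrative:

    #include <stdint.h>

    /* 32-bit / compatibility mode pseudo-descriptor: 6 bytes. */
    struct pseudo_desc32 {
        uint16_t limit;
        uint32_t base;
    } __attribute__((packed));

    /* 64-bit mode pseudo-descriptor: 10 bytes. */
    struct pseudo_desc64 {
        uint16_t limit;
        uint64_t base;
    } __attribute__((packed));

    _Static_assert(sizeof(struct pseudo_desc32) == 6, "6-byte operand");
    _Static_assert(sizeof(struct pseudo_desc64) == 10, "10-byte operand");

LLDT and LTR, by contrast, take only a 16-bit selector and have the CPU
itself fetch the descriptor from the GDT: an 8-byte descriptor in 32-bit
mode, but a 16-byte system descriptor in long mode, which is the other
half of the inconsistency described in the message above.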