
Re: [Xen-devel] [PATCH] include/public: add new elf note for support of huge physical addresses



>>> On 14.08.17 at 14:21, <jgross@xxxxxxxx> wrote:
> On 14/08/17 13:40, Jan Beulich wrote:
>>>>> On 14.08.17 at 13:05, <jgross@xxxxxxxx> wrote:
>>> On 14/08/17 12:48, Jan Beulich wrote:
>>>>>>> On 14.08.17 at 12:35, <jgross@xxxxxxxx> wrote:
>>>>> On 14/08/17 12:29, Jan Beulich wrote:
>>>>>>>>> On 14.08.17 at 12:21, <jgross@xxxxxxxx> wrote:
>>>>>>> Current pv guests will only see physical addresses up to 46 bits wide.
>>>>>>> In order to be able to run on a host supporting 5 level paging and to
>>>>>>> make use of any possible memory page there, physical addresses with up
>>>>>>> to 52 bits have to be supported.
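As an illustration of what the proposed ELF note amounts to (the note name
below is a placeholder, not necessarily the one the patch defines): a PV
kernel would advertise the physical address width it can cope with from its
ELF image, the same way the existing Xen ELF notes are emitted in Linux's
arch/x86/xen/xen-head.S:

    /* placeholder note name; payload: supported physical address width in bits */
    ELFNOTE(Xen, XEN_ELFNOTE_MAXPHYS_WIDTH, .long 52)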
>>>>>>
>>>>>> Is this a Xen shortcoming or a Linux one (I assume the latter)?
>>>>>
>>>>> It is a shortcoming of the Xen pv interface.
>>>>
>>>> Please be more precise: Where in the interface do we have a
>>>> restriction to 46 bits?
>>>
>>> We have no definition that the mfn width in a pte can be larger than
>>> the pfn width for a given architecture (in this case a 4 level paging
>>> 64 bit x86 host).
>>>
>>> So Xen has to assume that a guest not saying otherwise is limited
>>> to mfns not exceeding a 4 level host's maximum addresses.
>> 
>> The number of page table levels affects only virtual address
>> width. Physical addresses can architecturally be 52 bits wide,
>> and what CPUID extended leaf 8 provides is what limits
>> physical address width.
> 
> Yes.
> 
> OTOH up to now there have been no x86 platforms supporting more than
> 46 bits of physical address width (at least AFAIK), and this limit is
> explicitly specified for all current processors.

As said, AMD CPUs support 48 bits (and actually have hypertransport
stuff sitting at the top end, just not RAM extending that far).

>>> Or would you like to not limit current pv guests to the lower 64TB and
>>> risk them crashing, just because they interpreted the lack of any
>>> specific mfn width definition differently than you do?
>> 
>> Again - you saying "current pv guests" rather than "current
>> Linux PV guests" makes me assume you've found some
>> limitation in the PV ABI. Yet so far you didn't point out where
>> that is, which then again makes me assume you're talking
>> about a Linux limitation.
> 
> Yes, I am talking of Linux here.
> 
> And no, you are wrong that I haven't pointed out where the limitation
> is: I have said that the PV ABI nowhere states that MFNs can be wider
> than any current processor's PFNs.

Why would it need to? The relevant limits are imposed by CPUID
output. There's no PV ABI aspect here.
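For reference, the limit in question is what CPUID extended leaf 0x80000008
reports in EAX bits 7:0; a minimal user-space sketch of querying it, assuming
GCC/clang's <cpuid.h> helper:

    #include <stdio.h>
    #include <cpuid.h>                      /* GCC/clang helpers, x86 only */

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
            return 1;                       /* leaf not implemented */

        printf("physical address bits: %u\n", eax & 0xff);         /* e.g. 46 or 48 */
        printf("virtual address bits:  %u\n", (eax >> 8) & 0xff);  /* 48, or 57 with 5-level paging */
        return 0;
    }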

> So to be pedantic you are right: the Linux kernel is violating
> the specification by not being able to run on a processor reporting
> a physical address width of 52 bits via CPUID.
> 
> OTOH as there hasn't been any such processor up to now this was no
> problem for Linux.
> 
> We could say, of course, that this is a Linux problem which should be
> fixed. I don't think that would be a wise thing to do: we don't want
> to point fingers at Linux, we want a smooth user experience with
> Xen. So we need some kind of interface to handle the current
> situation, where no Linux kernel up to 4.13 is able to make use of
> physical host memory above 64TB. Again: I don't think we want to let
> those kernels just crash and tell the users it's Linux's fault and
> they should either use a newer kernel or KVM.

That's all fine, just that I'd expect you to make the hypervisor at
once honor the new note. Before accepting the addition to the
ABI, I'd at least like to see sketched out how the resulting
restriction would be enforced by the hypervisor. With the way
we do this for 32-bit guests I don't expect this to be entirely
straightforward.

>>> This can be easily compared to the support of 5 level paging in the
>>> kernel happening right now: When the 5 level paging machines are
>>> available in the future you won't be limited to a rather recent kernel,
>>> but you can use one that is already part of some distribution.
>> 
>> Yes and no. Since we don't mean to introduce 5-level PV guests,
>> we're not adding the respective MMU ops anyway. If we did, it
>> would still seem strange to introduce, say, MMUEXT_PIN_L5_TABLE
>> without also implementing it. But yes, it would be possible, just
>> that, unlike here, there really would be no need for the
>> hypervisor to do anything for it as long as it doesn't itself know
>> of 5 page table levels.
> 
> The patch I'm thinking of would just avoid masking away MFN bits as
> is done today. Look in pte_mfn_to_pfn(): the MFN is obtained by
> masking the pte value with PTE_PFN_MASK. I'd like to use
> XEN_PTE_PFN_MASK instead, allowing for 52 bit physical addresses.
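A rough sketch of the widening being described; the mask value and the helper
below are illustrative only, not taken from an actual patch, and assume the
usual Linux pte accessors:

    /*
     * Illustrative only: mask the PV pte with a mask wide enough for the
     * architectural 52-bit physical address limit instead of one derived
     * from the host's current 4-level/46-bit configuration.
     */
    #define XEN_PTE_PFN_MASK  ((((pteval_t)1 << 52) - 1) & PAGE_MASK)  /* bits 12..51 */

    static unsigned long xen_pte_to_mfn(pte_t pte)
    {
            /* was: pte_val(pte) & PTE_PFN_MASK, which drops the high MFN bits */
            return (pte_val(pte) & XEN_PTE_PFN_MASK) >> PAGE_SHIFT;
    }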

Hmm, so you mean to nevertheless fix this on the Linux side.
Which then makes me wonder again - what do you need the note
for if you want to make Linux behave properly? By now it feels
like I'm really missing some of your rationale and/or intended plan
of action here.

> So we wouldn't need any other new interfaces. It's just the handling
> of pv pte values which changes, by widening the mask. And this would
> touch pv-specific code only.
> 
> I could do the Linux patch without the new ELF note. But this would mean
> Xen couldn't tell whether a pv domain is capable of using memory above
> 64TB or not. So we would have to place _all_ pv guests below 64TB or we
> would risk crashing domains. With the ELF note we can avoid this
> dilemma.

I don't follow: It seems like you're implying that in the absence of
the note we'd restrict PV guests that way. But why would we? We
should not penalize non-Linux PV guests just because Linux has a
restriction. IOW the note needs to be present for a restriction to
be enforced, which in turn means the hypervisor first needs to
honor the note. Otherwise running a 4-level hypervisor on 5-level
capable hardware (with wider than 46-bit physical addresses)
would break Linux as well.

Jan

