
Re: Proposal for physical address based hypercalls


  • To: Juergen Gross <jgross@xxxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Thu, 29 Sep 2022 13:32:51 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • Delivery-date: Thu, 29 Sep 2022 11:32:59 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 28.09.2022 15:03, Juergen Gross wrote:
> On 28.09.22 14:06, Jan Beulich wrote:
>> On 28.09.2022 12:58, Andrew Cooper wrote:
>>> On 28/09/2022 11:38, Jan Beulich wrote:
>>>> As an alternative I'd like to propose the introduction of a bit (or
>>>> multiple ones, see below) augmenting the hypercall number, to control
>>>> the flavor of the buffers used for every individual hypercall. This
>>>> would likely involve the introduction of a new hypercall page (or
>>>> multiple ones if more than one bit is to be used), to retain the
>>>> present abstraction where it is the hypervisor which actually fills
>>>> these pages.
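
(Purely to illustrate the bit-augmented numbering proposed above; the
flag name and bit position in this sketch are invented and not part of
any proposed ABI:)

  /* Hypothetical: a high bit of the hypercall number requests that all
   * buffer arguments of this particular invocation be interpreted as
   * guest physical addresses rather than virtual ones. */
  #define HYPERCALL_BUF_PHYS  (1UL << 62)

  /* E.g. a memory_op whose argument buffer is given by its guest
   * physical address (hypercall() standing in for the usual per-arch
   * wrapper): */
  ret = hypercall(__HYPERVISOR_memory_op | HYPERCALL_BUF_PHYS,
                  XENMEM_populate_physmap, arg_gpa);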
>>>
>>> There are other concerns which need to be accounted for.
>>>
>>> Encrypted VMs cannot use a hypercall page; they don't trust the
>>> hypervisor in the first place, and the hypercall page is (specifically)
>>> code injection.  So the sensible new ABI cannot depend on a hypercall table.
>>
>> I don't think there's a dependency, and I don't think there ever
>> really has been. We've been advocating for its use, but I don't think
>> we've enforced that anywhere.
>>
>>> Also, rewriting the hypercall page on migrate turns out not to have been
>>> the most clever idea, and only works right now because the instructions
>>> are the same length in the variations for each mode.
>>>
>>> Also continuations need to change to avoid userspace liveness problems,
>>> and the existing hypercalls we do have need splitting between things
>>> which are actually privileged operations (within the guest context) and
>>> things which are logical control operations, so that the kernel can
>>> expose the latter to userspace without retaining the gaping root hole
>>> which is /dev/xen/privcmd, itself a blocker to doing UEFI Secure Boot.
>>>
>>> So yes, starting some new clean(er) interface from hypercall 64 is the
>>> plan, but it very much does not want to be a simple mirror of the
>>> existing 0-63 with a differing calling convention.
>>
>> All of these look like orthogonal problems to me. That's likely all
>> relevant for, as I think you've been calling it, ABI v2, but shouldn't
>> hinder our switching to a physical address based hypercall model.
>> Otherwise I'm afraid we'll never make any progress in that direction.
> 
> What about an alternative model allowing most of the current
> hypercalls to be used unmodified?
> 
> We could add a new hypercall for registering hypercall buffers via
> virtual address, physical address, and size of the buffers (kind of a
> software TLB).

Why not?

> The buffer table would want to be physically addressed
> by the hypercall, of course.

I'm not convinced of this, as it would break uniformity of the hypercall
interfaces. IOW in the hypervisor we then wouldn't be able to use
copy_from_guest() to retrieve the contents. Perhaps this simply shouldn't
be a table, but a hypercall not involving any buffers (i.e. every
discontiguous piece would need registering separately). I expect such a
software TLB wouldn't have many entries, so needing to use a couple of
hypercalls shouldn't be a major issue.
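
(A minimal sketch of what such a buffer-free registration call could
look like; the hypercall name, sub-op numbers, and argument order are
all hypothetical:)

  /* Hypothetical sub-ops of a new hypercall. Every argument fits in a
   * register, so registering a range doesn't itself need a virtually
   * addressed buffer. */
  #define BUFOP_register    0  /* add one contiguous range */
  #define BUFOP_unregister  1  /* drop a previously added range */

  /* gva:  guest virtual start of the range
   * gpa:  guest physical address backing it
   * size: length in bytes (the range must be physically contiguous) */
  rc = HYPERVISOR_buf_op(BUFOP_register, gva, gpa, size);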

> It might be interesting to have this table per vcpu (it should be
> allowed to use the same table for multiple vcpus) in order to speed
> up finding translation entries of percpu buffers.

Yes. Perhaps insertion and purging could simply be two new VCPUOP_*.
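
(Sketching that variant; the VCPUOP_* names, numbers, and struct
layout below are made up for illustration:)

  #define VCPUOP_swtlb_insert  16  /* numbers invented here */
  #define VCPUOP_swtlb_purge   17

  struct vcpu_swtlb_entry {
      uint64_t gva;   /* guest virtual start, page aligned */
      uint64_t gpa;   /* guest physical start, page aligned */
      uint64_t size;  /* bytes; physically contiguous */
  };

  /* Note that the extra argument is itself a virtually addressed
   * buffer, so it would either need translating the traditional way
   * or the triple would have to be passed in registers instead. */
  rc = HYPERVISOR_vcpu_op(VCPUOP_swtlb_insert, vcpu_id, &entry);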

As a prereq I think we'd need to sort the cross-vCPU accessing of guest
data, coincidentally pointed out in a post-commit-message remark in
https://lists.xen.org/archives/html/xen-devel/2022-09/msg01761.html. The
subject vCPU isn't available in copy_to_user_hvm(), which is where I'd
expect the TLB lookup to occur (while assuming handles point at globally
mapped space _might_ be okay, using the wrong vCPU's TLB surely isn't).
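
(Roughly where such a lookup might sit, as a sketch only - the
swtlb_lookup() helper and the per-vCPU table fields are invented, and
real code would additionally depend on the subject-vCPU fix referred
to above:)

  /* Hypothetical helper for the HVM copy routines: translate a guest
   * virtual buffer address via the subject vCPU's SW-TLB. */
  static paddr_t swtlb_lookup(const struct vcpu *v, unsigned long gva,
                              size_t len)
  {
      const struct swtlb_entry *e;

      for ( e = v->arch.swtlb; e < v->arch.swtlb + v->arch.swtlb_nr; ++e )
          if ( gva >= e->gva && gva - e->gva + len <= e->size )
              return e->gpa + (gva - e->gva);

      return INVALID_PADDR; /* fall back to a page table walk */
  }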

> Any hypercall buffer being addressed virtually could first be looked
> up via the SW-TLB. This wouldn't require any changes for most of the
> hypercall interfaces. Only special cases with very large buffers
> might need indirect variants (like Jan said: via GFN lists, which
> could be passed in registered buffers).
> 
> Encrypted guests would probably want to use static percpu buffers in
> order to avoid switching the encryption state of the buffers all the
> time.
> 
> An unencrypted PVH/HVM domain (e.g. PVH dom0) could just define one
> giant buffer with the domain's memory size via the physical memory
> mapping of the kernel. All kmalloc() addresses would be in that region.

That's Linux-centric. I'm not convinced all OSes maintain a directmap.
Without such, switching to this model might end up quite intrusive on
the OS side.

Thinking of Linux, we'd need a 2nd range covering the data part of the
kernel image.
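
(On Linux that could then amount to just two registrations, reusing
the illustrative HYPERVISOR_buf_op() from above; _sdata, _end, and
__pa_symbol() are the usual Linux kernel symbols:)

  /* Hypothetical Linux-side setup: one range for the directmap and
   * one for the data part of the kernel image. */
  HYPERVISOR_buf_op(BUFOP_register, PAGE_OFFSET, 0,
                    max_pfn << PAGE_SHIFT);
  HYPERVISOR_buf_op(BUFOP_register, (unsigned long)_sdata,
                    __pa_symbol(_sdata),
                    (unsigned long)_end - (unsigned long)_sdata);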

Further this still wouldn't (afaics) pave a reasonable route towards
dealing with privcmd-invoked hypercalls.

Finally - to what extent are we concerned about PV guests using linear
addresses for hypercall buffers? I ask because I don't think the model
lends itself to being used for the PV guest interfaces as well.

Jan

> A buffer address not found would need to be translated like today (and
> fail for an encrypted guest).
> 
> Thoughts?
> 
> 
> Juergen




 

