Xen project Mailing List

Re: Proposal for physical address based hypercalls

To: Jan Beulich <jbeulich@xxxxxxxx>, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>

From: Juergen Gross <jgross@xxxxxxxx>

Date: Wed, 28 Sep 2022 15:03:24 +0200

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Wed, 28 Sep 2022 13:03:37 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 28.09.22 14:06, Jan Beulich wrote:

On 28.09.2022 12:58, Andrew Cooper wrote:

On 28/09/2022 11:38, Jan Beulich wrote:

As an alternative I'd like to propose the introduction of a bit (or multiple
ones, see below) augmenting the hypercall number, to control the flavor of the
buffers used for every individual hypercall.  This would likely involve the
introduction of a new hypercall page (or multiple ones if more than one bit is
to be used), to retain the present abstraction where it is the hypervisor which
actually fills these pages.


There are other concerns which need to be accounted for.

Encrypted VMs cannot use a hypercall page; they don't trust the
hypervisor in the first place, and the hypercall page is (specifically)
code injection.  So the sensible new ABI cannot depend on a hypercall table.


I don't think there's a dependency, and I think there never really has been.
We've been advocating for its use, but we've not enforced that anywhere, I
don't think.

Also, rewriting the hypercall page on migrate turns out not to have been
the most clever idea, and only works right now because the instructions
are the same length in the variations for each mode.

Also continuations need to change to avoid userspace liveness problems,
and existing hypercalls that we do have need splitting between things
which are actually privileged operations (within the guest context) and
things which are logical control operations, so the kernel can expose
the latter to userspace without retaining the gaping root hole which is
/dev/xen/privcmd, and a blocker to doing UEFI Secureboot.

So yes, starting some new clean(er) interface from hypercall 64 is the
plan, but it very much does not want to be a simple mirror of the
existing 0-63 with a differing calling convention.


All of these look like orthogonal problems to me. That's likely all
relevant for, as I think you've been calling it, ABI v2, but shouldn't
hinder our switching to a physical address based hypercall model.
Otherwise I'm afraid we'll never make any progress in that direction.

What about an alternative model that allows most of the current hypercalls
to be used unmodified?

We could add a new hypercall for registering hypercall buffers via virtual
address, physical address, and size of the buffers (kind of a software TLB).
The buffer table itself would of course need to be physically addressed by
the hypercall.

It might be interesting to have this table per vcpu (using the same table
for multiple vcpus should be allowed) in order to speed up finding the
translation entries of percpu buffers.

Any hypercall buffer addressed virtually would first be looked up in the
SW-TLB. This wouldn't require any changes to most of the hypercall
interfaces. Only special cases with very large buffers might need indirect
variants (like Jan said: via GFN lists, which could be passed in registered
buffers).

Encrypted guests would probably want to use static percpu buffers in order
to avoid switching the encryption state of the buffers all the time.

An unencrypted PVH/HVM domain (e.g. PVH dom0) could just define one giant
buffer with the domain's memory size via the physical memory mapping of the
kernel. All kmalloc() addresses would be in that region.

A buffer address not found in the table would need to be translated like
today (and the translation would fail for an encrypted guest).

Thoughts?


Juergen
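
To make the SW-TLB idea more concrete, here is a minimal sketch in C of what the hypervisor-side lookup could look like. It is purely illustrative: the names struct hcall_buf_entry, struct hcall_buf_table and hcall_buf_lookup() are hypothetical and not part of any existing Xen interface, and a plain linear search stands in for whatever lookup structure a real implementation would choose.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of one registered buffer; not an existing Xen ABI. */
struct hcall_buf_entry {
    uint64_t va;    /* guest virtual start address of the buffer */
    uint64_t pa;    /* guest physical start address of the buffer */
    uint64_t size;  /* buffer length in bytes */
};

/* Hypothetical per-vcpu (or shared) table of registered buffers. */
struct hcall_buf_table {
    const struct hcall_buf_entry *entries;
    size_t nr;
};

/*
 * Translate the guest virtual range [va, va + len) through the table.
 * Returns true and stores the guest physical address in *pa if the whole
 * range lies within a single registered buffer. Returns false otherwise,
 * in which case the caller would fall back to today's translation (or
 * fail the hypercall for an encrypted guest).
 */
static bool hcall_buf_lookup(const struct hcall_buf_table *t,
                             uint64_t va, uint64_t len, uint64_t *pa)
{
    for (size_t i = 0; i < t->nr; i++) {
        const struct hcall_buf_entry *e = &t->entries[i];
        uint64_t off;

        if (va < e->va)
            continue;

        off = va - e->va;

        /* Containment check written to avoid overflow in va + len. */
        if (off <= e->size && len <= e->size - off) {
            *pa = e->pa + off;
            return true;
        }
    }

    return false;
}
```

Under these assumptions, the "one giant buffer" case for an unencrypted PVH dom0 would be a single entry whose va is the start of the kernel's physical memory mapping, whose pa is 0 and whose size is the domain's memory size, so any kmalloc() address would hit on the first entry.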

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature
