Xen project Mailing List

Re: Proposal for physical address based hypercalls

Date: Thu, 29 Sep 2022 13:32:51 +0200

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 28.09.2022 15:03, Juergen Gross wrote:
> On 28.09.22 14:06, Jan Beulich wrote:
>> On 28.09.2022 12:58, Andrew Cooper wrote:
>>> On 28/09/2022 11:38, Jan Beulich wrote:
>>>> As an alternative I'd like to propose the introduction of a bit (or
>>>> multiple ones, see below) augmenting the hypercall number, to control
>>>> the flavor of the buffers used for every individual hypercall. This
>>>> would likely involve the introduction of a new hypercall page (or
>>>> multiple ones if more than one bit is to be used), to retain the
>>>> present abstraction where it is the hypervisor which actually fills
>>>> these pages.
>>>
>>> There are other concerns which need to be accounted for.
>>>
>>> Encrypted VMs cannot use a hypercall page; they don't trust the
>>> hypervisor in the first place, and the hypercall page is (specifically)
>>> code injection. So the sensible new ABI cannot depend on a hypercall table.
>>
>> I don't think there's a dependency, and I think there never really has been.
>> We've been advocating for its use, but we've not enforced that anywhere, I
>> don't think.
>>
>>> Also, rewriting the hypercall page on migrate turns out not to have been
>>> the most clever idea, and only works right now because the instructions
>>> are the same length in the variations for each mode.
>>>
>>> Also continuations need to change to avoid userspace liveness problems,
>>> and existing hypercalls that we do have need splitting between things
>>> which are actually privileged operations (within the guest context) and
>>> things which are logical control operations, so the kernel can expose
>>> the latter to userspace without retaining the gaping root hole which is
>>> /dev/xen/privcmd, and a blocker to doing UEFI Secureboot.
>>>
>>> So yes, starting some new clean(er) interface from hypercall 64 is the
>>> plan, but it very much does not want to be a simple mirror of the
>>> existing 0-63 with a differing calling convention.
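Jan's opening proposal, one or more bits augmenting the hypercall number to select how each individual hypercall's buffers are to be interpreted, could look roughly like the C sketch below. The bit position, the field width and all `HC_*` / `hc_*` names are hypothetical and not part of the Xen ABI; the sketch shows only the encode/decode step, not the hypercall-page handling or any of the concerns Andrew raises.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical encoding: the low 8 bits carry the hypercall number (today's
 * hypercalls are numbered 0-63), and a 2-bit field above them selects how
 * buffer arguments are interpreted. Not part of any real Xen ABI.
 */
#define HC_NR_MASK      0xffu
#define HC_FLAVOR_SHIFT 8
#define HC_FLAVOR_MASK  0x3u

enum hc_buf_flavor {
    HC_BUF_VIRT = 0, /* buffers passed as guest virtual addresses, as today */
    HC_BUF_PHYS = 1, /* buffers passed as guest physical addresses */
};

/* Guest side: combine a hypercall number with a buffer flavor. */
static inline uint32_t hc_encode(uint32_t nr, enum hc_buf_flavor flavor)
{
    assert(nr <= HC_NR_MASK);
    assert((uint32_t)flavor <= HC_FLAVOR_MASK);
    return nr | ((uint32_t)flavor << HC_FLAVOR_SHIFT);
}

/* Hypervisor side: recover the plain hypercall number. */
static inline uint32_t hc_nr(uint32_t op)
{
    return op & HC_NR_MASK;
}

/* Hypervisor side: recover the raw flavor field (may hold unknown values). */
static inline uint32_t hc_flavor(uint32_t op)
{
    return (op >> HC_FLAVOR_SHIFT) & HC_FLAVOR_MASK;
}
```

Under these assumptions a hypervisor dispatcher would use `hc_nr()` to index its existing handler table and `hc_flavor()` to pick between virtual- and physical-address copy routines, rejecting flavor values it does not know.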
>>
>> All of these look like orthogonal problems to me. That's likely all
>> relevant for, as I think you've been calling it, ABI v2, but shouldn't
>> hinder our switching to a physical address based hypercall model.
>> Otherwise I'm afraid we'll never make any progress in that direction.
>
> What about an alternative model allowing most of the current
> hypercalls to be used unmodified?
>
> We could add a new hypercall for registering hypercall buffers via
> virtual address, physical address, and size of the buffers (kind of a
> software TLB).

Why not?

> The buffer table would want to be physically addressed
> by the hypercall, of course.

I'm not convinced of this, as it would break uniformity of the hypercall
interfaces. IOW in the hypervisor we then wouldn't be able to use
copy_from_guest() to retrieve the contents. Perhaps this simply shouldn't
be a table, but a hypercall not involving any buffers (i.e. every
discontiguous piece would need registering separately). I expect such a
software TLB wouldn't have many entries, so needing to use a couple of
hypercalls shouldn't be a major issue.

> It might be interesting to have this table per vcpu (it should be
> allowed to use the same table for multiple vcpus) in order to speed
> up finding translation entries of percpu buffers.

Yes. Perhaps insertion and purging could simply be two new VCPUOP_*.

As a prereq I think we'd need to sort out the cross-vCPU accessing of
guest data, coincidentally pointed out in a post-commit-message remark in
https://lists.xen.org/archives/html/xen-devel/2022-09/msg01761.html. The
subject vCPU isn't available in copy_to_user_hvm(), which is where I'd
expect the TLB lookup to occur (while assuming handles point at globally
mapped space _might_ be okay, using the wrong vCPU's TLB surely isn't).

> Any hypercall buffer being addressed virtually could first be looked up
> in the SW-TLB. This wouldn't require any changes for most of the
> hypercall interfaces.
> Only special cases with very large buffers
> might need indirect variants (like Jan said: via GFN lists, which could
> be passed in registered buffers).
>
> Encrypted guests would probably want to use static percpu buffers in
> order to avoid switching the encryption state of the buffers all the
> time.
>
> An unencrypted PVH/HVM domain (e.g. PVH dom0) could just define one
> giant buffer with the domain's memory size via the physical memory
> mapping of the kernel. All kmalloc() addresses would be in that region.

That's Linux-centric. I'm not convinced all OSes maintain a directmap.
Without one, switching to this model might end up being quite intrusive
on the OS side. Thinking of Linux, we'd need a second range covering the
data part of the kernel image.

Further, this still wouldn't (afaics) pave a reasonable route towards
dealing with privcmd-invoked hypercalls.

Finally, to what extent are we concerned about PV guests using linear
addresses for hypercall buffers? I ask because I don't think the model
lends itself to use for the PV guest interfaces as well.

Jan

> A buffer address not found would need to be translated like today (and
> fail for an encrypted guest).
>
> Thoughts?
>
>
> Juergen
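Juergen's software-TLB model, with Jan's refinements (one registration hypercall per contiguous piece rather than a physically addressed table, and insertion/purging as two new VCPUOP_* operations), can be sketched in C as below. All names, the fixed 8-entry table and the linear search are assumptions for illustration only. A lookup hits only if the whole buffer lies within one registered range; on a miss, an unencrypted guest falls back to today's translation while an encrypted guest fails, as Juergen describes. Which vCPU's table gets consulted is exactly the open question Jan raises about copy_to_user_hvm().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SW_TLB_ENTRIES 8 /* assumption: a small, fixed-size table per vCPU */

struct sw_tlb_entry {
    uint64_t va;   /* guest virtual start of the registered range */
    uint64_t pa;   /* guest physical start of the same range */
    uint64_t size; /* length in bytes; 0 marks a free slot */
};

/* One instance per vCPU; a guest may register identical ranges on several vCPUs. */
struct sw_tlb {
    struct sw_tlb_entry e[SW_TLB_ENTRIES];
};

enum buf_xlate {
    XLATE_SW_TLB, /* hit: buffer lies entirely within a registered range */
    XLATE_LEGACY, /* miss: translate the virtual address as today */
    XLATE_FAULT,  /* miss in an encrypted guest: fail */
};

/* Models a hypothetical VCPUOP_* "insert"; one call per contiguous piece. */
static bool sw_tlb_insert(struct sw_tlb *tlb, uint64_t va, uint64_t pa,
                          uint64_t size)
{
    if (size == 0)
        return false;
    for (size_t i = 0; i < SW_TLB_ENTRIES; i++) {
        if (tlb->e[i].size == 0) {
            tlb->e[i] = (struct sw_tlb_entry){ .va = va, .pa = pa, .size = size };
            return true;
        }
    }
    return false; /* table full */
}

/* Models a hypothetical VCPUOP_* "purge" of the range starting at va. */
static void sw_tlb_purge(struct sw_tlb *tlb, uint64_t va)
{
    for (size_t i = 0; i < SW_TLB_ENTRIES; i++)
        if (tlb->e[i].size != 0 && tlb->e[i].va == va)
            tlb->e[i].size = 0;
}

/*
 * Resolve the buffer [va, va + len) using the table of the vCPU that issued
 * the hypercall. A buffer straddling two entries counts as a miss.
 */
static enum buf_xlate sw_tlb_resolve(const struct sw_tlb *tlb, bool encrypted,
                                     uint64_t va, uint64_t len, uint64_t *pa)
{
    for (size_t i = 0; i < SW_TLB_ENTRIES; i++) {
        const struct sw_tlb_entry *ent = &tlb->e[i];
        if (ent->size == 0 || va < ent->va)
            continue;
        uint64_t off = va - ent->va;
        if (off < ent->size && len <= ent->size - off) {
            *pa = ent->pa + off;
            return XLATE_SW_TLB;
        }
    }
    /* Per the proposal, only unencrypted guests may fall back. */
    return encrypted ? XLATE_FAULT : XLATE_LEGACY;
}
```

For an unencrypted PVH dom0, a Linux guest would, under these assumptions, register one range covering its directmap and a second one covering the kernel image's data (the point Jan makes above), then resolve each buffer with `sw_tlb_resolve()` against the issuing vCPU's table. An OS without a directmap would need many more, smaller registrations.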

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.