Re: Proposal for physical address based hypercalls
On 29.09.22 13:32, Jan Beulich wrote:
> On 28.09.2022 15:03, Juergen Gross wrote:
>> On 28.09.22 14:06, Jan Beulich wrote:
>>> On 28.09.2022 12:58, Andrew Cooper wrote:
>>>> On 28/09/2022 11:38, Jan Beulich wrote:
>>>>> As an alternative I'd like to propose the introduction of a bit (or multiple ones, see below) augmenting the hypercall number, to control the flavor of the buffers used for every individual hypercall. This would likely involve the introduction of a new hypercall page (or multiple ones if more than one bit is to be used), to retain the present abstraction where it is the hypervisor which actually fills these pages.
>>>>
>>>> There are other concerns which need to be accounted for. Encrypted VMs cannot use a hypercall page; they don't trust the hypervisor in the first place, and the hypercall page is (specifically) code injection. So the sensible new ABI cannot depend on a hypercall table.
>>>
>>> I don't think there's a dependency, and I think there never really has been. We've been advocating for its use, but we've not enforced that anywhere, I don't think.
>>>
>>>> Also, rewriting the hypercall page on migrate turns out not to have been the most clever idea, and only works right now because the instructions are the same length in the variations for each mode. Also continuations need to change to avoid userspace liveness problems, and existing hypercalls that we do have need splitting between things which are actually privileged operations (within the guest context) and things which are logical control operations, so the kernel can expose the latter to userspace without retaining the gaping root hole which is /dev/xen/privcmd, and which is a blocker to doing UEFI Secure Boot. So yes, starting some new clean(er) interface from hypercall 64 is the plan, but it very much does not want to be a simple mirror of the existing 0-63 with a differing calling convention.
>>>
>>> All of these look like orthogonal problems to me. That's likely all relevant for, as I think you've been calling it, ABI v2, but it shouldn't hinder our switching to a physical address based hypercall model. Otherwise I'm afraid we'll never make any progress in that direction.
>>
>> What about an alternative model allowing most of the current hypercalls to be used unmodified? We could add a new hypercall for registering hypercall buffers via virtual address, physical address, and size of the buffers (kind of a software TLB).
>
> Why not?
>
>> The buffer table would want to be physically addressed by the hypercall, of course.
>
> I'm not convinced of this, as it would break uniformity of the hypercall interfaces. IOW in the hypervisor we then wouldn't be able to use copy_from_guest() to retrieve the contents. Perhaps this simply shouldn't be a table, but a hypercall not involving any buffers (i.e. every discontiguous piece would need registering separately). I expect such a software TLB wouldn't have many entries, so needing to use a couple of hypercalls shouldn't be a major issue.

Fine with me.

>> It might be interesting to have this table per vcpu (it should be allowed to use the same table for multiple vcpus) in order to speed up finding translation entries of percpu buffers.
>
> Yes. Perhaps insertion and purging could simply be two new VCPUOP_*.

Again fine with me.
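Just to make this part more concrete, below is a rough sketch of how such a registration interface could look. All names and numbers are invented for illustration only - this is not an existing Xen interface and would of course need to go through the normal public header review:

#include <stdint.h>

/*
 * Sketch only: one SW-TLB entry as it could be passed to a
 * (hypothetical) registration sub-op - the guest virtual start
 * address of a buffer, the guest physical address backing it,
 * and the length of the physically contiguous region.
 */
struct vcpu_hcall_buf {
    uint64_t va;    /* guest virtual start address */
    uint64_t gpa;   /* guest physical start address */
    uint64_t size;  /* length of the contiguous region in bytes */
};

/* Hypothetical new sub-ops; the numbers are made up. */
#define VCPUOP_register_hcall_buf   16 /* insert one entry */
#define VCPUOP_unregister_hcall_buf 17 /* purge the entry matching va */

The guest would issue one registration per discontiguous piece, as suggested above, instead of passing a table.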
> As a prereq I think we'd need to sort the cross-vCPU accessing of guest data, coincidentally pointed out in a post-commit-message remark in https://lists.xen.org/archives/html/xen-devel/2022-09/msg01761.html. The subject vCPU isn't available in copy_to_user_hvm(), which is where I'd expect the TLB lookup to occur (while assuming handles point at globally mapped space _might_ be okay, using the wrong vCPU's TLB surely isn't).

Any per-vcpu buffer should only be used by the respective vcpu.

>> Any hypercall buffer being addressed virtually could first be looked up via the SW-TLB. This wouldn't require any changes for most of the hypercall interfaces. Only special cases with very large buffers might need indirect variants (like Jan said: via GFN lists, which could be passed in registered buffers). Encrypted guests would probably want to use static percpu buffers in order to avoid switching the encryption state of the buffers all the time. An unencrypted PVH/HVM domain (e.g. PVH dom0) could just define one giant buffer covering the domain's memory size via the physical memory mapping of the kernel. All kmalloc() addresses would be in that region.
>
> That's Linux-centric. I'm not convinced all OSes maintain a directmap. Without one, switching to this model might end up quite intrusive on the OS side.

This model is especially interesting for dom0. The majority of installations are running a Linux dom0 AFAIK, so having an easy way to speed this case up is a big plus.

> Thinking of Linux, we'd need a 2nd range covering the data part of the kernel image.

Probably, yes.

> Further this still wouldn't (afaics) pave a reasonable route towards dealing with privcmd-invoked hypercalls.

Today the hypercall buffers are all allocated via the privcmd driver. It should be fairly easy to add an ioctl to get the buffer's kernel address instead of using the user address. Multi-page buffers might be problematic, though, so either we need special variants for hypercalls with such buffers, or we just fall back to using virtual addresses for the cases where no guest physically contiguous buffer could be allocated (this doesn't apply to encrypted guests, of course, as those need to have large enough buffers anyway).

> Finally - to what extent are we concerned about PV guests using linear addresses for hypercall buffers? I ask because I don't think the model lends itself to use also for the PV guest interfaces.

Good question. As long as we support PV guests we can't drop support for linear addresses IMO. So the question is whether we are fine with PV guests not using the pre-registered buffers, or if we want to introduce an interface for PV guests using GFNs instead of MFNs.

Juergen

>> A buffer address not found would need to be translated like today (and fail for an encrypted guest).
>>
>> Thoughts?
>>
>> Juergen
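Regarding the "not found" case quoted above, this is roughly how I'd picture the lookup and fallback on the hypervisor side. Again just a sketch: the struct layout and the helper name are invented, and real code would sit next to copy_to_user_hvm() and use the existing Xen types and copy helpers:

#include <stddef.h>
#include <stdint.h>

/* Invented per-vCPU SW-TLB representation; only a handful of entries. */
struct swtlb_entry {
    uint64_t va;    /* registered guest virtual start address */
    uint64_t gpa;   /* guest physical address backing the region */
    uint64_t size;  /* length of the region in bytes */
};

struct swtlb {
    unsigned int nr;
    struct swtlb_entry ent[8];
};

/*
 * Return the guest physical address for a virtually addressed buffer,
 * or 0 if the address wasn't registered - in which case the caller
 * would fall back to today's page table walk (and fail for an
 * encrypted guest).
 */
static uint64_t swtlb_lookup(const struct swtlb *tlb, uint64_t va,
                             size_t len)
{
    for ( unsigned int i = 0; i < tlb->nr; i++ )
    {
        const struct swtlb_entry *e = &tlb->ent[i];

        if ( va >= e->va && len <= e->size &&
             va - e->va <= e->size - len )
            return e->gpa + (va - e->va);
    }

    return 0;
}

With the table being per vcpu, the lookup only ever has to consider the buffers registered by the vcpu doing the hypercall.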