Proposal for physical address based hypercalls

For quite some time we've been talking about replacing the present virtual address based hypercall interface with one using physical addresses. This is in particular a prerequisite to being able to support guests with encrypted memory, as for such guests we cannot perform the page table walks necessary to translate virtual to (guest-)physical addresses. But using (guest) physical addresses is also expected to help performance of non-PV guests (i.e. all Arm ones plus HVM/PVH on x86), because address translation is no longer necessary.

Clearly to be able to run existing guests, we need to continue to support the present virtual address based interface. Previously it was suggested to change the model on a per-domain basis, perhaps by a domain creation control. This has two major shortcomings:

 - Entire guest OSes would need to switch over to the new model all in one go. This could be particularly problematic for in-guest interfaces like Linux's privcmd driver, which is passed hypercall arguments from user space. These necessarily use virtual addresses, and hence the kernel would need to learn of all hypercalls legitimately coming in, in order to translate the buffer addresses. Reaching sufficient coverage there might take some time.

 - All base components within an individual guest instance which might run in succession (firmware, boot loader, kernel, kexec) would need to agree on the hypercall ABI to use.

As an alternative I'd like to propose the introduction of a bit (or multiple ones, see below) augmenting the hypercall number, to control the flavor of the buffers used for every individual hypercall. This would likely involve the introduction of a new hypercall page (or multiple ones if more than one bit is to be used), to retain the present abstraction where it is the hypervisor which actually fills these pages. For multicalls the wrapping multicall itself would be controlled independently of the constituent hypercalls.
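
To make the per-hypercall flavor bit concrete, here is a minimal sketch. The flag position (bit 30) and all names (HYPERCALL_FLAG_PHYS_BUFS, struct hypercall_desc, decode_hypercall(), encode_phys_hypercall()) are assumptions made up for illustration; none of them is an existing Xen interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: bit 30 of the hypercall number selects physical-address buffers. */
#define HYPERCALL_FLAG_PHYS_BUFS (UINT32_C(1) << 30)
#define HYPERCALL_NR_MASK        (HYPERCALL_FLAG_PHYS_BUFS - 1)

/* Hypothetical decoded form of an augmented hypercall number. */
struct hypercall_desc {
    uint32_t nr;        /* plain hypercall number, flag stripped */
    bool     phys_bufs; /* true: buffer arguments are guest-physical */
};

/* Hypervisor side: split the raw number into operation and buffer flavor. */
static struct hypercall_desc decode_hypercall(uint32_t raw)
{
    struct hypercall_desc d = {
        .nr        = raw & HYPERCALL_NR_MASK,
        .phys_bufs = (raw & HYPERCALL_FLAG_PHYS_BUFS) != 0,
    };
    return d;
}

/* Guest side: request the physical-address flavor for one individual call. */
static uint32_t encode_phys_hypercall(uint32_t nr)
{
    /* The plain number must not collide with the flag bit (or anything above it). */
    assert((nr & ~HYPERCALL_NR_MASK) == 0);
    return nr | HYPERCALL_FLAG_PHYS_BUFS;
}
```

With such a split, existing callers passing an unaugmented number would keep getting the virtual address flavor, and a multicall could carry the flag on the wrapping call and, independently, on each constituent entry.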

A model involving just a single bit to indicate "flat" buffers has limitations when it comes to large buffers passed to a hypercall. Since in many cases hypercalls (currently) allowing for rather large buffers wouldn't normally be used with buffers significantly larger than a single page (several of the mem-ops for example), special casing the (presumably) few hypercalls which have an actual need for large buffers might be an option.

Another approach would be to build in a scatter/gather model for buffers right away. Jürgen suggests that the low two address bits could be used as a "descriptor" here.

Alternatively, since buffer sizes should always be known, using a multi-bit augmentation to the hypercall number could also be a viable model, distinguishing between e.g. all-linear buffers, all-single-S/G-level ones, and size-dependent selection of zero or more S/G levels. This would affect all buffers used by a single hypercall. With the level of indirection needed derivable from buffer size, in the last of the variants small buffers could still have their addresses provided directly while only larger buffers would be described by e.g. a list of GFNs or a list of (address,length) tuples, using multiple levels if even that list would still end up large.

Of course any one of the models could be selected as the only one to use (in addition to the existing virtual address based one), which would allow sticking to a single bit augmenting the hypercall number.

Note that a dynamic model (indirection levels derived from buffer size) would be quite impactful, as the overall buffer size would need passing to the copying helpers alongside the size of the data which actually is to be copied.

How to express S/G lists will want to take into account existing uses. For example, an array of (address,length) tuples would be quite inefficient to use with operations like copy_from_guest_offset().
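
To illustrate the inefficiency: an offset-based access has to locate the tuple covering the requested offset, and with arbitrary tuple lengths that means walking the list from the start on every access. A minimal sketch, where struct sg_entry and sg_offset_to_addr() are hypothetical names and not existing Xen code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical S/G element: one physically contiguous piece of the buffer. */
struct sg_entry {
    uint64_t addr;   /* guest-physical start address */
    uint64_t len;    /* length in bytes */
};

/*
 * Translate a byte offset into the logical buffer into a guest-physical
 * address. Because entries have arbitrary lengths, each lookup is a
 * linear walk. Returns 0 on success, -1 if the offset lies beyond the
 * end of the buffer.
 */
static int sg_offset_to_addr(const struct sg_entry *sg, size_t nr,
                             uint64_t offset, uint64_t *addr)
{
    assert(sg != NULL || nr == 0);

    for (size_t i = 0; i < nr; i++) {
        if (offset < sg[i].len) {
            *addr = sg[i].addr + offset;
            return 0;
        }
        offset -= sg[i].len;   /* skip this piece, continue with the next */
    }

    return -1;
}
```
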
Perhaps this would want to be an array of xen_ulong_t, with the first slot holding the offset into the first page and all further slots holding GFNs (albeit that would still require two [generally] discontiguous reads from the array for a single copy_from_guest_offset()). Otoh, since calling code will need changing anyway to use this new model, we might also require that such indirectly specified buffers are page-aligned.

Virtual addresses will continue to be used in certain places. Such aren't normally expressed via handles, e.g. callback or exception handling entry points.

Jan