Xen project Mailing List

Re: [PATCH v3] xen/arm: Convert runstate address during hypcall

On Thu, 30 Jul 2020, Julien Grall wrote: > Hi Bertrand, > > To avoid extra work on your side, I would recommend to wait a bit before > sending a new version. It would be good to at least settle the conversation in > v2 regarding the approach taken. > > On 30/07/2020 11:24, Bertrand Marquis wrote: > > At the moment on Arm, a Linux guest running with KTPI enabled will > > cause the following error when a context switch happens in user mode: > > (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0 > > > > The error is caused by the virtual address for the runstate area > > registered by the guest only being accessible when the guest is running > > in kernel space when KPTI is enabled. > > > > To solve this issue, this patch is doing the translation from virtual > > address to physical address during the hypercall and mapping the > > required pages using vmap. This is removing the conversion from virtual > > to physical address during the context switch which is solving the > > problem with KPTI. > > To echo what Jan said on the previous version, this is a change in a stable > ABI and therefore may break existing guest. FAOD, I agree in principle with > the idea. However, we want to explain why breaking the ABI is the *only* > viable solution. > > From my understanding, it is not possible to fix without an ABI breakage > because the hypervisor doesn't know when the guest will switch back from > userspace to kernel space. The risk is the information provided by the > runstate wouldn't contain accurate information and could affect how the guest > handle stolen time. > > Additionally there are a few issues with the current interface: > 1) It is assuming the virtual address cannot be re-used by the userspace. > Thanksfully Linux have a split address space. But this may change with KPTI in > place. > 2) When update the page-tables, the guest has to go through an invalid > mapping. So the translation may fail at any point. > > IOW, the existing interface can lead to random memory corruption and > inacurracy of the stolen time. > > > > > This is done only on arm architecture, the behaviour on x86 is not > > modified by this patch and the address conversion is done as before > > during each context switch. > > > > This is introducing several limitations in comparison to the previous > > behaviour (on arm only): > > - if the guest is remapping the area at a different physical address Xen > > will continue to update the area at the previous physical address. As > > the area is in kernel space and usually defined as a global variable this > > is something which is believed not to happen. If this is required by a > > guest, it will have to call the hypercall with the new area (even if it > > is at the same virtual address). > > - the area needs to be mapped during the hypercall. For the same reasons > > as for the previous case, even if the area is registered for a different > > vcpu. It is believed that registering an area using a virtual address > > unmapped is not something done. > > This is not clear whether the virtual address refer to the current vCPU or the > vCPU you register the runstate for. From the past discussion, I think you > refer to the former. It would be good to clarify. > > Additionally, all the new restrictions should be documented in the public > interface. So an OS developper can find the differences between the > architectures. Just to paraphrase what Julien wrote, it would be good to improve the commit message with the points suggested and also write a note in the header file about the changes to the interface. > To answer Jan's concern, we certainly don't know all the guest OSes existing, > however we also need to balance the benefit for a large majority of the users. > > From previous discussion, the current approach was deemed to be acceptable on > Arm and, AFAICT, also x86 (see [1]). > > TBH, I would rather see the approach to be common. For that, we would an > agreement from Andrew and Jan in the approach here. Meanwhile, I think this is > the best approach to address the concern from Arm users. +1 > > inline functions in headers could not be used as the architecture > > domain.h is included before the global domain.h making it impossible > > to use the struct vcpu inside the architecture header. > > This should not have any performance impact as the hypercall is only > > called once per vcpu usually. > > > > Signed-off-by: Bertrand Marquis <bertrand.marquis@xxxxxxx> > > > > --- > > Changes in v2 > > - use vmap to map the pages during the hypercall. > > - reintroduce initial copy during hypercall. > > > > Changes in v3 > > - Fix Coding style > > - Fix vaddr printing on arm32 > > - use write_atomic to modify state_entry_time update bit (only > > in guest structure as the bit is not used inside Xen copy) > > > > --- > > xen/arch/arm/domain.c | 161 ++++++++++++++++++++++++++++++----- > > xen/arch/x86/domain.c | 29 ++++++- > > xen/arch/x86/x86_64/domain.c | 4 +- > > xen/common/domain.c | 19 ++--- > > xen/include/asm-arm/domain.h | 9 ++ > > xen/include/asm-x86/domain.h | 16 ++++ > > xen/include/xen/domain.h | 5 ++ > > xen/include/xen/sched.h | 16 +--- > > 8 files changed, 206 insertions(+), 53 deletions(-) > > > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > > index 31169326b2..8b36946017 100644 > > --- a/xen/arch/arm/domain.c > > +++ b/xen/arch/arm/domain.c > > @@ -19,6 +19,7 @@ > > #include <xen/sched.h> > > #include <xen/softirq.h> > > #include <xen/wait.h> > > +#include <xen/vmap.h> > > #include <asm/alternative.h> > > #include <asm/cpuerrata.h> > > @@ -275,36 +276,156 @@ static void ctxt_switch_to(struct vcpu *n) > > virt_timer_restore(n); > > } > > -/* Update per-VCPU guest runstate shared memory area (if registered). */ > > -static void update_runstate_area(struct vcpu *v) > > +static void cleanup_runstate_vcpu_locked(struct vcpu *v) > > { > > - void __user *guest_handle = NULL; > > + if ( v->arch.runstate_guest ) > > + { > > + vunmap((void *)((unsigned long)v->arch.runstate_guest & > > PAGE_MASK)); > > + > > + put_page(v->arch.runstate_guest_page[0]); > > + > > + if ( v->arch.runstate_guest_page[1] ) > > + put_page(v->arch.runstate_guest_page[1]); > > + > > + v->arch.runstate_guest = NULL; > > + } > > +} > > + > > +void arch_vcpu_cleanup_runstate(struct vcpu *v) > > +{ > > + spin_lock(&v->arch.runstate_guest_lock); > > + > > + cleanup_runstate_vcpu_locked(v); > > + > > + spin_unlock(&v->arch.runstate_guest_lock); > > +} > > + > > +static int setup_runstate_vcpu_locked(struct vcpu *v, vaddr_t vaddr) > > +{ > > + unsigned int offset; > > + mfn_t mfn[2]; > > + struct page_info *page; > > + unsigned int numpages; > > struct vcpu_runstate_info runstate; > > + void *p; > > - if ( guest_handle_is_null(runstate_guest(v)) ) > > - return; > > + /* user can pass a NULL address to unregister a previous area */ > > + if ( vaddr == 0 ) > > + return 0; > > + > > + offset = vaddr & ~PAGE_MASK; > > + > > + /* provided address must be aligned to a 64bit */ > > + if ( offset % alignof(struct vcpu_runstate_info) ) > > This new restriction wants to be explained in the commit message and public > header. > > > + { > > + gprintk(XENLOG_WARNING, "Cannot map runstate pointer at > > 0x%"PRIvaddr > > + ": Invalid offset\n", vaddr); > > We usually enforce 80 character per lines except for format string. So it is > easier to grep them in the code. > > > + return -EINVAL; > > + } > > + > > + page = get_page_from_gva(v, vaddr, GV2M_WRITE); > > + if ( !page ) > > + { > > + gprintk(XENLOG_WARNING, "Cannot map runstate pointer at > > 0x%"PRIvaddr > > + ": Page is not mapped\n", vaddr); > > + return -EINVAL; > > + } > > + > > + mfn[0] = page_to_mfn(page); > > + v->arch.runstate_guest_page[0] = page; > > + > > + if ( offset > (PAGE_SIZE - sizeof(struct vcpu_runstate_info)) ) > > + { > > + /* guest area is crossing pages */ > > + page = get_page_from_gva(v, vaddr + PAGE_SIZE, GV2M_WRITE); > > + if ( !page ) > > + { > > + put_page(v->arch.runstate_guest_page[0]); > > + gprintk(XENLOG_WARNING, > > + "Cannot map runstate pointer at 0x%"PRIvaddr > > + ": 2nd Page is not mapped\n", vaddr); > > + return -EINVAL; > > + } > > + mfn[1] = page_to_mfn(page); > > + v->arch.runstate_guest_page[1] = page; > > + numpages = 2; > > + } > > + else > > + { > > + v->arch.runstate_guest_page[1] = NULL; > > + numpages = 1; > > + } > > - memcpy(&runstate, &v->runstate, sizeof(runstate)); > > + p = vmap(mfn, numpages); > > + if ( !p ) > > + { > > + put_page(v->arch.runstate_guest_page[0]); > > + if ( numpages == 2 ) > > + put_page(v->arch.runstate_guest_page[1]); > > - if ( VM_ASSIST(v->domain, runstate_update_flag) ) > > + gprintk(XENLOG_WARNING, "Cannot map runstate pointer at > > 0x%"PRIvaddr > > + ": vmap error\n", vaddr); > > + return -EINVAL; > > + } > > + > > + v->arch.runstate_guest = p + offset; > > + > > + if (v == current) > > + memcpy(v->arch.runstate_guest, &v->runstate, sizeof(v->runstate)); > > + else > > { > > - guest_handle = &v->runstate_guest.p->state_entry_time + 1; > > - guest_handle--; > > - runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > > - __raw_copy_to_guest(guest_handle, > > - (void *)(&runstate.state_entry_time + 1) - 1, > > 1); > > - smp_wmb(); > > + vcpu_runstate_get(v, &runstate); > > + memcpy(v->arch.runstate_guest, &runstate, sizeof(v->runstate)); > > } > > - __copy_to_guest(runstate_guest(v), &runstate, 1); > > + return 0; > > +} > > + > > +int arch_vcpu_setup_runstate(struct vcpu *v, > > + struct vcpu_register_runstate_memory_area > > area) > > +{ > > + int rc; > > + > > + spin_lock(&v->arch.runstate_guest_lock); > > + > > + /* cleanup if we are recalled */ > > + cleanup_runstate_vcpu_locked(v); > > + > > + rc = setup_runstate_vcpu_locked(v, (vaddr_t)area.addr.v); > > + > > + spin_unlock(&v->arch.runstate_guest_lock); > > - if ( guest_handle ) > > + return rc; > > +} > > + > > + > > +/* Update per-VCPU guest runstate shared memory area (if registered). */ > > +static void update_runstate_area(struct vcpu *v) > > +{ > > + spin_lock(&v->arch.runstate_guest_lock); > > + > > + if ( v->arch.runstate_guest ) > > { > > - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > > - smp_wmb(); > > - __raw_copy_to_guest(guest_handle, > > - (void *)(&runstate.state_entry_time + 1) - 1, > > 1); > > + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > > + { > > + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; > > + write_atomic(&(v->arch.runstate_guest->state_entry_time), > > + v->runstate.state_entry_time); > > NIT: You want to indent v-> at the same level as the argument from the first > line. > > Also, I think you are missing a smp_wmb() here. I just wanted to add that I reviewed the patch and aside from the smp_wmb (and the couple of code style NITs), there is no other issue in the patch that I could find. No further comments from my side. > > + } > > + > > + memcpy(v->arch.runstate_guest, &v->runstate, sizeof(v->runstate)); > > + > > + if ( VM_ASSIST(v->domain, runstate_update_flag) ) > > + { > > + /* copy must be done before switching the bit */ > > + smp_wmb(); > > + v->runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE; > > + write_atomic(&(v->arch.runstate_guest->state_entry_time), > > + v->runstate.state_entry_time); > > Same remark for the indentation.

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.