|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [RFC PATCH 1/1] xen: Use a global mapping for runstate
Hi Julien,
> On 28 May 2020, at 19:54, Julien Grall <julien@xxxxxxx> wrote:
>
> Hi Bertrand,
>
> Thank you for the patch.
>
> On 28/05/2020 16:25, Bertrand Marquis wrote:
>> At the moment on Arm, a Linux guest running with KTPI enabled will
>> cause the following error when a context switch happens in user mode:
>> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0
>> This patch is modifying runstate handling to map the area given by the
>> guest inside Xen during the hypercall.
>> This is removing the guest virtual to physical conversion during context
>> switches which removes the bug
>
> It would be good to spell out that a virtual address is not stable. So
> relying on it is wrong.
>
>> and improve performance by preventing to
>> walk page tables during context switches.
>
> With Secret free hypervisor in mind, I would like to suggest to map/unmap the
> runstate during context switch.
>
> The cost should be minimal when there is a direct map (i.e on Arm64 and x86)
> and still provide better performance on Arm32.
Even with a minimal cost this is still adding some non real-time behaviour to
the context switch.
But definitely from the security point of view as we have to map a page from
the guest, we could have accessible in Xen some data that should not be there.
There is a trade here where:
- xen can protect by map/unmapping
- a guest which wants to secure his data should either not use it or make sure
there is nothing else in the page
That sounds like a thread local storage kind of problematic where we want data
from xen to be accessible fast from the guest and easy to be modified from xen.
>
> The change should be minimal compare to the current approach but this could
> be taken care separately if you don't have time.
I could add that to the serie in a separate patch so that it can be discussed
and test separately ?
>
>> --
>> In the current status, this patch is only working on Arm and needs to
>> be fixed on X86 (see #error on domain.c for missing get_page_from_gva).
>> Signed-off-by: Bertrand Marquis <bertrand.marquis@xxxxxxx>
>> ---
>> xen/arch/arm/domain.c | 32 +++++++++-------
>> xen/arch/x86/domain.c | 51 ++++++++++++++-----------
>> xen/common/domain.c | 84 ++++++++++++++++++++++++++++++++++-------
>> xen/include/xen/sched.h | 11 ++++--
>> 4 files changed, 124 insertions(+), 54 deletions(-)
>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>> index 31169326b2..799b0e0103 100644
>> --- a/xen/arch/arm/domain.c
>> +++ b/xen/arch/arm/domain.c
>> @@ -278,33 +278,37 @@ static void ctxt_switch_to(struct vcpu *n)
>> /* Update per-VCPU guest runstate shared memory area (if registered). */
>> static void update_runstate_area(struct vcpu *v)
>> {
>> - void __user *guest_handle = NULL;
>> - struct vcpu_runstate_info runstate;
>> + struct vcpu_runstate_info *runstate;
>> - if ( guest_handle_is_null(runstate_guest(v)) )
>> + /* XXX why do we accept not to block here */
>> + if ( !spin_trylock(&v->runstate_guest_lock) )
>> return;
>> - memcpy(&runstate, &v->runstate, sizeof(runstate));
>> + runstate = runstate_guest(v);
>> +
>> + if (runstate == NULL)
>> + {
>> + spin_unlock(&v->runstate_guest_lock);
>> + return;
>> + }
>> if ( VM_ASSIST(v->domain, runstate_update_flag) )
>> {
>> - guest_handle = &v->runstate_guest.p->state_entry_time + 1;
>> - guest_handle--;
>> - runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
>> - __raw_copy_to_guest(guest_handle,
>> - (void *)(&runstate.state_entry_time + 1) - 1,
>> 1);
>> + runstate->state_entry_time |= XEN_RUNSTATE_UPDATE;
>> smp_wmb();
>
> Because you set v->runstate.state_entry_time below, the placement of the
> barrier seems a bit odd.
>
> I would suggest to update v->runstate.state_entry_time first and then update
> runstate->state_entry_time.
We do want the guest to know when we modify the runstate so we need to make
sure the XEN_RUNSTATE_UPDATE is actually set in a visible way before we do the
memcpy.
That’s why the barrier is after.
>
>> + v->runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
>> }
>> - __copy_to_guest(runstate_guest(v), &runstate, 1);
>> + memcpy(runstate, &v->runstate, sizeof(v->runstate));
>> - if ( guest_handle )
>> + if ( VM_ASSIST(v->domain, runstate_update_flag) )
>> {
>> - runstate.state_entry_time &= ~XEN_RUNSTATE_UPDATE;
>> + runstate->state_entry_time &= ~XEN_RUNSTATE_UPDATE;
>> smp_wmb();
>
> You want to update runstate->state_entry_time after the barrier not before.
Agree
>
>> static void _update_runstate_area(struct vcpu *v)
>> {
>> + /* XXX: this should be removed */
>> if ( !update_runstate_area(v) && is_pv_vcpu(v) &&
>> !(v->arch.flags & TF_kernel_mode) )
>> v->arch.pv.need_update_runstate_area = 1;
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index 7cc9526139..acc6f90ba3 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -161,6 +161,7 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int
>> vcpu_id)
>> v->dirty_cpu = VCPU_CPU_CLEAN;
>> spin_lock_init(&v->virq_lock);
>> + spin_lock_init(&v->runstate_guest_lock);
>> tasklet_init(&v->continue_hypercall_tasklet, NULL, NULL);
>> @@ -691,6 +692,66 @@ int rcu_lock_live_remote_domain_by_id(domid_t dom,
>> struct domain **d)
>> return 0;
>> }
>> +static void unmap_runstate_area(struct vcpu *v, unsigned int lock)
>
> NIT: There is an extra space after void
>
> Also, AFAICT, the lock is only taking two values. Please switch to a bool.
Agree
>
>> +{
>> + mfn_t mfn;
>> +
>> + if ( ! runstate_guest(v) )
>
> NIT: We don't usually put a space after !.
>
> But shouldn't this be checked within the lock?
Agree
>
>
>> + return;
>> +
>> + if (lock)
>
> NIT: if ( ... )
>
Ack
>> + spin_lock(&v->runstate_guest_lock);
>> +
>> + mfn = domain_page_map_to_mfn(runstate_guest(v));
>> +
>> + unmap_domain_page_global((void *)
>> + ((unsigned long)v->runstate_guest &
>> + PAGE_MASK));
>> +
>> + put_page_and_type(mfn_to_page(mfn));
>> + runstate_guest(v) = NULL;
>> +
>> + if (lock)
>
> Ditto.
Ack
>
>> + spin_unlock(&v->runstate_guest_lock);
>> +}
>> +
>> +static int map_runstate_area(struct vcpu *v,
>> + struct vcpu_register_runstate_memory_area *area)
>> +{
>> + unsigned long offset = area->addr.p & ~PAGE_MASK;
>> + void *mapping;
>> + struct page_info *page;
>> + size_t size = sizeof(struct vcpu_runstate_info);
>> +
>> + ASSERT(runstate_guest(v) == NULL);
>> +
>> + /* do not allow an area crossing 2 pages */
>> + if ( offset > (PAGE_SIZE - size) )
>> + return -EINVAL;
>
> This is a change in behavior for the guest. If we are going forward with
> this, this will want a separate patch with its own explanation why this is
> done.
Ack I need to add support for an area crossing pages
>
>> +
>> +#ifdef CONFIG_ARM
>> + page = get_page_from_gva(v, area->addr.p, GV2M_WRITE);
>
> A guest is allowed to setup the runstate for a different vCPU than the
> current one. This will lead to get_page_from_gva() to fail as the function
> cannot yet work with a vCPU other than current.
If the area is mapped per cpu but isn’t the aim of this to have a way to check
other cpus status ?
>
> AFAICT, there is no restriction on when the runstate hypercall can be called.
> So this can even be called before the vCPU is brought up.
I understand the remark but it still feels very weird to allow an invalid
address in an hypercall.
Wouldn’t we have a lot of potential issues accepting an address that we cannot
check ?
>
> I was going to suggest to use the current vCPU for translating the address.
> However, it would be reasonable for an OS to use the same virtual address for
> all the vCPUs assuming the page-tables are different per vCPU.
>
> Recent Linux are using a per-cpu area, so the virtual address should be
> different for each vCPU. But I don't know how the other OSes works. Roger
> should be able to help for FreeBSD at least.
>
> I have CCed Paul for the Windows drivers.
>
> If we decide to introduce some restriction then they should be explained in
> the commit message and also documented in the public header (we have been
> pretty bad at documenting change in the past!).
Agree
Cheers
Bertrand
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |