|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [RFC PATCH 1/1] xen: Use a global mapping for runstate
> On 29 May 2020, at 10:43, Julien Grall <julien@xxxxxxx> wrote:
>
> Hi Bertrand,
>
> On 29/05/2020 09:13, Bertrand Marquis wrote:
>> Hi Julien,
>>> On 28 May 2020, at 19:54, Julien Grall <julien@xxxxxxx> wrote:
>>>
>>> Hi Bertrand,
>>>
>>> Thank you for the patch.
>>>
>>> On 28/05/2020 16:25, Bertrand Marquis wrote:
>>>> At the moment on Arm, a Linux guest running with KTPI enabled will
>>>> cause the following error when a context switch happens in user mode:
>>>> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0
>>>> This patch is modifying runstate handling to map the area given by the
>>>> guest inside Xen during the hypercall.
>>>> This is removing the guest virtual to physical conversion during context
>>>> switches which removes the bug
>>>
>>> It would be good to spell out that a virtual address is not stable. So
>>> relying on it is wrong.
>>>
>>>> and improve performance by preventing to
>>>> walk page tables during context switches.
>>>
>>> With Secret free hypervisor in mind, I would like to suggest to map/unmap
>>> the runstate during context switch.
>>>
>>> The cost should be minimal when there is a direct map (i.e on Arm64 and
>>> x86) and still provide better performance on Arm32.
>> Even with a minimal cost this is still adding some non real-time behaviour
>> to the context switch.
>
> Just to be clear, by minimal I meant the mapping part is just a virt_to_mfn()
> call and the unmapping is a NOP.
>
> IHMO, if virt_to_mfn() ends up to add non-RT behavior then you have much
> bigger problem than just this call.
>
>> But definitely from the security point of view as we have to map a page from
>> the guest, we could have accessible in Xen some data that should not be
>> there.
>> There is a trade here where:
>> - xen can protect by map/unmapping
>> - a guest which wants to secure his data should either not use it or make
>> sure there is nothing else in the page
>
> Both are valid and depends on your setup. One may want to protect all the
> existing guests, so modifying a guest may not be a solution.
The fact to map/unmap is increasing the protection but not removing the problem
completely.
>
>> That sounds like a thread local storage kind of problematic where we want
>> data from xen to be accessible fast from the guest and easy to be modified
>> from xen.
>
> Agree. On x86, they have a perdomain mapping so it would be possible to do
> it. We would need to add the feature on Arm.
That would definitely be cleaner.
>
>>>
>>> The change should be minimal compare to the current approach but this could
>>> be taken care separately if you don't have time.
>> I could add that to the serie in a separate patch so that it can be
>> discussed and test separately ?
>
> If you are picking the task, then I would suggest to add it directly in this
> patch.
I will see that but the discussion is leading more on a result where we accept
the current status.
>
>>>
>>>> --
>>>> In the current status, this patch is only working on Arm and needs to
>>>> be fixed on X86 (see #error on domain.c for missing get_page_from_gva).
>>>> Signed-off-by: Bertrand Marquis <bertrand.marquis@xxxxxxx>
>>>> ---
>>>> xen/arch/arm/domain.c | 32 +++++++++-------
>>>> xen/arch/x86/domain.c | 51 ++++++++++++++-----------
>>>> xen/common/domain.c | 84 ++++++++++++++++++++++++++++++++++-------
>>>> xen/include/xen/sched.h | 11 ++++--
>>>> 4 files changed, 124 insertions(+), 54 deletions(-)
>>>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>>>> index 31169326b2..799b0e0103 100644
>>>> --- a/xen/arch/arm/domain.c
>>>> +++ b/xen/arch/arm/domain.c
>>>> @@ -278,33 +278,37 @@ static void ctxt_switch_to(struct vcpu *n)
>>>> /* Update per-VCPU guest runstate shared memory area (if registered). */
>>>> static void update_runstate_area(struct vcpu *v)
>>>> {
>>>> - void __user *guest_handle = NULL;
>>>> - struct vcpu_runstate_info runstate;
>>>> + struct vcpu_runstate_info *runstate;
>>>> - if ( guest_handle_is_null(runstate_guest(v)) )
>>>> + /* XXX why do we accept not to block here */
>>>> + if ( !spin_trylock(&v->runstate_guest_lock) )
>>>> return;
>>>> - memcpy(&runstate, &v->runstate, sizeof(runstate));
>>>> + runstate = runstate_guest(v);
>>>> +
>>>> + if (runstate == NULL)
>>>> + {
>>>> + spin_unlock(&v->runstate_guest_lock);
>>>> + return;
>>>> + }
>>>> if ( VM_ASSIST(v->domain, runstate_update_flag) )
>>>> {
>>>> - guest_handle = &v->runstate_guest.p->state_entry_time + 1;
>>>> - guest_handle--;
>>>> - runstate.state_entry_time |= XEN_RUNSTATE_UPDATE;
>>>> - __raw_copy_to_guest(guest_handle,
>>>> - (void *)(&runstate.state_entry_time + 1) - 1,
>>>> 1);
>>>> + runstate->state_entry_time |= XEN_RUNSTATE_UPDATE;
>>>> smp_wmb();
>>>
>>> Because you set v->runstate.state_entry_time below, the placement of the
>>> barrier seems a bit odd.
>>>
>>> I would suggest to update v->runstate.state_entry_time first and then
>>> update runstate->state_entry_time.
>> We do want the guest to know when we modify the runstate so we need to make
>> sure the XEN_RUNSTATE_UPDATE is actually set in a visible way before we do
>> the memcpy.
>> That’s why the barrier is after.
>
> I think you misunderstood my comment here. I meant to write the following
> code:
>
> v->runstate.state_entry_time = ...
> runstate->state_entry_time = ...
> smp_wmb()
>
> So it is much clearer that the barrier is because runstate->state_entry_time
> needs to be updated before the memcpy().
>
>>>> +
>>>> +#ifdef CONFIG_ARM
>>>> + page = get_page_from_gva(v, area->addr.p, GV2M_WRITE);
>>>
>>> A guest is allowed to setup the runstate for a different vCPU than the
>>> current one. This will lead to get_page_from_gva() to fail as the function
>>> cannot yet work with a vCPU other than current.
>> If the area is mapped per cpu but isn’t the aim of this to have a way to
>> check other cpus status ?
>
> The aim is to collect how much time a vCPU has been unscheduled. This doesn't
> have to be run from a different vCPU.
>
> Anyway, my point is the hypercall allows it today. If you want to make such
> restriction, then we need to agree on it and document it.
>
>>>
>>> AFAICT, there is no restriction on when the runstate hypercall can be
>>> called. So this can even be called before the vCPU is brought up.
>> I understand the remark but it still feels very weird to allow an invalid
>> address in an hypercall.
>> Wouldn’t we have a lot of potential issues accepting an address that we
>> cannot check ?
>
> Well, that's why you see errors when using KPTI ;). If you use a global
> mapping, then it is not possible to continue without validating the address.
>
> But to do this, you will have to put restriction on the hypercalls. I would
> be OK to make such restriction on Arm.
>
> Cheers,
>
> --
> Julien Grall
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |