Xen project Mailing List

Re: [RFC PATCH 1/1] xen: Use a global mapping for runstate

From: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>

Date: Fri, 29 May 2020 14:02:29 +0000

Accept-language: en-GB, en-US

Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none

Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=97AUB9UKezqq0E2Ia39v4XMDJZMNKBJzPcztESS3ZLw=; b=gz5ScWkhWlQgY6v4EI48oSHgL2q3gOv9NwNXJ2tFp8ZKDSR1E6GZsVSMS3ySI05GHdWILqoD7kHrADtJ/HjEEb2gbkIi0IKLc0Xj0sk3ocPRttdrE4uH8SrvCK4/GTglDmveCsZuMh8ysdxnH0BSXgPA5/SXZAkSx9k8eVxuCtbhZSxBg9OQ+PcfEHW/mwmOYbGJxAqQMG+3s0XauNTfU5HQHdYy7pVXkgPufeEjy293hKJWevgiRD4XSvguA1CdDJuCocf9MtJcKbTJvTu/gTmVcDnScvU1rNvLzK506+DSFPzJGzi/OrAKe+1Io2u6MG1ZRdi4uqE8XEuGBHIy0w==

Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=U0dGsYGgGgc4OjFT40K13gA2Jtv/oKjchl5s+K7CoZypLszutEhvW2ptprPsIRSAlC+A9oqeAdkyPhrvTHULK5AACa48VLOxYBhsdOOW1GmKiT3kKwyrFkA/g8uLVz5GKiHP3kWunzbpUHBXkvQDZuqrmk83D1jEQd48RgdGuSSYfPr9kBgyO8f5wOxdBYxnrqwudYBbidw0WTmCF7+LJTtb+QWEkOHQssPhNFMfmQv6gL+YD4T9IjsrVWd8e69KT9OipemRY2bNkEkVkXpJ/tRciflqtpzksHudjwffrFqC8ZJhQX0eC5UaM+IIDVUHme+bKZ+Yt7ydPLPYCaX21g==

Authentication-results-original: xen.org; dkim=none (message not signed) header.d=none;xen.org; dmarc=none action=none header.from=arm.com;

Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, "paul@xxxxxxx" <paul@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, "Xia, Hongyan" <hongyxia@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, nd <nd@xxxxxxx>, Volodymyr Babchuk <Volodymyr_Babchuk@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>

Delivery-date: Fri, 29 May 2020 14:02:49 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Nodisclaimer: true

Original-authentication-results: xen.org; dkim=none (message not signed) header.d=none;xen.org; dmarc=none action=none header.from=arm.com;

Thread-index: AQHWNP6nIxIB/r3XC025DLnqIbmaJai92N6AgADfWQCAABkLAIAASFMA

Thread-topic: [RFC PATCH 1/1] xen: Use a global mapping for runstate

> On 29 May 2020, at 10:43, Julien Grall <julien@xxxxxxx> wrote: > > Hi Bertrand, > > On 29/05/2020 09:13, Bertrand Marquis wrote: >> Hi Julien, >>> On 28 May 2020, at 19:54, Julien Grall <julien@xxxxxxx> wrote: >>> >>> Hi Bertrand, >>> >>> Thank you for the patch. >>> >>> On 28/05/2020 16:25, Bertrand Marquis wrote: >>>> At the moment on Arm, a Linux guest running with KTPI enabled will >>>> cause the following error when a context switch happens in user mode: >>>> (XEN) p2m.c:1890: d1v0: Failed to walk page-table va 0xffffff837ebe0cd0 >>>> This patch is modifying runstate handling to map the area given by the >>>> guest inside Xen during the hypercall. >>>> This is removing the guest virtual to physical conversion during context >>>> switches which removes the bug >>> >>> It would be good to spell out that a virtual address is not stable. So >>> relying on it is wrong. >>> >>>> and improve performance by preventing to >>>> walk page tables during context switches. >>> >>> With Secret free hypervisor in mind, I would like to suggest to map/unmap >>> the runstate during context switch. >>> >>> The cost should be minimal when there is a direct map (i.e on Arm64 and >>> x86) and still provide better performance on Arm32. >> Even with a minimal cost this is still adding some non real-time behaviour >> to the context switch. > > Just to be clear, by minimal I meant the mapping part is just a virt_to_mfn() > call and the unmapping is a NOP. > > IHMO, if virt_to_mfn() ends up to add non-RT behavior then you have much > bigger problem than just this call. > >> But definitely from the security point of view as we have to map a page from >> the guest, we could have accessible in Xen some data that should not be >> there. >> There is a trade here where: >> - xen can protect by map/unmapping >> - a guest which wants to secure his data should either not use it or make >> sure there is nothing else in the page > > Both are valid and depends on your setup. One may want to protect all the > existing guests, so modifying a guest may not be a solution. The fact to map/unmap is increasing the protection but not removing the problem completely. > >> That sounds like a thread local storage kind of problematic where we want >> data from xen to be accessible fast from the guest and easy to be modified >> from xen. > > Agree. On x86, they have a perdomain mapping so it would be possible to do > it. We would need to add the feature on Arm. That would definitely be cleaner. > >>> >>> The change should be minimal compare to the current approach but this could >>> be taken care separately if you don't have time. >> I could add that to the serie in a separate patch so that it can be >> discussed and test separately ? > > If you are picking the task, then I would suggest to add it directly in this > patch. I will see that but the discussion is leading more on a result where we accept the current status. > >>> >>>> -- >>>> In the current status, this patch is only working on Arm and needs to >>>> be fixed on X86 (see #error on domain.c for missing get_page_from_gva). >>>> Signed-off-by: Bertrand Marquis <bertrand.marquis@xxxxxxx> >>>> --- >>>> xen/arch/arm/domain.c | 32 +++++++++------- >>>> xen/arch/x86/domain.c | 51 ++++++++++++++----------- >>>> xen/common/domain.c | 84 ++++++++++++++++++++++++++++++++++------- >>>> xen/include/xen/sched.h | 11 ++++-- >>>> 4 files changed, 124 insertions(+), 54 deletions(-) >>>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c >>>> index 31169326b2..799b0e0103 100644 >>>> --- a/xen/arch/arm/domain.c >>>> +++ b/xen/arch/arm/domain.c >>>> @@ -278,33 +278,37 @@ static void ctxt_switch_to(struct vcpu *n) >>>> /* Update per-VCPU guest runstate shared memory area (if registered). */ >>>> static void update_runstate_area(struct vcpu *v) >>>> { >>>> - void __user *guest_handle = NULL; >>>> - struct vcpu_runstate_info runstate; >>>> + struct vcpu_runstate_info *runstate; >>>> - if ( guest_handle_is_null(runstate_guest(v)) ) >>>> + /* XXX why do we accept not to block here */ >>>> + if ( !spin_trylock(&v->runstate_guest_lock) ) >>>> return; >>>> - memcpy(&runstate, &v->runstate, sizeof(runstate)); >>>> + runstate = runstate_guest(v); >>>> + >>>> + if (runstate == NULL) >>>> + { >>>> + spin_unlock(&v->runstate_guest_lock); >>>> + return; >>>> + } >>>> if ( VM_ASSIST(v->domain, runstate_update_flag) ) >>>> { >>>> - guest_handle = &v->runstate_guest.p->state_entry_time + 1; >>>> - guest_handle--; >>>> - runstate.state_entry_time |= XEN_RUNSTATE_UPDATE; >>>> - __raw_copy_to_guest(guest_handle, >>>> - (void *)(&runstate.state_entry_time + 1) - 1, >>>> 1); >>>> + runstate->state_entry_time |= XEN_RUNSTATE_UPDATE; >>>> smp_wmb(); >>> >>> Because you set v->runstate.state_entry_time below, the placement of the >>> barrier seems a bit odd. >>> >>> I would suggest to update v->runstate.state_entry_time first and then >>> update runstate->state_entry_time. >> We do want the guest to know when we modify the runstate so we need to make >> sure the XEN_RUNSTATE_UPDATE is actually set in a visible way before we do >> the memcpy. >> That’s why the barrier is after. > > I think you misunderstood my comment here. I meant to write the following > code: > > v->runstate.state_entry_time = ... > runstate->state_entry_time = ... > smp_wmb() > > So it is much clearer that the barrier is because runstate->state_entry_time > needs to be updated before the memcpy(). > >>>> + >>>> +#ifdef CONFIG_ARM >>>> + page = get_page_from_gva(v, area->addr.p, GV2M_WRITE); >>> >>> A guest is allowed to setup the runstate for a different vCPU than the >>> current one. This will lead to get_page_from_gva() to fail as the function >>> cannot yet work with a vCPU other than current. >> If the area is mapped per cpu but isn’t the aim of this to have a way to >> check other cpus status ? > > The aim is to collect how much time a vCPU has been unscheduled. This doesn't > have to be run from a different vCPU. > > Anyway, my point is the hypercall allows it today. If you want to make such > restriction, then we need to agree on it and document it. > >>> >>> AFAICT, there is no restriction on when the runstate hypercall can be >>> called. So this can even be called before the vCPU is brought up. >> I understand the remark but it still feels very weird to allow an invalid >> address in an hypercall. >> Wouldn’t we have a lot of potential issues accepting an address that we >> cannot check ? > > Well, that's why you see errors when using KPTI ;). If you use a global > mapping, then it is not possible to continue without validating the address. > > But to do this, you will have to put restriction on the hypercalls. I would > be OK to make such restriction on Arm. > > Cheers, > > -- > Julien Grall

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.