Re: [Xen-devel] [PATCH v3 06/11] optee: add std call handling
Hi Julien,
Julien Grall writes:
> Hi Volodymyr,
>
> On 17/01/2019 15:21, Volodymyr Babchuk wrote:
>> Julien Grall writes:
>>
>>> Hi Volodymyr,
>>>
>>> On 18/12/2018 21:11, Volodymyr Babchuk wrote:
>>>> From: Volodymyr Babchuk <vlad.babchuk@xxxxxxxxx>
>>>>
>>>> The main way to communicate with OP-TEE is to issue a standard SMCCC
>>>> call. "Standard" is an SMCCC term and it means that the call can be
>>>> interrupted and OP-TEE can return control to NW before completing
>>>> the call.
>>>>
>>>> In contrast with fast calls, where arguments and return values
>>>> are passed in registers, standard calls use shared memory. Register
>>>> pair a1,a2 holds the 64-bit PA of the command buffer, where all
>>>> arguments are stored and which is used to return data. OP-TEE
>>>> internally copies the contents of this buffer into its own secure
>>>> memory before accessing and validating any data in the command
>>>> buffer. This is done to make sure that NW will not change the
>>>> contents of the validated parameters.
>>>>
>>>> The mediator needs to do the same for a number of reasons:
>>>>
>>>> 1. To make sure that the guest will not change data after validation.
>>>> 2. To translate IPAs to PAs in the command buffer (this is not done
>>>> in this patch).
>>>> 3. To hide the translated addresses from the guest, so it will not be
>>>> able to do IPA->PA translation by misusing the mediator.
>>>>
>>>> During a standard call OP-TEE can issue multiple "RPC returns", asking
>>>> NW to do some work for OP-TEE. NW then issues the special call
>>>> OPTEE_SMC_CALL_RETURN_FROM_RPC to resume handling of the original call.
>>>> Thus, the mediator needs to maintain context for the original standard
>>>> call across multiple SMCCC calls.
>>>>
>>>> A standard call is considered complete when the returned value is
>>>> not an RPC request.
>>>>
>>>> Signed-off-by: Volodymyr Babchuk <vlad.babchuk@xxxxxxxxx>
>>>> ---
>>>>
>>>> Changes from v2:
>>>> - renamed struct domain_ctx to struct optee_domain
>>>> - fixed coding style
>>>> - Now I use access_guest_memory_by_ipa() instead of mappings
>>>> to read command buffer
>>>> - Added tracking for in-flight calls, so a guest can't resume
>>>> the same call from two CPUs simultaneously
>>>>
>>>> xen/arch/arm/tee/optee.c | 319 ++++++++++++++++++++++++++++++++++-
>>>> xen/include/asm-arm/domain.h | 3 +
>>>> 2 files changed, 320 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
>>>> index 584241b03a..dc90e2ed8e 100644
>>>> --- a/xen/arch/arm/tee/optee.c
>>>> +++ b/xen/arch/arm/tee/optee.c
>>>> @@ -12,6 +12,8 @@
>>>> */
>>>> #include <xen/device_tree.h>
>>>> +#include <xen/domain_page.h>
>>>> +#include <xen/guest_access.h>
>>>> #include <xen/sched.h>
>>>> #include <asm/smccc.h>
>>>> #include <asm/tee/tee.h>
>>>> @@ -22,11 +24,38 @@
>>>> /* Client ID 0 is reserved for hypervisor itself */
>>>> #define OPTEE_CLIENT_ID(domain) (domain->domain_id + 1)
>>>> +/*
>>>> + * Maximal number of concurrent standard calls from one guest. This
>>>> + * corresponds to OPTEE configuration option CFG_NUM_THREADS, because
>>>> + * OP-TEE spawns a thread for every standard call.
>>>
>>> Looking at OP-TEE, CFG_NUM_THREADS will vary depending on the
>>> platform. Is there any way to probe that number of threads from Xen?
>> Yes, this is a per-platform configuration.
>> The easiest way is to add a fast SMC to OP-TEE that reports this
>> parameter.
>> Jens, what do you think about adding such a call?
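To make this a bit more concrete, on the Xen side I imagine something
like the following (OPTEE_SMC_GET_THREAD_COUNT is a made-up name, there
is no such call in the ABI today, and the return convention is only a
guess):

static unsigned int probe_optee_thread_count(void)
{
    struct arm_smccc_res resp;

    /* Hypothetical fast call that would report CFG_NUM_THREADS */
    arm_smccc_smc(OPTEE_SMC_GET_THREAD_COUNT, &resp);

    if ( resp.a0 != OPTEE_SMC_RETURN_OK )
        return MAX_STD_CALLS;   /* old OP-TEE: fall back to static limit */

    return resp.a1;
}

An OP-TEE without such a call would fail the check above and we would
simply fall back to the static default.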
>>
>>> In any case, I think we should update the comment to reflect that this
>>> seems to be the maximum CFG_NUM_THREADS supported by any upstream
>>> platform.
>>
>> Actually, the OP-TEE protocol has a way to handle a limited number of
>> threads correctly. OP-TEE can report that all threads are busy, and the
>> client will wait for a free one. Xen can do the same, although this is
>> not implemented in this patch series. But I can add this.
>
> Could you expand on "wait"? Will it block in OP-TEE/Xen or does it
> return to the guest?
It returns to the guest with the response code
OPTEE_SMC_RETURN_ETHREAD_LIMIT. The Linux driver blocks the calling
application thread until one of the outstanding calls to OP-TEE
finishes. Then the driver wakes up one of the blocked threads, so it can
perform its call.
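For Xen I have something like this in mind (a rough sketch only;
ctx->call_count and the lock are illustrative names, not the exact
fields used in this series):

static bool inc_std_call_count(struct optee_domain *ctx)
{
    bool ok = false;

    spin_lock(&ctx->lock);
    if ( ctx->call_count < MAX_STD_CALLS )
    {
        ctx->call_count++;
        ok = true;
    }
    spin_unlock(&ctx->lock);

    return ok;
}

    /* In the standard call path, before forwarding the call to OP-TEE: */
    if ( !inc_std_call_count(ctx) )
    {
        /* Mimic what OP-TEE itself does when it runs out of threads */
        set_user_reg(regs, 0, OPTEE_SMC_RETURN_ETHREAD_LIMIT);
        return true;    /* SMC handled, nothing forwarded */
    }

Since the Linux driver already handles ETHREAD_LIMIT as described above,
guests would not need any changes.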
>>
>> Basically, this means that everything can work correctly even with
>> MAX_STD_CALLS==1. It just will not be as efficient.
>
> Given the OS is not aware of the number of threads, the problem would
> be the same without Xen in between. Am I right?
Exactly.
> [...]
>
>>>> +
>>>> + /*
>>>> + * Command buffer should start at page boundary.
>>>> + * This is OP-TEE ABI requirement.
>>>> + */
>>>> + if ( call->guest_arg_ipa & (OPTEE_MSG_NONCONTIG_PAGE_SIZE - 1) )
>>>> + return false;
>>>> +
>>>> + call->xen_arg = _xmalloc(OPTEE_MSG_NONCONTIG_PAGE_SIZE,
>>>> + OPTEE_MSG_NONCONTIG_PAGE_SIZE);
>>>> + if ( !call->xen_arg )
>>>> + return false;
>>>> +
>>>> + BUILD_BUG_ON(OPTEE_MSG_NONCONTIG_PAGE_SIZE > PAGE_SIZE);
>>>
>>> As you use _xmalloc, you should not need this. This is only necessary
>>> if you use alloc_xenheap_page.
>>>
>> Yes, right. This is a leftover from the time when I mapped the guest
>> page into Xen. I'll move it down to the place where I map that page.
>>
>>> I am wondering whether it is wise to allocate the memory from xenheap
>>> and not domheap. While on Arm64 (for now) xenheap and domheap are the
>>> same, on Arm32 they are different. The xenheap is at most 1GB, so
>>> pretty limited.
>> Honestly, I have no opinion there. What are the limitations of domheap
>> in this case?
>
> domheap pages may not always be mapped in Xen page-tables, so you have
> to call map_domain_page/unmap_domain_page at every use.
> In practice, on Arm64, those operations are currently a NOP because the
> memory is always mapped. On Arm32, domheap is never mapped, so those
> operations will require modifying the page-tables.
> There would potentially be ways to optimize the Arm32 case, so I think
> this is not a big concern. Using domheap would also allow accounting
> the memory to the domain and taking advantage of potential new safety
> features around domheap.
>
> BTW, I am not asking you to implement the accounting :). You can still
> allocate domheap memory without a domain assigned. I am only listing
> the advantages of using domheap over xenheap :).
Actually, I considered using domheap in the beginning, but for some
reason I thought that domheap is limited by the total amount of memory
assigned to a domain. Looks like I was mistaken.
So yes, I'll use domheap for large allocations.
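Roughly like this, I think (just a sketch; xen_arg_pg is an illustrative
field name):

    /* Allocation, once per call context (no owner assigned): */
    call->xen_arg_pg = alloc_domheap_pages(NULL, 0, 0);
    if ( !call->xen_arg_pg )
        return false;

    /* At every place the copied command buffer is actually accessed: */
    struct optee_msg_arg *xen_arg = __map_domain_page(call->xen_arg_pg);
    /* ... read/validate/update the copied buffer ... */
    unmap_domain_page(xen_arg);

    /* And on the free path: */
    free_domheap_pages(call->xen_arg_pg, 0);

Mapping and unmapping at every access keeps Arm32 working, and on Arm64
those calls are currently NOPs anyway, as you said.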
[...]
>>>> +{
>>>> + struct optee_msg_arg *guest_arg;
>>>> + struct page_info *page;
>>>> + unsigned int i;
>>>> + uint32_t attr;
>>>> +
>>>> + /* copy_std_request() validated IPA for us */
>>>
>>> Not really, the guest is free to modify the stage-2 mapping on another
>>> vCPU while this is happening. I agree that the guest would only shoot
>>> itself in the foot, but we at least need to avoid any weird behaviour.
>>>
>>> In that case, I would check that the type is p2m_ram_rw as you don't
>>> want to write in read-only or foreign mapping.
>> How can I do this atomically? I.e. what if the guest mangles the p2m
>> right after the check?
>
> What you want to prevent is Xen writing to the wrong page. The guest
> should not play with pages that are shared with a higher exception
> level.
>
> get_page_from_gfn() takes a reference on the page, which will be
> released by put_page(). In between, you are sure the page cannot
> disappear under your feet.
Ahhh, right. Thank you.
>
> Furthermore, AFAIK, there is no way for an Arm guest to modify the
> p2m type of a page once it has been inserted. It can only remove the
> mapping or replace it with a newly allocated page. If the guest
> instructs Xen to:
>  - remove the page: as you have a reference, that page will not disappear;
>  - replace the page with a new one: then the guest will not be able to
>    see the result. Tough luck, but it was not meant to do that :).
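Got it. Then the copy-back path would look roughly like this (only a
sketch, with the p2m_ram_rw check you suggested; in the real code I
would copy back only the fields that need to be returned, not the whole
page):

    struct optee_msg_arg *guest_arg;
    struct page_info *page;
    p2m_type_t t;

    page = get_page_from_gfn(current->domain,
                             paddr_to_pfn(call->guest_arg_ipa), &t,
                             P2M_ALLOC);
    if ( !page || t != p2m_ram_rw )
    {
        if ( page )
            put_page(page);
        return false;
    }

    /*
     * We hold a reference, so the page cannot disappear under our feet,
     * and we write through our own mapping of it, so a racing stage-2
     * change in the guest cannot redirect the write to another page.
     */
    guest_arg = __map_domain_page(page);
    memcpy(guest_arg, call->xen_arg, OPTEE_MSG_NONCONTIG_PAGE_SIZE);
    unmap_domain_page(guest_arg);

    put_page(page);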
>
>>
>>> Also, as copy_std_request() and copy_std_request_back() may not be
>>> called in the same "thread", it would be useful if you described the
>>> interaction in a bit more detail.
>> I'm not sure what you mean there.
>
> What I meant is the following can happen:
>
> guest vCPU A          | Xen
>                       |
> Initiate call 1       |
>                       | copy_std_request()
>                       | call OP-TEE
>                       | -> Call "preempted"
>                       | return to guest
> Resume call 1         |
>                       | resume call in OP-TEE
>                       | copy_std_request_back()
>
Yes, this is right.
> AFAICT, the call could even resume from a different vCPU.
This is also true.
> It is not
> entirely trivial to understand this just from reading the code, and the
> comment "copy_std_request() validated IPA for us" leads one to think
> copy_std_request() was called right before. This may not be the
> case. So can you detail the interaction a bit more in the code?
Yes, there can be an issue. My previous approach pinned the guest page
for the duration of the call, so I was sure that the IPA was still
valid. But now this is no longer true. So yes, this comment is
misleading. I'll fix both the code and the comment.
[...]
>>>> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
>>>> index 175de44927..88b48697bd 100644
>>>> --- a/xen/include/asm-arm/domain.h
>>>> +++ b/xen/include/asm-arm/domain.h
>>>> @@ -97,6 +97,9 @@ struct arch_domain
>>>> struct vpl011 vpl011;
>>>> #endif
>>>> +#ifdef CONFIG_TEE
>>>> + void *tee;
>>>> +#endif
>>>
>>> Did you look whether there are any hole in arch_domain that could be
>>> re-used?
>> I thought about that. But what are the chances of finding a 64-bit wide,
>> 64-bit aligned hole in a structure? If I remember the C standard
>> correctly, there is no reason for the compiler to leave such holes.
>
> It depends on the alignment requested for each structure. Have a look
> at pahole to see the number (and size) of holes we have in some
> structures ;).
Wow, a 108-byte hole in rcu_ctrlblk :) And yes, there are only two
8-byte holes in the whole of Xen.
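(That is just from running pahole on the built hypervisor binary, e.g.
something like "pahole -C rcu_ctrlblk xen-syms".)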
--
Best regards, Volodymyr Babchuk
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel