
Re: [RFC XEN PATCH] xen/arm: ffa: reclaim shared memory on guest destroy


  • To: Julien Grall <julien@xxxxxxx>
  • From: Bertrand Marquis <Bertrand.Marquis@xxxxxxx>
  • Date: Tue, 5 Dec 2023 08:14:32 +0000
  • Accept-language: en-GB, en-US
  • Cc: Jens Wiklander <jens.wiklander@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "patches@xxxxxxxxxx" <patches@xxxxxxxxxx>, Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • Delivery-date: Tue, 05 Dec 2023 08:14:52 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-topic: [RFC XEN PATCH] xen/arm: ffa: reclaim shared memory on guest destroy

Hi Julien,

Thanks a lot for your review and comment, this is very helpful.

> On 4 Dec 2023, at 20:24, Julien Grall <julien@xxxxxxx> wrote:
> 
> Hi Jens,
> 
> On 04/12/2023 07:55, Jens Wiklander wrote:
>> When an FF-A enabled guest is destroyed it may leave behind memory
>> shared with SPs. This memory must be reclaimed before it's reused or an
>> SP may make changes to memory used by a new unrelated guest. So when the
>> domain is torn down, add FF-A requests to reclaim all remaining shared
>> memory.
>> SPs in the secure world are notified using VM_DESTROYED that a guest has
>> been destroyed. An SP is supposed to relinquish all shared memory to allow
>> reclaiming the memory. The relinquish operation may need to be delayed if
>> the shared memory is for instance part of a DMA operation.
>> If the FF-A memory reclaim request fails, return -ERESTART to retry
>> again. This will effectively block the destruction of the guest until
>> all memory has been reclaimed.
>> Signed-off-by: Jens Wiklander <jens.wiklander@xxxxxxxxxx>
>> ---
>> Hi,
>> This patch is a bit crude, but gets the job done. In a well designed
>> system this might even be good enough since the SP or the secure world
>> will let the memory be reclaimed and we can move on. But, if for some
>> reason reclaiming the memory is refused it must not be possible to reuse
>> the memory.
> 
> IIUC, we are trying to harden against a buggy SP. Is that correct?

This is not hardening, as this is a possible scenario with a correctly
implemented SP. It is valid for an SP to be unable to relinquish the memory
right away, so we have to take this possibility into account and retry.

What is not expected is for the SP to never release the memory, hence the
possible "todo" at the end of the code: I think we might have to implement a
counter to bound the possible number of loops, but as always the question is
how many...

In that case the only solution would be to park the memory as suggested later
in the thread, but we are not completely sure where, hence the RFC.
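
To make that concrete, here is a rough sketch of what such a bounded retry
could look like. This is only an illustration: the teardown_retries field and
the limit below do not exist in the current ffa.c and would have to be added
to struct ffa_ctx.

/*
 * Illustration only: bound the number of -ERESTART retries from
 * ffa_domain_teardown().  Both the limit and the teardown_retries field
 * (a hypothetical new member of struct ffa_ctx) are assumptions.
 */
#define FFA_MAX_TEARDOWN_RETRIES 8

static int ffa_teardown_shm_check(struct domain *d, struct ffa_ctx *ctx)
{
    if ( list_empty(&ctx->shm_list) )
        return 0;                    /* everything was reclaimed */

    if ( ctx->teardown_retries++ < FFA_MAX_TEARDOWN_RETRIES )
        return -ERESTART;            /* let the teardown be retried */

    /*
     * Retry budget exhausted: the remaining handles were never released by
     * the SP, so the memory would have to be parked rather than returned to
     * the page allocator.
     */
    printk(XENLOG_G_ERR "ffa: %pd: giving up on unreclaimed shared memory\n",
           d);
    return 0;
}

ffa_domain_teardown() would then return whatever this helper returns instead
of the unconditional -ERESTART.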

> 
>> These shared memory ranges are typically quite small compared to the
>> total memory usage of a guest, so it would be an improvement if only the
>> refused shared memory ranges were set aside from future reuse while the
>> guest was destroyed and the other resources were made available for reuse.
>> This could be done by, for instance, assigning the refused shared memory
>> ranges to a dummy VM like DOMID_IO.
> 
> I like the idea to use a dummy VM, but I don't think DOMID_IO is right. Once 
> teardown has completed, the domain will stay around until the last reference 
> on all pages is dropped. At that point, the amount of memory left over is 
> minimal (this is mostly bookkeeping in Xen).
> 
> From the userland PoV, the domain will still show up in the list, but tools 
> like "xl list" will show "(null)". They are called zombie domains.
> 
> So I would consider keeping the same domain around. The advantage is you can 
> call "xl destroy" again to retry the operation.

In this scenario the "restart" implementation here is right, but how could we
park the VM as a "zombie" instead of busy looping in the "kill" loop of
userland?

Also, we need to release all the memory of the VM except the memory shared
with the SP.

I will let Jens answer the more implementation-oriented questions below and
try to help on the more "system" ones.

> 
>> Thanks,
>> Jens
>> ---
>>  xen/arch/arm/tee/ffa.c | 36 ++++++++++++++++++++++++++++++++++++
>>  1 file changed, 36 insertions(+)
>> diff --git a/xen/arch/arm/tee/ffa.c b/xen/arch/arm/tee/ffa.c
>> index 183528d13388..9c596462a8a2 100644
>> --- a/xen/arch/arm/tee/ffa.c
>> +++ b/xen/arch/arm/tee/ffa.c
>> @@ -1539,6 +1539,7 @@ static bool is_in_subscr_list(const uint16_t *subscr, uint16_t start,
>>  static int ffa_domain_teardown(struct domain *d)
>>  {
>>      struct ffa_ctx *ctx = d->arch.tee;
>> +    struct ffa_shm_mem *shm, *tmp;
>>      unsigned int n;
>>      int32_t res;
>>  @@ -1564,10 +1565,45 @@ static int ffa_domain_teardown(struct domain *d)
>>              printk(XENLOG_ERR "ffa: Failed to report destruction of vm_id %u to  %u: res %d\n",
>>                     get_vm_id(d), subscr_vm_destroyed[n], res);
>>      }
>> +    /*
>> +     * If this function is called again due to -ERESTART below, make sure
>> +     * not to send the FFA_MSG_SEND_VM_DESTROYED's.
>> +     */
>> +    subscr_vm_destroyed_count = 0;
> 
> AFAICT, this variable is global. So wouldn't you effectively break other 
> domain if let say the unmapping error is temporary?
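
One way to avoid that would be to remember, per domain, that the notifications
were already sent, rather than zeroing the global count. A minimal sketch,
meant to sit inside ffa_domain_teardown() where ctx and n are already in
scope; the vm_destroyed_sent flag would be a new (hypothetical) field in
struct ffa_ctx and the loop body stands for the existing notification code:

    if ( !ctx->vm_destroyed_sent )
    {
        for ( n = 0; n < subscr_vm_destroyed_count; n++ )
        {
            /*
             * The existing FFA_MSG_SEND_VM_DESTROYED notification of
             * subscr_vm_destroyed[n] goes here, unchanged.
             */
        }
        /* Mark only this domain as notified; the global list is untouched. */
        ctx->vm_destroyed_sent = true;
    }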
> 
>>        if ( ctx->rx )
>>          rxtx_unmap(ctx);
>>  +
>> +    list_for_each_entry_safe(shm, tmp, &ctx->shm_list, list)
>> +    {
>> +        register_t handle_hi;
>> +        register_t handle_lo;
>> +
>> +        uint64_to_regpair(&handle_hi, &handle_lo, shm->handle);
>> +        res = ffa_mem_reclaim(handle_lo, handle_hi, 0);
> 
> Is this call expensive? If so, we may need to handle continuation here.

This call should not be expensive in the normal case: the memory is
reclaimable, so there is no processing required in the SP and everything is
done in the SPMC, which should basically just return yes or no depending on
the state recorded for the handle.

So I think this is the best trade-off.
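
Should a continuation nevertheless turn out to be needed, a rough illustration
(not part of the patch) of how the work could be capped per call, reusing only
the shm_list/shm_count fields and helpers the patch already introduces, could
be:

/*
 * Illustration only: cap the number of FFA_MEM_RECLAIM calls done in one
 * pass so a very long list cannot stall the teardown path.  The batch size
 * is arbitrary.
 */
#define FFA_RECLAIM_BATCH 16

static int ffa_reclaim_shm_batch(struct ffa_ctx *ctx)
{
    struct ffa_shm_mem *shm, *tmp;
    unsigned int done = 0;

    list_for_each_entry_safe(shm, tmp, &ctx->shm_list, list)
    {
        register_t handle_hi, handle_lo;

        if ( done++ == FFA_RECLAIM_BATCH )
            return -ERESTART;        /* process the rest on the next call */

        uint64_to_regpair(&handle_hi, &handle_lo, shm->handle);
        if ( ffa_mem_reclaim(handle_lo, handle_hi, 0) )
            continue;                /* keep it on the list and retry later */

        ctx->shm_count--;
        list_del(&shm->list);
    }

    return list_empty(&ctx->shm_list) ? 0 : -ERESTART;
}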

@Jens: One thing to consider is that a VM_DESTROYED message might get a
"retry" or "busy" answer, in which case we will have to issue it again; this
is not handled in the current implementation.

After discussing the subject internally, we could in fact consider that if an
SP cannot release some memory shared with the destroyed VM, it should say so
by returning "retry" to the message.
That could simplify things into a strategy where:
- we retry the VM_DESTROYED message if required (see the sketch below);
- if some memory is still not reclaimable, we check whether we could park it
and make the VM a zombie.
What do you think ?
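
To illustrate the first bullet: a rough sketch, assuming the
FFA_RET_BUSY/FFA_RET_RETRY error codes from the FF-A spec and a send helper
named ffa_direct_req_send_vm() (the name is assumed from the existing
VM_CREATED/VM_DESTROYED notification code, not verified):

/*
 * Illustration only: resend VM_DESTROYED a bounded number of times when the
 * SP answers "busy" or "retry".  The retry limit is arbitrary.
 */
#define FFA_VM_DESTROYED_RETRIES 4

static void notify_sp_vm_destroyed(struct domain *d, uint16_t sp_id)
{
    unsigned int i;
    int32_t res = 0;

    for ( i = 0; i < FFA_VM_DESTROYED_RETRIES; i++ )
    {
        res = ffa_direct_req_send_vm(sp_id, get_vm_id(d),
                                     FFA_MSG_SEND_VM_DESTROYED);
        if ( res != FFA_RET_BUSY && res != FFA_RET_RETRY )
            break;                   /* success or a permanent error */
    }

    if ( res )
        printk(XENLOG_G_ERR "ffa: %pd: VM_DESTROYED to SP %#x failed: %d\n",
               d, sp_id, res);
}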


> 
>> +        if ( res )
>> +        {
>> +            printk(XENLOG_INFO, "ffa: Failed to reclaim handle %#lx : %d\n",
>> +                   shm->handle, res);
> 
> I think you want to use XENLOG_G_INFO to use the guest ratelimit. Also, I 
> would suggest to print the domain ID in the logs (see '%pd').
> 
> 
>> +        }
>> +        else
>> +        {
>> +            printk(XENLOG_DEBUG, "ffa: Reclaimed handle %#lx\n", shm->handle);
> 
> Same here. You want to use XENLOG_G_DEBUG and print the domain ID.
> 
>> +            ctx->shm_count--;
>> +            list_del(&shm->list);
>> +        }
>> +    }
> 
> NIT: New line here please for clarity.
> 
>> +    if ( !list_empty(&ctx->shm_list) )
>> +    {
>> +        printk(XENLOG_INFO, "ffa: Remaining unclaimed handles, retrying\n");
> 
> Same as the other printks.
> 
>> +        /*
>> +         * TODO: add a timeout where we either panic or let the guest be
>> +         * fully destroyed.
>> +         */
> Timeout with proper handling would be a solution. I am not sure about 
> panic-ing. Do you think the TEE would be in a bad state if we can't release 
> memory?
> 
>> +        return -ERESTART;
>> +    }
>> +
>>      XFREE(d->arch.tee);
>>        return 0;
> 
> Cheers,
> 

Cheers
Bertrand

> -- 
> Julien Grall





 

