[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [XEN PATCH v5 7/7] xen/arm: ffa: support notification





On 03/06/2024 10:50, Jens Wiklander wrote:
Hi Julien,

Hi Jens,


On Mon, Jun 3, 2024 at 11:12 AM Julien Grall <julien@xxxxxxx> wrote:

Hi Jens,

On 03/06/2024 10:01, Jens Wiklander wrote:
On Fri, May 31, 2024 at 4:28 PM Bertrand Marquis
<Bertrand.Marquis@xxxxxxx> wrote:

Hi Jens,

On 29 May 2024, at 09:25, Jens Wiklander <jens.wiklander@xxxxxxxxxx> wrote:

Add support for FF-A notifications, currently limited to an SP (Secure
Partition) sending an asynchronous notification to a guest.

Guests and Xen itself are made aware of pending notifications with an
interrupt. The interrupt handler triggers a tasklet to retrieve the
notifications using the FF-A ABI and deliver them to their destinations.

Update ffa_partinfo_domain_init() to return error code like
ffa_notif_domain_init().

Signed-off-by: Jens Wiklander <jens.wiklander@xxxxxxxxxx>
---
v4->v5:
- Move the freeing of d->arch.tee to the new TEE mediator free_domain_ctx
   callback to make the context accessible during rcu_lock_domain_by_id()
   from a tasklet
- Initialize interrupt handlers for secondary CPUs from the new TEE mediator
   init_interrupt() callback
- Restore the ffa_probe() from v3, that is, remove the
   presmp_initcall(ffa_init) approach and use ffa_probe() as usual now that we
   have the init_interrupt callback.
- A tasklet is added to handle the Schedule Receiver interrupt. The tasklet
   finds each relevant domain with rcu_lock_domain_by_id() which now is enough
   to guarantee that the FF-A context can be accessed.
- The notification interrupt handler only schedules the notification
   tasklet mentioned above

v3->v4:
- Add another note on FF-A limitations
- Clear secure_pending in ffa_handle_notification_get() if both SP and SPM
   bitmaps are retrieved
- ASSERT that ffa_rcu_lock_domain_by_vm_id() isn't passed the vm_id FF-A
   uses for Xen itself
- Replace the get_domain_by_id() call done via ffa_get_domain_by_vm_id() in
   notif_irq_handler() with a call to rcu_lock_live_remote_domain_by_id() via
   ffa_rcu_lock_domain_by_vm_id()
- Remove spinlock in struct ffa_ctx_notif and use atomic functions as needed
   to access and update the secure_pending field
- In notif_irq_handler(), look for the first online CPU instead of assuming
   that the first CPU is online
- Initialize FF-A via presmp_initcall() before the other CPUs are online,
   use register_cpu_notifier() to install the interrupt handler
   notif_irq_handler()
- Update commit message to reflect recent updates

v2->v3:
- Add a GUEST_ prefix and move FFA_NOTIF_PEND_INTR_ID and
   FFA_SCHEDULE_RECV_INTR_ID to public/arch-arm.h
- Register the Xen SRI handler on each CPU using on_selected_cpus() and
   setup_irq()
- Check that the SGI ID retrieved with FFA_FEATURE_SCHEDULE_RECV_INTR
   doesn't conflict with static SGI handlers

v1->v2:
- Addressing review comments
- Change ffa_handle_notification_{bind,unbind,set}() to take struct
   cpu_user_regs *regs as argument.
- Update ffa_partinfo_domain_init() and ffa_notif_domain_init() to return
   an error code.
- Fixing a bug in handle_features() for FFA_FEATURE_SCHEDULE_RECV_INTR.
---
xen/arch/arm/tee/Makefile       |   1 +
xen/arch/arm/tee/ffa.c          |  72 +++++-
xen/arch/arm/tee/ffa_notif.c    | 409 ++++++++++++++++++++++++++++++++
xen/arch/arm/tee/ffa_partinfo.c |   9 +-
xen/arch/arm/tee/ffa_private.h  |  56 ++++-
xen/arch/arm/tee/tee.c          |   2 +-
xen/include/public/arch-arm.h   |  14 ++
7 files changed, 548 insertions(+), 15 deletions(-)
create mode 100644 xen/arch/arm/tee/ffa_notif.c

[...]

@@ -517,8 +567,10 @@ err_rxtx_destroy:
static const struct tee_mediator_ops ffa_ops =
{
      .probe = ffa_probe,
+    .init_interrupt = ffa_notif_init_interrupt,

With the previous change on init secondary i would suggest to
have a ffa_init_secondary here actually calling the ffa_notif_init_interrupt.

Yes, that makes sense. I'll update.


      .domain_init = ffa_domain_init,
      .domain_teardown = ffa_domain_teardown,
+    .free_domain_ctx = ffa_free_domain_ctx,
      .relinquish_resources = ffa_relinquish_resources,
      .handle_call = ffa_handle_call,
};
diff --git a/xen/arch/arm/tee/ffa_notif.c b/xen/arch/arm/tee/ffa_notif.c
new file mode 100644
index 000000000000..e8e8b62590b3
--- /dev/null
+++ b/xen/arch/arm/tee/ffa_notif.c
[...]
+static void notif_vm_pend_intr(uint16_t vm_id)
+{
+    struct ffa_ctx *ctx;
+    struct domain *d;
+    struct vcpu *v;
+
+    /*
+     * vm_id == 0 means a notifications pending for Xen itself, but
+     * we don't support that yet.
+     */
+    if ( !vm_id )
+        return;
+
+    d = ffa_rcu_lock_domain_by_vm_id(vm_id);
+    if ( !d )
+        return;
+
+    ctx = d->arch.tee;
+    if ( !ctx )
+        goto out_unlock;

In both previous cases you are silently ignoring an interrupt
due to an internal error.
Is this something that we should trace ? maybe just debug ?

Could you add a comment to explain why this could happen
(when possible) or not and the possible side effects ?

The second one would be a notification for a domain without
FF-A enabled which is ok but i am a bit more wondering on
the first one.

The SPMC must be out of sync in both cases. I've been looking for a
window where that can happen, but I can't find any. SPMC is called
with FFA_NOTIFICATION_BITMAP_DESTROY during domain teardown so the
SPMC shouldn't try to deliver any notifications after that.

I don't think I agree with the conclusion. I believe, this can also
happen in normal operation.

For example, the SPMC could have trigger the interrupt before
FFA_NOTIFICATION_BITMAP_DESTROY but Xen didn't handle the interrupt (or
run the tasklet) until later.

You're right, there is a window. Delayed handling is OK since
FFA_NOTIFICATION_INFO_GET_64 is invoked from the tasklet, but there is
a window if the tasklet is suspended or another core destroys the
domain before the tasklet has called ffa_rcu_lock_domain_by_vm_id().
So far it's harmless and I guess we can afford a print.

I think it would confuse more the user than anything else because this is an expected race. If we wanted to print a message, then I would argue it should be in the case where...



This could be at the time where the domain has been fully destroyed or
even when...

In the second case, the domain ID might have been reused for a domain
without FF-A enabled, but the SPMC should have known that already.

... a new domain has been created. Although, the latter is rather unlikely.

So what if the new domain has FFA enabled? Is there any potential
security issue?

In this case, we'll inject an NPI in the guest, but when it invokes
FFA_NOTIFICATION_GET it will get accurate information from the SPMC.
The worst case is a spurious NPI. This shouldn't be a security issue.

... we inject the interrupt to the "wrong" domain. But I also understand that it would be difficult for Xen to detect it.

So I would say no print should be needed. Bertrand, what do you think?

Cheers,

--
Julien Grall



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.