
Re: [XEN PATCH v5 7/7] xen/arm: ffa: support notification



Hi Bertrand,

On Fri, May 31, 2024 at 4:28 PM Bertrand Marquis
<Bertrand.Marquis@xxxxxxx> wrote:
>
> Hi Jens,
>
> > On 29 May 2024, at 09:25, Jens Wiklander <jens.wiklander@xxxxxxxxxx> wrote:
> >
> > Add support for FF-A notifications, currently limited to an SP (Secure
> > Partition) sending an asynchronous notification to a guest.
> >
> > Guests and Xen itself are made aware of pending notifications with an
> > interrupt. The interrupt handler triggers a tasklet to retrieve the
> > notifications using the FF-A ABI and deliver them to their destinations.
> >
> > Update ffa_partinfo_domain_init() to return error code like
> > ffa_notif_domain_init().
> >
> > Signed-off-by: Jens Wiklander <jens.wiklander@xxxxxxxxxx>
> > ---
> > v4->v5:
> > - Move the freeing of d->arch.tee to the new TEE mediator free_domain_ctx
> >  callback to make the context accessible during rcu_lock_domain_by_id()
> >  from a tasklet
> > - Initialize interrupt handlers for secondary CPUs from the new TEE mediator
> >  init_interrupt() callback
> > - Restore the ffa_probe() from v3, that is, remove the
> >  presmp_initcall(ffa_init) approach and use ffa_probe() as usual now that we
> >  have the init_interrupt callback.
> > - A tasklet is added to handle the Schedule Receiver interrupt. The tasklet
> >  finds each relevant domain with rcu_lock_domain_by_id() which now is enough
> >  to guarantee that the FF-A context can be accessed.
> > - The notification interrupt handler only schedules the notification
> >  tasklet mentioned above
> >
> > v3->v4:
> > - Add another note on FF-A limitations
> > - Clear secure_pending in ffa_handle_notification_get() if both SP and SPM
> >  bitmaps are retrieved
> > - ASSERT that ffa_rcu_lock_domain_by_vm_id() isn't passed the vm_id FF-A
> >  uses for Xen itself
> > - Replace the get_domain_by_id() call done via ffa_get_domain_by_vm_id() in
> >  notif_irq_handler() with a call to rcu_lock_live_remote_domain_by_id() via
> >  ffa_rcu_lock_domain_by_vm_id()
> > - Remove spinlock in struct ffa_ctx_notif and use atomic functions as needed
> >  to access and update the secure_pending field
> > - In notif_irq_handler(), look for the first online CPU instead of assuming
> >  that the first CPU is online
> > - Initialize FF-A via presmp_initcall() before the other CPUs are online,
> >  use register_cpu_notifier() to install the interrupt handler
> >  notif_irq_handler()
> > - Update commit message to reflect recent updates
> >
> > v2->v3:
> > - Add a GUEST_ prefix and move FFA_NOTIF_PEND_INTR_ID and
> >  FFA_SCHEDULE_RECV_INTR_ID to public/arch-arm.h
> > - Register the Xen SRI handler on each CPU using on_selected_cpus() and
> >  setup_irq()
> > - Check that the SGI ID retrieved with FFA_FEATURE_SCHEDULE_RECV_INTR
> >  doesn't conflict with static SGI handlers
> >
> > v1->v2:
> > - Addressing review comments
> > - Change ffa_handle_notification_{bind,unbind,set}() to take struct
> >  cpu_user_regs *regs as argument.
> > - Update ffa_partinfo_domain_init() and ffa_notif_domain_init() to return
> >  an error code.
> > - Fixing a bug in handle_features() for FFA_FEATURE_SCHEDULE_RECV_INTR.
> > ---
> > xen/arch/arm/tee/Makefile       |   1 +
> > xen/arch/arm/tee/ffa.c          |  72 +++++-
> > xen/arch/arm/tee/ffa_notif.c    | 409 ++++++++++++++++++++++++++++++++
> > xen/arch/arm/tee/ffa_partinfo.c |   9 +-
> > xen/arch/arm/tee/ffa_private.h  |  56 ++++-
> > xen/arch/arm/tee/tee.c          |   2 +-
> > xen/include/public/arch-arm.h   |  14 ++
> > 7 files changed, 548 insertions(+), 15 deletions(-)
> > create mode 100644 xen/arch/arm/tee/ffa_notif.c
> >
[...]
> >
> > @@ -517,8 +567,10 @@ err_rxtx_destroy:
> > static const struct tee_mediator_ops ffa_ops =
> > {
> >     .probe = ffa_probe,
> > +    .init_interrupt = ffa_notif_init_interrupt,
>
> With the previous change on init secondary, I would suggest having an
> ffa_init_secondary here that actually calls ffa_notif_init_interrupt.

Yes, that makes sense. I'll update.
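
Roughly like this (just a sketch of what I have in mind, the wrapper
name follows your suggestion):

    static void ffa_init_secondary(void)
    {
        /* Set up the notification SGI handler on the calling CPU. */
        ffa_notif_init_interrupt();
    }

and in ffa_ops:

    .init_interrupt = ffa_init_secondary,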

>
> >     .domain_init = ffa_domain_init,
> >     .domain_teardown = ffa_domain_teardown,
> > +    .free_domain_ctx = ffa_free_domain_ctx,
> >     .relinquish_resources = ffa_relinquish_resources,
> >     .handle_call = ffa_handle_call,
> > };
> > diff --git a/xen/arch/arm/tee/ffa_notif.c b/xen/arch/arm/tee/ffa_notif.c
> > new file mode 100644
> > index 000000000000..e8e8b62590b3
> > --- /dev/null
> > +++ b/xen/arch/arm/tee/ffa_notif.c
[...]
> > +static void notif_vm_pend_intr(uint16_t vm_id)
> > +{
> > +    struct ffa_ctx *ctx;
> > +    struct domain *d;
> > +    struct vcpu *v;
> > +
> > +    /*
> > +     * vm_id == 0 means a notification pending for Xen itself, but
> > +     * we don't support that yet.
> > +     */
> > +    if ( !vm_id )
> > +        return;
> > +
> > +    d = ffa_rcu_lock_domain_by_vm_id(vm_id);
> > +    if ( !d )
> > +        return;
> > +
> > +    ctx = d->arch.tee;
> > +    if ( !ctx )
> > +        goto out_unlock;
>
> In both of the previous cases you are silently ignoring an interrupt
> due to an internal error.
> Is this something that we should trace? Maybe just debug?
>
> Could you add a comment to explain why this could happen (when
> possible) or not, and what the possible side effects are?
>
> The second case would be a notification for a domain without FF-A
> enabled, which is fine, but I am more concerned about the first one.

The SPMC must be out of sync in both cases. I've been looking for a
window where that can happen, but I can't find one. The SPMC is called
with FFA_NOTIFICATION_BITMAP_DESTROY during domain teardown, so it
shouldn't try to deliver any notifications for the domain after that.
In the second case, the domain ID might have been reused for a domain
without FF-A enabled, but the SPMC should already know that. I'll add
comments and prints since these errors indicate a bug somewhere.
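
Something along these lines (only a sketch, the exact wording of the
comments and messages may change in the next version):

    d = ffa_rcu_lock_domain_by_vm_id(vm_id);
    if ( !d )
    {
        /*
         * The SPMC reports a pending notification for a domain we
         * can't find. This shouldn't happen since we call
         * FFA_NOTIFICATION_BITMAP_DESTROY during teardown, so it
         * indicates a bug somewhere.
         */
        printk(XENLOG_ERR "ffa: notification for unknown vm_id %u\n",
               vm_id);
        return;
    }

    ctx = d->arch.tee;
    if ( !ctx )
    {
        /*
         * The domain ID may have been reused for a domain without
         * FF-A enabled, but then the SPMC shouldn't have a pending
         * notification for it. This also indicates a bug somewhere.
         */
        printk(XENLOG_ERR "ffa: notification for vm_id %u without FF-A\n",
               vm_id);
        goto out_unlock;
    }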

>
> > +
> > +    /*
> > +     * arch.tee is freed from complete_domain_destroy() so the RCU lock
> > +     * guarantees that the data structure isn't freed while we're accessing
> > +     * it.
> > +     */
> > +    ACCESS_ONCE(ctx->notif.secure_pending) = true;
> > +
> > +    /*
> > +     * Since we're only delivering global notifications, always
> > +     * deliver to the first online vCPU. It doesn't matter
> > +     * which one we choose, as long as it's available.
> > +     */
> > +    for_each_vcpu(d, v)
> > +    {
> > +        if ( is_vcpu_online(v) )
> > +        {
> > +            vgic_inject_irq(d, v, GUEST_FFA_NOTIF_PEND_INTR_ID,
> > +                            true);
> > +            break;
> > +        }
> > +    }
> > +    if ( !v )
> > +        printk(XENLOG_ERR "ffa: can't inject NPI, all vCPUs offline\n");
> > +
> > +out_unlock:
> > +    rcu_unlock_domain(d);
> > +}

Thanks,
Jens