[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [XEN PATCH v5 7/7] xen/arm: ffa: support notification



Hi,

On Wed, Jun 5, 2024 at 12:45 PM Bertrand Marquis
<Bertrand.Marquis@xxxxxxx> wrote:
>
> Hi,
>
> > On 4 Jun 2024, at 12:24, Julien Grall <julien@xxxxxxx> wrote:
> >
> >
> >
> > On 03/06/2024 10:50, Jens Wiklander wrote:
> >> Hi Julien,
> >
> > Hi Jens,
> >
> >
> >> On Mon, Jun 3, 2024 at 11:12 AM Julien Grall <julien@xxxxxxx> wrote:
> >>>
> >>> Hi Jens,
> >>>
> >>> On 03/06/2024 10:01, Jens Wiklander wrote:
> >>>> On Fri, May 31, 2024 at 4:28 PM Bertrand Marquis
> >>>> <Bertrand.Marquis@xxxxxxx> wrote:
> >>>>>
> >>>>> Hi Jens,
> >>>>>
> >>>>>> On 29 May 2024, at 09:25, Jens Wiklander <jens.wiklander@xxxxxxxxxx> 
> >>>>>> wrote:
> >>>>>>
> >>>>>> Add support for FF-A notifications, currently limited to an SP (Secure
> >>>>>> Partition) sending an asynchronous notification to a guest.
> >>>>>>
> >>>>>> Guests and Xen itself are made aware of pending notifications with an
> >>>>>> interrupt. The interrupt handler triggers a tasklet to retrieve the
> >>>>>> notifications using the FF-A ABI and deliver them to their 
> >>>>>> destinations.
> >>>>>>
> >>>>>> Update ffa_partinfo_domain_init() to return error code like
> >>>>>> ffa_notif_domain_init().
> >>>>>>
> >>>>>> Signed-off-by: Jens Wiklander <jens.wiklander@xxxxxxxxxx>
> >>>>>> ---
> >>>>>> v4->v5:
> >>>>>> - Move the freeing of d->arch.tee to the new TEE mediator 
> >>>>>> free_domain_ctx
> >>>>>>   callback to make the context accessible during 
> >>>>>> rcu_lock_domain_by_id()
> >>>>>>   from a tasklet
> >>>>>> - Initialize interrupt handlers for secondary CPUs from the new TEE 
> >>>>>> mediator
> >>>>>>   init_interrupt() callback
> >>>>>> - Restore the ffa_probe() from v3, that is, remove the
> >>>>>>   presmp_initcall(ffa_init) approach and use ffa_probe() as usual now 
> >>>>>> that we
> >>>>>>   have the init_interrupt callback.
> >>>>>> - A tasklet is added to handle the Schedule Receiver interrupt. The 
> >>>>>> tasklet
> >>>>>>   finds each relevant domain with rcu_lock_domain_by_id() which now is 
> >>>>>> enough
> >>>>>>   to guarantee that the FF-A context can be accessed.
> >>>>>> - The notification interrupt handler only schedules the notification
> >>>>>>   tasklet mentioned above
> >>>>>>
> >>>>>> v3->v4:
> >>>>>> - Add another note on FF-A limitations
> >>>>>> - Clear secure_pending in ffa_handle_notification_get() if both SP and 
> >>>>>> SPM
> >>>>>>   bitmaps are retrieved
> >>>>>> - ASSERT that ffa_rcu_lock_domain_by_vm_id() isn't passed the vm_id 
> >>>>>> FF-A
> >>>>>>   uses for Xen itself
> >>>>>> - Replace the get_domain_by_id() call done via 
> >>>>>> ffa_get_domain_by_vm_id() in
> >>>>>>   notif_irq_handler() with a call to 
> >>>>>> rcu_lock_live_remote_domain_by_id() via
> >>>>>>   ffa_rcu_lock_domain_by_vm_id()
> >>>>>> - Remove spinlock in struct ffa_ctx_notif and use atomic functions as 
> >>>>>> needed
> >>>>>>   to access and update the secure_pending field
> >>>>>> - In notif_irq_handler(), look for the first online CPU instead of 
> >>>>>> assuming
> >>>>>>   that the first CPU is online
> >>>>>> - Initialize FF-A via presmp_initcall() before the other CPUs are 
> >>>>>> online,
> >>>>>>   use register_cpu_notifier() to install the interrupt handler
> >>>>>>   notif_irq_handler()
> >>>>>> - Update commit message to reflect recent updates
> >>>>>>
> >>>>>> v2->v3:
> >>>>>> - Add a GUEST_ prefix and move FFA_NOTIF_PEND_INTR_ID and
> >>>>>>   FFA_SCHEDULE_RECV_INTR_ID to public/arch-arm.h
> >>>>>> - Register the Xen SRI handler on each CPU using on_selected_cpus() and
> >>>>>>   setup_irq()
> >>>>>> - Check that the SGI ID retrieved with FFA_FEATURE_SCHEDULE_RECV_INTR
> >>>>>>   doesn't conflict with static SGI handlers
> >>>>>>
> >>>>>> v1->v2:
> >>>>>> - Addressing review comments
> >>>>>> - Change ffa_handle_notification_{bind,unbind,set}() to take struct
> >>>>>>   cpu_user_regs *regs as argument.
> >>>>>> - Update ffa_partinfo_domain_init() and ffa_notif_domain_init() to 
> >>>>>> return
> >>>>>>   an error code.
> >>>>>> - Fixing a bug in handle_features() for FFA_FEATURE_SCHEDULE_RECV_INTR.
> >>>>>> ---
> >>>>>> xen/arch/arm/tee/Makefile       |   1 +
> >>>>>> xen/arch/arm/tee/ffa.c          |  72 +++++-
> >>>>>> xen/arch/arm/tee/ffa_notif.c    | 409 ++++++++++++++++++++++++++++++++
> >>>>>> xen/arch/arm/tee/ffa_partinfo.c |   9 +-
> >>>>>> xen/arch/arm/tee/ffa_private.h  |  56 ++++-
> >>>>>> xen/arch/arm/tee/tee.c          |   2 +-
> >>>>>> xen/include/public/arch-arm.h   |  14 ++
> >>>>>> 7 files changed, 548 insertions(+), 15 deletions(-)
> >>>>>> create mode 100644 xen/arch/arm/tee/ffa_notif.c
> >>>>>>
> >>>> [...]
> >>>>>>
> >>>>>> @@ -517,8 +567,10 @@ err_rxtx_destroy:
> >>>>>> static const struct tee_mediator_ops ffa_ops =
> >>>>>> {
> >>>>>>      .probe = ffa_probe,
> >>>>>> +    .init_interrupt = ffa_notif_init_interrupt,
> >>>>>
> >>>>> With the previous change on init secondary i would suggest to
> >>>>> have a ffa_init_secondary here actually calling the 
> >>>>> ffa_notif_init_interrupt.
> >>>>
> >>>> Yes, that makes sense. I'll update.
> >>>>
> >>>>>
> >>>>>>      .domain_init = ffa_domain_init,
> >>>>>>      .domain_teardown = ffa_domain_teardown,
> >>>>>> +    .free_domain_ctx = ffa_free_domain_ctx,
> >>>>>>      .relinquish_resources = ffa_relinquish_resources,
> >>>>>>      .handle_call = ffa_handle_call,
> >>>>>> };
> >>>>>> diff --git a/xen/arch/arm/tee/ffa_notif.c 
> >>>>>> b/xen/arch/arm/tee/ffa_notif.c
> >>>>>> new file mode 100644
> >>>>>> index 000000000000..e8e8b62590b3
> >>>>>> --- /dev/null
> >>>>>> +++ b/xen/arch/arm/tee/ffa_notif.c
> >>>> [...]
> >>>>>> +static void notif_vm_pend_intr(uint16_t vm_id)
> >>>>>> +{
> >>>>>> +    struct ffa_ctx *ctx;
> >>>>>> +    struct domain *d;
> >>>>>> +    struct vcpu *v;
> >>>>>> +
> >>>>>> +    /*
> >>>>>> +     * vm_id == 0 means a notifications pending for Xen itself, but
> >>>>>> +     * we don't support that yet.
> >>>>>> +     */
> >>>>>> +    if ( !vm_id )
> >>>>>> +        return;
> >>>>>> +
> >>>>>> +    d = ffa_rcu_lock_domain_by_vm_id(vm_id);
> >>>>>> +    if ( !d )
> >>>>>> +        return;
> >>>>>> +
> >>>>>> +    ctx = d->arch.tee;
> >>>>>> +    if ( !ctx )
> >>>>>> +        goto out_unlock;
> >>>>>
> >>>>> In both previous cases you are silently ignoring an interrupt
> >>>>> due to an internal error.
> >>>>> Is this something that we should trace ? maybe just debug ?
> >>>>>
> >>>>> Could you add a comment to explain why this could happen
> >>>>> (when possible) or not and the possible side effects ?
> >>>>>
> >>>>> The second one would be a notification for a domain without
> >>>>> FF-A enabled which is ok but i am a bit more wondering on
> >>>>> the first one.
> >>>>
> >>>> The SPMC must be out of sync in both cases. I've been looking for a
> >>>> window where that can happen, but I can't find any. SPMC is called
> >>>> with FFA_NOTIFICATION_BITMAP_DESTROY during domain teardown so the
> >>>> SPMC shouldn't try to deliver any notifications after that.
> >>>
> >>> I don't think I agree with the conclusion. I believe, this can also
> >>> happen in normal operation.
> >>>
> >>> For example, the SPMC could have trigger the interrupt before
> >>> FFA_NOTIFICATION_BITMAP_DESTROY but Xen didn't handle the interrupt (or
> >>> run the tasklet) until later.
> >> You're right, there is a window. Delayed handling is OK since
> >> FFA_NOTIFICATION_INFO_GET_64 is invoked from the tasklet, but there is
> >> a window if the tasklet is suspended or another core destroys the
> >> domain before the tasklet has called ffa_rcu_lock_domain_by_vm_id().
> >> So far it's harmless and I guess we can afford a print.
> >
> > I think it would confuse more the user than anything else because this is 
> > an expected race. If we wanted to print a message, then I would argue it 
> > should be in the case where...
> >
> >>>
> >>> This could be at the time where the domain has been fully destroyed or
> >>> even when...
> >>>
> >>>> In the second case, the domain ID might have been reused for a domain
> >>>> without FF-A enabled, but the SPMC should have known that already.
> >>>
> >>> ... a new domain has been created. Although, the latter is rather 
> >>> unlikely.
> >>>
> >>> So what if the new domain has FFA enabled? Is there any potential
> >>> security issue?
> >> In this case, we'll inject an NPI in the guest, but when it invokes
> >> FFA_NOTIFICATION_GET it will get accurate information from the SPMC.
> >> The worst case is a spurious NPI. This shouldn't be a security issue.
> >
> > ... we inject the interrupt to the "wrong" domain. But I also understand 
> > that it would be difficult for Xen to detect it.
> >
> > So I would say no print should be needed. Bertrand, what do you think?
>
> Yes i agree that it could confuse the user.
> I would just put comments to explain the situations where it could occur in 
> the code.

Thanks, I'll add comments for the next version.

Cheers,
Jens



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.