Xen project Mailing List

Re: Referencing domain struct from interrupt handler

From: Jens Wiklander <jens.wiklander@xxxxxxxxxx>

Date: Wed, 15 May 2024 08:21:00 +0200

Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>

Delivery-date: Wed, 15 May 2024 06:21:32 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, May 14, 2024 at 9:12 AM Jan Beulich <jbeulich@xxxxxxxx> wrote: > > On 08.05.2024 09:10, Jens Wiklander wrote: > > On Fri, May 3, 2024 at 12:32 PM Jan Beulich <jbeulich@xxxxxxxx> wrote: > >> Furthermore, is it guaranteed that the IRQ handler won't interrupt code > >> fiddling with the domain list? I don't think it is, since > >> domlist_update_lock isn't acquired in an IRQ-safe manner. Looks like > >> you need to defer the operation on the domain until softirq or tasklet > >> context. > > > > Thanks for the suggestion, I'm testing it as: > > static DECLARE_TASKLET(notif_sri_tasklet, notif_sri_action, NULL); > > > > static void notif_irq_handler(int irq, void *data) > > { > > tasklet_schedule(&notif_sri_tasklet); > > } > > > > Where notif_sri_action() does what notif_irq_handler() did before > > (using rcu_lock_domain_by_id()). > > > > I have one more question regarding this. > > > > Even with the RCU lock if I understand it correctly, it's possible for > > domain_kill() to tear down the domain. Or as Julien explained it in > > another thread [3]: > >> CPU0: ffa_get_domain_by_vm_id() (return the domain as it is alive) > >> > >> CPU1: call domain_kill() > >> CPU1: teardown is called, free d->arch.tee (the pointer is not set to NULL) > >> > >> d->arch.tee is now a dangling pointer > >> > >> CPU0: access d->arch.tee > >> > >> This implies you may need to gain a global lock (I don't have a better > >> idea so far) to protect the IRQ handler against domains teardown. > > > > I'm trying to address that (now in a tasklet) with: > > /* > > * domain_kill() calls ffa_domain_teardown() which will free > > * d->arch.tee, but not set it to NULL. This can happen while holding > > * the RCU lock. > > * > > * domain_lock() will stop rspin_barrier() in domain_kill(), unless > > * we're already past rspin_barrier(), but then will d->is_dying be > > * non-zero. > > */ > > domain_lock(d); > > if ( !d->is_dying ) > > { > > struct ffa_ctx *ctx = d->arch.tee; > > > > ACCESS_ONCE(ctx->notif.secure_pending) = true; > > } > > domain_unlock(d); > > > > It seems to work, but I'm worried I'm missing something or abusing > > domain_lock(). > > Well. Yes, this is one way of dealing with the issue. Yet as you suspect it > feels like an abuse of domain_lock(); that function would better be avoided > whenever possible. (It had some very unhelpful uses long ago.) > > Another approach would generally be to do respective cleanup not from > underneath domain_kill(), but complete_domain_destroy(). It's not really > clear to me which of the two approaches is better in this case. Thanks for the feedback. I tried moving the freeing of d->arch.tee to complete_domain_destroy() while keeping the rest of the cleanup as is. It works as expected, I'll use this for the next version of the patch set. Cheers, Jens

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.