[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: Referencing domain struct from interrupt handler
Hi Jan, On Fri, May 3, 2024 at 12:32 PM Jan Beulich <jbeulich@xxxxxxxx> wrote: > > On 03.05.2024 09:45, Jens Wiklander wrote: > > Hi Xen maintainers, > > > > In my patch series [XEN PATCH v4 0/5] FF-A notifications [1] I need to > > get a reference to a domain struct from a domain ID and keep it from > > being destroyed while using it in the interrupt handler > > notif_irq_handler() (introduced in the last patch "[XEN PATCH v4 5/5] > > xen/arm: ffa: support notification"). In my previous patch set [2] I > > used get_domain_by_id(), but from the following discussion > > rcu_lock_live_remote_domain_by_id() seems to be a better choice so > > that's what I'm using now in the v4 patch set. The domain lock is held > > during a call to vgic_inject_irq() and unlocked right after. > > > > While we're reviewing the patch set in [1] I'd like to check the > > approach with rcu_lock_live_remote_domain_by_id() here. > > > > What do you think? Is using rcu_lock_live_remote_domain_by_id() the > > best approach? > > Is it guaranteed that the IRQ handler won't ever run in the context of a > vCPU belonging to the domain in question? If not, why the "remote" form > of the function? No, that's my mistake. > > Furthermore, is it guaranteed that the IRQ handler won't interrupt code > fiddling with the domain list? I don't think it is, since > domlist_update_lock isn't acquired in an IRQ-safe manner. Looks like > you need to defer the operation on the domain until softirq or tasklet > context. Thanks for the suggestion, I'm testing it as: static DECLARE_TASKLET(notif_sri_tasklet, notif_sri_action, NULL); static void notif_irq_handler(int irq, void *data) { tasklet_schedule(¬if_sri_tasklet); } Where notif_sri_action() does what notif_irq_handler() did before (using rcu_lock_domain_by_id()). I have one more question regarding this. Even with the RCU lock if I understand it correctly, it's possible for domain_kill() to tear down the domain. Or as Julien explained it in another thread [3]: > CPU0: ffa_get_domain_by_vm_id() (return the domain as it is alive) > > CPU1: call domain_kill() > CPU1: teardown is called, free d->arch.tee (the pointer is not set to NULL) > > d->arch.tee is now a dangling pointer > > CPU0: access d->arch.tee > > This implies you may need to gain a global lock (I don't have a better > idea so far) to protect the IRQ handler against domains teardown. I'm trying to address that (now in a tasklet) with: /* * domain_kill() calls ffa_domain_teardown() which will free * d->arch.tee, but not set it to NULL. This can happen while holding * the RCU lock. * * domain_lock() will stop rspin_barrier() in domain_kill(), unless * we're already past rspin_barrier(), but then will d->is_dying be * non-zero. */ domain_lock(d); if ( !d->is_dying ) { struct ffa_ctx *ctx = d->arch.tee; ACCESS_ONCE(ctx->notif.secure_pending) = true; } domain_unlock(d); It seems to work, but I'm worried I'm missing something or abusing domain_lock(). I can do this in v5 of the patch set if that helps to see what I mean. [3] https://patchew.org/Xen/20240426084723.4149648-1-jens.wiklander@xxxxxxxxxx/20240426084723.4149648-6-jens.wiklander@xxxxxxxxxx/#c7a672a7-02f8-4d24-b87e-1b8439d7eb4c@xxxxxxx Thanks, Jens > > Jan > > > [1] > > https://patchew.org/Xen/20240502145645.1201613-1-jens.wiklander@xxxxxxxxxx/ > > [2] > > https://patchew.org/Xen/20240426084723.4149648-1-jens.wiklander@xxxxxxxxxx/ > > > > Thanks, > > Jens >
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |