[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Referencing domain struct from interrupt handler



Hi Jan,

On Fri, May 3, 2024 at 12:32 PM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 03.05.2024 09:45, Jens Wiklander wrote:
> > Hi Xen maintainers,
> >
> > In my patch series [XEN PATCH v4 0/5] FF-A notifications [1] I need to
> > get a reference to a domain struct from a domain ID and keep it from
> > being destroyed while using it in the interrupt handler
> > notif_irq_handler() (introduced in the last patch "[XEN PATCH v4 5/5]
> > xen/arm: ffa: support notification"). In my previous patch set [2] I
> > used get_domain_by_id(), but from the following discussion
> > rcu_lock_live_remote_domain_by_id() seems to be a better choice so
> > that's what I'm using now in the v4 patch set. The domain lock is held
> > during a call to vgic_inject_irq() and unlocked right after.
> >
> > While we're reviewing the patch set in [1] I'd like to check the
> > approach with rcu_lock_live_remote_domain_by_id() here.
> >
> > What do you think? Is using rcu_lock_live_remote_domain_by_id() the
> > best approach?
>
> Is it guaranteed that the IRQ handler won't ever run in the context of a
> vCPU belonging to the domain in question? If not, why the "remote" form
> of the function?

No, that's my mistake.

>
> Furthermore, is it guaranteed that the IRQ handler won't interrupt code
> fiddling with the domain list? I don't think it is, since
> domlist_update_lock isn't acquired in an IRQ-safe manner. Looks like
> you need to defer the operation on the domain until softirq or tasklet
> context.

Thanks for the suggestion, I'm testing it as:
static DECLARE_TASKLET(notif_sri_tasklet, notif_sri_action, NULL);

static void notif_irq_handler(int irq, void *data)
{
    tasklet_schedule(&notif_sri_tasklet);
}

Where notif_sri_action() does what notif_irq_handler() did before
(using rcu_lock_domain_by_id()).

I have one more question regarding this.

Even with the RCU lock if I understand it correctly, it's possible for
domain_kill() to tear down the domain. Or as Julien explained it in
another thread [3]:
> CPU0: ffa_get_domain_by_vm_id() (return the domain as it is alive)
>
> CPU1: call domain_kill()
> CPU1: teardown is called, free d->arch.tee (the pointer is not set to NULL)
>
> d->arch.tee is now a dangling pointer
>
> CPU0: access d->arch.tee
>
> This implies you may need to gain a global lock (I don't have a better
> idea so far) to protect the IRQ handler against domains teardown.

I'm trying to address that (now in a tasklet) with:
    /*
     * domain_kill() calls ffa_domain_teardown() which will free
     * d->arch.tee, but not set it to NULL. This can happen while holding
     * the RCU lock.
     *
     * domain_lock() will stop rspin_barrier() in domain_kill(), unless
     * we're already past rspin_barrier(), but then will d->is_dying be
     * non-zero.
     */
    domain_lock(d);
    if ( !d->is_dying )
    {
        struct ffa_ctx *ctx = d->arch.tee;

        ACCESS_ONCE(ctx->notif.secure_pending) = true;
    }
    domain_unlock(d);

It seems to work, but I'm worried I'm missing something or abusing
domain_lock(). I can do this in v5 of the patch set if that helps to
see what I mean.

[3] 
https://patchew.org/Xen/20240426084723.4149648-1-jens.wiklander@xxxxxxxxxx/20240426084723.4149648-6-jens.wiklander@xxxxxxxxxx/#c7a672a7-02f8-4d24-b87e-1b8439d7eb4c@xxxxxxx

Thanks,
Jens

>
> Jan
>
> > [1] 
> > https://patchew.org/Xen/20240502145645.1201613-1-jens.wiklander@xxxxxxxxxx/
> > [2] 
> > https://patchew.org/Xen/20240426084723.4149648-1-jens.wiklander@xxxxxxxxxx/
> >
> > Thanks,
> > Jens
>



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.