Xen project Mailing List

Re: Referencing domain struct from interrupt handler

To: Jens Wiklander <jens.wiklander@xxxxxxxxxx>

Date: Tue, 14 May 2024 09:13:01 +0200

Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Bertrand Marquis <Bertrand.Marquis@xxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>

Delivery-date: Tue, 14 May 2024 07:13:02 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 08.05.2024 09:10, Jens Wiklander wrote:
> On Fri, May 3, 2024 at 12:32 PM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>> Furthermore, is it guaranteed that the IRQ handler won't interrupt code
>> fiddling with the domain list? I don't think it is, since
>> domlist_update_lock isn't acquired in an IRQ-safe manner. Looks like
>> you need to defer the operation on the domain until softirq or tasklet
>> context.
>
> Thanks for the suggestion, I'm testing it as:
> static DECLARE_TASKLET(notif_sri_tasklet, notif_sri_action, NULL);
>
> static void notif_irq_handler(int irq, void *data)
> {
>     tasklet_schedule(&notif_sri_tasklet);
> }
>
> Where notif_sri_action() does what notif_irq_handler() did before
> (using rcu_lock_domain_by_id()).
>
> I have one more question regarding this.
>
> Even with the RCU lock if I understand it correctly, it's possible for
> domain_kill() to tear down the domain. Or as Julien explained it in
> another thread [3]:
>> CPU0: ffa_get_domain_by_vm_id() (return the domain as it is alive)
>>
>> CPU1: call domain_kill()
>> CPU1: teardown is called, free d->arch.tee (the pointer is not set to NULL)
>>
>> d->arch.tee is now a dangling pointer
>>
>> CPU0: access d->arch.tee
>>
>> This implies you may need to gain a global lock (I don't have a better
>> idea so far) to protect the IRQ handler against domains teardown.
>
> I'm trying to address that (now in a tasklet) with:
>     /*
>      * domain_kill() calls ffa_domain_teardown() which will free
>      * d->arch.tee, but not set it to NULL. This can happen while holding
>      * the RCU lock.
>      *
>      * domain_lock() will stop rspin_barrier() in domain_kill(), unless
>      * we're already past rspin_barrier(), but then will d->is_dying be
>      * non-zero.
>      */
>     domain_lock(d);
>     if ( !d->is_dying )
>     {
>         struct ffa_ctx *ctx = d->arch.tee;
>
>         ACCESS_ONCE(ctx->notif.secure_pending) = true;
>     }
>     domain_unlock(d);
>
> It seems to work, but I'm worried I'm missing something or abusing
> domain_lock().

Well. Yes, this is one way of dealing with the issue. Yet as you suspect
it feels like an abuse of domain_lock(); that function would better be
avoided whenever possible. (It had some very unhelpful uses long ago.)

Another approach would generally be to do respective cleanup not from
underneath domain_kill(), but complete_domain_destroy(). It's not really
clear to me which of the two approaches is better in this case.

Jan
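The second approach mentioned in the reply (freeing d->arch.tee from complete_domain_destroy() rather than from underneath domain_kill()) can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions, not Xen code: every toy_* name is hypothetical, the reader counter merely stands in for readers holding the RCU lock taken by rcu_lock_domain_by_id(), and the NULL check stands in for the domain lookup failing once the domain is gone. The assumption being modelled is that complete_domain_destroy() only runs after all such readers have finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ffa_ctx (only the field used here). */
struct toy_ffa_ctx {
    bool secure_pending;
};

/* Hypothetical stand-in for struct domain. */
struct toy_domain {
    bool is_dying;            /* set by toy_domain_kill() */
    unsigned int rcu_readers; /* models readers inside an RCU read section */
    struct toy_ffa_ctx *tee;  /* models d->arch.tee */
};

/* Models entering an RCU read-side section on the domain. */
static void toy_rcu_lock(struct toy_domain *d)
{
    d->rcu_readers++;
}

/* Models leaving the RCU read-side section. */
static void toy_rcu_unlock(struct toy_domain *d)
{
    assert(d->rcu_readers > 0);
    d->rcu_readers--;
}

/* Models domain_kill(): marks the domain dying but frees nothing. */
static void toy_domain_kill(struct toy_domain *d)
{
    d->is_dying = true;
}

/*
 * Models complete_domain_destroy(): cleanup is refused while a reader
 * is still active (the grace period has not elapsed in this model).
 * Returns true if the context was freed.
 */
static bool toy_complete_domain_destroy(struct toy_domain *d)
{
    if (d->rcu_readers != 0)
        return false;
    free(d->tee);
    d->tee = NULL;
    return true;
}

/*
 * Models the tasklet body: under the read-side section the context is
 * either still fully valid or already gone, never dangling.
 * Returns true if the notification was marked pending.
 */
static bool toy_notif_sri_action(struct toy_domain *d)
{
    bool marked = false;

    toy_rcu_lock(d);
    if (d->tee != NULL) {
        d->tee->secure_pending = true;
        marked = true;
    }
    toy_rcu_unlock(d);
    return marked;
}
```

In this model a reader that entered before toy_domain_kill() can still dereference the context safely, because the free is postponed until the last reader has left. Whether that ordering is preferable to taking domain_lock() in the real code is the open question raised in the reply above.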
