Xen project Mailing List

Re: [PATCH v6 1/7] xen/events: fix race with set_global_virq_handler()

From: Juergen Gross <jgross@xxxxxxxx>

Date: Tue, 7 Jan 2025 16:56:53 +0100

Authentication-results: smtp-out1.suse.de; none

Autocrypt: addr=jgross@xxxxxxxx; keydata= xsBNBFOMcBYBCACgGjqjoGvbEouQZw/ToiBg9W98AlM2QHV+iNHsEs7kxWhKMjrioyspZKOB ycWxw3ie3j9uvg9EOB3aN4xiTv4qbnGiTr3oJhkB1gsb6ToJQZ8uxGq2kaV2KL9650I1SJve dYm8Of8Zd621lSmoKOwlNClALZNew72NjJLEzTalU1OdT7/i1TXkH09XSSI8mEQ/ouNcMvIJ NwQpd369y9bfIhWUiVXEK7MlRgUG6MvIj6Y3Am/BBLUVbDa4+gmzDC9ezlZkTZG2t14zWPvx XP3FAp2pkW0xqG7/377qptDmrk42GlSKN4z76ELnLxussxc7I2hx18NUcbP8+uty4bMxABEB AAHNH0p1ZXJnZW4gR3Jvc3MgPGpncm9zc0BzdXNlLmNvbT7CwHkEEwECACMFAlOMcK8CGwMH CwkIBwMCAQYVCAIJCgsEFgIDAQIeAQIXgAAKCRCw3p3WKL8TL8eZB/9G0juS/kDY9LhEXseh mE9U+iA1VsLhgDqVbsOtZ/S14LRFHczNd/Lqkn7souCSoyWsBs3/wO+OjPvxf7m+Ef+sMtr0 G5lCWEWa9wa0IXx5HRPW/ScL+e4AVUbL7rurYMfwCzco+7TfjhMEOkC+va5gzi1KrErgNRHH kg3PhlnRY0Udyqx++UYkAsN4TQuEhNN32MvN0Np3WlBJOgKcuXpIElmMM5f1BBzJSKBkW0Jc Wy3h2Wy912vHKpPV/Xv7ZwVJ27v7KcuZcErtptDevAljxJtE7aJG6WiBzm+v9EswyWxwMCIO RoVBYuiocc51872tRGywc03xaQydB+9R7BHPzsBNBFOMcBYBCADLMfoA44MwGOB9YT1V4KCy vAfd7E0BTfaAurbG+Olacciz3yd09QOmejFZC6AnoykydyvTFLAWYcSCdISMr88COmmCbJzn sHAogjexXiif6ANUUlHpjxlHCCcELmZUzomNDnEOTxZFeWMTFF9Rf2k2F0Tl4E5kmsNGgtSa aMO0rNZoOEiD/7UfPP3dfh8JCQ1VtUUsQtT1sxos8Eb/HmriJhnaTZ7Hp3jtgTVkV0ybpgFg w6WMaRkrBh17mV0z2ajjmabB7SJxcouSkR0hcpNl4oM74d2/VqoW4BxxxOD1FcNCObCELfIS auZx+XT6s+CE7Qi/c44ibBMR7hyjdzWbABEBAAHCwF8EGAECAAkFAlOMcBYCGwwACgkQsN6d 1ii/Ey9D+Af/WFr3q+bg/8v5tCknCtn92d5lyYTBNt7xgWzDZX8G6/pngzKyWfedArllp0Pn fgIXtMNV+3t8Li1Tg843EXkP7+2+CQ98MB8XvvPLYAfW8nNDV85TyVgWlldNcgdv7nn1Sq8g HwB2BHdIAkYce3hEoDQXt/mKlgEGsLpzJcnLKimtPXQQy9TxUaLBe9PInPd+Ohix0XOlY+Uk QFEx50Ki3rSDl2Zt2tnkNYKUCvTJq7jvOlaPd6d/W0tZqpyy7KVay+K4aMobDsodB3dvEAs6 ScCnh03dDAFgIq5nsB11j3KPKdVoPlfucX2c7kGNH+LUMbzqV6beIENfNexkOfxHfw==

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Julien Grall <julien@xxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Tue, 07 Jan 2025 15:57:05 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 07.01.25 16:49, Jan Beulich wrote:

On 07.01.2025 16:37, Juergen Gross wrote:

On 07.01.25 16:23, Jan Beulich wrote:

On 07.01.2025 11:17, Juergen Gross wrote:

--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -979,6 +979,7 @@ void send_global_virq(uint32_t virq)
   int set_global_virq_handler(struct domain *d, uint32_t virq)
   {
       struct domain *old;
+    int rc = 0;

if (virq >= NR_VIRQS)

           return -EINVAL;
@@ -992,14 +993,23 @@ int set_global_virq_handler(struct domain *d, uint32_t 
virq)
           return -EINVAL;

spin_lock(&global_virq_handlers_lock);

-    old = global_virq_handlers[virq];
-    global_virq_handlers[virq] = d;
+
+    if ( d->is_dying != DOMDYING_alive )
+    {
+        old = d;
+        rc = -EINVAL;
+    }


While I can see how this eliminates the zombie domain aspect, this doesn't
fully eliminate the race. Doing so would require (also) using the domain's
event lock. Assuming we're okay with the remaining race, imo a code comment
would be needed to state this (including the fact that it's then
unpredictable whether this operation might still succeed for a domain
already having d->is_dying != DOMDYING_alive).


AFAIU you mean that it is still possible to set a domain to handle a virq
when it is in the process of going down, especially if is_dying is set just
after it has been tested to be DOMDYING_alive?

I don't see this being a problem, as the same would happen if the domain
would go down just a millisecond later. This is something we will never be
able to handle.


Right, but the sequence of events in the case you mention is different: The
insertion into the array would still happen while the domain isn't marked
dying.

And after all the call of clear_global_virq_handlers() will now reset the
handling domain to the hardware domain in all cases.


Of course, but in the meantime an event may be sent to such a domain already
marked dying. That likely isn't going to cause problems, but is unexpected
with what description here says is being addressed.

Plus the way you do it the early success path remains; ideally that case
would also fail for an already dying domain.


Same again: clear_global_virq_handlers() will reset the handling domain.


Right.

In summary: As indicated, we may be okay with the remaining race, but then
we also should be making clear that we've decided to leave it at that.
Hence my earlier request: If we accept this, say (and briefly justify) this
in a code comment.

Okay, would you be fine with: Note that this check won't guarantee that a domain just going down can't be set as the handling domain of a virq, as the is_dying indicator might change just after testing it. This isn't going to be a major problem, as clear_global_virq_handlers() is guaranteed to run afterwards and it will reset the handling domain for the virq to the hardware domain. Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.