[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v6 1/7] xen/events: fix race with set_global_virq_handler()
On 07.01.25 16:49, Jan Beulich wrote: On 07.01.2025 16:37, Juergen Gross wrote:On 07.01.25 16:23, Jan Beulich wrote:On 07.01.2025 11:17, Juergen Gross wrote:--- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -979,6 +979,7 @@ void send_global_virq(uint32_t virq) int set_global_virq_handler(struct domain *d, uint32_t virq) { struct domain *old; + int rc = 0;if (virq >= NR_VIRQS)return -EINVAL; @@ -992,14 +993,23 @@ int set_global_virq_handler(struct domain *d, uint32_t virq) return -EINVAL;spin_lock(&global_virq_handlers_lock);- old = global_virq_handlers[virq]; - global_virq_handlers[virq] = d; + + if ( d->is_dying != DOMDYING_alive ) + { + old = d; + rc = -EINVAL; + }While I can see how this eliminates the zombie domain aspect, this doesn't fully eliminate the race. Doing so would require (also) using the domain's event lock. Assuming we're okay with the remaining race, imo a code comment would be needed to state this (including the fact that it's then unpredictable whether this operation might still succeed for a domain already having d->is_dying != DOMDYING_alive).AFAIU you mean that it is still possible to set a domain to handle a virq when it is in the process of going down, especially if is_dying is set just after it has been tested to be DOMDYING_alive? I don't see this being a problem, as the same would happen if the domain would go down just a millisecond later. This is something we will never be able to handle.Right, but the sequence of events in the case you mention is different: The insertion into the array would still happen while the domain isn't marked dying.And after all the call of clear_global_virq_handlers() will now reset the handling domain to the hardware domain in all cases.Of course, but in the meantime an event may be sent to such a domain already marked dying. That likely isn't going to cause problems, but is unexpected with what description here says is being addressed.Plus the way you do it the early success path remains; ideally that case would also fail for an already dying domain.Same again: clear_global_virq_handlers() will reset the handling domain.Right. In summary: As indicated, we may be okay with the remaining race, but then we also should be making clear that we've decided to leave it at that. Hence my earlier request: If we accept this, say (and briefly justify) this in a code comment. Okay, would you be fine with: Note that this check won't guarantee that a domain just going down can't be set as the handling domain of a virq, as the is_dying indicator might change just after testing it. This isn't going to be a major problem, as clear_global_virq_handlers() is guaranteed to run afterwards and it will reset the handling domain for the virq to the hardware domain. Juergen Attachment:
OpenPGP_0xB0DE9DD628BF132F.asc Attachment:
OpenPGP_signature.asc
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |