[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs
> -----Original Message----- > From: Jan Beulich <jbeulich@xxxxxxxx> > Sent: 09 March 2020 16:29 > To: paul@xxxxxxx > Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Varad Gautam <vrd@xxxxxxxxx>; Julien > Grall <julien@xxxxxxx>; Roger > Pau Monné <roger.pau@xxxxxxxxxx>; Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > Subject: Re: [PATCH v4] x86: irq: Do not BUG_ON multiple unbind calls for > shared pirqs > > On 06.03.2020 17:02, paul@xxxxxxx wrote: > > From: Varad Gautam <vrd@xxxxxxxxx> > > > > XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS. > > In that scenario, it is possible to receive multiple __pirq_guest_unbind > > calls for the same pirq from domain_kill, if the pirq has not yet been > > removed from the domain's pirq_tree, as: > > domain_kill() > > -> domain_relinquish_resources() > > -> pci_release_devices() > > -> pci_clean_dpci_irq() > > -> pirq_guest_unbind() > > -> __pirq_guest_unbind() > > > > For a shared pirq (nr_guests > 1), the first call would zap the current > > domain from the pirq's guests[] list, but the action handler is never freed > > as there are other guests using this pirq. As a result, on the second call, > > __pirq_guest_unbind searches for the current domain which has been removed > > from the guests[] list, and hits a BUG_ON. > > > > Make __pirq_guest_unbind safe to be called multiple times by letting xen > > continue if a shared pirq has already been unbound from this guest. The > > PIRQ will be cleaned up from the domain's pirq_tree during the destruction > > in complete_domain_destroy anyway. > > > > Signed-off-by: Varad Gautam <vrd@xxxxxxxxx> > > [taking over from Varad at v4] > > Signed-off-by: Paul Durrant <paul@xxxxxxx> > > --- > > Cc: Jan Beulich <jbeulich@xxxxxxxx> > > Cc: Julien Grall <julien@xxxxxxx> > > Cc: Roger Pau Monné <roger.pau@xxxxxxxxxx> > > Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> > > > > Roger suggested cleaning the entry from the domain pirq_tree so that > > we need not make it safe to re-call __pirq_guest_unbind(). This seems like > > a reasonable suggestion but the semantics of the code are almost > > impenetrable (e.g. 'pirq' is used to mean an index, a pointer and is also > > the name of struct so you generally have little idea what it actally means) > > so I prefer to stick with a small fix that I can actually reason about. > > > > v4: > > - Re-work the guest array search to make it clearer > > I.e. there are cosmetic differences to v3 (see below), but > technically it's still the same. I can't believe the re-use > of "pirq" for different entities is this big of a problem. Please suggest code if you think it ought to be done differentely. I tried. > But anyway: > > > --- a/xen/arch/x86/irq.c > > +++ b/xen/arch/x86/irq.c > > @@ -1680,9 +1680,23 @@ static irq_guest_action_t *__pirq_guest_unbind( > > > > BUG_ON(!(desc->status & IRQ_GUEST)); > > > > - for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ ) > > - continue; > > - BUG_ON(i == action->nr_guests); > > + for ( i = 0; i < action->nr_guests; i++ ) > > + if ( action->guest[i] == d ) > > + break; > > + > > + if ( i == action->nr_guests ) /* No matching entry */ > > + { > > + /* > > + * In case the pirq was shared, unbound for this domain in an > > earlier > > + * call, but still existed on the domain's pirq_tree, we still > > reach > > + * here if there are any later unbind calls on the same pirq. > > Return > > + * if such an unbind happens. > > + */ > > + ASSERT(action->shareable); > > + return NULL; > > + } > > + > > + ASSERT(action->nr_guests > 0); > > This seems pointless to have here - v3 had it inside the if(), > where it would actually guard against coming here with nr_guests > equal to zero. Why. The code just after this decrements nr_guests so it seems like entirely the right point to have the ASSERT. I can make it ASSERT >= 0, if that makes more sense. > v3 also used if() and BUG() instead of ASSERT() > inside this if(), which to me would seem more in line with our > current ./CODING_STYLE guidelines of handling unexpected > conditions. Could you clarify why you switched things? > Because I don't think there is need to kill the host in a non-debug context if we hit this condition; it is entirely survivable as far as I can tell so a BUG_ON() did not seem appropriate. Paul _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |