[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs





On 29/01/2020 09:27, vrd@xxxxxxxxxx wrote:
Hey Julien,

Hi,


On 12/18/19 2:57 PM, Julien Grall wrote:
Hi Varad,

Please send new version of a patch in a new thread rather than in-reply to the first version.

On 18/12/2019 10:53, Varad Gautam wrote:
XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
In that scenario, it is possible to receive multiple _pirq_guest_unbind
calls for the same pirq from domain_kill, if the pirq has not yet been
removed from the domain's pirq_tree, as:
   domain_kill()
     -> domain_relinquish_resources()
       -> pci_release_devices()
         -> pci_clean_dpci_irq()
           -> pirq_guest_unbind()
             -> __pirq_guest_unbind()

For a shared pirq (nr_guests > 1), the first call would zap the current
domain from the pirq's guests[] list, but the action handler is never freed as there are other guests using this pirq. As a result, on the second call, __pirq_guest_unbind searches for the current domain which has been removed
from the guests[] list, and hits a BUG_ON.

Make __pirq_guest_unbind safe to be called multiple times by letting xen
continue if a shared pirq has already been unbound from this guest. The
PIRQ will be cleaned up from the domain's pirq_tree during the destruction
in complete_domain_destroy anyways.

Signed-off-by: Varad Gautam <vrd@xxxxxxxxx>
CC: Jan Beulich <jbeulich@xxxxxxxx>
CC: Roger Pau Monné <roger.pau@xxxxxxxxxx>
CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

v2: Split the check on action->nr_guests > 0 and make it an ASSERT, reword.
---
  xen/arch/x86/irq.c | 11 ++++++++++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 5d0d94c..3eb7b22 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1863,7 +1863,16 @@ static irq_guest_action_t *__pirq_guest_unbind(
        for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ )
          continue;
-    BUG_ON(i == action->nr_guests);
+    if ( i == action->nr_guests ) {

The { should be a new line.

+        ASSERT(action->nr_guests > 0) ;

The space before ; is not necessary.

+        /* In case the pirq was shared, unbound for this domain in an earlier call, but still +         * existed on the domain's pirq_tree, we still reach here if there are any later +         * unbind calls on the same pirq. Return if such an unbind happens. */

The coding style for comment is:

/*
 * Foo
 * Bar
 */

+        if ( action->shareable )
+            return NULL;
+        BUG();

Given that the previous BUG_ON() was hit, would it make sense to try to avoid a new BUG().

So why not just returning NULL as you do for action->shareable?


Thanks, I've done the style fixups in v3.

I'd argue that is indeed a BUG, if the pirq was _not_ shareable and the loop above couldn't find a matching domain for it - that implies the pirq shouldn't have existed in the first place.

I am afraid this is only telling me how the BUG() could be triggered and not why a BUG() is more warrant than an ASSERT().

AFAIU, the BUG() can only be triggered if there is a programatic error. This is no different than your ASSERT(action->nr_guest > 0) you just added.

Reading the section "Handling unexpected conditions" in CODING_STYLE, it feels to me the BUG() is not the correct handling as you can return an error here and it would continue fine.

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.