
[Xen-devel] [PATCH v5 for-xen-4.5 2/3] dpci: Safeguard against race with hvm_dirq_assist crashing and pt_irq_[create|destroy]_bind



There is a potential race where pt_irq_destroy_bind is called while the
interrupt for the PIRQ has been taken and hvm_dirq_assist has been
scheduled for execution.

The timing window is short, but if we hit it hvm_dirq_assist can take a
page fault while dereferencing the 'struct domain' pointer it read from
the 'struct hvm_pirq_dpci->dom' field - because pt_irq_destroy_bind has
meanwhile reset 'struct hvm_pirq_dpci->dom' to NULL. Consider this
example:

CPU0                              CPU1
   do_IRQ                        pt_irq_destroy_bind
     -> __do_IRQ_guest
        -> schedule_dpci_for
      ...                          pirq_guest_unbind
                                   [action unbound]
                                   pirq_dpci->dom = NULL;
      ...
      dpci_softirq
         hvm_dirq_assist
         pirq_dpci->dom dereference
         [BOOM!]
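
To make the failure concrete, here is a minimal sketch of the softirq
side of the race (illustration only, not the actual Xen code; the
hvm_dirq_assist(d, pirq_dpci) signature is assumed from this series):

    /* Sketch: what the dpci softirq effectively does for each queued entry. */
    static void dpci_softirq_sketch(struct hvm_pirq_dpci *pirq_dpci)
    {
        /* CPU1 may already have done 'pirq_dpci->dom = NULL' by now. */
        struct domain *d = pirq_dpci->dom;

        /* hvm_dirq_assist dereferences 'd' - if it is NULL, [BOOM!]. */
        hvm_dirq_assist(d, pirq_dpci);
    }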

The way to fix this is to:
 a) Not reset the ->dom field to NULL at all in the teardown paths. Only
    reset it once 'dpci_kill' has established that hvm_dirq_assist is
    done.
 b) At the start of 'pt_irq_create_bind', check whether the softirq is
    still running (by using 'dpci_kill') and, if it is, spin - but for
    no more than a second (see the note on this bound below). If more
    than a second elapses, return -EAGAIN and let QEMU handle it (it
    reports back to the guest OS that the MSI was not set up). If the
    softirq finished within a second, restart the creation/re-use
    process.

Option b) could be done in 'pt_irq_destroy_bind' instead if that would
be better. Either option works.

The -EAGAIN return could also be dropped, as QEMU is not smart enough
to handle the error.
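
For reference, NOW() returns nanoseconds, so '(NOW() - start) >> 30'
becomes non-zero once roughly 2^30 ns (about 1.07 seconds) have elapsed;
that is the one-second bound mentioned above. A stripped-down sketch of
the wait loop from the first hunk below (locking and the restart path
omitted):

    s_time_t start = NOW();                /* NOW() is in nanoseconds. */

    /* dpci_kill() returns non-zero while the softirq is still live. */
    while ( dpci_kill(pirq_dpci) )
    {
        process_pending_softirqs();
        if ( (NOW() - start) >> 30 )       /* ~1.07s elapsed. */
            return -EAGAIN;                /* Let QEMU retry or fail the MSI setup. */
    }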

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
---
 xen/drivers/passthrough/io.c | 54 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 5f82aba..13433a5 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -163,6 +163,7 @@ int pt_irq_create_bind(
     struct pirq *info;
     int rc, pirq = pt_irq_bind->machine_irq;
 
+ restart:
     if ( pirq < 0 || pirq >= d->nr_pirqs )
         return -EINVAL;
 
@@ -194,15 +195,34 @@ int pt_irq_create_bind(
     }
     pirq_dpci = pirq_dpci(info);
 
-    /* Something is horrible wrong if it has been scheduled or is running. */
-    ASSERT(pirq_dpci->state == 0);
+    /*
+     * We can hit this scenario if the dpci_softirq is running while we are
+     * doing pt_irq_destroy_bind followed by this call on another CPU.
+     * We have to wait until the dpci_softirq is done.
+     */
+    if ( pirq_dpci->state )
+    {
+        s_time_t start;
 
-    /* Guest unbinds (pt_irq_destroy_bind) and rebinds it back. */
-    if ( !pirq_dpci->dom )
-        pirq_dpci->dom = d;
+        /* Need to drop the lock so hvm_dirq_assist can progress. */
+        spin_unlock(&d->event_lock);
+        start = NOW();
+        while ( dpci_kill(pirq_dpci) )
+        {
+            process_pending_softirqs();
+            if ( (NOW() - start) >> 30 )
+            {
+                return -EAGAIN;
+            }
+        }
+        /* And since we dropped the lock, another pt_irq_[destroy|create]_bind
+         * could be called. Let's start from scratch.
+         */
+        goto restart;
+    }
 
     /* Otherwise 'schedule_dpci_for' will crash. */
-    ASSERT(pirq_dpci->dom == d);
+    pirq_dpci->dom = d;
 
     switch ( pt_irq_bind->irq_type )
     {
@@ -229,9 +249,15 @@ int pt_irq_create_bind(
             {
                 pirq_dpci->gmsi.gflags = 0;
                 pirq_dpci->gmsi.gvec = 0;
-                pirq_dpci->dom = NULL;
                 pirq_dpci->flags = 0;
-                dpci_kill(pirq_dpci);
+                /*
+                 * If we fail, we have flags reset and hvm_dirq_assist will
+                 * just exit. We cannot touch pirq_dpci->dom unless
+                 * the softirq is truly dead - and we reap it at the start or
+                 * in pci_clean_dpci_irq.
+                 */
+                if ( !dpci_kill(pirq_dpci) )
+                    pirq_dpci->dom = NULL;
                 pirq_cleanup_check(info, d);
                 spin_unlock(&d->event_lock);
                 return rc;
@@ -332,12 +358,18 @@ int pt_irq_create_bind(
             {
                 if ( pt_irq_need_timer(pirq_dpci->flags) )
                     kill_timer(&pirq_dpci->timer);
-                pirq_dpci->dom = NULL;
                 list_del(&girq->list);
                 list_del(&digl->list);
                 hvm_irq_dpci->link_cnt[link]--;
                 pirq_dpci->flags = 0;
-                dpci_kill(pirq_dpci);
+                /*
+                 * If we fail, we have flags reset and hvm_dirq_assist will
+                 * just exit. We cannot touch pirq_dpci->dom unless
+                 * the softirq is truly dead - and we reap it at the start or
+                 * in pci_clean_dpci_irq.
+                 */
+                if ( !dpci_kill(pirq_dpci) )
+                    pirq_dpci->dom = NULL;
                 pirq_cleanup_check(info, d);
                 spin_unlock(&d->event_lock);
                 xfree(girq);
@@ -466,7 +498,6 @@ int pt_irq_destroy_bind(
         msixtbl_pt_unregister(d, pirq);
         if ( pt_irq_need_timer(pirq_dpci->flags) )
             kill_timer(&pirq_dpci->timer);
-        pirq_dpci->dom   = NULL;
         pirq_dpci->flags = 0;
         dpci_kill(pirq_dpci);
         pirq_cleanup_check(pirq, d);
@@ -491,7 +522,6 @@ void pt_pirq_init(struct domain *d, struct hvm_pirq_dpci *dpci)
 {
     INIT_LIST_HEAD(&dpci->digl_list);
     INIT_LIST_HEAD(&dpci->softirq_list);
-    dpci->dom = d;
     dpci->gmsi.dest_vcpu_id = -1;
 }
 
-- 
1.9.3

