[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.



On Mon, Jan 12, 2015 at 11:45:30AM -0500, Konrad Rzeszutek Wilk wrote:
> There is race when we clear the STATE_SCHED in the softirq
> - which allows the 'raise_softirq_for' to progress and
> schedule another dpci. During that time the other CPU could
> receive an interrupt and calls 'raise_softirq_for' and put
> the dpci on its per-cpu list. There would be two 'dpci_softirq'
> running at the same time (on different CPUs) where the
> dpci state is STATE_RUN (and STATE_SCHED is cleared). This
> ends up hitting:
> 
>  if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
>       BUG()
> 
> Instead of that put the dpci back on the per-cpu list to deal
> with later.
> 
> The reason we can get his with this is when an interrupt
> affinity is set over multiple CPUs.
> 
> Another potential fix would be to add a guard in the raise_softirq_for
> to check for 'STATE_RUN' bit being set and not schedule the
> dpci until that bit has been cleared.
> 
> Reported-by: Sander Eikelenboom <linux@xxxxxxxxxxxxxx>
> Reported-by: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> ---
>  xen/drivers/passthrough/io.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index ae050df..9b77334 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -804,7 +804,17 @@ static void dpci_softirq(void)
>          d = pirq_dpci->dom;
>          smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */
>          if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
> -            BUG();
> +        {
> +            unsigned long flags;
> +
> +            /* Put back on the list and retry. */
> +            local_irq_save(flags);
> +            list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list));

I chatted with Jan on IRC this, and one worry is that if we add
on our per-cpu list an 'dpci' that is running on another CPU - if
the other CPU runs 'list_del' it will go BOOM.

However, I am not sure if I can come up with a scenario in which
this will be triggered - as by the time we get to checking
the STATE_RUN condition the list removal has already been done.
So adding in on the per-cpu list is OK - and since the STATE_SCHED
is set, it guards against double-list addition.


CPU1                                    CPU2                                    
CPU3:
softirq_dpci:
  list_del() from per-cpu-list
  test-and-set(STATE_RUN)
  test-and-clear(STATE_SCHED)
   .. 
                                        raise_softirq
                                                test-and-set(STATE_SCHED)
                                        softirq_dpci
                                                list_del from per-cpu-list
                                                if-test-and-set(STATE_RUN) 
fails:
                                                        list_add_tail(..)
                                                        continue
                                                                                
raise_softirq
                                                                                
        test-and-set(STATE_SCHED)
                                                                                
        [fails, so no adding]
  ..
                                        softirq_dpci
                                                list_del
                                                if-test-and-set(STATE_RUN) 
fails:
                                                        list_add_tail(..) [OK, 
we did the list_del]
                                                        continue..
  clear_bit(STATE_RUN)
                                        softrqi_dpci
                                                list_del()
                                                test-and-set(STATE_RUN)
                                                test-and-clear(STATE_SCHED)


> +            local_irq_restore(flags);
> +
> +            raise_softirq(HVM_DPCI_SOFTIRQ);
> +            continue;
> +        }
>          /*
>           * The one who clears STATE_SCHED MUST refcount the domain.
>           */
> -- 
> 2.1.0
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.