[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.



On Mon, Feb 02, 2015 at 03:19:33PM +0000, Jan Beulich wrote:
> >>> On 02.02.15 at 15:29, <konrad.wilk@xxxxxxxxxx> wrote:
> > On Tue, Jan 13, 2015 at 10:20:00AM +0000, Jan Beulich wrote:
> >> >>> On 12.01.15 at 17:45, <konrad.wilk@xxxxxxxxxx> wrote:
> >> > There is race when we clear the STATE_SCHED in the softirq
> >> > - which allows the 'raise_softirq_for' to progress and
> >> > schedule another dpci. During that time the other CPU could
> >> > receive an interrupt and calls 'raise_softirq_for' and put
> >> > the dpci on its per-cpu list. There would be two 'dpci_softirq'
> >> > running at the same time (on different CPUs) where the
> >> > dpci state is STATE_RUN (and STATE_SCHED is cleared). This
> >> > ends up hitting:
> >> > 
> >> >  if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
> >> >  BUG()
> >> > 
> >> > Instead of that put the dpci back on the per-cpu list to deal
> >> > with later.
> >> > 
> >> > The reason we can get his with this is when an interrupt
> >> > affinity is set over multiple CPUs.
> >> > 
> >> > Another potential fix would be to add a guard in the raise_softirq_for
> >> > to check for 'STATE_RUN' bit being set and not schedule the
> >> > dpci until that bit has been cleared.
> >> 
> >> I indeed think this should be investigated, because it would make
> >> explicit what ...
> >> 
> >> > --- a/xen/drivers/passthrough/io.c
> >> > +++ b/xen/drivers/passthrough/io.c
> >> > @@ -804,7 +804,17 @@ static void dpci_softirq(void)
> >> >          d = pirq_dpci->dom;
> >> >          smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */
> >> >          if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
> >> > -            BUG();
> >> > +        {
> >> > +            unsigned long flags;
> >> > +
> >> > +            /* Put back on the list and retry. */
> >> > +            local_irq_save(flags);
> >> > +            list_add_tail(&pirq_dpci->softirq_list, 
> >> > &this_cpu(dpci_list));
> >> > +            local_irq_restore(flags);
> >> > +
> >> > +            raise_softirq(HVM_DPCI_SOFTIRQ);
> >> > +            continue;
> >> > +        }
> >> 
> >> ... this does implicitly - spin until the bad condition cleared.
> > 
> > I still haven't gotten to trying to reproduce the issues Malcolm
> > saw which is easier for me to do (use Intel super-fast storage
> > in a Windows guest) - since I've one of those boxes.
> > 
> > However in lieu of that, here is a patch that does pass my testing
> > of SR-IOV, and I believe should work fine.  I _think_ it covers
> > what you want it to have Jan, but of course please correct me
> > if I made a mistake in the logic.
> 
> Since the code quoted above is still there in the new patch, the new
> patch can at best be half of what I suggested.
> 
> > --- a/xen/drivers/passthrough/io.c
> > +++ b/xen/drivers/passthrough/io.c
> > @@ -63,10 +63,37 @@ enum {
> >  static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
> >  {
> >      unsigned long flags;
> > +    unsigned long old, new, val = pirq_dpci->state;
> >  
> > -    if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) )
> > -        return;
> > +    /*
> > +     * This cmpxch is a more clear version of:
> > +     * if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) )
> > +     *         return;
> > +     * since it also checks for STATE_RUN conditions.
> > +     */
> > +    for ( ;; )
> > +    {
> > +        new = 1 << STATE_SCHED;
> > +        if ( val )
> > +            new |= val;
> 
> Why the if()?

To 'or' the variable new with '1 << STATE_RUN' in case 'val' changed from
the first read ('val = pirq_dpci->state') to the moment when
we do the cmpxchg.

> 
> >  
> > +        old = cmpxchg(&pirq_dpci->state, val, new);
> > +        switch ( old )
> > +        {
> > +        case (1 << STATE_SCHED):
> > +        case (1 << STATE_RUN) | (1 << STATE_SCHED):
> > +            /*
> > +             * Whenever STATE_SCHED is set we MUST not schedule it.
> > +             */
> > +            return;
> > +        }
> > +        /*
> > +         * If the 'state' is 0 or (1 << STATE_RUN) we can schedule it.
> > +         */
> 
> Really? Wasn't the intention to _not_ schedule when STATE_RUN is
> set? (Also the above is a only line comment, i.e. wants different style.)

I must confess I must have misread your last review. You said:

        > Here is a patch that does this. I don't yet have an setup to test
        > the failing scenario (working on that). I removed the part in
        > the softirq because with this patch I cannot see a way it would
        > ever get there (in the softirq hitting the BUG).

        Hmm, but how do you now schedule the second instance that needed ...

        > +    case (1 << STATE_RUN):

        ... in this case?

which to me implied you still want to schedule an 'dpci' when STATE_RUN is set?

My thinking is that we should still schedule an 'dpci' when STATE_RUN is set
as that is inline with what the old tasklet code did. It would
schedule the tasklet the moment said tasklet was off the global list (but
it would not add it in the global list - that would be the job of the
tasklet function wrapper to detect and insert said tasklet back on
the global list).


> 
> I.e. with what you do now you could as well keep the old code.
> 
> Jan
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.