
Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.



> > If we do it explicitly we run the risk of a dead-lock. See below for a
> > draft (not even compile-tested) of what I think you mean.
> 
> That's no different from the code you proposed before, just that
> the live-lock is better hidden there: By re-raising a softirq from a
> softirq handler, you arrange for yourself to be called again right
> away.

Interestingly enough, this live-lock issue has been there since
the start (when this code was written).
The tasklet-based implementation (Xen 4.5 and earlier) allows this:


CPU0                            CPU1                     CPU2

hvm_do_IRQ_dpci()
 - tasklet_schedule()
   [t->scheduled_on = 0]
   if (!t->is_running)
        put tasklet on per-cpu list. raise softirq


 . do_softirq
    [do_tasklet_work]
    t->scheduled_on = -1
    t->is_running = 1
    unlock the lock
     -> call hvm_dirq_assist
                                do_IRQ
                                 hvm_do_IRQ_dpci
                                t->scheduled_on = 1;
                                if (!t->is_running).. [it is 1, so skip]

                                                        do_IRQ
                                                         hvm_do_IRQ_dpci
                                                         t->scheduled_on = 2;
                                                         if (!t->is_running).. [it is 1, so skip]

   takes the lock
    if (t->scheduled_on >= 0)
         tasklet_enqueue:
          put tasklet on per-cpu list, raise softirq.

   [do_tasklet_work]
   .. repeat the story, injecting interrupts right as 'hvm_dirq_assist' is
   executing.

With interrupts arriving right as the dpci work is being done, we can
end up in exactly this live-lock.
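
For reference, here is a simplified sketch of the tasklet logic the trace
above walks through (abbreviated from xen/common/tasklet.c - locking and
list handling are elided, so this is illustrative rather than the exact
source):

static void tasklet_schedule(struct tasklet *t, unsigned int cpu)
{
    /* tasklet_lock held, irqs off. */
    t->scheduled_on = cpu;
    if ( !t->is_running )
        tasklet_enqueue(t);       /* put on per-cpu list, raise softirq */
    /* else: the running instance will re-enqueue us (see below). */
}

static void do_tasklet_work(struct tasklet *t)
{
    /* tasklet_lock held on entry. */
    t->scheduled_on = -1;
    t->is_running = 1;

    /* The lock is dropped here, so tasklet_schedule() on other CPUs
     * gets in and only sets ->scheduled_on. */
    t->func(t->data);             /* hvm_dirq_assist() runs here */
    /* Lock re-taken here. */

    t->is_running = 0;
    if ( t->scheduled_on >= 0 )   /* re-raised while we were running? */
        tasklet_enqueue(t);       /* ... and around we go again */
}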

It also suffers from "skipping" interrupts if they span more
than one CPU. The implementation I had does not address this issue
either (mine only picks up the 'second' interrupt, while the older
one picks up the 'last' one).

Anyhow, the patch below makes the code behave the same as what
we have in Xen 4.5 and earlier (warts and all).
The difference is that the warts are so much faster that you don't
see them :-)
> 
> > --- a/xen/drivers/passthrough/io.c
> > +++ b/xen/drivers/passthrough/io.c
> > @@ -63,10 +63,35 @@ enum {
> >  static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
> >  {
> >      unsigned long flags;
> > +    unsigned long old;
> >  
> > -    if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) )
> > -        return;
> > -
> > +    for ( ;; )
> > +    {
> > +        old = cmpxchg(&pirq_dpci->state, 0, 1 << STATE_SCHED);
> > +        switch ( old )
> > +        {
> > +        case (1 << STATE_SCHED):
> > +            /*
> > +             * Whenever STATE_SCHED is set we MUST not schedule it.
> > +             */
> > +            return;
> > +        case (1 << STATE_RUN) | (1 << STATE_SCHED):
> > +        case (1 << STATE_RUN):
> > +            /* Getting close to finish. Spin. */
> > +            continue;
> > +        }
> > +        /*
> > +         * If the old state was 0 the cmpxchg succeeded and we can
> > +         * schedule it.
> > +         */
> > +        if ( old == 0 )
> > +            break;
> > +    }
> >      get_knownalive_domain(pirq_dpci->dom);
> >  
> >      local_irq_save(flags);
> 
> Yes, this looks better. And I again wonder whether STATE_*
> couldn't simply become a tristate - dpci_softirq(), rather than setting
> STATE_RUN and then clearing STATE_SCHED, could simply
> cmpxchg(, STATE_SCHED, STATE_RUN), acting as necessary when
> that operation fails.
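
As I read it, that would be something like this (an untested sketch only -
the function name and domain argument are made up for illustration, and
what exactly "acting as necessary" means on a failed cmpxchg is the open
question):

static void dpci_softirq_one(struct domain *d,
                             struct hvm_pirq_dpci *pirq_dpci)
{
    /* SCHED -> RUN in one shot; bail if somebody beat us to it. */
    if ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 1 << STATE_RUN) !=
         (1 << STATE_SCHED) )
        return;

    hvm_dirq_assist(d, pirq_dpci);

    /* RUN -> 0.  If raise_softirq_for() set STATE_SCHED while we ran,
     * this cmpxchg fails and we "act as necessary" - e.g. clear
     * STATE_RUN and put ourselves back on the list to run again. */
    if ( cmpxchg(&pirq_dpci->state, 1 << STATE_RUN, 0) !=
         (1 << STATE_RUN) )
    {
        clear_bit(STATE_RUN, &pirq_dpci->state);
        /* re-queue: put pirq_dpci back on this CPU's list, raise softirq */
    }
}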

One issue that needs to be figured out is what to do when
we have an interrupt mini-storm and need to queue it up.

One option could be to convert ->mapping to a simple atomic
counter - incremented every time hvm_do_IRQ_dpci is called
(with hvm_dirq_assist running a while loop that decrements it).
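
Roughly (a sketch only - assuming ->mapping becomes an atomic_t, using
Xen's atomic helpers, and ignoring the subtle cases listed below):

    /* hvm_do_IRQ_dpci(): note one more pending instance, kick softirq. */
    atomic_inc(&pirq_dpci->mapping);
    raise_softirq_for(pirq_dpci);

    /* hvm_dirq_assist(): drain everything that piled up while we ran. */
    do {
        /* ... inject one interrupt instance into the guest ... */
    } while ( !atomic_dec_and_test(&pirq_dpci->mapping) );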

However, as previous reviews demonstrated, there are a lot
of subtle things that need to be taken care of:
 - Legacy interrupts and their 'reset the mapping to zero' behavior
   when the interrupt timer has elapsed.
 - Dealing with error states when assigning a PIRQ, and with the
   window between the create/destroy paths, making sure that the
   state seen in dpci_softirq is sane.

That took me two months to get right, and only with other
folks testing and finding issues. While I am totally up for
restarting this to make it awesome, I am also aghast at my
progress on that front and fear that it might take us _right_ past
the feature freeze window.

Hence I was wondering if it would just be easier to put
this patch in (see above) - with the benefit that folks get
a faster interrupt passthrough experience - and then I work on another
variant of this with the tristate cmpxchg and the ->mapping atomic counter.

Thoughts?
> 
> Jan
