Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.
> > If we do it explicitly we run the risk of a dead-lock. See below for a draft
> > (not even compile-tested) of what I think you mean.
>
> That's no different from the code you proposed before, just that
> the live-lock is better hidden there: By re-raising a softirq from a
> softirq handler, you arrange for yourself to be called again right
> away.
Interestingly enough, this live-lock issue has been there since the start
(when this code was written).
The tasklet-based implementation (Xen 4.5 and earlier) allows this:
CPU0                        CPU1                      CPU2

hvm_do_IRQ_dpci()
 - tasklet_schedule()
    [t->scheduled_on = 0]
    if (!t->is_running)
       put tasklet on per-cpu
       list, raise softirq
.
do_softirq
 [do_tasklet_work]
  t->scheduled_on = -1
  t->is_running = 1
  unlock the lock
  -> call hvm_dirq_assist
                            do_IRQ
                             hvm_do_IRQ_dpci
                              t->scheduled_on = 1;
                              if (!t->is_running)..
                               [it is 1, so skip]
                                                      do_IRQ
                                                       hvm_do_IRQ_dpci
                                                        t->scheduled_on = 2;
                                                        if (!t->is_running)..
                                                         [it is 1, so skip]
  takes the lock
  if (t->scheduled_on >= 0)
     tasklet_enqueue:
        put tasklet on per-cpu
        list, raise softirq.
 [do_tasklet_work]
  .. repeat the story.

Inject interrupts right as 'hvm_dirq_assist' is executing. With interrupts
happening right as the dpci work is being called, we can cause an exact
live-lock.
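To make that concrete, here is a bare-bones standalone model of the pre-4.6
tasklet flow above. This is my simplification, not the actual
xen/common/tasklet.c code: a pthread mutex stands in for the tasklet lock and
a plain flag stands in for the per-cpu list, and the model_* names are mine.

  /* Bare-bones model of the pre-4.6 tasklet flow; NOT the real Xen code. */
  #include <pthread.h>
  #include <stdbool.h>

  struct tasklet {
      int scheduled_on;          /* CPU it was last scheduled on, -1 if none */
      bool is_running;           /* set while the handler executes */
      bool enqueued;             /* stands in for "on the per-cpu list" */
      void (*func)(void);        /* hvm_dirq_assist() in our case */
  };

  static pthread_mutex_t tasklet_lock = PTHREAD_MUTEX_INITIALIZER;

  /* Roughly what hvm_do_IRQ_dpci() did via tasklet_schedule(). */
  static void model_tasklet_schedule(struct tasklet *t, int cpu)
  {
      pthread_mutex_lock(&tasklet_lock);
      t->scheduled_on = cpu;
      if ( !t->is_running )
          t->enqueued = true;    /* put on per-cpu list, raise softirq */
      /* If it _is_ running we skip the enqueue, but scheduled_on stays >= 0,
       * so model_do_tasklet_work() re-enqueues it once the handler returns. */
      pthread_mutex_unlock(&tasklet_lock);
  }

  /* Roughly what the tasklet softirq did for each queued tasklet. */
  static void model_do_tasklet_work(struct tasklet *t)
  {
      pthread_mutex_lock(&tasklet_lock);
      t->enqueued = false;
      t->scheduled_on = -1;
      t->is_running = true;
      pthread_mutex_unlock(&tasklet_lock);    /* "unlock the lock" */

      t->func();                              /* IRQs on other CPUs can set
                                               * scheduled_on right here */

      pthread_mutex_lock(&tasklet_lock);      /* "takes the lock" */
      t->is_running = false;
      if ( t->scheduled_on >= 0 )             /* an IRQ arrived meanwhile... */
          t->enqueued = true;                 /* ...so re-enqueue: run again */
      pthread_mutex_unlock(&tasklet_lock);
  }

As long as interrupts keep arriving while func() runs, scheduled_on is >= 0 at
the re-check, so the tasklet is immediately re-enqueued and the handler runs
again - which is the live-lock.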
It also suffers from "skipping" interrupts if they span more
than one CPU. The implementation I had does not address this issue
either (it only picks up the 'second' interrupt, while the older
one picks up the 'last' one).
Anyhow, the patch below makes the code behave, from a behaviour standpoint,
the same as what we have in Xen 4.5 and earlier (with warts).
The difference is that the warts are so much faster that you don't
see them :-)
>
> > --- a/xen/drivers/passthrough/io.c
> > +++ b/xen/drivers/passthrough/io.c
> > @@ -63,10 +63,35 @@ enum {
> >  static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
> >  {
> >      unsigned long flags;
> > +    unsigned long old, new, val = pirq_dpci->state;
> >
> > -    if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) )
> > -        return;
> > -
> > +    for ( ;; )
> > +    {
> > +        old = cmpxchg(&pirq_dpci->state, 0, 1 << STATE_SCHED);
> > +        switch ( old )
> > +        {
> > +        case (1 << STATE_SCHED):
> > +            /*
> > +             * Whenever STATE_SCHED is set we MUST not schedule it.
> > +             */
> > +            return;
> > +        case (1 << STATE_RUN) | (1 << STATE_SCHED):
> > +        case (1 << STATE_RUN):
> > +            /* Getting close to finish. Spin. */
> > +            continue;
> > +        }
> > +        /*
> > +         * If the 'state' is 0 we can schedule it.
> > +         */
> > +        if ( old == 0 )
> > +            break;
> > +    }
> >      get_knownalive_domain(pirq_dpci->dom);
> >
> >      local_irq_save(flags);
>
> Yes, this looks better. And I again wonder whether STATE_*
> couldn't simply become a tristate - dpci_softirq(), rather than setting
> STATE_RUN and then clearing STATE_SCHED, could simply
> cmpxchg(, STATE_SCHED, STATE_RUN), acting as necessary when
> that operation fails.
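If I understand the tristate idea right, it would be roughly the following.
This is only a standalone, untested model with names of my own choosing: C11
atomics stand in for Xen's cmpxchg, dirq_assist() stands in for
hvm_dirq_assist(), and the per-cpu list handling plus domain refcounting are
left out.

  /* Standalone model of the tristate idea; NOT the real io.c code. */
  #include <stdatomic.h>
  #include <stdbool.h>

  enum { STATE_SCHED, STATE_RUN };              /* bit numbers, as in io.c */

  struct pirq_state {
      _Atomic unsigned long state;
  };

  static void dirq_assist(struct pirq_state *p) { (void)p; }  /* stand-in */

  /* Raiser side (raise_softirq_for): only a 0 -> SCHED transition schedules. */
  static bool model_raise(struct pirq_state *p)
  {
      unsigned long expect = 0;

      return atomic_compare_exchange_strong(&p->state, &expect,
                                            1UL << STATE_SCHED);
  }

  /* Softirq side (dpci_softirq): claim the entry with one SCHED -> RUN
   * transition instead of set_bit(STATE_RUN) + clear_bit(STATE_SCHED). */
  static void model_softirq(struct pirq_state *p)
  {
      unsigned long expect = 1UL << STATE_SCHED;

      if ( !atomic_compare_exchange_strong(&p->state, &expect,
                                           1UL << STATE_RUN) )
          return;               /* no longer scheduled: act as necessary */

      dirq_assist(p);           /* hvm_dirq_assist() in the real code */

      /* RUN -> 0; a raiser that saw RUN can now succeed with 0 -> SCHED. */
      expect = 1UL << STATE_RUN;
      atomic_compare_exchange_strong(&p->state, &expect, 0);
  }

The nice property is that the raiser can only succeed on a 0 -> SCHED
transition, so each scheduling is consumed exactly once; what "acting as
necessary" on the failing cmpxchg means is the part that still needs thought.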
One issue that needs to be figured out is what to do when
we have an interrupt mini-storm and need to queue it up.
One option could be to convert the ->mapping to a
simple atomic counter - incremented every time
hvm_do_IRQ_dpci is called (and hvm_dirq_assist would
have a while loop decrementing it).
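Very roughly, something like this (standalone sketch with hypothetical names;
the real counter would of course be the field in struct hvm_pirq_dpci):

  /* Standalone sketch; "pending" plays the role of the ->mapping counter. */
  #include <stdatomic.h>

  static _Atomic unsigned int pending;

  /* hvm_do_IRQ_dpci() side: count the interrupt, then raise the softirq. */
  static void model_irq(void)
  {
      atomic_fetch_add(&pending, 1);
      /* raise_softirq_for(pirq_dpci); */
  }

  /* hvm_dirq_assist() side: drain whatever accumulated in the meantime. */
  static void model_assist(void)
  {
      while ( atomic_load(&pending) != 0 )
      {
          atomic_fetch_sub(&pending, 1);
          /* deliver one interrupt's worth of work here */
      }
  }

That would let an interrupt mini-storm accumulate in the counter instead of
being coalesced into a single run.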
However, as previous reviews demonstrated, there are a lot
of subtle things that need to be taken care of:
- Legacy interrupts and their 'let's reset mapping to zero' behaviour when
the interrupt timer has elapsed.
- Dealing with error states when assigning a PIRQ, or during
the window between creating/destroying, and having
to make sure that the state seen by dpci_softirq is sane.
That took me two months to get right, and only with other
folks testing and finding these issues. While I am totally up for
restarting this to make it awesome, I am also aghast at my
progress on that and fear that this might take us _right_ past
the feature freeze window.
Hence I was wondering if it would just be easier to put
this patch in (see above), with the benefit that folks get
a faster interrupt passthrough experience, and then I work on another
variant of this with the tristate cmpxchg and the ->mapping atomic counter.
Thoughts?
>
> Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel