Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.
> > If we do it explicitly we run the risk of a dead-lock. See below for a draft
> > (not even compile-tested) of what I think you mean.
>
> That's no different from the code you proposed before, just that
> the live-lock is better hidden there: By re-raising a softirq from a
> softirq handler, you arrange for yourself to be called again right
> away.
Interestingly enough, this live-lock issue has been there since the start
(when this code was written).
The tasklet-based implementation (Xen 4.5 and earlier) allows this:
CPU0                        CPU1                      CPU2

hvm_do_IRQ_dpci()
 - tasklet_schedule()
    [t->scheduled_on = 0]
    if (!t->is_running)
       put tasklet on per-cpu
       list, raise softirq
.
do_softirq
 [do_tasklet_work]
  t->scheduled_on = -1
  t->is_running = 1
  unlock the lock
  -> call hvm_dirq_assist
                            do_IRQ
                             hvm_do_IRQ_dpci
                              t->scheduled_on = 1;
                              if (!t->is_running)..
                               [it is 1, so skip]
                                                      do_IRQ
                                                       hvm_do_IRQ_dpci
                                                        t->scheduled_on = 2;
                                                        if (!t->is_running)..
                                                         [it is 1, so skip]
  takes the lock
  if (t->scheduled_on >= 0)
     tasklet_enqueue:
        put tasklet on per-cpu
        list, raise softirq.
 [do_tasklet_work]
  .. repeat the story.

Inject interrupts right as 'hvm_dirq_assist' is executing. With interrupts
happening right as the dpci work is being called, we can cause an exact
live-lock.
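To make that concrete, here is a bare-bones standalone model of the pre-4.6
tasklet flow above. This is my simplification, not the actual
xen/common/tasklet.c code: a pthread mutex stands in for the tasklet lock and
a plain flag stands in for the per-cpu list, and the model_* names are mine.

  /* Bare-bones model of the pre-4.6 tasklet flow; NOT the real Xen code. */
  #include <pthread.h>
  #include <stdbool.h>

  struct tasklet {
      int scheduled_on;          /* CPU it was last scheduled on, -1 if none */
      bool is_running;           /* set while the handler executes */
      bool enqueued;             /* stands in for "on the per-cpu list" */
      void (*func)(void);        /* hvm_dirq_assist() in our case */
  };

  static pthread_mutex_t tasklet_lock = PTHREAD_MUTEX_INITIALIZER;

  /* Roughly what hvm_do_IRQ_dpci() did via tasklet_schedule(). */
  static void model_tasklet_schedule(struct tasklet *t, int cpu)
  {
      pthread_mutex_lock(&tasklet_lock);
      t->scheduled_on = cpu;
      if ( !t->is_running )
          t->enqueued = true;    /* put on per-cpu list, raise softirq */
      /* If it _is_ running we skip the enqueue, but scheduled_on stays >= 0,
       * so model_do_tasklet_work() re-enqueues it once the handler returns. */
      pthread_mutex_unlock(&tasklet_lock);
  }

  /* Roughly what the tasklet softirq did for each queued tasklet. */
  static void model_do_tasklet_work(struct tasklet *t)
  {
      pthread_mutex_lock(&tasklet_lock);
      t->enqueued = false;
      t->scheduled_on = -1;
      t->is_running = true;
      pthread_mutex_unlock(&tasklet_lock);    /* "unlock the lock" */

      t->func();                              /* IRQs on other CPUs can set
                                               * scheduled_on right here */

      pthread_mutex_lock(&tasklet_lock);      /* "takes the lock" */
      t->is_running = false;
      if ( t->scheduled_on >= 0 )             /* an IRQ arrived meanwhile... */
          t->enqueued = true;                 /* ...so re-enqueue: run again */
      pthread_mutex_unlock(&tasklet_lock);
  }

As long as interrupts keep arriving while func() runs, scheduled_on is >= 0 at
the re-check, so the tasklet is immediately re-enqueued and the handler runs
again - which is the live-lock.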
It also suffers from "skipping" interrupts if they span more
than one CPU. The implementation I had does not address this issue
either (it only picks up the 'second' interrupt, while the older
one picks up the 'last' one).
Anyhow, the patch below makes the code behave, from a behaviour standpoint,
the same as what we have in Xen 4.5 and earlier (with warts).
The difference is that the warts are so much faster that you don't
see them :-)
>
> > --- a/xen/drivers/passthrough/io.c
> > +++ b/xen/drivers/passthrough/io.c
> > @@ -63,10 +63,35 @@ enum {
> >  static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
> >  {
> >      unsigned long flags;
> > +    unsigned long old, new, val = pirq_dpci->state;
> >
> > -    if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) )
> > -        return;
> > -
> > +    for ( ;; )
> > +    {
> > +        old = cmpxchg(&pirq_dpci->state, 0, 1 << STATE_SCHED);
> > +        switch ( old )
> > +        {
> > +        case (1 << STATE_SCHED):
> > +            /*
> > +             * Whenever STATE_SCHED is set we MUST not schedule it.
> > +             */
> > +            return;
> > +        case (1 << STATE_RUN) | (1 << STATE_SCHED):
> > +        case (1 << STATE_RUN):
> > +            /* Getting close to finish. Spin. */
> > +            continue;
> > +        }
> > +        /*
> > +         * If the 'state' is 0 we can schedule it.
> > +         */
> > +        if ( old == 0 )
> > +            break;
> > +    }
> >      get_knownalive_domain(pirq_dpci->dom);
> >
> >      local_irq_save(flags);
>
> Yes, this looks better. And I again wonder whether STATE_*
> couldn't simply become a tristate - dpci_softirq(), rather than setting
> STATE_RUN and then clearing STATE_SCHED, could simply
> cmpxchg(, STATE_SCHED, STATE_RUN), acting as necessary when
> that operation fails.
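If I understand the tristate idea right, it would be roughly the following.
This is only a standalone, untested model with names of my own choosing: C11
atomics stand in for Xen's cmpxchg, dirq_assist() stands in for
hvm_dirq_assist(), and the per-cpu list handling plus domain refcounting are
left out.

  /* Standalone model of the tristate idea; NOT the real io.c code. */
  #include <stdatomic.h>
  #include <stdbool.h>

  enum { STATE_SCHED, STATE_RUN };              /* bit numbers, as in io.c */

  struct pirq_state {
      _Atomic unsigned long state;
  };

  static void dirq_assist(struct pirq_state *p) { (void)p; }  /* stand-in */

  /* Raiser side (raise_softirq_for): only a 0 -> SCHED transition schedules. */
  static bool model_raise(struct pirq_state *p)
  {
      unsigned long expect = 0;

      return atomic_compare_exchange_strong(&p->state, &expect,
                                            1UL << STATE_SCHED);
  }

  /* Softirq side (dpci_softirq): claim the entry with one SCHED -> RUN
   * transition instead of set_bit(STATE_RUN) + clear_bit(STATE_SCHED). */
  static void model_softirq(struct pirq_state *p)
  {
      unsigned long expect = 1UL << STATE_SCHED;

      if ( !atomic_compare_exchange_strong(&p->state, &expect,
                                           1UL << STATE_RUN) )
          return;               /* no longer scheduled: act as necessary */

      dirq_assist(p);           /* hvm_dirq_assist() in the real code */

      /* RUN -> 0; a raiser that saw RUN can now succeed with 0 -> SCHED. */
      expect = 1UL << STATE_RUN;
      atomic_compare_exchange_strong(&p->state, &expect, 0);
  }

The nice property is that the raiser can only succeed on a 0 -> SCHED
transition, so each scheduling is consumed exactly once; what "acting as
necessary" on the failing cmpxchg means is the part that still needs thought.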
One issue that needs to be figured out is what to do when
we have an interrupt mini-storm and need to queue it up.
One option could be to convert the ->mapping to a
simple atomic counter - incremented every time
hvm_do_IRQ_dpci is called (and hvm_dirq_assist would
have a while loop decrementing it).
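Very roughly, something like this (standalone sketch with hypothetical names;
the real counter would of course be the field in struct hvm_pirq_dpci):

  /* Standalone sketch; "pending" plays the role of the ->mapping counter. */
  #include <stdatomic.h>

  static _Atomic unsigned int pending;

  /* hvm_do_IRQ_dpci() side: count the interrupt, then raise the softirq. */
  static void model_irq(void)
  {
      atomic_fetch_add(&pending, 1);
      /* raise_softirq_for(pirq_dpci); */
  }

  /* hvm_dirq_assist() side: drain whatever accumulated in the meantime. */
  static void model_assist(void)
  {
      while ( atomic_load(&pending) != 0 )
      {
          atomic_fetch_sub(&pending, 1);
          /* deliver one interrupt's worth of work here */
      }
  }

That would let an interrupt mini-storm accumulate in the counter instead of
being coalesced into a single run.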
However, as previous reviews demonstrated, there are a lot
of subtle things that need to be taken care of:
- Legacy interrupts and their 'let's reset mapping to zero' behaviour when
the interrupt timer has elapsed.
- Dealing with error states when assigning a PIRQ, or during
the window between creating/destroying, and having
to make sure that the state seen by dpci_softirq is sane.
That took me two months to get right, and only with other
folks testing and finding these issues. While I am totally up for
restarting this to make it awesome, I am also aghast at my
progress on that and fear that this might take us _right_ past
the feature freeze window.
Hence I was wondering if it would just be easier to put
this patch in (see above), with the benefit that folks get
a faster interrupt passthrough experience, and then I work on another
variant of this with the tristate cmpxchg and the ->mapping atomic counter.
Thoughts?
>
> Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel