
Re: [Xen-devel] [PATCH] dpci: Put the dpci back on the list if scheduled from another CPU.



On Thu, Mar 19, 2015 at 07:53:35AM +0000, Jan Beulich wrote:
> >>> On 17.03.15 at 16:38, <konrad.wilk@xxxxxxxxxx> wrote:
> > There is a race when we clear STATE_SCHED in the softirq,
> > which allows 'raise_softirq_for' (on another CPU, or on
> > the one running the softirq) to schedule the dpci.
> > 
> > Specifically this can happen when the other CPU receives
> > an interrupt, calls 'raise_softirq_for', and puts the dpci
> > on its per-cpu list (the same dpci structure). Note that
> > this could also happen on the same physical CPU; however,
> > for simplicity the explanation will assume two CPU actors.
> > 
> > The result is two 'dpci_softirq' instances running at the
> > same time (on different CPUs): one CPU is executing
> > hvm_dirq_assist (having cleared STATE_SCHED and set STATE_RUN),
> > while the other is trying to call:
> > 
> >    if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
> >             BUG();
> > 
> > Since STATE_RUN is already set, it would end badly.
> > 
> > The reason we can hit this is when an interrupt's
> > affinity is set over multiple CPUs.
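> > 
> > In simplified form (not the exact code - details and exact names
> > are abbreviated here), the two paths that race look like this:
> > 
> >    /* dpci_softirq(), per CPU, for each dpci taken off its list: */
> >    if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
> >        BUG();
> >    /*
> >     * Once STATE_SCHED is cleared below, raise_softirq_for() on
> >     * another CPU may set it again and queue the same pirq_dpci on
> >     * that CPU's per-cpu list while we are still in STATE_RUN.
> >     */
> >    if ( test_and_clear_bit(STATE_SCHED, &pirq_dpci->state) )
> >        hvm_dirq_assist(d, pirq_dpci);
> >    clear_bit(STATE_RUN, &pirq_dpci->state);
> > 
> >    /* raise_softirq_for(), called from the interrupt path: */
> >    if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) )
> >        return;                /* already queued somewhere - done */
> >    /* (with interrupts off) queue on this CPU's per-cpu list */
> >    list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list));
> >    raise_softirq(HVM_DPCI_SOFTIRQ);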
> > 
> > Potential solutions:
> > 
> > a) Instead of the BUG() we can put the dpci back on the per-cpu
> > list to be dealt with later (when the softirqs are run again).
> > Putting the 'dpci' back on the per-cpu list effectively spins
> > until the bad condition clears.
> > 
> > b) We could also expand the test-and-set(STATE_SCHED) in raise_softirq_for
> > to detect the 'STATE_RUN' bit being set and still schedule the dpci.
> > The BUG() check in dpci_softirq would be replaced with a spin until
> > 'STATE_RUN' has been cleared. The dpci would still not
> > be scheduled when the STATE_SCHED bit was set.
> > 
> > c) Only schedule the dpci when the state is completely clear
> > (neither STATE_SCHED nor STATE_RUN). It would spin if STATE_RUN is set
> > (as processing is in progress and will finish). If STATE_SCHED is set
> > (so it hasn't run yet) we won't spin and just exit (sketched below).
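> > 
> > A rough sketch of what c) would look like in raise_softirq_for
> > (never implemented; illustration only):
> > 
> >    for ( ;; )
> >    {
> >        unsigned long old = cmpxchg(&pirq_dpci->state, 0,
> >                                    1 << STATE_SCHED);
> > 
> >        if ( old == 0 )
> >            break;                 /* clean state - go schedule it */
> >        if ( old & (1 << STATE_SCHED) )
> >            return;                /* already scheduled - just exit */
> >        /* Only STATE_RUN is set - spin until the softirq clears it. */
> >        cpu_relax();
> >    }
> >    /* ... put pirq_dpci on the per-cpu list and raise the softirq ... */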
> > 
> > Down-sides of the solutions:
> > 
> > a) Live-lock of the CPU. We could be finishing a dpci, then adding
> > the dpci back, exiting, and then processing the dpci once more, and
> > so on. We would eventually stop, as TIMER_SOFTIRQ would be set, which
> > will cause SCHEDULER_SOFTIRQ to be set as well, and we would exit this loop.
> > 
> > Interestingly, the old ('tasklet') code used this mechanism.
> > If the function assigned to the tasklet was running, the softirq
> > that ran said function (hvm_dirq_assist) would be responsible for
> > putting the tasklet back on the per-cpu list. This allowed
> > having a running tasklet and a 'to-be-scheduled' tasklet
> > at the same time.
> > 
> > b) is similar to a) - instead of re-entering dpci_softirq
> > we loop inside the softirq waiting for the correct condition to
> > arrive. As it does not let us unwedge ourselves (the other
> > softirqs are not called while we loop), it is less preferable.
> > 
> > c) can cause a dead-lock if the interrupt comes in while we are
> > processing the dpci in the softirq - but only if this happens on the
> > same CPU. We would be looping in raise_softirq waiting for STATE_RUN
> > to be cleared, while the softirq that was to clear it has been
> > preempted by our interrupt handler.
> > 
> > As such, this patch - which implements a) - is the best candidate
> > for this quagmire.
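> > 
> > For reference, a) boils down to roughly the following in dpci_softirq,
> > in place of the BUG() (simplified - see the patch itself for the
> > exact code):
> > 
> >    if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
> >    {
> >        unsigned long flags;
> > 
> >        /* Another CPU still owns STATE_RUN - put it back and retry. */
> >        local_irq_save(flags);
> >        list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list));
> >        local_irq_restore(flags);
> > 
> >        raise_softirq(HVM_DPCI_SOFTIRQ);
> >        continue;
> >    }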
> > 
> > Reported-and-Tested-by: Sander Eikelenboom <linux@xxxxxxxxxxxxxx>
> > Reported-and-Tested-by: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> 
> So I now agree that in the state we're in this is the most reasonable
> fix.

Thank you.

Are you OK with me checking it in?

> My reservations against the extra logic introduced earlier (and
> being fixed here) stand though: From an abstract perspective the
> IRQ and softirq logic alone should be sufficient to deal with the
> needs we have. The complications really result from the desire to
> use a per-CPU list of hvm_pirq_dpci-s, which I still think a simpler
> alternative should be found for (after all real hardware doesn't use
> such lists).
> 
> A first thought would be to put them all on a per-domain list and
> have a cpumask tracking which CPUs they need servicing on. The
> downside of this would be (apart from again not being a proper
> equivalent of how actual hardware handles this) that - the softirq
> handler not having any other context - domains needing servicing
> would also need to be tracked in some form (in order to avoid
> having to iterate over all of them), and a per-CPU list would be
> undesirable for the exact same reasons. Yet a per-CPU
> domain-needs-service bitmap doesn't seem very attractive either,
> i.e. this would need further thought (also to make sure such an
> alternative model doesn't become even more involved than what
> we have now).

HA! (yes, I completely agree on - "complex" == "unpleasant")

Perhaps we can brainstorm some of this at XenHackathon in Shanghai?
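
Just so we're picturing the same shape - something like the below is
what I have in mind (illustration only; these fields do not exist and
the names are made up on the spot):

    struct hvm_pirq_dpci {
        ...
        struct list_head dom_list;     /* on the domain's list instead
                                          of a per-CPU one */
        cpumask_t pending_cpus;        /* CPUs this dpci still needs
                                          servicing on */
        ...
    };

and the open question is exactly the one you raise: how dpci_softirq,
with no other context than the CPU it runs on, finds which domains
(and which of their dpci-s) have a pending bit for that CPU without
iterating over all of them.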
> 
> Jan
